The Ace Virtual Machine

COMP 3100/3800/3204: Software Engineering
Specification of the Ace Virtual Machine

This document describes the Ace virtual machine, its instructions, its memory architecture and to some degree its assembly language. Only students of Comp3100/3800 will implement the machine, but students taking Comp3204 will need to use this as a reference for their assembler and compiler projects. Those in Comp3100/3800 should also read the assembler format document, even if not building an assembler. It contains information on how the virtual machine's input files are formatted.

Students of Comp3100 should implement the code-space, integer, floating-point and string instructions of the virtual machine. Those in Comp3800 and interested others should attempt either to implement the array instructions as well, or else try to implement a just-in-time compiler (JIT) for the virtual machine (warning: a JIT is more work than implementing the array instructions).

Registers

There are 64 registers, r0 through to r63. All the registers behave the same, except that r62 (the frame pointer) and r63 (the stack pointer) are handled specially by the subroutine call and ret instructions, and the push and pop instructions. The name fp is equivalent to r62 and sp is equivalent to r63, and those registers can be manipulated normally, as well as via special instructions.

The program counter, pc, is not addressable and is not visible as a register.

The registers are signed integers 32 bits long.

Address Spaces

There are two address spaces: code space and memory space. Memory space holds values of four data types: integers, floating point numbers, strings and dynamic arrays.

Code space stores the instructions for the program, and can only be addressed by the branching and subroutine instructions. The representation of instructions within code space is not specified.

Memory space consists of an array of cells, with addresses 0, 1, 2, ... and upwards. Each cell is capable of holding any one of the four basic data types in the machine. The basic data types are integer, floating-point number, string, and dynamic array. One cell can hold one of these types of data at any one time but may be reused to hold a different type of data at a later time.

The integers are all 32-bit signed quantities, equivalent to int32_t in C (or to long on platforms where long is 32 bits wide).

The floating-point numbers are 64-bit IEEE floating point numbers, equivalent to double in C.

The strings are sequences of ASCII characters, and have arbitrarily variable length. A pointer from a memory cell points to the sequence of characters, which exists outside memory space. The characters within a string cannot be directly addressed, except via special string instructions. The representation of the sequence of characters is not specified.

The dynamic arrays are sequences of memory cells, and have arbitrarily variable length. A pointer from a memory cell points to the sequence of cells in the array, which exists outside memory space. The cells within an array have the same properties as any other memory cell (thus, they can hold any of the four basic data types) but they cannot be directly addressed, except via special array instructions. The representation of the sequence of cells in an array is not specified. An array is not constrained to only holding one kind of data at a time; thus, two cells within the same array are free to hold different data types.

Each instruction in the machine operates on one or more of these basic types, according to the operand addressing modes stored with the instruction.

Instruction Layout

Instructions are laid out in a fixed format within a 4-byte, 32-bit word. The high byte encodes the operation code (opcode); the remaining bytes encode the operands:

opcode src1 src2 dst

The opcode is stored as a single byte.
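As a rough illustration of this layout, the following Python sketch splits a 32-bit instruction word into its four byte-wide fields. The function name split_word is hypothetical, and the order of the fields from the high byte down (opcode, src1, src2, dst) is an assumption based on the order in which they are listed above.

```python
def split_word(word):
    """Split a 32-bit instruction word into (opcode, src1, src2, dst).

    Assumption: the fields run from the high byte to the low byte in the
    order opcode, src1, src2, dst.
    """
    opcode = (word >> 24) & 0xFF
    src1 = (word >> 16) & 0xFF
    src2 = (word >> 8) & 0xFF
    dst = word & 0xFF
    return opcode, src1, src2, dst
```

For example, split_word(0x27010286) yields the four bytes 0x27, 0x01, 0x02 and 0x86.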

Code space instructions have a single operand spanning all three operand fields. Other instructions have three operands: two source operands src1 and src2 and one destination operand dst. Not all operands are meaningful for all instructions, and src1 may be a large literal, occupying the space for both src1 and src2. Instructions with a large literal operand are written in the assembler as having only two operands, e.g.

iadd 1234, r6

but behave as though they have three operands, with dst acting also as src2, and the large literal value acting as src1.

iadd 1234, r6, r6

The following sections describe how the operands are encoded and how the instructions specified in the opcode interpret those operands.

Operand Modes

There are four addressing modes, encoded in two bits at the top of each operand (except for large literal and code space instructions, which are explained later).

Value	Syntax	Name
0	`4`	small literal
1	`1234`	large literal
2	`r4`	register
3	`[r4]`	memory cell

Small literal mode refers to the integer value contained in the current operand byte. Small literal mode operands are always signed integers, but because they must fit within one operand, they can only have a small range of possible values.

Large literal mode refers to the signed integer value stored across the current operand and the one that follows it. It uses two operands to store a larger than normal value, as discussed in detail in the next section on operand encoding.

Register mode refers to the signed integer held in the named register.

Memory cell mode references a cell in memory by using a register as an index into the virtual machine's memory. The type of data found at that cell is implied by the instruction, and the machine may also keep track of what type each cell contains.
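The four modes can be summarised with a small Python sketch that fetches the integer a source operand refers to. The names read_int, regs and mem are hypothetical; regs is assumed to be a list of 64 integers and mem a list of cells.

```python
SMALL_LITERAL, LARGE_LITERAL, REGISTER, MEMORY_CELL = 0, 1, 2, 3

def read_int(mode, value, regs, mem):
    """Return the integer that a source operand refers to."""
    if mode in (SMALL_LITERAL, LARGE_LITERAL):
        return value               # the literal itself
    if mode == REGISTER:
        return regs[value]         # value is a register number
    if mode == MEMORY_CELL:
        return mem[regs[value]]    # the register holds the cell address
    raise ValueError("unknown operand mode")
```

With r4 holding 7 and cell 7 holding 99, the operand `4` reads as 4, `r4` reads as 7 and `[r4]` reads as 99.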

Operand Encoding

An operand is encoded as follows:

m1 m0 v5 v4 v3 v2 v1 v0

The bits m[1-0] store the operand mode value, encoding the mode of the operand (as described above). The bits v[5-0] store the 6-bit operand value. If the mode is register or memory cell mode, this gives an unsigned register number in the range 0 to 63.

If the mode is small literal, this gives a signed 6-bit integer, in the range -32 to +31.
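A minimal Python sketch of decoding one operand byte is shown here. The name decode_operand is hypothetical, and the sign extension assumes the 6-bit literal is stored in two's complement, which is consistent with the stated range of -32 to +31.

```python
def decode_operand(byte):
    """Decode one operand byte into (mode, value)."""
    mode = (byte >> 6) & 0x3        # bits m1 m0
    value = byte & 0x3F             # bits v5..v0
    if mode == 0 and value >= 32:   # small literal: sign-extend 6 bits
        value -= 64                 # assumes two's complement storage
    return mode, value
```

Here the byte 10000100 in binary decodes to register mode with register number 4, and 00111111 decodes to the small literal -1.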

Large literal operands can appear as the first operand (src1). Within the instruction word, the encoding for a large literal occupies the space of both src1 and src2, thus using two mode bits and allowing the next 14 bits to contain a signed integer value:

m1 m0 v13 v12 v11 v10 v9 v8 v7 v6 v5 v4 v3 v2 v1 v0

The fields m[1-0] will have value 1 (binary 01) and the value of the operand is the 14-bit signed integer v[13-0].

If an instruction has a large literal src1 operand, the semantics of the instruction are such that the destination operand dst takes the place of the missing src2 operand. So, for instance, addition is normally interpreted as dst = src2 + src1, but with a large literal is taken to mean dst = dst + src1.
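The large literal decoding can be sketched in Python as follows. The name decode_large_literal is hypothetical, src1 is assumed to be the more significant of the two bytes, and the 14-bit value is assumed to be two's complement, giving a range of -8192 to +8191.

```python
def decode_large_literal(src1_byte, src2_byte):
    """Decode the 16 bits of src1 and src2 as a large literal."""
    bits = (src1_byte << 8) | src2_byte   # assumes src1 is the high byte
    mode = bits >> 14
    if mode != 1:
        raise ValueError("not a large literal")
    value = bits & 0x3FFF
    if value >= 0x2000:                   # sign-extend 14 bits
        value -= 0x4000
    return value
```

Under these assumptions, the literal 1234 in iadd 1234, r6 would occupy the two bytes 0x44 and 0xD2.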

Operand Encoding for Code Space Instructions

Operands for code space instructions are encoded differently: there is a single operand in the instruction word, occupying all 24 operand bits in the word, and there are only three addressing modes which can be used. The top two bits of the operand are the mode, and the other 22 are the value:

m1 m0 v21 v20 v19 v18 v17 v16 v15 v14 v13 v12 v11 v10 v9 v8 v7 v6 v5 v4 v3 v2 v1 v0

The two mode bits form a number which is interpreted in the following way:

Value	Syntax	Name
0	`12345`	absolute
1	`+5678`	relative
2	`r4`	register
3		illegal

Absolute mode references the instruction whose address is the signed 22-bit value. In assembly language, labels may be used instead of actual numbers, but these labels are replaced with the numbers when assembled.

Relative mode references the instruction whose address is the pc plus the signed 22-bit value. Unlike small or large literal mode, relative mode literals begin with a plus or minus sign in the assembly language, and the value occupies the remaining 22 bits of the operand. The assembler may choose to replace a label referencing an instruction with a relative mode operand instead of an absolute operand.

Register mode references the instruction whose address is stored in the register. The number of the register is stored in the lowest 6 bits of the 22 bits of the operand.
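The three code space modes can be sketched in Python as below. The name branch_target is hypothetical; pc is assumed to be the address of the branching instruction itself, and the 22-bit value is assumed to be two's complement.

```python
def branch_target(operand, pc, regs):
    """Resolve a 24-bit code space operand to a target address."""
    mode = (operand >> 22) & 0x3
    value = operand & 0x3FFFFF
    if mode == 2:                      # register: low 6 bits name it
        return regs[value & 0x3F]
    if mode == 3:
        raise ValueError("illegal code space addressing mode")
    if value >= 0x200000:              # sign-extend 22 bits
        value -= 0x400000
    # mode 0 is absolute, mode 1 is relative to the current pc
    return value if mode == 0 else pc + value
```

For instance, a relative operand of +5 executed at pc 10 resolves to address 15, and a relative operand of -3 resolves to address 7.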

Code Space Instructions

Code space instructions are branches and other control instructions:

Opcode	Hex	Name	Description
0	00	`halt`	halt machine
1	01	`nop`	no operation
2	02	`bra`	branch always
3	03	`bgt`	branch if greater than
4	04	`bge`	branch if greater than or equal
5	05	`blt`	branch if less than
6	06	`ble`	branch if less than or equal
7	07	`beq`	branch if equal
8	08	`bne`	branch if not equal
9	09	`beof`	branch if end of file
10	0A	`call`	subroutine call
11	0B	`ret`	subroutine return

Most code space instructions take a single operand (the exceptions are halt, nop and ret, which evaluate but ignore their operands). That operand defines the target program counter (pc) of the operation, for example, the first instruction of the subroutine for a call instruction.

The halt instruction halts the machine unconditionally.

The nop instruction does nothing.

The bra instruction is an unconditional branch.

The various other branch instructions branch according to the result of the most recently executed comparison instruction (icmp, dcmp or scmp).

The beof instruction branches according to the result of the most recently executed file I/O instruction (see later).

The call instruction does a subroutine call. Its operation is to push the pc of the call instruction on the stack (storing the pc into the memory cell that r63 addresses and then incrementing r63 to point to the next stack cell), push the frame pointer r62 on the stack, set the frame pointer equal to the stack pointer, and set the pc equal to the operand.

The ret instruction returns from a subroutine by setting the stack pointer equal to the frame pointer, popping the old frame pointer from the stack into r62, and popping the pc from the stack, then incrementing pc and continuing execution from that instruction.
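The call and ret sequences can be traced with a small Python sketch. The Machine class and its method names are hypothetical; only the order of the pushes, pops and register updates is taken from the two paragraphs above.

```python
FP, SP = 62, 63   # r62 is the frame pointer, r63 the stack pointer

class Machine:
    def __init__(self, cells):
        self.regs = [0] * 64
        self.mem = [0] * cells
        self.pc = 0

    def push(self, value):
        self.mem[self.regs[SP]] = value   # store where r63 points
        self.regs[SP] += 1                # then step to the next cell

    def pop(self):
        self.regs[SP] -= 1                # step back first
        return self.mem[self.regs[SP]]    # then read the cell

    def call(self, target):
        self.push(self.pc)                # pc of the call instruction
        self.push(self.regs[FP])          # caller's frame pointer
        self.regs[FP] = self.regs[SP]
        self.pc = target

    def ret(self):
        self.regs[SP] = self.regs[FP]     # discard the callee's locals
        self.regs[FP] = self.pop()
        self.pc = self.pop() + 1          # resume after the call
```

A call made at pc 5 followed by a ret leaves the machine at pc 6 with r62 and r63 restored to their values from before the call.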

Data Instructions

The remaining instructions manipulate the data in the virtual machine. Each of these instructions uses memory cells, registers and/or literal integer values, and may make use of any or all of the single-byte operands src1, src2 or dst as specified in the descriptions for each instruction.

The general rules for operand modes are as follows (any exceptions to these rules will be noted in the following sections):

Operands which refer to integers may be in the following modes:

src1: small or large literal mode, register mode or memory cell mode.
src2: small literal mode, register mode or memory cell mode.
dst: register mode or memory cell mode.

Operands which refer to floating point numbers must be in memory cell mode.

Operands which refer to strings must be in memory cell mode.

Operands which refer to arrays must be in memory cell mode.

Integer Instructions

The instructions which perform the usual integer arithmetic operations are listed below. In the discussions in this section, assume that all the operands may be any valid integer mode, except that dst may not be a literal and src2 may not be a large literal (but may be a small literal). Operands which are in memory cell mode implicitly refer to integers. There is no check for overflow or underflow.

Opcode	Hex	Name	Description
32	20	`inew`	initialise integer
33	21	`iabs`	absolute value
34	22	`ipush`	push onto stack
35	23	`ipop`	pop from stack
36	24	`icmp`	compare
37	25	`iread`	read value from input
38	26	`iprint`	print value on output
39	27	`iadd`	addition
40	28	`isub`	subtraction
41	29	`imul`	multiplication
42	2A	`idiv`	division
43	2B	`imod`	modulus
44	2C	`iand`	bitwise-and
45	2D	`ior`	bitwise-or
46	2E	`ixor`	bitwise-xor
47	2F	`ilshift`	left shift
48	30	`irshift`	right shift
49	31	`icopy`	copy integer
50	32	`itod`	convert to double
51	33	`itos`	convert to string

The inew instruction initialises the dst to the integer 0. It ignores src1 and src2.

The iabs stores in dst the absolute value of src1, and ignores src2.

The ipush instruction pushes the value of src1 onto the stack and ignores src2 and dst. Pushing onto the stack involves copying the source into the memory cell where r63 points and then incrementing r63 to point to the next cell on the stack.

The ipop instruction pops from the stack the value kept at the cell referenced by r63, and stores that value into dst, ignoring src1 and src2. Popping from the stack involves decrementing r63 then copying the contents of the cell now pointed to by r63 into the destination.

The icmp instruction compares the value of src1 to src2 and records the result in a flag in the virtual machine for interpretation by a later conditional branch instruction. For example,

icmp 11, 5
bgt somewhere

will branch to the instruction labeled somewhere. The icmp instruction ignores dst. This instruction requires both src1 and src2, so if the src1 operand is in large literal mode, src2 is taken to be dst, as usual.
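One way to model the comparison flag is sketched below in Python. Storing the flag as -1, 0 or 1 is an assumption (the specification only says that a flag is recorded), and the names compare and branch_taken are hypothetical.

```python
def compare(src1, src2):
    """Return -1, 0 or 1 as src1 is less than, equal to or greater than src2."""
    return (src1 > src2) - (src1 < src2)

# Each conditional branch tests the flag left by the last comparison.
BRANCH_TESTS = {
    "bgt": lambda flag: flag > 0,
    "bge": lambda flag: flag >= 0,
    "blt": lambda flag: flag < 0,
    "ble": lambda flag: flag <= 0,
    "beq": lambda flag: flag == 0,
    "bne": lambda flag: flag != 0,
}

def branch_taken(name, flag):
    """Report whether the named branch would be taken for this flag."""
    return BRANCH_TESTS[name](flag)
```

With this model, comparing 11 to 5 leaves a flag of 1, so bgt, bge and bne would be taken while blt, ble and beq would not.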

The iread instruction works with all destination types. It reads from standard input the textual representation of a value and stores it in dst. The src1 and src2 operands are ignored. If an end of file condition is encountered, a flag is set inside the virtual machine allowing a subsequent beof instruction to succeed, otherwise that same flag is cleared. Leading white space is ignored, and the format is a decimal integer, or an octal integer beginning with 0, or a hexadecimal integer beginning with 0x or 0X. This is equivalent to using:

fscanf(stdin, "%li", &dst);

The "percent-ell-eye" format reads long integers into a location in memory.
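The base detection described above can be sketched in Python. The name parse_li is hypothetical, and the sketch assumes it is handed a single token of text; Python's built-in int(text, 0) is not used because it rejects octal written with a bare leading 0. The optional sign mirrors what the C format accepts.

```python
def parse_li(text):
    """Parse one integer token using the decimal/octal/hex rules above."""
    s = text.lstrip()                 # leading white space is ignored
    sign = 1
    if s[:1] in ("+", "-"):
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if s[:2].lower() == "0x":         # hexadecimal: 0x or 0X prefix
        return sign * int(s[2:], 16)
    if len(s) > 1 and s[0] == "0":    # octal: leading 0
        return sign * int(s[1:], 8)
    return sign * int(s, 10)          # otherwise decimal
```

So the inputs 42, 017 and 0x1F would be read as 42, 15 and 31 respectively.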

The iprint instruction prints the textual decimal representation of src1 to standard output. It ignores src2 and dst. This is equivalent to using:

fprintf(stdout, "%li", src1);

The "percent-ell-eye" format prints long integers.

The iadd instruction stores in dst the value of src2+src1.

The isub instruction stores in dst the value of src2-src1.

The imul instruction stores in dst the value of src2*src1.

The idiv instruction stores in dst the value of src2/src1. The value is truncated, using C's integer division rules. Division by zero is illegal and halts execution.

The imod instruction stores in dst the value of src2 mod src1. The result is always an integer greater than or equal to zero and less than the absolute value of src1. Modulo by zero is illegal and halts execution of the program.
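The two rules differ for negative operands, which the following Python sketch makes concrete. The function names are hypothetical, and raising an exception stands in for halting the machine. Python's own // operator rounds towards negative infinity, so truncation has to be written out.

```python
def idiv(src2, src1):
    """Truncating division src2/src1, following C's integer division."""
    if src1 == 0:
        raise ZeroDivisionError("idiv by zero halts the machine")
    q = abs(src2) // abs(src1)
    return q if (src2 < 0) == (src1 < 0) else -q

def imod(src2, src1):
    """Remainder that is never negative and is less than abs(src1)."""
    if src1 == 0:
        raise ZeroDivisionError("imod by zero halts the machine")
    return src2 % abs(src1)   # Python's % takes the sign of the divisor
```

Under this reading, -7 divided by 2 gives -3 while -7 mod 2 gives 1, so the quotient and remainder do not recombine to -7 the way C's / and % do.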

The ilshift instruction stores in dst the value of src2 shifted left src1 bits. The bits shifted into the value are set to 0. The value of src1 must be between 0 and 31, inclusive.

The irshift instruction stores in dst the value of src2 shifted right src1 bits. The bits shifted into the value are set to 0. The value of src1 must be between 0 and 31, inclusive.
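Because zeros are shifted in on both sides, the right shift is a logical shift rather than an arithmetic one. A Python sketch follows; the function names are hypothetical, and raising an error for an out-of-range shift count is an assumption, since the behaviour in that case is not specified.

```python
def to_signed32(x):
    """Wrap a Python int to a 32-bit signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def ilshift(src2, src1):
    if not 0 <= src1 <= 31:
        raise ValueError("shift count must be between 0 and 31")
    return to_signed32(src2 << src1)

def irshift(src2, src1):
    if not 0 <= src1 <= 31:
        raise ValueError("shift count must be between 0 and 31")
    # Mask first so that zeros, not sign bits, enter from the left.
    return to_signed32((src2 & 0xFFFFFFFF) >> src1)
```

For example, shifting -1 right by one bit gives 2147483647 rather than -1, and shifting 1 left by 31 bits gives -2147483648.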

The iand instruction stores in dst the bitwise AND of src1 and src2.

The ior instruction stores in dst the bitwise OR of src1 and src2.

The ixor instruction stores in dst the bitwise exclusive OR of src1 and src2.

The icopy instruction copies the integer src1 to dst without change. The src2 operand is ignored.

The itod instruction converts integers to floating-point numbers. It converts the integer src1 to a floating point number stored in dst. The destination must be in memory cell mode. The src2 operand is ignored.

The itos instruction converts integers to strings, by storing a textual representation of the integer src1 in the destination dst. The result is equivalent to the conversion by sprintf format %ld. The src1 operand must refer to an integer. The destination must be in memory cell mode. The src2 operand is ignored.

Floating-Point Instructions

The instructions which perform the usual floating-point operations are listed below. In this section, all operands are assumed to refer to floating-point numbers in memory cell mode. The behavior of underflow, overflow, and not-a-number values is undefined.

Opcode	Hex	Name	Description
64	40	`dnew`	initialise memory cell
65	41	`dabs`	absolute value
66	42	`dpush`	push onto stack
67	43	`dpop`	pop from stack
68	44	`dcmp`	compare
69	45	`dread`	read value from input
70	46	`dprint`	print value on output
71	47	`dadd`	addition
72	48	`dsub`	subtraction
73	49	`dmul`	multiplication
74	4A	`ddiv`	division
81	51	`dtoi`	convert to integer
82	52	`dcopy`	copy double cell
83	53	`dtos`	convert to string

The dnew instruction initialises the dst to 0.0. It ignores src1 and src2.

The dabs instruction stores in dst the absolute value of src1, and ignores src2.

The dpush instruction pushes the value of src1 onto the stack and ignores src2 and dst. Pushing onto the stack involves copying the source into the memory cell where r63 points and then incrementing r63 to point to the next cell on the stack.

The dpop instruction pops from the stack the value kept at the cell referenced by r63, and stores that value into dst, ignoring src1 and src2. Popping from the stack involves decrementing r63 then copying the contents of the cell now pointed to by r63 into the destination.

The dcmp instruction compares the value of src1 to src2 and records the result in a flag in the virtual machine for interpretation by a later conditional branch instruction. The dcmp instruction ignores dst.

The dread instruction reads from standard input the textual representation of a floating-point value and stores it in dst. The src1 and src2 operands are ignored. If an end of file condition is encountered, a flag is set inside the virtual machine allowing a subsequent beof instruction to succeed, otherwise that same flag is cleared. Leading white space is also ignored, and the format is the usual floating-point format, equivalent to:

fscanf(stdin, "%lg", &dst);

The "percent-ell-gee" format reads double values into a location in memory.

The dprint instruction prints the textual representation of src1 to standard output. It ignores src2 and dst. This is equivalent to using:

fprintf(stdout, "%g", src1);

The "percent-gee" format prints double values (note the different format string than dread).

The dadd instruction stores in dst the value of src2+src1.

The dsub instruction stores in dst the value of src2-src1.

The dmul instruction stores in dst the value of src2*src1.

The ddiv instruction stores in dst the value of src2/src1. Division by zero is illegal and halts execution of the program.

The dtoi instruction converts floating-point numbers to integers. The source floating-point value is converted to an integer by truncating the fraction and storing the result into the destination. The src2 operand is ignored.

The dcopy instruction copies src1 to dst without change. The src2 operand is ignored.

The dtos instruction converts floating-point numbers to strings, by storing a textual representation of the floating point value src1 in the destination dst. The result is equivalent to the conversion by printf format %g, but with enough precision that a read instruction can recover the original value. The destination must be in memory cell mode. The src2 operand is ignored.
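One possible reading of "full precision" is 17 significant digits, which is enough to round-trip any 64-bit IEEE value. The Python sketch below uses that reading; the choice of 17 digits and the function name are assumptions rather than part of the specification.

```python
def dtos(value):
    """Format a double so that reading it back yields the same value."""
    # Assumption: "full precision" means 17 significant digits with %g.
    return "%.17g" % value
```

With this choice, 0.5 is rendered as 0.5, while 0.1 is rendered with all 17 digits so that converting the string back gives exactly the same double.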

String Instructions

The instructions which manipulate strings are listed below. In this section, except as noted, all operands used are assumed to be in memory cell mode. The resulting strings may be of any length.

Opcode	Hex	Name	Description
96	60	`snew`	initialise memory cell
97	61	`slen`	string length
98	62	`spush`	push onto stack
99	63	`spop`	pop from stack
100	64	`scmp`	compare
101	65	`sread`	read line from input
102	66	`sprint`	print string on output
103	67	`sadd`	concatenation
104	68	`ssub`	subtraction
105	69	`smul`	multiplication
107	6B	`slshift`	left shift
108	6C	`srshift`	right shift
113	71	`stoi`	convert to integer
114	72	`stod`	convert to double
115	73	`scopy`	copy string
116	74	`sindex`	string index
117	75	`sinsert`	string insert
118	76	`sslice`	slice (substring)
119	77	`sfind`	find within string
120	78	`sord`	ordinal value
121	79	`schr`	convert char to string

The snew instruction initialises the dst to the empty string. It ignores src1 and src2.

The slen instruction stores in the integer dst the length of the string src1, and ignores src2.

The spush instruction pushes the string src1 onto the stack and ignores src2 and dst. Pushing onto the stack involves copying the source into the memory cell where r63 points and then incrementing r63 to point to the next cell on the stack.

The spop instruction pops from the stack the string kept at the cell referenced by r63, and stores that string into dst, ignoring src1 and src2. Popping from the stack involves decrementing r63 then copying the contents of the cell now pointed to by r63 into the destination. The string pointed to by r63 may be garbage collected at this point.

The scmp instruction compares the string src1 to the string src2 and records the result in a flag in the virtual machine for interpretation by a later conditional branch instruction. The dst operand is ignored.

The sread instruction reads a whole line of input, including leading and trailing white space, and including any terminating newline, and stores it into the string dst. Both src1 and src2 are ignored.

The sprint instruction prints the string src1 to standard output. It does not print any extra newlines or other characters, only the contents of the string. It ignores src2 and dst.

The sadd instruction stores in dst the result of appending the string src1 to the string src2.

The ssub instruction stores in dst the result of removing the string src1 from the end of the string src2. It is the opposite of sadd, but if the end of the string src2 is not the same as src1, src2 is copied to dst unchanged.

The smul instruction stores in dst the result of appending src1 copies of the string src2 to the empty string. The src1 operand must refer to an integer, and its value must be non-negative, or the program halts with an error. If src1 is zero, an empty string is stored in dst.

The slshift instruction stores in dst the string src2 'shifted left' by src1 characters, that is, it drops src1 characters from the beginning of the string. The src1 operand must refer to an integer. Its value must be non-negative or the program halts with an error. It is valid to shift more characters than exist within the string; in that case, dst will become the empty string.

The srshift instruction stores in dst the string src2 'shifted right' by src1 characters, that is, it drops src1 characters from the end of the string. The src1 operand must refer to an integer. Its value must be non-negative or the program halts with an error. It is valid to shift more characters than exist within the string; in that case, dst will become the empty string.
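The string arithmetic instructions map closely onto ordinary string operations, as this Python sketch suggests. The function names simply reuse the instruction names, each function returns the value that would be stored in dst, and raising an exception stands in for halting with an error.

```python
def sadd(src2, src1):
    return src2 + src1                     # append src1 to src2

def ssub(src2, src1):
    # Remove src1 only if src2 really ends with it.
    if src2.endswith(src1):
        return src2[:len(src2) - len(src1)]
    return src2

def smul(src2, src1):
    if src1 < 0:
        raise ValueError("smul count must be non-negative")
    return src2 * src1                     # src1 copies of src2

def slshift(src2, src1):
    if src1 < 0:
        raise ValueError("slshift count must be non-negative")
    return src2[src1:]                     # drop from the beginning

def srshift(src2, src1):
    if src1 < 0:
        raise ValueError("srshift count must be non-negative")
    return src2[:max(len(src2) - src1, 0)] # drop from the end
```

For instance, ssub undoes sadd when the suffix matches, and shifting the string hello left or right by 2 gives llo and hel.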

The stoi instruction converts the string src1 to an integer and stores it in dst; the string is in the format accepted by the C function strtol, so it may begin 0x to specify a number in hexadecimal, or 0 to specify octal, and it may begin with a - or optional + to indicate the sign. The src2 operand is ignored.

The stod instruction converts the string src1 to a floating point number, and stores the result into dst which must be in memory cell mode. The string is in the format accepted by the C function atof, and it may begin with an optional - or + to indicate the sign. The src2 operand is ignored.

The scopy instruction copies the string src1 to dst without change. The destination must be in memory cell mode. The src2 operand is ignored.

The sindex instruction stores in string dst the character indexed by src1 in the string src2. Strings are indexed from 0. The src1 operand must refer to an integer. The dst operand must be in memory cell mode, and the result stored there will be a string containing the single character. The src1 operand may be negative, in which case it refers to characters in the string counting from the end, so -1 refers to the last character, -2 the second last character and so on. If src1 is greater than or equal to the length of the string, the empty string is stored in dst. If src1 is less than minus the length of the string, the empty string is stored in dst.
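The indexing rules, including the two out-of-range cases, can be sketched in Python as follows. The function name reuses the instruction name, and the return value is what would be stored in dst.

```python
def sindex(src2, src1):
    """Return the one-character string at index src1 of src2, or ""."""
    n = len(src2)
    if src1 >= n or src1 < -n:
        return ""          # out of range in either direction
    return src2[src1]      # negative indexes count from the end
```

On the string hello, index 0 gives h, index -1 gives o, and indexes 5 and -6 both give the empty string.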

The sinsert instruction inserts in the string dst, at position src1, the string src2. Positions start from 0 within a string. The src1 operand must refer to an integer. If src1 is between zero and the length of the string minus one, the string src2 replaces the character at dst[src1]. If src1 is negative, it may refer to a position counting from the end of the string, so -1 is the last character in the string, -2 the second last and so on. If src1 is greater than or equal to the length of the string, src2 is appended to dst. If src1 is less than minus the length of the string, src2 is prepended to dst.
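The Python sketch below follows the wording of this paragraph literally, so an in-range position replaces the character found there rather than shifting it along. That literal reading is an assumption, and the function returns the new value of dst.

```python
def sinsert(dst, src1, src2):
    """Return dst with src2 placed at position src1."""
    n = len(dst)
    if src1 >= n:
        return dst + src2                # at or past the end: append
    if src1 < -n:
        return src2 + dst                # before the start: prepend
    i = src1 if src1 >= 0 else src1 + n  # negative counts from the end
    # Literal reading: src2 replaces the single character at dst[i].
    return dst[:i] + src2 + dst[i + 1:]
```

Under this reading, placing EY at position 1 of hello gives hEYllo, and placing ! at position -1 gives hell!.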

The sslice instruction slices the string dst into a substring by taking all characters from position src1 up to (but not including) position src2 and forming a new string, which replaces dst. Positions within a string begin at 0 and go up to the length of the string minus one. Negative positions are legal and count from the end of the string, so -1 refers to the last character, -2 the second last and so on. If any position refers to before the start of the string, it is equivalent to position 0. Similarly, if any position refers to beyond the end of the string, it is equivalent to the length of the string. If src2 is a position less than or equal to src1, the result will be an empty string.
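The clamping rules can be written out explicitly, as in this Python sketch. The function name reuses the instruction name, clamp is a hypothetical helper, and the return value is the new contents of dst.

```python
def sslice(dst, src1, src2):
    """Return the substring of dst from src1 up to, not including, src2."""
    n = len(dst)

    def clamp(pos):
        if pos < 0:
            pos += n                 # negative positions count from the end
        return min(max(pos, 0), n)   # clamp to the range 0..n

    start, stop = clamp(src1), clamp(src2)
    return dst[start:stop] if start < stop else ""
```

On the string hello, slicing from 1 to 3 gives el, from -3 to -1 gives ll, and from 3 to 1 gives the empty string.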

The sfind instruction finds the substring src1 in the string src2, storing into dst the integer position of the first matching occurrence. The src1 and src2 operands must refer to strings. Positions start from zero, which would indicate that the substring has been found at the start of the source string. If the string is not found, the integer -1 is stored in dst.

The sord instruction converts the string src1 (which must contain exactly one character) to the integer value of that character, placing the result in dst. The src2 operand is ignored. If src1 does not contain exactly one character, the integer -1 is stored in dst instead.

The schr instruction converts the integer src1 to the ASCII character with that value, then forms a string containing that single character, and places the result in dst. The src2 operand is ignored. If src1 does not contain a valid ASCII character, an empty string is stored in dst instead.
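The last three instructions, including their failure values, can be sketched in Python as shown. The function names reuse the instruction names, and treating 0 to 127 as the valid ASCII range is an assumption.

```python
def sfind(src2, src1):
    """Position of the first occurrence of src1 in src2, or -1."""
    return src2.find(src1)

def sord(src1):
    """Character code of a one-character string, or -1 otherwise."""
    return ord(src1) if len(src1) == 1 else -1

def schr(src1):
    """One-character string for an ASCII code, or "" otherwise."""
    # Assumption: valid ASCII means a code from 0 to 127 inclusive.
    return chr(src1) if 0 <= src1 <= 127 else ""
```

So finding ll in hello gives 2, sord of A gives 65, and schr of 200 gives the empty string.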

Array Instructions

Arrays are an optional extension designed for students of Comp3800 to attempt. Students of the normal stream are not required to implement arrays but may do so if interested.

The instructions which manipulate arrays are listed below. These arrays may be of any length, and each element within an array may store any kind of object, including integers, floating-point numbers, strings and other arrays. An array may store more than one kind of object, but one array element can only store one kind of object at a time.

In this section, all operands which refer to arrays are assumed to be in memory cell mode, while operands which refer to integers should follow the rules listed earlier for when integers are valid. The term "object" is used to refer to a memory cell which contains an integer, floating-point number, string or array, or else to refer to a register which contains an integer. Objects have a type as well as a value.

Opcode	Hex	Name	Description
128	80	`anew`	create empty array
129	81	`alen`	array length
130	82	`apush`	push onto stack
131	83	`apop`	pop from stack
132	84	`acmp`	compare arrays
135	87	`aadd`	concatenation
137	89	`amul`	multiplication
147	93	`acopy`	copy array
148	94	`aindex`	array index
149	95	`ainsert`	array insert
150	96	`aslice`	array slice
151	97	`afind`	find within array
154	9A	`asort`	sort an array

The anew instruction initialises the dst to the empty array (an array of length zero). It ignores src1 and src2.

The alen instruction stores in the integer dst the length of the array src1, and ignores src2. The dst operand may be in any valid integer mode.

The apush instruction pushes the array src1 onto the stack and ignores src2 and dst. Pushing onto the stack involves copying the source into the memory cell where r63 points and then incrementing r63 to point to the next cell on the stack.

The apop instruction pops from the stack the array kept at the cell referenced by r63, and stores that array into dst, ignoring src1 and src2. Popping from the stack involves decrementing r63 then copying the contents of the cell now pointed to by r63 into the destination. The array pointed to by r63 may be garbage collected at this point.

The acmp instruction compares the array src1 to the array src2 and records the result in a flag in the virtual machine for interpretation by a later conditional branch instruction. The dst operand is ignored. Comparison of two arrays proceeds in much the same way as strings. Elements of two arrays are compared pairwise from the start of each array. The first element which differs determines the ordering. Elements which are integers or floating-point numbers are compared numerically, string elements are compared in the same manner as scmp, and elements which contain arrays are compared in the same manner as acmp. If two elements being compared are of different types, note that all numbers are less than all strings which are less than all arrays.
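The ordering can be sketched as a recursive comparison in Python, with arrays modelled as lists. The names type_rank and acmp are hypothetical, the -1, 0, 1 result stands in for the comparison flag, and ordering a shorter array before a longer one that it prefixes is an assumption carried over from string comparison.

```python
def type_rank(obj):
    """Type rank: all numbers < all strings < all arrays."""
    if isinstance(obj, (int, float)):
        return 0
    if isinstance(obj, str):
        return 1
    return 2   # anything else is assumed to be a list (array)

def acmp(a, b):
    """Return -1, 0 or 1 for two objects, following the rules above."""
    if type_rank(a) != type_rank(b):
        return -1 if type_rank(a) < type_rank(b) else 1
    if type_rank(a) == 2:
        for x, y in zip(a, b):        # pairwise from the start
            c = acmp(x, y)
            if c != 0:
                return c              # first differing element decides
        # Assumption: a proper prefix sorts before the longer array.
        return (len(a) > len(b)) - (len(a) < len(b))
    return (a > b) - (a < b)          # numbers or strings
```

For example, the array holding 1 and the string a compares greater than the array holding 1 and 2.5, because a string outranks any number.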

The aadd instruction stores in dst the result of appending the array src1 to the array src2.

The amul instruction stores in dst the result of appending src1 copies of the array src2 to the empty array. The src1 operand must refer to an integer in any valid integer mode, and its value must be non-negative, or the program halts with an error. If src1 is zero, an empty array is stored in dst.

The acopy instruction copies the array src1 to dst without change. The destination must be in memory cell mode. The src2 operand is ignored.

The aindex instruction stores in dst the object indexed by src1 in the array src2. Arrays are indexed from 0. The src1 operand must refer to an integer in any valid integer mode. The dst operand must be in memory cell mode, and the result stored there will be the object at src2[src1]. The src1 operand may be negative, in which case it refers to elements in the array counting from the end, so -1 refers to the last object, -2 the second last object and so on. If src1 is greater than or equal to the length of the array, or if src1 is less than minus the length of the array, the program halts with an error.

The ainsert instruction inserts in the array dst, at position src1, the object src2. Positions start from 0 within an array. The src1 operand must refer to an integer in any valid integer mode. If src1 is between zero and the length of the array minus one, the object src2 replaces the array element at dst[src1]. If src1 is negative, it may refer to a position counting from the end of the array, so -1 is the last element in the array, -2 the second last and so on. If src1 is greater than or equal to the length of the array, src2 is appended to dst. If src1 is less than minus the length of the array, src2 is prepended to dst.

The aslice instruction slices the array dst into a subarray by taking all elements from position src1 up to (but not including) position src2 and forming a new array, which replaces dst. The operands src1 and src2 are integers and may be in any valid integer mode. Positions within an array begin at 0 and go up to the length of the array minus one. Negative positions are legal and count from the end of the array, so -1 refers to the last element, -2 the second last and so on. If any position refers to before the start of the array, it is equivalent to position 0. Similarly, if any position refers to beyond the end of the array, it is equivalent to the length of the array. If src2 is a position less than or equal to src1, the result will be an empty array.

The afind instruction finds the object src1 in the array src2, storing into dst the integer position of the first matching occurrence. The src1 operand may be in any mode and refer to any kind of object. The src2 operand must refer to an array in memory cell mode. The dst operand may be in any valid integer mode. Finding an object involves matching both the type of the object, and its data, in the same way as the various cmp instructions. Positions start from zero, which would indicate that the object has been found at the start of the array. If the object is not found, the integer -1 is stored in dst.

The asort instruction sorts the array src1 and stores the resulting sorted array into dst. It ignores src2. The elements of the array are sorted in the following way: integers and floating-point numbers are folded together and sorted numerically; all numbers are less than all strings; and all strings are less than all arrays. Strings within the array are ordered alphabetically. Arrays within the array are ordered as defined with acmp.
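A sort key is one way to express this ordering. In the Python sketch below, arrays are modelled as lists, the name sort_key is hypothetical, and strings are ordered by character code, which is an assumption about what "alphabetically" means here.

```python
def sort_key(obj):
    """Key placing numbers before strings before arrays."""
    if isinstance(obj, (int, float)):
        return (0, obj)                # integers and doubles fold together
    if isinstance(obj, str):
        return (1, obj)                # assumption: ordered by character code
    # Nested arrays compare element by element via their own keys.
    return (2, tuple(sort_key(x) for x in obj))

def asort(src1):
    """Return a sorted copy of the array src1, leaving src1 unchanged."""
    return sorted(src1, key=sort_key)
```

Sorting an array holding 3, b, [1], 1.5, a and [0, 9] would give 1.5, 3, a, b, [0, 9], [1].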

Execution

All registers are initialized to zero, except for r63, which is initialized to one more than the largest initially loaded memory cell address in the program. By default, the machine interpreter adds 1000 cells to the top of memory to serve as the stack.

All cells will contain zero unless otherwise defined in the program.

All integers and floating point numbers are set to zero unless otherwise defined in the program.

All strings are set to the empty string unless otherwise defined in the program.
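The start-up state can be sketched in Python as follows. The name initial_state and the use of a dict mapping addresses to initial values are hypothetical; the 1000-cell stack is the default mentioned above.

```python
STACK_CELLS = 1000   # default stack size added above the loaded cells

def initial_state(loaded):
    """Build (regs, mem) from a dict of initially loaded cells."""
    top = max(loaded, default=-1) + 1   # one past the largest loaded address
    mem = [0] * (top + STACK_CELLS)     # cells default to zero
    for address, value in loaded.items():
        mem[address] = value
    regs = [0] * 64                     # every register starts at zero
    regs[63] = top                      # except the stack pointer
    return regs, mem
```

A program that loads cells 0 and 3 would therefore start with r63 equal to 4 and 1004 cells of memory.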

During execution, the pc holds the address of the instruction being executed. It begins at 0, and is incremented after each instruction is completed. However, the address in a branch is that of the target instruction, so no increment occurs after a pc-modifying instruction.
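The pc rule can be seen in a toy fetch-execute loop. In this Python sketch, instructions are modelled as (name, operand) tuples because the representation of code space is not specified; only halt, nop and bra are handled, and the function name run is hypothetical.

```python
def run(program):
    """Run a list of (name, operand) tuples; return the pcs visited."""
    pc = 0                        # execution begins at address 0
    trace = []
    while True:
        name, operand = program[pc]
        trace.append(pc)
        if name == "halt":
            return trace
        if name == "bra":
            pc = operand          # target used as-is, no increment
        else:
            pc += 1               # ordinary instruction: step to the next
```

Running nop, bra 3, nop, halt visits addresses 0, 1 and 3, skipping the instruction at address 2.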

Change log

July 17: Original version.

July 19: Added some general rules for operand modes in the data instructions section.

July 28: Added dynamic array instructions section, to be implemented by Comp3800 students.
