Specification of the Ace Virtual Machine

Registers

There are 32 registers, r0 through to r31. All the registers behave the same, except that r31 (the stack pointer) is handled specially by the subroutine call and ret instructions, and the push and pop instructions. The name sp is equivalent to r31 and that register can be manipulated normally, as well as via special instructions.

The program counter, pc, is not addressable and is not visible as a register.

The registers are signed integers 32 bits long.

Address Spaces

There are two address spaces: code space and memory space. Memory space is broken into three data types: integers, floating point numbers, and strings.

Code space stores the instructions for the program, and can only be addressed by the branching and subroutine instructions. The representation of instructions within code space is not specified.

Memory space consists of an array of cells, with addresses 0, 1, 2, ... and upwards. Each cell is capable of holding any one of the three basic data types in the machine. The basic data types are integer, floating-point number and string. One cell can hold one of these types of data at any one time but may be reused to hold a different type of data at a later time.

The integers are all 32-bit signed quantities, equivalent to long in C on platforms where long is 32 bits wide.

The floating-point numbers are 64-bit IEEE floating point numbers, equivalent to double in C.

The strings are sequences of ASCII characters, and have arbitrarily variable length. A pointer from a memory cell points to the sequence of characters, which exists outside memory space. The characters within a string cannot be directly addressed, except via special string instructions. The representation of the sequence of characters is not specified.

Each instruction in the machine operates on one or more of these basic types, according to the operand addressing modes stored with the instruction.

Instruction Layout

Instructions are laid out in a fixed format within a 4-byte, 32-bit word. The high byte encodes the operation code (opcode); the remaining bytes encode the operands:

opcode src1 src2 dst

The opcode is stored as a single byte.

Code space instructions have one operand spanning all three operand fields. Other instructions have three operands: two source operands src1 and src2 and one destination operand dst. Not all operands are meaningful for all instructions, and src1 may be a large literal, occupying the space for both src1 and src2. Instructions with a large literal operand are written in the assembler as having only two operands, e.g.

	add	1234, r6

but behave as though they have three operands, with dst also acting as src2 and the large literal value acting as src1:

	add	1234, r6, r6

The following sections describe how the operands are encoded and how the instructions specified in the opcode interpret those operands.
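
As an illustration of the layout above, the following Python sketch splits an instruction word into its four byte-wide fields. The function name split_instruction is hypothetical, and the sketch assumes that src1, src2 and dst follow the opcode from the high byte down to the low byte, which the layout suggests but does not state outright.

```python
def split_instruction(word):
    """Split a 32-bit instruction word into (opcode, src1, src2, dst).

    Assumption: the fields appear in the order opcode, src1, src2, dst
    from the most significant byte to the least significant byte.
    """
    opcode = (word >> 24) & 0xFF  # high byte holds the opcode
    src1 = (word >> 16) & 0xFF
    src2 = (word >> 8) & 0xFF
    dst = word & 0xFF
    return opcode, src1, src2, dst


fields = split_instruction(0x27414243)
```

Here fields holds the opcode 0x27 (add) followed by three operand bytes that still need to be decoded by mode.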

Operand Modes

There are six addressing modes, encoded in three bits at the top of each operand (except for large literal and code space instructions, which are explained below).

Value  Syntax  Name
0      1234    large literal
1      4       small literal
2      r4      register
3      i[r4]   integer cell
4      d[r4]   double cell
5      s[r4]   string cell
6              illegal
7              illegal

Large literal mode refers to the signed integer value stored in the current and subsequent operands. It uses more than one operand to store a larger than normal value, as discussed in detail in the next section on operand encoding.

Small literal mode refers to the integer value contained in the current operand byte. Small literal mode operands are always signed integers, but because they must fit within one operand can only have a small range of possible values.

Register mode references the value of the register itself as a signed integer.

Integer cell mode references a cell in memory by using a register as an index into the virtual machine's memory. This mode implies the memory cell is to be treated as if it contains or will now contain an integer, depending on whether it is being used as source or destination operand.

Double cell mode references a cell in memory by using a register as an index into the virtual machine's memory. This mode implies the memory cell is to be treated as if it contains or will now contain a floating-point number, depending on whether it is being used as source or destination operand.

String cell mode references a cell in memory by using a register as an index into the virtual machine's memory. This mode implies the memory cell is to be treated as if it contains or will now contain a string, depending on whether it is being used as source or destination operand.

Operand Encoding

An operand is encoded as follows:

m2 m1 m0 v4 v3 v2 v1 v0

The bits m[2-0] store the operand mode value, encoding the mode of the operand (as described above). The bits v[4-0] store the 5-bit operand value. If the mode is register or one of the memory cell modes, this gives an unsigned register number in the range 0 to 31.

If the mode is small literal, this gives a signed 5-bit integer, in the range -16 to +15.

Large literal operands can appear as the first operand (src1). Within the instruction word, the encoding for a large literal occupies the space of both src1 and src2, thus using three mode bits and allowing the next 13 bits to contain a signed integer value:

m2 m1 m0 v12 v11 v10 v9 v8 v7 v6 v5 v4 v3 v2 v1 v0

The fields m[2-0] will have value 0 (binary 000) and the value of the operand is the 13-bit signed integer v[12-0].

If an instruction has a large literal src1 operand, the semantics of the instruction are such that the destination operand dst takes the place of the missing src2 operand. So, for instance, addition is normally interpreted as dst = src2 + src1, but with a large literal is taken to mean dst = dst + src1.
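
The operand encodings above can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes signed values are stored in two's complement, which the stated range of -16 to +15 for small literals suggests.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def decode_operand(byte):
    """Decode one 8-bit operand into (mode, value)."""
    mode = (byte >> 5) & 0x7  # m2 m1 m0
    raw = byte & 0x1F         # v4..v0
    if mode == 1:             # small literal: signed 5-bit value
        return mode, sign_extend(raw, 5)
    return mode, raw          # register number 0..31 for modes 2 to 5


def decode_large_literal(field16):
    """Decode the combined 16-bit src1/src2 field of a large literal."""
    if (field16 >> 13) & 0x7 != 0:
        raise ValueError("large literal operands have mode 0")
    return sign_extend(field16, 13)  # v12..v0
```

For example, the byte 0b00111111 decodes to small literal -1, and the 16-bit field 1234 decodes to the large literal 1234.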

Operand Encoding for Code Space Instructions

Operands for code space instructions are encoded differently: there is a single operand in the instruction word, occupying all 24 operand bits in the word, and there are only three addressing modes which can be used. The top three bits of the operand are the mode, and the other 21 are the value:

m2 m1 m0 v20 v19 v18 v17 v16 v15 v14 v13 v12 v11 v10 v9 v8 v7 v6 v5 v4 v3 v2 v1 v0

The three mode bits form a number which is interpreted in the following way:

Value  Syntax  Name
0      12345   absolute
1      -5678   relative
2      r4      register

Absolute mode references the instruction whose address is the signed 21-bit value. In assembly language, labels may be used instead of actual numbers, but these labels are replaced with the numbers when assembled.

Relative mode references the instruction whose address is the pc plus the signed 21-bit value. Unlike small literal mode, relative mode literals begin with a plus or minus sign in the assembly language, and the value occupies the remaining 21 bits of the operand. The assembler may choose to replace a label referencing an instruction with a relative mode operand instead of an absolute operand.

Register mode references the instruction whose address is stored in the register. The number of the register is stored in the lowest 5 bits of the 21 bits of the operand.
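
The three code space addressing modes can be sketched as a single target calculation. This is only an illustration: the name branch_target is hypothetical, two's complement is assumed for the signed 21-bit value, and pc is taken to be the address of the instruction being executed.

```python
def branch_target(operand24, pc, registers):
    """Resolve a 24-bit code space operand to an instruction address."""
    mode = (operand24 >> 21) & 0x7
    value = operand24 & 0x1FFFFF
    # Sign-extend the 21-bit value (assumes two's complement).
    signed = value - (1 << 21) if value & (1 << 20) else value
    if mode == 0:                        # absolute
        return signed
    if mode == 1:                        # relative to the current pc
        return pc + signed
    if mode == 2:                        # register: number in the low 5 bits
        return registers[value & 0x1F]
    raise ValueError("illegal code space addressing mode")


regs = [0] * 32
regs[4] = 77
absolute = branch_target(12345, 0, regs)
relative = branch_target((1 << 21) | (-5678 & 0x1FFFFF), 6000, regs)
via_register = branch_target((2 << 21) | 4, 0, regs)
```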

Code Space Instructions

Code space instructions are branches and other control instructions:

Opcode  Hex  Name  Description
0       00   halt  halt machine
1       01   nop   no operation
2       02   call  subroutine call
3       03   ret   subroutine return
4       04   bra   branch always
5       05   bgt   branch if greater than
6       06   bge   branch if greater than or equal
7       07   blt   branch if less than
8       08   ble   branch if less than or equal
9       09   beq   branch if equal
10      0A   bne   branch if not equal
11      0B   beof  branch if end of file

Most code space instructions take a single operand (the exceptions are halt, nop and ret, which evaluate but ignore their operands). That operand defines the target program counter (pc) of the operation, for example, the first instruction of the subroutine for a call instruction.

The halt instruction halts the machine unconditionally.

The nop instruction does nothing.

The call instruction does a subroutine call. Its operation is to push the pc of the call instruction on the stack (storing the pc into the memory cell that r31 addresses and then incrementing r31 to point to the next stack cell), and set the pc equal to the operand.

The ret instruction returns from a subroutine by popping the pc from the stack, then incrementing pc and continuing operation from that instruction.
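
The interplay of call and ret can be sketched as below. The state dictionary and function names are hypothetical, and the sketch leaves the popped stack cell unchanged, a simplification that the description of ret neither requires nor rules out.

```python
SP = 31  # r31 doubles as the stack pointer


def do_call(state, target):
    """Push the pc of the call instruction, then jump to target."""
    state["mem"][state["regs"][SP]] = state["pc"]
    state["regs"][SP] += 1
    state["pc"] = target


def do_ret(state):
    """Pop the saved pc, then step past the original call instruction."""
    state["regs"][SP] -= 1
    state["pc"] = state["mem"][state["regs"][SP]] + 1


state = {"pc": 10, "regs": [0] * 32, "mem": {}}
state["regs"][SP] = 100
do_call(state, 50)
after_call = (state["pc"], state["regs"][SP], state["mem"][100])
do_ret(state)
after_ret = (state["pc"], state["regs"][SP])
```

Because the saved value is the address of the call itself, the increment in ret is what makes execution resume at the instruction following the call.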

The bra instruction is an unconditional branch.

The beof instruction branches according to whether the most recently executed I/O instruction resulted in an end-of-file condition.

The various other branch instructions branch according to the result of the most recently executed comparison instruction.

Non-Control Instructions

The remaining instructions manipulate the data in the virtual machine. Each of these instructions uses either a memory cell, a register or a literal integer value. In the following discussion, the phrase `all types' refers to the three valid types in memory cells, integer, floating-point (also known as double) and string. The phrase `integer modes' refers to the literal, register or integer cell modes.

Opcode  Hex  Name    Description
32      20   new     initialise memory cell
33      21   size    absolute value or size
34      22   push    push onto stack
35      23   pop     pop from stack
36      24   cmp     compare
37      25   read    read value from input
38      26   print   print value on output
39      27   add     addition
40      28   sub     subtraction
41      29   mul     multiplication
42      2A   div     division
43      2B   lshift  left shift
44      2C   rshift  right shift
45      2D   copy    copy or convert data
46      2E   index   string index
47      2F   insert  string insert
48      30   find    find within string
49      31   mod     modulus
50      32   hash    hash value
255     FF   err     print error message and finish

These instructions, unless noted otherwise, will only operate on one type of data. For instance, the sub instruction takes three operands, and they must all be of one type. If they are all integer types (literal, register or integer cell mode operands), the instruction subtracts the integers specified, storing the result into the location given as the integer destination operand. If all the operands are of type double (double cell mode operands) the same instruction subtracts the floating-point operands, storing a floating-point value into the destination. It is not legal to subtract an integer from a double using that instruction.

The general rules for allowable operand types are:

There are some exceptions to these rules. These exceptions are noted with the descriptions of the exceptional instructions. Not all instructions have meaning for all operand types, for instance, the div instruction never uses string operands. The following sections describe all the valid uses of the instructions.

Instruction Descriptions

The new instruction works with all destination types. It initialises the dst to 0, 0.0 or the empty string, depending on whether the type of the dst operand references an integer, double or string value respectively. It ignores src1 and src2.

The size instruction works with all operand types. It stores in dst the absolute value of src1, and ignores src2. If the source is a string, the destination must be in register or integer cell mode and the length of the string, in characters, is stored in the destination. Otherwise, dst must be of the same type as src1.

The push instruction works with all source operand types. It pushes the value of src1 onto the stack and ignores src2 and dst. Pushing onto the stack involves copying the source into the memory cell where r31 points and then incrementing r31 to point to the next cell on the stack.

The pop instruction works with all destination operand types. It pops from the stack the value kept at the cell referenced by r31, and stores that value into dst, ignoring src1 and src2. Popping from the stack involves decrementing r31 then copying the contents of the cell now pointed to by r31 into the destination, then re-initialising that stack cell in the same manner as the new instruction specifies.
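
A minimal Python sketch of push and pop on an upward-growing stack follows. The function names are hypothetical, and the sketch assumes that the re-initialised cell takes the zero value of the popped value's own type (0, 0.0 or the empty string).

```python
SP = 31  # r31 is the stack pointer


def push(mem, regs, value):
    """Copy value into the cell r31 points to, then advance r31."""
    mem[regs[SP]] = value
    regs[SP] += 1


def pop(mem, regs):
    """Step r31 back, fetch the cell it now points to, and reset that cell."""
    regs[SP] -= 1
    value = mem[regs[SP]]
    mem[regs[SP]] = type(value)()  # int() == 0, float() == 0.0, str() == ""
    return value


mem = [0] * 8
regs = [0] * 32
regs[SP] = 4
push(mem, regs, "hi")
push(mem, regs, 7)
sp_after_pushes = regs[SP]
first = pop(mem, regs)
second = pop(mem, regs)
```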

The cmp instruction works with src1 and src2 operands which are of the same type. They might both refer to integers, or both refer to floating-point cells, or both refer to string cells. It compares the value of src1 to src2 and records the result in a flag in the virtual machine for interpretation by a later conditional branch instruction. For example,

	cmp	11, 5
	bgt	somewhere

will branch to the instruction labeled somewhere. The cmp instruction ignores dst. Because this instruction requires both src1 and src2, the src1 operand may not be in large literal mode.
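
One way to model the comparison flag and the conditional branches is sketched below. The representation of the flag as -1, 0 or +1 and all names are assumptions made for illustration; the specification only says that a flag records the result.

```python
def cmp_flag(src1, src2):
    """Return -1, 0 or +1 as src1 is less than, equal to or greater than src2."""
    return (src1 > src2) - (src1 < src2)


# Hypothetical mapping from branch mnemonic to a test on the flag.
BRANCH_TESTS = {
    "bgt": lambda flag: flag > 0,
    "bge": lambda flag: flag >= 0,
    "blt": lambda flag: flag < 0,
    "ble": lambda flag: flag <= 0,
    "beq": lambda flag: flag == 0,
    "bne": lambda flag: flag != 0,
}


def branch_taken(name, flag):
    """Report whether the named conditional branch would be taken."""
    return BRANCH_TESTS[name](flag)


flag = cmp_flag(11, 5)  # mirrors: cmp 11, 5
```

With this flag, branch_taken("bgt", flag) is true, matching the example in which bgt branches after comparing 11 with 5.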

The read instruction works with all destination types. It reads from standard input the textual representation of a value and stores it in dst. The src1 and src2 operands are ignored. If an end of file condition is encountered, a flag is set inside the virtual machine allowing a subsequent beof instruction to succeed, otherwise that same flag is cleared. For register or integer cell modes, leading white space is ignored, and the format is a decimal integer, or an octal integer beginning with 0, or a hexadecimal integer beginning with 0x or 0X. For the floating-point double cell mode, leading white space is also ignored, and the format is the usual floating-point format. For string cell mode, the instruction reads a whole line of input, including leading white space, and including the terminating newline.

The print instruction works with src1 operands of any mode. It prints the textual representation of src1 to standard output. It ignores src2 and dst.

Integer Instructions

The instructions which perform the usual arithmetic operations are listed below. In the discussions in this section, assume that all the operands may be any valid integer mode, except that dst may not be a literal and src2 may not be a large literal (but may be a small literal). There is no check for overflow or underflow.

The add instruction stores in dst the value of src2+src1.

The sub instruction stores in dst the value of src2-src1.

The mul instruction stores in dst the value of src2*src1.

The div instruction stores in dst the value of src2/src1. The value is truncated, using C's integer division rules. Division by zero is illegal and halts execution.
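
C's integer division truncates toward zero, which differs from Python's floor division for negative operands. The sketch below shows the intended result; the name ace_div is hypothetical, and raising an exception stands in for halting the machine.

```python
def ace_div(dividend, divisor):
    """Return dividend / divisor truncated toward zero.

    dividend plays the role of src2 and divisor the role of src1.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero halts execution")
    quotient = abs(dividend) // abs(divisor)
    # The result is negative exactly when the operand signs differ.
    return -quotient if (dividend < 0) != (divisor < 0) else quotient
```

For instance, ace_div(-7, 2) yields -3, whereas Python's own -7 // 2 yields -4.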

The lshift instruction stores in dst the value of src2 shifted left src1 bits. The bits shifted into the value are set to 0. The value of src1 should be between 0 and 31, inclusive. If the value is between -31 and -1 inclusive, that value is negated and the operation becomes a right shift. If the value is less than -31 or greater than 31, the result will be zero.

The rshift instruction stores in dst the value of src2 shifted right src1 bits. The bits shifted into the value are set to 0. The value of src1 should be between 0 and 31, inclusive. If the value is between -31 and -1 inclusive, that value is negated and the operation becomes a left shift. If the value is less than -31 or greater than 31, the result will be zero.
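
The shift rules, including negative and out-of-range counts, can be sketched as follows. The function names are hypothetical, and the sketch assumes the 32-bit value is held in two's complement so that a right shift fills with zeros from the top bit.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def lshift(value, count):
    """Shift left by count bits; a negative count shifts right instead."""
    if abs(count) > 31:
        return 0
    if count < 0:
        return rshift(value, -count)
    return to_signed32((value & MASK32) << count)


def rshift(value, count):
    """Shift right by count bits with zero fill; a negative count shifts left."""
    if abs(count) > 31:
        return 0
    if count < 0:
        return lshift(value, -count)
    return to_signed32((value & MASK32) >> count)
```

Under these assumptions, shifting -1 right by 28 bits gives 15, because the vacated high bits are filled with zeros rather than copies of the sign bit.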

The copy instruction copies src1 to dst without change if the operands are both in integer modes. The src2 operand is ignored.

The copy instruction can also convert floating-point numbers to integers. If dst is in integer mode, but src1 is in double cell mode, the source floating-point value is converted to an integer by truncating the fraction and storing the result into the destination. The src2 operand is ignored.

The copy instruction can also convert strings to integers. If dst is in integer mode, but src1 is in string cell mode, the source string is converted to an integer and stored in the destination; the string is in the format accepted by the C function strtol, so it may begin 0x to specify a number in hexadecimal, or 0 to specify octal, and it may begin with a - or optional + to indicate the sign. The src2 operand is ignored.

The mod instruction stores in dst the value of src2 mod src1. The result is always an integer greater than or equal to zero and less than the absolute value of src1. Modulo by zero is illegal and halts execution.
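
The following sketch produces a result in the range required for mod, whatever the signs of the operands. The name ace_mod is hypothetical, and raising an exception stands in for halting the machine.

```python
def ace_mod(value, modulus):
    """Return value mod modulus in the range 0 <= result < abs(modulus).

    value plays the role of src2 and modulus the role of src1.
    """
    if modulus == 0:
        raise ZeroDivisionError("modulo by zero halts execution")
    # Python's % with a positive right operand never returns a negative result.
    return value % abs(modulus)
```

Note that this differs from the % operator in C, which may return a negative remainder when the left operand is negative.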

The and instruction stores in dst the bitwise AND of src1 and src2.

The or instruction stores in dst the bitwise OR of src1 and src2.

The xor instruction stores in dst the bitwise exclusive OR of src1 and src2.

The hash instruction stores in dst a hash value of src1. Typically this will just be the value of src1 itself.

The err instruction prints an error message containing the value of src1 and causes the program to finish.

Floating-Point Instructions

The instructions which perform the usual floating-point operations are listed below. In this section, all operands are assumed to refer to floating-point numbers in double cell mode. The behavior of underflow, overflow, and not-a-number values is undefined.

The add instruction stores in dst the value of src2+src1.

The sub instruction stores in dst the value of src2-src1.

The mul instruction stores in dst the value of src2*src1.

The div instruction stores in dst the value of src2/src1. Division by zero is illegal and halts execution.

The copy instruction copies src1 to dst without change if the operands are both in double cell mode. The src2 operand is ignored.

The copy instruction can also convert integers to floating-point numbers. If the src1 operand is in an integer mode, but dst is in double cell mode, it converts the integer src1 to a floating point number stored in dst. The destination must be in double cell mode. The src2 operand is ignored.

The copy instruction can also convert strings to floating-point numbers. If the src1 operand is in string cell mode, it converts the source string to a floating point number, and stores the result into dst, which must be in double cell mode. The string is in the format accepted by the C function atof, and it may begin with a - or optional + to indicate the sign. The src2 operand is ignored.

The hash instruction stores in dst a hash value of src1. This will be all the bits of src1 scrambled and arithmetically combined in some deterministic way.

String Instructions

The instructions which manipulate strings are listed below. In this section, except as noted, all operands used are assumed to be in string cell mode. The resulting strings may be of any length.

The add instruction stores in dst the result of appending the string src1 to the string src2. If src1 is in an integer mode, it is interpreted as a character to be appended to the string src2 and stored in dst.

The mul instruction stores in dst the result of appending src1 copies of the string src2 to the empty string. The src1 operand must be in an integer mode, and its value must be non-negative.

The lshift instruction stores in dst the string src2 `shifted left' by src1 characters, that is, it drops src1 characters from the beginning of the string. The src1 operand is an integer in any valid integer mode. Its value should be non-negative and no greater than the length of the string. If the value is negative, an error results. If the value is greater than the length of the string, the empty string is placed into dst.

The rshift instruction stores in dst the string src2 `shifted right' by src1 characters, that is, it drops src1 characters from the end of the string. The src1 operand is an integer in any valid integer mode. Its value should be non-negative and no greater than the length of the string. If the value is negative, an error results. If the value is greater than the length of the string, the empty string is placed into dst.
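
The two string shifts can be sketched with Python slices. The function names are hypothetical, and a ValueError stands in for the error that a negative count produces.

```python
def str_lshift(text, count):
    """Drop count characters from the beginning of text."""
    if count < 0:
        raise ValueError("negative shift count")
    return text[count:]  # slicing past the end yields the empty string


def str_rshift(text, count):
    """Drop count characters from the end of text."""
    if count < 0:
        raise ValueError("negative shift count")
    # Clamp at zero so that an oversized count yields the empty string.
    return text[:max(len(text) - count, 0)]
```

The explicit length arithmetic in str_rshift avoids the slice text[:-count], which would wrongly return the empty string for a count of zero.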

The copy instruction copies src1 to dst without change if the operands are both in string cell mode. The destination must be in string cell mode. The src2 operand is ignored.

The copy instruction can also convert integers to strings, by storing a textual representation of the integer src1 in the destination dst. The result is equivalent to the conversion by sprintf format %d. The src1 operand must be in an integer mode. The destination must be in string cell mode. The src2 operand is ignored.

The copy instruction can also convert floating-point numbers to strings, by storing a textual representation of the floating point value src1 in the destination dst. The result is equivalent to conversion with the printf format %g, using enough precision that a subsequent read instruction recovers the original value. The destination must be in string cell mode. The src2 operand is ignored.
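
For 64-bit IEEE doubles, 17 significant digits are enough for a textual form to convert back to exactly the same value. The sketch below uses that precision with the %g format; the function names are hypothetical, and the choice of exactly 17 digits is an assumption about what the interpreter does.

```python
def double_to_string(value):
    """Format a double with 17 significant digits in %g style."""
    return "%.17g" % value


def string_to_double(text):
    """Parse the textual form back into a double."""
    return float(text)


round_trip = string_to_double(double_to_string(0.1))
```

The %g format drops trailing zeros, so values such as 0.5 and 2.0 still print compactly as 0.5 and 2.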

The index instruction stores in dst the character indexed by src1 in the string src2. Strings are indexed from 0. The src1 operand is an integer in any valid integer mode. Its value must be non-negative and less than the length of the string. If dst is in register or integer cell mode, the result is the integer value of the character; otherwise, dst must be in string cell mode, and the result stored there will be a string containing the single character.

The insert instruction inserts in the string dst, at position src1, the character or string in src2. The src1 operand is an integer and can be in any valid integer mode. Its value must be non-negative and no greater than the length of the string; otherwise the machine halts with an error. If it is equal to the length of dst, src2 is appended to dst. If src2 refers to an integer, it represents a character to replace dst[src1]. Otherwise, it must be in string cell mode and references a string to be inserted into dst before the src1th character.

The find instruction finds the character or substring src1 in the string cell referenced by src2, storing into dst the integer location of the first matching occurrence. The src1 operand may be in an integer mode, in which case its value is interpreted as a character, or it may be in string cell mode, referring to a substring to find. The src2 operand must be in string cell mode, and dst must be in an integer mode. Locations start from zero, which would indicate that the character or substring has been found at the start of the source string. If the string cannot be found, the value negative one is stored into dst.
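
The behaviour of find maps closely onto Python's str.find, as the sketch below shows. The name ace_find is hypothetical, and integer operands are assumed to be character codes that chr can convert.

```python
def ace_find(needle, haystack):
    """Return the first position of needle in haystack, or -1 if absent.

    needle plays the role of src1 (a character code or a string) and
    haystack the role of src2.
    """
    if isinstance(needle, int):
        needle = chr(needle)  # an integer operand is treated as one character
    return haystack.find(needle)
```

For example, ace_find("lo", "hello") gives 3, and a search for an absent character gives -1.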

The hash instruction stores in dst a hash value of src1. This will be all the chars of src1 scrambled and arithmetically combined in some deterministic way.

The err instruction prints the string given in src1 as an error message and causes the program to finish.

Execution

All registers are initialized to zero, except for r31, which is initialized to one more than the largest initially loaded memory cell address in the program. By default, the machine interpreter adds 1000 cells to the top of memory to serve as the stack.

All cells will contain zero unless otherwise defined in the program.

All integers and floating point numbers are set to zero unless otherwise defined in the program.

All strings are set to the empty string unless otherwise defined in the program.

During execution, the pc holds the address of the instruction being executed. It begins at 0, and is incremented after each instruction is completed. However, the address in a branch is that of the target instruction, so no increment occurs after a pc-modifying instruction.
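
The pc rule can be illustrated with a tiny fetch-execute loop covering only halt, nop and bra. The tuple-based program representation and the name run are hypothetical; a real interpreter would decode instruction words instead.

```python
def run(program, max_steps=100):
    """Execute a list of (name, target) pairs and return the visited pcs."""
    pc = 0                      # execution begins at address 0
    trace = []
    for _ in range(max_steps):
        name, target = program[pc]
        trace.append(pc)
        if name == "halt":
            break
        if name == "bra":
            pc = target         # pc-modifying instruction: no increment
        else:
            pc += 1             # ordinary instruction: step to the next one
    return trace


program = [("nop", None), ("bra", 3), ("nop", None), ("halt", None)]
trace = run(program)
```

The trace shows that address 2 is skipped: after the branch at address 1, execution continues directly at address 3.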

Change log

March 30: Original version.

April 12: Corrected discrepancy in string index instruction.

May 5: Fixed large literal examples.

September 17: Added hash and err operations.


Copyright © 1999 R. Pike and L. Patrick