Specification of the Ace Machine
and Assembler Formats

Program Names

There are two programs to create, named aceasm and acevm. The aceasm program (the Ace assembler) assembles files of suffix .aca into files of suffix .acm which are then interpreted and executed by the acevm program (the Ace virtual machine).

Assembly Language

The assembly language is a fairly straightforward representation of the instructions and values of the machine. It is usually stored in a file suffixed .aca and assembled into a code file suffixed .acm.

Comments begin with a number sign # and go to the end of the line of text.

Instructions are grouped into one section of the assembly file, and each type of memory value is grouped into its own section, each prefixed by an identifying word. CODE introduces the code section, INT the integer value section, DOUBLE the floating-point section, and STRING the string section. The sections must appear in the above order, and each may appear at most once in the program. A missing section is taken to be empty; the code section may not be empty. The successive elements of each section are assembled in order, so code begins at address 0 in the CODE section, while the first value in the first data section starts at memory cell address 0.
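The section rules above can be sketched as a small Python routine. This is only an illustration, not part of the specification: the names strip_comment and split_sections are invented here, and the sketch assumes that each section keyword sits alone on its own line, which this document does not actually require.

```python
SECTION_ORDER = ["CODE", "INT", "DOUBLE", "STRING"]

def strip_comment(line):
    """Remove a '#' comment, ignoring any '#' inside a double-quoted string."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string and ch == "\\":
            i += 2              # skip the escaped character inside a string
            continue
        if ch == '"':
            in_string = not in_string
        elif ch == "#" and not in_string:
            return line[:i]
        i += 1
    return line

def split_sections(source):
    """Return a dict mapping each section keyword to its non-empty lines."""
    sections = {name: [] for name in SECTION_ORDER}
    current = None
    next_allowed = 0            # index of the earliest keyword still permitted
    for raw in source.splitlines():
        line = strip_comment(raw).strip()
        if not line:
            continue
        if line in SECTION_ORDER:   # assumption: keyword alone on its line
            index = SECTION_ORDER.index(line)
            if index < next_allowed:
                raise ValueError(f"section {line} is repeated or out of order")
            current = line
            next_allowed = index + 1
            continue
        if current is None:
            raise ValueError("text before the first section keyword")
        sections[current].append(line)
    if not sections["CODE"]:
        raise ValueError("the code section may not be empty")
    return sections
```

A missing section simply yields an empty list, matching the rule that it is taken to be empty.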

Any address in any section may be labeled by an alphanumeric string, followed by a colon. The label may be used as an operand; its value is the address of the memory cell that it labels. For example,

max:	7
defines an integer labeled max whose initial value is 7. The value may be loaded into register zero as follows:
	copy	max, r0
	copy	i[r0], r0

Since max is the address of a cell containing 7, the first line in the above code loads r0 with that address, and the next line loads the integer from that cell of memory into r0.

To clarify what constitutes a label: a label must consist of alphabetic or numeric characters or the underscore character '_', and it must begin with an alphabetic character. Labels which are keywords in the assembler are disallowed, such as r0 to r31 or pc or sp or any of the instruction names.
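As a rough illustration of the label rule, the following Python sketch checks a candidate name. The function name is_valid_label is invented here. The set of instruction names is deliberately incomplete: it lists only the mnemonics that happen to appear in this document, since the full list lives in the virtual machine definition. The sketch also assumes ASCII letters and case-sensitive keywords, neither of which is stated above.

```python
import re

# A letter first, then letters, digits or underscores.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# r0 to r31, plus pc and sp.
REGISTERS = {f"r{n}" for n in range(32)} | {"pc", "sp"}

# Assumption: only the mnemonics mentioned in this document; the real
# assembler would take the complete list from the virtual machine definition.
KNOWN_INSTRUCTIONS = {"copy", "cmp", "bge", "add", "bra", "print", "halt"}

def is_valid_label(name):
    """Return True if name is well formed and is not an assembler keyword."""
    if LABEL_RE.fullmatch(name) is None:
        return False
    return name not in REGISTERS and name not in KNOWN_INSTRUCTIONS
```

Under these assumptions pow2 and my_values are accepted, while _x, 2pow, r31 and copy are rejected. Note that r32 is accepted, because it is not a register name.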

In the code section, labels represent instruction addresses; they need no adornment and are usually translated into the pc-relative addressing mode. For example,

loop:	bra	loop
is an infinite loop representing a branch of +0.

Each label must be unique across all sections of the program, but there may be multiple labels with the same value.
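One way to picture how code labels become pc-relative operands is a first pass that records the address of every label, followed by a subtraction at each branch. The Python sketch below is hypothetical: collect_code_labels and branch_offset are names invented here, it assumes that the offset is measured from the address of the branch instruction itself (consistent with a branch to its own label being +0), and it checks uniqueness only within the lines it is given.

```python
import re

# An optional "label:" prefix at the start of a line.
LABEL_PREFIX = re.compile(r"\s*([A-Za-z][A-Za-z0-9_]*)\s*:")

def collect_code_labels(lines):
    """Map each label to the address of the instruction it precedes."""
    labels = {}
    address = 0
    for line in lines:
        rest = line
        # Several labels may share one address, so peel them off in a loop.
        while (match := LABEL_PREFIX.match(rest)):
            name = match.group(1)
            if name in labels:
                raise ValueError(f"duplicate label {name}")
            labels[name] = address
            rest = rest[match.end():]
        if rest.strip():        # only a line holding an instruction advances
            address += 1
    return labels

def branch_offset(labels, target, branch_address):
    """Offset from the branch instruction to its target (assumed convention)."""
    return labels[target] - branch_address
```

For instance, a line "loop: bra loop" at address 0 gives labels["loop"] == 0 and an offset of 0.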

Instruction syntax is defined in the document specifying the virtual machine. The only difference between that definition and the actual assembly language is that labels may be used as literal values and as branch and call targets, and that the assembler is free to encode small literals as large literals if desired.

(Note: The assemblers of Advanced and Honours students are required to accept C constant expressions wherever integer constants are valid in the assembly language. (Don't go down that road for floating-point; it's a vale of tears.))

Integer and floating-point values are represented in their usual textual form. Integers may be hexadecimal or octal, indicated by a 0x or 0 prefix respectively. They may also be normal decimal values.
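The three integer forms can be told apart by their prefix. The Python sketch below is one possible reading, with the invented name parse_int_literal. It assumes that an optional leading sign is allowed and that the 0X prefix may also be written in upper case, neither of which is stated here. Python's own int(text, 0) is not used because it rejects the C-style octal form such as 017.

```python
def parse_int_literal(text):
    """Convert a decimal, 0x-prefixed hex, or 0-prefixed octal literal."""
    digits = text.strip()
    sign = 1
    if digits.startswith("-"):          # assumption: a sign is permitted
        sign, digits = -1, digits[1:]
    elif digits.startswith("+"):
        digits = digits[1:]

    if digits.lower().startswith("0x"):
        base, digits = 16, digits[2:]
    elif len(digits) > 1 and digits.startswith("0"):
        base, digits = 8, digits[1:]
    else:
        base = 10

    # Check the digits by hand so that forms int() would tolerate
    # (underscores, embedded signs, whitespace) are rejected.
    allowed = "0123456789abcdef"[:base]
    if not digits or any(c not in allowed for c in digits.lower()):
        raise ValueError(f"bad integer literal: {text!r}")
    return sign * int(digits, base)
```

With this reading, 017 is fifteen and 08 is an error, as in C.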

Strings are represented as double-quoted strings, equivalent to those in C. Note that this is not the same as the format used in virtual machine code files: that format has no double quotes and a less general escape mechanism.

In the integer, floating-point, and string sections, a label may be followed by an integer value in square brackets, for example,

myvalues: [10]
This defines (in this case) ten units of storage beginning at the current address. If some of those values are to be initialized, they must appear as a comma-separated list on the same line of text as the original label, and they initialize the first elements of the block of storage. For example,
pow2: [8] 1, 2, 4, 8
defines a block of 8 integers but initializes only the first four of those elements. The remaining elements of the block are initialized to the default value, 0 for integers, 0.0 for doubles, and the empty string for strings.
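The padding rule for block data can be sketched in Python as follows. The name expand_block and its parameters are invented for this illustration. The sketch splits initializers on commas, so it suits the integer and floating-point sections only (a string containing a comma would need a real tokenizer), and it assumes that a line without a bracketed count defines exactly as many cells as it has values.

```python
import re

# Optional "label:", optional "[count]", then the rest of the line.
DATA_LINE = re.compile(
    r"\s*(?:(?P<label>[A-Za-z][A-Za-z0-9_]*)\s*:)?"
    r"\s*(?:\[\s*(?P<count>\d+)\s*\])?"
    r"\s*(?P<values>.*)"
)

def expand_block(line, convert=int, default=0):
    """Return (label, cells) for one data line, padding with the default."""
    match = DATA_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"cannot parse data line: {line!r}")
    text = match.group("values").strip()
    values = [convert(item.strip()) for item in text.split(",")] if text else []
    # Assumption: with no [count], the block is as long as its value list.
    count = int(match.group("count")) if match.group("count") else len(values)
    if len(values) > count:
        raise ValueError("more initializers than cells in the block")
    cells = values + [default] * (count - len(values))
    return match.group("label"), cells
```

For the floating-point section the caller would pass convert=float and default=0.0.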

In general, the input is free format, except that there can be only one instruction on a line, and the entire instruction definition must be on a single line. Also, as noted, the definition of block data is sensitive to the placement of newlines.

Machine Format

The input file to the virtual machine, usually stored in a file suffixed .acm, is a textual encoding, one item per line, of the various initial values to be loaded into the code and memory sections of the program. The first line of the file contains four decimal integers, separated by a single space character, stating the number of lines of each of the four sections of the program, in the order: code, integer, double, string. Call these four values Ncode, Nint, Ndouble, and Nstring. All must be non-negative, and Ncode must be greater than zero.

The next Ncode lines of the file contain the instructions of the machine, one instruction per line. The instructions are encoded as described in the definition of the virtual machine, and are stored in the file as 8-digit, unsigned, zero padded hexadecimal numbers. Thus, each instruction is stored as 8 hex digits followed by a newline; there is no leading 0x. The instructions are stored in increasing address order, without gaps, starting at address 0 (on the second line of the file).
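The header line and the instruction lines are simple enough to handle with a few lines of Python. The sketch below uses invented names (parse_header, format_instruction, parse_instruction). It writes lower-case hex digits and accepts either case when reading, because the case of the digits is not specified here; it also treats an instruction as a 32-bit unsigned value, which follows from the 8-digit width.

```python
import string

def parse_header(line):
    """Return (Ncode, Nint, Ndouble, Nstring) from the first line of a file."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 4 or not all(f.isascii() and f.isdigit() for f in fields):
        raise ValueError("header must be four non-negative decimal integers")
    ncode, nint, ndouble, nstring = (int(f) for f in fields)
    if ncode == 0:
        raise ValueError("Ncode must be greater than zero")
    return ncode, nint, ndouble, nstring

def format_instruction(word):
    """Encode one instruction as 8 zero-padded hex digits, no 0x prefix."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction does not fit in 8 hex digits")
    return f"{word:08x}"    # assumption: lower-case digits on output

def parse_instruction(line):
    """Decode one 8-digit hex instruction line."""
    text = line.rstrip("\n")
    if len(text) != 8 or any(c not in string.hexdigits for c in text):
        raise ValueError(f"bad instruction line: {line!r}")
    return int(text, 16)
```

Reading a file then amounts to calling parse_header on the first line and parse_instruction on each of the next Ncode lines.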

Following the instructions are Nint lines giving the initial values of the successive integer values in the virtual machine's memory cells. These are stored as plain signed decimal integers, as might be converted using atoi in C.

Following the integers are Ndouble lines giving the initial values of the successive double-precision floating-point values in the machine's memory cells. These are stored as plain, signed, floating-point values, as might be converted by atof in C.

Following the floating-point values are Nstring lines giving the initial values of the successive string values in the machine's memory cells. These are encoded as character data, but with escapes to encode newlines and other difficult-to-represent values. The only escape sequences recognized in this encoding are: \n for newline, \t for tab, \b for backspace, \r for carriage return, \f for form feed, and \\ for a single backslash. Any other backslash-character sequence represents just the character following the backslash, so \v represents just the letter v, not a vertical tab. Note that each string value occupies a single line of text. The newline that terminates that line is not included in the value, and newlines within the string must be represented explicitly using the \n notation.
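The string encoding can be illustrated with a pair of Python helpers. The names decode_acm_string and encode_acm_string are invented here. One case is not covered by the text above, namely a lone backslash at the very end of a line; this sketch keeps it as a literal backslash, which is purely an assumption.

```python
# The six escapes recognized in the machine code file format.
ESCAPES = {"n": "\n", "t": "\t", "b": "\b", "r": "\r", "f": "\f", "\\": "\\"}

# Inverse table, used when the assembler writes a string out.
UNESCAPES = {value: "\\" + key for key, value in ESCAPES.items()}

def decode_acm_string(line):
    """Turn one string line (without its terminating newline) into a value."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            following = line[i + 1]
            # Unknown escapes stand for the following character itself.
            out.append(ESCAPES.get(following, following))
            i += 2
        else:
            out.append(ch)      # includes a trailing lone backslash (assumed)
            i += 1
    return "".join(out)

def encode_acm_string(value):
    """Write a string value so that it occupies a single line of text."""
    return "".join(UNESCAPES.get(ch, ch) for ch in value)
```

Because every newline and backslash is escaped on output, encoding followed by decoding returns the original value.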

At the start of executing the program in the virtual machine, the memory cells of the machine will be initialised with the values as given, in the order given above, so integers will appear in memory before floating-point numbers, and then the strings.

A halt instruction is appended to the code. Implementation-defined space is allocated, at the top of the initialised memory cells, to hold the stack. Note that values must be supplied for every cell of memory to be used in the program (except the stack); there is no mechanism for defining an uninitialized block of values.
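The resulting memory layout follows directly from the three data counts in the header. The Python sketch below uses the invented name memory_layout and assumes that every value, whatever its type, occupies exactly one memory cell, and that the stack begins at the first cell after the initialised data. The size of the stack is implementation-defined and is not modelled.

```python
def memory_layout(nint, ndouble, nstring):
    """Return the first cell address of each region of memory."""
    int_base = 0                        # data starts at cell address 0
    double_base = int_base + nint       # doubles follow the integers
    string_base = double_base + ndouble # strings follow the doubles
    stack_base = string_base + nstring  # stack sits above initialised cells
    return {
        "int": int_base,
        "double": double_base,
        "string": string_base,
        "stack": stack_base,
    }
```

For a file whose header ends in 1 6 2, this gives the integer at cell 0, the six doubles in cells 1 to 6, the two strings in cells 7 and 8, and the stack from cell 9 upward.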

Example Assembler File .aca

The following is an example assembly language program, which adds up the floating-point numbers in an array and prints the total.


CODE
	copy	0, r0			# counter
	copy	max, r1			# address of max
	copy	i[r1], r1		# r1 is now 5
	copy	total, r2
	copy	numbers, r3
loop:	cmp	r0, r1
	bge	end
	add	d[r3], d[r2], d[r2]	# add to total
	add	1, r0			# step counter
	add	1, r3			# step pointer
	bra	loop
end:	copy	report, r5
	print	s[r5]
	print	d[r2]
	copy	newline, r5
	print	s[r5]

INT
max:		5

DOUBLE
total:		0.0
numbers:	[5] 0.35, 0.57, 0.76, 0.61, 0.83

STRING
report:		"The total is "
newline:	"\n"

Example Virtual Machine Code File .acm

The program in the previous section, when assembled, yields the following .acm file.

17 1 6 2
The total is 

Copyright © 1999 R. Pike and L. Patrick