COMP 3100/3800/3204: Software Engineering
Specification of the Ace Machine
and Assembler Formats

Program Names

Three programs make up the Ace project: a compiler named ace, an assembler named asm, and a virtual machine named avm. If you are taking only COMP 3100/3800 you will build only the virtual machine, but reading this document will help you understand how the virtual machine's input file is translated from higher-level languages. The ace program (the Ace compiler) compiles programs written in a high-level language (with suffix .ace) into the assembler file format. The asm program (the Ace assembler) assembles files of suffix .asm into files of suffix .avm, which are then interpreted and executed by the avm program (the Ace virtual machine).

Assembly Language

The assembly language is a fairly straightforward representation of the instructions and values of the machine. It is usually stored in a file suffixed .asm (dot-lowercase-a-s-m) and assembled into a code file suffixed .avm.

Comments begin with a number sign # and go to the end of the line of text.

Each type of memory value is grouped into its own section of the assembly file, and instructions are grouped in another section. Each section is prefixed by an identifying word. INT introduces the integer data section, DOUBLE the floating-point data section, STRING the string data section and CODE the instruction section. The sections must appear in the above order, and each section may appear only once in any assembly file. A missing section is taken to be empty; the code section may not be empty. The successive elements of each section are assembled in order. The first value in the first data section starts at memory cell address 0, and the addresses increase as successive values are assembled. Instructions are part of code space, so the numbering begins again at code address 0 with the first instruction in the CODE section.
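
The addressing rule above can be sketched in Python as follows. This is only an illustration, not part of the assembler: the function name assign_addresses and the shape of its input (a list of label and cell-count pairs per section) are assumptions made for the sketch. The sample data mirrors the example assembler file given later in this document.

```python
def assign_addresses(sections):
    """Map each label to its address.

    `sections` maps a section name to a list of (label_or_None, cell_count)
    pairs in source order. This input shape is a hypothetical simplification.
    """
    addresses = {}

    # Data addresses run on continuously through INT, DOUBLE, then STRING.
    data_address = 0
    for name in ("INT", "DOUBLE", "STRING"):
        for label, count in sections.get(name, []):
            if label is not None:
                addresses[label] = data_address
            data_address += count

    # Code space is separate, so numbering restarts at 0.
    code_address = 0
    for label, count in sections.get("CODE", []):
        if label is not None:
            addresses[label] = code_address
        code_address += count

    return addresses


# Layout of the example assembler file: a label on a line of its own
# occupies no cells, so it is given a count of 0.
example = {
    "INT": [("max", 1)],
    "DOUBLE": [("total", 1), ("numbers", 5)],
    "STRING": [("report", 1), ("newline", 1)],
    "CODE": [(None, 5), ("loop", 0), (None, 6), ("end", 0), (None, 6)],
}
print(assign_addresses(example))
```

A single dictionary is enough here because labels must be unique across all sections, even though data and code addresses are numbered separately.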

Any address in any section may be labeled by an alphanumeric string, followed by a colon. The label may be used as an operand; its value is the address of the memory cell or instruction that it labels (not the value of that memory cell or instruction). For example,

max:	7
defines an integer cell labeled max whose initial value is 7. The value may be loaded into register zero as follows:
	icopy	max, r0
	icopy	[r0], r0

Since max is the address of a cell containing 7, the first line in the above code loads r0 with that address, and the next line loads the integer from that cell of memory into r0.

To clarify what constitutes a label: a label must consist of alphabetic or numeric characters or the underscore character '_', and it must begin with an alphabetic character. Assembler keywords may not be used as labels; these include r0 to r63, pc, sp, fp, and all of the instruction names.
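
A minimal Python sketch of this rule follows. The names is_valid_label, RESERVED and EXAMPLE_INSTRUCTIONS are hypothetical. The instruction set listed is only what appears in this document's example program, not the full set from the virtual machine definition, and the sketch assumes ASCII letters and case-sensitive keywords, neither of which is stated above.

```python
import re

# A letter first, then letters, digits or underscores (ASCII assumed).
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Register names reserved by the text above.
RESERVED = {f"r{n}" for n in range(64)} | {"pc", "sp", "fp"}

# Assumption: only the instruction names used in this document's example.
# The complete list lives in the virtual machine definition.
EXAMPLE_INSTRUCTIONS = {
    "icopy", "icmp", "iadd", "dadd", "bra", "bge", "sprint", "dprint", "halt",
}


def is_valid_label(name, instruction_names=EXAMPLE_INSTRUCTIONS):
    """Return True if `name` is well formed and not a reserved word."""
    if LABEL_PATTERN.fullmatch(name) is None:
        return False
    return name not in RESERVED and name not in instruction_names


for candidate in ("loop_test", "r63", "r64", "_tmp", "9lives", "halt"):
    print(candidate, is_valid_label(candidate))
```

Note that r64 passes the check, since only r0 to r63 are named as keywords.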

In the code section, labels represent instruction addresses; they need no adornment and are usually translated into the pc-relative addressing mode. For example,

loop:
	bra	loop
is an infinite loop representing a branch of +0.
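
Since a branch to its own label is a branch of +0, the offset appears to be measured from the address of the branch instruction itself. Under that reading, which is an inference and not a statement from the virtual machine definition, the offset can be computed as in the following Python sketch. The function name pc_relative_offset is hypothetical, and the bit-level encoding of the offset is not modeled. The addresses used are those of the example assembler file later in this document.

```python
def pc_relative_offset(target_address, branch_address):
    """Offset the branch at `branch_address` needs to reach `target_address`."""
    return target_address - branch_address


# Code addresses taken from the example assembler file in this document.
labels = {"loop": 5, "end": 11}
print(pc_relative_offset(labels["loop"], 10))  # bra loop, at code address 10
print(pc_relative_offset(labels["end"], 6))    # bge end, at code address 6
```

The two results, -5 and +5, agree with the low-order hex digits of the corresponding instructions in the example .avm file (027ffffb and 04400005).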

Each label must be unique across all sections of the program, but there may be multiple labels with the same value, for instance:

loop_test:
exit_test:
	icmp	r0, r1
	bra	loop_end
defines two labels which both address the same instruction.

Instruction syntax is defined in the document specifying the Ace virtual machine. The only differences between that definition and the actual assembly language are that labels may be used as literal values and as branch and call targets, and that the assembler is free to encode small literals as large literals if desired.

Integer and floating-point values are represented in their usual textual form. Integers may be written in decimal, in hexadecimal with a 0x prefix, or in octal with a 0 prefix.
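
The three integer forms can be told apart by prefix, as in the Python sketch below. The function name parse_int_literal is hypothetical. Accepting a leading sign and an uppercase 0X prefix are assumptions, since the text above does not say whether the assembler allows them.

```python
def parse_int_literal(text):
    """Convert a decimal, 0x-prefixed hex, or 0-prefixed octal literal."""
    body = text.strip()
    sign = 1
    # Assumption: an optional leading sign is allowed.
    if body.startswith(("+", "-")):
        if body[0] == "-":
            sign = -1
        body = body[1:]

    if body.lower().startswith("0x"):
        return sign * int(body[2:], 16)   # hexadecimal
    if len(body) > 1 and body.startswith("0"):
        return sign * int(body[1:], 8)    # octal
    return sign * int(body, 10)           # decimal


for literal in ("42", "0x1F", "017", "0"):
    print(literal, parse_int_literal(literal))
```

The prefix is handled by hand because Python's own int(text, 0) rejects C-style octal literals such as 017.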

Strings are represented as double-quoted strings, equivalent to those in C. Note that this is not the same as the format used in virtual machine code files: that format has no surrounding double quotes and a less general escape mechanism for special characters.

In the integer, floating-point, and string sections, a label may be followed by an integer value in square brackets, for example,

myvalues: [10]
This defines (in this case) ten units of storage beginning at the current address. If some of those values are to be initialized, they must appear as a comma-separated list on the same line of text as the original label, and they initialize the first elements of the block of storage. For example,
pow2: [8] 1, 2, 4, 8
defines a block of 8 integers but initializes only the first four of those elements. The remaining elements of the block are initialized to the default value, 0 for integers, 0.0 for doubles, and the empty string for strings.
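
The padding rule can be sketched in Python as follows. The function name expand_block is hypothetical, and rejecting a list with more initial values than cells is an assumption, since the text does not say how the assembler treats that case.

```python
def expand_block(size, initial_values, default):
    """Return the `size` cells of a block, padded with `default`."""
    if len(initial_values) > size:
        # Assumption: too many initial values is treated as an error.
        raise ValueError("more initial values than cells in the block")
    return list(initial_values) + [default] * (size - len(initial_values))


print(expand_block(8, [1, 2, 4, 8], 0))   # pow2: [8] 1, 2, 4, 8
print(expand_block(10, [], 0))            # myvalues: [10]
print(expand_block(3, ["hi"], ""))        # a string block pads with ""
```

Every cell of the expanded block is then written to the .avm file, which has no way to describe an uninitialized block.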

In general, the input is free format, except that there can be only one instruction on a line, and an entire instruction definition must be on a single line. Also, as noted, the definition of block data is sensitive to the placement of newlines.

Machine Format

The input file to the Ace virtual machine, usually stored in a file suffixed .avm (dot-lowercase-a-v-m), is a textual encoding, one item per line, of the various initial values to be loaded into the code and memory sections of the program. The first line of the file contains a comment which begins with a hash character # and continues until the end of the line. This line is ignored by the virtual machine. The second line of the file contains four decimal integers, separated by a single space character, stating the number of lines in each of the four sections of the program, in the order: integer, double, string, code. Call these four values Nint, Ndouble, Nstring, and Ncode. All must be non-negative, and Ncode must be greater than zero.

The next Nint lines give the initial values of the successive integer values in the virtual machine's memory cells. These are stored as plain signed decimal integers, as might be converted using atoi in C.

Following the integers are Ndouble lines giving the initial values of the successive double-precision floating-point values in the machine's memory cells. These are stored as plain, signed, floating-point values, as might be converted by atof in C.

Following the floating-point values are Nstring lines giving the initial values of the successive string values in the machine's memory cells. These are encoded as character data, but with escapes to encode newlines and other difficult-to-represent characters. The only escape sequences recognized in this encoding are: \n for newline, \t for tab, \b for backspace, \r for carriage return, \f for form feed, and \\ for a single backslash. Any other backslash-character sequence represents just the character following the backslash, so \v represents just the letter v. Note that each string value occupies a single line of text. The newline that terminates that line is not included in the value, and newlines within the string must be represented explicitly using the \n notation. There are no quotes around the string, so quotes within the string can simply be represented with a quotation mark, or they may be escaped using backslash.
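
One way a virtual machine might decode such a line is sketched below in Python. The names decode_avm_string and ESCAPES are hypothetical. Keeping a lone backslash at the very end of a line as a literal backslash is an assumption, since the text does not cover that case.

```python
# The six escapes listed above; \f is the form feed character.
ESCAPES = {"n": "\n", "t": "\t", "b": "\b", "r": "\r", "f": "\f", "\\": "\\"}


def decode_avm_string(line):
    """Decode one string line of a .avm file (without its final newline)."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            following = line[i + 1]
            # Unknown escapes stand for the following character itself.
            out.append(ESCAPES.get(following, following))
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash (assumption).
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(decode_avm_string(r"The total is ")))
print(repr(decode_avm_string(r"a\vb")))
print(repr(decode_avm_string(r"say \"hi\"")))
```

The dictionary lookup with a default covers both rules at once: a recognized escape maps to its special character, and anything else maps to itself.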

The last Ncode lines of the file contain the instructions of the machine, one instruction per line. The instructions are encoded as described in the definition of the Ace virtual machine, and are stored in the file as 8-digit, unsigned, zero padded hexadecimal numbers. Thus, each instruction is stored as 8 hex digits followed by a newline; there is no leading 0x. The instructions are stored in increasing address order, without gaps, starting at address 0.
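
The instruction line format can be written and read back as in the following Python sketch. The function names are hypothetical. Lowercase digits are used on output because the example file in this document uses them; accepting uppercase digits on input is an assumption.

```python
import string


def format_instruction(word):
    """Render an instruction word as 8 zero-padded hex digits."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction does not fit in 8 hex digits")
    return format(word, "08x")


def parse_instruction(line):
    """Read back one instruction line (without its final newline)."""
    # Check the digits explicitly: int(line, 16) alone would also accept
    # a leading 0x, a sign, or underscores.
    if len(line) != 8 or any(ch not in string.hexdigits for ch in line):
        raise ValueError("expected exactly 8 hexadecimal digits")
    return int(line, 16)


print(format_instruction(0x027FFFFB))
print(parse_instruction("31c10081") == 0x31C10081)
```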

When the virtual machine starts executing the program, its memory cells are initialized with the values given, in the order given above: integers first, then floating-point numbers, then strings.

A halt instruction is appended to the code. Implementation-defined space is allocated, at the top of the initialized memory cells, to hold the stack. Note that values must be supplied for every cell of memory to be used in the program (except the stack); there is no mechanism for defining an uninitialized block of values.
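
Putting the pieces of the machine format together, a loader might look like the Python sketch below. It is one possible reading of the format and not a reference implementation. The names load_avm and HALT_WORD are hypothetical, the halt encoding 00000000 is taken from the example file in this document, and string escapes and stack allocation are left out to keep the sketch short.

```python
HALT_WORD = 0x00000000  # assumption: halt is 00000000, as in the example file


def load_avm(text):
    """Return (memory, code) parsed from the text of a .avm file."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # discard the empty piece after the final newline
    if len(lines) < 2 or not lines[0].startswith("#"):
        raise ValueError("expected a comment line followed by a count line")

    # Second line: Nint Ndouble Nstring Ncode, separated by single spaces.
    counts = [int(field) for field in lines[1].split(" ")]
    if len(counts) != 4 or min(counts) < 0 or counts[3] == 0:
        raise ValueError("bad section counts")
    n_int, n_double, n_string, n_code = counts

    body = lines[2:]
    if len(body) != sum(counts):
        raise ValueError("file length does not match the section counts")

    ints = [int(line) for line in body[:n_int]]
    doubles = [float(line) for line in body[n_int:n_int + n_double]]
    string_start = n_int + n_double
    strings = body[string_start:string_start + n_string]  # escapes not decoded
    code = [int(line, 16) for line in body[string_start + n_string:]]

    code.append(HALT_WORD)  # a halt is appended after the loaded code
    # Memory holds the integers, then the doubles, then the strings.
    return ints + doubles + strings, code


sample = "# tiny example\n1 1 1 1\n7\n2.5\nhi\n31000080\n"
memory, code = load_avm(sample)
print(memory)
print([format(word, "08x") for word in code])
```

The sketch splits on the newline character only, so that other control characters inside a string line cannot be mistaken for line breaks.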

Example Assembler File .asm

The following is an example assembly language program, which adds up the floating-point numbers in an array and prints the total.

# Example assembler file

INT

max:		5

DOUBLE

total:		0.0
numbers:	[5] 0.35, 0.57, 0.76, 0.61, 0.83

STRING

report:		"The total is "
newline:	"\n"

CODE

	icopy	0, r0			# counter
	icopy	max, r1			# address of max
	icopy	[r1], r1		# r1 is now 5
	icopy	total, r2		# address of total
	icopy	numbers, r3		# address of numbers
loop:
	icmp	r0, r1
	bge	end
	dadd	[r3], [r2], [r2]	# add to total
	iadd	1, r0			# step counter
	iadd	1, r3			# step pointer
	bra	loop
end:
	icopy	report, r5		# address of report
	sprint	[r5]			# print report
	dprint	[r2]			# print total
	icopy	newline, r5		# address of newline
	sprint	[r5]			# print newline
	halt

Example Virtual Machine Code File .avm

The program in the previous section, when assembled, yields the following .avm file.

# avm file
1 6 2 17
5
0.0
0.35
0.57
0.76
0.61
0.83
The total is 
\n
31000080
31000081
31c10081
31010082
31020083
24808100
04400005
47c3c2c2
27400180
27400183
027ffffb
31070085
66c50000
46c20000
31080085
66c50000
00000000


Copyright © 2000 R. Pike and L. Patrick