The Ace project consists of three programs: a compiler named ace, an assembler named asm, and a virtual machine named avm. If you are taking only Comp3100/3800, you will build only the virtual machine, but reading this document will help you understand how the virtual machine's input file is produced from higher-level languages. The ace program (the Ace compiler) compiles programs written in a high-level language (file suffix .ace) into the assembler file format. The asm program (the Ace assembler) assembles .asm files into .avm files, which are then interpreted and executed by the avm program (the Ace virtual machine).
The assembly language is a fairly straightforward representation of the instructions and values of the machine. It is usually stored in a file suffixed .asm (dot-lowercase-a-s-m) and assembled into a code file suffixed .avm.
Comments begin with a number sign # and go to the end of the line of text.
Each type of memory value is grouped into its own section of the assembly file, and instructions are grouped in another section. Each section is prefixed by an identifying word. INT introduces the integer data section, DOUBLE the floating-point data section, STRING the string data section and CODE the instruction section. The sections must appear in the above order, and each section may appear only once in any assembly file. A missing section is taken to be empty; the code section may not be empty. The successive elements of each section are assembled in order. The first value in the first data section starts at memory cell address 0, and the addresses increase as successive values are assembled. Instructions are part of code space, so the numbering begins again at code address 0 with the first instruction in the CODE section.
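The numbering rule above can be sketched in a few lines of Python. This is only an illustration, not the assembler's actual implementation: the function name assign_addresses and the (label, size) item representation are assumptions made for the example, and the sizes are taken as already known rather than parsed from source text.

```python
def assign_addresses(data_items, code_items):
    """Return ({data label: address}, {code label: address}).

    Each item is a (label, size) pair; label is None for unlabeled items.
    data_items lists the INT, DOUBLE, and STRING entries in file order,
    so data addresses run continuously across the three data sections.
    """
    data_labels, code_labels = {}, {}
    for items, table in ((data_items, data_labels), (code_items, code_labels)):
        address = 0  # numbering restarts at 0 for code space
        for label, size in items:
            if label is not None:
                table[label] = address
            address += size
    return data_labels, code_labels


# Hypothetical layout: one integer, one double, a block of five doubles,
# two strings, then 17 instructions with labels on the 6th and 12th.
data = [("max", 1), ("total", 1), ("numbers", 5), ("report", 1), ("newline", 1)]
code = [(None, 1)] * 5 + [("loop", 1)] + [(None, 1)] * 5 + [("end", 1)] + [(None, 1)] * 5
data_labels, code_labels = assign_addresses(data, code)
```

Keeping data and code labels in separate tables reflects the two address spaces; a real assembler would also have to reject a label that appears in both, since labels must be unique across the whole program.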
Any address in any section may be labeled by an alphanumeric string, followed by a colon. The label may be used as an operand; its value is the address of the memory cell or instruction that it labels (not the value of that memory cell or instruction). For example,
max: 7
icopy max, r0
icopy [r0], r0
Since max is the address of a cell containing 7, the first line in the above code loads r0 with that address, and the next line loads the integer from that cell of memory into r0.
To clarify what constitutes a label: a label must consist of alphabetic or numeric characters or the underscore character '_', and it must begin with an alphabetic character. Labels that match assembler keywords, such as r0 to r63, pc, sp, fp, or any instruction name, are disallowed.
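A label check along these lines might look like the following Python sketch. The names LABEL_RE, REGISTERS, INSTRUCTIONS, and is_valid_label are hypothetical, the instruction set listed is only the handful of names that appear in this document (the full set is defined in the virtual machine specification), and treating labels as case-sensitive ASCII is an assumption.

```python
import re

# A letter first, then letters, digits, or underscores (ASCII assumed).
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Register keywords named in the text: r0 to r63, pc, sp, fp.
REGISTERS = {f"r{i}" for i in range(64)} | {"pc", "sp", "fp"}

# Assumption: an incomplete sample of instruction names, for illustration only.
INSTRUCTIONS = {"icopy", "icmp", "iadd", "dadd", "bra", "bge",
                "sprint", "dprint", "halt"}


def is_valid_label(name):
    """Return True if name is well formed and is not a reserved keyword."""
    if not LABEL_RE.fullmatch(name):
        return False
    return name not in REGISTERS and name not in INSTRUCTIONS
```

Under these assumptions, loop_test and pow2 are accepted, r0 and halt are rejected as keywords, and _tmp and 9lives are rejected because they do not begin with a letter.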
In the code section, labels represent instruction addresses; they need no adornment and are usually translated into the pc-relative addressing mode. For example,
loop: bra loop
Each label must be unique across all sections of the program, but there may be multiple labels with the same value, for instance:
loop_test:
exit_test:
        icmp r0, r1
        bra loop_end
Instruction syntax is defined in the document specifying the Ace virtual machine. The only differences between that definition and the actual assembly language are that labels may be used as literal values and as branch and call targets, and that the assembler is free to encode small literals as large literals if desired.
Integer and floating-point values are represented in their usual textual form. Integers may be hexadecimal or octal, so indicated by a 0x or 0 prefix respectively. They may also be normal decimal values.
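The prefix rule for integers can be illustrated with a short Python sketch. The function name parse_int_literal is hypothetical, and accepting a leading minus sign and an upper-case 0X prefix are assumptions, since the text says only that values use their usual textual form.

```python
def parse_int_literal(text):
    """Convert an assembly integer literal to an int.

    0x prefix: hexadecimal; leading 0: octal; otherwise decimal.
    Raises ValueError (from int) if the digits do not fit the base.
    """
    negative = text.startswith("-")  # assumption: a leading minus is allowed
    digits = text[1:] if negative else text
    if digits.lower().startswith("0x"):
        value = int(digits[2:], 16)
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits[1:], 8)
    else:
        value = int(digits, 10)
    return -value if negative else value
```

Note that a lone 0 is treated as decimal zero, and that a literal such as 09 is rejected because 9 is not an octal digit.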
Strings are represented as double-quoted strings, equivalent to those in C. Note that this is not the same as the format used in virtual machine code files: that format has no surrounding double quotes and a less general escape mechanism for special characters.
In the integer, floating-point, and string sections, a label may be followed by an integer value in square brackets, for example,
myvalues: [10]
pow2: [8] 1, 2, 4, 8
In general, the input is free format, except that there can be only one instruction on a line, and an entire instruction definition must be on a single line. Also, as noted, the definition of block data is sensitive to the placement of newlines.
The input file to the Ace virtual machine, usually stored in a file suffixed .avm (dot-lowercase-a-v-m), is a textual encoding, one item per line, of the various initial values to be loaded into the code and memory sections of the program. The first line of the file contains a comment which begins with a hash character # and continues until the end of the line. This line is ignored by the virtual machine. The second line of the file contains four decimal integers, separated by a single space character, stating the number of lines in each of the four sections of the program, in the order: integer, double, string, code. Call these four values Nint, Ndouble, Nstring, and Ncode. All must be non-negative, and Ncode must be greater than zero.
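Reading the two header lines described above could be sketched as follows in Python. The function name parse_header is hypothetical, and raising ValueError on a malformed count line is a design choice made for the example rather than behavior required by this document.

```python
def parse_header(lines):
    """Return (Nint, Ndouble, Nstring, Ncode) from the first two lines.

    lines holds the file's lines without their trailing newlines.
    """
    # lines[0] is the comment line; it is ignored, as the text specifies.
    fields = lines[1].split(" ")
    if len(fields) != 4 or not all(f.isascii() and f.isdigit() for f in fields):
        raise ValueError("expected four non-negative decimal integers")
    n_int, n_double, n_string, n_code = (int(f) for f in fields)
    if n_code == 0:
        raise ValueError("Ncode must be greater than zero")
    return n_int, n_double, n_string, n_code
```

Splitting on exactly one space character, rather than on arbitrary whitespace, follows the wording of the format; a more lenient reader might accept runs of spaces or tabs.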
The next Nint lines give the initial values of the successive integer values in the virtual machine's memory cells. These are stored as plain signed decimal integers, as might be converted using atoi in C.
Following the integers are Ndouble lines giving the initial values of the successive double-precision floating-point values in the machine's memory cells. These are stored as plain, signed, floating-point values, as might be converted by atof in C.
Following the floating-point values are Nstring lines giving the initial values of the successive string values in the machine's memory cells. These are encoded as character data, but with escapes to encode newlines and other difficult-to-represent values. The only escape sequences recognized in this encoding are: \n for newline, \t for tab, \b for backspace, \r for carriage return, \f for form feed, and \\ for a single backslash. Any other backslash-character sequence represents just the character following the backslash, so \v represents just the letter v. Note that each string value occupies a single line of text. The newline that terminates that line is not included in the value, and newlines within the string must be represented explicitly using the \n notation. There are no quotes around the string, so quotes within the string can simply be represented with a quotation mark, or they may be escaped using backslash.
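The escape rules for string lines can be sketched as a small Python decoder. The names ESCAPES and decode_avm_string are hypothetical; \f is mapped to the form feed character here, and keeping a lone backslash at the very end of a line as a literal backslash is an assumption, since the format does not say what should happen in that case.

```python
# The six escape sequences recognized in the string section.
ESCAPES = {"n": "\n", "t": "\t", "b": "\b", "r": "\r", "f": "\f", "\\": "\\"}


def decode_avm_string(line):
    """Decode one string line (without its terminating newline)."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            # Unrecognized escapes yield just the character after the backslash.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character (or, by assumption, a trailing backslash).
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, the two-character line consisting of a backslash followed by v decodes to the single letter v, and an escaped quotation mark decodes to a plain quotation mark.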
The last Ncode lines of the file contain the instructions of the machine, one instruction per line. The instructions are encoded as described in the definition of the Ace virtual machine, and are stored in the file as 8-digit, unsigned, zero padded hexadecimal numbers. Thus, each instruction is stored as 8 hex digits followed by a newline; there is no leading 0x. The instructions are stored in increasing address order, without gaps, starting at address 0.
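Converting between instruction words and their file representation might look like this in Python. The names WORD_RE, parse_instruction_line, and format_instruction are hypothetical; accepting upper-case hex digits on input is an assumption, while output uses lower case.

```python
import re

# Exactly eight hexadecimal digits, with no 0x prefix.
WORD_RE = re.compile(r"[0-9A-Fa-f]{8}")


def parse_instruction_line(line):
    """Convert one code line (without its newline) to a 32-bit integer."""
    if not WORD_RE.fullmatch(line):
        raise ValueError(f"not an 8-digit hex instruction: {line!r}")
    return int(line, 16)


def format_instruction(word):
    """Render a 32-bit instruction word as 8 zero-padded hex digits."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word out of 32-bit range")
    return f"{word:08x}"
```

The pattern check matters because int(line, 16) alone would also accept shorter strings and a leading 0x, neither of which the file format permits.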
At the start of executing the program in the virtual machine, the memory cells of the machine will be initialised with the values as given, in the order given above, so integers will appear in memory before floating-point numbers, and then the strings.
A halt instruction is appended to the code. Implementation-defined space is allocated, at the top of the initialised memory cells, to hold the stack. Note that values must be supplied for every cell of memory to be used in the program (except the stack); there is no mechanism for defining an uninitialized block of values.
The following is an example assembly language program, which adds up the floating-point numbers in an array and prints the total.
# Example assembler file
INT
max:     5
DOUBLE
total:   0.0
numbers: [5] 0.35, 0.57, 0.76, 0.61, 0.83
STRING
report:  "The total is "
newline: "\n"
CODE
        icopy 0, r0             # counter
        icopy max, r1           # address of max
        icopy [r1], r1          # r1 is now 5
        icopy total, r2         # address of total
        icopy numbers, r3       # address of numbers
loop:   icmp r0, r1
        bge end
        dadd [r3], [r2], [r2]   # add to total
        iadd 1, r0              # step counter
        iadd 1, r3              # step pointer
        bra loop
end:    icopy report, r5        # address of report
        sprint [r5]             # print report
        dprint [r2]             # print total
        icopy newline, r5       # address of newline
        sprint [r5]             # print newline
        halt
The program in the previous section, when assembled, yields the following .avm file.
# avm file
1 6 2 17
5
0.0
0.35
0.57
0.76
0.61
0.83
The total is 
\n
31000080
31000081
31c10081
31010082
31020083
24808100
04400005
47c3c2c2
27400180
27400183
027ffffb
31070085
66c50000
46c20000
31080085
66c50000
00000000
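To show how the counts on the second line carve the file into sections, here is a minimal Python sketch that loads the example above into a memory list and a code list. The name load_avm and the use of one Python list for all memory cells are assumptions made for illustration. The sketch leaves strings in their raw, still-escaped form and omits the appended halt instruction and the stack allocation.

```python
EXAMPLE = "\n".join([
    "# avm file",
    "1 6 2 17",
    "5",
    "0.0", "0.35", "0.57", "0.76", "0.61", "0.83",
    "The total is ",   # note the trailing space
    "\\n",             # a backslash followed by the letter n
    "31000080", "31000081", "31c10081", "31010082", "31020083",
    "24808100", "04400005", "47c3c2c2", "27400180", "27400183",
    "027ffffb", "31070085", "66c50000", "46c20000", "31080085",
    "66c50000", "00000000",
]) + "\n"


def load_avm(text):
    """Split .avm text into (memory cells, instruction words)."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    n_int, n_double, n_string, n_code = (int(f) for f in lines[1].split(" "))
    body = lines[2:]
    if len(body) != n_int + n_double + n_string + n_code:
        raise ValueError("section counts do not match the number of lines")
    ints = [int(s) for s in body[:n_int]]
    doubles = [float(s) for s in body[n_int:n_int + n_double]]
    strings = body[n_int + n_double:n_int + n_double + n_string]  # raw text
    code = [int(s, 16) for s in body[n_int + n_double + n_string:]]
    # Memory order: integers, then doubles, then strings.
    memory = ints + doubles + strings
    return memory, code


memory, code = load_avm(EXAMPLE)
```

With this layout the value of max sits in memory cell 0, total in cell 1, the five numbers in cells 2 to 6, and the two strings in cells 7 and 8, which matches the literal operands visible in the icopy instruction words of the listing.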
Copyright © 2000 R. Pike and L. Patrick