First pass:
- Collect information about numbers, registers, and which instructions
contain label references
- Encode all instructions that don't contain label references
- Set (temporary) addresses for each instruction
Second pass:
- Collect information about label references (address, offset, size)
- Encode all instructions that contain label references
- Update (if necessary) addresses for each instruction
The second pass is iterated 10 times or until no instruction changes
size, whichever comes first.
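The iterate-until-stable second pass can be sketched roughly as below. All struct and function names here are invented for illustration and do not match the project's actual API; instruction sizes are treated as fixed so the loop only settles addresses:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction record; the real assembler's types differ. */
struct insn {
    uint32_t address;
    uint8_t size;
    bool has_label_ref;
};

/* Recompute addresses; return true if any instruction's layout changed. */
static bool relayout(struct insn *insns, size_t n) {
    bool changed = false;
    uint32_t addr = 0;
    for (size_t i = 0; i < n; i++) {
        if (insns[i].address != addr) {
            insns[i].address = addr;
            changed = true;
        }
        addr += insns[i].size;
    }
    return changed;
}

/* Second pass: iterate at most 10 times or until the layout is stable,
   whichever comes first. Returns the number of iterations that changed
   something. */
static int assemble_second_pass(struct insn *insns, size_t n) {
    for (int iter = 0; iter < 10; iter++) {
        if (!relayout(insns, n))
            return iter; /* fixpoint reached */
    }
    return 10;
}
```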
- Add an instruction value that contains the encoding, the address and a
flag to indicate whether this instruction contains label references
- Add a label value that contains an address
- Add a reference value that contains an offset, an absolute address and
an operand size
- Define types for all value options in the union
- Define accessor functions for all the values in the union
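A tagged union along these lines would carry the three value kinds; the type and accessor names below are illustrative guesses, not the project's real identifiers:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum value_kind { VALUE_INSTRUCTION, VALUE_LABEL, VALUE_REFERENCE };

/* Encoding, address, and a flag for label references. */
struct instruction_value {
    uint32_t encoding;
    uint32_t address;
    bool has_label_refs;
};

/* A label value only carries an address. */
struct label_value {
    uint32_t address;
};

/* A reference carries offset, absolute address, and operand size. */
struct reference_value {
    int32_t offset;
    uint32_t address;
    uint8_t operand_size;
};

struct value {
    enum value_kind kind;
    union {
        struct instruction_value instruction;
        struct label_value label;
        struct reference_value reference;
    } as;
};

/* Accessor that checks the tag before handing out the member. */
static const struct label_value *value_as_label(const struct value *v) {
    return v->kind == VALUE_LABEL ? &v->as.label : NULL;
}
```

Checking the tag inside the accessor keeps callers from reading the wrong union member by accident.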
Before, it kept track of a more specific node that referenced the symbol
in some way. Now it keeps track of only the actual label-defining
statements. This is done to facilitate encoding: the encoder can now go
from a symbol name to the statement that defines the symbol.
Restructure the encoder to deal with this and pass the correct statement
to the symbol update function.
When two identifiers follow each other, they could be two instruction
mnemonics or one instruction mnemonic and one operand. To resolve this
ambiguity, TOKEN_NEWLINE has been reintroduced as a semantic token. The
grammar has been changed to allow empty statements, and every
instruction and directive has to end in a newline. Labels do not have to
end in a newline.
In addition to updating the grammar, the implementations of tokenlist,
ast and parser have been updated to reflect these changes.
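A simplified recognizer for the new statement rule might look like the following; the token kinds and the flat operand handling are placeholders, not the real grammar:

```c
#include <stdbool.h>
#include <stddef.h>

enum token_kind { TOKEN_IDENTIFIER, TOKEN_NEWLINE, TOKEN_COLON, TOKEN_EOF };

struct token { enum token_kind kind; };

/* Accepts one statement: an empty statement (bare newline), a label
   ("identifier :", no trailing newline required), or an instruction /
   directive line that must be terminated by TOKEN_NEWLINE. Operands are
   simplified to bare identifiers for this sketch. */
static bool accept_statement(const struct token *t, size_t n, size_t *used) {
    if (n == 0)
        return false;
    if (t[0].kind == TOKEN_NEWLINE) { /* empty statement is allowed */
        *used = 1;
        return true;
    }
    if (t[0].kind != TOKEN_IDENTIFIER)
        return false;
    if (n > 1 && t[1].kind == TOKEN_COLON) { /* label: no newline needed */
        *used = 2;
        return true;
    }
    /* instruction/directive: mnemonic, operands, then a mandatory newline */
    for (size_t i = 1; i < n; i++) {
        if (t[i].kind == TOKEN_NEWLINE) {
            *used = i + 1;
            return true;
        }
        if (t[i].kind != TOKEN_IDENTIFIER)
            return false;
    }
    return false; /* missing newline terminator */
}
```

With the newline terminator, "mnemonic identifier NEWLINE" parses as one instruction with one operand, while two mnemonics must sit on separate lines.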
When a number had a suffix, the lexer state did not record the number of
characters consumed for that suffix. This left the lexer's line-location
reporting 2-3 characters short until it encountered a newline character.
It did not otherwise corrupt the state of the lexer.
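The shape of the fix can be sketched as below; the suffix characters and the column-tracking scheme are assumptions about the lexer, not its actual implementation:

```c
#include <ctype.h>
#include <stddef.h>

/* Consume a decimal number with an optional one-character size suffix.
   The bug was that the suffix was consumed without being counted, so the
   column position drifted short; here every consumed character, suffix
   included, advances the column. Suffix letters are illustrative. */
static size_t lex_number(const char *s, size_t *column) {
    size_t i = 0;
    while (isdigit((unsigned char)s[i]))
        i++;
    if (s[i] == 'b' || s[i] == 'w' || s[i] == 'd' || s[i] == 'q')
        i++; /* suffix must be included in the consumed count */
    *column += i;
    return i;
}
```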
- Expose all errors in the header file so any user of the API can test
for specific error conditions
- Mark all static error pointers as const
- Move generic errors into error.h
- Name errors err_modulename_* when they belong to a specific module and
err_* when they are generic
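The scheme above permits callers to test for a condition by comparing pointers. A minimal sketch, with invented error names and a hypothetical `lex_check` function standing in for a real API entry point:

```c
#include <stddef.h>

struct error { const char *message; };

/* Generic error: would be declared extern in error.h. */
static const struct error err_out_of_memory_storage = { "out of memory" };
const struct error *const err_out_of_memory = &err_out_of_memory_storage;

/* Module-specific error: would be declared extern in the lexer header. */
static const struct error err_lexer_invalid_token_storage = { "invalid token" };
const struct error *const err_lexer_invalid_token =
    &err_lexer_invalid_token_storage;

/* Hypothetical API function returning NULL on success or a well-known
   error pointer on failure. */
static const struct error *lex_check(const char *input) {
    if (input == NULL)
        return err_lexer_invalid_token;
    return NULL;
}
```

Because each error is a single const pointer to a single static object, `result == err_lexer_invalid_token` identifies the condition without string comparison.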
The buffer length len and the requested number of tokens n are mixed up
in an invalid comparison. This causes all valid requests for n < len
tokens to be denied and all invalid requests for n > len tokens to be
accepted. This may cause a buffer overflow if the caller requests more
tokens than they provide space for.
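The corrected bounds check reduces to comparing the request against the capacity in the right order; `request_ok` is an illustrative helper, not the project's actual function:

```c
#include <stdbool.h>
#include <stddef.h>

/* len is the caller's buffer capacity, n the number of tokens requested.
   The buggy version effectively tested the operands the wrong way
   around, rejecting safe requests (n < len) and accepting overflowing
   ones (n > len). */
static bool request_ok(size_t len, size_t n) {
    return n <= len; /* accept only requests that fit in the buffer */
}
```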
The linked list is doubly linked so the parser can look forward into it
and error reporting can look backward.
This commit also reworks main to use the tokenlist instead of dealing
with the lexer manually.
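The doubly linked node could be laid out roughly as follows; the node structure and append helper are a sketch, with the token payload omitted:

```c
#include <stddef.h>

/* Hypothetical tokenlist node: `next` lets the parser look ahead,
   `prev` lets error reporting walk back to earlier tokens. */
struct tokenlist_node {
    struct tokenlist_node *prev;
    struct tokenlist_node *next;
    /* token payload omitted */
};

/* Append a node at the tail, maintaining both link directions. */
static void tokenlist_append(struct tokenlist_node **head,
                             struct tokenlist_node **tail,
                             struct tokenlist_node *node) {
    node->next = NULL;
    node->prev = *tail;
    if (*tail)
        (*tail)->next = node;
    else
        *head = node;
    *tail = node;
}
```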