First pass:
- Collect information about numbers and registers, and record which
instructions contain label references
- Encode all instructions that don't contain label references
- Set (temporary) addresses for each instruction
Second pass:
- Collect information about label references (address, offset, size)
- Encode all instructions that contain label references
- Update (if necessary) addresses for each instruction
The second pass is iterated 10 times or until no instructions change
size, whichever comes first.
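
The iteration described above can be sketched roughly as follows. This is
an illustration only, not the actual encoder: struct insn,
encode_with_labels, assign_addresses and resolve are hypothetical names,
and the size rule (2 bytes when the displacement fits in a signed byte,
5 bytes otherwise) is an assumption chosen so that sizes can change
between iterations.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PASSES 10 /* iteration cap taken from the description above */

/* Hypothetical instruction record; field names are illustrative. */
struct insn {
    size_t address;  /* (temporary) address assigned so far */
    size_t size;     /* encoded size in bytes */
    bool has_label;  /* true if an operand references a label */
    size_t target;   /* index of the referenced instruction, if has_label */
};

/* Assumed size rule: short form if the displacement fits in a signed byte. */
size_t encode_with_labels(const struct insn *list, size_t i)
{
    long disp = (long)list[list[i].target].address
              - (long)(list[i].address + list[i].size);
    return (disp >= -128 && disp <= 127) ? 2 : 5;
}

/* Recompute every address from the current instruction sizes. */
void assign_addresses(struct insn *list, size_t n)
{
    size_t addr = 0;
    for (size_t i = 0; i < n; i++) {
        list[i].address = addr;
        addr += list[i].size;
    }
}

/* Returns the number of second-pass iterations that were run. */
int resolve(struct insn *list, size_t n)
{
    int passes = 0;
    bool changed = true;

    /* First pass: label-free sizes are known, so set temporary addresses. */
    assign_addresses(list, n);

    /* Second pass: repeat until no size changes or the cap is reached. */
    while (changed && passes < MAX_PASSES) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!list[i].has_label)
                continue;
            size_t size = encode_with_labels(list, i);
            if (size != list[i].size) {
                list[i].size = size;
                changed = true;
            }
        }
        assign_addresses(list, n);
        passes++;
    }
    return passes;
}
```

The cap matters because a size change can move a target in or out of
short range, so in principle the sizes could keep flipping.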
- Add an instruction value that contains the encoding, the address and a
flag to indicate whether the instruction contains label references
- Add a label value that contains an address
- Add a reference value that contains an offset, an absolute address and
an operand size
- Define types for all value options in the union
- Define accessor functions for all the values in the union
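
A minimal sketch of what such a tagged union with accessors could look
like. All type, field and function names here are hypothetical, and the
15-byte encoding buffer is an assumption, not something taken from the
code base.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum value_kind { VALUE_INSTRUCTION, VALUE_LABEL, VALUE_REFERENCE };

struct instruction_value {
    uint8_t encoding[15];  /* assumed maximum encoding length */
    size_t encoding_len;
    uint64_t address;
    bool has_label_ref;    /* instruction contains label references */
};

struct label_value {
    uint64_t address;
};

struct reference_value {
    size_t offset;         /* offset of the reference */
    uint64_t address;      /* absolute address */
    uint8_t operand_size;
};

struct value {
    enum value_kind kind;  /* selects the active union member */
    union {
        struct instruction_value instruction;
        struct label_value label;
        struct reference_value reference;
    } u;
};

/* Each accessor returns NULL when the value holds a different kind. */
struct instruction_value *value_instruction(struct value *v)
{
    return v->kind == VALUE_INSTRUCTION ? &v->u.instruction : NULL;
}

struct label_value *value_label(struct value *v)
{
    return v->kind == VALUE_LABEL ? &v->u.label : NULL;
}

struct reference_value *value_reference(struct value *v)
{
    return v->kind == VALUE_REFERENCE ? &v->u.reference : NULL;
}
```

Going through accessors keeps callers from reading a union member that
is not the active one.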
Previously it kept track of a more specific node that referenced the
symbol in some way. Now it only keeps track of the actual label-defining
statements. This is done to facilitate encoding. The encoder can now go
from a symbol name to the statement that defines the symbol.
Restructure the encoder to deal with this and pass the correct statement
to the symbol update function.
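
As a rough illustration of the lookup this enables, assuming a flat
symbol array: struct statement, struct symbol, symbol_definition and
symbol_update are hypothetical names, not the project's actual API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical label-defining statement. */
struct statement {
    const char *label;
    unsigned long address;
};

/* A symbol only records the statement that defines it. */
struct symbol {
    const char *name;
    struct statement *definition;
};

struct symbol_table {
    struct symbol *symbols;
    size_t len;
};

/* Go from a symbol name to its defining statement, or NULL if unknown. */
struct statement *symbol_definition(const struct symbol_table *table,
                                    const char *name)
{
    for (size_t i = 0; i < table->len; i++) {
        if (strcmp(table->symbols[i].name, name) == 0)
            return table->symbols[i].definition;
    }
    return NULL;
}

/* Record the defining statement; returns -1 if the symbol is unknown. */
int symbol_update(struct symbol_table *table, const char *name,
                  struct statement *definition)
{
    for (size_t i = 0; i < table->len; i++) {
        if (strcmp(table->symbols[i].name, name) == 0) {
            table->symbols[i].definition = definition;
            return 0;
        }
    }
    return -1;
}
```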
Add -Werror to the release configuration. Also add the release build as
a dependency of the make validate rule. The idea is that builds with
warnings should not pass validation, while debug builds can still compile
during development when work is in progress.
When two identifiers follow each other, they could be two instruction
mnemonics or one instruction mnemonic and one operand. To fix this,
TOKEN_NEWLINE has been reintroduced as a semantic token. The grammar has
been changed to allow empty statements and every instruction and
directive has to end in a newline. Labels do not have to end in a
newline.
In addition to updating the grammar, the implementation of tokenlist,
ast and parser has been updated to reflect these changes.
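
To show how the newline removes the ambiguity, here is a small sketch
that splits a token stream into instructions. Only TOKEN_NEWLINE comes
from the change itself; TOKEN_IDENTIFIER, TOKEN_END and
count_instructions are assumptions made for the example, and labels and
directives are left out.

```c
#include <stddef.h>

enum token_id { TOKEN_IDENTIFIER, TOKEN_NEWLINE, TOKEN_END };

/*
 * Counts instructions in a TOKEN_END-terminated stream. The first
 * identifier of a statement is the mnemonic; identifiers that follow
 * before the newline are operands. Bare newlines are empty statements.
 * Returns -1 if an instruction is not terminated by a newline.
 */
int count_instructions(const enum token_id *tokens)
{
    int count = 0;
    size_t i = 0;

    while (tokens[i] != TOKEN_END) {
        if (tokens[i] == TOKEN_NEWLINE) {
            i++; /* empty statement */
            continue;
        }
        count++; /* tokens[i] is a mnemonic */
        while (tokens[i] == TOKEN_IDENTIFIER)
            i++; /* skip the mnemonic and its operands */
        if (tokens[i] != TOKEN_NEWLINE)
            return -1; /* missing terminating newline */
        i++;
    }
    return count;
}
```

With this rule, two identifiers on one line are one instruction with an
operand, while the same identifiers separated by a newline are two
instructions.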
When a number had a suffix, the lexer state didn't record the number of
characters consumed for the suffix. This left the lexer's line location
reporting 2-3 characters short until it encountered a newline character.
It did not otherwise corrupt the state of the lexer.
- Expose all errors in the header file so any user of the API can test
for specific error conditions
- Mark all static error pointers as const
- Move generic errors into error.h
- Name all errors err_modulename_* for errors that belong to a specific
module and err_* for generic errors.
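
A sketch of how this convention might look in practice. The error type,
the messages and lexer_check_input are assumptions; only the err_* and
err_modulename_* naming and the const pointers follow the list above.

```c
#include <stddef.h>

/* Assumed error representation; the real type may differ. */
struct error {
    const char *message;
};

/* Declarations as they might appear in the headers. */
extern const struct error *const err_allocation_failed; /* generic: error.h */
extern const struct error *const err_lexer_no_input;    /* module: lexer.h */

/* Definitions as they might appear in the source files. */
static const struct error allocation_failed = { "memory allocation failed" };
static const struct error lexer_no_input = { "lexer: no input" };

const struct error *const err_allocation_failed = &allocation_failed;
const struct error *const err_lexer_no_input = &lexer_no_input;

/* Hypothetical API function: returns NULL on success or an error. */
const struct error *lexer_check_input(const char *input)
{
    if (input == NULL)
        return err_lexer_no_input;
    return NULL;
}
```

Because the pointers are exposed and const, a caller can compare a
returned error against err_lexer_no_input directly instead of matching
on the message text.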
Split most of the work off into make/base.mk so that thin wrappers
around it can build with different instrumentation, each in its own
build directory.
Create wrappers for the following:
- release build
- debug build
- afl++ fuzzing build
- static analysis with clang
- clang memory sanitizer
- clang address/undefined sanitizer
The buffer length len and the requested number of tokens n are mixed up
in an invalid comparison. This causes all valid requests for n < len
tokens to be denied and all invalid requests for n > len tokens to be
accepted. This may cause a buffer overflow if the caller requests more
tokens than they provide space for.
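
The corrected comparison can be sketched like this. copy_tokens and its
parameters are hypothetical; only the roles of len (buffer length) and n
(requested count) follow the description above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies n tokens from source into buffer, which has room for len
 * tokens. Returns 0 on success and -1 when the request does not fit.
 */
int copy_tokens(const int *source, int *buffer, size_t len, size_t n)
{
    /* Reject only requests that exceed the buffer. The bug described
       above corresponds to this test being the wrong way around. */
    if (n > len)
        return -1;

    memcpy(buffer, source, n * sizeof *buffer);
    return 0;
}
```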