grammar doesn't work for single instruction zero operands #16


omicron · 2025-04-16T00:09:00Z

The grammar can't tell the difference between an instruction with a single argument and an instruction with zero arguments followed by another instruction:

.export _start

_start:
    syscall
    ret

NODE_PROGRAM
  NODE_DIRECTIVE
    NODE_DOT "."
    NODE_EXPORT_DIRECTIVE
      NODE_EXPORT "export"
      NODE_IDENTIFIER "_start"
  NODE_LABEL
    NODE_IDENTIFIER "_start"
    NODE_COLON ":"
  NODE_INSTRUCTION
    NODE_IDENTIFIER "syscall"
    NODE_OPERANDS
      NODE_IMMEDIATE
        NODE_LABEL_REFERENCE "ret"

This is problematic: `ret` is parsed as a label-reference operand of `syscall` instead of as its own instruction. I'm not sure what the best fix is. Probably the solution is to remove newlines from the trivia tokens and require that each instruction end with a newline.

While this would be an easy solution, it raises the question of what to do with empty lines. They would either need to be collected in a NODE_NEWLINE or (preferably) skipped somehow.
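To make the idea concrete, here is a minimal sketch in Python (not the project's actual parser) of a lexer that drops spaces and tabs as trivia but keeps newlines as real tokens, so every instruction is terminated by a newline and empty lines are skipped. The names `tokenize` and `parse_instructions` and the token kinds are made up for illustration, and labels and directives are left out.

```python
import re

# Hypothetical token kinds; the real lexer's token names are not shown in this issue.
TOKEN_RE = re.compile(
    r"(?P<newline>\n)|(?P<space>[ \t]+)|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<comma>,)"
)

def tokenize(source):
    """Return (kind, text) pairs. Spaces and tabs are trivia; newlines are not."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        pos = match.end()
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
    return tokens

def parse_instructions(source):
    """Parse lines of the form 'mnemonic [operand {, operand}] NEWLINE'."""
    tokens = tokenize(source)
    if not tokens or tokens[-1][0] != "newline":
        tokens.append(("newline", "\n"))  # treat end of input as a terminator
    instructions = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "newline":  # empty line: skip it without producing a node
            i += 1
            continue
        if kind != "ident":
            raise SyntaxError(f"expected a mnemonic, got {text!r}")
        mnemonic, operands = text, []
        i += 1
        while tokens[i][0] != "newline":  # operands run until the terminator
            if tokens[i][0] == "ident":
                operands.append(tokens[i][1])
            i += 1
        i += 1  # consume the newline that ends the instruction
        instructions.append((mnemonic, operands))
    return instructions
```

With newlines significant, `syscall` followed by `ret` on the next line comes out as two zero-operand instructions, while `syscall ret` on one line is still a single instruction with one operand.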

Currently the entire parser operates under the assumption that a parse node is always returned if there is no error in the result. Skipping nodes could work by returning a result that has no error and no parse node. All of the combinators would need to be able to work with that, though.
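A rough sketch of what "no error and no parse node" could look like, written in Python rather than the project's own code. `Result`, `literal`, `any_of`, and `many` are hypothetical names; the point is only that a repetition combinator has to check for a missing node and skip it instead of assuming one is always present.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    node: Optional[object]       # None on success means "matched, nothing to keep"
    rest: str                    # remaining input
    error: Optional[str] = None  # None means the parse succeeded

def literal(text, node):
    """Match `text` exactly and yield `node` (pass None to skip the match)."""
    def parse(source):
        if source.startswith(text):
            return Result(node, source[len(text):])
        return Result(None, source, error=f"expected {text!r}")
    return parse

def any_of(*parsers):
    """Return the first successful alternative."""
    def parse(source):
        for parser in parsers:
            result = parser(source)
            if result.error is None:
                return result
        return Result(None, source, error="no alternative matched")
    return parse

def many(parser):
    """Repeat `parser`, collecting nodes and dropping node-less successes."""
    def parse(source):
        nodes = []
        while True:
            result = parser(source)
            if result.error is not None or result.rest == source:
                return Result(nodes, source)  # stop on failure or no progress
            if result.node is not None:       # skipped results leave no trace
                nodes.append(result.node)
            source = result.rest
    return parse

# Empty lines succeed but contribute no node to the program.
program = many(any_of(
    literal("syscall\n", "syscall"),
    literal("ret\n", "ret"),
    literal("\n", None),
))
```

Here `program("syscall\n\n\nret\n")` succeeds with only the two instruction nodes; the blank lines are consumed without a NODE_NEWLINE equivalent.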

Then there is the question of what to do with newlines in future .data directives:

my_bytes:
.data8 0,  1,  2,  3,  4,  5,  6,  7,
       8,  9, 10, 11, 12, 13, 14, 15

Initially the plan was to allow big tables like this to be defined across multiple lines. I suppose it is best to force these onto a single line as well.


omicron added the bug label 2025-04-16 00:09:00 +00:00

omicron referenced this issue

2025-04-16 11:32:19 +00:00

Fix zero operand parser bugs #17

omicron commented

2025-04-16 11:33:19 +00:00

fixed by #17

omicron closed this issue

2025-04-16 11:33:19 +00:00
