More on 0xSCA #68

Open
mleise opened this issue May 10, 2012 · 2 comments


@mleise

mleise commented May 10, 2012

First of all, thank you J. Kuijpers and M. Beermann for putting up a formal specification of the assembly source code file format. I believe standards are necessary and have to match the state of the art. Here is what I found, reading through the 0xSCA. I hope that it helps improve the spec:

4.3. Directives

"For the purpose of this document, a dot (.) is used to describe preprocessor directives."

"In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program." - Wikipedia
The attempt to make a distinction from C preprocessor macros (as stated in this spec) led to the . syntax for both preprocessor and assembler directives. This makes it impossible for a preprocessor to tell, without a blacklist, which directives it should handle and which ones it should leave for the assembler. To reduce confusion between the two terms, simplify assembler implementations, and allow for actual preprocessors, I propose changing the syntax of preprocessor directives back to #. Assembler directives are:

● .org (Preprocessor can't do the required padding)
● .align (Same as above)
● .dw/.dp/.fill/.ascii <...> (Generate binary data)
● .error (Defined as an assembler directive)
● .echo (Defined as an assembler directive)
● .equ (#define is for the preprocessor, .equ adds a constant to the assembler's symbol table)

Preprocessor directives are:

● #include, #incbin, #def/#define, #undef, #macro, #repeat, #if.../#else...
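
To illustrate the blacklist point, here is a minimal Python sketch of how a standalone preprocessor could decide whether a line is its business. The directive names come from the lists above; the function names and the `ASSEMBLER_DIRECTIVES` set are hypothetical and not part of 0xSCA.

```python
# Assumption: the assembler directives are exactly the ones listed above.
ASSEMBLER_DIRECTIVES = {"org", "align", "dw", "dp", "fill", "ascii",
                        "error", "echo", "equ"}


def is_preprocessor_line_split_prefix(line):
    """With distinct prefixes, the first non-blank character decides."""
    return line.lstrip().startswith("#")


def is_preprocessor_line_dot_only(line):
    """With a shared '.' prefix, the preprocessor needs a blacklist."""
    stripped = line.lstrip()
    if not stripped.startswith("."):
        return False
    parts = stripped[1:].split(None, 1)
    if not parts:
        return False
    # Anything not known to belong to the assembler is assumed to be ours.
    return parts[0] not in ASSEMBLER_DIRECTIVES
```

With the # prefix the preprocessor never needs to know which directives the assembler supports; with the shared . prefix, every new assembler directive has to be added to the blacklist as well.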

As indicated before, the mix of both concepts makes it harder to break down the complexity for anyone implementing an assembler or preprocessor. Knowing that all preprocessor directives can be removed before the assembly stage helps a lot here. This is especially true for #define and #undef:

  SET PC, foobar    ; Jumps to label
#define foobar 1234
foobar:             ; Easily a bug in assembler implementations:
  SET PC, foobar    ; needs to error out on duplicate symbol definition
#undef foobar
  SET PC, foobar    ; Is foobar an undefined symbol ?

Another example I want to show is why .equ is an assembler macro, not a preprocessor macro:

.equ sushi vram+32  ; Second line of video RAM. "Sushi" goes into the assembler symbol table.
.org 0x8000
vram:               ; just a label
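
A rough Python sketch of why this needs the assembler's symbol table: `sushi` refers to `vram`, which is only known once the label has been seen, so the expression is resolved after the labels are collected. The function name is hypothetical, and the sketch only handles `.equ name label+constant`, `.org`, and bare labels, which is an assumption made to keep it short.

```python
def assemble_symbols(lines):
    """Collect labels first, then resolve .equ expressions against them."""
    labels = {}
    pending = {}  # .equ name -> unresolved expression text
    pc = 0
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith(".equ"):
            _, name, expr = line.split(None, 2)
            pending[name] = expr
        elif line.startswith(".org"):
            pc = int(line.split()[1], 0)
        elif line.endswith(":"):
            labels[line[:-1]] = pc
        else:
            raise ValueError("not handled in this sketch: " + line)

    symbols = dict(labels)
    for name, expr in pending.items():
        # Assumption: expressions have the form label+constant.
        base, offset = expr.split("+")
        symbols[name] = symbols[base.strip()] + int(offset, 0)
    return symbols


SOURCE = [
    ".equ sushi vram+32  ; Second line of video RAM.",
    ".org 0x8000",
    "vram:               ; just a label",
]
```

Here `assemble_symbols(SOURCE)["sushi"]` comes out as 0x8020. A purely textual preprocessor could not compute that, because it never learns the address of `vram`.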

Note: Macros are preprocessor directives as soon as they contain preprocessor directives that are evaluated at macro expansion time.

4.3.3.1. Ascii Literal Flags

"For the purpose of determining string length, this zero will add quantity of zero octets added divided by the octet width of each character."

You lost me at "this zero will add quantity". The paragraphs before it were all sane and logical, but this one needs some rephrasing. I couldn't figure out how the string length will be affected or why "the octet width" would be anything other than '8'. Also, is this a typo: "Flags w and x are incompatible."? I can't find the 'w' anywhere. It seems like it got renamed to 'x'.

4.3.7. Conditionals

"If expression consists of a single constant value, then expression = 1 MUST be assumed."

That goes against what I would expect from programming languages, where if (expression) is usually interpreted as if (expression != 0).
Right now I read it as if (expression == 1), could you clarify if that was intended?
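
The two readings only differ for constants other than 0 and 1, which a small Python sketch makes concrete. Both function names are hypothetical; neither is taken from the spec.

```python
def taken_c_style(value):
    """Common convention: the branch is taken for any non-zero value."""
    return value != 0


def taken_equals_one(value):
    """Literal reading of the quoted sentence: taken only if the value is 1."""
    return value == 1


# Constants from a small sample on which the two readings disagree.
DISAGREE = [v for v in (0, 1, 2, 0xFFFF)
            if taken_c_style(v) != taken_equals_one(v)]
```

For this sample, `DISAGREE` is `[2, 65535]`: a conditional on the constant 2 would be taken under the usual convention but skipped under the literal reading.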

8.2. Preprocessor

"A preprocessor must accept every directive with a dot (.) or a number sign (#) prefix. While Notch seems to prefer the latter, the former is much more common among todays assemblers."

For the reason stated above, I share Notch's opinion. Merging the preprocessing into the assembler should not be taken so lightly. Modern assemblers have evolved quite a bit to get there and while it looks more concise with only the (.), two symbols make it much clearer which directive is evaluated in the preprocessing phase, and which in the assembly stage.

Thanks for reading!

P.S.: Have you thought about supporting some form of $ feature for local labels?

@mleise
Author

mleise commented May 10, 2012

The C preprocessor turns:

#define FOO 123
FOO:
  SET PC, FOO

into

123:
  SET PC, 123

So my example from above:

  SET PC, foobar
#define foobar 1234
foobar:
  SET PC, foobar
#undef foobar
  SET PC, foobar

in a real preprocessor would likely end up as:

  SET PC, foobar
1234:
  SET PC, 1234
  SET PC, foobar

So this is expected from people using the C preprocessor directives. It also vastly differs from .equ foobar, 1234 which wouldn't trigger any search and replace orgy on the source code.
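
The behaviour described above can be reproduced with a minimal textual preprocessor. This Python sketch is an illustration only: it handles object-like #define and #undef and nothing else, and the `preprocess` name is hypothetical.

```python
import re


def preprocess(source):
    """Replace defined names textually, line by line, in source order."""
    macros = {}
    out = []
    for line in source.splitlines():
        parts = line.split()
        if parts and parts[0] == "#define":
            macros[parts[1]] = " ".join(parts[2:])
            continue  # directive lines are removed from the output
        if parts and parts[0] == "#undef":
            macros.pop(parts[1], None)
            continue
        for name, value in macros.items():
            # Whole-word replacement; the lambda avoids escape processing.
            line = re.sub(r"\b%s\b" % re.escape(name),
                          lambda _m, v=value: v, line)
        out.append(line)
    return "\n".join(out)


SOURCE = "\n".join([
    "  SET PC, foobar",
    "#define foobar 1234",
    "foobar:",
    "  SET PC, foobar",
    "#undef foobar",
    "  SET PC, foobar",
])
```

Running `preprocess(SOURCE)` yields the four lines shown above: the label and the second jump become 1234, while the first and last jumps keep the name foobar because the macro is not defined at those points.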

@0xabad1dea
Member

"P.S.: Have you thought about supporting some form of $ feature for local labels?"

I strongly prefer local labels with names as opposed to the anonymous $ (I assume this is what you meant). In particular, someone asked us to implement $+ and $- notation, which I think is just way too fragile and difficult for a third party to read. $ all by itself (i.e., no skipping over multiple successive $'s) may be acceptable if enough people want it.

Plus ten thousand points for feedback :D
