-
-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bytecode further work tracking issue #148
Comments
I've started work on C/BASIC support, or at least support for the runtime libraries. I'm able to compile a very simple BASIC program (LED blinker) now. I think we'll need a helper COG for printf and inline assembly, but that should only be pulled in if actually required. |
Running inline ASM in another cog seems not very useful - takes a cog and would acess that cog's registers. I'd just not allow it with the ROM interpreter. |
You can also get away without a cog for serial if you can live with low baud rate (19200 seems to work) |
Not sure how to handle variadic functions though. Dynamic parameter count is easy, but then all the locals shift around. I think the locals would need to be put before the parameters and the caller needs to make space for them on the stack (presumably by using a REG_MODIFY to increment DCURR?) |
Also, how'd we include the interpreter code (when not using the ROM interpreter / on P2)? I guess the same kinda thing that happens with the syscode, though I have a very strong distaste for manual updating of intermediate files, i.e. it not automatically updating the header file from the source file (and IDK what tool is used to generate it to begin with). I think inline ASM can be used to include a file directly, though that'll only work for compilers with GAS-like inline asm. I think as is flexspin only builds on GCC/LLVM, anyways, so that wouldn't be an issue? |
The .spin files in the sys/ directory are converted to binary blobs with xxd automatically, and we can do something similar for the interpreter (although it will have to be compiled first, but we can do the compilation with flexspin itself). The .spin.h files are checked in to github to help people bootstrap and to make it easier to build on Windows (not sure if xxd is widely available there, it's a standard Unix tool though). |
Could you, like, not close the tracking issue please? |
Also,
xxd seems to be included with Git for Windows, so it seems unlikely that it'd be not installed, though it isn't on PATH (Git Bash seems to add it though. |
Sorry, clicked on the wrong button |
For multiple return values, we could perhaps use the DIRB, INB, and OUTB registers, e.g. the first return value goes on the stack as usual, second one goes in DIRB, third one in INB, and fourth one in OUTB. For plain Spin programs this won't be an issue (they don't have multiple returns) but it will allow us to compile Spin2, BASIC, and C. |
Yeah, that seems like a good workaround. Though for Spin2 they'll still need to be allocated on the stack for indexing and such. The same add-to-DCURR thing that'll need to happen for varargs, I guess. |
Actually, results are zero-initialized in Spin2, too, right? so it'd have to be bunch of zero constants before the parameter expressions. That's smaller than adding to DCURR for small counts, anyways. |
There's a new test script, Test/runtests_bc.sh, for testing byte code. I've disabled the C and BASIC tests (we know they won't work). Quite a few of the Spin tests do work, and one of them showed a bug that I've fixed (converting division into right shift fails for negative numbers that are not powers of 2). Of the remaining failing tests, I think the big missing features are multiple assignment (exec03) and method/object pointers (exec04). exec11 is a timing test so its failure is entirely expected (and I should add an #ifdef for bytecode output so we can ignore it). |
multi-assign between variables should be easy, I think? Unless there's a silly Spin2 quirk that makes it stupid, that is. Calling through pointers is a problem with the P1 ROM interpreter. The obvious idea is to create a dummy header entry and rewrite it's offset before the call, but
For object pointers, I think just writing them to VBASE between the call parameters and the call opcode would work? That'd be thread-safe and probably less opcodes. Would still need a dummy header entry with zero offset. C-style Function pointers, uhhhh, idk. Again, need to know the local size to call a function. I guess make it like a Spin2 method pointer, but with PBASE instead of VBASE Spin2-style method pointers would need quite a bit of helper code to work on P1 and will require implementing Spin2-style RTTI (wherein the first VAR long of an object gets PBASE written to it when taking a method pointer). There'll need to be a special helper function to handle calls to a method pointer that
I think the pointer will need to be passed in a register, because it simplifies the helper function a lot. I think this'd work: PUB __methodptr_helper
' doesn't declare parameters, but will be called with whatever the target function expects
__interp_vbase := inb[0..15]
__interp_pbase := long[__interp_vbase][0] ' There is no syntax for reading the variable at a given offset. I guess we could just optimize this workaround.
result := long[__interp_pbase][inb[20..31]] ' This is where Spin2 keeps the method ID, but for P1 it'd be more sensible to be at 24..31
__interp_dcurr += result.word[1]~ ' allocate locals
inb := 0
__interp_pcurr := result.word[0]~ + __interp_pbase ' jump into function
' Same thing but for function pointer
PUB __funcptr_helper
__interp_pbase := inb[0..15]
result := long[__interp_pbase][inb[20..31]]
__interp_dcurr += result.word[1]~ ' allocate locals
inb := 0
__interp_pcurr := result.word[0]~ + __interp_pbase ' jump into function
|
Also, I think:tm: the proper fix for the division to right-shift would be change the |
Dave Hein created a method pointer class in Spin1 that used 4 words to store pcurr, vbase, stack variable offset, and pbase. This aligns pretty well with how the assembly code handles it (it uses 2 longs instead of 4 words, but same amount of space). |
The division right shift issue is a problem if the left hand side is negative. That is, -1 SHR 1 is still -1, whereas -1 / 2 should give 0. In the assembly for x / 2^N we generate something like (x < 0) ? -(ABS(x)>>N) : (x>>N), but I'm not sure if it's worth doing that for bytecode? I guess it's quite a bit faster than division, but it'll consume a lot more space, too. |
There's problems with that approach though
|
Ahh, of course. Doing that workaround is likely not (much) faster than an actual divide and doesn't work as an assignment, so yeah, probably just remove it. Keep the no-op optimization though. |
Here's another idea for function/method pointers: they could always point at the descriptor for the function inside the object (the one with the dcurr and pcurr offsets). Immediately after any function whose address was taken we would insert a dummy function descriptor which points back at pbase. The function pointer itself would then fit in 16 bits, and for method pointers we'd need an additional 16 bits for vbase. |
That could work well, yeah. Any major advantage over the Spin2-like approach though? |
Unrelatedly, I think I'm going to look into jump table CASE now. Saying that so we don't step on our feet. |
Sounds good, thanks. I'm going to take a stab at function/method pointers, so as to get C/BASIC printing to work (the file handles use a table of pointers). |
Regarding the zero-extend operator, it'd be useful to optimize constant zero-extend into an AND. Can you add that to OptimizeOperator? Also, setting |
Also, I think it's just implemented wrong to begin with, shouldn't the shift amount be |
Also, the current implementation for jump table generation relies on arbitrary GOTO and there being a variable holding the expression result. The former needs to be implemented for C/BASIC, anyways, the latter I'll see if I can work around. |
RE: the zeroext thing As I said, the zeroext to AND transform should really be in |
I've got the jump tables implemented, will finish up tomorrow. Is in my BC-casefast branch. Main problem RN is that the jump table index needs unsigned min/max to be computed. I've built an optimization that converts unsigned ops to signed ones where it makes sense, but right now that only works if the first case is zero (because otherwise a subtraction is inserted) and the expression is explicitly masked (#157 blocks it from detecting small unsigned types). But I think the actual solution is to insert an additional OTHER case at the beginning of a table, such that an expression like The unsigned->signed optimization will be neat for C though, I guess. |
Looking into multi-assign, wondering what's up with the eval order. PASM backend seems to eval both LHS and RHS from left to right, whereas the """intuitive""" order would be to eval the LHR right to left. Not sure what PNut/proptool do, listings are garbo and I don't have P2 HW hooked up right now. |
I'm not sure what PNut does, but I agree, it makes sense to evaluate the RHS in the same order as function parameters (so left to right) and hence do the LHS in the opposite order in order to pop things off the stack. Probably the PASM should be changed to do this too. |
Yours looks a bit nicer, so close my PR if you want. Slight nitpick: In |
Not sure if handling the register passing at the IR->bytecode step is a good idea though. Brings a lot of complexity there, when you can do it just as well in the code that emits the IR (that needs to know the result counts, anyways). |
Will have to check how Pnut handles multireturn though, maybe that needs the result count, anyways |
I think it needs to go in IR->bytecode because it's only something we do in the Spin1 ROM (in P2 we definitely won't be pushing/popping registers for the extra results, although I'm not sure exactly how it gets handled... maybe the RETURN and/or CALL encodes the number of results). I've got something checked in to the branch now that works well enough to pass the exec03 multiple assignment tests, although it's still incomplete (needs more error checking, for one thing!) |
That's not really a reason though, the higher level stuff can check gl_interp_kind, too, you know. Can also spot some issues:
|
I guess it does make sense to move the register push/pop up to outbc.c, because then bcir can see it and optimize it. So I've done that, thanks. |
I guess I don't understand the popResults/asStatement dichotomy. I think asStatement should always be false for EXPRLISTs, because we really do need to push the results onto the stack and explicitly pop them. I've probably misunderstood something there. |
basically,
In what way? Spin2 abort only allows a single parameter. Is multi-value ABORT an extension that exists? |
Perhaps interesting: new GEAR version that works with flexspin's bytecode output (previously, reverse math on MODIFY opcodes was unimplemented): https://github.com/davispuh/gear-emu/releases/tag/v21.06.01 |
Thanks. BTW, I'm getting a very strange compilation error in C programs with break inside loops:
gives me an error about ENDCASE outside of a CASE. I'm guessing the "break" is being mangled by mistake into an ENDCASE by some of the case handling code? |
Yeah, no idea why that'd happen |
Never mind, that's probably my fault... C doesn't have seperate keywords for "end case" and "break out of loop" so the AST_ENDCASE gets generated for both. I'll add some code to work around that. |
hmm, alloca is a tricky one. Can't just go above the current stack, because that'd break if the alloca is in an expression. Moving the temporaries up and allocating the memory inbetween the variables and temporaries would work for that, but requires a longmove and breaks if there's more than one alloca. If there was a high level transform moving the alloca into a standalone assignment expression if it isn't already, I think the top-of-stack thing would work, but the allocation size will need to be pushed again after the allocation so it can be popped off when out of scope, which means that the allocation amount needs to be in a variable (so we don't need to eval+pad it twice). I guess the variable that will get the pointer can be temporarily reused for that |
A global pass to pull out the allocas makes sense. I'll open another bug for it, alloca in general is a mess (AST_SCOPE doesn't really work right even in PASM). |
BTW, I've implemented some very primitive member variable code on the test/bytecode_objptrs branch. |
looking good |
Actually, better idea: after the allocation, the first long of it still contains the allocation amount that was popped off to add onto DCURR, so we can just push that. |
Are you looking into object pointers (i.e. C struct pointers) yet? Because if not, I'll give it a look. |
No, I haven't had a chance to look at object / struct pointers yet. I'm working right now on trying to fix the assignment bug. So it'd be great if you could look at object pointers. I think pointers to data shouldn't be too bad? Internally I think a pointer to an object will generally just be a pointer to the object's VBASE, so "p->x" should translate into something like "long[p + offsetof(x)]" (where long gets replaced by byte/word depending on the type of x). Invoking methods on pointers will be a little bit trickier... for "p->foo()" we know the type info of p at compile time, so we can find all the object data. VBASE will come from p itself; PBASE comes from the object type info, and DCURR_OFFSET and PCURR come from the function info for "foo". I guess we could set up the 4 registers directly with bytecode commands like __interp_vbase = p? |
Ok, will look at it. For calls, I think the best method is to add a dummy entry for the used type to the object header with a zero offset and then just set VBASE after pushing the last parameter (and then do the normal function call opcode). Since the current VBASE is pushed by the anchor opcode, it will be correct again on return. |
The trouble with that though is figuring out which types are called through pointers before compiling the object. |
Also, accessing variables through pointers already works, it seems, though suboptimal because it indeed does |
Unless you're already working on it, I was planning to look at method pointers again. My scheme (re-using some of the PASM code to allow for 8 byte method pointers) is probably suboptimal, but I think it will be easy to implement and will at least let us get some more of the BASIC tests working. |
No, not doing anything RN. Not sure if hacking the Spin2 kinda scheme onto the P1 interpreter is much better, either, so go ahead I guess. |
Just took a look at your method pointer branch. '
' up to 8 parameters works OK; more than that and we could
' run into problems
'
pri __call_methodptr(a,b,c,d,e,f,g,h) | p, off, pc
p := INB
__interp_vbase := long[p]
off := word[p+6]<<2 ' function offset as words
p := word[p+4] ' new pbase
__interp_pbase := p
p += off
pc := word[p] + __interp_pbase ' new pc
'off := word[p][1] ' number of locals
'off -= 12 ' we have 2 locals already
'if off > 0
' __interp_dcurr += (off)
__interp_pcurr := pc I think this could be written better: pri __call_methodptr
__interp_vbase := long[INB]
result := word[INB+6]<<1 ' function offset as words
__interp_pbase := word[INB+4] ' new pbase
__interp_dcurr += word[__interp_pbase][result+1] ' number of locals
__interp_pcurr := word[__interp_pbase][result~] + __interp_pbase ' new pc There should probably also be an optimization for memops where the baseExpr is a read from a base register.... |
Thanks, I thought this could be improved but hadn't really worked it through -- your version is much better. |
Relatedly, I think REG_WRITE to PCURR could be treated as a terminal op to get rid of the dead RETURN_PLAIN after it |
C's implicit top module is only ever instantiated once, right? Because currently calling a method on a global object in C has to go through the objpointer path, which is kinda less than ideal. So I think there could be a special optimization to just add global objects into the object table so they can be called natively.... |
Here's all the stuff that'd be nice to do with the bytecode output. Mostly just so I don't forget when I come back to it.
x + (-1)
intox - 1
x == 0
andx <> 0
conditionsrepeat while
(currently compiles 1 unnecessary jump. Could be a peephole)The text was updated successfully, but these errors were encountered: