Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retirement: Disable side effects on exception #493

Closed
wants to merge 2 commits into from

Conversation

Arusekk
Copy link
Contributor

@Arusekk Arusekk commented Oct 28, 2023

Split from #348 to make review easier and get somewhere in finite time.

In scope of this PR:

  • A. precommit methods accept an extra side_fx input
  • B. LSU respects the side_fx input
    • B'. does it wait for pending reads to finish? no idea! current impl skips processing an instruction immediately as soon as precommit is called; it might as well be utter nonsense, because I can't even reason about it, feedback appreciated
  • C. CSR respects the side_fx input (simply skips calling read/write handlers)
  • D. when side effects are disabled, FRAT gets partially restored (we might want to revisit the priorities of transactions accessing frat.rename; they won't starve for sure, but flushing might get faster, I don't know)
  • E. when an exception happens, the side effects get disabled
  • F. BONUS: regression tests in assembly that actually break when side_fx.eq(0) is commented out!
    • F'. tests for whether LSU disables its side fx correctly; feedback appreciated, because we might want to extend the assembly tests facility for testing memory contents (mightn't we?)

Out of scope:

  • Unit tests for LSU side_fx=0 itself without the rest of the core? I hope they are out of scope, otherwise help is desperately needed. Might actually help diagnosing B'.
  • Fiddling with the fetcher. It still ingests instructions after this PR, they just get ignored in the retirement phase. In the future we might want to just call stall() exported as a method.
  • Fixing CSRs so that they actually signal important (i.e. ISA-changing or w/e) side effects through a 'benign exception' instead of stalling the fetcher. But it would be nice to get it once exceptions are quasi-ready.
  • Fiddling with the jump/branch unit to signal mispredictions through benign exceptions. If ain't broke, don't fix it (yet) 😉

@tilk
Copy link
Member

tilk commented Oct 29, 2023

does it wait for pending reads to finish? no idea! current impl skips processing an instruction immediately as soon as precommit is called; it might as well be utter nonsense, because I can't even reason about it, feedback appreciated.

I believe that:

  • By setting result_ready in LSUDummy to 1, you allow get_result to run - but, as you are doing an combinational assignment, this only works if get_result is called in the same cycle!
  • Even though side effects were cancelled, if the read ends with an error, the exception will still end up reported. But the instruction was already committed!
  • We will need support for memory mapped I/O regions SOON. Reads to MMIO need to wait for precommit, just like writes. Otherwise we lose our fun LiteX demo, as LiteX-UART reads are side-effecting.

And, unrelated to your changes:

  • When I think about it, communication of execute seems to have the same problem as yours with get_result: if bus.request is not ready, the signal will be missed. Clearly, we need a simple way to safely communicate conditions between transactions. Maybe we already have it - this feels like a job for a Forwarder.

@lekcyjna123, @piotro888: what are your thoughts?

@tilk tilk added enhancement New feature or request microarch Involves the processor's microarchitecture labels Oct 29, 2023
@tilk tilk added this to the Implement machine mode ISA milestone Oct 29, 2023
@Arusekk
Copy link
Contributor Author

Arusekk commented Oct 29, 2023

this only works if get_result is called in the same cycle!

Actually precommit is a special method and therefore executes every cycle until the instruction gets committed, so I am pretty sure it also works. I also don't quite like it, we should probably make it fit transactron ecosystem better.

Even though side effects were cancelled, if the read ends with an error, the exception will still end up reported. But the instruction was already committed!

Right, this has a worse implication: if there are two failed reads in the pipeline, the fresh one takes precedence. A simple and versatile fix would be to ignore exception causes right in retirement stage if side_fx was set to 0 already.

  • TODO(self): implement that.

We will need support for memory mapped I/O regions SOON. Reads to MMIO need to wait for precommit, just like writes. Otherwise we lose our fun LiteX demo, as LiteX-UART reads are side-effecting.

This is actually counter-intuitively easier and already correct: side_fx can only be disabled on instruction commit boundary, so precommit either gets it disabled all the time, or enabled all the time: they can just be skipped entirely (and this is the case). I'm only worried by reads that have already started, because they can no longer be 'skipped'. And I'm afraid they currently cannot be skipped at all and must fully execute, because we don't have access to side_fx in exec stage, only in precommit (unless exec happens during precommit). (Is it a bad thing though? We've already given up on optimizing miss penalty at the expense of simplicity.) I'm still not sure if letting them finish execution is handled correctly.

This brings me to an idea of even dumber dummyLSU: assume all L/S is MMIO & only do it during precommit (horrible performance, but its code would be clean as never before; sounds like a Good First Issue for educational purposes). This would of course impair throughput, which is not nice, so it could only be useful as an extra option for debugging, not a replacement.

@lekcyjna123
Copy link
Contributor

If read started an execution it have to be fully executed, to don't left garbage in the data bus. I see two ways to take care about this:

  • add clearing of the bus - after we execute all instruction, whose side effects should be skipped, we can clear the bus buffers to remove garbage
  • check in LSU if we started execution (internal.instr_ready == 1) if yes we have to wait till the end of instruction. If not we can block the update and block the instruction execution by setting rp_s1 and rp_s2 to non zero. Execution of get_result after blocking execution is safe.

coreblocks/lsu/dummyLsu.py Show resolved Hide resolved
coreblocks/lsu/dummyLsu.py Show resolved Hide resolved

rp_freed = Signal(self.gen_params.phys_regs_bits)
with m.If(side_fx):
m.d.comb += rp_freed.eq(rat_out.old_rp_dst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be assigned in av_comb.

@tilk
Copy link
Member

tilk commented Oct 29, 2023

Actually precommit is a special method and therefore executes every cycle until the instruction gets committed, so I am pretty sure it also works. I also don't quite like it, we should probably make it fit transactron ecosystem better.

I believe you are right - this is probably just fine.

I'm only worried by reads that have already started, because they can no longer be 'skipped'. And I'm afraid they currently cannot be skipped at all and must fully execute, because we don't have access to side_fx in exec stage, only in precommit (unless exec happens during precommit).

This is exactly the problem I'm talking about - the code as it is written allows MMIO reads to go out to the bus before we know they should actually happen. And, as I said, there is a simple way to correct this: just make the MMIO reads (but not memory reads) wait for precommit, just like all writes do currently.

with m.If(rat_out.old_rp_dst): # don't put rp0 to free list - reserved to no-return instructions
self.free_rf_put(m, rat_out.old_rp_dst)
with m.If(rp_freed): # don't put rp0 to free list - reserved to no-return instructions
self.free_rf_put(m, rp_freed)

self.instret_csr.increment(m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instret CSR should only be incremented on side_fx instructions (and without rob_entry.exception instruction too)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and that is additional source of possible side effects and should be considered in future FUs

@lekcyjna123
Copy link
Contributor

lekcyjna123 commented Oct 30, 2023

I checked the reads and it wouldn't work:

  • start a read (send a request on the bus)
  • precommit with side_fx==0 cause result_ready==1
  • this cause execution of get_result which indicates internal.get_result_ack==1 and reserved==0
  • internal.get_result_ack==1 cause change of state machine in LSUDummyInternals to Start
  • reserved ==0 so we can select and insert new instruction
  • new load is inserted and passed to execution
  • data from first load are ready on the bus and send as output from second load

@lekcyjna123
Copy link
Contributor

Short and dirty fix:

        @def_method(m, self.precommit)
        def _(rob_id: Value, side_fx: Value):
            with m.If(~side_fx):
                m.d.comb += result_ready.eq(internal.fsm.state=="Start")

and

        instr_ready = (
            (self.current_instr.rp_s1 == 0)
            & (self.current_instr.rp_s2 == 0)
            & self.current_instr.valid
            & ~self.result_ready
            & self.side_fx
        )

So instr is ready only if side_fx==1 and we can not run get_ready method as long as the state machine is not in "Start" state/

@tilk
Copy link
Member

tilk commented Oct 30, 2023

So instr is ready only if side_fx==1 and we can not run get_ready method as long as the state machine is not in "Start" state/

Isn't this equivalent to removing the is_load condition in the Start state? Also, there is no get_ready method, there is get_result, but it depends only on result_ready.

Geez, DummyLSU is still a giant liability.

@piotro888
Copy link
Member

Shouldn't we also check

with m.If(~side_fx & current_instr.valid & (rob_id == current_instr.rob_id)):
    m.d.comb += result_ready.eq(internal.fsm.state=="Start")

?
Otherwise get_result would be ready on each precommit call (to different FUs!) and output illegal entries to pipeline in each cycle. Or am I missing something?

@tilk
Copy link
Member

tilk commented Oct 31, 2023

I'm assuming the following invariant: once a precommit is sent for a given rob_id, the value of side_fx can't suddenly change while waiting for commit.

I see two solutions: one quick and dirty (and probably fine for now), and one which requires some additional thought.

The simple one: just remove the | is_load condition from the if in the Start state. This ensures that, if side_fx is False, no requests are sent nor are any exceptions reported, and everything works just fine. Performance is affected, but we can fix it later. (If this option allows @Arusekk to continue working faster, I'm all for it.)

The complex one: vary the behavior depending on the current state of processing the request. If the request was not sent yet, ensure that it will not be sent (possibly by modifying the condition in the Start state). If the request was sent, let it run to completion (so don't assign result_ready locally, wait until it's set by the internals module). I don't know if something needs to be done about exception reporting: this would be encountering an exception while processing a previous exception, we need some strategy for that, as this will not be the only place where something like this could happen. I would much prefer having some general solution instead of special-casing in each module which can report exceptions. @piotro888, what are your thoughts?

@piotro888
Copy link
Member

I don't know if something needs to be done about exception reporting: this would be encountering an exception while processing a previous exception, we need some strategy for that, as this will not be the only place where something like this could happen.

I think it will work just fine. ExceptionCauseRegister is only updated for 'older' instructions (based on rob_id), so it will be updated to match the correct (first retired) instruction. If multiple exceptions will happen, we only know about it globally (via side_fx) after instruction with exception reaches the retirement - so this is correct, and would not change other behaviour.

Only other special effect of exceptions is setting of exception bit on, but it should be easily ignored in retirement if side_fx are disabled (and it looks like it is already done)

And I noticed one thing to remember for next part of exceptions integration - we should call clear() on ExceptionCauseRegister after flush (at the same moment when restoring fetch) to correctly register next exceptions and ignore all that happened when side_fx were disabled.

The complex one:

I'm obviously for the complex solution, and I don't think it will be that difficult to implement. But obviously, as always, this can be done later in other PR (with simple solution workaround for now). And I think it is the right thing to do now - to not delay this PR and exception development.

@lekcyjna123
Copy link
Contributor

The complex one: vary the behavior depending on the current state of processing the request. If the request was not sent yet, ensure that it will not be sent (possibly by modifying the condition in the Start state). If the request was sent, let it run to completion (so don't assign result_ready locally, wait until it's set by the internals module).

This was an idea behind my changes proposition. Adding If in precommit to set result_ready only on fsm.state=="Start" should guarantee that every started request will run to completion. Adding self.side_fx to instr_ready guarantee that no new request will be started if side effects are turned off.

Isn't this equivalent to removing the is_load condition in the Start state?

No, it isn't. I my proposition we still can run loads before precommiting the load. We can speculate about value of side_fx based on the earlier instructions, if it is 1 then we start a load request immediately. Of course our speculation can fail, so in such case we simply wait till the request will be processed by memory.

Copy link
Member

@piotro888 piotro888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comments from meeting. Otherwise looks good


self.rf_free(m, rat_out.old_rp_dst)
rp_freed = Signal(self.gen_params.phys_regs_bits)
with m.If(side_fx):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First instruction with rob_entry.exception is incorrectly retired here (because of synchronous setting of side_fx - takes effect at next cycle). Breaks register values (RAT) and insret CSR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add "MMIO only" parameter to LSU. It could be easly done by disabling is_load part in with m.If(instr_ready & (self.execute | is_load)):

@tilk
Copy link
Member

tilk commented Nov 15, 2023

Made some changes on a branch. In particular, I switched to the new, not-yet-merged LSU.

@tilk
Copy link
Member

tilk commented Nov 18, 2023

This was merged through #512.

@tilk tilk closed this Nov 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request microarch Involves the processor's microarchitecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants