Sidecar: after being reset, the SP seems to struggle with I2C (at least to the VPD) #1955
Comments
The first wave is actually not a glitch, in the sense that it's intentional. That's the driver's "bus unstick" cargo cult sequence, where it attempts to clock out 9 bits to reset things. It's not really supposed to do that unless the bus is stuck, though, so I guess it thought it saw an error? The second thing is definitely a glitch, and I suspect it's the existing problem #1824. Not managing pin state correctly when switching pads, basically. The ten second delay is surprising. If we can't find a software interlock that's blocking access (and I can't think why there'd be one), I'm starting to suspect that the CPU is starved during the FPGA loads.
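For readers unfamiliar with the "bus unstick" sequence mentioned above, here is a minimal sketch of the general technique, not the actual driver code. It assumes a hypothetical `BitBangBus` trait for driving the pins as GPIO and a simulated `StuckTarget` that holds SDA low until it has seen a few clock edges. Unlike an unconditional 9-bit sequence, this version stops pulsing as soon as SDA is released.

```rust
/// Hypothetical minimal view of an open-drain I2C bus driven as plain GPIO.
trait BitBangBus {
    fn set_scl(&mut self, high: bool);
    fn set_sda(&mut self, high: bool);
    fn read_sda(&self) -> bool;
}

/// Pulses SCL up to nine times, stopping early once SDA reads high,
/// then issues a STOP condition. Returns the number of pulses used.
fn unstick<B: BitBangBus>(bus: &mut B) -> u8 {
    let mut pulses = 0;
    bus.set_sda(true); // release SDA so we can see whether a target holds it low
    while pulses < 9 && !bus.read_sda() {
        bus.set_scl(false);
        bus.set_scl(true);
        pulses += 1;
    }
    // STOP condition: SDA goes low -> high while SCL is high.
    bus.set_scl(false);
    bus.set_sda(false);
    bus.set_scl(true);
    bus.set_sda(true);
    pulses
}

/// Simulated target (an assumption for this example) that pulls SDA low
/// until it has seen `remaining` rising edges on SCL.
struct StuckTarget {
    scl: bool,
    sda_out: bool, // what the controller is driving
    remaining: u8,
}

impl BitBangBus for StuckTarget {
    fn set_scl(&mut self, high: bool) {
        if high && !self.scl && self.remaining > 0 {
            self.remaining -= 1; // count a rising edge
        }
        self.scl = high;
    }
    fn set_sda(&mut self, high: bool) {
        self.sda_out = high;
    }
    fn read_sda(&self) -> bool {
        // Open-drain: the line is high only if nobody pulls it low.
        self.sda_out && self.remaining == 0
    }
}

fn main() {
    let mut bus = StuckTarget { scl: true, sda_out: true, remaining: 4 };
    let pulses = unstick(&mut bus);
    println!("pulses={} sda_released={}", pulses, bus.read_sda());
}
```

If the bus is not stuck when `unstick` is called, the loop body never runs and only the STOP condition appears on the wire.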
I suspect the |
I explicitly have a Here I reset, do the vpd interaction, then dump tasks. I don't think this is interesting?
This time I
Link to another dump with a T+4 second offset. In both of these dumps |
The auxflash code appears to have a design flaw that causes Sidecar boot to get slower and slower as more updates are applied to auxflash. We're testing a fix in #1959. I'm waiting to confirm logic analyzer traces before I get too excited about it.
Capture uploaded here. No more 11.62 second gap!
The auxflash server was computing a SHA3 of every potentially occupied slot in the QSPI flash, only to compare it to _the stored SHA3_ and then compare it to _the expected SHA3_ and then throw it away. This has been causing Sidecar startup to be linear in the number of valid images that have ever been flashed to auxflash, up to 16.

This change rearranges the logic, at least for startup. For each chunk, we now see if it even claims to have the right SHA. Only then do we compute the actual SHA to validate.

This reduces an 11.6s delay observed on one Sidecar to just over 1s, and knocks 6s off the boot (something else is still delaying for 5s).

I haven't changed the behavior of the externally exposed `read_slot_checksum` operation, because it has no documentation and I can't figure out what it's used for, so I'm not sure I would get the semantics right or know how to test it.

Fixes #1955.
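The reordering described above can be illustrated with a small, self-contained sketch. This is not the auxflash server's code: `Slot`, `find_slot_eager`, `find_slot_lazy`, and `make_slots` are hypothetical names, and `expensive_hash` is a cheap FNV-1a stand-in for the SHA3 pass, with a counter so the number of hash computations is visible.

```rust
use std::cell::Cell;

/// Hypothetical in-memory model of one auxflash slot.
struct Slot {
    /// Checksum the slot claims to hold (None if the slot is empty).
    stored: Option<u64>,
    data: Vec<u8>,
}

/// Stand-in for the expensive SHA3 pass over a slot.
/// This is FNV-1a, NOT SHA3; `calls` counts how often it runs.
fn expensive_hash(data: &[u8], calls: &Cell<u32>) -> u64 {
    calls.set(calls.get() + 1);
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Old order: hash every occupied slot, then compare against the stored
/// and expected checksums.
fn find_slot_eager(slots: &[Slot], expected: u64, calls: &Cell<u32>) -> Option<usize> {
    for (i, s) in slots.iter().enumerate() {
        if let Some(stored) = s.stored {
            let actual = expensive_hash(&s.data, calls);
            if actual == stored && actual == expected {
                return Some(i);
            }
        }
    }
    None
}

/// New order: skip any slot that does not even claim the expected checksum,
/// and only hash the ones that do.
fn find_slot_lazy(slots: &[Slot], expected: u64, calls: &Cell<u32>) -> Option<usize> {
    for (i, s) in slots.iter().enumerate() {
        if s.stored != Some(expected) {
            continue;
        }
        if expensive_hash(&s.data, calls) == expected {
            return Some(i);
        }
    }
    None
}

/// Builds 16 valid slots whose contents differ in the first byte.
fn make_slots() -> Vec<Slot> {
    let scratch = Cell::new(0);
    (0u8..16)
        .map(|i| {
            let mut data = vec![0u8; 64];
            data[0] = i;
            Slot { stored: Some(expensive_hash(&data, &scratch)), data }
        })
        .collect()
}

fn main() {
    let slots = make_slots();
    let expected = slots[15].stored.unwrap();

    let eager_calls = Cell::new(0);
    let eager = find_slot_eager(&slots, expected, &eager_calls);
    let lazy_calls = Cell::new(0);
    let lazy = find_slot_lazy(&slots, expected, &lazy_calls);

    println!("eager: slot {:?} after {} hashes", eager, eager_calls.get());
    println!("lazy:  slot {:?} after {} hashes", lazy, lazy_calls.get());
}
```

With 16 valid slots and the wanted image in the last one, the eager scan hashes all 16 slots while the lazy scan hashes only one. That is the same shape as the startup cost described above, which grows with the number of images ever flashed.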
We came across this in manufacturing and had discussed it in chat but were basically stumped without being able to get traces from the bus. Well, we've got traces now!
First off, I attempt to reproduce by just running the `sidecar_fru.sh` flow on the station:
The capture is stored here. Despite what the output says, it actually seems to work? I'd also like to highlight some glitches:
Next, I captured what it looked like to reset the SP and try to write to its VPD immediately. This fails and we never see the write actually happen. The capture is here.
Finally, I do a reset, sleep for 10 seconds, and then write to the VPD. This all happens successfully. Experimentally it takes a sleep of something like 5-7 seconds for this to work! The capture is here.
cc @bcantrill since you were interested in this