Removes mutex locks from most memory accesses #120
base: master
Conversation
* dumps layout of used VPM per kernel
* rewrites Emulator to handle VPM configuration per QPU
* fixes bug in elimination of bit operations
* fixes bug mapping IR operations to machine code
* fixes bug mapping volatile parameters to read-only parameters
* Emulator now tracks TMU reads per TMU

See #113
Memory access is now mapped in the following steps:
* Determine preferred and fall-back lowering type per memory area
* Check whether lowering type can be applied, reserve resources
* Map all memory access to specified lowering level

Also disables combining of VPM/DMA writes/reads for now.

See #113

Effects (test-emulator, last 2 commits):
Instructions: 45160 to 45779 (+1%)
Cycles: 659247 to 661193 (+0.2%)
Mutex waits: 282551 to 281459 (-0.3%)
This change allows us to remove mutex locks from "direct" memory access.

See #113
* fixes memory association for some special cases
* fixes elimination of moves for VPM usage without mutex

Effects (test-emulator, last 2 commits):
Instructions: 45779 to 52510 (+14%)
Cycles: 661193 to 644891 (-2%)
Mutex waits: 281459 to 0
Total time (in ms): ~57192 to 57232 (+-0%)
This version only combines writes of the same setup values, where possible. The full version is also removed, since it will become obsolete anyway with VPM cached memory (see #113).

Effects (test-emulator):
Instructions: 52511 to 49793 (-5%)
Cycles: 644891 to 641680 (-0.5%)
Total time (in ms): 62869 to 58456 (-7%)
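The percentages in the Effects lists are relative changes between the "before" and "after" counters. A minimal sketch of that arithmetic follows; the helper name `relative_change_percent` is hypothetical and not part of VC4C or its test tooling.

```python
def relative_change_percent(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        # A relative change is undefined when the baseline is zero.
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


# Counters quoted in the commit message above.
instructions = relative_change_percent(52511, 49793)
cycles = relative_change_percent(644891, 641680)
total_time_ms = relative_change_percent(62869, 58456)
```

Rounded to one decimal place these come out at about -5.2, -0.5 and -7.0 percent, which agrees with the figures in the commit message up to rounding.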
I am not sure why we can remove all of the mutex locks. Do you mean that the responsibility for avoiding resource conflicts is moved to the users?
I am actually not 100% sure either, but so far I have not encountered any test that failed with the mutexes removed, even when I explicitly forced race conditions between the QPUs.
I thought the same, but I might have been wrong. My old theory/understanding of the VPM: My new theory:
No, VC4C would still synchronize where required (i.e. when accessing memory regions), but most accesses to the VPM are restructured so that the QPUs do not access the same VPM areas and therefore do not need to lock.
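The idea in the comment above, that QPUs working on disjoint VPM areas do not need a lock, can be illustrated with a small sketch. This is not VC4C's actual VPM layout: the function names are hypothetical, and the constants (64 user-accessible VPM rows, 12 QPUs) are assumptions about the default VideoCore IV configuration.

```python
VPM_ROWS = 64   # assumption: VPM rows available to user programs
NUM_QPUS = 12   # assumption: number of QPUs


def vpm_rows_for_qpu(qpu_index: int, rows_per_qpu: int) -> range:
    """Return the private block of VPM rows assigned to one QPU."""
    if not 0 <= qpu_index < NUM_QPUS:
        raise ValueError("invalid QPU index")
    start = qpu_index * rows_per_qpu
    end = start + rows_per_qpu
    if end > VPM_ROWS:
        raise ValueError("VPM too small for this per-QPU area size")
    return range(start, end)


def areas_are_disjoint(areas) -> bool:
    """Check that no VPM row appears in more than one area."""
    seen = set()
    for area in areas:
        rows = set(area)
        if seen & rows:
            return False
        seen |= rows
    return True


# Every QPU gets 4 private rows, so no two QPUs touch the same row.
layout = [vpm_rows_for_qpu(i, 4) for i in range(NUM_QPUS)]
lock_free_candidate = areas_are_disjoint(layout)
```

The sketch only models which rows each QPU addresses. Whether disjoint rows are sufficient on real hardware also depends on state shared between QPUs outside the rows themselves, which this model does not cover.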
Sorry for the late reply; I was on vacation. From now on, I can respond quickly. I tried some programs to check whether your theory is correct.
From the experiment, I think this modification is dangerous.
* Adds BasicBlock#to_string printing the block label name
* Fixes error with empty CFG, fixes crash in #98
* Fixes memory access error copying buffers for emulation tests
* Fixes a few intrinsic precalculations
* Adds tests for intrinsics precalculation
* Adds support for reading/writing multiple rows from/to RAM, taken from #120
* Adds function to dump used VPM layout to debug log
Memory access is now mapped in the following steps:
* Determine preferred and fall-back lowering type per memory area (e.g. register, VPM, TMU, DMA)
* Check whether lowering type can be applied, reserve resources
* Map all memory access to specified lowering level

Copies useful changes from #120 without changing the semantics.
See #113
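The three mapping steps listed in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not VC4C's implementation: all class and function names are hypothetical, and the size threshold and resource budgets are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lowering(Enum):
    REGISTER = "register"
    VPM = "vpm"
    TMU = "tmu"
    DMA = "dma"


@dataclass
class MemoryArea:
    name: str
    size_bytes: int
    preferred: Lowering
    fallback: Lowering
    chosen: Optional[Lowering] = None


@dataclass
class Resources:
    # Invented budgets, not real hardware or compiler limits.
    free_registers: int = 1
    free_vpm_bytes: int = 1024

    def try_reserve(self, area: MemoryArea, lowering: Lowering) -> bool:
        """Reserve what `lowering` needs; leave the state untouched on failure."""
        if lowering is Lowering.REGISTER:
            if self.free_registers < 1:
                return False
            self.free_registers -= 1
        elif lowering is Lowering.VPM:
            if area.size_bytes > self.free_vpm_bytes:
                return False
            self.free_vpm_bytes -= area.size_bytes
        # TMU and DMA go to RAM directly; nothing is reserved in this sketch.
        return True


def make_area(name: str, size_bytes: int, read_only: bool) -> MemoryArea:
    """Step 1: determine preferred and fall-back lowering (invented heuristic)."""
    if read_only:
        preferred, fallback = Lowering.TMU, Lowering.DMA
    elif size_bytes <= 64:
        preferred, fallback = Lowering.REGISTER, Lowering.VPM
    else:
        preferred, fallback = Lowering.VPM, Lowering.DMA
    return MemoryArea(name, size_bytes, preferred, fallback)


def reserve_lowering(areas, resources: Resources) -> None:
    """Step 2: check whether a lowering can be applied and reserve resources."""
    for area in areas:
        if resources.try_reserve(area, area.preferred):
            area.chosen = area.preferred
        elif resources.try_reserve(area, area.fallback):
            area.chosen = area.fallback
        else:
            raise RuntimeError(f"no lowering available for {area.name}")


def map_accesses(accesses, areas):
    """Step 3: map every (operation, area name) access to the chosen lowering."""
    chosen = {area.name: area.chosen for area in areas}
    return [(op, name, chosen[name]) for op, name in accesses]
```

For example, with one free register and 1024 bytes of VPM, a second small area falls back from register to VPM, and an area larger than the remaining VPM falls back to DMA. Keeping the reservation step separate from the mapping step means every access to an area sees the same, already settled lowering type.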
Effects (test_emulator, whole PR):
Tests:
To be fixed:
clpeak results:
Compared to the last results (from Nov. 2017), this is: