-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace bitwise operands with Structs for performance #58
Conversation
The bit shifting and masking is expensive on ARM64 for some reason. The unions seem to greatly reduce the perfomance hit of these common calls.
The values for offset0 and offset1 were coming out to 63 when they should be no more than 3. I think the devide should have beena modulus? I wrote out the code with more vars to figure ouit what was going on
looking at the offical shamusworld repo the division was moved after the 0xFF checks
GPU_RUNNING running macro was pretty slow on ARM for some reason. Bitswise structs are faster in my testing
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joe Mattiello <[email protected]>
90505b6
to
520e621
Compare
Ready for review |
Signed-off-by: Joseph Mattello <[email protected]> # Conflicts: # src/blitter.c # src/dsp.c # src/gpu.c # src/joystick.c
Signed-off-by: Joseph Mattello <[email protected]> # Conflicts: # src/blitter.c # src/dsp.c # src/gpu.c # src/joystick.c # src/vjag_memory.h
Doom, Tempest 2000, Wolfenstein 3D and others still don't work with this PR. I suggest we start all over from scratch with this and split this up into separate PRs. It's not wise to do so many changes in one big omnibus PR without proper regression testing. |
This is already split up into the smallest denomination of changes to not lose my sanity. I had all those roms working on my end and had the bugs down to 1 line in 1 commit, a program counter that got overwritten. someone else can split this up but I'm going to continue to work on this PR as-is on my own. I have other obligations this week, I'll get back to it shortly. Edit: I need to setup an intel build environment, it might just be a simple endian bug in one of the structs (though Provenance in iOS simulator runs it fine which would be AMD64) |
I'm testing it on Windows 10/11, on an intel x64 CPU. |
84a0808
to
27a19b9
Compare
This is simply a no-go in its current state. Too many things are changed, not enough testing has been done outside of Mac/iOS, the regressions that already exist with this are already way too high and the chance for future regressions even higher still. This entire PR needs to be entirely redone from the ground up and split up into bite-sized chunks that each get tested on at least Windows, Linux, and Mac. These PRs should each be rather isolated and not try to do too much things at once, so that we have a better way of chasing after any potential regressions. Also, the testing surface area needs to be bigger Just testing it on Mac alone and then only on ARM is simply not going to lead to results that can eventually be merged, libretro targets way more systems than just that. |
I'm well aware of that. This PR is the last part of the already split up larger PR. I have testing environments for other arches. I've just been busy with holidays and job interviews. There's a reason this is still marked draft. Note, I've also tested intel since I have Provenance running on MacOS catalyst and the simulator builds x86 native. I know that works fine here. Also I think this PR is as small as it goes, it's a change in the memory data struct and the code that calls it. edit:
|
97eca1d
to
7bd8425
Compare
OK, it being split up in 3 parts like that would be better, and would allow us to test easier for regressions across platforms. |
No description provided.