Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace bitwise operands with Structs for performance #58

Closed
wants to merge 37 commits into from

Conversation

JoeMatt
Copy link
Collaborator

@JoeMatt JoeMatt commented Oct 13, 2021

No description provided.

The bit shifting and masking is expensive on ARM64 for some reason.
The unions seem to greatly reduce the perfomance hit of these common calls.
The values for offset0 and offset1 were coming out to 63 when they should be no more than 3. I think the devide should have beena modulus? I wrote out the code with more vars to figure ouit what was going on
looking at the offical shamusworld repo the division was moved after the 0xFF checks
GPU_RUNNING running macro was pretty slow on ARM for some reason. Bitswise structs are faster in my testing
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joe Mattiello <[email protected]>
@JoeMatt JoeMatt force-pushed the joematt/structs branch 6 times, most recently from 90505b6 to 520e621 Compare October 14, 2021 05:50
@JoeMatt JoeMatt marked this pull request as ready for review October 15, 2021 00:07
@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 15, 2021

Ready for review
Tested Doom, NBA Jam and Tempest 2000

Signed-off-by: Joseph Mattello <[email protected]>

# Conflicts:
#	src/blitter.c
#	src/dsp.c
#	src/gpu.c
#	src/joystick.c
Signed-off-by: Joseph Mattello <[email protected]>

# Conflicts:
#	src/blitter.c
#	src/dsp.c
#	src/gpu.c
#	src/joystick.c
#	src/vjag_memory.h
@JoeMatt JoeMatt changed the title Structs Replace bitwise operands with Structs for performance Oct 16, 2021
@JoeMatt JoeMatt requested a review from inactive123 October 16, 2021 04:33
@JoeMatt JoeMatt self-assigned this Oct 16, 2021
@inactive123
Copy link
Contributor

Doom, Tempest 2000, Wolfenstein 3D and others still don't work with this PR.

I suggest we start all over from scratch with this and split this up into separate PRs. It's not wise to do so many changes in one big omnibus PR without proper regression testing.

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 21, 2021

This is already split up into the smallest denomination of changes to not lose my sanity.

I had all those roms working on my end and had the bugs down to 1 line in 1 commit, a program counter that got overwritten.

someone else can split this up but I'm going to continue to work on this PR as-is on my own.

I have other obligations this week, I'll get back to it shortly.

Edit:
I'm still having no issues on iOS or Mac M1 with those 3 roms.

I need to setup an intel build environment, it might just be a simple endian bug in one of the structs (though Provenance in iOS simulator runs it fine which would be AMD64)

@inactive123
Copy link
Contributor

I'm testing it on Windows 10/11, on an intel x64 CPU.

@inactive123
Copy link
Contributor

inactive123 commented Nov 29, 2021

This is simply a no-go in its current state. Too many things are changed, not enough testing has been done outside of Mac/iOS, the regressions that already exist with this are already way too high and the chance for future regressions even higher still.

This entire PR needs to be entirely redone from the ground up and split up into bite-sized chunks that each get tested on at least Windows, Linux, and Mac. These PRs should each be rather isolated and not try to do too much things at once, so that we have a better way of chasing after any potential regressions. Also, the testing surface area needs to be bigger Just testing it on Mac alone and then only on ARM is simply not going to lead to results that can eventually be merged, libretro targets way more systems than just that.

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Dec 1, 2021

I'm well aware of that. This PR is the last part of the already split up larger PR. I have testing environments for other arches. I've just been busy with holidays and job interviews.

There's a reason this is still marked draft.

Note, I've also tested intel since I have Provenance running on MacOS catalyst and the simulator builds x86 native. I know that works fine here.

Also I think this PR is as small as it goes, it's a change in the memory data struct and the code that calls it.
I don't think it's possible to go any smaller but I'll look into it when I get back to this.

edit:
I actually think this can be split into 3 parts.

  1. Const additions
  2. GPUControl struct
  3. Blitter commands as structs

@JoeMatt JoeMatt force-pushed the joematt/structs branch 3 times, most recently from 97eca1d to 7bd8425 Compare December 1, 2021 04:21
@JoeMatt JoeMatt mentioned this pull request Dec 1, 2021
@JoeMatt JoeMatt closed this Dec 1, 2021
@inactive123
Copy link
Contributor

OK, it being split up in 3 parts like that would be better, and would allow us to test easier for regressions across platforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants