WIP: EDGE3D update #478
base: develop
32723a7 to 702238f
@pguthrey, is this ready for review?
Not just yet. I need to experiment more with different implementations. I will come back to this later.
050da7e to c9b19a6
Here are the results of these changes: a good improvement for CUDA and an implausibly large improvement for HIP. I checked that the results match those of the previous algorithm, but I may look further into what is going on with HIP.
Perhaps register spilling, or something similar, is still occurring with CUDA and is making a dramatic difference. We'll have to look at the generated instructions to see what happened.
That makes some sense. If I add the memory needed by the vectors and the matrix together, I get
Summary