Project 2: Di Lu #24

Open
wants to merge 36 commits into
base: main
Commits
a86f7e9
Finish all scan functions
dluisnothere Sep 16, 2022
e2317df
More progress on compact stream
dluisnothere Sep 17, 2022
5338e67
complete main homework components
dluisnothere Sep 17, 2022
bdbe23f
Update README.md
dluisnothere Sep 18, 2022
790179f
Update README.md
dluisnothere Sep 18, 2022
b42cde2
Update README.md
dluisnothere Sep 18, 2022
9d65bfb
Update README.md
dluisnothere Sep 18, 2022
2cbe705
Update README.md
dluisnothere Sep 19, 2022
b22d3eb
Update README.md
dluisnothere Sep 19, 2022
0d55882
Update README.md
dluisnothere Sep 19, 2022
fa1c587
add image and attempt radix sort
dluisnothere Sep 19, 2022
c79900b
Add images
dluisnothere Sep 19, 2022
4003434
Update README.md
dluisnothere Sep 19, 2022
b353073
add better pic
dluisnothere Sep 19, 2022
81a585c
Update README.md
dluisnothere Sep 19, 2022
bbbd84a
Update README.md
dluisnothere Sep 19, 2022
a66f4c3
Update README.md
dluisnothere Sep 19, 2022
18f2fe5
Update README.md
dluisnothere Sep 19, 2022
01e72ff
Update README.md
dluisnothere Sep 19, 2022
65d0503
Update README.md
dluisnothere Sep 19, 2022
3c0f515
Update README.md
dluisnothere Sep 19, 2022
1d14bdd
Update README.md
dluisnothere Sep 19, 2022
c72c300
add table images
dluisnothere Sep 19, 2022
6629f18
Update README.md
dluisnothere Sep 19, 2022
4dae2cf
Update README.md
dluisnothere Sep 19, 2022
880233b
Update README.md
dluisnothere Sep 19, 2022
d41d4a3
Update README.md
dluisnothere Sep 19, 2022
18ffcb6
Update README.md
dluisnothere Sep 19, 2022
07baae9
Update README.md
dluisnothere Sep 19, 2022
ba55e23
Update README.md
dluisnothere Sep 19, 2022
a1c867c
Update README.md
dluisnothere Sep 19, 2022
b6de820
Update README.md
dluisnothere Sep 19, 2022
8ea98cd
Update README.md
dluisnothere Sep 19, 2022
6a5cf83
Update README.md
dluisnothere Sep 19, 2022
488c2d6
Update README.md
dluisnothere Sep 19, 2022
74dc846
Update README.md
dluisnothere Sep 19, 2022
Update README.md
dluisnothere authored Sep 19, 2022
commit 488c2d6536541804dba3834086e07a74f4a052e7
4 changes: 2 additions & 2 deletions README.md
@@ -185,13 +185,13 @@ a 0 at the very end.
* Can you find the performance bottlenecks? Is it memory I/O? Computation? Is
it different for each implementation?

I am not surprised that the GPU efficient implementation isn't faster until the input array gets
significantly large. This is because I didn't optimize my implementations to keep threads from idling.
For example, during scan only every 2nd thread does work at the first level, then every 4th thread, and so on,
while the remaining threads are launched but do nothing. A way to avoid this is to launch only as many threads
as there are active elements, and have each thread compute the array index it operates on from its idx and the
current up-sweep or down-sweep depth.

Additionally, I made sure that my GPU timers include as few cudaMalloc, cudaMemcpy, and cudaMemset calls as possible,
but it's not always feasible to move all of them outside of my "main" algorithm code, so to speak. For example, I need
to call cudaMalloc for an array whose length I only know after a value is computed during the algorithm, so that
cudaMalloc has to happen inside the timed algorithm and its cost shows up in the measurement.
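
One possible workaround, sketched here under assumptions rather than taken from this project: a compacted output can
never be longer than the input, so the output buffer can be allocated at the worst-case size `n` before the timer
starts, and only the first `count` elements are copied back afterwards. The names `kernMapToBoolean`, `kernScatter`,
and `compactPreallocated` are hypothetical, and `thrust::exclusive_scan` stands in for a hand-written scan (Thrust may
still allocate temporary storage internally).

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>
#include <vector>

__global__ void kernMapToBoolean(int n, int *bools, const int *idata) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    bools[i] = (idata[i] != 0) ? 1 : 0;
}

__global__ void kernScatter(int n, int *odata, const int *idata,
                            const int *bools, const int *indices) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (bools[i]) odata[indices[i]] = idata[i];
}

// Removes zeros from `in`; writes the timed GPU duration to *elapsedMs.
std::vector<int> compactPreallocated(const std::vector<int> &in, float *elapsedMs) {
    int n = static_cast<int>(in.size());
    *elapsedMs = 0.0f;
    if (n == 0) return {};

    // All allocations use the worst-case size n and happen before timing.
    int *dIn, *dBools, *dIndices, *dOut;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dBools, bytes);
    cudaMalloc(&dIndices, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int blockSize = 128;
    int blocks = (n + blockSize - 1) / blockSize;

    cudaEventRecord(start);
    kernMapToBoolean<<<blocks, blockSize>>>(n, dBools, dIn);
    thrust::device_ptr<int> b(dBools), idx(dIndices);
    thrust::exclusive_scan(b, b + n, idx);
    kernScatter<<<blocks, blockSize>>>(n, dOut, dIn, dBools, dIndices);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(elapsedMs, start, stop);

    // The output length is only known now: last scan value + last boolean.
    int lastIdx = 0, lastBool = 0;
    cudaMemcpy(&lastIdx, dIndices + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&lastBool, dBools + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    int count = lastIdx + lastBool;

    std::vector<int> out(count);
    if (count > 0) {
        cudaMemcpy(out.data(), dOut, count * sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dIn);
    cudaFree(dBools);
    cudaFree(dIndices);
    cudaFree(dOut);
    return out;
}
```

The trade-off is extra device memory when few elements survive compaction, in exchange for keeping cudaMalloc out of
the timed region.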