Project 1 : Jie Meng #10

64 changes: 58 additions & 6 deletions README.md
@@ -1,11 +1,63 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**
=======================================================================
* Jie Meng
* [LinkedIn](https://www.linkedin.com/in/jie-meng/), [twitter](https://twitter.com/JieMeng6).
* Tested on: Windows 10, i7-7700HQ @ 2.80GHz, 16GB, GTX 1050 4GB (My personal laptop)

### (Result)
* Flocking (Uniform 2xGrid Scattered @10k boids & 1024 blocksize)
![Running screenshot](images/screenshot1.png)
* Gif
![Running gif](images/3.gif)

### (Performance Analysis)
**Benchmark Method:**

* Use the kernel function call frequency as the indicator.
* Calculation method: launch NSight Performance Analysis, then divide the kernel function call count (under CUDA Summary) by the running time (minus the launch time, usually around 3s).
* In one simulation round, the kernel function is called exactly once, so this gives the average framerate.
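
The calculation above can be sketched in C++ as follows. The function name and parameters are hypothetical and only illustrate the arithmetic; the 3s launch time is the rough estimate given above, not a measured constant.

```cpp
#include <cassert>

// Hypothetical helper: average simulation framerate from profiler numbers.
// kernelCalls   - call count of the kernel that runs once per simulation round
// totalSeconds  - total length of the profiled run
// launchSeconds - startup time to exclude (estimated at about 3s in the text)
double averageFramerate(long long kernelCalls, double totalSeconds, double launchSeconds) {
    double simSeconds = totalSeconds - launchSeconds;
    assert(simSeconds > 0.0);  // the run must outlast the launch phase
    return static_cast<double>(kernelCalls) / simSeconds;
}
```

For example, 2450 kernel calls over a 13s run with 3s of launch time gives `averageFramerate(2450, 13.0, 3.0)`, i.e. 2450 calls over 10s of simulation.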

**Data and Analysis**

Number of boids affecting framerate, NOT visualized
![Framerate Comparison](images/notvisualized.png)
* For the brute-force method, performance is strongly affected by the number of boids: from 5k to 25k boids, the framerate decreases by a factor of 20. At 1 million boids, the framerate is almost 0.
* The scattered grid search method works well: its performance is relatively insensitive (compared to brute force) to the number of boids. From 5k to 25k boids, the framerate only fluctuates slightly around 245fps. At 1 million boids, it runs at about 3.2fps.
* The coherent grid search method also performs well and is also insensitive to the number of boids: from 5k to 25k boids, the framerate only fluctuates slightly around 235fps. At 1 million boids, it runs at about 5.1fps.
* At 5k boids, the brute-force method has a significantly higher framerate than the other two; beyond 10k boids, it performs worse than both.
* The uniform grid methods have stable performance and generally perform better than the brute-force method. This is because they only search a fixed number of cells, not all boids.
* If the number of boids continues to increase, all methods slow down.
* At 1 million boids, the coherent method is faster (5.1fps) than the scattered method (3.2fps). This suggests that the coherent method is less sensitive to boid count than the scattered method and overtakes it at some boid count (this is clearer in the next graph).
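
The fixed-cell search depends on mapping each boid to a grid cell, so that only boids in nearby cells need to be examined. Below is a minimal CPU-side C++ sketch of that mapping. It assumes a cubic grid with `res` cells per axis and x-fastest flattening; `Vec3`, `cellIndex1D`, and `cellOf` are hypothetical names for illustration, not the identifiers used in `kernel.cu`.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Flatten 3D cell coordinates into a 1D index (x varies fastest; an assumption).
int cellIndex1D(int x, int y, int z, int res) {
    return x + y * res + z * res * res;
}

// Map a position to the 1D index of the grid cell that contains it.
int cellOf(Vec3 p, Vec3 gridMin, float cellWidth, int res) {
    int x = static_cast<int>(std::floor((p.x - gridMin.x) / cellWidth));
    int y = static_cast<int>(std::floor((p.y - gridMin.y) / cellWidth));
    int z = static_cast<int>(std::floor((p.z - gridMin.z) / cellWidth));
    // The position must lie inside the grid for the index to be valid.
    assert(x >= 0 && x < res && y >= 0 && y < res && z >= 0 && z < res);
    return cellIndex1D(x, y, z, res);
}
```

With this mapping, the per-boid cost depends on how many boids fall in the searched cells rather than on the total boid count, which is consistent with the flat framerate curves described above.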

Number of boids affecting framerate, visualized
![Framerate Comparison2](images/visualized.png)
This shows similar results to the above. Differences worth pointing out:
* Even at 5k boids, the uniform grid methods are faster than the brute-force method.
* The coherent method becomes faster than the scattered method by 50k boids at the latest.

Block size affecting framerate, visualized
![Framerate Comparison3](images/blocksize.png)
From this graph:
* The brute-force method is essentially unaffected by block size.
* The scattered method slows down with larger block sizes.
* The coherent method speeds up with larger block sizes.
* The coherent method becomes faster than the scattered method beyond a block size of 512.
* The data for the uniform grid methods behaves strangely: it does not change smoothly, but I think this is because the tested block sizes increase exponentially.
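
Changing the block size also changes how many blocks are launched. A common way to derive the block count is ceiling division, sketched below in C++. Whether this project computes it exactly this way is an assumption, and `blocksForBoids` is a hypothetical name.

```cpp
#include <cassert>

// Number of thread blocks needed so that blocks * blockSize >= numBoids
// (ceiling division; one thread per boid is assumed).
int blocksForBoids(int numBoids, int blockSize) {
    assert(numBoids >= 0 && blockSize > 0);
    return (numBoids + blockSize - 1) / blockSize;
}
```

For 10k boids, a block size of 128 needs 79 blocks while a block size of 1024 needs only 10, so each doubling of the block size roughly halves the block count.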

**Answer to questions:**
* Framerate decreases as the number of boids increases for all methods, since more boids require more computation.
Uniform grid methods are faster than the brute-force method, since far less computation is involved. With 5k boids the brute-force method performs better (not visualized) because the number of boids is relatively small, so the extra operations in the uniform grid methods slow them down.
* Performance varies only slightly when block size changes (the block count changes with it). I'm not sure why it changes this way (see the graph above): the brute-force method is not affected by block size; the scattered method tends to slow down as block size increases; the coherent method tends to speed up as block size increases.
I didn't figure out why it behaves like this, but I guess it has something to do with memory access speed.
* I expected the coherent method to always outperform the scattered method, but it does not. At smaller boid counts the coherent method is slower. I believe this is because of the time spent copying arrays (reshuffling and shuffling back). But when the boid count becomes large, the coherent method is faster: shuffling makes much more of the boid data sequential in memory, so the savings in memory access time outweigh the copying cost.
* Under the configuration (Uniform Grid Scattered @ 10k boids & 1024 block size), checking 27 neighboring cells (1X grid) is slightly faster than checking 8 cells (2X grid): 190fps vs. 180fps. I believe this is because fewer boids need to be checked with the 1X grid size, so fewer data fetches are required and performance is better.
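
The 27-cell versus 8-cell comparison can be made concrete by comparing the volume each search covers. The C++ sketch below assumes the 2X grid uses cells twice the neighborhood distance wide and checks 2 cells per axis, while the 1X grid uses cells one neighborhood distance wide and checks 3 cells per axis. `searchedVolume` is a hypothetical helper, and the link from volume to candidate count assumes roughly uniform boid density.

```cpp
#include <cassert>

// Volume of the cubic region covered by a neighbor search that checks
// cellsPerAxis cells along each axis, each cellWidth wide.
double searchedVolume(int cellsPerAxis, double cellWidth) {
    assert(cellsPerAxis > 0 && cellWidth > 0.0);
    double side = cellsPerAxis * cellWidth;
    return side * side * side;
}
```

With a neighborhood distance of 1, the 2X grid covers `searchedVolume(2, 2.0)` = 64 units of volume across 8 cells, while the 1X grid covers `searchedVolume(3, 1.0)` = 27 units across 27 cells. Under the uniform-density assumption, the 1X search examines well under half as many candidate boids, which fits the small speedup reported above.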

### (Code Gists)
* In `kernel.cu`, toggle between **GRID1X** and **GRID2X** for different grid cell sizes.
* Helper functions: `swapspeed` to ping-pong the speed buffers, `shuffleBufferWithIndices` to shuffle the position and velocity arrays.
`kernUpdateVelNeighborSearchScattered1XGRID` and `kernUpdateVelNeighborSearchCoherent1XGRID` for 1X grid cell width search.
* The 2X grid cell width search is somewhat hard-coded and may appear arcane.
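
To illustrate what a reshuffle step does, here is a CPU-side C++ sketch of a gather by sorted indices. It is not the actual `shuffleBufferWithIndices` from `kernel.cu`; `gatherByIndices` is a hypothetical stand-in, assuming the shuffle copies element `indices[i]` of the input into slot `i` of the output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU analogue of a reshuffle: out[i] = in[indices[i]].
// After sorting boid indices by grid cell, this places boids that share
// a cell next to each other in memory.
template <typename T>
std::vector<T> gatherByIndices(const std::vector<T>& in, const std::vector<int>& indices) {
    std::vector<T> out(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] >= 0 && static_cast<std::size_t>(indices[i]) < in.size());
        out[i] = in[static_cast<std::size_t>(indices[i])];
    }
    return out;
}
```

On the GPU the same idea would run with one thread per output element. The extra copy is the cost the coherent method pays in exchange for sequential memory access during the neighbor search.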
Binary file added images/1.jpg
Binary file added images/3.gif
Binary file added images/blocksize.png
Binary file added images/notvisualized.png
Binary file added images/screenshot1.png
Binary file added images/visualized.png