Project 1: Wanru Zhao #7

Open: wants to merge 6 commits into base: master
74 changes: 68 additions & 6 deletions README.md
@@ -1,11 +1,73 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Wanru Zhao, 59981278
* [LinkedIn](https://www.linkedin.com/in/wanru-zhao)
* Tested on: Windows 10, Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz, GTX 1070 (SIG Lab)

### (TODO: Your README)
### Screenshots

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
Screenshot of flocking boids

![](images/screenshot.jpg)

GIF

![](images/flocking_cut.gif)

### Performance Analysis
#### Framerate change with increasing number of boids

Number of Boids | Naive (FPS) | Uniform Grid (FPS) | Coherent Grid (FPS)
:---|:---:|:---:|:---:
5000 | 547.471 | 932.684 | 943.434
10000 | 230.529 | 1546.9 | 1521.98
15000 | 120.268 | 1549.28 | 1498.19
20000 | 57.9549 | 1347.32 | 1398.56
50000 | 19.7976 | 912.897 | 988.624
100000 | 2.7 | 510.311 | 593.427
150000 | Crash | 313.939 | 391.472

![](images/fps_boidnum.jpg)

#### Framerate change with increasing block size with 5000 boids

Block Size | Naive (FPS) | Uniform Grid (FPS) | Coherent Grid (FPS)
:---|:---:|:---:|:---:
32 | 549.209 | 910.022 | 912.538
64 | 548.936 | 923.373 | 942.928
128 | 547.471 | 932.684 | 943.434
256 | 541.605 | 961.5 | 956.118
512 | 529.656 | 891.667 | 939.112

![](images/fps_blocksize.JPG)

#### Framerate change with 8 Cells and 27 Cells, block size = 128, Coherent Grid

Number of Boids | 8 Cells (FPS) | 27 Cells (FPS)
:---|:---:|:---:
5000 | 943.434 | 925.261
10000 | 1521.98 | 1579.39
15000 | 1498.19 | 1529.99
20000 | 1398.56 | 1438.4
50000 | 988.624 | 981.994

![](images/fps_8v27.JPG)

### Problems

* For each implementation, how does changing the number of boids affect performance? Why do you think this is?

For the naive method, the average FPS drops steadily as the number of boids increases, because every boid checks every other boid, so the work per frame grows roughly quadratically with the boid count. For the uniform and coherent grids, the FPS first increases (from 5000 to about 10000-15000 boids) and then drops. I think that when the number of boids is small, the cost of building the grid limits performance; this cost becomes negligible when the number of boids is large, and the FPS then drops because each boid has more neighbors that influence it and must be processed.
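
As a rough illustration of the scaling argument above, the sketch below counts distance checks per frame. It is a back-of-the-envelope model, not code from this project: `naiveChecks`, `gridChecksEstimate`, `totalCells`, and `cellsSearched` are hypothetical names, and the grid estimate assumes boids are spread evenly over the cells, which a real flock is not.

```cpp
#include <cassert>
#include <cstdint>

// Naive method: every boid tests every other boid once per frame.
std::uint64_t naiveChecks(std::uint64_t numBoids) {
    if (numBoids == 0) return 0;
    return numBoids * (numBoids - 1);
}

// Grid methods (ASSUMPTION: boids are evenly distributed over totalCells).
// Each boid only tests the boids in the cellsSearched cells around it,
// i.e. about cellsSearched * (numBoids / totalCells) candidates.
std::uint64_t gridChecksEstimate(std::uint64_t numBoids,
                                 std::uint64_t totalCells,
                                 std::uint64_t cellsSearched) {
    assert(totalCells > 0);
    return numBoids * cellsSearched * numBoids / totalCells;
}
```

For example, `naiveChecks(5000)` is about 25 million checks while `naiveChecks(100000)` is about 10 billion, a 400x increase for 20x more boids. With a hypothetical 1000-cell grid and 8 cells searched, `gridChecksEstimate(10000, 1000, 8)` is 800000.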

* For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

As the block size increases from 32 to 512, the average FPS of each method does not change much. For the naive method, performance drops slightly (from about 549 to 530 FPS); for the grid methods, performance increases up to a block size of 256 and then drops slightly at 512. The GPU I used has a warp size of 32 and 15 SMs. I think the decrease in performance is related to the warp size.
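
The launch arithmetic behind these block sizes can be sketched as follows. This is a minimal host-side illustration that assumes one thread per boid; the function names are hypothetical and are not taken from the project source.

```cpp
#include <cassert>

// Blocks needed so every boid gets one thread (ceiling division).
int blocksPerGrid(int numBoids, int blockSize) {
    assert(blockSize > 0);
    return (numBoids + blockSize - 1) / blockSize;
}

// Warps per block, rounded up: a partially filled warp still occupies a whole warp.
int warpsPerBlock(int blockSize, int warpSize = 32) {
    assert(warpSize > 0);
    return (blockSize + warpSize - 1) / warpSize;
}

// Threads that are launched but have no boid to process (the tail of the last block).
int idleThreads(int numBoids, int blockSize) {
    return blocksPerGrid(numBoids, blockSize) * blockSize - numBoids;
}
```

With 5000 boids, `blocksPerGrid(5000, 32)` is 157 blocks with 24 idle threads, while `blocksPerGrid(5000, 512)` is only 10 blocks with 120 idle threads. Ten blocks is fewer than the 15 SMs mentioned above, so some SMs may have no block to run at that size; this is one possible contributor to the drop, not a measured cause.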

* For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?

Yes. When the number of boids is 50000 or more, the coherent grid performs better than the scattered uniform grid (for example, about 593 vs. 510 FPS at 100000 boids), because the extra lookup through the shuffled boid index array is skipped.
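
The difference between the two access patterns can be sketched on the CPU as below. This is an illustrative simplification, not the project's kernels: `Vec3`, `boidIndices`, and the function names are hypothetical, and the real work happens in CUDA threads rather than loops over a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Scattered grid: the cell-sorted array stores boid indices, so every
// neighbor read goes through one extra lookup into unsorted position data.
Vec3 sumScattered(const std::vector<Vec3>& pos,
                  const std::vector<int>& boidIndices,
                  std::size_t start, std::size_t end) {  // half-open [start, end)
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (std::size_t i = start; i < end; ++i) {
        const Vec3& p = pos[boidIndices[i]];  // indirection, scattered memory access
        sum.x += p.x; sum.y += p.y; sum.z += p.z;
    }
    return sum;
}

// Coherent grid: reorder the position data once per frame into cell order...
std::vector<Vec3> reorderByCell(const std::vector<Vec3>& pos,
                                const std::vector<int>& boidIndices) {
    std::vector<Vec3> sorted(boidIndices.size());
    for (std::size_t i = 0; i < boidIndices.size(); ++i) {
        sorted[i] = pos[boidIndices[i]];
    }
    return sorted;
}

// ...so that boids in the same cell are contiguous and can be read directly.
Vec3 sumCoherent(const std::vector<Vec3>& sortedPos,
                 std::size_t start, std::size_t end) {  // half-open [start, end)
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (std::size_t i = start; i < end; ++i) {
        const Vec3& p = sortedPos[i];  // direct, contiguous access
        sum.x += p.x; sum.y += p.y; sum.z += p.z;
    }
    return sum;
}
```

Both versions produce the same sums; the coherent one pays for a single reordering pass up front and then avoids the per-neighbor index lookup, which matters more as the number of neighbor reads grows.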

* Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!

The 27-cell search is slightly faster than the 8-cell search when the number of boids is within a certain range (roughly 10000 to 20000 in the table above). Within this range, the cost of determining which cells should be considered neighbors of the current cell is slightly larger than the cost of iterating over more cells.
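
To make the trade-off concrete, here is a one-axis sketch of how the two searches might pick their cells. It assumes the usual setup in which the 8-cell search uses a cell width of twice the neighborhood radius and the 27-cell search uses a cell width equal to the radius; those widths, and all names below, are assumptions rather than values taken from this project. Grid-boundary clamping is omitted.

```cpp
#include <cassert>
#include <cmath>

struct CellRange { int lo, hi; };  // inclusive range of cell indices along one axis

// 8-cell search (ASSUMED cell width = 2 * radius): two cells per axis,
// chosen by checking which half of its own cell the boid sits in.
CellRange range8(float pos, float radius) {
    float cellWidth = 2.0f * radius;
    int cell = static_cast<int>(std::floor(pos / cellWidth));
    float offset = pos - cell * cellWidth;
    return offset < radius ? CellRange{cell - 1, cell} : CellRange{cell, cell + 1};
}

// 27-cell search (ASSUMED cell width = radius): three cells per axis, no decision needed.
CellRange range27(float pos, float radius) {
    int cell = static_cast<int>(std::floor(pos / radius));
    return CellRange{cell - 1, cell + 1};
}

// Volume of the cube of cells that gets searched.
float searchedVolume(int cellsPerAxis, float cellWidth) {
    float side = cellsPerAxis * cellWidth;
    return side * side * side;
}
```

Under these assumptions, with radius 1 the 8-cell search covers `searchedVolume(2, 2.0f)` = 64 units of volume, while the 27-cell search covers `searchedVolume(3, 1.0f)` = 27, so checking more cells can still mean examining fewer candidate boids.
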
Binary file added images/flocking.gif
Binary file added images/flocking_cut.gif
Binary file added images/fps_8v27.JPG
Binary file added images/fps_blocksize.JPG
Binary file added images/fps_boidnum.jpg
Binary file added images/screenshot.jpg
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -10,5 +10,5 @@ set(SOURCE_FILES

cuda_add_library(src
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_61
)