What are gem5 simulation options for streaming engine in SSP? #1

Open
haeunlee99 opened this issue Aug 11, 2022 · 3 comments

Comments

@haeunlee99

Hello,

I would like to try out the streaming engine introduced in the paper "Stream-based Memory Access Specialization for General Purpose Processors". I am running experiments with what I think is the right configuration, but I am new to gem5 and not sure whether I am doing it correctly. If possible, could someone provide the gem5 simulation options needed to reproduce the engine described in that paper?

Thanks a lot!
Haeun

@seanzw
Collaborator

seanzw commented Aug 17, 2022

I think these are the options to configure the stream engine in SSP for the 8-core out-of-order configuration.

        "--gem-forge-stream-engine-enable",
        "--gem-forge-stream-engine-total-run-ahead-bytes=2048",
        "--gem-forge-stream-engine-enable-lsq",
        "--gem-forge-stream-engine-enable-coalesce",
        "--gem-forge-stream-engine-throttling=global",

You can experiment with them, and see gem5/configs/example/gem_forge/run.py for all the options.
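
As a minimal sketch of how these flags might be combined with the rest of a run, the snippet below assembles a command line in Python. The binary path `build/X86/gem5.opt` and the two base flags passed in are assumptions for illustration only; the config script path and the five stream-engine flags are the ones quoted above.

```python
import shlex

# Stream-engine flags quoted in this thread for the SSP out-of-order setup.
STREAM_ENGINE_FLAGS = [
    "--gem-forge-stream-engine-enable",
    "--gem-forge-stream-engine-total-run-ahead-bytes=2048",
    "--gem-forge-stream-engine-enable-lsq",
    "--gem-forge-stream-engine-enable-coalesce",
    "--gem-forge-stream-engine-throttling=global",
]


def build_command(gem5_binary, config_script, base_flags):
    """Return an argv list: binary, config script, base flags, stream-engine flags."""
    return [gem5_binary, config_script, *base_flags, *STREAM_ENGINE_FLAGS]


# Hypothetical binary path and base flags; adjust both to your own build.
cmd = build_command(
    "build/X86/gem5.opt",
    "gem5/configs/example/gem_forge/run.py",
    ["--cpu-type=DerivO3CPU", "--num-cpus=1"],
)
print(shlex.join(cmd))
```

Keeping the stream-engine flags in one list makes it easy to toggle them as a group when comparing against a baseline run.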

@haeunlee99
Author

Thanks for the reply, and sorry for asking again :(
Are the following options appropriate for a single out-of-order core SSP configuration?

--llvm-store-queue-size=32
--llvm-mcpat=0
--caches
--l2cache
--gem-forge-num-active-cpus=1
--gem-forge-cache-load-ports=6
--gem-forge-cache-store-ports=4
--link-width-bits=256
--llc-select-low-bit=6
--gem-forge-enable-func-acc-tick
--prog-interval=10000
--tlb-timing-se
--l1tlb-size=64
--l1tlb-assoc=8
--l2tlb-size=2048
--l2tlb-assoc=16
--l2tlb-hit-lat=8
--walker-se-lat=16
--walker-se-port=2
--num-cpus=1
--num-l2caches=1
--ruby
--access-backing-store
--router-latency=2
--link-latency=1
--mem-channels=2
--mem-size=16GB
--l1i_size=32kB
--l1i_assoc=8
--l1d_size=32kB
--l1d_lat=8
--l1d_mshrs=8
--l1d_assoc=8
--l1_5d_size=256kB
--l1_5d_assoc=16
--l1_5d_mshrs=16
--l2_lat=16
--l2_size=1MB
--l2_assoc=16
--l3_lat=20
--fast-forward=-1
--options=1
--cpu-type=DerivO3CPU
--llvm-issue-width=8
--gem-forge-stream-engine-enable
--gem-forge-stream-engine-total-run-ahead-bytes=2048
--gem-forge-stream-engine-enable-lsq
--gem-forge-stream-engine-enable-coalesce
--gem-forge-stream-engine-throttling=global

I have omitted the following options, which seem to be related to the multi-core environment.

--num-dirs=4
--mesh-rows=8
--network=garnet2.0
--garnet-enable-multicast
--topology=MeshDirCorners_XY
--routing-YX

I am also not sure whether the following options are necessary.

--link-width-bits=256
--gem-forge-enable-func-acc-tick
--access-backing-store
--router-latency=2
--link-latency=1
--gem-forge-stream-engine-throttling=global

I also have several questions about them.

  1. Since the L3 cache should also be connected to memory, are link-width-bits, router-latency, and link-latency necessary?
  2. Does the access-backing-store option mean we have DRAM?
  3. What is the difference between dynamic throttling and global throttling? Isn't the one introduced in the paper dynamic throttling?
  4. What does the gem-forge-enable-func-acc-tick option do?

Thank you :)

@seanzw
Collaborator

seanzw commented Aug 31, 2022

Sorry for the late reply.

I think you are mixing some options here. gem5 has two cache systems: the classic cache system and Ruby. Which one are you trying to use? The mesh topology and link width options are all related to Ruby. The following explanation assumes you are using Ruby.
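
One way to catch this kind of mix-up before launching a run is a small check over the flag list. The sketch below is only illustrative: the grouping of `--caches` and `--l2cache` as classic-cache switches and `--ruby` as the Ruby switch is an assumption based on upstream gem5's example scripts, not something confirmed from run.py, and `cache_systems_requested` is a hypothetical helper.

```python
# Assumed grouping (not verified against run.py): in upstream gem5 example
# scripts, --caches and --l2cache set up classic caches and --ruby selects Ruby.
CLASSIC_FLAGS = {"--caches", "--l2cache"}
RUBY_FLAGS = {"--ruby"}


def flag_name(flag):
    """Strip any '=value' suffix so '--l2_size=1MB' becomes '--l2_size'."""
    return flag.split("=", 1)[0]


def cache_systems_requested(flags):
    """Return the set of cache systems ('classic', 'ruby') the flags ask for."""
    names = {flag_name(f) for f in flags}
    systems = set()
    if names & CLASSIC_FLAGS:
        systems.add("classic")
    if names & RUBY_FLAGS:
        systems.add("ruby")
    return systems


flags = ["--caches", "--l2cache", "--ruby", "--link-width-bits=256"]
systems = cache_systems_requested(flags)
if len(systems) > 1:
    print("Mixed cache systems requested:", sorted(systems))
```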

  1. In our configuration, the L3 cache is not directly connected to the DRAM. Instead, they communicate through the router and the mesh network, so link-width-bits, router-latency and link-latency still matter here.
  2. access-backing-store is a Ruby option that always gets the data from a backing "ground truth" store. We need this to simplify the implementation. Regardless of this option, we always have DRAM.
  3. IIRC, global-throttling is the one we evaluated in the paper. You can ignore dynamic-throttling, as it was a design choice that we did not end up using.
  4. gem-forge-enable-func-acc-tick makes gem5 dump a breakdown of how many execution cycles are spent within each function into the output folder. It's a profiling flag and should not change the simulation results.
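
The points above can be turned into a quick sanity check on a flag list. This is a rough sketch under the assumption that Ruby is in use; `parse_flags` and `check_ruby_config` are hypothetical helpers, and the expected throttling value follows the "IIRC" recollection in item 3 rather than a verified default.

```python
def parse_flags(flags):
    """Map '--name=value' to {name: value}; bare switches map to True."""
    parsed = {}
    for flag in flags:
        name, sep, value = flag.lstrip("-").partition("=")
        parsed[name] = value if sep else True
    return parsed


# Network options that item 1 says still matter under Ruby.
REQUIRED_RUBY_NETWORK = ("link-width-bits", "router-latency", "link-latency")


def check_ruby_config(flags):
    """Return a list of human-readable problems; an empty list means no findings."""
    parsed = parse_flags(flags)
    problems = [f"missing --{n}" for n in REQUIRED_RUBY_NETWORK if n not in parsed]
    throttling = parsed.get("gem-forge-stream-engine-throttling")
    if throttling != "global":
        problems.append(f"throttling is {throttling!r}, expected 'global'")
    return problems


flags = [
    "--ruby",
    "--link-width-bits=256",
    "--router-latency=2",
    "--link-latency=1",
    "--gem-forge-enable-func-acc-tick",
    "--gem-forge-stream-engine-throttling=global",
]
for problem in check_ruby_config(flags):
    print(problem)
```

The profiling flag is deliberately not checked, since it should not change the simulation results either way.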

I hope this answers your questions. Let me know if you have any more issues.
