Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RTL Simulation Enhancements #562

Closed
wants to merge 15 commits into from
Closed

RTL Simulation Enhancements #562

wants to merge 15 commits into from

Conversation

ssteffl
Copy link
Contributor

@ssteffl ssteffl commented May 22, 2020

Related issue:

Type of change: new features

Impact: simulation infrastructure

Release Notes

  • Enabled verilator multithreading, profiling, fst waveforms
  • Added bin/numa_prefix wrapper for multi-socket multithreaded verilator
  • Enabled selective hardware printfs in verilator.
  • created make help command with variable and command usage
  • change $DIR evaluation strategy in scripts/build-toolchains.sh and env.sh

Copy link
Contributor

@jwright6323 jwright6323 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment much on the actual verilator stuff, but this looks like it is based on an older commit, so we should make sure we capture all the changes back into this. Some general questions/style nits too.

sims/verilator/Makefile Outdated Show resolved Hide resolved
sims/verilator/Makefile Outdated Show resolved Hide resolved
sims/verilator/Makefile Outdated Show resolved Hide resolved
scripts/build-util.sh Outdated Show resolved Hide resolved
generators/utilities/src/main/resources/csrc/emulator.cc Outdated Show resolved Hide resolved
scripts/build-isa-sim.sh Outdated Show resolved Hide resolved
scripts/build-toolchains.sh Outdated Show resolved Hide resolved
sims/verilator/Makefile Outdated Show resolved Hide resolved
Copy link
Contributor

@colinschmidt colinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of changes here that seem to regress functionality we currently have.

generators/utilities/src/main/resources/csrc/emulator.cc Outdated Show resolved Hide resolved
generators/utilities/src/main/resources/csrc/emulator.cc Outdated Show resolved Hide resolved
generators/utilities/src/main/resources/csrc/emulator.cc Outdated Show resolved Hide resolved
generators/utilities/src/main/resources/csrc/emulator.cc Outdated Show resolved Hide resolved
sims/verilator/Makefile Outdated Show resolved Hide resolved
sims/verilator/Makefile Show resolved Hide resolved
sims/verilator/Makefile Show resolved Hide resolved
sims/verilator/Makefile Outdated Show resolved Hide resolved
sims/verilator/Makefile Show resolved Hide resolved
scripts/build-toolchains.sh Outdated Show resolved Hide resolved
@abejgonzalez
Copy link
Contributor

Do you know if Verilator multi-threading works for BOOM targets? Last time I tried doing it, it failed.

@ssteffl
Copy link
Contributor Author

ssteffl commented May 22, 2020

i'm pretty sure it failed for this reason: #562 (comment)

@jwright6323
Copy link
Contributor

A few other questions/comments in addition to @abejgonzalez's.

  1. Does this run in multithreaded mode by default now, or do we need to set flags (documentation looks like flags)?
  2. If the latter, we probably need to add CI for it. If the former, that needs to go in the documentation (and probably needs to be discussed in the chipyard meeting).
  3. Have you benchmarked this? Does it actually improve performance? I was under the impression that these types of simulations don't parallelize well, even in proprietary simulators, but if it does that would be good to note.
  4. Is VL8 a commonly used prefix for verilator? I don't necessarily think you need to change it back, but I find it a bit unintuitive. Maybe others can comment.

common.mk Outdated Show resolved Hide resolved
@ssteffl
Copy link
Contributor Author

ssteffl commented May 22, 2020

@jwright6323

  1. default is 1 thread
  2. i've already updated the docs, and we can surely add some MT test case to CI
  3. yes. i get a 6.7x speedup on a bwrc using 20 cores (25 kHz vs 4 kHz)
  4. i like it, since you can use 3 letters for the digital simulators: VCS, IVL, VL8

bin/numa_prefix Outdated Show resolved Hide resolved
Co-authored-by: Abraham Gonzalez <[email protected]>
@jwright6323
Copy link
Contributor

jwright6323 commented May 22, 2020

  1. yes. i get a 6.7x speedup on a bwrc using 20 cores (25 kHz vs 4 kHz)

@abejgonzalez how many cores does ferry have? Will this speed up CI?

@ssteffl
Copy link
Contributor Author

ssteffl commented May 22, 2020

  1. yes. i get a 6.7x speedup on a bwrc using 20 cores (25 kHz vs 4 kHz)

@abejgonzalez how many cores does ferry have? Will this speed up CI?

@jwright6323 keep in mind that compilation for me took 40% longer when compiling for 20 threads on my test case.

but verilator compilation is only < 20% of the total latency in the chisel->verilated flow anyways, so it not really that big of a compile-time hit

@@ -0,0 +1,74 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is useful, but I'd argue this is also out of scope for the PR. Is this worked into the build flow at all? That would be a nice feature to add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think its in the scope of this pr. this is my "verilator enhancements" pr. i'm removed the build-script stuff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this script is a hack. this could be done in a firrtl pass, and hopefully the firrtl-pass would not be too slow compared to this hacky solution.

however, the firrtl pass does not exist, so we must live with this hack for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title is Multithread verilator. This is what goes into the release notes, merge commit, etc. If this is a Verilator enhancement PR you need to adjust the title/description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with John here. I think the multi-threading work will take significantly longer to get in/reviewed, whereas the printf stuff would be a quick merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR scope adjusted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to see this worked into the build flow in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the other hand, it is nice to not have to invoke the JVM and run all the firrtl passes again just to change which printfs get displayed. So I can see a case for having the printf filtering happen after the make verilog target completes.

Although I admit again, that the current solution is a hack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you need to invoke the JVM? Just have another intermediate make target that runs this script

@ssteffl
Copy link
Contributor Author

ssteffl commented May 22, 2020

@colinschmidt i agree that multithreading should not be blindly applied in all scenarios. and it seems less useful when running a large batch of tiny simulations.

a 6.7x speedup is assuming your only running 1 simulation at a time. i never did a performance comparison of full machine utilization with 1-thread per task in parallel, versus all-cores per tasks serially, but it probably won't a 6.7x speed difference.

@ssteffl ssteffl changed the title Verilator Enhancements RTL Simulation Enhancements May 23, 2020
Copy link
Contributor Author

@ssteffl ssteffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorporated feedback

Copy link
Contributor Author

@ssteffl ssteffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finished review

common.mk Outdated Show resolved Hide resolved
Copy link
Contributor

@abejgonzalez abejgonzalez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to have this in the next Chipyard release instead of the most recent one so that we have more time to dogfood this and update formatting/VCS simulation. So that means wait to merge this for a week or two.

Copy link
Contributor

@jwright6323 jwright6323 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still kind of all over the place, and hard to review. I think the build system changes need to be in with this, and VCS needs to mirror the verilator changes (e.g. printf stuff, etc), or they will diverge. We want to try to keep them as similar as possible, or features get dropped and it causes a bigger maintenance burden. I haven't really reviewed the verilator Makefile changes, becuase there are too many formatting changes to really tell what's different.

.circleci/config.yml Show resolved Hide resolved
@@ -0,0 +1,74 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to see this worked into the build flow in this PR

#########################################################################################
# run normal binary with hardware-logged insn dissassembly
run-binary: $(sim)
(set -o pipefail && $(NUMA_PREFIX) $(sim) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this is less readable than it was before, but even if I didn't, style changes make PRs harder to review, as it's less obvious where functionality has and has not changed. I'm not just saying this to be stubborn; you're taking more of your reviewers' time by doing this, and it's annoying that I keep having to repeat this.

common.mk Show resolved Hide resolved

#----------------------------------------------------------------------------
HELP_COMMANDS += \
" run-binary = run [./$(shell basename $(sim))] and output instructions" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious to me what "output instructions" means. I might suggest "log instructions to a file"

sim_prefix = simulator
sim = $(sim_dir)/$(sim_prefix)-$(MODEL_PACKAGE)-$(CONFIG)
sim_debug = $(sim_dir)/$(sim_prefix)-$(MODEL_PACKAGE)-$(CONFIG)-debug

# verilator doesn't use +permissive, but common.mk expects the simulator's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colin's PR was going to fix this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is not true. +permissive is a valid HTIF argument, and is currently able to be passed to verilator if you set it so, so this comment is wrong, and there's no reason to add it.

sims/verilator/Makefile Show resolved Hide resolved
sims/verilator/Makefile Show resolved Hide resolved
--unroll-count 256 \
-Werror-PINMISSING \
-Werror-IMPLICIT \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... why are they removed if these are important?

@@ -1,112 +1,207 @@
#########################################################################################
#############################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are just too many formatting changes to give this file the detailed review it needs. Many of these flags are not exhaustively tested by CI, so we need to be pretty careful about what we change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the solution sounds like beefing up the regression tests. i will do that so we can be confident that formatting changes are not causing regressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the right way to do this is to separate the refactor into multiple PRs.

@jwright6323
Copy link
Contributor

@ssteffl Please stop marking conversations as resolved, it's hard to find them to reply.

@abejgonzalez
Copy link
Contributor

I have just started splitting this work into multiple PR's to get individual feedback / fasttrack portions of this PR. #649 #650.

This was referenced Aug 21, 2020
@abejgonzalez
Copy link
Contributor

Closing since this was split into #655 #654 #650

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants