Gem Forge Framework

Zhengrong Wang edited this page Jun 10, 2021 · 2 revisions

Overview

Gem-Forge was originally developed to accelerate architectural design space exploration with both trace-based and execution-based simulation. Unlike other trace-based simulators, which work on traces of assembly instructions, Gem-Forge is built on traces of LLVM IR instructions. This lets the user directly leverage various compiler analyses and access high-level information that is lost when compiling to low-level assembly code. Users can manipulate LLVM IR traces to explore architectural changes, e.g. replacing some instructions with a new complex instruction or vectorizing. Both the vanilla trace and the modified one can be simulated by a modified CPU model in gem5, which gives you the performance results. The overall workflow can be summarized as follows:

  • Collect Traces
    Gem-Forge provides an LLVM pass to instrument the program with calls to a runtime library. When the instrumented program runs, it reads some environment variables and dumps a trace of LLVM IR instructions in compressed Protobuf format. The runtime tracer can be configured in one of two modes: profiling mode, which dumps only the instructions to reduce the trace size, and detailed mode, which includes both the instructions and all their operands. Typically, users first run the program in profiling mode to identify hot regions (e.g., using SimPoint), and then collect detailed traces for these hot regions. The detailed mode records the actual values of instructions' operands and results, which are important for many analyses, e.g. recognizing the memory access pattern.

    The tracer is implemented in transform/src/trace.

  • Transform Traces
    After collecting the trace, you can transform it to reflect the architectural changes you want to explore. For example, suppose your new CPU has a special function unit to compute multiply-and-accumulate, and you want to estimate how much the benchmark would benefit from this new feature without actually implementing the compiler analysis and backend. In such cases, Gem-Forge can be handy, as you can just create a "fake" mul-acc instruction to replace the matched sequences in the trace. The transformed trace can then be simulated in gem5 to validate your assumption.

    We already provide many transformations, such as stream analysis and vectorization. They are located in transform/src, with a separate folder for each transformation.

  • Simulate Traces
    Finally, traces are simulated in gem5. We add a new GemForgeCPU model in gem5/src/cpu/gem_forge, which takes in an LLVM IR trace and feeds it into an out-of-order pipeline. This CPU is divided into five stages: fetch, decode, rename, IEW, and commit, similar to gem5's O3 CPU.
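
As a rough illustration of the Transform Traces step, the following Python sketch fuses a multiply followed by a dependent add into a single "fake" mulacc instruction. The TraceInst record, the opcode strings, and the fuse_mul_acc function are hypothetical names invented for this example. The real transformations in transform/src operate on Gem-Forge's Protobuf traces of LLVM IR, not on this simplified structure. The sketch also assumes that the multiply result has no consumer other than the add.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceInst:
    """Hypothetical trace record; NOT Gem-Forge's actual Protobuf schema."""
    opcode: str
    dest: Optional[str] = None
    srcs: List[str] = field(default_factory=list)


def fuse_mul_acc(trace: List[TraceInst]) -> List[TraceInst]:
    """Replace each `mul` immediately followed by an `add` that consumes
    its result with one fake `mulacc` instruction.

    Assumption: the mul result is used only by that add.
    """
    out: List[TraceInst] = []
    i = 0
    while i < len(trace):
        cur = trace[i]
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        matched = (
            cur.opcode == "mul"
            and nxt is not None
            and nxt.opcode == "add"
            # The add must read the mul result exactly once.
            and nxt.srcs.count(cur.dest) == 1
        )
        if matched:
            # The remaining add operand becomes the accumulator input.
            acc = [s for s in nxt.srcs if s != cur.dest]
            out.append(TraceInst("mulacc", nxt.dest, cur.srcs + acc))
            i += 2  # Skip both the mul and the add.
        else:
            out.append(cur)
            i += 1
    return out


# Toy trace with made-up value names.
example = [
    TraceInst("load", "%a", ["%p"]),
    TraceInst("mul", "%m", ["%a", "%b"]),
    TraceInst("add", "%acc", ["%acc0", "%m"]),
    TraceInst("store", None, ["%acc", "%q"]),
]
print([inst.opcode for inst in fuse_mul_acc(example)])
# prints ['load', 'mulacc', 'store']
```

Both the original and the fused trace could then be simulated, and the difference in the reported performance gives a first-order estimate of the benefit of the new instruction.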

Execution-Based Workflow

Although trace-based simulation allows you to quickly verify an idea and its potential benefits, it is not as accurate as execution-based simulation and serves only as a first-order approximation. This is especially true for LLVM IR traces, as LLVM IR assumes an infinite number of registers. In addition, the misspeculated path is not in the trace, which also leads to inaccurate results. Therefore, after validating an idea with trace-based simulation, users may want to switch to the execution-based workflow to fully examine the benefits and overheads.

Gem-Forge also provides support for execution-based simulation. So far, this is supported for stream specialization, i.e. we can generate a binary with stream instructions and simulate it in gem5's original Minor/O3 CPU models with Ruby coherence support.
