When the execution frequency of a block exceeds a predetermined
threshold, the tier-1 JIT compiler traces the chained block and
generates the corresponding low-quality machine code. The resulting
target machine code is stored in the code cache for future use.
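
The flow described above can be sketched roughly as follows. This is a
minimal illustration, not the actual rv32emu code: block_t,
cache_entry_t, HOT_THRESHOLD, CACHE_SLOTS, and run_block are
hypothetical names, the threshold value is arbitrary, and code_id
merely stands in for a pointer to generated machine code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOT_THRESHOLD 4 /* hypothetical value, chosen for illustration */
#define CACHE_SLOTS 64  /* hypothetical size, must be a power of two */

/* A translated basic block, identified by its guest start address. */
typedef struct {
    uint32_t pc;   /* guest address of the first instruction */
    uint32_t hits; /* how many times the block ran in the interpreter */
} block_t;

/* One direct-mapped code cache slot. */
typedef struct {
    bool valid;
    uint32_t pc;
    int code_id; /* stand-in for a pointer to generated machine code */
} cache_entry_t;

static cache_entry_t code_cache[CACHE_SLOTS];
static int next_code_id = 1;

/* Map a guest address to its cache slot (instructions are 4-byte aligned). */
static cache_entry_t *cache_slot(uint32_t pc)
{
    return &code_cache[(pc >> 2) & (CACHE_SLOTS - 1)];
}

/* Return the cached entry for pc, or NULL on a miss. */
static cache_entry_t *cache_lookup(uint32_t pc)
{
    cache_entry_t *e = cache_slot(pc);
    return (e->valid && e->pc == pc) ? e : NULL;
}

/* "Compile" the block: here we only record an id in the cache slot. */
static cache_entry_t *cache_insert(uint32_t pc)
{
    cache_entry_t *e = cache_slot(pc);
    e->valid = true;
    e->pc = pc;
    e->code_id = next_code_id++;
    return e;
}

/* Returns true if the block ran as compiled code, false if interpreted. */
static bool run_block(block_t *b)
{
    if (cache_lookup(b->pc))
        return true; /* cache hit: reuse the generated code */
    if (++b->hits > HOT_THRESHOLD) {
        cache_insert(b->pc); /* block became hot: generate and cache */
        return true;
    }
    return false; /* still cold: stay in the interpreter */
}
```

With these assumed values, a block is interpreted for its first four
executions, compiled on the fifth, and served from the cache afterwards.
A direct-mapped cache is used here only to keep the sketch short; a
colliding block would simply overwrite the slot.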
The primary objective of introducing the tier-1 JIT compiler is to
enhance the execution speed of RISC-V instructions. This implementation
requires two additional components: a tier-1 machine code generator
and a code cache. Furthermore, this tier-1 JIT compiler serves as the
foundation for future improvements.
In addition, we have developed a Python script that effectively traces
code templates and automatically generates JIT code templates. This
approach eliminates the need for manually writing duplicated code.
As shown in the performance analysis below, the tier-1 JIT compiler
performs close to QEMU in benchmarks with a small dynamic instruction
count. However, for benchmarks with a large dynamic instruction count
or without specific hotspots, such as pi and STRINGSORT, the tier-1
JIT compiler is noticeably slower than QEMU.
Hence, a robust tier-2 JIT compiler is essential to generate optimized
machine code across diverse execution paths, coupled with a runtime
profiler for detecting hotspots.
* Performance
| Metric | rv32emu-T1C | qemu |
|----------+-------------+-------|
|aes | 0.02| 0.031|
|mandelbrot| 0.029| 0.0115|
|puzzle | 0.0115| 0.009|
|pi | 0.0413| 0.0177|
|dhrystone | 0.331| 0.393|
|Nqueens | 0.854| 0.749|
|qsort-O2 | 2.384| 2.16|
|miniz-O2 | 1.33| 1.01|
|primes-O2 | 2.93| 1.069|
|sha512-O2 | 2.057| 0.939|
|stream | 12.747| 10.36|
|STRINGSORT| 89.012| 11.496|
As demonstrated in the memory usage analysis below, the tier-1 JIT
compiler utilizes less memory than QEMU across all benchmarks.
* Memory usage
| Metric | rv32emu-T1C | qemu |
|----------+-------------+---------|
|aes | 186,228|1,343,012|
|mandelbrot| 152,203| 841,841|
|puzzle | 153,423| 890,225|
|pi | 152,923| 879,957|
|dhrystone | 154,466| 856,404|
|Nqueens | 154,880| 858,618|
|qsort-O2 | 155,091| 933,506|
|miniz-O2 | 165,627|1,076,682|
|primes-O2 | 150,540| 928,446|
|sha512-O2 | 153,553| 978,177|
|stream | 165,911| 957,845|
|STRINGSORT| 167,871|1,104,702|
Related: sysprog21#238