Introduce a tier-1 JIT compiler based on x86-64 architecture #289
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
The proposed JIT compiler does not provide effective code generation for RV32F. You should clarify which instruction sets and extensions are supported. While launching Quake via
Another reproducer:
Tested on Ubuntu 20.04, Python 3.8 generates the following error messages:
I installed Python 3.9.5 for comparison, and the above messages still appear.
As an alternative, I rewrote the Python switch statement as an if-else chain.
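A minimal sketch of that kind of rewrite, assuming the "switch" was a Python `match` statement (available only from Python 3.10, so it fails to parse on 3.8 and 3.9). The function name, opcodes, and template names here are hypothetical and are not taken from the actual script:

```python
def template_for(opcode: str) -> str:
    """Return a hypothetical code-template name for an opcode.

    Uses if/elif instead of `match`, so the file still parses on
    Python 3.8 and 3.9.
    """
    if opcode in ("add", "sub"):
        return "alu_reg"
    elif opcode == "addi":
        return "alu_imm"
    elif opcode in ("lw", "sw"):
        return "mem"
    else:
        # Anything without a dedicated template takes the generic path.
        return "fallback"
```

The if/elif form is more verbose than `match`, but it keeps the script usable on the default Python shipped with Ubuntu 20.04.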
The JIT-compiled code can initiate Doom, but it crashes when attempting to start the game. Tested on Ubuntu Linux 20.04.
When the execution frequency of a block exceeds a predetermined threshold, the tier-1 JIT compiler traces the chained block and generates corresponding low-quality machine code. The resulting machine code is stored in the code cache for later reuse.

The primary objective of introducing the tier-1 JIT compiler is to speed up the execution of RISC-V instructions. This implementation requires two additional components: a tier-1 machine code generator and a code cache. Furthermore, this tier-1 JIT compiler serves as the foundation for future improvements.

In addition, we have developed a Python script that traces code templates and automatically generates JIT code templates. This approach eliminates the need to write duplicated code by hand.

As shown in the performance analysis below, the tier-1 JIT compiler's performance is close to QEMU's in benchmarks with a limited dynamic instruction count. However, for benchmarks with a large dynamic instruction count or without specific hotspots (for example, pi and STRINGSORT), the tier-1 JIT compiler is noticeably slower than QEMU.

Hence, a robust tier-2 JIT compiler is essential to generate optimized machine code across diverse execution paths, coupled with a runtime profiler for detecting hotspots.

* Performance

| Benchmark  | rv32emu-T1C | qemu   |
|------------+-------------+--------|
| aes        |        0.02 |  0.031 |
| mandelbrot |       0.029 | 0.0115 |
| puzzle     |      0.0115 |  0.009 |
| pi         |      0.0413 | 0.0177 |
| dhrystone  |       0.331 |  0.393 |
| Nqueens    |       0.854 |  0.749 |
| qsort-O2   |       2.384 |   2.16 |
| miniz-O2   |        1.33 |   1.01 |
| primes-O2  |        2.93 |  1.069 |
| sha512-O2  |       2.057 |  0.939 |
| stream     |      12.747 |  10.36 |
| STRINGSORT |      89.012 | 11.496 |

As demonstrated in the memory usage analysis below, the tier-1 JIT compiler uses less memory than QEMU across all benchmarks.

* Memory usage

| Benchmark  | rv32emu-T1C | qemu      |
|------------+-------------+-----------|
| aes        |     186,228 | 1,343,012 |
| mandelbrot |     152,203 |   841,841 |
| puzzle     |     153,423 |   890,225 |
| pi         |     152,923 |   879,957 |
| dhrystone  |     154,466 |   856,404 |
| Nqueens    |     154,880 |   858,618 |
| qsort-O2   |     155,091 |   933,506 |
| miniz-O2   |     165,627 | 1,076,682 |
| primes-O2  |     150,540 |   928,446 |
| sha512-O2  |     153,553 |   978,177 |
| stream     |     165,911 |   957,845 |
| STRINGSORT |     167,871 | 1,104,702 |

Related: sysprog21#238
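The tiering scheme described in this commit message (count block executions, compile once a threshold is crossed, then reuse the result from the code cache) can be modeled in a few lines. This is only an illustrative sketch: the class names, the `HOT_THRESHOLD` value, and the use of a Python closure in place of generated x86-64 machine code are all assumptions, not the emulator's actual implementation.

```python
HOT_THRESHOLD = 4  # assumed value; the source only says "a predetermined threshold"


class Block:
    """A basic block identified by its start address (hypothetical model)."""

    def __init__(self, pc, insns):
        self.pc = pc
        self.insns = list(insns)
        self.use_count = 0  # number of times the block ran in the interpreter


class Tier1Jit:
    def __init__(self):
        self.code_cache = {}    # pc -> compiled callable
        self.compile_count = 0

    def compile(self, block):
        # Stand-in for machine code generation: build a closure instead.
        self.compile_count += 1
        insns = tuple(block.insns)
        return lambda: "jit:" + ";".join(insns)

    def execute(self, block):
        cached = self.code_cache.get(block.pc)
        if cached is not None:
            return cached()  # fast path: code cache hit
        block.use_count += 1
        if block.use_count > HOT_THRESHOLD:
            # The block became hot: compile it once and store the result.
            self.code_cache[block.pc] = self.compile(block)
            return self.code_cache[block.pc]()
        # Cold path: keep interpreting.
        return "interp:" + ";".join(block.insns)
```

In this model a block is interpreted for its first `HOT_THRESHOLD` executions, compiled on the next one, and served from the cache afterwards, so compilation cost is paid only for code that runs repeatedly.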
While T1C is capable of running Quake, it does not behave as expected when compared to the interpreter. I encountered an unexpected crash with Quake. To reproduce this, first run
The log generated by the interpreter:
We do need an effective fallback for RV32F and RV32A when the JIT compiler is unable to generate the corresponding machine code. A trampoline would be required for switching between translated blocks and non-JIT parts.
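One way to picture such a fallback is to cut translation at the first instruction the code generator cannot handle, hand that instruction to the interpreter, and then resume translated execution. The sketch below models only the control flow; the `SUPPORTED` set, the function names, and the one-instruction-at-a-time fallback policy are assumptions for illustration, not the design adopted in this PR.

```python
# Hypothetical set of opcodes the tier-1 code generator has templates for.
SUPPORTED = {"addi", "add", "lw", "sw", "beq"}


def split_for_jit(insns):
    """Split a block into (translatable prefix, remainder) at the first
    opcode that has no code-generation template."""
    for i, op in enumerate(insns):
        if op not in SUPPORTED:
            return insns[:i], insns[i:]
    return insns, []


def run_block(insns):
    """Return a trace of which engine handled which instructions."""
    trace = []
    while insns:
        jit_part, rest = split_for_jit(insns)
        if jit_part:
            trace.append(("jit", jit_part))
        if rest:
            # Trampoline point: leave translated code, interpret a single
            # unsupported instruction, then try translated code again.
            trace.append(("interp", rest[:1]))
            rest = rest[1:]
        insns = rest
    return trace
```

For example, `run_block(["addi", "fadd.s", "add"])` alternates between the two engines: `addi` runs as translated code, `fadd.s` (RV32F) falls back to the interpreter, and `add` returns to translated code.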
Finally, the JIT compiler is incorporated into this emulator repository.