Introduce a tier-1 JIT compiler based on x86-64 architecture #289

Merged (2 commits) on Dec 16, 2023

Collaborator

@qwe661234 qwe661234 commented Dec 12, 2023

When the execution frequency of a block exceeds a predetermined threshold, the tier-1 JIT compiler traces the chained blocks and generates the corresponding low-quality machine code. The resulting machine code is stored in the code cache for later reuse.

The primary objective of introducing the tier-1 JIT compiler is to speed up the execution of RISC-V instructions. This implementation requires two additional components: a tier-1 machine code generator and a code cache. Furthermore, this tier-1 JIT compiler serves as the foundation for future improvements.
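The threshold-and-cache flow described above can be modeled in a few lines. What follows is a toy Python sketch under stated assumptions, not the rv32emu implementation: `Tier1Sketch`, `HOT_THRESHOLD`, and the string return values are hypothetical, and the "compiled" code is simulated with a closure rather than real x86-64 machine code.

```python
from typing import Callable, Dict

# Hypothetical value; the PR does not state the real threshold.
HOT_THRESHOLD = 3


class Tier1Sketch:
    """Toy model: interpret cold blocks, compile and cache hot ones."""

    def __init__(self, threshold: int = HOT_THRESHOLD) -> None:
        self.threshold = threshold
        self.use_count: Dict[int, int] = {}                  # block PC -> times executed
        self.code_cache: Dict[int, Callable[[], str]] = {}   # block PC -> "native" code

    def _compile(self, pc: int) -> Callable[[], str]:
        # Stand-in for the machine code generator: returns a closure.
        return lambda: f"native@{pc:#x}"

    def run_block(self, pc: int) -> str:
        # Fast path: the block was already translated and cached.
        compiled = self.code_cache.get(pc)
        if compiled is not None:
            return compiled()

        # Slow path: count the execution and translate once the block is hot.
        self.use_count[pc] = self.use_count.get(pc, 0) + 1
        if self.use_count[pc] > self.threshold:
            self.code_cache[pc] = self._compile(pc)
            return self.code_cache[pc]()
        return f"interp@{pc:#x}"


if __name__ == "__main__":
    jit = Tier1Sketch(threshold=2)
    for _ in range(4):
        print(jit.run_block(0x100))
```

With a threshold of 2, the first two executions go through the interpreter path, the third triggers translation, and later executions hit the cache.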

In addition, we have developed a Python script that effectively traces code templates and automatically generates JIT code templates. This approach eliminates the need for manually writing duplicated code.

As shown in the performance analysis below, the tier-1 JIT compiler's performance closely parallels that of QEMU in benchmarks with a constrained dynamic instruction count. However, for benchmarks featuring a substantial dynamic instruction count or lacking specific hotspots—examples include pi and STRINGSORT—the tier-1 JIT compiler demonstrates noticeably slower execution compared to QEMU.

Hence, a robust tier-2 compiler is essential to generate optimized machine code across diverse execution paths, coupled with a runtime profiler for detecting hotspots.

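  • Performance

| Metric     | rv32emu(T1C) | qemu   |
|------------|--------------|--------|
| aes        | 0.02         | 0.031  |
| mandelbrot | 0.029        | 0.0115 |
| puzzle     | 0.0115       | 0.009  |
| pi         | 0.0413       | 0.0177 |
| dhrystone  | 0.331        | 0.393  |
| Nqueens    | 0.854        | 0.749  |
| qsort-O2   | 2.384        | 2.16   |
| miniz-O2   | 1.33         | 1.01   |
| primes-O2  | 2.93         | 1.069  |
| sha512-O2  | 2.057        | 0.939  |
| stream     | 12.747       | 10.36  |
| STRINGSORT | 89.012       | 11.496 |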
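To make the gap in the performance figures easier to read, the per-benchmark ratio of T1C to QEMU can be computed directly from the table. This is a small sketch: the numbers are copied from the table, the names `TIMINGS`, `relative_cost`, and `RATIOS` are made up for illustration, and it assumes that lower values are better (the PR does not state the unit, but the text describes the benchmarks with larger T1C values as slower).

```python
# (rv32emu T1C, QEMU) figures copied from the performance table.
# The unit is not stated in the PR; the ratio is unitless either way.
TIMINGS = {
    "aes": (0.02, 0.031),
    "mandelbrot": (0.029, 0.0115),
    "puzzle": (0.0115, 0.009),
    "pi": (0.0413, 0.0177),
    "dhrystone": (0.331, 0.393),
    "Nqueens": (0.854, 0.749),
    "qsort-O2": (2.384, 2.16),
    "miniz-O2": (1.33, 1.01),
    "primes-O2": (2.93, 1.069),
    "sha512-O2": (2.057, 0.939),
    "stream": (12.747, 10.36),
    "STRINGSORT": (89.012, 11.496),
}


def relative_cost(t1c: float, qemu: float) -> float:
    """Return T1C's figure divided by QEMU's (above 1.0 means T1C is slower)."""
    return t1c / qemu


RATIOS = {name: relative_cost(*pair) for name, pair in TIMINGS.items()}

if __name__ == "__main__":
    # Print benchmarks from most to least favorable for T1C.
    for name, ratio in sorted(RATIOS.items(), key=lambda kv: kv[1]):
        print(f"{name:<11} {ratio:5.2f}x")
```

Under that assumption, only aes and dhrystone come out below 1.0, and STRINGSORT has the largest ratio at roughly 7.7.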

As demonstrated in the memory usage analysis below, the tier-1 JIT compiler utilizes less memory than QEMU across all benchmarks.

  • Memory usage

| Metric     | rv32emu(T1C) | qemu      |
|------------|--------------|-----------|
| aes        | 186,228      | 1,343,012 |
| mandelbrot | 152,203      | 841,841   |
| puzzle     | 153,423      | 890,225   |
| pi         | 152,923      | 879,957   |
| dhrystone  | 154,466      | 856,404   |
| Nqueens    | 154,880      | 858,618   |
| qsort-O2   | 155,091      | 933,506   |
| miniz-O2   | 165,627      | 1,076,682 |
| primes-O2  | 150,540      | 928,446   |
| sha512-O2  | 153,553      | 978,177   |
| stream     | 165,911      | 957,845   |
| STRINGSORT | 167,871      | 1,104,702 |

Related: #238

@github-advanced-security github-advanced-security bot left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@jserv jserv requested a review from vacantron December 12, 2023 15:49
@jserv jserv changed the title from "Introduce baseline tiered1 JIT compiler" to "Introduce a tiered-1 JIT compiler based on x86-64 architecture" Dec 12, 2023
@qwe661234 qwe661234 force-pushed the wip/tiered_1_JIT branch 2 times, most recently from 2143749 to 34d1bdf Compare December 13, 2023 07:48
@qwe661234 qwe661234 force-pushed the wip/tiered_1_JIT branch 2 times, most recently from b52269f to e80b8e1 Compare December 13, 2023 08:37
@jserv
Contributor

jserv commented Dec 14, 2023

The proposed JIT compiler does not encompass effective code generation for RV32F. You should clarify which instruction sets and extensions are supported.

When launching Quake via make quake, I observed the following:

Playing registered version.
Console initialized.
Exe: 20:12:56 Sep  1 2023
 8.0 megabyte heap
792k surface cache
Assertion failed: (NULL), function do_fmvwx, file src/rv32_jit_template.c, line 556.

Another reproducer:

 $ build/rv32emu build/maj2random.elf
maj_random (NROUNDS=2)                 count     mean(x)     mean(y) variance(x) variance(y)  std-dev(x)  std-dev(y)
(             0, (0 - 1K)/8K )    999999,057     0.49915     0.50594     0.08122     0.08092     0.28499     0.28447
Assertion failed: (NULL), function do_fmvwx, file src/rv32_jit_template.c, line 556.

@jserv
Contributor

jserv commented Dec 14, 2023

Tested on Ubuntu 20.04, Python 3.8 generates the following error messages:

  File "tools/gen-jit-template.py", line 153
    match items[0]:
          ^
SyntaxError: invalid syntax
make: *** [Makefile:130: src/rv32_jit_template.c] Error 1
$ python3 --version
Python 3.8.10

I installed Python 3.9.5 for comparison, and the same error still appears.
According to Status of Python versions, Python 3.8 and 3.9 are still supported by the Python Software Foundation.

@qwe661234
Collaborator Author

As an alternative, I rewrote the Python match statement as an if-else chain.
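For context, the `match` statement was added in Python 3.10, so the 3.8 and 3.9 interpreters reject it at parse time with the `SyntaxError` quoted above. The sketch below shows the general shape of such a rewrite. The actual contents of `tools/gen-jit-template.py` are not shown in this thread, so the function name `classify` and the token strings are hypothetical.

```python
from typing import List


def classify(items: List[str]) -> str:
    """Dispatch on the first token using if/elif, which parses on Python 3.8+.

    The Python 3.10+ form this replaces would look like:

        match items[0]:
            case "ld":
                return "load"
            case "st":
                return "store"
            case _:
                return "other"
    """
    kind = items[0]
    if kind == "ld":        # hypothetical token
        return "load"
    elif kind == "st":      # hypothetical token
        return "store"
    else:                   # plays the role of `case _`
        return "other"


if __name__ == "__main__":
    print(classify(["ld", "a0", "a1"]))
```

The if/elif chain only covers literal comparisons like these. Structural patterns that `match` supports would need explicit unpacking instead.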

@jserv
Contributor

jserv commented Dec 14, 2023

The JIT-compiled code can initiate Doom, but it crashes when attempting to start the game.

Tested on Ubuntu Linux 20.04.

@qwe661234 qwe661234 force-pushed the wip/tiered_1_JIT branch 4 times, most recently from 42fbdd1 to 49f89a1 Compare December 15, 2023 12:01
@jserv
Contributor

jserv commented Dec 15, 2023

While T1C is capable of running Quake, it does not behave as expected when compared to the interpreter. I encountered an unexpected crash with Quake. To reproduce this, first run make quake and watch the demo scene without any keystrokes. You will then encounter a segmentation fault.

You got the shells
You got the Grenade Launcher
You receive 25 health
You get 2 rockets  
You got the rockets
You got the nailgun
You got the nails
You got the nails
You receive 15 health
You receive 15 health
You got the rockets
/bin/sh: line 1: 71739 Segmentation fault: 11  ../build/rv32emu quake.elf

The log generated by the interpreter:

You got the shells
You got the Grenade Launcher
You receive 25 health
You get 2 rockets  
You got the rockets
You got the nailgun
You got the nails
You got the nails
You receive 15 health
You receive 15 health
You got the rockets
You get 2 rockets  
You got the nails
You got the nails
You receive 25 health
You got the gold key
Playing demo from demo2.dem.
You got the shells
You got the shells
You got the nails
You got armor
You got the nails
You receive 25 health
You got the Super Nailgun
You got the nails
You get 2 rockets  

@jserv
Contributor

jserv commented Dec 15, 2023

Because Quake uses RV32F instructions, which T1C does not support yet.

We do need an effective fallback for RV32F and RV32A when the JIT compiler is unable to generate the corresponding machine code. A trampoline would be required for switching between translated blocks and non-JIT parts.
See https://github.com/jserv/rv32jit/blob/master/src/codegen/jitabi.cpp
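The hand-off idea behind such a fallback can be illustrated with a toy planner that splits a block into runs the JIT can translate and runs that must go back to the interpreter. This is only a conceptual Python sketch and does not reflect how rv32emu or rv32jit implement trampolines. `JIT_SUPPORTED`, `plan_execution`, and the mnemonic subset are hypothetical, with `fmv.w.x` chosen because the assertion failures above occur in `do_fmvwx`.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical subset of instructions the tier-1 code generator can translate.
JIT_SUPPORTED = {"addi", "add", "lw", "sw"}


def plan_execution(block: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a block into consecutive runs tagged "jit" or "interp".

    Each boundary between runs is where a trampoline would have to switch
    between translated code and the non-JIT path.
    """
    plan = []
    for supported, run in groupby(block, key=lambda op: op in JIT_SUPPORTED):
        plan.append(("jit" if supported else "interp", list(run)))
    return plan


if __name__ == "__main__":
    for mode, ops in plan_execution(["addi", "fmv.w.x", "add", "lw"]):
        print(mode, ops)
```

A block made only of supported instructions yields a single "jit" run, so no switch is needed. Every unsupported instruction adds two transitions, which is why the cost of the trampoline matters for floating-point-heavy programs such as Quake.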

@sysprog21 sysprog21 deleted a comment from qwe661234 Dec 15, 2023
@jserv jserv merged commit f8c698d into sysprog21:master Dec 16, 2023
7 checks passed
@jserv
Contributor

jserv commented Dec 16, 2023

Finally, the JIT compiler is incorporated into this emulator repository.
