Use portable JIT compilation for accelerating RISC-V emulation #81

jserv · 2022-11-04T10:48:04Z

rv8 demonstrates how RISC-V instruction emulation can benefit from JIT compilation and aggressive optimizations. However, it is dedicated to x86-64 and hard to support other host architectures, such as Apple M1 (Aarch64). SFUZZ is a high performance fuzzer using RISC-V to x86 binary translations with modern fuzzing techniques. RVVM is another example to implement tracing JIT.

The goal of this task to utilize existing JIT framework as a new abstraction layer while we accelerate RISC-V instruction executions. In particular, we would

Avoid direct machine code generation. Instead, most operations are enforced in intermediate representation (IR) level.
Perform common optimization techniques such as peephole optimization. ria-jit performs excellent work in regards to such optimization. See src/gen/instr/patterns.c and MEMZERO and MEMCOPY instructions proposal
Use high-level but still efficient IR. MIR is an interesting implementation, which allows using subset of C11 for IR. SFUZZ's note Code Generation is worth reading. ETISS (Extendable Translating Instruction Set Simulator) translates binary instructions into C code and appends translated code into a block, which will be compiled and executed at runtime. As aforementioned, it is Extendable, thus it supports myriad level of customization by adopting the technique of plug-ins.
Ensure shorter startup-time. It can be achieved by means of lightweight JIT framework and AOT compilation.

The JIT compilation's high level operation can be summed up as follows:

Look in a hash map for a code block matching the current PC
if a block is found
- execute this block
if a block is not found
- allocate a new block
- invoke the translator for this block
- insert it into the hash map
- execute this block

Every block will come to an end after a branch instruction has been translated since translation occurs at the basic block level. Then, there is room for further optimization passes performed on the generated code.

We gain speed by using the technique for the reasons listed below:

No instruction fetch
No instruction decode
Immediate values are baked into translated instructions
- Values of 0 can be optimized
register x0 can be optimized
- No lookup required
- Writes are discarded
Reduced emulation loop overhead
Blocks can be chained based on previous branch pattern for faster lookup

Reference:

Banshee: A Fast LLVM-Based RISC-V Binary Translator
rv8: a high performance RISC-V to x86 binary translator / slides
Dynamic Binary Translation for RISC-V code on x86-64 / slides / cm2 branch
R2VM / Accelerate Cycle-Level Multi-Core RISC-V Simulation with Binary Translation
Dynamic Binary Translator for RISC-V / riscv-dbt
High-Performance RISC-V Emulation: LLVM-based Static Binary Translation (SBT)
Hybrid-DBT: Hardware/Software Dynamic Binary Translation Targeting VLIW / slides / HybridDBT / translationCache
Adaptive simulation with Virtual Prototypes in an open-source RISC-V evaluation platform
Code specialization for the MIR lightweight JIT compiler
A Faster CRuby interpreter with dynamically specialized IR
HIFIVE1-VP
RISC-V static binary translator targeting AArch64 / discussion
A Journey to the Absolute Limit: CKB-VM's LLVM AOT engine
libriscv: RISC-V Binary Translation
Explaining JavaScript VMs in JavaScript - Inline Caches / code: inline-cache
Just-In-Time Compilation on ARM—A Closer Look at Call-Site Code Consistency
LuaJIT Remake: the interpreter and the JIT tiers are automatically generated from a semantical description of the bytecodes. / Building the fastest Lua interpreter.. automatically!
Crafting Interpreters
How I developed a faster Ruby interpreter
Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C
WasmNow: Fast WebAssembly Baseline Compiler
A fast in-place interpreter for WebAssembly / wizard-engine
Kilite: a complete example for using C2MIR
DarkSwordVM: JIT compilation for LLVM IR
pydrofoil: Fast RISC-V Sail emulation using PyPy/RPython's JIT compiler
Compiling SQLite queries to native code with LLVM / LLVMSQLite
QEMU Tiny-Code Threaded Interpreter
MINRES: DBT-RISE: Dynamic Binary Translation (DBT) based environment to implement instruction set simulator (ISS)

jserv · 2022-11-17T07:36:03Z

Think about whether we want to load a 64-bit immediate under the present RISC-V specification. Despite memory access being substantially slower, it is simpler to just load a constant from memory since it requires less instructions.

In RISC-V

    lui r1, 0x01234
    lui r2, 0x89ABC
    addi r1, 0x567
    addi r2, 0xDEF
    shl r1, 32
    or r1, r2

24-bytes, 6 instruction cycles and 2 registers.

In x86_64:

    movabs rax, 0x0123456789ABCDEF

11-bytes, 1 instruction cycle, 1 register.

Ma-Y · 2022-11-17T14:40:53Z

it's not so apparent where the starting and ending position of a particular code block is . does this mean hard coding pseudo pattern(common code block pattern) for the binary to match on?

jserv · 2022-11-23T05:49:43Z

blink is a virtual machine for running x86-64-linux programs on different operating systems and hardware architectures. Recently, it implements a JIT. Quote from blink/jit.c:

This file implements an abstraction for assembling executable code at runtime. This is intended to be used in cases where it's desirable to have fast "threaded" pathways, between existing functions, which were compiled statically. We need this because virtual machine dispatching isn't very fast, when it's implemented by loops or indirect branches.
Modern CPUs go much faster if glue code without branches is outputted to memory at runtime, i.e. a small function that calls the functions.

jserv · 2022-11-23T05:55:08Z

wasm3 is a fast WebAssembly interpreter without JIT. In general, its strategy seems capable of executing code around 4-15x slower than compiled code on a modern x86 processor.

Reduce bytecode decoding overhead
- Bytecode/opcodes are translated into more efficient "operations" during a compilation pass, generating pages of meta-machine code
- wasm3 trades some space for time. Opcodes map to up to 3 different operations depending on the number of source operands and commutative-ness.
- Commonly occurring sequences of operations can also be optimized into a "fused" operation. This sometimes results in improved performance.
- In wasm3, the stack machine model is translated into a more direct and efficient "register file" approach.
Tightly Chained Operations

Tails is a minimal, fast Forth-like interpreter core. It uses no assembly code, only C++, but an elegant tail-recursion technique inspired by Wasm3 makes it nearly as efficient as hand-written assembly.

jserv · 2022-11-28T07:20:34Z

Benchmark results of rv8: (RV32 only, smaller is better)

-	interpreter	JIT	qemu-user	native-x86
primes	34.33	1.75	1.55	1.00
miniz	40.19	1.95	2.68	1.00
SHA-512	76.23	3.40	3.78	1.00
AES	88.02	2.57	4.08	1.00
qsort	28.82	1.86	7.50	1.00
dhrystone	128.97	2.72	12.04	1.00

qwe661234 · 2023-04-21T06:12:44Z

Our strategy to develop JIT is utilizing Clang to generate optimized target code. The JIT compiler design is shown in Figure 1.
When a block's usage frequency exceeds a predetermined threshold, we trace its taken and untaken branches and use a code generator to convert the instruction sequence into C code. We then compile the C code using Clang and store the resulting target machine code as a function in the code cache for future use.

An example instruction sequence that is a hot spot in Mandelbrot is shown in Figure 2, along with the corresponding EBB. The generated code for this EBB is shown as below.

Figure1
Figure2
Instruction Sequence in Mandelbrot

10750: lw  a5,128(sp)
10754: beq a5,s10,0x10760
10758: add s1,s1,a0
1075c: j   0x10728
10760: mv  s8,a0
10764: sub s9,s1,s7
10768: bne s1,s7,0x10a0c
...

Generated Code

insn_10750:
  ...
  goto insn_10754;
insn_10754:
  ...
  if (...)
    goto insn_10760;
  goto insn_10758;
...

qwe661234 · 2023-05-03T08:13:18Z

The commit 36f304c implements the JIT strategy described above. The benchmark results, as shown in the statistics below, demonstrate that the JIT has a positive effect on benchmarks with long execution times. However, in the case of Mandelbrot, its short execution time means that the overhead of the JIT outweighs its benefits.

Test	rv32emu (interpreter)	rv32emu (JIT)	Speedup
CoreMark	1155.174 (Iterations/Sec)	1796.483 (Iterations/Sec)	+55.5％
dhrystone	1282 DMIPS	2521 DMIPS	+96.64%
nqueens	7766.80 msec	3893.81 msec	+99.47%
mandelbrot	32.28 msec	77.34 msec	-139.59%

qwe661234 · 2023-05-30T07:25:35Z

We experiment the same JIT strategy based on different compiler clang and mir.

clang

Metric	Interpreter-Only	Just-in-time Compiler	Speedup
CoreMark	1155.174 (Iterations/Sec)	1796.483 (Iterations/Sec)	+55.5％
Dhrystone	1282 DMIPS	2521 DMIPS	+96.64%

mir

Metric	Interpreter-Only	Just-in-time Compiler	Speedup
CoreMark	1155.174 (Iterations/Sec)	2194.907 (Iterations/Sec)	+90％
Dhrystone	1282 DMIPS	2522 DMIPS	+96.7%

The issue with Clang is that we need to fork a Clang process, which results in a significant overhead. However, its ability to optimize code is strong. In contrast, launching mir has a relatively small overhead, but its ability to optimize code is relatively weak and we cannot determine the code size of the machine code compiled by mir. This limitation prevents us from using a code cache to manage machine code effectively.

jserv · 2023-06-15T04:03:58Z

The issue with Clang is that we need to fork a Clang process, which results in a significant overhead. However, its ability to optimize code is strong. In contrast, launching mir has a relatively small overhead, but its ability to optimize code is relatively weak and we cannot determine the code size of the machine code compiled by mir. This limitation prevents us from using a code cache to manage machine code effectively.

The preliminary baseline JIT compiler has been landed in wip/jit branch. Meanwhile, it comes with some known issues, including Arm64 breakage. The design principle of baseline JIT is to stick to C2MIR approach where we can improve macro operation fusion and/or RISC-V oriented optimizations.

qwe661234 · 2023-07-04T10:47:13Z

Ongoing

As Utilize dominators for constructing extended basic blocks #142, the related experiment of using a dominator tree to detect loops has been preliminarily implemented on the branch detect_loop.
Improving memory I/O.
Fix the known issue that JIT does not work on macOS on Apple Silicon. JIT compilation works on GNU/Linux for both x86-64 and Arm64.

jserv · 2023-07-05T10:24:15Z

As Utilize dominators for constructing extended basic blocks #142, the related experiment of using a dominator tree to detect loops has been preliminarily implemented on the branch detect_loop.

You shall describe the details in #142 rather than here. In #142, we care about the feasibility to improve block-based execution by introducing dominator tree.

jserv · 2023-07-10T08:21:45Z

The author of RVVM discussed the design choices where it's substantially different to QEMU.

Performance-wise:

Instead of a static translate-and-run flow like in QEMU, RVVM has an interpret-trace-run execution loop which is remotely similar to JVM, and allows to collect some data like branch probabilities and hot loops, and optimize better

Using a hardware host FPU instead of softfp emulation. This is like, 10x faster with some synthetic FPU benchmarks

Conscious decisions for beneficial trade-offs, like fast-path JIT trace cache, JIT IR is more streamlined to "Big ISA Triad" (RISC-V, ARM64, x86-64), etc

Infrastructure-wise:

A public library API for a lot of things: Machine management (Construct and run 'em in any program), device integration, registering new CPU instructions, userspace emulation

Subjectively, a more lean and clean codebase, in places where it wasn't harmed by either 1) performance decisions to copy-paste or restructure things 2) complexity of related things like JIT backend arches

Portability. RVVM officially runs in WASM, runs on Haiku, SerenityOS, KolibriOS, even DOS!

jserv · 2023-08-02T03:41:58Z

lightrec is a MIPS-to-everything dynamic re-compiler (aka JIT compiler or dynrec) for PlayStation emulators, using GNU Lightning as the code emitter. Features:

High-level optimizations. The MIPS code is first pre-compiled into a form of Intermediate Representation (IR). Basically, just a single-linked list of structures representing the instructions. On that list, several optimization steps are performed: instructions are modified, reordered, tagged; new meta-instructions can also be added.
Lazy compilation. If Lightrec detects a block of code that would be very hard to compile properly (e.g. a branch with a branch in its delay slot), the block is marked as not compilable, and will always be emulated with the built-in interpreter. This allows to keep the code emitter simple and easy to understand.
Run-time profiling. The generated code will gather run-time information about the I/O access (whether they hit RAM, or hardware registers). The code generator will then use this information to generate direct read/writes to the emulated memories, instead of jumping to C for every call.
Threaded compilation. When entering a loading zone, where a lot of code has to be compiled, we don't want the compilation process to slow down the pace of emulation. To avoid that, the code compiler optionally runs on a thread, and the main loop will emulate the blocks that have not been compiled yet with the interpreter. This helps to drastically reduce the stutter that typically happens when a lot of new code is run.

Check optimizer.c, blockcache.c, and TLSF for the implementation.

Test hardware: Desktop PC – Core i7 7700k, Windows 10

Game	Interpreter (No Dithering)	Interpreter (With Dithering)	Dynarec (No Dithering)	Dynarec (With Dithering)
Final Doom	246fps	245fps	621fps	616fps
Resident Evil	250fps	248fps	642fps	639fps
Tekken 3	190fps	175fps	279fps	250fps

jserv · 2023-08-03T02:06:17Z

The copyjit draws inspiration from the paper "Copy-and-Patch Compilation." However, what if patching could be entirely eliminated?

The core concept revolves around using the compiler to generate 'templates' that can be directly copied into place. This approach heavily relies on continuation passing, which means that all operations defined by the jit library must allow for continuation passing optimizations. In copy-and-patch, the templates are filled in at runtime with user-selected values. Unfortunately, this method relies on parsing ELF relocations, which necessitates porting the library to different platforms. While not a major issue, avoiding runtime patching of relocations could potentially enable the creation of a JIT library that is architecture agnostic and offers very low latencies.

bcgen generates a number of files in a directory called gen in the working directory. These generated files are included by bcode.c, which you can compile into an object file that provides an interface to compiling and running bytecode system.

See also: A Template-Based Code Generation Approach for MLIR

jserv · 2023-08-16T04:03:26Z

QEMU employs a two-step process for executing binaries, involving an intermediate representation known as tiny code. This tiny code is interpreted in two ways: first through emulation and second via compilation into native code using a JIT compiler, often leading to enhanced speed.

However, the use of JITs demands the allocation of executable memory to house the compiled code, which is not permitted in iOS. To circumvent this restriction, a technique is employed that involves reusing portions of code that are already in executable memory. This concept takes on various names such as code re-use, ROP (Return-Oriented Programming), and ret2code. It is formalized as "weird machines" due to the differing semantics between the original code and final execution.

This process involves the creation of code gadgets, such as ldr x0, [sp], #16; ret (Aarch64), which execute an action and return to their caller upon completion of the first instruction. These gadgets are chained together, forming sequences that achieve specific objectives. The qemu-tcg-tcti project has developed a script that generates pre-compiled code snippets (gadgets) with different functionalities. These gadgets are compiled during the iOS app's build process and are mapped to executable memory, eliminating the need for runtime creation. The final step is the creation of a JIT that constructs a memory segment containing values for use by these gadgets before transitioning to the next one.

This inventive approach allows the creation of complete programs by reusing existing code, a technique historically employed for creating exploits. It provides a creative solution for implementing JIT compilers in architectures that disallow the allocation of executable memory. See commit 4de86e.

UTM already merges the above qemu-tcg-tcti effort (see patches directory):

UTM SE ("slow edition") uses a threaded interpreter which performs better than a traditional interpreter but still slower than JIT. This technique is similar to what iSH does for dynamic execution. As a result, UTM SE does not require jailbreaking or any JIT workarounds and can be sideloaded as a regular app.

Report on Apr 1 2021:

TCI (normal interpreter) boot this VM in 130 seconds
TCTI (threaded interpreter) boots it in only 22 seconds

Reference:

jserv · 2023-09-06T09:58:30Z

pylbbv is a lazy basic block versioning + copy and patch JIT interpreter for CPython.

The copy-and-patch JIT compiler uses a stencil compiler.

At runtime, each basic block, except the branches (exits) are compiled to machine code.
If compilation is successful, execution jumps into the machine code rather than the interpreter bytecode.
The branches remain as CPython interpreter bytecode, to faciliatate easy branching.
Upon encountering a branch, the interpreter leaves the machine code to go back into bytecode.
Execution thus interleaves between machine code and the interpreter.

jserv · 2023-09-10T16:56:28Z

luajit-remake transforms an LLVM function to make it suitable for compilation and back-parsing into a copy-and-patch stencil.

The transformation process involves the following steps:

The function is split into two parts: the fast path and the slow path. The identification of the slow path logic is done using BlockFrequencyInfo, and proper annotations are added to the LLVM IR to enable the identification and separation of the slow path during assembly generation.

A simple heuristic is applied to modify the assembly code. This modification ensures that the function falls through to the next bytecode when it might be beneficial.
The pass comprises two phases: one at the LLVM IR level (IR to IR transformation) and another at the ASM (.s file) level (ASM to ASM transformation).

It is important to note that the IR-level rewrite pass should be executed immediately before the LLVM module is compiled to assembly. Once this pass is applied, no further transformations to the LLVM IR are allowed.

jserv · 2023-09-21T09:38:10Z

Jonathan Müller has an excellent talk on A deep dive into dispatching techniques. He compared the manual jump table and the one generated by optimizing compiler.

Manual jump table

movzx eax, byte ptr [rbx] ; rax := ip->op
jmp qword ptr [r13 + 8*rax] ; goto *execute_table[rax]

Switch jump table

movzx eax, byte ptr [rbx] ; rax := ip->op
movsxd rax, dword ptr [r13 + 4*rax] ; rax := execute_table[rax]
add rax, r13 ; rax := rax + &execute_table
jmp rax ;goto

Compiler generates jump table with 4 byte relative offsets, not 8 byte absolute offsets, resulting faster execution on Intel Core i5-1145G7.

jserv · 2023-10-12T02:42:51Z

WebAssembly Micro Runtime (WAMR) is a lightweight standalone WebAssembly (Wasm) runtime with small footprint, high performance and highly configurable features for applications cross from embedded devices.

jserv · 2023-11-11T18:18:01Z

Possible lightweight JIT framework:

TildeBackend (Tilde or TB for short)
Cwerg
dstogov/ir used by PHP

jserv · 2023-11-30T18:04:22Z

The core of the security concern lies in the inherent complexity of the system. Even extensively used and battle-tested tools like wasmtime have experienced severe vulnerabilities, such as the recent critical bug that could potentially lead to remote code execution (as seen in Guest-controlled out-of-bounds read/write on x86_64 · bytecodealliance/wasmtime).

The strategy employed here, assuming it progresses beyond the experimental phase, comprises three key elements to ensure robust security:

Simplicity: A commitment to keeping all components as straightforward as possible. Simplicity enhances security by reducing the attack surface.
Thorough Testing: The application of exhaustive and comprehensive testing procedures across all aspects of the system, leaving no stone unturned.
Rigorous Sandboxing: Implementing stringent sandboxing mechanisms akin to a highly secure facility, to ensure that even if an attacker were to break out of the virtual machine and achieve remote execution, their capabilities would be severely restricted, potentially limited to consuming the host's CPU resources at most.

jserv · 2023-11-30T18:20:18Z

dstogov/ir used by PHP

An experimental JIT for PHP, built upon dstogov/ir project, has been developed and can be found in the master branch of the php-src repository.

By following the provided build instructions, we can build a development version of PHP, which will display the following:

$ sapi/cli/php --version
PHP 8.4.0-dev (cli) (built: Dec  1 2023 01:59:26) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.4.0-dev, Copyright (c) Zend Technologies

Check if opcache s loaded

$ sapi/cli/php -v | grep -i opcache

Disable opcache.JIT

$ sapi/cli/php -d opcache.jit=off Zend/bench.php
..
Total              0.310

(unit: second)

Enable opcache.JIT

$ sapi/cli/php -d opcache.jit=tracing Zend/bench.php
...
Total              0.089

jserv · 2023-12-25T18:13:44Z

Whose baseline compiler is it anyway? by Ben L. Titzer

We show the design of a new single-pass compiler for a research Wasm engine that integrates with an in-place
interpreter and host garbage collector using value tags, while also supporting flexible instrumentation. In experiments, we measure the effectiveness of optimizations targeting value tags and find, somewhat surprisingly, that the runtime overhead can be reduced to near zero. We also assess the relative compile speed and execution time of six baseline compilers and place these baseline compilers in a two-dimensional tradeoff space with other execution tiers for Wasm.

jserv · 2023-12-30T08:54:12Z

The concept of delay slot in MIPS was initially a straightforward solution to manage pipeline hazards in five-stage pipelines. However, it became a challenge for processors with longer pipelines and the ability to issue multiple instructions per clock cycle. From a software perspective, delay slot has drawbacks, making programs harder to read and often less efficient due to frequently inserting nop (no operation) instructions in the delay slot.

Historically, in the 1980s, the idea of branch delay slot made sense for pipelines consisting of 5 or 6 stages, as it helped to mitigate the one-cycle branch penalty inherent in these systems. But with the evolution of processor architectures, this approach has become outdated. For instance, in modern Pentium microarchitectures, the branch penalty can range from 15 to 25 cycles, rendering a single instruction delay slot ineffective. Implementing a delay slot that could accommodate a 15-instruction delay would be impractical and would disrupt the compatibility of instruction sets.

Advancements in technology have introduced more efficient solutions. Branch prediction, now a mature technology, has proven to be more efficient. The rate of misprediction with current branch predictors is significantly lower than the occurrence of branches with a nop delay slot. This holds true even in systems with a relatively short 6-cycle delay, like the Nios II architecture.

Given these considerations, both in terms of hardware and software efficiency, delay slots are less advantageous. Therefore, modern architectures like RISC-V have chosen to omit the delay slot feature, aligning with current technological capabilities and requirements.

The lightrec, a MIPS recompiler that employs GNU Lightning for code emission, must handle the delay slot characteristic of MIPS. This feature, however, is not present in RISC-V and other more recent RISC designs, which typically exclude the delay slot. This omission reflects a broader trend in newer RISC architectures to move away from this once-common design element.

jserv · 2024-01-08T15:01:08Z

rv64_emulator is a RISC-V ISA emulation suite which contains a full system emulator and an ELF instruction frequency analyzer, with JIT compiler for Arm64.

This was referenced Dec 18, 2022

Investigate interpreter dispatch methods #97

Closed

Deliver an integrated program to illustrate SDL-oriented system calls #101

Closed

jserv mentioned this issue Apr 19, 2023

Rework IR as a graph of extended basic blocks #104

Closed

jserv assigned qwe661234 Apr 20, 2023

jserv mentioned this issue May 27, 2023

Introduce preliminary macro operation fusion #132

Merged

jserv mentioned this issue Nov 30, 2023

Implement fast tier-1 JIT compiler #283

Closed

jserv added the research Study certain topics label Dec 25, 2023

jserv assigned vacantron Jun 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use portable JIT compilation for accelerating RISC-V emulation #81

Use portable JIT compilation for accelerating RISC-V emulation #81

jserv commented Nov 4, 2022 •

edited

Loading

jserv commented Nov 17, 2022

Ma-Y commented Nov 17, 2022

jserv commented Nov 23, 2022

jserv commented Nov 23, 2022 •

edited

Loading

jserv commented Nov 28, 2022 •

edited

Loading

qwe661234 commented Apr 21, 2023

qwe661234 commented May 3, 2023

qwe661234 commented May 30, 2023

jserv commented Jun 15, 2023

qwe661234 commented Jul 4, 2023 •

edited by jserv

Loading

jserv commented Jul 5, 2023

jserv commented Jul 10, 2023 •

edited

Loading

jserv commented Aug 2, 2023 •

edited

Loading

jserv commented Aug 3, 2023 •

edited

Loading

jserv commented Aug 16, 2023 •

edited

Loading

jserv commented Sep 6, 2023

jserv commented Sep 10, 2023

jserv commented Sep 21, 2023

jserv commented Oct 12, 2023

jserv commented Nov 11, 2023

jserv commented Nov 30, 2023

jserv commented Nov 30, 2023

jserv commented Dec 25, 2023

jserv commented Dec 30, 2023 •

edited

Loading

jserv commented Jan 8, 2024

Use portable JIT compilation for accelerating RISC-V emulation #81

Use portable JIT compilation for accelerating RISC-V emulation #81

Comments

jserv commented Nov 4, 2022 • edited Loading

jserv commented Nov 17, 2022

Ma-Y commented Nov 17, 2022

jserv commented Nov 23, 2022

jserv commented Nov 23, 2022 • edited Loading

jserv commented Nov 28, 2022 • edited Loading

qwe661234 commented Apr 21, 2023

qwe661234 commented May 3, 2023

qwe661234 commented May 30, 2023

clang

mir

jserv commented Jun 15, 2023

qwe661234 commented Jul 4, 2023 • edited by jserv Loading

Ongoing

jserv commented Jul 5, 2023

jserv commented Jul 10, 2023 • edited Loading

jserv commented Aug 2, 2023 • edited Loading

Test hardware: Desktop PC – Core i7 7700k, Windows 10

jserv commented Aug 3, 2023 • edited Loading

jserv commented Aug 16, 2023 • edited Loading

jserv commented Sep 6, 2023

jserv commented Sep 10, 2023

jserv commented Sep 21, 2023

jserv commented Oct 12, 2023

jserv commented Nov 11, 2023

jserv commented Nov 30, 2023

jserv commented Nov 30, 2023

jserv commented Dec 25, 2023

jserv commented Dec 30, 2023 • edited Loading

jserv commented Jan 8, 2024

jserv commented Nov 4, 2022 •

edited

Loading

jserv commented Nov 23, 2022 •

edited

Loading

jserv commented Nov 28, 2022 •

edited

Loading

qwe661234 commented Jul 4, 2023 •

edited by jserv

Loading

jserv commented Jul 10, 2023 •

edited

Loading

jserv commented Aug 2, 2023 •

edited

Loading

jserv commented Aug 3, 2023 •

edited

Loading

jserv commented Aug 16, 2023 •

edited

Loading

jserv commented Dec 30, 2023 •

edited

Loading