Aborts in trace due to loop unroll limit and leaving loop in root trace #1239
base: wingo-next
Conversation
Clear the penalty slot associated with a bytecode after it is traced successfully. This prevents penalties from accumulating for bytecodes that are traced frequently, such as branchy subroutines that are called from many different root traces and will need to record a side-trace for each. Especially intended to handle the special case when applications are generating code at runtime that creates a "fairly large" number of root traces and that need correspondingly many side-traces to be recorded without prematurely and unproductively blacklisting things.
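To illustrate the idea (a sketch only, not the actual patch; it borrows the HotPenalty cache layout from upstream LuaJIT's lj_jit.h, and the helper name `penalty_clear` is an assumption): once a trace has been recorded successfully for a bytecode, the matching penalty slot is reset so repeated re-tracing never drifts toward the blacklist threshold.

```c
/* Sketch only: reset any penalty entry for a successfully traced bytecode.
 * Uses LuaJIT's round-robin penalty cache (J->penalty[], PENALTY_SLOTS,
 * PENALTY_MIN from lj_jit.h); penalty_clear is a name assumed here. */
static void penalty_clear(jit_State *J, const BCIns *startpc)
{
  uint32_t i;
  for (i = 0; i < PENALTY_SLOTS; i++) {
    if (mref(J->penalty[i].pc, const BCIns) == startpc) {
      J->penalty[i].val = PENALTY_MIN;  /* forget the accumulated penalty */
      break;
    }
  }
}
```

The natural call site would be wherever the recorder commits a finished trace, so a bytecode that aborts occasionally but usually traces fine is never pushed toward blacklisting.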
Increase HOTCOUNT_MAX so that the JIT will make more attempts to trace a bytecode before it blacklists. Expand the HotPenalty.val from 16-bit to 32-bit to accommodate the larger value. HOTCOUNT_MAX is increased from 60,000 to 6,000,000. This is a 100x increase but the effect should be much smaller, log2(100) times, because the penalty value is increased exponentially. I don't entirely understand the existing design of the hotcount penalty: - Why initialize HOTCOUNT_MIN at 36*2? - Why increase the penalty exponentially instead of incrementally? - Why add random entropy to the increases? So I only hope that this patch doesn't break any important properties.
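For context, here is a self-contained toy model (not RaptorJIT source) of the penalty growth: the value starts at 36*2, roughly doubles on every abort plus a few random bits, and the bytecode is blacklisted once the value exceeds the ceiling. Running it shows why a 100x larger ceiling only buys about log2(100) ≈ 7 extra recording attempts.

```c
/* Toy model of the exponential penalty bump (values from the commit text:
 * ceiling raised from 60,000 to 6,000,000; start value 36*2; 4 random bits
 * of entropy per bump).  Not the actual RaptorJIT code. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  unsigned ceilings[] = { 60000u, 6000000u };  /* old and new HOTCOUNT_MAX */
  for (int c = 0; c < 2; c++) {
    unsigned val = 36 * 2;                     /* initial penalty */
    int aborts = 0;
    while (val <= ceilings[c]) {
      val = (val << 1) + (rand() & 15);        /* double + random entropy */
      aborts++;
    }
    printf("ceiling %7u -> ~%d aborts tolerated before blacklisting\n",
           ceilings[c], aborts);
  }
  return 0;
}
```

With these numbers the JIT should retry roughly 17 times instead of roughly 10 before giving up on a bytecode.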
I have worked on this some more and have some additional observations. I am getting around 6 Mpps when running this.
After adding one more app at commit b7b14cc the performance dropped drastically, to around 600 kpps, with most of the time spent in interpreted code. I am wondering what could have caused all the traces to be aborted and the code to fall back to the interpreter. A dump file is attached.
Just wondering which aborted trace could be behind the heavy use of interpreted code, and how that can be identified? It may very well be that many trace aborts are normal and have no bearing on performance, but some could. Feeling kind of lost here :(
Thank you for using a Pull Request to raise this issue. It is very helpful to have the exact code you are working with connected to the bug report. That way the source code line numbers in the trace can be looked up in the right code version. 👍

The JIT log does not explicitly say what is being blacklisted, but it does include a lot of aborts for the same region of code, and that seems likely to be the problem (or in any event something worth understanding):

130930:---- TRACE 112 start 95/18 intel_mp.lua:595
134405:---- TRACE 112 abort intel_mp.lua:596 -- loop unroll limit reached
134408:---- TRACE 112 start 95/18 intel_mp.lua:595
137883:---- TRACE 112 abort intel_mp.lua:596 -- loop unroll limit reached
137886:---- TRACE 112 start 95/18 intel_mp.lua:595
141361:---- TRACE 112 abort intel_mp.lua:596 -- loop unroll limit reached
141364:---- TRACE 112 start 95/18 intel_mp.lua:595
144839:---- TRACE 112 abort intel_mp.lua:596 -- loop unroll limit reached

My working hypothesis is that this code is being blacklisted and that this is causing some important code path to run interpreted. It could be that raptorjit/raptorjit#102 is the solution, i.e. make the JIT give up on the unrolling optimization after a while instead of giving up on the whole trace.

We have seen other instances of blacklistings for code starting in intel_mp lately, right? Does anybody already have some insight into these? (I have "swapped out" the traces that I was looking at before vacation, but maybe somebody can refresh my memory? cc @alexandergall @wingo @eugeneia)
I am seeing quite a lot of trace aborts related to loops, not just in `intel_mp` but in other modules as well. The code in the PR is just a small sample we can use for diagnosis.
Create an execution barrier that a JIT trace cannot cross. During recording, the current trace must end when it reaches a barrier, and a new root trace starts immediately afterwards. This is implemented as a simple "nop" C library function. The existing "trace stitching" mechanism provides the required semantics, i.e. stop the current trace, call the nop function, and start a new trace with linkage.
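A sketch of what such a barrier might look like (hypothetical names, not the actual patch): a do-nothing function registered through the classic Lua/C API. Calls made through that API are not compiled into traces, so every call to it ends the current trace and trace stitching links a fresh trace right after the call returns.

```c
/* Sketch (hypothetical names): an empty lua_CFunction used as a JIT trace
 * barrier.  Calls through the Lua/C API are not compiled into traces, so
 * the recorder must stop here and stitch a new trace after the call. */
#include "lua.h"

static int barrier_nop(lua_State *L)
{
  (void)L;        /* deliberately does nothing */
  return 0;       /* no results */
}

int luaopen_tracebarrier(lua_State *L)
{
  lua_pushcfunction(L, barrier_nop);
  return 1;       /* the module value is the barrier function itself */
}
```

Lua code that wants to force a trace split then just calls this function at the point where one root trace should end and the next should begin.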
Merged lukego/jit-tracebarrier and ran the tests again.
I am using this problem as a test case for improved profiler functions. I have updated the profiler backend in raptorjit/raptorjit#124 with information that should point to a solution, and now I need to update the Studio frontend so that we can see what it means :).
raptorjit: amend prev. commit, also clear trace exit hotcounts
Creating a PR for the loop aborts I am seeing so that @lukego can take a look. This PR is based on `wingo-next` and has `lukego/luajit-reset-penalty-cache` merged in. The test app is at `program/jit_loop/` and a helper app is at `apps/jit_loop/`. The app is basically a bridge that copies packets from one interface to another, rewriting the MAC address before sending them to the next hop. Before running the app, edit the PCI info and MAC address in `jit_loop.lua`. I am running it as:
sudo ./snabb snsh -jdump=+rsxaA,dump.txt -jtprof -p jit_loop
Getting lots of trace aborts as shown below: