Link to the compiled function to improve performance #12182
base: master
This patch does not conflict with #12079.
Benchmark shows a 1.59% regression for Zend/bench.php JIT. That benchmark is generally the most stable, so I would consider this legitimate. Symfony Demo and WordPress show improvements (-0.65% and -0.07%, respectively).
Tracing over the already compiled function was done on purpose. This opens possibilities for new specializations and optimizations (similar to LuaJIT). I'll take a look a bit later (probably next week). I think the patch may be improved with a slightly smarter heuristic: link to the previous trace only if the trace of the inlined function becomes too long.
Maybe we can add a parameter to link to the previous trace only if the trace of the inlined function becomes too long.
Yeah, of course you can. You may add something like
98d7480 to da96e5f
Yeah, I tried it in my experiments. The smaller value opcache.jit_trace_inline_limit
I have updated this.
2da8d10 to 8ffa0e9
} else if ( idx > JIT_G(jit_trace_inline_func_limit) && \
        backtrack_link_to_inline_func < 0 && \
        (ZEND_OP_TRACE_INFO(opline, offset)->trace_flags & ZEND_JIT_TRACE_JITED)) {
I think you check idx improperly and in the wrong place.
It should be checked in the next chunk, like
} else if (backtrack_link_to_inline_func > 0 &&
        idx - backtrack_link_to_inline_func > JIT_G(jit_trace_inline_func_limit)) {
    ...
We also don't use backslashes in multi-line if conditions.
Also, if we successfully inlined the function into the trace, we should reset backtrack_link_to_inline_func.
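A minimal sketch of the review points above, written against a toy event stream rather than the real recorder. Everything except the name backtrack_link_to_inline_func is hypothetical (the event_kind enum, find_backtrack_point, the -1 sentinel), and none of it is the actual php-src code. It only illustrates measuring the length from the entry of the inlined function and resetting the marker once a call has been inlined completely.

```c
#include <stddef.h>

/* Hypothetical, simplified trace events; not the real php-src structures. */
typedef enum {
    EV_OP,           /* an ordinary opcode */
    EV_ENTER_JITED,  /* call into a function that already has JIT code */
    EV_ENTER_PLAIN,  /* call into a function without JIT code */
    EV_RETURN        /* return from the innermost inlined call */
} event_kind;

/*
 * Returns the index the trace should be cut back to (the entry of an
 * already-compiled function whose inlined body grew past the limit),
 * or -1 if every call fits and can stay inlined.
 */
int find_backtrack_point(const event_kind *events, size_t count,
                         int inline_func_limit)
{
    int backtrack_link_to_inline_func = -1; /* -1 means "no candidate" */
    int depth = 0; /* call depth relative to the candidate's entry */

    for (size_t i = 0; i < count; i++) {
        int idx = (int)i;

        if (events[i] == EV_ENTER_JITED || events[i] == EV_ENTER_PLAIN) {
            if (backtrack_link_to_inline_func >= 0) {
                depth++; /* nested call inside the candidate */
            } else if (events[i] == EV_ENTER_JITED) {
                backtrack_link_to_inline_func = idx; /* remember the entry */
                depth = 1;
            }
        } else if (events[i] == EV_RETURN &&
                   backtrack_link_to_inline_func >= 0) {
            if (--depth == 0) {
                /* The call was inlined completely: reset the marker. */
                backtrack_link_to_inline_func = -1;
            }
        }

        /* Measure from the entry of the inlined function, not from idx 0. */
        if (backtrack_link_to_inline_func >= 0 &&
            idx - backtrack_link_to_inline_func > inline_func_limit) {
            return backtrack_link_to_inline_func;
        }
    }
    return -1;
}
```

In this model a sequence of short getters never trips the limit, because each completed call resets the marker, while a single long call body does.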
My idea is that when the trace is too long and idx exceeds the limit value, we check at the start of the inline function whether it has already been compiled.
Do you mean to just judge the length of inline functions?
> My idea is that when the trace is too long and idx exceeds the limit value
Then the name opcache.jit_trace_inline_func_limit doesn't reflect what you are doing, and you might stop tracing directly without "backtracking".
I think your idea is less obvious and less efficient.
We should be able to form quite long traces with many short getters and setters inlined.
> Do you mean to just judge the length of inline functions?
Yes.
Dear maintainer, I hope to get your reply.
} else if (ZEND_OP_TRACE_INFO(opline, offset)->trace_flags & ZEND_JIT_TRACE_JITED) {
    backtrack_link_to_inline_func = idx;
    link_to_inline_func_opline = opline;
}
if (backtrack_link_to_inline_func > 0 &&
        idx - backtrack_link_to_inline_func > JIT_G(jit_trace_inline_func_limit)) {
    break;
}
It's hard to say without a full patch. This is something similar, but you do break without setting end_opline and stop. Am I missing something?
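To illustrate the concern about leaving the loop without recording where and why recording ended, here is a small self-contained sketch. The stop_reason values, record_result, and record_trace are all hypothetical names made up for this example. The real recorder uses its own end_opline and stop variables, which this toy model only imitates with an index and an enum.

```c
#include <stddef.h>

/* Hypothetical stop reasons; the real recorder defines its own set. */
typedef enum {
    STOP_END_OF_INPUT,
    STOP_TOO_LONG,
    STOP_LINK_TO_COMPILED
} stop_reason;

typedef struct {
    stop_reason stop; /* why recording ended */
    int end_index;    /* first position that is not part of the trace */
} record_result;

/*
 * enters_compiled[i] is nonzero when position i enters a function that
 * already has JIT code. Every exit path fills in both result fields.
 */
record_result record_trace(const int *enters_compiled, size_t count,
                           int inline_func_limit, int max_trace_length)
{
    record_result result = { STOP_END_OF_INPUT, (int)count };
    int mark = -1; /* entry index of the compiled function, -1 if none */

    for (size_t i = 0; i < count; i++) {
        int idx = (int)i;

        if (idx >= max_trace_length) {
            result.stop = STOP_TOO_LONG;
            result.end_index = idx;
            break;
        }
        if (enters_compiled[i] && mark < 0) {
            mark = idx;
        }
        if (mark >= 0 && idx - mark > inline_func_limit) {
            /* Set both fields before leaving the loop, then break. */
            result.stop = STOP_LINK_TO_COMPILED;
            result.end_index = mark; /* cut the trace back to the entry */
            break;
        }
    }
    return result;
}
```

A bare break in the last branch would leave the caller with the default stop reason and end position, which is the situation the question above points at.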
Okay, let me post an update. Each time we enter a function while recording, we check the length of the inlined function. I get a 1% TPS gain on the WordPress benchmark. I hope to get your review.
I have updated the patch. How about that?
> I have updated the patch and how about that ?
I'll be able to review this only on Monday.
When JIT is recording, backtrack the trace if encountering a compiled inline function and link to this function later. This reduces the runtime compilation overhead and duplicated JITTed code. Smaller code size has better cache efficiency, which brings 1.0% performance gain in our benchmark on x86.

Signed-off-by: Wang, Xue <[email protected]>
Signed-off-by: Yang, Lin A <[email protected]>
Signed-off-by: Su, Tao <[email protected]>
8ffa0e9 to 72f219b
@wxue1 could you please test the behaviour of your patch
test.php
<?php
class Foo {
    private $x = 0, $y = 0;
    function getX() {
        return $this->x * $this->x + $this->y * $this->y;
    }
}
$o = new Foo();
for ($i = 0; $i < 10; $i++) {
    $o->getX($i);
}
?>
$ sapi/cli/php -d opcache.jit=1254 -d opcache.jit_hot_func=2 -d opcache.jit_hot_loop=2 -d opcache.jit_trace_inline_func_limit=3 -d opcache.jit_debug=0x80000 test.php
---- TRACE 1 TSSA start (loop) $main() /home/dmitry/php/php-master/CGI-RELEASE-64/test.php:9
;#0.CV0($o) [!undef, ref, rc1, rcn, any]
;#1.CV1($i) [undef, ref, rc1, rcn, any]
LOOP:
;#3.CV1($i) [!undef, ref, rc1, rcn, any] = Phi(#1.CV1($i) [undef, ref, rc1, rcn, any], #13.CV1($i) [undef, ref, rc1, rcn, any])
0009 #4.T2 [bool] = IS_SMALLER #3.CV1($i) [!undef, ref, rc1, rcn, any] int(10) ; op1(int)
0010 ;JMPNZ #4.T2 [bool] 0005
0005 INIT_METHOD_CALL 1 #0.CV0($o) [!undef, ref, rc1, rcn, any] string("getX") ; op1(object of class Foo)
>init Foo::getX
0006 SEND_VAR_EX #3.CV1($i) [!undef, ref, rc1, rcn, any] -> #5.CV1($i) [!undef, ref, rc1, rcn, any] 1 ; op1(int)
0007 DO_FCALL
>enter Foo::getX
0000 #6.T0 [!long] = FETCH_OBJ_R THIS string("x") ; val(int)
0001 #7.T2 [!long] = FETCH_OBJ_R THIS string("x") ; val(int)
0002 #8.T1 [!long] = MUL #6.T0 [!long] #7.T2 [!long] ; op1(int) op2(int)
0003 #9.T0 [!long] = FETCH_OBJ_R THIS string("y") ; val(int)
0004 #10.T3 [!long] = FETCH_OBJ_R THIS string("y") ; val(int)
0005 #11.T2 [!long] = MUL #9.T0 [!long] #10.T3 [!long] ; op1(int) op2(int)
0006 #12.T0 [!long] = ADD #8.T1 [!long] #11.T2 [!long] ; op1(int) op2(int)
0007 RETURN #12.T0 [!long] ; op1(int)
<back /home/dmitry/php/php-master/CGI-RELEASE-64/test.php
0008 PRE_INC #5.CV1($i) [!undef, ref, rc1, rcn, any] -> #13.CV1($i) [undef, ref, rc1, rcn, any] ; op1(int)
---- TRACE 1 TSSA stop (loop)
Your patch is intended to limit inlining of functions above the specified length (3), but it doesn't do so (a function of length 8 is inlined). What is wrong?
Since you propose this as a performance improvement, it would be great to see some benchmark results. I'll need to repeat that benchmark and rerun my own to confirm the improvement.
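The "length 8" figure can be checked mechanically from a dump like the one above. The helper below is a hypothetical sketch, not part of php-src. It assumes the dump format shown here, where an inlined call is bracketed by a ">enter" line and a "<back" line and each recorded opcode line starts with its opline number.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the largest number of opcode lines found between an outermost
 * ">enter" line and its matching "<back" line. Opcodes of nested calls
 * are counted as part of the outermost inlined call.
 */
int max_inlined_length(const char *const *lines, size_t count)
{
    int depth = 0;   /* current inlining depth */
    int current = 0; /* opcodes seen in the current outermost call */
    int longest = 0;

    for (size_t i = 0; i < count; i++) {
        const char *line = lines[i];

        while (*line == ' ' || *line == '\t') {
            line++; /* skip indentation */
        }
        if (strncmp(line, ">enter", 6) == 0) {
            if (depth == 0) {
                current = 0;
            }
            depth++;
        } else if (strncmp(line, "<back", 5) == 0) {
            if (depth > 0 && --depth == 0 && current > longest) {
                longest = current;
            }
        } else if (depth > 0 && isdigit((unsigned char)*line)) {
            current++; /* a line such as "0003 ... FETCH_OBJ_R ..." */
        }
    }
    return longest;
}
```

For the Foo::getX call in the dump, the lines 0000 through 0007 between ">enter" and "<back" give a length of 8, which is above the configured limit of 3.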
For this case, where the function is only inlined once, this patch allows inlining. I know you want to backtrack to FuncA as long as FuncB is too long, whether or not the function has been JITTed. Or maybe we could return to the original, simpler code? patch2
Actually, this patch, "Link to the compiled function to improve performance", is different from the previous patch about JIT long inline functions (PR #10897).
WordPress JIT Memory 1212kb -> 1019kb
When JIT is recording, backtrack the trace if encountering a compiled inline function and link to this function later. This reduces the runtime compilation overhead and duplicated JITTed code. Smaller code size has better cache efficiency, which brings 1.7% performance gain in our benchmark on x86.