perf(composeMatrix): 25% improv by restoring v5 implementation #9851
Conversation
I know the old code was faster because multiplyArray creates an extra array and performs a free, wasted multiplication by the identity. I think I tried to raise this point in the past too. Please leave a comment in both places of the code linking to the old PR (#8894), noting that this code has been reverted for performance reasons, and that if we want to change it back we first need to be sure the new implementation has some gains compared to the old one. When we change some code and then revert it for performance reasons, I have tried to build the habit of leaving a small note in a benchmark folder that compares the old code with the new one; I will add that so you don't have to bother.
Who is to blame? Array#reduce, or the array creation, or both? I am interested.

I would argue then that perhaps we should do a manual calculation in this hot path, not relying on multiplyTransformMatrices, if that increases perf even more. It is not a lot of code:
```ts
let a, b, c, d, e, f;
e = translateX;
f = translateY;
// ...
```
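For illustration, here is a minimal sketch of what such a specialized path could look like, assuming fabric's `[a, b, c, d, e, f]` matrix layout and a translate · rotate · scale composition order. The names (composeFast, degreesToRadians) are illustrative, not the library API, and skew is left out for brevity:

```ts
// Illustrative sketch only: compute the matrix entries by hand instead of
// chaining multiplyTransformMatrices calls. Skew is omitted for brevity.
type TMat2D = [number, number, number, number, number, number];

const degreesToRadians = (deg: number): number => (deg * Math.PI) / 180;

function composeFast(
  translateX: number,
  translateY: number,
  angle: number,
  scaleX: number,
  scaleY: number,
  flipX = false,
  flipY = false
): TMat2D {
  const sx = flipX ? -scaleX : scaleX;
  const sy = flipY ? -scaleY : scaleY;
  if (!angle) {
    // no rotation: write the matrix directly, no intermediate arrays
    return [sx, 0, 0, sy, translateX, translateY];
  }
  const rad = degreesToRadians(angle);
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);
  // translate · rotate · scale, expanded by hand
  return [cos * sx, sin * sx, -sin * sy, cos * sy, translateX, translateY];
}
```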
It's the array creation and multiplication. Mostly multiplication.
You can try it out quickly in the CodeSandbox to see if it improves FPS with 8/16k objects.
To blame is generic code over specialized code, in general. It's not reduce per se; it's adding a function call to the stack for each multiplication, plus a free entire multiplication. In the case of translation only: we arrive here with an array that we created just to wrap the matrix, we start a reduce, and we multiply the translate matrix by the identity matrix that serves as the initial value. When N goes up, the cost of that initial multiplication is diluted, but for a single matrix it is 100%. And given that our typical case is 3-4 matrices, this extra multiplication is always a consistent percentage. The old implementation, in the best-case scenario, just passes the matrix through the functions and returns it. The other possible improvement is to change the code so that the translate matrix is created only when there is no rotation, but that remains not useful until we have originX and originY.
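As an illustration of that point (a sketch, not fabric's source), here is the difference between seeding a reduce with the identity and just passing a single matrix through:

```ts
// Sketch: the generic reduce pays for one identity multiplication even when
// there is a single matrix; the specialized path just returns it.
type TMat2D = [number, number, number, number, number, number];

const iMatrix: TMat2D = [1, 0, 0, 1, 0, 0];

// plain 2D affine multiplication on [a, b, c, d, e, f]
const multiply = (a: TMat2D, b: TMat2D): TMat2D => [
  a[0] * b[0] + a[2] * b[1],
  a[1] * b[0] + a[3] * b[1],
  a[0] * b[2] + a[2] * b[3],
  a[1] * b[2] + a[3] * b[3],
  a[0] * b[4] + a[2] * b[5] + a[4],
  a[1] * b[4] + a[3] * b[5] + a[5],
];

// generic: for [translate] this still runs multiply(iMatrix, translate),
// a full multiplication that changes nothing
const composeGeneric = (matrices: TMat2D[]): TMat2D =>
  matrices.reduce((acc, m) => multiply(acc, m), iMatrix);

// specialized: a single matrix is passed straight through
const composeSpecialized = (matrices: TMat2D[]): TMat2D =>
  matrices.length === 1 ? matrices[0] : matrices.reduce((acc, m) => multiply(acc, m));
```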
```ts
let matrix = createScaleMatrix(
  flipX ? -scaleX : scaleX,
  flipY ? -scaleY : scaleY
);
```
I'm sure at some point we were checking for scale !== 1 and flip being truthy before doing this.
I'm not sure it is worth creating it rather than returning the iMatrix as a reference and then checking by strict equality.
calcOwnMatrix has been split up for reuse across fabric, but maybe that was a bad idea; it was good as a single code block doing its thing.
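If it helps the discussion, here is a small sketch of the guard being described. The stand-ins for fabric's createScaleMatrix and iMatrix are inlined, so this is illustrative rather than the actual util:

```ts
// Sketch of the guard: only allocate a scale matrix when it differs from the identity.
type TMat2D = [number, number, number, number, number, number];

const iMatrix = Object.freeze([1, 0, 0, 1, 0, 0] as TMat2D);
const createScaleMatrix = (x: number, y: number = x): TMat2D => [x, 0, 0, y, 0, 0];

const scaleMatrixOrIdentity = (
  scaleX: number,
  scaleY: number,
  flipX: boolean,
  flipY: boolean
): TMat2D =>
  scaleX !== 1 || scaleY !== 1 || flipX || flipY
    ? createScaleMatrix(flipX ? -scaleX : scaleX, flipY ? -scaleY : scaleY)
    : ([...iMatrix] as TMat2D); // fresh copy, never the frozen shared identity
```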
> I'm not sure it is worth creating it rather than returning the iMatrix as a reference and then checking by strict equality.

I thought about that, but returning the same reference is dangerous in fabric, as the iMatrix array could be mutated with terrible consequences.
iMatrix is frozen and will throw if you try to mutate it; we can't return it.
but maybe it is quicker to create it than to clone it
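For reference, the two options being weighed look roughly like this (illustrative, not the library code):

```ts
// Sketch: cloning the shared frozen identity vs. allocating a fresh literal.
type TMat2D = [number, number, number, number, number, number];

const iMatrix = Object.freeze([1, 0, 0, 1, 0, 0] as TMat2D);

// iMatrix[4] = 10; // would throw in strict mode: the shared array is frozen

const cloned = iMatrix.concat() as TMat2D;  // copy of the shared identity
const created: TMat2D = [1, 0, 0, 1, 0, 0]; // fresh literal, no dependency on iMatrix
```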
We don't mutate matrices, we always create new ones, so I don't see any issue with returning iMatrix when necessary. But anyway, this PR is basically a revert, so it is fine as it is; it does not need more changes apart from the code comments.
But it shouldn't do more multiplications, should it? It should skip if an entry is falsy.
The initial value is always multiplied by the first matrix that comes from the array. So the first matrix is always multiplied by the iMatrix.
Nice catch! That is the problem here: the initial value.
Bear in mind that
We do not need to dig into this; it is a matter of time to dedicate to other things.
A simple gain. Here is the fix:
@jiayihu please validate that the perf is the same compared to v5.
Yes, you can fix multiplyMatricesArray, but it shouldn't be in hot code.
I find this ugly, I'd keep this PR and additionally do this only:

```ts
export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D | undefined, curr) =>
      curr && product
        ? fabric.util.multiplyTransformMatrices(curr, product, is2x2)
        : curr || product,
    undefined
  ) || fabric.iMatrix.concat();
```
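For what it's worth, the behaviour that suggestion aims for would look roughly like this (hypothetical calls, reusing the types and utility from the snippet above):

```ts
// Hypothetical usage: falsy entries are skipped, and an all-falsy input
// falls back to a fresh copy of the identity instead of the frozen iMatrix.
const translate: TMat2D = [1, 0, 0, 1, 10, 20];
const scale: TMat2D = [2, 0, 0, 2, 0, 0];

multiplyTransformMatrixArray([translate, undefined, scale]); // translate · scale, no identity seed
multiplyTransformMatrixArray([false, null, undefined]);      // copy of the identity
```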
I don't mind cleaning up the code a bit, e.g. exposing a util.
I think the old code is simpler to understand and it worked well. Extracting a function typically doesn't solve the problem; you're just adding more indirection.
Though I believe we should fix bugs, do as you please.
^^^ This is a comment I left in a rush. The change for the extra multiplication is one topic. If there are bugs, those would eventually be introduced by swapping the order of matrix multiplications and by float precision; both of the open PRs do not address any known bug.
Can you clarify why you can't fix the implementation of multiplyArray while at the same time we switch important code to a leaner implementation? What is the issue there?
No issue, I just don't understand why.
Because this loop over matrices in hot code is not a good idea. It was brought in as a cleanup, and I stopped discussing its consequences after many messages in the old PR. The perf hit is the occasion to re-discuss it: we can fix the utility and also restore the old code, which is more verbose and consumes more bytes but is maybe more appropriate for the situation. For me it was also a bad idea to have calcTransformMatrix in a method that calls a utility to compose the matrices, because even just creating the object to pass the information down and then trashing it wastes time and resources. When we get to performance work I think we will roll back a bunch of other functions to uglier code.
Any update on the final decision?
Sorry, I took a little break and I just re-started working on the website today.
Removing the spread operator from the options seems to bring the difference to up to 50%. |
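In case it helps to picture it, here is a rough sketch of what dropping the spread means in a hot path (the names are illustrative, not the actual fabric code):

```ts
// Sketch: merging defaults with an object spread allocates a throwaway object
// per call; reading properties with inline fallbacks avoids the allocation.
type TComposeArgs = {
  translateX?: number;
  translateY?: number;
  scaleX?: number;
  scaleY?: number;
};

// allocates a new merged object on every call
const withSpread = (options: TComposeArgs) => {
  const { translateX, translateY, scaleX, scaleY } = {
    translateX: 0,
    translateY: 0,
    scaleX: 1,
    scaleY: 1,
    ...options,
  };
  return [scaleX, 0, 0, scaleY, translateX, translateY];
};

// reads each field directly, no intermediate object
const withoutSpread = (options: TComposeArgs) => {
  const scaleX = options.scaleX ?? 1;
  const scaleY = options.scaleY ?? 1;
  return [scaleX, 0, 0, scaleY, options.translateX ?? 0, options.translateY ?? 0];
};
```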
The wild thing is that if I apply the fix to the transform matrix array multiplication in this way

The benchmark is more or less the same in percentage, but everything drops to half, and I can't explain that.
Maybe because you're allocating an array each time and it must be garbage collected. In the benchmark (if you refer to the node.js one) maybe it's not significant because the GC doesn't run during it; it can be delayed until the sync benchmark iterations have finished. In the browser, however, it has to run between frames, dropping them by half.
Based on the findings of fabricjs/canvas-engines-comparison#1 and my own experiments (because I'm writing an article about analysing performance, soon to be published 😏), I saw that composeMatrix performance can be significantly improved for non-rotated/scaled objects by just returning createTranslateMatrix(translateX, translateY).

Then I remembered that @ShaMan123 mentioned to me that he changed the functions to be more declarative in v6, so I looked back at how it was and noticed that indeed it was more performant before: d0d0cfb#diff-c771d7d885099c09929056e6d495faa0cd32bd1269dda11623cd1dddf4122b59

Previously we would avoid uselessly multiplying matrices when not needed. In my tests this simple change can result in ~25% improved performance, without any API change.
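For readers skimming the thread, here is a hedged sketch of the shape of the pre-v6 approach. It is not the actual diff in d0d0cfb; multiply stands in for multiplyTransformMatrices, and skew handling is compressed:

```ts
// Sketch: multiply only when a component is actually present, so a plain
// translation falls straight through with zero matrix multiplications.
type TMat2D = [number, number, number, number, number, number];

// same 2D affine multiplication as in the earlier sketch
const multiply = (a: TMat2D, b: TMat2D): TMat2D => [
  a[0] * b[0] + a[2] * b[1],
  a[1] * b[0] + a[3] * b[1],
  a[0] * b[2] + a[2] * b[3],
  a[1] * b[2] + a[3] * b[3],
  a[0] * b[4] + a[2] * b[5] + a[4],
  a[1] * b[4] + a[3] * b[5] + a[5],
];

function composeSketch({
  translateX = 0,
  translateY = 0,
  angle = 0,
  scaleX = 1,
  scaleY = 1,
  flipX = false,
  flipY = false,
}): TMat2D {
  // the common case: a pure translate matrix, returned as-is
  let matrix: TMat2D = [1, 0, 0, 1, translateX, translateY];
  if (angle) {
    const rad = (angle * Math.PI) / 180;
    matrix = multiply(matrix, [Math.cos(rad), Math.sin(rad), -Math.sin(rad), Math.cos(rad), 0, 0]);
  }
  if (scaleX !== 1 || scaleY !== 1 || flipX || flipY) {
    matrix = multiply(matrix, [flipX ? -scaleX : scaleX, 0, 0, flipY ? -scaleY : scaleY, 0, 0]);
  }
  return matrix;
}
```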
You can test it yourself @asturur, I've prepared a benchmark clone to try out changes: https://codesandbox.io/p/sandbox/fabric-bench-forked-4xfvnd. Just change this line in OptimisedRect#calcOwnMatrix:

To
The top-right FPS meter is by no means accurate, but it's a decent indicator of the % of improvement. The Chrome Devtools flame chart showed me that renderCanvas time changed from 45ms to 35ms for 16k objects, around a 28% improvement. Alternatively you can use the Chrome FPS meter. Whatever you use to measure, there's no doubt about the performance change.

You can also try out the other changes yourself: for example, removing the cache key completely in calcOwnMatrix improves the FPS meter by 50% for 16k objects and 80% for 8k objects. But changing the cache key is less trivial, so we can discuss that in a separate PR/issue.