Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(composeMatrix): 25% improv by restoring v5 implementation #9851

Merged
merged 7 commits into from
May 19, 2024

Conversation

jiayihu
Copy link
Contributor

@jiayihu jiayihu commented May 5, 2024

Based on the findings of fabricjs/canvas-engines-comparison#1 and my own experiments because I'm writing an article about analysing performance (soon to be published 😏), I saw that composeMatrix performance can be significantly improved for non-rotated/scaled objects by just returning createTranslateMatrix(translateX, translateY).

Then I remembered that @ShaMan123 mentioned to me that he changed the functions to be more declarative in v6, so I looked back at how it was and noticed indeed that it was more performant before: d0d0cfb#diff-c771d7d885099c09929056e6d495faa0cd32bd1269dda11623cd1dddf4122b59

Previously we would avoid uselessly multiplying matrixes if not needed. In my tests this simple change can result in ~25% improved performance, without any API change.

You can test it yourself @asturur, I've prepare a benchmark clone to try out changes https://codesandbox.io/p/sandbox/fabric-bench-forked-4xfvnd. Just change this line in OptimisedRect#calcOwnMatrix:

const value = composeMatrix(options);

To

const value = fabric.util.composeMatrix(options);

The top-right FPS meter is by no means accurate but it's decent indicator of % of improvement. The Chrome Devtools flame chart showed me that renderCanvas time changed from 45ms to 35ms for 16k objects, around 28% improvement. Alternatively you can use the Chrome FPS meter. Whatever you use to measure, there's no doubt about the performance change.

You can also try out the other changes yourself, such as removing the cache key completely in calcOwnMatrix improves the FPS meter by 50% for 16k objects, 80% for 8k objects. But changing cache key is less trivial so we can discuss that on a separate PR/issue.

Copy link

codesandbox bot commented May 5, 2024

Review or Edit in CodeSandbox

Open the branch in Web EditorVS CodeInsiders

Open Preview

@jiayihu jiayihu changed the title perf(composeMatrix): restore v5 implementation perf(composeMatrix): 25% improv by restoring v5 implementation May 5, 2024
@asturur
Copy link
Member

asturur commented May 6, 2024

i know the old was faster because the mutliplyArray has an extra array and a free wasted multiplication by identity. I think i tried to add this conversation in the past too.

Please leave a comment in both places of the code linking to the old PR: #8894 and writing down that this code has been reverted after changes because of performance reasons and that if we want to change it back we first need to be sure the new implementation has some gains compared to the old one

When we change some code and then we change it back for performance reason i tried to add the habit to leave a small note in a benchmark folder that compares the old code with the new one, i will add that so you don't have to bother

@asturur asturur requested a review from ShaMan123 May 6, 2024 07:28
Copy link
Contributor

@ShaMan123 ShaMan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is to blame? Array#reduce or the array creation or both?
I am interested
I would argue then that perhaps we should do a manual calculation in this hot path not relaying on multiplyTransformMatrices if that increases perf even more.
It is not a lot of code

let a,b,c,d,e,f
e = translateX
f = translateY
...

@jiayihu
Copy link
Contributor Author

jiayihu commented May 6, 2024

Who is to blame? Array#reduce or the array creation or both?

It's the array creation and multiplication. Mostly multiplication.

I would argue then that perhaps we should do a manual calculation in this hot path not relaying on multiplyTransformMatrices if that increases perf even more.

You can try it out quickly in the CodeSandbox to see if it improves FPS with 8/16k objects.

@asturur
Copy link
Member

asturur commented May 6, 2024

to blame is generic code over specialized one, in general.

Is not reduce per se, is adding a function stack for each multiplication and a free entire multiplication

In the case of translation only:

export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D, curr) =>
      curr ? multiplyTransformMatrices(curr, product, is2x2) : product,
    iMatrix
  );

We arrive here with an array that we created to wrap the matrix, we start a reduce and we multiply the translate matrix by the identity matrix that serves as initial value.

When N goes up the initial mutliplication cost is diluted, but for a single matrix is 100% . But given the fact that our top case is 3-4 matrices, this extra multiplication is always a consistent percentage
On top of that for each mutliplication we have to enter in and out of the stack of the function.

The old implementation in the best case scenario just pass the matrix through the functions and returns it.

The other possible improvement is to change the code so that translate matrix is created only when there is no rotation but that remains not useful till we have originX and originY

Comment on lines +280 to 283
let matrix = createScaleMatrix(
flipX ? -scaleX : scaleX,
flipY ? -scaleY : scaleY
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i m sure at some point we were checking for scale !== 1 and flip thruty to also do this.
I m not sure is worth to create it rather than returning the iMatrix as a reference and then check by strict equality.

calcOwnMatrix has been split for reusal across fabric but maybe was a bad idea, it was good as a code block doing its thing

Copy link
Contributor Author

@jiayihu jiayihu May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I m not sure is worth to create it rather than returning the iMatrix as a reference and then check by strict equality.

I thought about that but it's dangerous in fabric returning the same reference, as the iMatrix array could be mutated with terrible consequences

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iMatrix is frozen and will throw if you try to mutate it, we can't return it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but maybe it is quicker to create it than to clone it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't mutate matrix we always create new i don't see any issue with returning iMatrix when necessary. but anyway this PR is a revert basically, so is fine as it is it does not need more changes apart the code comments

@ShaMan123
Copy link
Contributor

Who is to blame? Array#reduce or the array creation or both?

It's the array creation and multiplication. Mostly multiplication.

But it shouldn't do more multiplications, should it? It should skip if an entry is falsy

@asturur
Copy link
Member

asturur commented May 6, 2024

Who is to blame? Array#reduce or the array creation or both?

It's the array creation and multiplication. Mostly multiplication.

But it shouldn't do more multiplications, should it? It should skip if an entry is falsy

The initial value is always multiplicated by the first matrix that comes from the array. So the first matrix is always multiplied by Imatrix

@ShaMan123
Copy link
Contributor

We arrive here with an array that we created to wrap the matrix, we start a reduce and we multiply the translate matrix by the identity matrix that serves as initial value.

Nice catch! that is the problem here, the initial value.
So I wish to try out the reduce without an initial value because since there is always an initial value the condition never applies to reduce calculation! curr is always define!
I am to blame!

export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D, curr) =>
      curr ? multiplyTransformMatrices(curr, product, is2x2) : product,
    iMatrix
  );

@jiayihu
Copy link
Contributor Author

jiayihu commented May 6, 2024

Bear in mind that multiplyTransformMatrixArray is not used anymore in composeMatrix in this PR so this is true:

But it shouldn't do more multiplications, should it? It should skip if an entry is falsy

@asturur
Copy link
Member

asturur commented May 6, 2024

We do not need to dig into this is a matter of time to dedicate to other things.
We figured out there is a subtantial difference between new and old, we put back the old.
We move on.
We know we will need to remove originX and originY and the matrix logic will change again, there is also matrix caching in the middle of this conversation, this code needs to be re-evaluated again.
Now trying to add code inside the reduce function to skip the initial multiplication, but keep the function stack and the array loop, why? what gain does it bring?

@ShaMan123
Copy link
Contributor

ShaMan123 commented May 6, 2024

A simple gain.
We understand what was wrong and we fix.
Why not?
multiplyTransformMatrixArray is heavily used everywhere. Why leave this bug? What gain does that give?

Here is the fix:


export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D|undefined, curr) =>
      curr && product? fabric.util.multiplyTransformMatrices(curr, product, is2x2) : curr||product,
    undefined
  ) || fabric.iMatrix.concat();

const composeMatrix = ({
  translateX,
  translateY,
  angle,
  scaleX, 
  scaleY, skewX,skewY,flipX,flipY,
}) => {
  return multiplyTransformMatrixArray([
    (translateX || translateY) && createTranslateMatrix(translateX, translateY),
    angle && createRotateMatrix({ angle }),
    ((scaleX && scaleX !== 1) || (scaleY && scaleY !== 1) || skewX || skewY || flipX || flipY) && calcDimensionsMatrix({ scaleX, scaleY, skewX,skewY,flipX,flipY,})
  ]);
};

@ShaMan123
Copy link
Contributor

ShaMan123 commented May 6, 2024

@jiayihu please validate that the perf is the same compared to v5
I do not see any difference using https://codesandbox.io/p/sandbox/fabric-bench-composematrix-forked-r2w2sz

@asturur
Copy link
Member

asturur commented May 6, 2024

Yes you can fix multiplyMatricesArray, but shouldn't be in hot code.

@jiayihu
Copy link
Contributor Author

jiayihu commented May 6, 2024

((scaleX && scaleX !== 1) || (scaleY && scaleY !== 1) || skewX || skewY || flipX || flipY) && calcDimensionsMatrix({ scaleX, scaleY, skewX,skewY,flipX,flipY,})

I find this ugly, I'd keep this PR and additionally do this only:

export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D|undefined, curr) =>
      curr && product? fabric.util.multiplyTransformMatrices(curr, product, is2x2) : curr||product,
    undefined
  ) || fabric.iMatrix.concat();

@ShaMan123
Copy link
Contributor

ShaMan123 commented May 6, 2024

I don't mind cleaning up the code a bit, e.g. exposing a util hasMatrixDomensionsProps
I did it as such due to your v8 optimization comments.
The demo proves that with the fix there is no difference between the v5 code and the v6 so now I believe you guys should defend this PR. You wish to revert to old code that is less compact and adds no benefit.
Why? Please explain

@jiayihu
Copy link
Contributor Author

jiayihu commented May 6, 2024

I think the old code is simpler to understand and it worked well. Extracting a function typically doesn't solve the problem, you're just adding more indirection.

@ShaMan123
Copy link
Contributor

Though I believe we should fix bugs, do as you please.
Feel free to merge or close #9860

@ShaMan123 ShaMan123 closed this May 6, 2024
@ShaMan123 ShaMan123 reopened this May 6, 2024
@asturur
Copy link
Member

asturur commented May 6, 2024

Though I believe we should fix bugs, do as you please.

^^^ This is a comment left in rush.

The change for the extra multiplication is one topic.
Not using loops of functions over matrices in hot code is another topic

If there are bugs in those 2 topics are eventually introduced by swapping the order of multiplications of matrices and float precision.

Both of the open PRs do not address any known bug

@asturur
Copy link
Member

asturur commented May 6, 2024

Can you clarify why you can't fix the implementation of multiplyArray and at the same time we can switch important code to a more lean implementation? What is the issue there?

@ShaMan123
Copy link
Contributor

No issue, I just don't understand why.
You guys found a perf hit.
Then we found the bug that caused it.
I opened a PR to fix it and it seems to have fixed the perf issue that is the reason for this PR.
If there is no benefit why change the code?

@asturur
Copy link
Member

asturur commented May 7, 2024

Because this loop over matrices in hot code is not a good idea. It was brought in as a cleanup and i let go discussing the consequences of it after many messages in the old pr.
Is good for an utility to write terse code but not in one of the most run piece of code.

The perf hit surfaced the occasion to re-discuss it and we can fix the utility and restore the old more verbose, more byte consuming but maybe more appropriate for the situation code.

for me It was a bad idea also to have the calcTransformMatrix in a method that calls an utility that compose the matrices, because even just creating the object to pass down the information and then trash it is wasted time and resources.

When we will get to performances i think we will rollback a bunch of other functions to uglier code

@jiayihu
Copy link
Contributor Author

jiayihu commented May 15, 2024

Any update on the final decision?

@asturur
Copy link
Member

asturur commented May 17, 2024

Sorry i took a little break and i just re-started working on the website today.
Definitely want to merge this and the other.
I ll clean up what is left to clean up and i ll merge

@asturur
Copy link
Member

asturur commented May 19, 2024

Removing the spread operator from the options seems to bring the difference to up to 50%.

@asturur
Copy link
Member

asturur commented May 19, 2024

The wild thing is that if apply the fix to the transform matrix array multiplication in this way

util2.multiplyTransformMatrixArray = (matrices, is2x2) =>
  matrices.reduceRight(
    (product, curr) =>
      curr && product
        ? util2.multiplyTransformMatrices(curr, product, is2x2)
        : curr || product,
    undefined
  ) || [1, 0, 0, 1, 0, 0];

The benchmark is more or less the same in percentage, but everything drops to half, and i can't explain that.

@asturur asturur merged commit 759ee7c into fabricjs:master May 19, 2024
20 of 22 checks passed
@jiayihu
Copy link
Contributor Author

jiayihu commented May 19, 2024

The benchmark is more or less the same in percentage, but everything drops to half, and i can't explain that

Maybe because you're allocating an array each time and it must be garbage collected. On the benchmark (if you refer to the node.js one) maybe it's not significant because the GC doesn't run during it, it can be delayed the sync benchmark iterations has finished. On the browser however it has to do it between frames, dropping them by half.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants