chore(optimization): Tiny optimization for pointsInternal #60138

Open
wants to merge 1 commit into
base: master

lbartoletti
Member

Description

This PR optimizes the pointsInternal method by caching trigonometric calculations used in ellipse point generation. The implementation precomputes sine and cosine values before generating ellipse points, reducing the number of expensive trigonometric function calls in the main loop.
Performance testing shows significant improvements for large segment counts (>50), while introducing negligible overhead for the default segment count (~32). This trade-off is acceptable as it optimizes the most computationally intensive cases where high-resolution ellipses are required.
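The caching idea described above can be sketched as follows. This is a minimal standalone illustration, not the code from this PR: the `Ellipse` struct and the `ellipsePointsCached` function are hypothetical names, and the parametric formula is the standard rotated-ellipse equation rather than a copy of the QGIS implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ellipse; this is NOT the QGIS QgsEllipse class.
struct Ellipse
{
  double cx, cy;   // centre
  double a, b;     // semi-major and semi-minor axis
  double azimuth;  // rotation of the major axis, in radians
};

// Returns `segments` points on the ellipse. Every sine and cosine is
// evaluated exactly once and then read back from a cache.
std::vector<std::pair<double, double>> ellipsePointsCached( const Ellipse &e, std::size_t segments )
{
  assert( segments > 0 );

  // Pass 1: cache cos(t) and sin(t) for each parameter value t in [0, 2*pi).
  const double pi = std::acos( -1.0 );
  const double step = 2.0 * pi / static_cast<double>( segments );
  std::vector<double> cosT( segments );
  std::vector<double> sinT( segments );
  for ( std::size_t i = 0; i < segments; ++i )
  {
    const double t = step * static_cast<double>( i );
    cosT[i] = std::cos( t );
    sinT[i] = std::sin( t );
  }

  // The rotation terms do not depend on t, so they are computed once.
  const double cosAz = std::cos( e.azimuth );
  const double sinAz = std::sin( e.azimuth );

  // Pass 2: build the points from cached values only.
  std::vector<std::pair<double, double>> pts;
  pts.reserve( segments );
  for ( std::size_t i = 0; i < segments; ++i )
  {
    pts.emplace_back( e.cx + e.a * cosT[i] * cosAz - e.b * sinT[i] * sinAz,
                      e.cy + e.a * cosT[i] * sinAz + e.b * sinT[i] * cosAz );
  }
  return pts;
}
```

The two cache vectors cost an allocation each, which is one plausible source of the overhead seen at small segment counts.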

benchmark:

| Segments | toLineString (µs) | toLineString2 (µs) |
|---------:|------------------:|-------------------:|
| 4 | 2 | 3 |
| 8 | 2 | 2 |
| 16 | 2 | 2 |
| 32 | 4 | 6 |
| 64 | 6 | 5 |
| 128 | 12 | 9 |
| 256 | 19 | 17 |
| 512 | 37 | 33 |
| 1024 | 73 | 72 |
| 2048 | 153 | 132 |
| 4096 | 303 | 268 |
| 8192 | 607 | 552 |
| 16384 | 1200 | 1123 |
| 32768 | 2510 | 2177 |
| 65536 | 4882 | 4486 |
| 131072 | 10016 | 9122 |
| 262144 | 19958 | 18301 |
| 524288 | 39929 | 36671 |

@github-actions github-actions bot added this to the 3.42.0 milestone Jan 14, 2025
@lbartoletti lbartoletti self-assigned this Jan 14, 2025
@lbartoletti lbartoletti added the API (API improvement only, no visible user interface changes), backport release-3_34, and backport release-3_40 labels Jan 14, 2025

github-actions bot commented Jan 14, 2025

🪟 Windows builds

Download Windows builds of this PR for testing.
Debug symbols for this build are available here.
(Built from commit 712b820)

🪟 Windows Qt6 builds

Download Windows Qt6 builds of this PR for testing.
(Built from commit 712b820)

@uclaros
Contributor

uclaros commented Jan 14, 2025

Are those results on a release build?
I'd expect the compiler/cpu to render this kind of optimization unnecessary.
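Whether a build is optimized matters a great deal for timings of this size. A minimal timing harness could look like the sketch below; it is an assumption about how such a measurement might be set up, not the benchmark used in this PR. `bestOfMicroseconds` and `trigWorkload` are hypothetical names, and the workload is a placeholder rather than the real `pointsInternal`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Runs `fn` `repeats` times and returns the fastest run in microseconds.
// Taking the minimum reduces noise from the scheduler and cold caches.
template <typename Fn>
std::int64_t bestOfMicroseconds( Fn &&fn, int repeats )
{
  assert( repeats > 0 );
  std::int64_t best = std::numeric_limits<std::int64_t>::max();
  for ( int r = 0; r < repeats; ++r )
  {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>( stop - start ).count();
    best = std::min<std::int64_t>( best, us );
  }
  return best;
}

// Placeholder workload (hypothetical): sums cos^2(t) + sin^2(t) over all
// segments, so the result should be close to `segments`.
double trigWorkload( std::size_t segments )
{
  const double step = 2.0 * std::acos( -1.0 ) / static_cast<double>( segments );
  double sum = 0.0;
  for ( std::size_t i = 0; i < segments; ++i )
  {
    const double t = step * static_cast<double>( i );
    sum += std::cos( t ) * std::cos( t ) + std::sin( t ) * std::sin( t );
  }
  return sum;
}
```

A call such as `bestOfMicroseconds( [] { volatile double sink = trigWorkload( 1024 ); (void) sink; }, 20 )` writes the result to a `volatile` so the optimizer cannot drop the work. The numbers are only comparable when both variants are compiled with the same optimization level, for example `-O2` with GCC or Clang.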

@rouault
Contributor

rouault commented Jan 14, 2025

> I'd expect the compiler/cpu to render this kind of optimization unnecessary.

I'm not sure the compiler (before C++26, which will declare them constexpr: https://en.cppreference.com/w/cpp/numeric/math/cos) can infer that the output of std::cos() doesn't change for a given input.
But instead of creating an array, which takes time, I'd suggest just creating temporary c = std::cos(t[i]) and s = std::sin(t[i]) variables in the loop.
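The suggestion of per-iteration temporaries could look roughly like this. It is a sketch under assumptions: `ellipsePointsInline` and its parameters are hypothetical, `t` is assumed to be a vector of parameter angles in radians, and hoisting the azimuth terms out of the loop is an extra step that the comment above does not mention.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Builds one point per parameter value in `t` for an ellipse centred at
// (cx, cy) with semi-axes a and b, rotated by `azimuth` radians.
// Hypothetical helper; not the QGIS pointsInternal signature.
std::vector<std::pair<double, double>> ellipsePointsInline(
  double cx, double cy, double a, double b, double azimuth, const std::vector<double> &t )
{
  assert( a >= 0.0 && b >= 0.0 );

  // Loop-invariant rotation terms (an addition beyond the suggestion above).
  const double cosAz = std::cos( azimuth );
  const double sinAz = std::sin( azimuth );

  std::vector<std::pair<double, double>> pts;
  pts.reserve( t.size() );
  for ( std::size_t i = 0; i < t.size(); ++i )
  {
    // One cosine and one sine per point, kept in locals: no cache arrays.
    const double c = std::cos( t[i] );
    const double s = std::sin( t[i] );
    pts.emplace_back( cx + a * c * cosAz - b * s * sinAz,
                      cy + a * c * sinAz + b * s * cosAz );
  }
  return pts;
}
```

Compared with filling two arrays first, this keeps a single pass over the data and avoids the extra allocations, at the same count of trigonometric calls.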

> reducing the number of expensive trigonometric function calls in the main loop.

Hmm, I would expect your optimization to just divide the time by 2, not by 10. Did you look at the disassembled code to understand why you get such a perf gain? Which compiler do you use? Maybe OpenMP is used and the array computation is multithreaded? In any case, adding a comment in the code explaining why the optimization works and under which circumstances would make later maintenance easier.
