-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizing the size of WidgetState #706
Comments
Do you know how values are aligned in Rust? I looked it up and it's platform specific, but alignment could reduce any benefit from switching from f64 to f32 or f16. |
In general aggregates are aligned with their most-aligned field. So a (Also, just because a struct is N-aligned doesn't mean its fields are N-aligned. So for instance WidgetState is 8-aligned, but individual bool flags of WidgetState are 1-aligned.) |
Also Rust is able to do some optimizations, so for That said, I'd say benching and profiling first, before attempting potentially premature-optimizations (while I'm currently doing pretty much that in a different context^^). My guess is, that we have greater potential for optimization elsewhere for a long time. |
I think (?) the struct-of-lists pattern would make it so that WidgetIds no longer have a clear owner/corresponding Widget (not too sure what value a list of ids has). This would make it so Widget implementations themselves would have to store their WidgetId and have a function like .id() that returns it. Do you think that's worth it? Of course, I could be mistaken. |
The current pattern is that |
The update_widget_tree function only would have like 30 ‘.get’s for each widget. If I remember correctly each call is around 500 ps. Is that acceptable? Not saying it’s bad just wondering. |
Not sure what you're asking. |
I mean, you would have to do a lot of array indexing, once for each state you want to access. The performance cost might add up? In reality you'll probably use SecondaryMap or similar for each state field I'm guessing. |
The short answer is we'd have to test and measure it to know for sure. The long answer is that accessing N fields in M items requires as many load instructions in struct-of-array and array-of-struct, and that most of the function calls involved would likely get inlined in optimized builds. The difference is in how the load patterns deal with the CPU caches. |
Can you give me an example case of where you would use this pattern to improve performance theoretically? When I look at passes, the way I think of it is we could store bitflags in a SecondaryMap and then iterate through widget keys to update those flags. A potential problem is, Slot/SecondaryMap iterate in arbitrary order. I still need to find the best arena crate 🤔 |
Right now WidgetState is a pretty chunky struct. VSCode with Rust Analyzer currently tells us:
Yeah. Three and a half widgets are enough to fill a kilobyte. Your usual L1 cache is about 64KB (from a quick Wikipedia check), which is 216 WidgetStates.
That's probably pretty bad. A lot of realistic applications have more than two hundred widgets, and we don't want them to overflow the L1 cache. (For reference, this github issue after two answers has 2467 html elements.)
What can we do to reduce the size? Some ideas:
Bitflags
There are 26 boolean flags.
They take 26 bytes, which we could get down to 4 bytes with bitflags.
However, bitflags are more annoying to use that booleans, and switching to bitflags would only be enough for a <10% size reduction.
Replace f64 values with f32s
We use a lot of f64 types:
size
: 2.origin
: 2window_origin
: 2paint_insets
: 4local_paint_rect
: 4baseline_offset
: 1ime_area
: 4clip_path
: 4translation
: 2That's a total of 25 f64 values, which is 200 bytes, two thirds of our size.
Realistically, virtually all of them could be replaced with f32s with almost no code change, saving us about 100 bytes.
Most of them deal with local coordinates where we expect that most of the range of f64s will be wasted. A widget is never gonna have billions of pixels of paint insets, for instance.
If we ever decide to switch to fractional coordinates, we could store a lot of them as
u16
, which would be yet more size savings.Optional fields
The fields
ime_area
andclip_path
are optional. Moreover, fields likepaint_insets
,baseline_offset
andtranslation
aren't optional, but their value is almost always zero.(
cursor
too, but the type itself is very lightweight.)We could store these fields in an
Option<Box<...>>
, and save memory for 90% of widgets.Struct-of-array
The most radical change would be to store WidgetStates in a struct-of-arrays pattern.
The problem with WidgetState right now is that a single instance occupies multiple cache lines. That means that accessing two widget states is always going to require touching multiple cache lines.
If we could instead store WidgetStates by field, the cruft would be a lot less of a problem: code iterating over flags would keep the flag array in L1 and wouldn't care that some massive amount of layout bytes is being kept in main memory somewhere.
Switching to struct-of-arrays would be a very ambitious change, probably requiring some radical improvements to the architecture of TreeArena. It would probably impact every single line of code currently using WidgetState.
Benchmark?
Before we do any of that, we probably want to check that Masonry's memory usage is actually bad, is actually bottlenecked on WidgetState, and that L1 misses have an actual impact on our performance. I'm not sure how we'd check that, suggestions are welcome.
I don't expect us to address these problems overnight. We should expect this to be a long-term issue.
The text was updated successfully, but these errors were encountered: