xxx regex cache #1770
base: master
Conversation
let's see if this blows anything up. i want to compare this to ...
Force-pushed from bd8e1de to 5698121
This looks reasonable. I would like to see the before/after numbers on the 3 workloads we have (parse_help, synthetic with #regexes > cache size, and #regexes < cache size).

And I think we should have unit tests in libc_test.cc, which means moving the declaration into the header.

I made a comment about the data layout ... reducing indirection by making CacheEntry a value. That does make shifting more expensive, though now I kinda wonder about make_heap and operator >.

The struct size on 64-bit machines would be 8 + 8 + 4 = 20 bytes per entry, so with N = 100 entries that's 2000 bytes, which doesn't seem horrible.
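Not part of the PR, but a minimal sketch of the make_heap / operator> idea, assuming a value-typed CacheEntry with a hypothetical `last_used` access stamp (a field the PR's struct does not have):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct CacheEntry {
  uint32_t pat_hash;   // cached hash of the pattern
  uint32_t last_used;  // assumed access counter, not in the PR's struct

  // Order entries by recency so a heap can surface the LRU entry.
  bool operator>(const CacheEntry& other) const {
    return last_used > other.last_used;
  }
};

// With operator> as the comparator, make_heap puts the smallest last_used
// (the least recently used entry) at the front; pop_heap moves it to the
// back where pop_back can evict it, instead of shifting on every access.
void EvictOldest(std::vector<CacheEntry>& entries) {
  if (entries.empty()) {
    return;
  }
  auto gt = [](const CacheEntry& a, const CacheEntry& b) { return a > b; };
  std::make_heap(entries.begin(), entries.end(), gt);
  std::pop_heap(entries.begin(), entries.end(), gt);
  entries.pop_back();
}
```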
cpp/libc.cc (Outdated)

```cpp
  }

  size_t capacity_;
  std::vector<CacheEntry*> access_list_;
```
I wonder if it can be vector<CacheEntry>, where CacheEntry contains a pointer to string, a pointer to regex_t, and an int hash. Then I feel like there's more locality when doing the find_if.

But then it costs more when we do erase(), I suppose, because instead of a pointer, it's a struct with 2 pointers and an int.
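A rough sketch of that value-typed layout; std::string and POSIX regex_t stand in for the PR's actual types, and the names here are illustrative, not the PR's:

```cpp
#include <algorithm>
#include <cstdint>
#include <regex.h>
#include <string>
#include <vector>

struct CacheEntry {
  const std::string* pat;  // pointer to the pattern string
  regex_t* compiled;       // pointer to the compiled regex
  uint32_t pat_hash;       // 8 + 8 + 4 bytes of fields on 64-bit
                           // (sizeof is typically padded to 24)
};

// find_if walks a contiguous array of entries, so the hash comparisons
// touch memory sequentially instead of chasing one pointer per entry.
std::vector<CacheEntry>::iterator Find(std::vector<CacheEntry>& entries,
                                       uint32_t pat_hash,
                                       const std::string& pat) {
  return std::find_if(entries.begin(), entries.end(),
                      [&](const CacheEntry& e) {
                        return e.pat_hash == pat_hash && *e.pat == pat;
                      });
}

// The flip side: erase() now shifts 20-24 byte structs instead of
// 8-byte pointers, which is the extra cost mentioned above.
void Erase(std::vector<CacheEntry>& entries,
           std::vector<CacheEntry>::iterator it) {
  entries.erase(it);
}
```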
cpp/libc.cc
Outdated
CacheEntry* TakeEntry(BigStr* pat) { | ||
auto it = std::find_if(access_list_.begin(), access_list_.end(), | ||
[pat](CacheEntry* entry) { | ||
return hash(pat) == entry->pat_hash_ && |
I think it's better to pull hash(pat) out of the loop. Even though it doesn't hash every time, it still checks whether it's hashed every time.

Overall I think this is fine. I think the main reason to look at any more optimizations (Dict<K, V> or make_heap()) would be if the (# regexes > cache size) case exhibits a big slowdown, like 50% slower than not having a cache at all.
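A hedged sketch of what hoisting the hash could look like; std::string and std::hash stand in here for BigStr* and the PR's hash(), and the surrounding class is reduced to a free function:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct CacheEntry {
  std::string pat;    // stand-in for BigStr*
  size_t pat_hash_;   // hash cached when the entry was created
};

CacheEntry* TakeEntry(std::vector<CacheEntry*>& access_list,
                      const std::string& pat) {
  // Hoisted: the pattern's hash is computed (or looked up) once,
  // not re-checked inside the predicate for every entry.
  size_t pat_hash = std::hash<std::string>{}(pat);
  auto it = std::find_if(access_list.begin(), access_list.end(),
                         [&pat, pat_hash](CacheEntry* entry) {
                           // Cheap hash comparison first, then the full
                           // string comparison, as in the PR.
                           return pat_hash == entry->pat_hash_ &&
                                  pat == entry->pat;
                         });
  return it == access_list.end() ? nullptr : *it;
}
```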
Force-pushed from 87a80d2 to dddb4da

Force-pushed from dddb4da to 60e8862
not ready for review. needs some cleanup