
Sync performance improvements and misc. bugfixes from upstream #23

Merged
merged 51 commits on Sep 12, 2023

Conversation

silentbicycle

@silentbicycle silentbicycle commented Sep 11, 2023

This brings us up to date as of katef#443. (Edit: changed from katef#441 to also include katef#442 (a few more misc. fixes) and katef#443 (addresses a CI issue).)

This brings in several misc. fixes, but the primary motivation is the performance improvements in katef#441, which should help with excessive VCL compilation times that have been blocking increasing vcc's max_regex_cohort_compilation default from 5 to 6.

silentbicycle and others added 30 commits June 12, 2023 15:12
Re-adding it during each pass of the for loop led to a bunch of
warnings during the build.
Previously these were split between src/libfsm/internal.h and
include/common/check.h, but some of the internal.h definitions
were used in libre.

Also, add `BUILD_FOR_FUZZER`, which sets more restrictive limits for
cases that lead to uninteresting failure modes that get reported over
and over during fuzzing.
This is a step closer to removing:

    #include "libfsm/internal.h" /* XXX */

but ast_compile.c still needs it for the internal interface
`fsm_mergeab`. If either that or `fsm_unionxy` is reworked to make a
clearer library boundary, libre could stop depending on internal details
of libfsm.
Otherwise, this can lead to uninitialized memory in combine_info
when either a or b has zero states.

Found via scan-build.
The fuzzing itself is run unconditionally; the idea is that the cache is restored each time, and each run adds a little more.

I'm also presenting the corpus as an asset so it can be grabbed to run fuzzing locally.
It's a little awkward to have CI bundle two things together for the way I've written it, so I've been iterating over them as a set. But that's taking way too long, so here I'm combining the two.

I think this is semantically identical to -DNDEBUG; we'd never want to test with just assertions without these expensive checks, and we'd never want to run in production with them. But after some discussion we're keeping them separate for practical reasons.

The main idea is that EXPENSIVE_CHECKS is supposed to be for stuff that doesn't change the algorithmic complexity of an operation. That means probably more stuff should move under EXPENSIVE_CHECKS rather than as assertions.
…an environment variable named `UBSAN_OPTIONS`
Here I'm also saving the seeds unconditionally on error, so we don't need to re-find the same seeds for each bug. Hopefully this should make fuzzing reproducible in CI.
katef and others added 15 commits June 15, 2023 13:40
The description says "Return where an item would be, if it were
inserted", but it was returning the last element <= rather than
the first element >=, and a later call to `state_set_cmpval`
shifted i by 1 for that specific case. Handle it correctly
inside the search function instead. Two other call sites
need to check whether the result refers to the append position
(one past the end of the array) before checking `set->a[i] == state`;
update them.

Add a fast path upfront: It's VERY common to append states in order
to the state array, so before we binary search each first compare
against the last entry (unless empty).
In -O0 this can become pretty expensive (~25% of overall runtime for
`time ./re -rpcre -C '^[ab]{0,2000}$'`), but when built with -O3 very
little overhead remains. I'm adding this comment because every time I
see this it seems to me like it should have `EXPENSIVE_CHECKS` around
it, but profiling is telling me it really doesn't matter.
This is a major hotspot when doing epsilon removal over large runs of
potentially skipped states (as might appear from `^[ab]{0,2000}$`).

Add a fast path for appending, which is also very common.

Extract the edge set destination search into its own function,
`find_state_position`, and add a `#define` to switch between linear
search, binary search, or calling both and comparing the result.
I will remove linear search in the next commit, but am checking this
in as an intermediate step for checking & benchmarking.
When I run `time ./re -rpcre -C '^[ab]{0,2000}$'` locally for -O3:
- linear search: 2.991s
- binary search: 1.521s
After the other changes in this PR, calls to qsort from
`sort_and_dedup_dst_buf` are one of the largest remaining hotspots in
the profile. We can often avoid calling qsort, though:

- If there is <= 1 entry, just return, it's sorted.

- Otherwise, first do a sweep through the array noting the min and max
  values. Unless there is a huge range between them, it's much faster to
  build a bitset from them in a small (max 10KB) stack-allocated array
  and then unpack the bitset (now sorted and unique). Only the needed
  portion of the array is initialized.

I have not done a lot of experimentation to find a cutoff point where
the bitset becomes slower than qsort (it may be much larger), I picked
10KB because it's likely to be safe to stack-allocate.

I tried changing the bitset unpacking to use an 8 or 16 bit mask and
jump forward faster through large sub-word ranges of 0 bits, but any
improvement was lost among random variation, so I decided it wasn't
worth the extra complexity. We already skip whole words that are 0.
If min and max are exactly 64 states apart, the upper value was getting
silently dropped due to an incorrect `words` value here.

One of the patterns in the PCRE suite triggers this:

    ./re -rpcre '(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))' "caaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"

This should match, but did not.
silentbicycle and others added 6 commits September 11, 2023 19:14
determinise: It's not possible to find a cached result in the hash
table without allocating a to-set buffer first, so assert that it
will be non-NULL.

fsm_findmode: This should never be used on a state without edges.

vm/v1.c and vm/v2.c: Free allocated return value on error.
@silentbicycle silentbicycle merged commit 11d725e into fastly:main Sep 12, 2023
322 checks passed