[pull] master from rust-lang:master #8

pull · 2020-01-09T20:22:55Z

See Commits and Changes for more details.


Otherwise it's possible for the fuzzer to build a regex that is big enough to timeout on a big haystack.

The fuzzer keeps finding regexes that just fit into the limit and have a Unicode word boundary assertion, along with a decent-sized haystack. This in turn results in slowish searches. The searches aren't horrifically slow, but it looks like they become much slower with the sanitizers enabled. So... drop the size limit down even more.

This fixes a bug where the calculation for the min/max length of a regex could overflow if the counted repetitions in the pattern are big enough. The panic only happens when debug assertions are enabled, which means there is no panic by default in release mode. One may wonder whether other bad things happen in release mode, though, since in that case the arithmetic wraps around instead. Since this is new code and since the regex crate doesn't yet use the min/max attributes of an Hir, the wraparound in this case is completely innocuous. Fixes #995
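To make the overflow concrete, here is a minimal sketch of length arithmetic that reports overflow explicitly instead of panicking (debug) or wrapping (release). It assumes lengths are tracked as `usize` byte counts; `repetition_min_len` and `concat_min_len` are hypothetical helpers, not regex-syntax's API, and this is not necessarily how the actual fix was written.

```rust
/// Hypothetical helper: minimum length in bytes of `sub{reps,}`, i.e.
/// `sub_min_len * reps`. Returns `None` instead of overflowing.
fn repetition_min_len(sub_min_len: usize, reps: u32) -> Option<usize> {
    // u32 -> usize can only fail on targets narrower than 32 bits.
    let reps = usize::try_from(reps).ok()?;
    sub_min_len.checked_mul(reps)
}

/// Hypothetical helper: minimum length of a concatenation, i.e. the sum of
/// its parts' minimum lengths. Returns `None` instead of overflowing.
fn concat_min_len(parts: &[usize]) -> Option<usize> {
    parts.iter().try_fold(0usize, |acc, &n| acc.checked_add(n))
}
```

With a plain `*`, the same overflow panics under debug assertions and silently wraps in release builds; for example, `usize::MAX.wrapping_mul(2)` is `usize::MAX - 1`.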

And also enable debug assertions to try and catch more bugs.

We keep beating back the OSS-fuzz timeouts. It keeps finding bigger and bigger haystacks with even smallish regexes that have Unicode word boundaries in them. This results in using the PikeVM, which is just slow. There's really nothing to be done other than to tell the fuzzer: "this is OK."

This bug results in some regexes reporting matches at every position even when they shouldn't. The bug happens because the internal literal optimizer winds up using an "empty" searcher that reports a match at every position. This is technically correct whenever the literal searcher is used as a prefilter (although slower than necessary), but an optimization added later enabled the searcher to run on its own and not as a prefilter, that is, without the confirm step by the regex engine. In that context, the "empty" searcher is totally incorrect. So this bug corresponds to a code path where the "empty" literal searcher is used, and where the literal searcher is also used directly to find matches rather than as a prefilter. I believe at least the following are required to trigger this path:

* The literals extracted need to be "complete." That is, the language described by the regex is small and finite.
* There need to be at least 26 distinct starting bytes among all of the elements in the language described by the regex.
* There need to be fewer than 26 distinct ending bytes among all of the elements in the language described by the regex.
* Possibly other criteria...

The actual fix is to change the code that selects the literal searcher. Indeed, there was even a comment in the erroneous code saying that the path was impossible, but of course, it isn't. We change that path to return None, as it should have long ago. This in turn results in the case outlined above not using a literal searcher and just using the regex engine. Fixes #999
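A rough sketch of why the "empty" searcher is fine as a prefilter but wrong on its own, and what returning None changes. Everything here is hypothetical: `LiteralSearcher`, `select_buggy`, `select_fixed` and `find_starts` are illustrative names, only the "26 distinct starting bytes" criterion is modeled, and the multi-literal search is a naive scan rather than whatever regex uses internally.

```rust
use std::collections::BTreeSet;

/// Hypothetical literal searcher kinds; the names are illustrative only.
#[derive(Debug, PartialEq)]
enum LiteralSearcher {
    /// Reports a candidate at every position: sound only as a prefilter,
    /// because the regex engine must confirm each candidate.
    Empty,
    /// Reports the start of every occurrence of any of the literals.
    Literals(Vec<Vec<u8>>),
}

/// Threshold taken from the description above.
const MAX_START_BYTES: usize = 26;

/// Counts the distinct first bytes across a set of literals.
fn distinct_start_bytes(lits: &[&[u8]]) -> usize {
    lits.iter().filter_map(|l| l.first()).collect::<BTreeSet<_>>().len()
}

/// Buggy shape: falls back to `Empty` even though the caller may use the
/// result directly as a matcher, with no confirm step.
fn select_buggy(lits: &[&[u8]]) -> LiteralSearcher {
    if distinct_start_bytes(lits) >= MAX_START_BYTES {
        return LiteralSearcher::Empty;
    }
    LiteralSearcher::Literals(lits.iter().map(|l| l.to_vec()).collect())
}

/// Fixed shape: returns `None`, so the caller uses the regex engine instead.
fn select_fixed(lits: &[&[u8]]) -> Option<LiteralSearcher> {
    if distinct_start_bytes(lits) >= MAX_START_BYTES {
        return None;
    }
    Some(LiteralSearcher::Literals(lits.iter().map(|l| l.to_vec()).collect()))
}

/// Runs a searcher on its own (no confirm step) and returns match starts.
fn find_starts(searcher: &LiteralSearcher, haystack: &[u8]) -> Vec<usize> {
    match searcher {
        // Every position, including the end of the haystack, "matches".
        LiteralSearcher::Empty => (0..=haystack.len()).collect(),
        LiteralSearcher::Literals(lits) => (0..haystack.len())
            .filter(|&i| lits.iter().any(|l| haystack[i..].starts_with(l)))
            .collect(),
    }
}
```

Used as a standalone matcher, `Empty` reports a match at every offset of any haystack, which is exactly the symptom described above; the fixed selector never hands it out.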

This regex failed to compile in `regex <1.8`, but the migration to regex-automata tweaked the rules in a subtle way that permitted it to compile despite the fact that the old/status-quo matching engines can't handle it correctly. By that, I mean that they may permit the \B to match between code units. That in turn results in a panic when slicing a &str. In `regex 1.9`, this regex will actually compile, but the matching engines will correctly and robustly never report matches that split UTF-8 code units. For now, we just add code that causes `regex 1.8` to have the same behavior as previous releases. Fixes #1006
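For illustration, here is why a match offset between code units is fatal for `&str`: slicing with `&hay[start..end]` panics when either offset falls inside a codepoint's UTF-8 encoding. The helpers `span_is_valid` and `checked_match` are hypothetical; they only use the standard library's `str::is_char_boundary` and `str::get`.

```rust
/// Hypothetical helper: true if `hay[start..end]` can be sliced without
/// splitting the UTF-8 encoding of a codepoint.
fn span_is_valid(hay: &str, start: usize, end: usize) -> bool {
    start <= end
        && end <= hay.len()
        && hay.is_char_boundary(start)
        && hay.is_char_boundary(end)
}

/// Hypothetical helper: slices a match span without panicking, returning
/// `None` if the span would split a codepoint.
fn checked_match<'h>(hay: &'h str, start: usize, end: usize) -> Option<&'h str> {
    hay.get(start..end)
}
```

For example, in `"a☃b"` the snowman occupies bytes 1..4, so a span starting at offset 2 is rejected by both helpers, whereas `&"a☃b"[2..4]` would panic.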

This essentially copies the visit_alternation_in methods, but for concatenations. This is useful for some niche use cases where one wants to visit concatenations in reverse. PR #1017
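A self-contained sketch of what an "in between children" hook looks like. The `Node` type, `Visitor` trait, `walk` function and `Printer` are all hypothetical miniatures modeled loosely on the idea of `visit_alternation_in`/`visit_concat_in`; they are not regex-syntax's actual types or signatures, and the walk here is recursive for brevity.

```rust
/// Hypothetical miniature AST, standing in for regex-syntax's real types.
enum Node {
    Lit(char),
    Concat(Vec<Node>),
    Alt(Vec<Node>),
}

/// Hypothetical visitor: the `*_in` hooks fire between consecutive children.
trait Visitor {
    fn visit_pre(&mut self, _node: &Node) {}
    fn visit_post(&mut self, _node: &Node) {}
    fn visit_alternation_in(&mut self) {}
    fn visit_concat_in(&mut self) {}
}

/// Recursive walk that calls the `*_in` hook before every child but the first.
fn walk<V: Visitor>(node: &Node, v: &mut V) {
    v.visit_pre(node);
    if let Node::Concat(children) | Node::Alt(children) = node {
        for (i, child) in children.iter().enumerate() {
            if i > 0 {
                match node {
                    Node::Concat(_) => v.visit_concat_in(),
                    _ => v.visit_alternation_in(),
                }
            }
            walk(child, v);
        }
    }
    v.visit_post(node);
}

/// Prints the tree, marking concatenation boundaries with `~`.
struct Printer(String);

impl Visitor for Printer {
    fn visit_pre(&mut self, node: &Node) {
        match node {
            Node::Lit(c) => self.0.push(*c),
            Node::Concat(_) | Node::Alt(_) => self.0.push_str("(?:"),
        }
    }
    fn visit_post(&mut self, node: &Node) {
        if !matches!(node, Node::Lit(_)) {
            self.0.push(')');
        }
    }
    fn visit_alternation_in(&mut self) {
        self.0.push('|');
    }
    fn visit_concat_in(&mut self) {
        self.0.push('~');
    }
}
```

Without a concatenation hook, a visitor like `Printer` has no way to tell where one concatenated element ends and the next begins.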

This effectively copies my regex-automata work into this crate and does a bunch of rejiggering to make it work. In particular, we wire up its new test harness to the public regex crate API. In this commit, that means the regex crate API is being simultaneously tested using both the old and new test suites. This does *not* get rid of the old regex crate implementation. That will happen in a subsequent commit. This is just a staging commit to prepare for that.

If we need this again, we should just rewrite it in Rust and put it in 'regex-cli'.

All of the old tests should be covered by either porting them over explicitly, or in the TOML test suite.

We're going to drop the old benchmark suite in favor of rebar, but it's worth recording some final results. This ensures we get a fair comparison with the regex crate before and after its internals have been rewritten.

We are going to remove the old benchmark harness, but it seems like a good idea to save the old measurements. In the future, benchmarks will be maintained by rebar: https://github.com/BurntSushi/rebar

As stated in a previous commit, we'll be moving to rebar. (rebar isn't actually published at time of writing, but it's essentially ready to go.)

PR #1198

PR #1203

We had previously released regex 1.10.4 but omitted a changelog entry for it. So this adds it.

This is an update for a change made to the trait in rust-lang/rust#127481. There shouldn't be any behavior changes here. PR #1219

rustc seems to warn about this, and I would prefer writing the lifetime here anyway. Its omission was probably an oversight.

It looks like rustc picks this up now but didn't before.

This complements `matched_any` with a means to check if a set of patterns all matched the haystack. PR #1228
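A toy model of the two predicates, assuming a set result is just one "did it match" flag per pattern. `SetMatches` and `set_matches` below are hypothetical stand-ins (patterns are plain substrings, not regexes); with the regex crate itself, the call shape would be `set.matches(hay).matched_all()` on a `RegexSet`.

```rust
/// Hypothetical stand-in for a set-match result: one flag per pattern.
struct SetMatches {
    matched: Vec<bool>,
}

impl SetMatches {
    /// True if at least one pattern matched.
    fn matched_any(&self) -> bool {
        self.matched.iter().any(|&m| m)
    }

    /// True if every pattern matched (vacuously true for an empty set).
    fn matched_all(&self) -> bool {
        self.matched.iter().all(|&m| m)
    }
}

/// Toy "set search": each pattern is a plain substring here, not a regex.
fn set_matches(patterns: &[&str], haystack: &str) -> SetMatches {
    SetMatches {
        matched: patterns.iter().map(|p| haystack.contains(*p)).collect(),
    }
}
```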

This was an oversight when porting the old generator shell script to regex-cli. This hasn't been an issue because I don't think we've generated data for a new release of Unicode with this new infrastructure yet. This was flagged by unit tests that failed because \d was no longer a subset of \w.
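Unicode-aware \w is defined to include decimal digits (general category Nd), so "\d is a subset of \w" is a cheap sanity check on regenerated tables. A minimal sketch of such a check, assuming classes are stored as canonical lists of inclusive codepoint ranges; `is_subset` is a hypothetical helper and the ASCII-only ranges in the example are far smaller than real generated tables.

```rust
/// Hypothetical helper. Ranges are inclusive `(start, end)` pairs, and both
/// inputs are assumed canonical: sorted, non-overlapping, adjacent ranges merged.
fn is_subset(sub: &[(char, char)], sup: &[(char, char)]) -> bool {
    // In a canonical set, any contiguous range covered by the union must lie
    // entirely inside a single range, so a per-range containment test suffices.
    sub.iter()
        .all(|&(lo, hi)| sup.iter().any(|&(slo, shi)| slo <= lo && hi <= shi))
}
```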

I am teetering on removing this cursed implementation. Fixes #1231

This adds a new predicate that provides a very minimal ability to introspect why DFA construction failed. Closes #1236
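A sketch of how such a predicate is typically used: callers fall back to another engine only when the failure was the size limit, and treat other failures as real errors. `Kind`, `BuildError`, `build_dfa` and `choose_engine` here are hypothetical stand-ins, not regex-automata's actual types; only the predicate name mirrors the one this commit adds.

```rust
use std::fmt;

/// Hypothetical failure kinds, kept private behind a predicate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    SizeLimitExceeded,
    Unsupported,
}

#[derive(Debug)]
struct BuildError {
    kind: Kind,
}

impl BuildError {
    /// Predicate in the spirit of `BuildError::is_size_limit_exceeded`.
    fn is_size_limit_exceeded(&self) -> bool {
        self.kind == Kind::SizeLimitExceeded
    }
}

impl fmt::Display for BuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.kind {
            Kind::SizeLimitExceeded => write!(f, "DFA exceeded its size limit"),
            Kind::Unsupported => write!(f, "pattern not supported by the DFA"),
        }
    }
}

/// Hypothetical builder: fails if the estimated state count is too big.
fn build_dfa(states: usize, limit: usize, unsupported: bool) -> Result<usize, BuildError> {
    if unsupported {
        return Err(BuildError { kind: Kind::Unsupported });
    }
    if states > limit {
        return Err(BuildError { kind: Kind::SizeLimitExceeded });
    }
    Ok(states)
}

/// Falls back to a slower engine only when the size limit was the problem.
fn choose_engine(states: usize, limit: usize, unsupported: bool) -> &'static str {
    match build_dfa(states, limit, unsupported) {
        Ok(_) => "dfa",
        Err(e) if e.is_size_limit_exceeded() => "fallback",
        Err(_) => "error",
    }
}
```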

pull bot added the ⤵️ pull label Jan 9, 2020

pull bot added the merge-conflict Resolve conflicts manually label Feb 19, 2020

BurntSushi and others added 28 commits April 21, 2023 07:58

regex-syntax-0.7.1

31c8452

deps: bump regex-syntax to 0.7.1

8a7cb64

1.8.1

4e29fce

changelog: fix some typos

1872bdf

PR #987

syntax: fix typo

98be16a

PR #992

fuzz: set a size limit

a3978b2

changelog: 1.8.2

40cbe1d

regex-syntax-0.7.2

4f664b9

deps: bump regex-syntax to 0.7.2

709248c

1.8.2

6fb1810

fuzz: OSS-fuzz build scripts into this repo

8afffab

changelog: 1.8.3

710222d

1.8.3

a1a9ebe

changelog: 1.8.4

407f6d3

1.8.4

5a34a39

api: add visit_concat_in method to the Visitor traits

1d9ce15

impl: cut over to regex-automata

ff27ce0

scripts: remove 'frequencies' script

fd19324

tests: drop old tests

20f3e13

bench: record last results with old benchmark suite

146bf5c

bench: move the old recordings to 'record' directory

0328ba2

bench: remove the old harness

78b865e

purrden and others added 30 commits June 2, 2024 19:30

doc: fix duplicate phrasing typo

ab4c8d1

bytes: escape invalid UTF-8 bytes in debug output for Match

1f9f9cc

changelog: 1.10.4

1430b65

regex-syntax-0.8.4

4757b5f

regex-automata-0.4.7

68c4f0b

changelog: 1.10.4 and 1.10.5

377463b

1.10.5

0718fc5

regex-lite-0.1.6

1288b83

regex-test: bump toml dependency

c2f9ca4

regex-test-0.1.1

c4c76a1

regex-cli-0.2.1

8856fe3

unstable: fit Pattern trait implementation

2970d29

changelog: 1.10.6

76f2d30

1.10.6

ab88aa5

automata: add explicit lifetime annotation

92efe4a

cli: remove some dead code

d3d3ff7

api: add SetMatches::matched_all

b790aa5

data: update to UCD 16

9239e7e

changelog: 1.11.0

1533257

regex-syntax-0.8.5

cba0fbc

deps: bump regex-syntax

4bb1e3d

regex-automata-0.4.8

58e16f5

deps: bump regex-automata

9e17e56

1.11.0

bcbe403

unstable: fix Pattern trait implementation

991ba88

changelog: 1.11.1

80df54e

1.11.1

9870c06

automata/dfa: add BuildError::is_size_limit_exceeded

225c94c

regex-automata-0.4.9

1a069b9
