eval: prefer longer candidateMatch when removing overlaps (#727)
When transforming a query like 'foo bar' into '(foo bar) or "foo bar"',
we want to keep the phrase candidateMatch rather than throw it away in
gatherMatches. Sorting longer matches ahead of shorter ones that start
at the same offset ensures the longer ones are kept.

Note: this only affects ChunkMatch, since for LineMatch we merge when we
find overlaps.
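To illustrate the note, here is a hypothetical sketch of the merging behavior: when overlapping spans are merged instead of dropped, the relative order of spans that share a start offset does not change the result, so the tie-break has no effect there. The `span` type and `mergeOverlaps` function are illustrative assumptions, not zoekt's LineMatch code.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical [start, end) byte range standing in for a candidate match.
type span struct{ start, end int }

// mergeOverlaps sorts spans by start offset and merges each span that
// overlaps the previous output span, extending that span's end if needed.
func mergeOverlaps(spans []span) []span {
	sort.Slice(spans, func(i, j int) bool { return spans[i].start < spans[j].start })
	var out []span
	for _, s := range spans {
		if n := len(out); n > 0 && s.start < out[n-1].end {
			if s.end > out[n-1].end {
				out[n-1].end = s.end
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Same spans, different order for the two that start at offset 0.
	fmt.Println(mergeOverlaps([]span{{0, 3}, {0, 7}, {4, 7}}))
	fmt.Println(mergeOverlaps([]span{{0, 7}, {0, 3}, {4, 7}}))
}
```

Both calls print `[{0 7}]`, whichever of the two offset-0 spans comes first.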

Test Plan: This was hard to test with our existing e2e tests because
they record only matching lines, not offsets. Instead, I am relying on
the existing tests still passing; once we add proper support for
phrases, we will add a test then.
keegancsmith authored Jan 25, 2024
1 parent 92cb885 commit cdb1665
Showing 1 changed file with 4 additions and 0 deletions.
eval.go

```diff
@@ -537,6 +537,10 @@ type sortByOffsetSlice []*candidateMatch
 func (m sortByOffsetSlice) Len() int      { return len(m) }
 func (m sortByOffsetSlice) Swap(i, j int) { m[i], m[j] = m[j], m[i] }
 func (m sortByOffsetSlice) Less(i, j int) bool {
+	if m[i].byteOffset == m[j].byteOffset { // tie break if same offset
+		// Prefer longer candidates if starting at same position
+		return m[i].byteMatchSz > m[j].byteMatchSz
+	}
 	return m[i].byteOffset < m[j].byteOffset
 }
```