Certain queries, such as

[lemma="cat"] [lemma!="dog"]{1,10}

can produce many overlapping hits (cat followed by 1 non-dog; cat followed by 2 non-dogs; etc.). For some queries you want all of these possibilities, but for others you would prefer the hits to be filtered down to the ones most relevant to you.
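To make the overlap concrete, here is a minimal Python sketch that enumerates every hit of this query over a plain list of lemmas. It assumes the repetition allows one to ten non-dog tokens, which is what the hits listed above describe. This is not BlackLab code: `find_hits` and the `(start, end)` hit representation (end exclusive) are hypothetical and only for illustration, and BlackLab itself finds matches through its index rather than by scanning tokens like this.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # hypothetical representation: (start, end), end exclusive


def find_hits(lemmas: List[str], min_rep: int = 1, max_rep: int = 10) -> List[Hit]:
    """Return every hit of: [lemma="cat"] [lemma!="dog"]{min_rep,max_rep}."""
    hits: List[Hit] = []
    for start, lemma in enumerate(lemmas):
        if lemma != "cat":
            continue
        # Extend the repetition one token at a time; a "dog" or the end of
        # the document also rules out every longer alternative.
        for length in range(1, max_rep + 1):
            pos = start + length
            if pos >= len(lemmas) or lemmas[pos] == "dog":
                break
            if length >= min_rep:
                hits.append((start, pos + 1))
    return hits


print(find_hits(["the", "cat", "sat", "on", "the", "mat"]))
# [(1, 3), (1, 4), (1, 5), (1, 6)]
```

All four hits start at the same `cat`, and each one contains all the shorter ones: this is the redundancy that the filtering modes listed below aim to reduce.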
This is somewhat similar to the greedy, reluctant and possessive matching modes that most regex engines offer (see e.g. here). Replicating those exact behaviours in BlackLab would be challenging, though, because BlackLab finds matches in a different way, using the reverse index.
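For comparison, Python's `re` module shows the greedy and reluctant behaviour on a string, with words standing in for tokens. Note that a regex engine reports a single match for a given starting point, whereas BlackLab, as described above, returns every possible hit. Possessive quantifiers such as `{1,10}+` are only available in Python 3.11 and later, so they are left out of this sketch.

```python
import re

text = "cat sat on mat"

# Greedy: the repetition takes as many words as it can (up to 10).
greedy = re.match(r"cat(?: \w+){1,10}", text).group()

# Reluctant (lazy): the repetition takes as few words as it can (at least 1).
reluctant = re.match(r"cat(?: \w+){1,10}?", text).group()

print(greedy)     # cat sat on mat
print(reluctant)  # cat sat
```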
There are many ways BlackLab could filter out certain overlapping hits, e.g.:
keep everything (this is how it currently works)
for hits with the same start position, discard all but the longest (or shortest), although giving the start position a special meaning seems arbitrary
when two hits overlap, keep the one that starts earliest in the document and discard the other (again, this seems arbitrary)
discard any hits that are fully contained in another hit (or that fully contain another hit)
when two hits (partially or fully) overlap, keep the longest (or shortest); discard the other
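The modes listed above can be sketched over a plain list of `(start, end)` hits (end exclusive). This is a minimal Python illustration under stated assumptions, not BlackLab's API: all function names are hypothetical, and where the description leaves room for interpretation (for example when overlaps form a chain), each docstring notes the choice the sketch makes. "Keep everything" is simply the unfiltered list, so it has no function.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[int, int]  # hypothetical: (start, end) token positions, end exclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Hit, inner: Hit) -> bool:
    """True if `inner` lies fully inside `outer` and they are different hits."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def one_per_start(hits: Iterable[Hit], prefer_longest: bool = True) -> List[Hit]:
    """Among hits with the same start, keep only the longest (or shortest)."""
    best: Dict[int, Hit] = {}
    for start, end in hits:
        cur = best.get(start)
        if cur is None or (end > cur[1] if prefer_longest else end < cur[1]):
            best[start] = (start, end)
    return sorted(best.values())


def earliest_start_wins(hits: Iterable[Hit]) -> List[Hit]:
    """Scan left to right and drop any hit that overlaps the last kept hit.
    Assumption: ties on start position go to the shorter hit."""
    kept: List[Hit] = []
    for h in sorted(set(hits)):  # sorted by start, then by end
        if not kept or not overlaps(kept[-1], h):
            kept.append(h)
    return kept


def drop_nested(hits: Iterable[Hit], keep_outermost: bool = True) -> List[Hit]:
    """Discard hits fully contained in another hit, or, with
    keep_outermost=False, hits that fully contain another hit."""
    unique = set(hits)
    if keep_outermost:
        return sorted(h for h in unique if not any(contains(o, h) for o in unique))
    return sorted(h for h in unique if not any(contains(h, i) for i in unique))


def length_wins_on_overlap(hits: Iterable[Hit], prefer_longest: bool = True) -> List[Hit]:
    """Greedily keep the longest (or shortest) hits, dropping any hit that
    overlaps one already kept. Assumption: ties go to the earlier start."""
    sign = -1 if prefer_longest else 1
    kept: List[Hit] = []
    for h in sorted(set(hits), key=lambda h: (sign * (h[1] - h[0]), h[0])):
        if not any(overlaps(k, h) for k in kept):
            kept.append(h)
    return sorted(kept)


hits = [(0, 3), (0, 5), (2, 4), (4, 8), (6, 7)]
print(one_per_start(hits))           # [(0, 5), (2, 4), (4, 8), (6, 7)]
print(earliest_start_wins(hits))     # [(0, 3), (4, 8)]
print(drop_nested(hits))             # [(0, 5), (4, 8)]
print(length_wins_on_overlap(hits))  # [(0, 5), (6, 7)]
```

The same five hits give a different result under each mode, which is why choosing which modes to support matters. The quadratic overlap checks keep the sketch short; an implementation working on BlackLab's sorted hit streams would likely look quite different.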
We should try to support some of the most helpful modes.
(via @franklandsbergen)