Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get Fashion Mnist 96% recall up to 200 queries/second #611

Open
alexklibisz opened this issue Nov 29, 2023 · 6 comments
Open

Get Fashion Mnist 96% recall up to 200 queries/second #611

alexklibisz opened this issue Nov 29, 2023 · 6 comments

Comments

@alexklibisz
Copy link
Owner

I'd like to optimize Elastiknn such that the Fashion Mnist benchmark performance exceeds 200 qps at 96% recall. Currently it's at 180 qps. So this would be about an 11% improvement. There are already several issues under the performance label with ideas towards this goal. I've already merged a few PRs. I'm just opening this issue to formalize the effort and to aggregate PRs that don't otherwise have an issue.

@alexklibisz
Copy link
Owner Author

We're now above 200 qps. Last thing left is to just submit a PR to ann-benchmarks w/ the updated versions.

@alexklibisz
Copy link
Owner Author

Status update here, after releasing 8.12.2.1.

The non-containerized benchmark is reliably over 200qps @ 96% recall, around 210 qps. Latest update here: ddf637a

The containerized benchmark (running ann-benchmarks and elastiknn in the same container) has improved from ~160qps to ~180qps.

Here are the results using 8.6.2.0:

Model Parameters Recall Queries per Second
eknn-l2lsh L=100 k=4 w=1024 candidates=500 probes=0 0.378 304.111
eknn-l2lsh L=100 k=4 w=1024 candidates=1000 probes=0 0.445 246.319
eknn-l2lsh L=100 k=4 w=1024 candidates=500 probes=3 0.635 245.977
eknn-l2lsh L=100 k=4 w=1024 candidates=1000 probes=3 0.716 201.608
eknn-l2lsh L=100 k=4 w=2048 candidates=500 probes=0 0.767 265.545
eknn-l2lsh L=100 k=4 w=2048 candidates=1000 probes=0 0.846 218.673
eknn-l2lsh L=100 k=4 w=2048 candidates=500 probes=3 0.921 184.178
eknn-l2lsh L=100 k=4 w=2048 candidates=1000 probes=3 0.960 160.437

Here are the results using 8.12.2.1:

Model Parameters Recall Queries per Second
eknn-l2lsh L=100 k=4 w=1024 candidates=500 probes=0 0.378 314.650
eknn-l2lsh L=100 k=4 w=1024 candidates=1000 probes=0 0.446 247.659
eknn-l2lsh L=100 k=4 w=1024 candidates=500 probes=3 0.634 258.834
eknn-l2lsh L=100 k=4 w=1024 candidates=1000 probes=3 0.716 210.380
eknn-l2lsh L=100 k=4 w=2048 candidates=500 probes=0 0.767 271.442
eknn-l2lsh L=100 k=4 w=2048 candidates=1000 probes=0 0.846 221.127
eknn-l2lsh L=100 k=4 w=2048 candidates=500 probes=3 0.921 199.353
eknn-l2lsh L=100 k=4 w=2048 candidates=1000 probes=3 0.960 171.614

@alexklibisz
Copy link
Owner Author

Latest update here: the non-containerized benchmark is hovering around ~195 qps. It dropped below 200 qps when I re-ran the benchmark for Elasticsearch 8.15.0: bbbaeea

I've tried several other ideas to accelerate the ArrayHitCounter. Some examples: #721, #615, #598. None of it really makes a dent.

I'm thinking a major issue might be that the current LSH parameters end up matching the vast majority of documents in the index. When I sample the 0.96 benchmark in VisualVM, it's spending ~30% of its time in countHits:

private HitCounter countHits(LeafReader reader) throws IOException {
Terms terms = reader.terms(field);
// terms seem to be null after deleting docs. https://github.com/alexklibisz/elastiknn/issues/158
if (terms == null) {
return new EmptyHitCounter();
} else {
TermsEnum termsEnum = terms.iterator();
PostingsEnum docs = null;
HitCounter counter = new ArrayHitCounter(reader.maxDoc());
for (HashAndFreq hf : hashAndFrequencies) {
// We take two different paths here, depending on the frequency of the current hash.
// If the frequency is one, we avoid checking the frequency of matching docs when
// incrementing the counter. This yields a ~5% to ~10% speedup.
// See https://github.com/alexklibisz/elastiknn/pull/612 for details.
if (hf.freq == 1) {
if (termsEnum.seekExact(new BytesRef(hf.hash))) {
docs = termsEnum.postings(docs, PostingsEnum.NONE);
while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
counter.increment(docs.docID());
}
}
} else {
if (termsEnum.seekExact(new BytesRef(hf.hash))) {
docs = termsEnum.postings(docs, PostingsEnum.NONE);
while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
counter.increment(docs.docID(), (short) min(hf.freq, docs.freq()));
}
}
}
}
return counter;
}
}

A good chunk of that is spend in seekExact.

So I think I see two possible paths for the next speedup:

  • Using Lucene more efficiently.
  • Find some better parameters for the LSH model. Maybe there's some area of the parameter space that avoids matching the vast majority of documents in the index.

@alexklibisz
Copy link
Owner Author

alexklibisz commented Aug 30, 2024

I ran a grid search and found some promising parameters. Verified these on AWS:

Model Parameters Recall Queries per Second
eknn-l2lsh L=96 k=8 w=4096 candidates=1024 probes=0 0.905 250.333
eknn-l2lsh L=150 k=8 w=4000 candidates=800 probes=0 0.929 264.922
eknn-l2lsh L=128 k=8 w=4096 candidates=1024 probes=0 0.935 255.407
eknn-l2lsh L=150 k=8 w=4000 candidates=1000 probes=0 0.942 249.797

Some other parameters that were promising but I haven't verified on AWS:

eknn-l2lsh-L=96-k=4-w=4096_candidates=1024_probes=0        0.954       68.276
eknn-l2lsh-L=64-k=4-w=2048_candidates=1024_probes=4        0.942       66.007
eknn-l2lsh-L=64-k=4-w=4096_candidates=1024_probes=0        0.913       85.683
eknn-l2lsh-L=96-k=4-w=2048_candidates=1024_probes=2        0.944       66.841
eknn-l2lsh-L=96-k=2-w=2048_candidates=1024_probes=0        0.906       62.746
eknn-l2lsh-L=64-k=8-w=4096_candidates=1024_probes=2        0.936       97.321
eknn-l2lsh-L=96-k=8-w=4096_candidates=1024_probes=0        0.905      138.512
eknn-l2lsh-L=128-k=8-w=4096_candidates=512_probes=2        0.949       58.870
eknn-l2lsh-L=96-k=2-w=1024_candidates=1024_probes=2        0.910       62.992
eknn-l2lsh-L=128-k=4-w=2048_candidates=512_probes=2        0.925       61.073
eknn-l2lsh-L=128-k=4-w=4096_candidates=1024_probes=0        0.971       60.615
eknn-l2lsh-L=128-k=8-w=4096_candidates=1024_probes=0        0.935      119.305
eknn-l2lsh-L=96-k=8-w=4096_candidates=1024_probes=2        0.964       68.093
eknn-l2lsh-L=64-k=4-w=2048_candidates=1024_probes=2        0.906      112.086
eknn-l2lsh-L=128-k=4-w=4096_candidates=512_probes=0        0.925       63.552
eknn-l2lsh-L=128-k=8-w=4096_candidates=1024_probes=2        0.978       53.779
eknn-l2lsh-L=96-k=8-w=4096_candidates=512_probes=2        0.926       84.319
eknn-l2lsh-L=96-k=8-w=4096_candidates=512_probes=4        0.950       57.320
eknn-l2lsh-L=96-k=4-w=2048_candidates=512_probes=4        0.934       61.600
eknn-l2lsh-L=64-k=8-w=4096_candidates=1024_probes=4        0.959       69.827
eknn-l2lsh-L=64-k=8-w=4096_candidates=512_probes=4        0.915       92.171

@alexklibisz
Copy link
Owner Author

I managed to find some parameters that get 239 QPS at 0.96 recall. There are a ton of results in this commit: c7efcf1

@alexklibisz
Copy link
Owner Author

alexklibisz commented Aug 31, 2024

The fully-dockerized ann-benchmarks results are still quite pitiful:

Model Parameters Recall Queries per Second
eknn-l2lsh L=175 k=7 w=3900 candidates=100 probes=0 0.607 233.326
eknn-l2lsh L=175 k=7 w=3900 candidates=500 probes=0 0.921 200.319
eknn-l2lsh L=175 k=7 w=3900 candidates=1000 probes=0 0.962 169.238

I went ahead and opened a PR to get the latest parameters and Elastiknn version into ann-benchmarks: erikbern/ann-benchmarks#544

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant