
Commit

Merge branch 'add_replace_and_runbooks' of github.com:harsha-simhadri/big-ann-benchmarks into add_replace_and_runbooks
magdalendobson committed Aug 26, 2024
2 parents 408205f + aa2aed8 commit 9617948
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions neurips23/streaming/README.md
@@ -8,19 +8,19 @@ The streaming runbooks support four operations: `search`, `insert`, `delete`, an

Each vector is assumed to have a unique *id* which never changes throughout the course of a runbook. In the case of replaces, each vector is also assigned a numeric *tag*. The underlying vector id corresponding to a tag may change throughout the runbook when a vector is replaced. In the runbooks here, the tag of a vector is assumed to correspond to the vector id when a vector is first inserted, and then remains constant when the vector is replaced. For example, a vector with id #245 is first inserted with tag #245. If the vector is later replaced with vector id #1067, tag #245 now corresponds to vector id #1067. Upon another replace, tag #245 might next correspond to vector id #2428. This distinction leads us to define the semantics of each operation in terms of ids and tags:

- 1. `search` takes as input a set of query vectors, and returns an array of tags corresponding to the nearest index vectors to each query vector.
- 2. `insert` takes as input a range of vector ids, whose tags correspond to their vector ids, to insert into the index.
- 3. `delete` inputs a range of existing tags which are to be deleted from the index and not to be returned as answers to queries henceforth.
- 4. `range` inputs a range of existing tags and a range of vector ids, such that each tag should henceforth correspond to the new vector id.
+ 1. `search` provides a set of query vectors, and returns an array of tags corresponding to the nearest index vectors to each query vector. In this repository, each call to `search` in one runbook refers to the same set of query vectors.
+ 2. `insert` provides a range of vector ids, whose tags are identical to their vector ids, to insert into the index.
+ 3. `delete` provides a range of existing tags whose underlying data is to be deleted from the index and no longer returned as answers to queries.
+ 4. `replace` provides a range of existing tags and a range of vector ids, such that each tag should henceforth correspond to the new vector id.
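
The tag and id semantics above can be sketched as a small bookkeeping model. This is only an illustration, not the benchmark's API: the `TagIndex` class, its method signatures, and the half-open `[start, end)` range convention are all assumptions made for the example, and `search` is left out because it depends on the actual vectors.

```python
class TagIndex:
    """Toy model of tag -> vector id bookkeeping (hypothetical, not the benchmark API)."""

    def __init__(self):
        # Maps each live tag to the vector id it currently refers to.
        self.tag_to_id = {}

    def insert(self, id_start, id_end):
        # On first insertion, a vector's tag is identical to its vector id.
        # Assumption: ranges are half-open, [id_start, id_end).
        for vector_id in range(id_start, id_end):
            self.tag_to_id[vector_id] = vector_id

    def delete(self, tag_start, tag_end):
        # Deleted tags must exist; afterwards they can no longer be returned.
        for tag in range(tag_start, tag_end):
            del self.tag_to_id[tag]  # raises KeyError for an unknown tag

    def replace(self, tag_start, tag_end, id_start, id_end):
        # Each existing tag keeps its value but now points at a new vector id.
        tags = range(tag_start, tag_end)
        ids = range(id_start, id_end)
        if len(tags) != len(ids):
            raise ValueError("tag range and id range must have the same length")
        for tag, vector_id in zip(tags, ids):
            if tag not in self.tag_to_id:
                raise KeyError(tag)
            self.tag_to_id[tag] = vector_id


# The example from the text: tag 245 follows the vector through two replaces.
index = TagIndex()
index.insert(245, 246)               # tag 245 -> vector id 245
index.replace(245, 246, 1067, 1068)  # tag 245 -> vector id 1067
index.replace(245, 246, 2428, 2429)  # tag 245 -> vector id 2428
print(index.tag_to_id)               # {245: 2428}
```

In this model a `search` would return keys of `tag_to_id` (tags), never the underlying vector ids, which is why a replaced vector keeps answering under its original tag.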

## Available Runbooks

Now that the number of runbooks has started to increase significantly, here we list the available runbooks with a brief description of each.

- 1. `simple_runbook.yaml`: A runbook executing a short sequences of insertions, searches, and deletions to aid with debugging and sanity checks.
- 2. `simple_replace_runbook.yaml`: A runbook executing a short sequence of replaces to aid with debugging and sanity checks.
+ 1. `simple_runbook.yaml`: A runbook executing a short sequences of insertions, searches, and deletions to aid with debugging and testing.
+ 2. `simple_replace_runbook.yaml`: A runbook executing a short sequence of inserts, searches, and replaces to aid with debugging and testing.
3. `clustered_runbook.yaml`: A runbook taking a clustered dataset (options are `random-xs-clustered` and `msturing-10M-clustered`) and inserting points in clustered order.
- 4. `delete_runbook.yaml`: A runbook executing all steps in the clustered runbook, but which then deletes a fraction of each cluseter.
+ 4. `delete_runbook.yaml`: A runbook executing all steps in the clustered runbook, but which then deletes a fraction of each cluster.
5. `final_runbook.yaml`: The NeurIPS 2023 streaming challenge final runbook. It takes the `msturing-30M-clustered` dataset and performs several rounds of insertion and deletion in clustered order.
6. `msmarco-100M_expirationtime_runbook.yaml`: A runbook using the `msmarco-100M` dataset which inserts each point with a randomly chosen expiration time: never, in 200 steps, or in 50 steps.
7. `neurips23/streaming/wikipedia-35M_expirationtime_runbook.yaml`: A runbook using the `wikipedia-35M` dataset which inserts each point with a randomly chosen expiration time: never, in 200 steps, or in 50 steps.
