diff --git a/neurips23/streaming/README.md b/neurips23/streaming/README.md index a4ffa1e8..1b66e222 100644 --- a/neurips23/streaming/README.md +++ b/neurips23/streaming/README.md @@ -8,19 +8,19 @@ The streaming runbooks support four operations: `search`, `insert`, `delete`, an Each vector is assumed to have a unique *id* which never changes throughout the course of a runbook. In the case of replaces, each vector is also assigned a numeric *tag*. The underlying vector id corresponding to a tag may change throughout the runbook when a vector is replaced. In the runbooks here, the tag of a vector is assumed to correspond to the vector id when a vector is first inserted, and then remains constant when the vector is replaced. For example, a vector with id #245 is first inserted with tag #245. If the vector is later replaced with vector id #1067, tag #245 now corresponds to vector id #1067. Upon another replace, tag #245 might next correspond to vector id #2428. This distinction leads us to define the semantics of each operation in terms of ids and tags: -1. `search` takes as input a set of query vectors, and returns an array of tags corresponding to the nearest index vectors to each query vector. -2. `insert` takes as input a range of vector ids, whose tags correspond to their vector ids, to insert into the index. -3. `delete` inputs a range of existing tags which are to be deleted from the index and not to be returned as answers to queries henceforth. -4. `range` inputs a range of existing tags and a range of vector ids, such that each tag should henceforth correspond to the new vector id. +1. `search` provides a set of query vectors, and returns an array of tags corresponding to the nearest index vectors to each query vector. In this repository, each call to `search` in one runbook refers to the same set of query vectors. +2. `insert` provides a range of vector ids, whose tags are identical to their vector ids, to insert into the index. +3. `delete` provides a range of existing tags whose underlying data is to be deleted from the index and no longer returned as answers to queries. +4. `replace` provides a range of existing tags and a range of vector ids, such that each tag should henceforth correspond to the new vector id. ## Available Runbooks Now that the number of runbooks has started to increase significantly, here we list the available runbooks with a brief description of each. -1. `simple_runbook.yaml`: A runbook executing a short sequences of insertions, searches, and deletions to aid with debugging and sanity checks. -2. `simple_replace_runbook.yaml`: A runbook executing a short sequence of replaces to aid with debugging and sanity checks. +1. `simple_runbook.yaml`: A runbook executing a short sequences of insertions, searches, and deletions to aid with debugging and testing. +2. `simple_replace_runbook.yaml`: A runbook executing a short sequence of inserts, searches, and replaces to aid with debugging and testing. 3. `clustered_runbook.yaml`: A runbook taking a clustered dataset (options are `random-xs-clustered` and `msturing-10M-clustered`) and inserting points in clustered order. -4. `delete_runbook.yaml`: A runbook executing all steps in the clustered runbook, but which then deletes a fraction of each cluseter. +4. `delete_runbook.yaml`: A runbook executing all steps in the clustered runbook, but which then deletes a fraction of each cluster. 5. `final_runbook.yaml`: The NeurIPS 2023 streaming challenge final runbook. It takes the `msturing-30M-clustered` dataset and performs several rounds of insertion and deletion in clustered order. 6. `msmarco-100M_expirationtime_runbook.yaml`: A runbook using the `msmarco-100M` dataset which inserts each point with a randomly chosen expiration time: never, in 200 steps, or in 50 steps. 7. `neurips23/streaming/wikipedia-35M_expirationtime_runbook.yaml`: A runbook using the `wikipedia-35M` dataset which inserts each point with a randomly chosen expiration time: never, in 200 steps, or in 50 steps.