diff --git a/README.md b/README.md
index 04a19d09b7..ab96899832 100644
--- a/README.md
+++ b/README.md
@@ -15,9 +15,13 @@ See [Yang et al. (SIGIR 2017)](https://dl.acm.org/authorize?N47337) and [Yang et
## 🎬 Getting Started
-Many Anserini features are exposed in the [Pyserini](http://pyserini.io/) Python interface.
+Most Anserini features are exposed in the [Pyserini](http://pyserini.io/) Python interface.
+If you're more comfortable with Python, start there. That said, Anserini forms an important building block of Pyserini, so it remains worthwhile to learn about Anserini.
+
+
You'll need Java 11 and Maven 3.3+ to build Anserini.
Clone our repo with the `--recurse-submodules` option to make sure the `eval/` submodule also gets cloned (alternatively, use `git submodule update --init`).
@@ -27,10 +31,6 @@ Then, build using using Maven:
mvn clean package appassembler:assemble
```
-Note that on Windows, tests may fail due to encoding issues, see [#1466](https://github.com/castorini/anserini/issues/1466).
-A simple workaround is to skip tests by adding `-Dmaven.test.skip=true` to the above `mvn` command.
-See [#1121](https://github.com/castorini/pyserini/discussions/1121) for additional discussions on debugging Windows build errors.
-
The `tools/` directory, which contains evaluation tools and other scripts, is actually [this repo](https://github.com/castorini/anserini-tools), integrated as a [Git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules) (so that it can be shared across related projects).
Build as follows (you might get warnings, but okay to ignore):
@@ -39,7 +39,17 @@ cd tools/eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make &
cd tools/eval/ndeval && make && cd ../../..
```
-With that, you should be ready to go!
+With that, you should be ready to go.
+The onboarding path for Anserini starts [here](docs/start-here.md)!
+
+
+Windows tips
+
+Note that on Windows, tests may fail due to encoding issues; see [#1466](https://github.com/castorini/anserini/issues/1466).
+A simple workaround is to skip tests by adding `-Dmaven.test.skip=true` to the above `mvn` command.
+See [#1121](https://github.com/castorini/pyserini/discussions/1121) for additional discussions on debugging Windows build errors.
+
+
## ⚗️ Regression Experiments (+ Reproduction Guides)
diff --git a/docs/Prometheus-Model.png b/docs/Prometheus-Model.png
new file mode 100644
index 0000000000..5b2e4e4286
Binary files /dev/null and b/docs/Prometheus-Model.png differ
diff --git a/docs/start-here.md b/docs/start-here.md
index 1e097aea59..2be3707e3b 100644
--- a/docs/start-here.md
+++ b/docs/start-here.md
@@ -1,7 +1,7 @@
# Anserini: Start Here
This page provides the entry point for an introduction to information retrieval (i.e., search).
-It also serves as an [onboarding path](https://github.com/lintool/guide/blob/master/ura.md) for University of Waterloo undergraduate (and graduate) students who wish to join my research group.
+It also serves as an [onboarding path](https://github.com/castorini/onboarding) for University of Waterloo undergraduate and graduate students who wish to join my research group.
As a high-level tip for anyone going through these exercises: try to understand what you're actually doing, instead of simply [cargo culting](https://en.wikipedia.org/wiki/Cargo_cult_programming) (i.e., blindly copying and pasting commands into a shell).
By this, I mean, actually _read_ the surrounding explanations, understand the purpose of the commands, and use this guide as a springboard for additional explorations (for example, dig deeper into the code).
@@ -27,7 +27,44 @@ This problem has been given various names, e.g., the search problem, the informa
In most contexts, "ranking" and "retrieval" are used interchangeably.
Basically, this is what _search_ (i.e., information retrieval) is all about.
-Let's try to unpack the definition a bit.
+## Interlude: Who Cares?
+
+At this point, it's worthwhile to pause and answer the question: Who cares?
+
+LLMs are cool.
+ChatGPT is cool.
+Generative AI is cool.
+But _search_?
+That's so... last millennium!
+
+Well, not quite.
+You might have heard of this thing called "retrieval augmentation"?
+That's just a fancy way of describing the technique of fetching pieces of content (e.g., paragraphs) from some external source (e.g., a collection of documents), and stuffing them into the prompt of an LLM to improve its generative capabilities.
+How do we "fetch" those pieces of content?
+Well, that's retrieval!
+(You might have also heard about something called vector search? We'll cover exactly that later in this onboarding path.)
+
+In fact, retrieval augmentation is exactly how the new Bing search works.
+You don't have to take my word: you can directly read the blog post on [building the new Bing](https://blogs.bing.com/search-quality-insights/february-2023/Building-the-New-Bing) and find the following diagram:
+
+![Prometheus model](Prometheus-Model.png)
+
+Search comprises "internal queries" that fetch content ("Bing results"), which is then fed into an LLM (i.e., stuffed into the prompt) to generate answers.
+If you want more evidence, here's a [NeurIPS 2020 paper](https://arxiv.org/abs/2005.11401) that basically says the same thing.
+
+Thus, retrieval forms the foundation of answer generation with LLMs.
+In fact, it's critical to the quality of the output.
+We all know the adage "[garbage in, garbage out](https://en.wikipedia.org/wiki/Garbage_in,_garbage_out)", which highlights the importance of retrieval.
+If the retrieval quality ain't good, the LLM output will be garbage.
+
+How do we do retrieval effectively?
+Well, that's why you should read on.
+Later, we'll also see that transformers (the same neural network model that underlies LLMs) form a fundamental building block for converting content into representation vectors (called "embeddings"), which underlie vector search.
+
+## Back to the Retrieval Problem
+
+Hopefully, you're convinced that retrieval is important, or at least sufficiently so to read on.
+Now, let's get back to the retrieval problem and try to unpack the definition a bit.
A **"query"** is a representation of an information need (i.e., the reason you're looking for information in the first place) that serves as the input to a retrieval system.