Multi-stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting
- Make sure Java 11+ and Python 3.7+ are installed
- Install the chatty-goose PyPI module
  pip install chatty-goose
- If you are using T5 or BERT, make sure to install PyTorch 1.4.0 - 1.7.1 using your specific platform instructions. Note that PyTorch 1.8 is currently incompatible due to the transformers version we currently use. Also make sure to install the corresponding torchtext version.
- Download the English model for spaCy
  python -m spacy download en_core_web_sm
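The version constraints above can be checked before loading any models. The following is a minimal sketch, not part of chatty-goose: the helper names `parse_version` and `torch_version_supported` are hypothetical, and the bounds are simply the 1.4.0 - 1.7.1 range stated above.

```python
import re
import sys


def parse_version(version):
    """Turn a string such as '1.7.1+cu110' into a tuple such as (1, 7, 1)."""
    core = version.split("+")[0]  # drop a local build suffix like '+cu110'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (3 - len(numbers))  # pad '1.6' to (1, 6, 0)
    return tuple(numbers)


def torch_version_supported(version, low=(1, 4, 0), high=(1, 7, 1)):
    """Return True if the version lies in the inclusive range [low, high]."""
    return low <= parse_version(version) <= high


# Python itself must be 3.7 or newer.
python_ok = sys.version_info >= (3, 7)
```

If PyTorch is installed, `torch_version_supported(torch.__version__)` reports whether the installed release falls inside the supported range.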
The following example shows how to initialize a searcher and build a ConversationalQueryRewriter agent from scratch using HQE and T5 as first-stage retrievers, and a BERT reranker. To see a working example agent, see chatty_goose/agents/chat.py.
First, load a searcher
from pyserini.search import SimpleSearcher
# Option 1: load a prebuilt index
searcher = SimpleSearcher.from_prebuilt_index("INDEX_NAME_HERE")
# Option 2: load a local Lucene index
searcher = SimpleSearcher("PATH_TO_INDEX")
searcher.set_bm25(0.82, 0.68)
Next, initialize one or more first-stage CQR retrievers
from chatty_goose.cqr import Hqe, Ntr
from chatty_goose.settings import HqeSettings, NtrSettings
hqe = Hqe(searcher, HqeSettings())
ntr = Ntr(NtrSettings())
Load a reranker
from chatty_goose.util import build_bert_reranker
reranker = build_bert_reranker()
Create a new RetrievalPipeline
from chatty_goose.pipeline import RetrievalPipeline
rp = RetrievalPipeline(searcher, [hqe, ntr], searcher_num_hits=50, reranker=reranker)
And we're done! Simply call rp.retrieve(query) to retrieve passages, or call rp.reset_history() to reset the conversational history of the retrievers.
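To show how the two calls fit together over a multi-turn conversation, here is a small sketch. It relies only on the `retrieve(query)` and `reset_history()` methods named above. The `run_conversation` helper and the `FakePipeline` class are hypothetical: the fake stands in for a real RetrievalPipeline so the snippet runs without models or an index, and the format of its return value is made up for illustration.

```python
def run_conversation(pipeline, turns):
    """Run every turn of one conversation through the pipeline, in order."""
    # Start from a clean history so earlier conversations do not leak in.
    pipeline.reset_history()
    results = []
    for query in turns:
        results.append((query, pipeline.retrieve(query)))
    return results


class FakePipeline:
    """Stand-in for a real pipeline; it only records the queries it has seen."""

    def __init__(self):
        self.history = []

    def retrieve(self, query):
        self.history.append(query)
        # A real pipeline would return retrieved passages here.
        return ["passage for: " + " | ".join(self.history)]

    def reset_history(self):
        self.history = []


turns = ["What is throat cancer?", "Is it treatable?"]
demo_results = run_conversation(FakePipeline(), turns)
```

With a real pipeline, pass `rp` in place of `FakePipeline()`. Because the retrievers keep conversational history, the second turn is interpreted in the context of the first.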
- Clone the repo and all submodules (git submodule update --init --recursive)
- Clone and build Anserini for evaluation tools
- Install dependencies
  pip install -r requirements.txt
- Follow the instructions under docs/cqr_experiments.md to run experiments using HQE, T5, or fusion.
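The fusion option combines the ranked lists produced by the different query rewriters. One common way to merge ranked lists is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that technique, not necessarily the exact procedure used in the experiments; the function name, the constant `k=60`, and the document IDs are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Documents are returned by descending score,
    with ties broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked lists from two first-stage retrievers.
hqe_ranking = ["d1", "d2", "d3"]
t5_ranking = ["d2", "d4", "d1"]
fused = reciprocal_rank_fusion([hqe_ranking, t5_ranking])
```

Documents that rank highly in both lists rise to the top, while documents found by only one retriever are still kept.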
To run an interactive conversational search agent with ParlAI, simply run chat.py. By default, we use the CAsT 2019 pre-built Pyserini index, but it is possible to specify other indexes using the --from_prebuilt flag. See the file for other possible arguments:
python -m chatty_goose.agents.chat
Alternatively, run the agent using ParlAI's command line interface:
python -m parlai interactive --model chatty_goose.agents.chat:ChattyGooseAgent
We also provide instructions to deploy the agent to Facebook Messenger using ParlAI under examples/messenger.