"Castorini" is the GitHub organization of Jimmy Lin's research group at the University of Waterloo. The name is a portmanteau of castor, the genus name for beavers, and anserini, the taxonomic tribe of true geese. It's difficult to come up with two animals that are more quintessentially Canadian than those!
This repository contains onboarding resources for researchers who would like to work with us, including new graduate students and undergraduates at the University of Waterloo.
Undergraduates at the University of Waterloo: If you're interested in working with our group, read this guide first.
This onboarding path is the starting point for working in our group and comprises the following lessons:
- Begin your journey here.
- BM25 Baselines for MS MARCO Passage Ranking in Anserini
- BM25 Baseline for MS MARCO Passage Ranking in Pyserini
- A Conceptual Framework for Retrieval
- Contriever Baseline for NFCorpus
- A Deeper Dive into Dense and Sparse Representations
As you proceed along the onboarding path, please don't send a separate pull request for each file. Instead, consolidate your edits into a single pull request for each repo.
This repository introduces several ways to obtain GPU access for users without local GPU resources:
- Transform Google Colab into a GPU instance with full SSH access
- Guide to Compute Canada GPU resources
- Guide to using UW GPU resources
This guide covers fine-tuning monoBERT on the MS MARCO passage dataset using the Capreolus toolkit. Compute Canada users may need to set up the environment first by following this guide.