Search a random selection of Wikipedia articles.
This was originally a tech challenge I was asked to do for a short-term contract. The original spec stated that I should spend 3 hours at most, should not use any sort of DB (the index and articles had to be loaded on application start, with any info needed kept in local memory), and should not use any information retrieval packages (e.g. no Elasticsearch; both the indexing and the ranking algorithm had to be implemented manually). The spec asked for the API first and said that, if time allowed, some frontend would be nice.
I have since extended this project to better showcase my abilities as a full-stack engineer. I have added a TypeScript React frontend using Redux, and Dockerized the stack to work with a Postgres instance. The FastAPI backend has also been improved considerably.
On launch the app will fetch a number of random articles from the Wikipedia API, store them in the Postgres DB instance, and then index them in an inverted index. You can then search for terms and the app will return the relevant article titles from those fetched, as ranked by a BM25 ranking algorithm.
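To illustrate the indexing and ranking step described above, here is a minimal sketch of an inverted index scored with BM25. It is not the code used in this project: the `InvertedIndex` class, the `tokenize` helper, and the `k1`/`b` defaults are assumptions chosen for the example, and the IDF uses one common smoothed variant of the formula.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory inverted index with BM25 scoring."""

    def __init__(self, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b  # document-length normalisation (assumed default)
        # term -> {article title -> term frequency in that article}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_lengths: dict[str, int] = {}

    def add(self, title: str, text: str) -> None:
        """Index one article under its title."""
        tokens = tokenize(text)
        self.doc_lengths[title] = len(tokens)
        for term, freq in Counter(tokens).items():
            self.postings[term][title] = freq

    def search(self, query: str) -> list[tuple[str, float]]:
        """Return (title, score) pairs, highest score first."""
        n_docs = len(self.doc_lengths)
        if n_docs == 0:
            return []
        avg_len = sum(self.doc_lengths.values()) / n_docs
        scores: dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Smoothed IDF: rarer terms contribute more, never negative.
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for title, freq in docs.items():
                norm = 1 - self.b + self.b * self.doc_lengths[title] / avg_len
                scores[title] += idf * freq * (self.k1 + 1) / (freq + self.k1 * norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `index.add(title, text)` for each fetched article and then `index.search("act")` would return only the articles containing the term, ordered by score.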
This project uses pre-commit to run automated checks on code before it is committed. To install pre-commit hooks, follow these steps:
- Install pre-commit:
pip install pre-commit
- Install pre-commit hooks:
pre-commit install
- Run pre-commit hooks:
pre-commit run -a
Environment variables will need to be set up for the app to run. You can do this by copying the .env.template file to a file called .env in the envs directory, and then setting the values as required.
The POSTGRES_HOST variable will need to be set to db if you are running the app using Docker, or to localhost if you are running the app locally. To run the app locally, you will also need to set the POSTGRES_USER and POSTGRES_PASSWORD variables to the username and password of your Postgres user.
To set variables locally, you can use python-dotenv to load the variables from the .env file. To do this, install it using pip install python-dotenv, then add the following to the top of the main.py file or settings.py file:
from dotenv import load_dotenv
load_dotenv("../envs/.env")
Alternatively you can manually set the variables in your terminal before running the app, e.g.:
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=password
...
or in Windows:
set POSTGRES_USER=postgres
set POSTGRES_PASSWORD=password
...
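As a rough illustration of how these variables might be consumed, the sketch below builds a Postgres connection URL from the environment. The `build_database_url` function is hypothetical, and `POSTGRES_PORT` and `POSTGRES_DB` (with their defaults) are assumed names that may not match the ones in .env.template.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a Postgres URL from environment-style settings (hypothetical helper)."""
    env = os.environ if env is None else env
    # Required: a missing variable raises KeyError so the problem surfaces early.
    user = env["POSTGRES_USER"]
    password = env["POSTGRES_PASSWORD"]
    # "db" under Docker, "localhost" when running locally.
    host = env.get("POSTGRES_HOST", "localhost")
    # Assumed variable names and defaults, for illustration only.
    port = env.get("POSTGRES_PORT", "5432")
    name = env.get("POSTGRES_DB", "postgres")
    # Escape credentials so special characters do not break the URL.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{name}"
```

Passing a plain dict instead of reading `os.environ` makes the helper easy to check without touching the real environment.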
Ensure that Docker is installed and that the Docker daemon is running. It typically starts automatically; if it is not running, the following command will fail with an error and you can then start the daemon manually. In the root directory of the project, run:
docker-compose up --build
This will build the Docker image and run the backend app on port 8000 and the frontend app on port 3000 by default. You can change the ports by editing the docker-compose.yml file.
The compose file will also create a Postgres container and automatically run the Alembic migrations to create the database and tables.
To run the database locally you will need to install Postgres. This app uses version 15.6, and using the same version is recommended. Installation instructions for your OS are available on the Postgres website.
Once set up, you can use whatever database, user and password you like (you will need to set these in the .env file or manually, see above). With your Postgres instance running, you can create the tables by running the Alembic migrations. Navigate to the backend folder and run:
alembic upgrade head
Requires Python 3.9+. Navigate to the backend folder, create and activate a virtual environment (venv recommended), then install the required packages using:
pip install -Ur requirements/local.txt
Once the requirements are installed, navigate to the src folder and run:
uvicorn main:app --reload
This will launch the app on port 8000 by default.
Requires Node.js. Navigate to the frontend folder and install the required packages using:
npm install
Once packages are installed, run:
npm start
This will start the React app on port 3000 by default.
The frontend React app will show a list of the random articles that have been fetched from Wikipedia. If you input a search term in the search field, the application will highlight the relevant articles based on the search term. You can then click on a highlighted article to view it in full on Wikipedia. To load more articles from Wikipedia, and add them to the index, click the React icon.
The full list of endpoints and their details can be found at http://127.0.0.1:8000/docs
Example search (you may need to change the query to see some results, since you will receive a different set of random articles): http://127.0.0.1:8000/search/?query=act
Example result:
{
"results":
[
{"title": "Statue of Cosimo I", "ranking": 1.6436972762369082},
{"title": "Trihalomethane", "ranking": 1.4771313555617704},
{"title": "Joseph Henry Morris House", "ranking": 1.3022414039433408},
{"title": "Neuadd Dwyfor", "ranking": 1.1306068876621977},
{"title": "Gutierre Vermúdez", "ranking": 1.0185666732810175},
{"title": "Great Bakersfield Fire of 1889", "ranking": 1.000698806614099},
{"title": "Eureka Street (novel)", "ranking": 0.911915592081272},
{"title": "The Real World: San Francisco", "ranking": 0.7130057951771621},
{"title": "Laid Back", "ranking": 0.6889075703961695},
{"title": "Yana Milev", "ranking": 0.3727990368545083}
]
}
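The same search can be run from a script. The sketch below uses only the Python standard library and assumes the backend is running locally on port 8000 and returns the response shape shown above; the helper names (`build_search_url`, `parse_titles`, `search`) are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # default local backend address


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the search URL, URL-encoding the query term."""
    return f"{base_url}/search/?{urlencode({'query': query})}"


def parse_titles(payload: str) -> list[str]:
    """Extract article titles, in ranked order, from a JSON response body."""
    data = json.loads(payload)
    return [item["title"] for item in data["results"]]


def search(query: str, base_url: str = BASE_URL) -> list[str]:
    """Call the running backend and return the ranked article titles."""
    with urlopen(build_search_url(query, base_url)) as response:
        return parse_titles(response.read().decode("utf-8"))
```

With the app running, `search("act")` would return the titles in the order the API ranked them; the actual titles depend on which random articles were fetched.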
To run the automated tests locally, navigate to the backend directory and install the backend project as an editable dependency:
pip install -e .
Ensure you have the test requirements installed:
pip install -Ur requirements/test.txt
Then simply execute pytest.