
Full-stack app that fetches some Wikipedia articles at random, indexes them (in local memory), and serves endpoints for searching the index using a BM25 ranking algorithm. Originally an eng challenge I was asked to spend 3hrs on, w/o a DB of any form, I have since added persistence by Dockerizing the app with a Postgres instance. Front-end WIP.


LukeScales1/wikipedia-search


wikipedia-search

Search a random selection of Wikipedia articles.

This was originally a tech challenge I was asked to do for a short-term contract. The original spec set a limit of 3 hours and ruled out any sort of DB, so the index and articles had to be loaded on application start and everything else kept in local memory. Information retrieval packages were also off limits (e.g. no Elasticsearch), so both the indexing and the ranking algorithm had to be implemented manually. The spec asked for the API first, with a frontend as a nice-to-have if time allowed.

I have since extended this project to better showcase my abilities as a full-stack engineer. I have added a TypeScript React frontend that uses Redux, and Dockerized the stack to work with a Postgres instance. The FastAPI backend has also been improved considerably.

On launch the app will fetch a number of random articles from the Wikipedia API, store them in the Postgres DB instance, and then index them in an inverted index. You can then search for terms and the app will return the relevant article titles from those fetched, as ranked by a BM25 ranking algorithm.
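As a rough illustration of the indexing and ranking described above, here is a minimal sketch of an inverted index with BM25 scoring. It is not the code used in this repository: the function names (tokenize, build_index, bm25_search), the tokenization rule, and the k1 and b values are assumptions chosen for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Build an inverted index mapping term -> {title: term frequency}."""
    index = defaultdict(dict)
    doc_lengths = {}
    for title, text in articles.items():
        tokens = tokenize(text)
        doc_lengths[title] = len(tokens)
        for term, freq in Counter(tokens).items():
            index[term][title] = freq
    return index, doc_lengths


def bm25_search(query, index, doc_lengths, k1=1.5, b=0.75):
    """Return (title, score) pairs sorted by descending BM25 score."""
    n_docs = len(doc_lengths)
    if n_docs == 0:
        return []
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any indexed article
        # Inverse document frequency: rarer terms contribute more.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for title, freq in postings.items():
            # Length normalization: long articles are penalized slightly.
            norm = 1 - b + b * doc_lengths[title] / avg_len
            scores[title] += idf * freq * (k1 + 1) / (freq + k1 * norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling build_index on a dict of title to article text, then bm25_search("act", index, doc_lengths), would return only the articles that contain the term, highest score first. Articles without any query term never appear in the results, which matches the behaviour of returning only the relevant titles.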

Development

Install pre-commit hooks

This project uses pre-commit to run automated checks on code before it is committed. To install pre-commit hooks, follow these steps:

  1. Install pre-commit: pip install pre-commit
  2. Install pre-commit hooks: pre-commit install
  3. Run pre-commit hooks: pre-commit run -a

Setup

Environment variables

Environment variables will need to be set up for the app to run. You can do this by copying the .env.template to a file called .env in the envs directory, and then setting the values as required.

The POSTGRES_HOST variable will need to be set to db if you are running the app using Docker, or to localhost if you are running the app locally. To run the app locally, you will also need to set the POSTGRES_USER and POSTGRES_PASSWORD variables to the username and password of your Postgres user.

To set variables locally, you can use python-dotenv to load them from the .env file. Install it with pip install python-dotenv, then add the following to the top of main.py or settings.py (the path is resolved relative to the directory you run the app from):

from dotenv import load_dotenv

load_dotenv("../envs/.env")

Alternatively you can manually set the variables in your terminal before running the app, e.g.:

export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=password
...

or on Windows (Command Prompt):

set POSTGRES_USER=postgres
set POSTGRES_PASSWORD=password
...
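To show how these variables might fit together, here is a minimal sketch that assembles a Postgres connection URL from the environment. It is an assumption about how such settings are typically consumed, not the project's actual settings code: the function name postgres_url and the POSTGRES_PORT and POSTGRES_DB variable names are hypothetical, so check .env.template for the real names.

```python
import os
from urllib.parse import quote


def postgres_url(env=os.environ):
    """Build a Postgres connection URL from environment variables."""
    user = env["POSTGRES_USER"]
    password = env["POSTGRES_PASSWORD"]
    # "db" when running under Docker, "localhost" when running locally.
    host = env.get("POSTGRES_HOST", "localhost")
    # Hypothetical variable names with common defaults.
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "postgres")
    # Percent-encode credentials so characters like "@" do not break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

Reading required values with env["..."] makes a missing variable fail immediately with a KeyError, which is usually easier to diagnose than a connection error later on.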

Docker (recommended)

Ensure that Docker is installed and that the Docker daemon is running. It usually starts automatically; if it has not, the following command will report an error and you can then start the daemon manually. In the root directory of the project, run:

docker-compose up --build

This will build the Docker image and run the backend app on port 8000, and the frontend app on port 3000 by default. You can change the ports by editing the docker-compose.yml file. The compose file will also create a Postgres container and automatically run the Alembic migrations to create the database and tables.

Locally (not recommended, but instructions are here anyway)

Database

To run the database locally you will need to install Postgres. This app uses version 15.6, and using the same version is recommended. You can find installation instructions for your OS on the Postgres website.

Once set up, you can use whatever database, user & password you like (you will need to set these in the .env file or manually, see above). With your Postgres instance running, you can create the tables by running the Alembic migrations. Navigate to the backend folder and run:

alembic upgrade head

Backend

Requires Python 3.9+. Navigate to the backend folder, create and activate a virtual environment (venv recommended), then install the required packages using: pip install -Ur requirements/local.txt

Once requirements are installed, navigate to the src folder and run:

uvicorn main:app --reload

This will launch the app on port 8000 by default.

Frontend

Requires Node.js. Navigate to the frontend folder and install the required packages using:

npm install

Once packages are installed, run:

npm start

This will start the React app on port 3000 by default.

Using the app

The frontend React app will show a list of the random articles that have been fetched from Wikipedia. If you input a search term in the search field, the application will highlight the relevant articles based on the search term. You can then click on a highlighted article to view it in full on Wikipedia. To load more articles from Wikipedia, and add them to the index, click the React icon.

Backend API

Full list of endpoints and details can be found at http://127.0.0.1:8000/docs

Example search (you may need to change the query to see some results, since you will receive a different set of random articles): http://127.0.0.1:8000/search/?query=act

Example result:

{
    "results":
         [
           {"title": "Statue of Cosimo I", "ranking": 1.6436972762369082},
           {"title": "Trihalomethane", "ranking": 1.4771313555617704},
           {"title": "Joseph Henry Morris House", "ranking": 1.3022414039433408},
           {"title": "Neuadd Dwyfor", "ranking": 1.1306068876621977},
           {"title": "Gutierre Vermúdez", "ranking": 1.0185666732810175},
           {"title": "Great Bakersfield Fire of 1889", "ranking": 1.000698806614099},
           {"title": "Eureka Street (novel)", "ranking": 0.911915592081272},
           {"title": "The Real World: San Francisco", "ranking": 0.7130057951771621},
           {"title": "Laid Back", "ranking": 0.6889075703961695},
           {"title": "Yana Milev", "ranking": 0.3727990368545083}
         ]
}
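The search endpoint can also be called from a script. The sketch below uses only the Python standard library and assumes the backend is running on the default port and that the response has the shape shown in the example result; the helper names (search_url, parse_titles, search) are made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # default backend address from this README


def search_url(query, base_url=BASE_URL):
    """Build the search URL, URL-encoding the query string."""
    return f"{base_url}/search/?{urlencode({'query': query})}"


def parse_titles(payload):
    """Extract article titles from a JSON body shaped like the example result."""
    results = json.loads(payload)["results"]
    return [item["title"] for item in results]


def search(query, base_url=BASE_URL):
    """Call the running backend and return the matching titles in response order."""
    with urlopen(search_url(query, base_url)) as response:
        return parse_titles(response.read().decode("utf-8"))
```

With the backend running, search("act") would return a list of titles. Because the articles are fetched at random, the titles will differ from the example above and may be empty for some queries.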

Running tests

To run the automated tests locally, navigate to the backend directory and install the backend project as an editable dependency:

pip install -e .

Ensure you have the test requirements installed:

pip install -Ur requirements/test.txt

Then simply execute pytest.
