This project was made as a final exercise for a data-oriented programming course at the University of Gdańsk.
It scrapes data from two major real-estate listing portals, saves it to Azure Blob Storage in JSON format, and lets you train and use different ML models on the fetched data. We also provide an API layer so that end users can easily interact with the system. The architecture of the system is depicted in the diagram below.
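To make the storage step concrete, here is a minimal sketch of how scraped listings can be persisted as JSON with the `azure-storage-blob` SDK. The container name, blob path, listing fields, and environment variable are illustrative assumptions, not the project's actual identifiers.

```python
import json
import os

from azure.storage.blob import BlobServiceClient

# Credentials come from the environment (see the .env setup below).
service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Hypothetical container name, blob path, and listing fields.
listing = {"city": "Gdansk", "area_m2": 54.0, "rooms": 3, "price": 620_000}
blob = service.get_blob_client(container="scraping-results", blob="listings/0001.json")
blob.upload_blob(json.dumps(listing), overwrite=True)
```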
This project is designed to showcase our Python skills. We prioritize model clarity over complexity; unit test coverage is minimal by design, but the provided examples demonstrate our approach to effective testing. The API and the data scraping component are likewise kept intentionally simple so that the overall design remains easy to follow. The concepts and technologies we used include:
- Object-oriented programming (OOP)
- Type hints
- Generators
- Regular expressions
- dvc (with Azure Blob Storage)
- Git
- Azure Blob Storage
- FastAPI
- BeautifulSoup4
- Pydantic
- Docker
- GitHub Actions
- pre-commit hooks
- Static analysis tools
- Unit tests
- Logging
- Environment variables
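To give a flavour of several items on this list, here is a short, hypothetical sketch (not code from this repository) that combines type hints, a generator, a regular expression, Pydantic, and BeautifulSoup to pull listing prices out of a search-results page:

```python
import re
from collections.abc import Iterator

from bs4 import BeautifulSoup
from pydantic import BaseModel

PRICE_RE = re.compile(r"(\d[\d\s]*)\s*zł")  # e.g. matches "620 000 zł"


class Listing(BaseModel):
    title: str
    price: int


def parse_listings(html: str) -> Iterator[Listing]:
    """Yield validated listings from a (hypothetical) search-results page."""
    soup = BeautifulSoup(html, "html.parser")
    for article in soup.find_all("article"):
        match = PRICE_RE.search(article.get_text())
        if match:
            price = int(match.group(1).replace(" ", ""))
            yield Listing(title=article.get("data-title", ""), price=price)
```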
To run the API with Docker:

1. Run `docker pull gdahuks/housing_price_prediction` to pull the Docker image from Docker Hub. Alternatively, you can build the image yourself (needed if the target platform is not linux/amd64 or linux/arm64) by running `docker build -t housing_price_prediction .` in the project's root directory.
2. Create a `.env` file and fill it with credentials for the Azure Blob Storage container holding the scraping results, following the `.env.template` file. Alternatively, you can pass the environment variables directly to the `docker run` command (see step 3).
3. Run `docker run -d --publish 8000:8000 --env-file .env housing_price_prediction` to start the container on port 8000.
4. Visit http://0.0.0.0:8000/docs to explore the API documentation. (Since Swagger UI does not support a request body in GET, use another tool such as Postman for predictions, or a short script like the one below.)
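For example, assuming a hypothetical `/predict` endpoint and feature names (check http://0.0.0.0:8000/docs for the actual path and schema), a GET request with a JSON body can be sent from Python:

```python
import requests

# Hypothetical endpoint path and feature names; consult /docs for the real schema.
response = requests.get(
    "http://0.0.0.0:8000/predict",
    json={"city": "Gdansk", "area_m2": 54.0, "rooms": 3},
)
print(response.json())
```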
To run the API locally, without Docker:

1. Install the requirements from the `requirements.txt` file.
2. Create a `.env` file and fill it with credentials for the Azure Blob Storage container holding the scraping results, following the `.env.template` file (a hypothetical example is sketched below), or export the variables directly in your shell.
3. Start the application on port 8000.
4. Visit http://0.0.0.0:8000/docs to explore the API documentation, and see the note about Postman above.
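As a sketch of what the `.env` file might contain; the variable name and value here are made up, and `.env.template` is the authoritative reference:

```
# Hypothetical variable; copy the real names from .env.template.
AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
```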
To fetch the trained model with DVC:

1. Create a `config.local` file in the `.dvc/` directory and fill it with your credentials for the trained model in Azure Blob Storage (a URL with a SAS token), following the `.dvc/config.local.template` file.
2. Run `dvc pull` to download the data from Azure Blob Storage.
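For reference, a `.dvc/config.local` for an Azure remote typically looks like the sketch below; the remote name, container URL, and credential values are placeholders, and `.dvc/config.local.template` is the authoritative reference:

```ini
# Placeholder remote name and values; follow .dvc/config.local.template.
['remote "storage"']
    url = azure://models
    account_name = <storage-account-name>
    sas_token = <sas-token>
```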
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.