This repository contains archived source code for automatically scraping price data from the largest Vietnamese e-commerce websites.
The scraping algorithm uses BeautifulSoup and selenium (when needed; see e.g. py/prices_lazadavn.py).
Each website requires its own scraper, defined by a corresponding file in the /py folder. An interface for managing the scraping processes is built with tmuxp. Each website's scraper starts daily at a random time between 12 AM and 1 AM, finishes its tasks, then hibernates until the next day.
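The daily schedule described above can be sketched roughly as follows. This is only an illustration, not the code used by the scrapers in /py: the function names `next_run_time` and `seconds_until` are hypothetical, and it assumes naive local datetimes and a start time drawn uniformly at whole-second resolution.

```python
import random
from datetime import datetime, timedelta


def next_run_time(now, rng=random):
    """Pick a random moment in the 00:00-01:00 window of the day after `now`.

    Hypothetical helper: assumes `now` is a naive local datetime.
    """
    # Midnight at the start of the following day.
    next_midnight = datetime.combine(now.date() + timedelta(days=1),
                                     datetime.min.time())
    # Whole-second offset in [0, 3600), i.e. strictly before 1 AM.
    offset = timedelta(seconds=rng.randrange(3600))
    return next_midnight + offset


def seconds_until(target, now):
    """Number of seconds to sleep before `target`, never negative."""
    return max(0.0, (target - now).total_seconds())
```

A scraper loop could then call `time.sleep(seconds_until(next_run_time(datetime.now()), datetime.now()))` after each completed run, which corresponds to the "hibernate until the next day" step.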
This repository requires existing installations of Python 3.8, pipenv, and tmuxp.
After cloning this repository locally, run the following command from the project source to set up a Pipenv environment and initiate all scrapers:
tmuxp load ./src/tmuxp-scraping-session.yaml