hieutkt/price-webscraping: Price web scraping from Vietnam e-commerce websites (and more)

Introduction

This repository contains archived source code for automatically scraping price data from the largest Vietnamese e-commerce websites. The scrapers use BeautifulSoup, with Selenium where needed (see e.g. py/prices_lazadavn.py).
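To illustrate the BeautifulSoup side of this, here is a minimal sketch of pulling product names and prices out of a listing page. The markup, CSS selectors, and the helper names `parse_price` and `extract_prices` are all hypothetical and are not taken from the scrapers in /py; each real website has its own structure.

```python
from typing import Dict, List

from bs4 import BeautifulSoup

# Hypothetical product-listing markup, used only for illustration.
SAMPLE_HTML = """
<div class="product"><span class="name">Rice 5kg</span><span class="price">125.000 ₫</span></div>
<div class="product"><span class="name">Fish sauce 500ml</span><span class="price">38.500 ₫</span></div>
"""


def parse_price(text: str) -> int:
    """Convert a price string such as '125.000 ₫' to an integer amount."""
    # Keep only the digits, dropping thousands separators and the currency sign.
    return int("".join(ch for ch in text if ch.isdigit()))


def extract_prices(html: str) -> List[Dict[str, object]]:
    """Return one {'name', 'price'} record per product block in the page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for product in soup.select("div.product"):
        name = product.select_one("span.name").get_text(strip=True)
        price = parse_price(product.select_one("span.price").get_text(strip=True))
        rows.append({"name": name, "price": price})
    return rows


if __name__ == "__main__":
    for row in extract_prices(SAMPLE_HTML):
        print(row)
```

Pages that render their prices with JavaScript would need the HTML fetched through a browser driver first, which is presumably where Selenium comes in.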

Each website requires its own scraper, defined by a corresponding file in the /py folder. An interface for managing the scraping processes is built with tmuxp. Each scraper starts daily at a random time between 12 AM and 1 AM, finishes its tasks, then hibernates until the next day.
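The daily schedule described above can be sketched roughly as follows. This is an assumption about how such a loop might look, not the repository's actual code; `next_start_time`, `run_daily`, and their parameters are hypothetical names.

```python
import random
import time
from datetime import datetime, timedelta


def next_start_time(now: datetime, rng: random.Random) -> datetime:
    """Pick a random start time inside the next 12 AM to 1 AM window."""
    # Midnight at the start of the following day.
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # An offset of 0 to 3599 seconds keeps the start inside [00:00, 01:00).
    return next_midnight + timedelta(seconds=rng.randrange(3600))


def run_daily(scrape, sleep=time.sleep, clock=datetime.now, rng=None, max_runs=None):
    """Sleep until the next start time, run the scraper, and repeat."""
    rng = rng or random.Random()
    runs = 0
    while max_runs is None or runs < max_runs:
        now = clock()
        start = next_start_time(now, rng)
        # Hibernate until the chosen start time.
        sleep(max(0.0, (start - now).total_seconds()))
        scrape()
        runs += 1
```

Randomizing the start time spreads requests from the different scrapers across the hour instead of firing them all at midnight. The `sleep`, `clock`, and `max_runs` parameters exist only so the loop can be exercised without actually waiting a day.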

Setting up

This repository requires existing installations of Python 3.8, pipenv and tmuxp. After cloning the repository locally, run the following command from the project root to set up a Pipenv environment and start all scrapers:

tmuxp load ./src/tmuxp-scraping-session.yaml

Folders and files

bin
py
scr
.gitignore
Pipfile
Pipfile.lock
README.md
