Yellow Pages scraper webapp made with Flask, requests, and beautifulsoup. Outputs results to users as csvs. Uses random proxy and user-agent for each request.

ZaxR/YP_scraper

YP_scraper

YP_scraper is a Yellow Pages scraper web app built on Flask. Scraping is done with requests and lxml, and tasks are handled by Celery and Redis. Each request uses a random proxy and user-agent. Results are output as CSV files, available for local download or delivery by e-mail.
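
The per-request rotation described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the USER_AGENTS and PROXIES pools, the build_request_kwargs helper, and the 10-second timeout are all hypothetical placeholders.

```python
import random

# Hypothetical pools; the project's real proxy and user-agent sources are not shown here.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
PROXIES = ["http://203.0.113.10:8080", "http://203.0.113.11:3128"]


def build_request_kwargs(user_agents=USER_AGENTS, proxies=PROXIES, rng=random):
    """Pick a fresh user-agent and proxy, shaped as keyword arguments for requests.get."""
    proxy = rng.choice(proxies)
    return {
        "headers": {"User-Agent": rng.choice(user_agents)},
        # Route both schemes through the same randomly chosen proxy.
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,  # assumed value, so a dead proxy cannot hang a task
    }
```

With requests installed, a page fetch would then be requests.get(url, **build_request_kwargs()), and the response body could be parsed with lxml.html.fromstring(response.content).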

A hosted version of the project can be found at https://yp-scraper.herokuapp.com/

Running locally

  1. Create a .env file containing the following variables. The values shown for the Postgres-related variables are suggested defaults:
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB=postgres
POSTGRES_URL=db
POSTGRES_PORT=5432

REDIS_URL=redis://redis:6379/0

[email protected]
MAIL_PASSWORD=your_password
  2. Build and run all services: docker-compose up --build
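
To illustrate how the Postgres variables in the .env file fit together, here is a minimal sketch that assembles them into a single connection URL. How the app itself consumes these variables is not shown in this README, so the database_url helper and the postgresql:// URL layout are assumptions for illustration only.

```python
import os
from urllib.parse import quote_plus


def database_url(env=os.environ):
    """Assemble a PostgreSQL connection URL from the POSTGRES_* variables.

    Hypothetical helper: the variable names match the .env file above,
    but the app's real configuration code may differ.
    """
    return "postgresql://{user}:{password}@{host}:{port}/{db}".format(
        user=quote_plus(env["POSTGRES_USER"]),
        # Escape characters such as "@" or "/" that would break the URL.
        password=quote_plus(env["POSTGRES_PASSWORD"]),
        host=env["POSTGRES_URL"],  # "db" is the suggested default host value
        port=env.get("POSTGRES_PORT", "5432"),
        db=env["POSTGRES_DB"],
    )
```

With the suggested defaults, the result is postgresql://postgres:password@db:5432/postgres.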

Running on Heroku

Heroku reads the heroku.yml file when deploying from this repo. To store the env vars and secrets from .env securely, use Heroku's "Config Vars", which can be set through the Heroku CLI or the dashboard UI.

Navigate to https://dashboard.heroku.com/apps/what_you_name_the_app/settings , replacing what_you_name_the_app with the name of your app. Click "Reveal Config Vars" and manually enter the e-mail-related key=value pairs from the .env used for running locally. Because Postgres and Redis are set up as add-on services, their config vars are set automatically.
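
Since the e-mail config vars are entered by hand, a small startup check can catch a missing one early. The following is a sketch under assumptions: missing_config_vars and REQUIRED_VARS are hypothetical names, and the list covers only variables that appear in the local .env above.

```python
import os

# Assumed list: REDIS_URL and MAIL_PASSWORD appear in the local .env;
# extend it with whichever other variables the app reads.
REQUIRED_VARS = ("REDIS_URL", "MAIL_PASSWORD")


def missing_config_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_config_vars()
    if missing:
        print("Missing config vars: " + ", ".join(missing))
```

Run once after deploying, this prints the names of any config vars that still need to be entered in the Heroku settings page.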
