Scrapy crawler with mail sending for new and updated items

mirzadelic/crawler

Crawler

The Scrapy crawler fetches ads from a filtered URL on one of the supported websites and sends an email listing new and updated ads.

The email covers two kinds of ads: new ads (fetched for the first time for that site/URL) and updated ads (already fetched, but the price has changed, so you can see both the previous and the new price).
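The new/updated distinction can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in this repository: the names `AdChange` and `classify_ads`, and the use of plain dictionaries instead of database rows, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AdChange:
    """One entry for the notification email (hypothetical structure)."""
    ad_id: str
    kind: str                          # "new" or "updated"
    new_price: float
    old_price: Optional[float] = None  # only set for updated ads


def classify_ads(stored_prices: Dict[str, float],
                 fetched_prices: Dict[str, float]) -> List[AdChange]:
    """Compare freshly fetched ads with the ones already stored.

    stored_prices:  ad id -> last known price for this site/URL
    fetched_prices: ad id -> price seen in the current crawl
    """
    changes = []
    for ad_id, price in fetched_prices.items():
        if ad_id not in stored_prices:
            # First time this ad is seen for the site/URL.
            changes.append(AdChange(ad_id, "new", price))
        elif stored_prices[ad_id] != price:
            # Known ad whose price changed: keep both prices for the email.
            changes.append(AdChange(ad_id, "updated", price, stored_prices[ad_id]))
        # Known ad with an unchanged price: nothing to report.
    return changes
```

Ads that are already stored and whose price has not changed produce no entry, so an email built from this list contains only what changed.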

It uses the Postgres database to store ads and sites/URLs.

Available websites:

  • polovniautomobili.com
  • kupujemprodajem.com
  • nekretnine.rs

Installation

Create a virtualenv and install the requirements:

pip install -r requirements.txt

Create a crawler/.env file. Use crawler/.env.example to see all possible variables.

Usage

Create site settings

Create a new site/URL with:

python create_site_url.py

Run crawler

python run.py

This runs the crawler and sends an email with the ads that are new or updated compared with the last visit.

Cronjob

You can set up a cronjob to run the command every couple of hours.

For example, this is the cronjob on my server:

0 10,20 * * * cd /opt/crawler/crawler/ && ../venv/bin/python3 run.py

This runs the crawler every day at 10:00 and 20:00 and sends an email about new and updated ads.
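To check what the schedule `0 10,20 * * *` means in practice, the following sketch lists the upcoming run times for a given moment. It is not part of the crawler; `RUN_HOURS` and `next_runs` are names made up for this example, and it handles only this specific pattern (minute 0, fixed hours, every day), not cron expressions in general.

```python
from datetime import datetime, timedelta

# Hour field of the cron expression "0 10,20 * * *" (minute field is 0).
RUN_HOURS = (10, 20)


def next_runs(now, count=3):
    """Return the next `count` times strictly after `now` at which the job fires."""
    runs = []
    # Start from midnight of the current day and walk forward day by day.
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(runs) < count:
        for hour in RUN_HOURS:
            candidate = day + timedelta(hours=hour)
            if candidate > now and len(runs) < count:
                runs.append(candidate)
        day += timedelta(days=1)
    return runs


if __name__ == "__main__":
    for run in next_runs(datetime.now()):
        print(run.strftime("%Y-%m-%d %H:%M"))
```

Cron evaluates the schedule in the server's local time zone, so the times printed here match the cronjob only if the script runs with the same time zone setting.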

Info

This was made for personal use. Feel free to use it, add more spiders, or improve the code.

Contact
