Collection of product scrapers for various websites.


rtunazzz/Craper


Craper

A collection of product image scrapers for various websites.

What does this do?

This script scrapes product images from various websites (listed below) by their product IDs (PIDs). Those product IDs can then be used to get early links or early PIDs for each website.

When the command is run, it looks for new products, saves any new ones to a database, and sends a Discord webhook notification for each new product found.
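The flow described above (detect a new product, store it, notify Discord) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the SQLite table layout, the function names `init_db`, `save_if_new` and `build_webhook_payload`, and the default footer and color are all assumptions. Only the `embeds` payload shape follows Discord's documented webhook format.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per (site, pid) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "site TEXT, pid TEXT, image_url TEXT, PRIMARY KEY (site, pid))"
    )


def save_if_new(conn, site, pid, image_url):
    """Insert the product and return True only if it was not stored before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO products (site, pid, image_url) VALUES (?, ?, ?)",
        (site, pid, image_url),
    )
    conn.commit()
    # rowcount is 0 when the primary key already existed and the insert was ignored.
    return cur.rowcount == 1


def build_webhook_payload(site, pid, image_url, footer="Craper", color=0x5865F2):
    """Build a Discord webhook JSON body with a single embed (assumed layout)."""
    return {
        "embeds": [
            {
                "title": f"New product on {site}",
                "description": f"PID: {pid}",
                "image": {"url": image_url},
                "footer": {"text": footer},
                "color": color,
            }
        ]
    }
```

A caller would POST the returned payload as JSON to the configured webhook URL only when `save_if_new` returns `True`, so each product is announced once.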



Supported sites

Support for more websites is yet to come.

| Website name  | Command parameter | Website URL                     |
|---------------|-------------------|---------------------------------|
| Footpatrol    | footpatrol        | https://www.footpatrol.com/     |
| Size          | size              | https://www.size.co.uk/         |
| JDSports (EU) | jdsports          | https://www.jdsports.co.uk/     |
| TheHipStore   | thehipstore       | https://www.thehipstore.co.uk/  |
| Solebox       | solebox           | https://solebox.com/            |
| Snipes        | snipes            | https://snipes.com/             |
| Onygo         | onygo             | https://onygo.com/              |
| Courir        | courir            | https://www.courir.com/         |



Setup

Python 3.9+ is required!

  1. Clone this repository
     git clone https://github.com/rtunazzz/Craper
  2. Create required files
     ./bin/config.sh
  3. Add your webhooks, footer & color preferences into the craper/config/config.json file.
  4. (Optional) Add proxies to the craper/config/proxies.txt file

If you're struggling with setting up these configuration files, I recommend checking out these examples!

Note

Proxy usage is not required, but it is recommended for websites that ban often, such as Solebox, Snipes or Onygo.



Installation

Make sure to have everything set up properly before installing.

python setup.py install

Then you can go ahead and start using the command:

# Show the usage info
craper -h

# Start a Footpatrol scraper
craper footpatrol

# Start 10 Footpatrol scrapers, each scraping 100 product IDs
craper footpatrol -t10 -n100

# Start one scraper with proxies, starting from pid 01925412
craper solebox -pt 1 -s 01925412
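To make the options above more concrete, here is one way the starting PID (`-s`), the number of scrapers (`-t`) and the number of product IDs per scraper (`-n`) could combine into per-scraper work lists. This is only a sketch under an assumption: that each scraper receives a consecutive, non-overlapping block of PIDs and that leading zeros are preserved. The helper `pid_ranges` is hypothetical and the actual tool may divide the work differently.

```python
def pid_ranges(start, threads, per_thread):
    """Split consecutive PIDs into one list per scraper (assumed behaviour).

    `start` is passed as a string so that zero padding such as "01925412"
    can be kept in the generated PIDs.
    """
    width = len(start)
    first = int(start)
    ranges = []
    for i in range(threads):
        lo = first + i * per_thread
        # Re-apply the original width so "01925412" does not become "1925412".
        ranges.append([str(pid).zfill(width) for pid in range(lo, lo + per_thread)])
    return ranges


if __name__ == "__main__":
    for block in pid_ranges("01925412", threads=2, per_thread=3):
        print(block)
```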



Example

craper size -t10 -n5 -s 10

Example of the running command.

Contributing

If you'd like to contribute, feel free to open a pull request!

Adding sites

Adding sites should be relatively easy. All you need to do is add a model (ideally in a separate file) to the models directory. Then import it in the init file so that it can be easily imported into the main scraper.py file. Finally, update the SITES variable and that should be it!
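As a rough illustration of these steps, the sketch below defines a model and registers it in a `SITES` mapping. Apart from the `SITES` name mentioned above, everything here is hypothetical: the `SiteModel` class, its fields, the `get_site` helper and the `exampleshop` entry are invented for the example and do not reflect the real models in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteModel:
    """Hypothetical model describing how to build image URLs for one site."""

    name: str                # value passed on the command line
    base_url: str            # website URL
    image_url_template: str  # assumed pattern with a {pid} placeholder

    def image_url(self, pid):
        return self.image_url_template.format(pid=pid)


# A made-up site, defined as it might be in its own file under the models directory.
EXAMPLE_SHOP = SiteModel(
    name="exampleshop",
    base_url="https://shop.example/",
    image_url_template="https://shop.example/images/{pid}.jpg",
)

# Registry keyed by command parameter; the real SITES variable may look different.
SITES = {EXAMPLE_SHOP.name: EXAMPLE_SHOP}


def get_site(name):
    """Look up a registered site, failing clearly for unsupported names."""
    try:
        return SITES[name]
    except KeyError:
        raise ValueError(f"Unsupported site: {name!r}") from None
```

Keeping each model in its own file and exposing it through one registry means the scraper only needs the command parameter to find the right site.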
