geonolis/skroutz-scraper

Spiders

  • The categories spider finds all the categories and saves their URLs in a .txt file.
  • The pic spider downloads all the product images of a particular category (starting from the URLs
    listed in the start_urls attribute). You can modify the start_urls attribute in pic.py to process more categories, and the IMAGES_STORE setting in settings.py to change the download location.
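
Scrapy's images pipeline reads the download directory from the `IMAGES_STORE` setting, and pipelines are switched on through `ITEM_PIPELINES`. As a rough illustration, the relevant part of settings.py might look like the fragment below. The directory name and the pipeline's dotted module path are assumptions made for this example, not values taken from this repository.

```python
# settings.py (fragment): hypothetical values, adjust to your project.

# Directory where the images pipeline writes downloaded files.
IMAGES_STORE = "downloads/images"

# Enable the custom pipeline. The module path "skroutz.pipelines" is an
# assumption about the project layout; use the path of your own pipelines.py.
ITEM_PIPELINES = {
    "skroutz.pipelines.CustomImagesPipeline": 1,
}
```

A relative `IMAGES_STORE` path is resolved against the directory the crawl is started from, so an absolute path may be the safer choice.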

To run a spider (for example, pic): scrapy crawl pic

Pipelines

CustomImagesPipeline overrides the default ImagesPipeline behavior: once the downloads have
finished, it stores the images in a dedicated directory for each item (product).
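
The post-download step described here can be sketched with the standard library alone. The helper below is an illustration, not the code in this repository: the name `move_into_item_dir` and its parameters are hypothetical, and it assumes the item name is already safe to use as a directory name.

```python
import shutil
from pathlib import Path


def move_into_item_dir(store_dir, image_paths, item_name):
    """Move downloaded images into <store_dir>/<item_name>/.

    image_paths are paths relative to store_dir, as produced by the download
    step. Returns the new paths, also relative to store_dir.
    """
    store = Path(store_dir)
    target = store / item_name
    # Create the per-item directory on first use.
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    for rel_path in image_paths:
        source = store / rel_path
        destination = target / source.name
        shutil.move(str(source), str(destination))
        moved.append(destination.relative_to(store).as_posix())
    return moved
```

In a Scrapy pipeline, a helper like this would typically be called from an overridden `item_completed(self, results, item, info)` method, where each successful entry in `results` carries a `path` key relative to `IMAGES_STORE` (by default under `full/`).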

Requirements

Skroutz-scraper requires Scrapy and Pillow.

Disclaimer

Always respect the website's policies and the restrictions in its robots.txt.
Change the USER_AGENT variable in settings.py to identify yourself (and your website).
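
A minimal sketch of the corresponding settings.py entries follows. The user agent string and contact URL are placeholders, and the delay value is only a suggestion; `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY` are standard Scrapy settings that the text above does not mention explicitly.

```python
# settings.py (fragment): placeholder values, replace with your own.

# Identify the crawler and give site operators a way to reach you.
USER_AGENT = "my-crawler (+https://example.com/contact)"

# Have Scrapy fetch robots.txt and skip URLs it disallows.
ROBOTSTXT_OBEY = True

# Optional courtesy: wait between requests to the same site, in seconds.
DOWNLOAD_DELAY = 1.0
```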


About

A Scrapy project to scrape data from https://www.skroutz.gr
