geonolis/skroutz-scraper: A Scrapy project to scrape data from https://www.skroutz.gr

Spiders

The categories spider searches for all categories on the site and saves their URLs in a .txt file.
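
The parsing step behind such a spider can be sketched with the standard library alone: collect the category links from a page, resolve them to absolute URLs, drop duplicates, and write one URL per line. This is a minimal sketch, not the repository's code. Two details are assumptions: category links are taken to be anchors whose href contains "/c/", and the output file is called categories.txt. In the actual spider this logic would sit in a parse callback that uses Scrapy's selectors.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CategoryLinkParser(HTMLParser):
    """Collect absolute category URLs from <a href="..."> tags.

    Assumption (hypothetical): category pages live under paths containing "/c/".
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None if missing or valueless
        if not href or "/c/" not in href:
            return
        url = urljoin(self.base_url, href)  # make relative links absolute
        if url not in self._seen:  # keep the first occurrence, preserve order
            self._seen.add(url)
            self.urls.append(url)


def save_category_urls(html, base_url, path="categories.txt"):
    """Parse html, write one category URL per line to path, return the list."""
    parser = CategoryLinkParser(base_url)
    parser.feed(html)
    with open(path, "w", encoding="utf-8") as fh:
        for url in parser.urls:
            fh.write(url + "\n")
    return parser.urls
```

For example, a page containing two links to "/c/40/phones.html" and one to "/about", parsed with base URL https://www.skroutz.gr/, yields the single entry https://www.skroutz.gr/c/40/phones.html.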
The pic spider downloads all product images of a particular category, starting from the URLs listed in its start_urls attribute. To process more categories, add their URLs to start_urls in pic.py; to change the download location, set IMAGES_STORE in settings.py.
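
For reference, the relevant part of settings.py might look like the fragment below. ITEM_PIPELINES and IMAGES_STORE are standard Scrapy settings. The dotted path skroutz.pipelines.CustomImagesPipeline is an assumption based on Scrapy's default project layout, and the "images" directory is a hypothetical value.

```python
# settings.py (sketch): enable the custom images pipeline and choose where images go.
# Assumption: the pipeline class lives in skroutz/pipelines.py (Scrapy's default layout).
ITEM_PIPELINES = {
    "skroutz.pipelines.CustomImagesPipeline": 1,  # order value; lower numbers run first
}

# Directory in which downloaded images are stored; a relative path is resolved
# against the working directory that `scrapy crawl` is run from.
IMAGES_STORE = "images"  # hypothetical location; change it to move the downloads
```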

To run a spider (for example, pic): scrapy crawl pic

Pipelines

CustomImagesPipeline extends Scrapy's default ImagesPipeline. Once the downloads for an item (product) have completed, it stores that item's images in a dedicated directory.
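
The post-download step can be sketched as a plain helper that works on the results list Scrapy passes to ImagesPipeline.item_completed. That list contains (success, info) pairs, and info["path"] is relative to IMAGES_STORE, e.g. full/<sha1>.jpg. This is a hypothetical sketch, not the repository's implementation. The helper names and the use of the product name as the directory name are assumptions, and the code avoids importing Scrapy so it runs on its own.

```python
import os
import re
import shutil


def safe_dirname(name):
    """Turn a product name into a filesystem-safe directory name (hypothetical rule)."""
    # \w is Unicode-aware, so Greek product names are kept; other runs become "_".
    cleaned = re.sub(r"[^\w.-]+", "_", name).strip("._")
    return cleaned or "unnamed"


def move_to_item_dir(store, item_name, results):
    """Move successfully downloaded images into <store>/<item dir>/.

    `results` mimics the list ImagesPipeline.item_completed receives:
    (success, info) pairs, where info["path"] is relative to IMAGES_STORE.
    Returns the new paths, relative to `store`.
    """
    target = safe_dirname(item_name)
    os.makedirs(os.path.join(store, target), exist_ok=True)
    moved = []
    for ok, info in results:
        if not ok:  # failed downloads carry an error object instead of a dict
            continue
        src = os.path.join(store, info["path"])
        rel = os.path.join(target, os.path.basename(info["path"]))
        shutil.move(src, os.path.join(store, rel))
        moved.append(rel)
    return moved
```

Another way to get the same layout is to override the pipeline's file_path method so that images are written to the per-item directory directly. In recent Scrapy versions, file_path also receives the item. If files are moved afterwards instead, any paths already recorded on the item need updating.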

Requirements

Skroutz-scraper requires Scrapy and Pillow. Pillow is needed by Scrapy's ImagesPipeline, on which the image downloads rely.

Disclaimer

Always respect the website's policy and the restrictions in its robots.txt.
Set the USER_AGENT setting in settings.py to identify yourself (and your website).
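
Here is a sketch of how these rules translate into settings.py. USER_AGENT, ROBOTSTXT_OBEY, DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED are standard Scrapy settings. The values, including the placeholder user-agent string and contact URL, are assumptions to adapt.

```python
# settings.py (sketch): identify the crawler and keep the request rate polite.
# The user-agent string and contact URL below are placeholders, not real values.
USER_AGENT = "skroutz-scraper (+https://www.example.com/contact)"

ROBOTSTXT_OBEY = True        # skip URLs disallowed by the site's robots.txt
DOWNLOAD_DELAY = 1.0         # base wait (seconds) between requests to the same site
AUTOTHROTTLE_ENABLED = True  # adapt the delay to the server's response times
```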
