Note: This README was generated with ChatGPT.
This project is a web scraper that collects information and images for products listed on the Grandha website. The script visits each product page, extracts detailed data and images, and organizes the results into a CSV file and per-product image subfolders.
- Data Extraction: Collects detailed information on each product, including name, reference, line, price, description, barcode, usage instructions, composition, and product link.
- Image Extraction: Downloads all product images and stores them in subfolders organized by the product name.
- Data Structure: The data is saved in a CSV file in the `data/` directory, while images are stored in subfolders within `data/images/`. A minimal extraction sketch follows this list.
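
The snippet below is an illustrative sketch of the kind of per-product extraction described above. The URL and CSS selectors are assumptions for the example only and do not come from the project's code; the actual selectors live in `grandha_combined_scraper.py`.

```python
# Illustrative sketch only -- selectors and URL are hypothetical.
import requests
from bs4 import BeautifulSoup

def scrape_product(url: str) -> dict:
    """Fetch one product page and pull out a few of the fields listed above."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")

    # Hypothetical selectors -- adjust to the real page markup.
    name = soup.select_one("h1.product-title")
    price = soup.select_one("span.price")
    description = soup.select_one("div.product-description")

    return {
        "name": name.get_text(strip=True) if name else "",
        "price": price.get_text(strip=True) if price else "",
        "description": description.get_text(strip=True) if description else "",
        "link": url,
    }
```
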
- Python 3.10+
- Required libraries:
  - `requests`
  - `pandas`
  - `beautifulsoup4`
  - `lxml`
Install dependencies by running `pip install -r requirements.txt`.

Note: Create a `requirements.txt` file with the following content (one package per line):

    requests
    pandas
    beautifulsoup4
    lxml
    grandha_product_scraper/
    ├── data/
    │   ├── images/
    │   │   └── Product_Name/
    │   └── Produtos_da_Grandha.csv
    ├── grandha_combined_scraper.py
    ├── grandha_scrapper.py
    ├── grandha_img_scrapper.py
    └── README.md
Note: The files `grandha_scrapper.py` and `grandha_img_scrapper.py` were used for study and prototyping.
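
To illustrate how the `data/images/Product_Name/` layout shown above can be produced, here is a minimal sketch of downloading images into a per-product subfolder. The helper name, the file-naming scheme, and the way image URLs are gathered are assumptions, not the project's actual API.

```python
# Illustrative sketch of the folder layout above; details are assumptions.
import os
import requests

def download_images(product_name: str, image_urls: list[str],
                    base_dir: str = "data/images") -> None:
    """Save each image into data/images/<product_name>/."""
    folder = os.path.join(base_dir, product_name.replace(" ", "_"))
    os.makedirs(folder, exist_ok=True)

    for index, url in enumerate(image_urls, start=1):
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # Simple sequential file names; the real script may name files differently.
        with open(os.path.join(folder, f"{index}.jpg"), "wb") as file:
            file.write(response.content)
```
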
1. Place the `grandha_combined_scraper.py` script in your local environment.
2. In the terminal, run the script: `python grandha_combined_scraper.py`
3. The script will:
   - Go through all specified product pages.
   - Extract data for each product and save it in the `data/Produtos_da_Grandha.csv` file.
   - Create a subfolder for each product in `data/images/`, where it will save the respective images.
4. Check the CSV file and downloaded images in the `data/` directory (a sketch of the expected CSV layout follows this list).
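
For reference, the sketch below shows the general shape of the CSV output using pandas. The example row and its values are made up for illustration; the real script fills these columns from the scraped product pages.

```python
# Illustrative sketch of the CSV layout only; the example row is fabricated.
import os
import pandas as pd

rows = [
    {
        "name": "Example Product",
        "reference": "REF-001",
        "line": "Example Line",
        "price": "R$ 0,00",
        "description": "Placeholder description.",
        "link": "https://example.com/produto-exemplo",
    }
]

os.makedirs("data", exist_ok=True)
df = pd.DataFrame(rows)
df.to_csv("data/Produtos_da_Grandha.csv", index=False)

# Quick sanity check of the generated file.
print(pd.read_csv("data/Produtos_da_Grandha.csv").head())
```
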
- Ensure a stable internet connection, as the script relies on HTTP requests to access data (a defensive request pattern is sketched below).
- Script execution may take some time, depending on the number of products and your internet speed.
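
Since the script depends on many HTTP requests, a session with retries and timeouts can make runs more tolerant of flaky connections. This is a standalone sketch, not part of the project's code, and the URL is a placeholder.

```python
# Illustrative sketch: a requests session with retries and a timeout.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1,
                status_forcelist=[500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

# Placeholder URL -- replace with an actual product page.
response = session.get("https://example.com/produto-exemplo", timeout=30)
response.raise_for_status()
```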