AI Based Web-Scraping Agent

Project Overview

This project is an AI-based web-scraping agent that extracts data from websites. It combines a Python backend, which calls the Gemini and SerpApi APIs, with a React frontend, and is designed to scale to large scraping tasks.

Features

Automated Data Extraction: Automatically extracts data from specified websites.
AI Integration: Uses AI algorithms to enhance data extraction accuracy.
Scalability: Designed to handle large-scale web scraping tasks.
Customizable: Easily configurable to target different websites and data points.
Environment Management: Utilizes environment variables for secure API key management.
CSV Data Handling: Supports scraping data from CSV files.
Enhanced Error Handling: Improved error handling and logging mechanisms.
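
As an illustration of the error handling and logging feature, here is a minimal sketch of a retry wrapper that logs each failed attempt. The function name `fetch_with_retry`, its parameters, and the `scraper` logger name are hypothetical and are not taken from this project's code.

```python
import logging
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("scraper")


def fetch_with_retry(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url), logging each failure and retrying up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            logger.warning(
                "Attempt %d/%d for %s failed: %s", attempt, attempts, url, exc
            )
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"Giving up on {url} after {attempts} attempts") from last_error
```

Passing the fetch function as an argument keeps the sketch independent of any particular HTTP library.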

Applications

Market Research: Gather data from competitors' websites for market analysis.
Price Monitoring: Track prices of products across various e-commerce platforms.
Content Aggregation: Aggregate content from multiple sources for news or blog websites.
Data Mining: Extract large datasets for machine learning and data analysis.

Setup Instructions

Prerequisites

Node.js (v14 or higher)
Python (v3.8 or higher)
pip (Python package installer)
virtualenv (Python virtual environment tool)

Backend Setup

Clone the Repository:

```
git clone https://github.com/justAbhinav/AI-based-web-scraping-agent
cd AI-based-web-scraping-agent/backend
```

Create and Activate Virtual Environment:

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install Dependencies:
```
pip install -r requirements.txt
```
Set Up Environment Variables: Create a .env file in the root of the backend directory and add your API keys and other environment variables:
```
GEMINI_API_KEY=     # Your Gemini API Key
SERPAPI_API_KEY=   # Your SerpApi API Key
```
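
Before starting the server, it can help to confirm that both keys are set. The sketch below assumes the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv; whether this project uses one is an assumption). The helper name `load_api_keys` is hypothetical.

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "SERPAPI_API_KEY")


def load_api_keys(env=None):
    """Return the required API keys, raising if any is missing or blank."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_KEYS}
```

Calling `load_api_keys()` at startup would fail early with a message naming the missing key, rather than failing later on the first API request.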
Run the Backend Server:
```
python app.py
```

Frontend Setup

Navigate to Frontend Directory:
```
cd ../frontend
```
Install Dependencies:
```
npm install
```
Set Up Environment Variables: Create a .env file in the frontend directory and add your environment variables:
```
REACT_APP_GOOGLE_CLIENT_ID=your-google-client-id
REACT_APP_GOOGLE_API_KEY=your-google-api-key
```
Run the Frontend Server:
```
npm start
```

Testing

To test the application, use the provided testing.csv with a prompt such as "What is the annual income of these companies?". The application reads the entries from the CSV file, scrapes the requested data for them, and displays the results on the frontend.
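
One way to picture how a CSV file and a prompt combine is the sketch below, which pairs each entry in a CSV column with the prompt to form one query per row. The column name `company`, the sample rows, and the helper `build_queries` are hypothetical; the actual layout of testing.csv and the backend's query logic are not shown in this README.

```python
import csv
import io


def build_queries(csv_text, prompt, column):
    """Pair each non-empty value in `column` with the user's prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"Column {column!r} not found in CSV header")
    queries = []
    for row in reader:
        value = (row[column] or "").strip()
        if value:
            queries.append({"entity": value, "query": f"{prompt} ({value})"})
    return queries


# Hypothetical sample data standing in for testing.csv.
sample = "company\nAcme Corp\nGlobex\n"
for item in build_queries(sample, "What is the annual income of these companies?", "company"):
    print(item["query"])
```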

Additional Information

Documentation: Detailed documentation is available in the docs directory.
Contributing: Contributions are welcome!
License: This project is licensed under the MIT License. See the LICENSE file for details.

Learn More

Create React App Documentation
React Documentation
Python Virtual Environments
Flask Documentation

Troubleshooting

For common issues and troubleshooting steps, please refer to the Troubleshooting Guide.

Contact

For any questions or inquiries, please contact 22ucs004@lnmiit.ac.in.
