AI-Based Web-Scraping Agent

Project Overview

This project is an AI-based web-scraping agent that extracts data from websites. It consists of a Python backend, configured with Gemini and SerpApi API keys, and a React frontend.

Features

  • Automated Data Extraction: Automatically extracts data from specified websites.
  • AI Integration: Uses AI algorithms to enhance data extraction accuracy.
  • Scalability: Designed to handle large-scale web scraping tasks.
  • Customizable: Easily configurable to target different websites and data points.
  • Environment Management: Utilizes environment variables for secure API key management.
  • CSV Data Handling: Accepts CSV files as input for scraping tasks.
  • Error Handling: Includes error handling and logging.

Applications

  • Market Research: Gather data from competitors' websites for market analysis.
  • Price Monitoring: Track prices of products across various e-commerce platforms.
  • Content Aggregation: Aggregate content from multiple sources for news or blog websites.
  • Data Mining: Extract large datasets for machine learning and data analysis.

Setup Instructions

Prerequisites

  • Node.js (v14 or higher)
  • Python (v3.8 or higher)
  • pip (Python package installer)
  • virtualenv (Python virtual environment tool)
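The prerequisites above can be checked before starting the setup. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes `node` and `pip` are expected on the PATH.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 8)   # minimum Python version from the Prerequisites list
MIN_NODE_MAJOR = 14   # minimum Node.js major version from the Prerequisites list


def parse_node_version(text):
    """Turn a string like 'v14.17.0' into (14, 17, 0); return None if unparseable."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    return tuple(int(part) for part in match.groups()) if match else None


def python_ok(version_info=sys.version_info):
    """True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def node_ok(version):
    """True if a parsed Node.js version tuple meets the minimum major version."""
    return version is not None and version[0] >= MIN_NODE_MAJOR


def installed_node_version():
    """Return the installed Node.js version tuple, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True)
    return parse_node_version(result.stdout)


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Node.js OK:", node_ok(installed_node_version()))
    print("pip found:", shutil.which("pip") is not None)
```

Running the script prints one line per prerequisite, so a missing or outdated tool is visible before any dependencies are installed.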

Backend Setup

  1. Clone the Repository:

    git clone https://github.com/justAbhinav/AI-based-web-scraping-agent
    cd AI-based-web-scraping-agent/backend
  2. Create and Activate Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Set Up Environment Variables: Create a .env file in the root of the backend directory and add your API keys:

    GEMINI_API_KEY=your-gemini-api-key
    SERPAPI_API_KEY=your-serpapi-api-key
  5. Run the Backend Server:

    python app.py
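If the backend fails on startup, a missing or empty API key is a likely cause. The sketch below shows one way to verify the .env file using only the standard library. It is an illustration under assumptions: the helper names are hypothetical, and the backend itself may load its variables differently (for example, through a dotenv package).

```python
import os

# Variable names taken from the backend .env step above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "SERPAPI_API_KEY")


def parse_env_text(text):
    """Parse KEY=value lines; skip blank lines and full-line comments,
    and drop anything after an inline ' #' comment marker."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the given mapping."""
    return [key for key in required if not values.get(key)]


def load_env_file(path=".env"):
    """Read the file if it exists and copy its entries into os.environ
    without overriding variables that are already set."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    load_env_file()
    missing = missing_keys(os.environ)
    print("Missing environment variables:", ", ".join(missing) or "none")
```

Run it from the backend directory. Any key it lists still needs a value before `python app.py` can use the corresponding service.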

Frontend Setup

  1. Navigate to Frontend Directory:

    cd ../frontend
  2. Install Dependencies:

    npm install
  3. Set Up Environment Variables: Create a .env file in the frontend directory and add your environment variables:

    REACT_APP_GOOGLE_CLIENT_ID=your-google-client-id
    REACT_APP_GOOGLE_API_KEY=your-google-api-key
  4. Run the Frontend Server:

    npm start

Testing

To test the application, upload the provided testing.csv and enter a prompt such as "What is the annual income of these companies?". The application scrapes the data for the provided CSV file and displays the results in the frontend.
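To illustrate how a prompt and a CSV file might combine, here is a minimal sketch that reads one column from CSV text and builds one query per row. It shows one plausible reading of the workflow, not the project's actual logic. The `company` column, the sample rows, and both function names are hypothetical, and the real testing.csv may be laid out differently.

```python
import csv
import io

# Hypothetical sample data; the real testing.csv may use other columns and values.
SAMPLE_CSV = "company\nExample Corp\nSample Industries\n"


def read_column(csv_text, column):
    """Return the non-empty values of one named column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


def build_queries(prompt, entities):
    """Pair the user's prompt with each entity to form one query per CSV row."""
    return [f"{prompt} ({entity})" for entity in entities]


if __name__ == "__main__":
    companies = read_column(SAMPLE_CSV, "company")
    for query in build_queries("What is the annual income of these companies?", companies):
        print(query)
```

Each resulting query could then be passed to a search or language-model step. How the application actually does this is not specified here.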

Additional Information

  • Documentation: Detailed documentation is available in the docs directory.
  • Contributing: Contributions are welcome!
  • License: This project is licensed under the MIT License. See the LICENSE file for details.

Learn More

Troubleshooting

For common issues and troubleshooting steps, please refer to the Troubleshooting Guide.

Contact

For any questions or inquiries, please contact [email protected].