
AI Based Web-Scraping Agent

Project Overview

This project is an AI-based web-scraping agent designed to extract data from various websites efficiently. It leverages modern web technologies and machine learning algorithms to provide a robust and scalable solution for web scraping tasks.

Features

  • Automated Data Extraction: Automatically extracts data from specified websites.
  • AI Integration: Uses AI algorithms to enhance data extraction accuracy.
  • Scalability: Designed to handle large-scale web scraping tasks.
  • Customizable: Easily configurable to target different websites and data points.
  • Environment Management: Utilizes environment variables for secure API key management.
  • CSV Data Handling: Supports scraping data from CSV files.
  • Enhanced Error Handling: Improved error handling and logging mechanisms.
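
The extraction step behind these features can be sketched as follows — an illustrative, standard-library-only example using `html.parser`; the agent's actual pipeline, AI ranking, and per-site configuration live in the backend code and may differ:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects (text, href) pairs for every anchor tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the anchor currently open
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# A real page would come from an HTTP client; a fixed snippet
# keeps the example self-contained.
html = '<ul><li><a href="/a">Item A</a></li><li><a href="/b">Item B</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # [('Item A', '/a'), ('Item B', '/b')]
```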

Applications

  • Market Research: Gather data from competitors' websites for market analysis.
  • Price Monitoring: Track prices of products across various e-commerce platforms.
  • Content Aggregation: Aggregate content from multiple sources for news or blog websites.
  • Data Mining: Extract large datasets for machine learning and data analysis.

Setup Instructions

Prerequisites

  • Node.js (v14 or higher)
  • Python (v3.8 or higher)
  • pip (Python package installer)
  • virtualenv (Python virtual environment tool)

Backend Setup

  1. Clone the Repository:

    git clone https://github.com/justAbhinav/AI-based-web-scraping-agent
    cd AI-based-web-scraping-agent/backend
  2. Create and Activate Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Set Up Environment Variables: Create a .env file in the root of the backend directory and add your API keys and other environment variables:

    GEMINI_API_KEY=    # Your Gemini API key
    SERPAPI_API_KEY=   # Your SerpApi API key
  5. Run the Backend Server:

    python app.py
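
Before starting the server, it helps to confirm both keys are actually visible to the process. The helper below is a hypothetical sketch of such a check (app.py may load its keys differently, e.g. via python-dotenv):

```python
import os

def require_env(*names):
    """Return the requested environment variables, failing fast if any is unset."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Placeholder values so the example runs without a real .env file.
os.environ.setdefault("GEMINI_API_KEY", "demo-key")
os.environ.setdefault("SERPAPI_API_KEY", "demo-key")
keys = require_env("GEMINI_API_KEY", "SERPAPI_API_KEY")
```

Failing fast with an explicit error beats a cryptic downstream failure the first time an API call is attempted.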

Frontend Setup

  1. Navigate to Frontend Directory:

    cd ../frontend
  2. Install Dependencies:

    npm install
  3. Set Up Environment Variables: Create a .env file in the frontend directory and add your environment variables:

    REACT_APP_GOOGLE_CLIENT_ID=your-google-client-id
    REACT_APP_GOOGLE_API_KEY=your-google-api-key
  4. Run the Frontend Server:

    npm start

Testing

You may use the provided testing.csv and a prompt like "What is the annual income of these companies?" to test the application. The application will scrape the data from the provided CSV file and display the results in the frontend.
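
The CSV step can be sketched as follows — a minimal standard-library example; the column name `company` and the query template are assumptions for illustration, and the real testing.csv and prompt handling may differ:

```python
import csv
import io

# Stand-in for testing.csv; the real file's columns may differ.
sample = "company\nAcme Corp\nGlobex\n"

rows = list(csv.DictReader(io.StringIO(sample)))
companies = [row["company"] for row in rows]

# Each entry is combined with the user's prompt into one query.
queries = [f"What is the annual income of {c}?" for c in companies]
print(queries)
```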

Additional Information

  • Documentation: Detailed documentation is available in the docs directory.
  • Contributing: Contributions are welcome!
  • License: This project is licensed under the MIT License. See the LICENSE file for details.

Troubleshooting

For common issues and troubleshooting steps, please refer to the Troubleshooting Guide.

Contact

For any questions or inquiries, please contact [email protected].
