
infsahitya/ai-web-scraper



AI Web Scraper

AI Web Scraper is an AI-powered web scraping tool built with NestJS and Cheerio. It is designed to extract and compile data from websites efficiently and precisely, with an emphasis on data quality.

It crawls nested web pages and harvests job-related information such as descriptions, responsibilities, requirements, benefits, salaries, and tech stacks.
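Crawling nested pages generally means fetching a listing page, collecting the links to its detail pages, and then fetching and parsing each of those. The sketch below covers only the link-collection step and is not this repository's code: `collectDetailLinks` and its `pathPattern` filter are hypothetical, and a regular expression stands in for the Cheerio parsing the project uses, only so that the example has no dependencies.

```typescript
// Hypothetical helper: collects absolute, same-origin links from a listing
// page's HTML. A regex keeps this sketch dependency-free; with Cheerio you
// would iterate over $("a[href]") instead.
function collectDetailLinks(
  html: string,
  pageUrl: string,
  pathPattern: RegExp = /./, // hypothetical filter for "detail page" paths
): string[] {
  const base = new URL(pageUrl);
  const seen = new Set<string>();
  // Matches href="..." or href='...' on anchor tags.
  const hrefRe = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefRe)) {
    let resolved: URL;
    try {
      // Resolve relative links against the listing page URL.
      resolved = new URL(match[1], base);
    } catch {
      continue; // Skip malformed hrefs.
    }
    if (resolved.origin !== base.origin) continue; // Stay on the same site.
    if (!pathPattern.test(resolved.pathname)) continue;
    resolved.hash = ""; // Treat #fragments as the same page.
    seen.add(resolved.href);
  }
  return [...seen];
}
```

Resolving each `href` against the listing page's URL turns relative links into absolute ones, and the same-origin check keeps the crawl from wandering off-site (links such as `mailto:` are dropped by the same check).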

✨ Features

  • AI Enhanced
  • Enriched Data Quality
  • Customizable Scraping Patterns
  • Real-Time Data Processing
  • Robust Error Handling
  • Data Export Options
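The "AI Enhanced" and "Enriched Data Quality" features suggest that scraped page text is sent to an OpenAI model and the reply is turned into structured records. A minimal sketch of that flow, assuming the model is asked for a JSON object: the `JobRecord` shape, the prompt wording, and the helper names are all hypothetical, and the API call itself is omitted. Validating and normalizing the reply before it is stored is one way to keep malformed or duplicated values out of the exported data.

```typescript
// Hypothetical record shape built from the fields this README lists.
interface JobRecord {
  description: string;
  responsibilities: string[];
  requirements: string[];
  benefits: string[];
  salary: string | null;
  techStack: string[];
}

// Chat-style message shape (role + content) used to prompt the model.
type ChatMessage = { role: "system" | "user"; content: string };

// Builds a prompt that asks the model for exactly one JSON object.
function buildExtractionMessages(pageText: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        "Extract the job details from the text and reply with one JSON object " +
        "with keys description (string), salary (string or null), and " +
        "responsibilities, requirements, benefits, techStack (arrays of strings).",
    },
    { role: "user", content: pageText },
  ];
}

// Keeps trimmed, non-empty strings from a list field and drops duplicates.
function toStringList(value: unknown): string[] {
  if (!Array.isArray(value)) return [];
  const items = value
    .filter((v): v is string => typeof v === "string")
    .map((v) => v.trim())
    .filter((v) => v.length > 0);
  return [...new Set(items)];
}

// Parses the model's reply into a JobRecord; throws if it is not a JSON object.
function parseJobRecord(reply: string): JobRecord {
  const data: unknown = JSON.parse(reply);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("Model reply is not a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const description = obj.description;
  const salary = obj.salary;
  return {
    description: typeof description === "string" ? description.trim() : "",
    responsibilities: toStringList(obj.responsibilities),
    requirements: toStringList(obj.requirements),
    benefits: toStringList(obj.benefits),
    // Treat a missing or blank salary as unknown.
    salary: typeof salary === "string" && salary.trim() !== "" ? salary.trim() : null,
    techStack: toStringList(obj.techStack),
  };
}
```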

🧑🏻‍🔬 Contributors

⚙️ Installation

Clone the repository and install its dependencies with npm:

  git clone https://github.com/infsahitya/ai-web-scraper.git
  
  cd ./ai-web-scraper
  
  npm install

Set Environment Variables

  # Create a .env file in the root of the project directory

  NODE_ENV="development"

  OPENAI_API_KEY=<your-openai-api-key>

  CRAWL_URLS=https://remoteok.com/remote-engineer-jobs,https://remoteok.com/remote-exec-jobs,https://remoteok.com/remote-senior-jobs,https://remoteok.com/remote-dev-jobs,https://remoteok.com/remote-finance-jobs
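`CRAWL_URLS` holds several start pages in one comma-separated value. A sketch of how an application might read it, assuming the `.env` file has already been loaded into `process.env` (for example by a dotenv-style loader); `parseCrawlUrls` is a hypothetical helper, not part of this repository.

```typescript
// Hypothetical helper: turns a comma-separated CRAWL_URLS value into a list of
// validated, de-duplicated http(s) URLs. Empty entries (e.g. a trailing comma)
// are ignored; anything else that is not a valid http(s) URL throws.
function parseCrawlUrls(raw: string | undefined): string[] {
  if (!raw) throw new Error("CRAWL_URLS is not set");
  const urls = new Set<string>();
  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue;
    let url: URL;
    try {
      url = new URL(trimmed);
    } catch {
      throw new Error(`CRAWL_URLS contains an invalid URL: ${trimmed}`);
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`CRAWL_URLS entry must use http or https: ${trimmed}`);
    }
    urls.add(url.href);
  }
  if (urls.size === 0) throw new Error("CRAWL_URLS contains no URLs");
  return [...urls];
}

// Example (assumes the .env file is already loaded):
// const crawlUrls = parseCrawlUrls(process.env.CRAWL_URLS);
```

Failing fast on a malformed entry surfaces configuration mistakes at startup rather than partway through a crawl.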

🛠️ Run Locally

Execute the following commands inside the project directory.

Install dependencies:

  npm install

Start the server:

  npm run start:dev

🧪 Test

To test the scraper manually, send a GET request to the crawler endpoint using an API client of your choice (Postman, Insomnia) or a browser:

  curl 'http://localhost:3000/v1/crawler'
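The same request can be sent from a script. A minimal TypeScript sketch, assuming Node.js 18 or later (which provides a global `fetch`) and the default `http://localhost:3000` address used above; `triggerCrawl` is hypothetical, and because the README does not describe the response body, the body is returned as plain text.

```typescript
// Minimal shape of the response fields this sketch uses; the global fetch in
// Node 18+ returns a Response that satisfies it.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<HttpResponseLike>;

// Sends GET /v1/crawler and returns the raw response body as text.
// Throws if the server answers with a non-2xx status.
async function triggerCrawl(
  fetchImpl: FetchLike,
  baseUrl = "http://localhost:3000",
): Promise<string> {
  const url = new URL("/v1/crawler", baseUrl).href;
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Crawler request failed with HTTP ${res.status}`);
  }
  return res.text();
}
```

With the server running, `triggerCrawl(fetch).then(console.log)` prints whatever the endpoint returns. Passing `fetch` in as a parameter keeps the helper testable with a stub.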

Then check the output inside the project directory:

  cd ./data
  
  # Check the latest created folder
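Locating the newest output folder can also be scripted. A sketch using Node's built-in `fs` module, assuming each crawl writes a new subdirectory under `./data` as the step above implies; `latestSubdirectory` is a hypothetical helper. It compares modification times because creation time (`birthtime`) is not reliably available on every filesystem.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Returns the path of the most recently modified subdirectory of `dir`,
// or null if `dir` contains no subdirectories. Plain files are ignored.
function latestSubdirectory(dir: string): string | null {
  let latest: { path: string; mtimeMs: number } | null = null;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const path = join(dir, entry.name);
    const { mtimeMs } = statSync(path);
    if (latest === null || mtimeMs > latest.mtimeMs) {
      latest = { path, mtimeMs };
    }
  }
  return latest ? latest.path : null;
}

// Example (run from the project root):
// console.log(latestSubdirectory("./data"));
```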

🔦 Tech Stack

Node, NestJS, OpenAI (GPT-4 Turbo model), Cheerio, JSON2CSV, Axios
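The tech stack lists JSON2CSV for export. To illustrate what that conversion involves, here is a dependency-free sketch that turns scraped records into CSV with RFC 4180 quoting; it is not the project's export code, `toCsv` and `csvCell` are hypothetical names, and joining list fields such as a tech stack with `"; "` is an arbitrary choice made for this example.

```typescript
// Values a cell may hold; list-valued fields are joined into one cell.
type CsvValue = string | number | null | undefined | string[];

// Quotes a cell per RFC 4180: wrap it in double quotes when it contains a
// comma, quote, or line break, and double any embedded quotes.
function csvCell(value: CsvValue): string {
  const text = Array.isArray(value) ? value.join("; ") : value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Converts records to CSV text (CRLF line endings) with a header row
// taken from `columns`, in that order.
function toCsv<T extends Record<string, CsvValue>>(
  rows: T[],
  columns: (keyof T & string)[],
): string {
  const lines = [columns.map((c) => csvCell(c)).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvCell(row[c])).join(","));
  }
  return lines.join("\r\n");
}
```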
