AI Web Scraper is an AI-powered web scraping tool that extracts and compiles data from websites efficiently and precisely, with AI-enhanced data quality. It is built with NestJS and Cheerio.
It follows nested web pages and extracts information such as job descriptions, responsibilities, requirements, benefits, salaries, and tech stacks.
- AI Enhanced
- Enriched Data Quality
- Customizable Scraping Patterns
- Real-Time Data Processing
- Robust Error Handling
- Data Export Options
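As an illustration of the export step, here is a minimal sketch of turning scraped job records into CSV text. The project lists JSON2CSV for this; the hand-rolled serializer below is only a stand-in, and the `JobRecord` shape and its field names are assumptions made for the example, not the project's actual schema.

```typescript
// Hypothetical record shape; the field names are illustrative only.
interface JobRecord {
  title: string;
  salary: string;
  techStack: string[];
}

// Quote a field when it contains a comma, double quote, or line break,
// doubling any embedded double quotes (RFC 4180 style).
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize records to CSV text with a header row.
function toCsv(records: JobRecord[]): string {
  const header = ["title", "salary", "techStack"].join(",");
  const rows = records.map((record) =>
    [record.title, record.salary, record.techStack.join("; ")]
      .map(escapeCsvField)
      .join(","),
  );
  return [header, ...rows].join("\n");
}
```

Joining the tech stack with `; ` keeps it in a single column; a real export may prefer one column per technology or a JSON-encoded cell.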
Install ai-web-scraper with npm
git clone https://github.com/infsahitya/ai-web-scraper.git
cd ./ai-web-scraper
npm install
Set Environment Variables
# Create .env file in the root of your project directory
NODE_ENV="development"
OPENAI_API_KEY=<your-openai-api-key>
CRAWL_URLS=https://remoteok.com/remote-engineer-jobs,https://remoteok.com/remote-exec-jobs,https://remoteok.com/remote-senior-jobs,https://remoteok.com/remote-dev-jobs,https://remoteok.com/remote-finance-jobs
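`CRAWL_URLS` is a single comma-separated string, so it has to be split into individual URLs before crawling. The sketch below shows one way that could be done; `parseCrawlUrls` is a hypothetical helper written for this example, and the project's own parsing may differ.

```typescript
// Hypothetical helper: split a comma-separated CRAWL_URLS value into a list
// of http(s) URLs, skipping blank and malformed entries.
function parseCrawlUrls(raw: string | undefined): string[] {
  if (!raw) return [];
  const urls: string[] = [];
  for (const part of raw.split(",")) {
    const candidate = part.trim();
    if (candidate === "") continue;
    try {
      const parsed = new URL(candidate);
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        urls.push(candidate);
      }
    } catch {
      // new URL() throws on malformed input; skip that entry.
    }
  }
  return urls;
}
```

In a Node process it would typically be called as `parseCrawlUrls(process.env.CRAWL_URLS)` once the `.env` file has been loaded.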
Execute the following commands inside the project directory.
Install dependencies:
npm install
Start the server:
npm run start:dev
To test the crawler, call the endpoint from an API client of your choice (Postman, Insomnia, or a browser), or with curl:
curl 'http://localhost:3000/v1/crawler'
Inside the project directory:
cd ./data
# Check the most recently created folder
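Finding the newest folder can also be scripted. The sketch below is a hypothetical helper, not part of the project: it assumes each crawl writes its output to a separate subfolder of `./data`, and it uses modification time as an approximation of creation time, since creation timestamps are not reliable on every filesystem.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

interface FolderInfo {
  path: string;
  mtimeMs: number;
}

// Pick the folder with the largest modification time, or undefined if none.
function pickLatest(folders: FolderInfo[]): string | undefined {
  let latest: FolderInfo | undefined;
  for (const folder of folders) {
    if (latest === undefined || folder.mtimeMs > latest.mtimeMs) {
      latest = folder;
    }
  }
  return latest?.path;
}

// List the subdirectories of dataDir and return the most recently modified one.
function latestFolder(dataDir: string): string | undefined {
  const folders = readdirSync(dataDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => {
      const path = join(dataDir, entry.name);
      return { path, mtimeMs: statSync(path).mtimeMs };
    });
  return pickLatest(folders);
}
```

Calling `latestFolder("./data")` from the project root returns the path of the newest subfolder, or `undefined` when `./data` contains no folders.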
Node, NestJS, OpenAI (GPT-4 Turbo Model), Cheerio, JSON2CSV, Axios