crawlers
Here are 155 public repositories matching this topic...
🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string
Updated Nov 18, 2024 - TypeScript
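User-agent-based bot detection of the kind the package above offers generally comes down to testing the UA string against a list of known bot tokens. The TypeScript sketch below is a minimal illustration with an assumed, abbreviated pattern list, not the package's actual API or database.

```typescript
// Minimal sketch of user-agent-based bot detection.
// The token list is an abbreviated, assumed sample; real detectors
// ship far larger pattern databases.
const BOT_PATTERN = new RegExp(
  ["bot", "crawler", "spider", "crawling", "headlesschrome"].join("|"),
  "i"
);

function looksLikeBot(userAgent: string): boolean {
  return BOT_PATTERN.test(userAgent);
}

// Example usage:
console.log(looksLikeBot("Googlebot/2.1 (+http://www.google.com/bot.html)")); // true
console.log(looksLikeBot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));       // false
```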
A bot to help people with their rental real-estate search. 🏠🤖
Updated Oct 17, 2024 - HTML
An R web crawler and scraper
Updated Mar 27, 2022 - R
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem into various data repositories such as search engines.
Updated Nov 19, 2024 - Java
Proxy List Scrapper
Updated Feb 1, 2023 - Python
Simple robots.txt template that keeps unwanted robots out (disallow) and whitelists (allows) legitimate user agents. Useful for all websites.
Updated Feb 18, 2024
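To illustrate the disallow-by-default, allow-list pattern that template describes, a robots.txt might look like the following; the allowed user agents here are example entries, not the template's actual contents.

```
# Block everything by default
User-agent: *
Disallow: /

# Explicitly allow known, legitimate crawlers (example entries).
# An empty Disallow value permits crawling of the whole site,
# and a specific User-agent group overrides the * group.
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:
```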
Vietnamese text data crawler scripts for various sites (including YouTube, Facebook, forums, news, ...)
Updated Oct 25, 2022 - Python
hproxy - Asynchronous IP proxy pool that aims to make getting a proxy as convenient as possible (asynchronous crawler proxy pool).
Updated Dec 13, 2021 - Python
Sneakpeek is a framework that helps you quickly and conveniently develop scrapers. It's the best choice for scrapers with specific, complex scraping logic that needs to run on a constant basis.
Updated Aug 19, 2023 - Python
Tiny PHP script to crawl information about a specific application on the Google Play Store.
Updated May 21, 2023 - PHP
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
Updated Jul 7, 2022 - Java
User agent database in JSON format covering bots, crawlers, certain malware, automated software, scripts, and other uncommon user agents.
Updated Nov 22, 2020 - Shell
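Consuming a JSON user agent database of this kind typically means loading the entries and checking incoming UA strings against them. The entry shape in the TypeScript sketch below is assumed for illustration and may not match this repository's actual schema.

```typescript
// Assumed entry shape; the real database's schema may differ.
interface UserAgentEntry {
  pattern: string;   // substring to match in the UA string
  category: string;  // e.g. "bot", "crawler", "malware", "script"
}

const entries: UserAgentEntry[] = [
  { pattern: "Googlebot", category: "crawler" },
  { pattern: "curl/", category: "script" },
];

// Returns the category of the first matching entry, if any.
function classify(userAgent: string): string | undefined {
  const hit = entries.find(e => userAgent.includes(e.pattern));
  return hit?.category;
}

console.log(classify("curl/8.5.0")); // "script"
```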
An open source web crawling platform
Updated May 6, 2018 - Go
Public procurement notices from Feira de Santana made easily accessible to citizens 🏦
Updated Mar 2, 2020 - Python