HTTP, SOCKS4, and SOCKS5 proxy scraper and checker.
- Asynchronous.
- Uses a regular expression to find proxies (ip:port format) on a web page, so proxies can be extracted even from JSON without any changes to the code.
- Supports determining the geolocation of the proxy exit node.
- Can determine if a proxy is anonymous.
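
To illustrate the regex-based extraction mentioned above, here is a minimal sketch. The pattern `PROXY_RE`, the function `extract_proxies`, and the sample input are assumptions made for this example and are not taken from the script itself, which may use a different expression.

```python
import re
from typing import List

# Hypothetical pattern (not the script's own): an IPv4 address with each
# octet in 0-255, a colon, then a port of 1 to 5 digits.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
PROXY_RE = re.compile(
    r"(?<![\d.])"                          # not preceded by a digit or dot
    r"((?:" + OCTET + r"\.){3}" + OCTET + r")"
    r":(\d{1,5})(?!\d)"                    # port, not followed by a digit
)


def extract_proxies(text: str) -> List[str]:
    """Return unique ip:port strings found in text, in order of appearance."""
    found = []
    for host, port in PROXY_RE.findall(text):
        # Discard ports outside the valid TCP range.
        if 0 < int(port) <= 65535:
            found.append(f"{host}:{port}")
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(found))


# The same pattern works on JSON and on plain text, because it only looks
# for the ip:port substring and ignores the surrounding markup.
sample = (
    '{"data": [{"proxy": "203.0.113.7:8080"}, {"proxy": "198.51.100.23:1080"}]}'
    " plain text 192.0.2.1:3128"
)
print(extract_proxies(sample))
# ['203.0.113.7:8080', '198.51.100.23:1080', '192.0.2.1:3128']
```

Because the search is purely textual, a new source format needs no dedicated parser as long as the address and port appear together as `ip:port`.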
Proxies obtained with this script are available in monosans/proxy-list.
- Download and unpack the archive with the program.
- Edit `config.ini` according to your preference.
- Install Python (Windows 7 requires Python 3.8.X). During installation, be sure to check the box `Add Python to PATH`.
- Install dependencies and run the script. There are 2 ways to do this:
  - Automatic:
    - On Windows run `start.cmd`
    - On Unix-like OS run `start.sh`
  - Manual:
    - `cd` into the unpacked folder
    - Install dependencies with the command `python -m pip install -U --no-cache-dir --disable-pip-version-check pip setuptools wheel; python -m pip install -U --no-cache-dir --disable-pip-version-check -r requirements.txt`
    - Run with the command `python -m proxy_scraper_checker`
When the script finishes running, the following folders will be created (this behavior can be changed in the config):
- `proxies` - proxies with any anonymity level.
- `proxies_anonymous` - anonymous proxies.
- `proxies_geolocation` - same as `proxies`, but includes exit-node's geolocation.
- `proxies_geolocation_anonymous` - same as `proxies_anonymous`, but includes exit-node's geolocation.

Geolocation format is `ip:port|Country|Region|City`.
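
If you want to consume the geolocation output in your own code, a line in the format above can be split as in the following sketch. The `GeoProxy` type, the `parse_geo_line` function, and the sample line are hypothetical and exist only for this example; they are not part of the script.

```python
from typing import NamedTuple


class GeoProxy(NamedTuple):
    """One parsed line in the ip:port|Country|Region|City format."""
    host: str
    port: int
    country: str
    region: str
    city: str


def parse_geo_line(line: str) -> GeoProxy:
    # Split into at most four fields: address, country, region, city.
    # A line with fewer fields raises ValueError on unpacking.
    address, country, region, city = line.strip().split("|", 3)
    # Split on the last colon to separate the host from the port.
    host, _, port = address.rpartition(":")
    return GeoProxy(host, int(port), country, region, city)


# Hypothetical sample line, made up for illustration.
example = "203.0.113.7:8080|Germany|Hesse|Frankfurt am Main"
proxy = parse_geo_line(example)
print(proxy.host, proxy.port, proxy.country)
```

A malformed line raises `ValueError`, so callers reading a whole file may want to catch it and skip that line.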