TORpedo

A simple Python package that provides a Tor proxy for scraping sites.

Prerequisites

You need Docker installed on your system. Currently, the package only works on Linux.

Installation

pip3 install git+https://github.com/mipo57/torpedo.git

Proxy Usage

import torpedo

with torpedo.new_session() as session:
    print(session.get("http://api.myip.com/").text)

The session object is derived from requests.Session, so you can use it exactly as you would a normal requests.Session. Note that initialization (torpedo.new_session()) can take some time, so it is best to reuse a single session for as long as possible. Also keep in mind that requests going through Tor can be MUCH slower than direct ones. The package works best in a distributed setting, where a number of scraping processes run in parallel, so that no single slow request holds up the rest.
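To illustrate the advice about reusing one session, here is a minimal sketch. `fetch_all` is a hypothetical helper and not part of torpedo; it only assumes that the session exposes `get(url, timeout=...)` the way `requests.Session` does, and the default timeout is an arbitrary choice.

```python
def fetch_all(session, urls, timeout=30.0):
    """Fetch every URL through one already-open session.

    Hypothetical helper (not part of torpedo). `session` is any object with a
    requests.Session-style get(url, timeout=...) method.
    Returns a dict mapping each URL to its response text, or None on failure.
    """
    pages = {}
    for url in urls:
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()
            pages[url] = response.text
        except Exception:
            # Requests over Tor can fail or time out; record it and move on.
            pages[url] = None
    return pages


# Usage with torpedo (the start-up cost is paid once for the whole batch):
#
#     import torpedo
#     with torpedo.new_session() as session:
#         pages = fetch_all(session, ["http://api.myip.com/"])
```

Passing the session in, rather than opening one inside the helper, keeps the slow initialization outside the loop.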

How it works

Under the hood, a new Docker container is started for every session. The container provides a proxy through which all HTTP and HTTPS requests are routed.
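As a rough picture of what routing requests through a proxy means, the sketch below builds the kind of proxies mapping that `requests.Session` accepts. This is not torpedo's actual code: `build_proxies` and `apply_proxy` are hypothetical names, and the host, port and scheme the container really exposes are assumptions.

```python
def build_proxies(host, port, scheme="socks5h"):
    """Build a requests-style proxies mapping for a local proxy.

    Assumptions (not confirmed by torpedo): the host, port and scheme.
    "socks5h" asks the proxy to resolve DNS as well, and with requests it
    needs the optional SOCKS support (pip install "requests[socks]").
    """
    proxy_url = f"{scheme}://{host}:{port}"
    # The same proxy handles both plain HTTP and HTTPS traffic.
    return {"http": proxy_url, "https": proxy_url}


def apply_proxy(session, host, port, scheme="socks5h"):
    """Point a requests.Session-like object at the proxy and return it."""
    session.proxies.update(build_proxies(host, port, scheme))
    return session


# Usage (9050 is Tor's default SOCKS port; the port used here is a guess):
#
#     import requests
#     session = apply_proxy(requests.Session(), "127.0.0.1", 9050)
#     print(session.get("http://api.myip.com/").text)
```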

Runner Usage

import torpedo

def scrape(request_result):
    # Your custom scraping function

    return {'price': 13, 'weight': 15, 'name': "meat"}


sites = [
    "https://example.com/example1",
    "https://example.com/example2",
    "https://example.com/example3",
    "https://example.com/example4"
]

results = torpedo.run(
    scraping_func=scrape,
    urls=sites,
    num_workers=15,
    max_retries=4,
    request_timeout=5.0,
)

# results = [
#   {'price': 13, 'weight': 15, 'name': "meat"},
#   {'price': 13, 'weight': 15, 'name': "meat"},
#   {'price': 13, 'weight': 15, 'name': "meat"},
#   {'price': 13, 'weight': 15, 'name': "meat"}
# ]
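To give a feel for what a runner with `num_workers` and `max_retries` might do, here is a self-contained sketch of a worker pool with per-URL retries. It is an illustration only and not torpedo's implementation: `run_sketch` and `fetch` are hypothetical names, threads are used here for simplicity, and treating `max_retries` as the number of extra attempts after the first is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sketch(fetch, scraping_func, urls, num_workers=4, max_retries=3):
    """Fetch and scrape every URL with a worker pool and per-URL retries.

    Illustration only, NOT torpedo's implementation. `fetch` is a hypothetical
    callable that takes a URL and returns whatever `scraping_func` expects.
    Results come back in the same order as `urls`; a URL that still fails
    after `max_retries` extra attempts yields None.
    """

    def process(url):
        for _attempt in range(max_retries + 1):
            try:
                return scraping_func(fetch(url))
            except Exception:
                continue  # e.g. a timeout or a blocked exit node; try again
        return None

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves input order even though work runs concurrently.
        return list(pool.map(process, urls))
```

Retrying inside each worker means one slow or failing URL does not block the others, which matters when individual Tor requests are slow.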
