mini_google/python_crawler at main · maxymkuz/mini_google

Name
database
old_files
Dockerfile
README.md
__init__.py
build.sh
crawler.py
requirements.txt
websites.txt

README.md

Python Crawler

An implementation of crawlers and their manager, written in Python. The crawlers collect all text, structured data, and links from a given list of webpages.
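To illustrate what collecting text and links from a page can look like, here is a minimal sketch that uses only the Python standard library. It is not the code from crawler.py: the names `PageExtractor` and `extract` are hypothetical, and structured-data extraction is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and absolute links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Return (text, links) for an HTML string fetched from base_url."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

A crawler built this way would feed the returned links back into its queue until the maximum depth is reached.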

Before you start

Install the required Python libraries with:

pip install --no-cache-dir -r requirements.txt

Usage (without Docker):

cd python_crawler
python main.py "in file" "max depth" "number of threads" "concurrent_tasks" "max_queue_size" "max_cycles" "delay"
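The seven positional arguments above could be parsed roughly as follows. This is a sketch under assumptions, not the project's actual main.py: the function name `build_parser`, the argument types (integers, with a float `delay`), and the help texts are guesses based only on the argument names in the command line.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the positional arguments in the usage line."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Crawl the webpages listed in a file.",
    )
    # All types and help strings below are assumptions, not taken from the source.
    parser.add_argument("in_file", help="file with the list of webpages to crawl")
    parser.add_argument("max_depth", type=int, help="maximum link depth to follow")
    parser.add_argument("number_of_threads", type=int, help="number of worker threads")
    parser.add_argument("concurrent_tasks", type=int, help="tasks run concurrently")
    parser.add_argument("max_queue_size", type=int, help="upper bound on queued URLs")
    parser.add_argument("max_cycles", type=int, help="maximum number of crawl cycles")
    parser.add_argument("delay", type=float, help="pause between requests")
    return parser
```

With such a parser, `build_parser().parse_args(["websites.txt", "2", "4", "10", "1000", "5", "0.5"])` yields a namespace whose `max_depth` is the integer 2 and whose `delay` is the float 0.5.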

Usage (with Docker):

To build the container:

./build.sh
