Dorktuah is a powerful Python tool designed for advanced Google dorking and web scraping through proxy rotation. It leverages multiple search engines while maintaining anonymity through an extensive proxy system. The project aims to provide researchers, security professionals, and developers with a reliable tool for gathering information while avoiding rate limiting and IP blocks.
Key features:
- Automated proxy rotation system
- Support for multiple proxy types (HTTP, SOCKS4, SOCKS5)
- Built-in proxy scraper with 100+ sources
- Proxy health checking and validation
- Clean and structured search results
- Rate limit avoidance through proxy rotation
- Custom proxy list support
The name "Dorktuah" combines "dork" (referring to Google dorking) with "tuah" (meaning luck or fortune in Malay), signifying a lucky, successful dorking tool (it's also a nod to the "hawk tuah" meme).
Important
Remember to use this tool responsibly and in accordance with the target website's terms of service and applicable laws.
To use Dorktuah, follow these steps:
- Clone the repository:
git clone https://github.com/CantCode023/dorktuah.git
- Install required dependencies:
pip install -r dorktuah/requirements.txt
- Run the CLI and you're done!
python dorktuah
- set up project structure
- write proxy rotation implementation
- proxy_pool as function
- requests.get(proxies=ProxyPool()) for easier rotation
- ProxyPool()
- type:Literal["socks4", "socks5", "http", "all"] = "all"
- get proxies from proxies.txt
- write a proxy checker to make sure the returned proxy is alive
- convert proxypool to a class to allow argument inheritance for easier argument initialization across methods
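The ProxyPool notes above could be sketched roughly like this. The `proxies.txt` line format and the filtering rule are assumptions for illustration, not Dorktuah's actual implementation:

```python
import random
from typing import Literal


class ProxyPool:
    """Rotating proxy pool, loosely following the notes above.

    Assumes proxies.txt holds one proxy per line in the form
    "socks5://1.2.3.4:1080" or "http://1.2.3.4:8080".
    """

    def __init__(
        self,
        proxy_type: Literal["socks4", "socks5", "http", "all"] = "all",
        path: str = "proxies.txt",
    ):
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
        if proxy_type == "all":
            self.proxies = lines
        else:
            # Keep only proxies whose scheme matches the requested type.
            self.proxies = [p for p in lines if p.startswith(proxy_type)]

    def get(self) -> dict:
        """Return a requests-style proxies dict for a random proxy."""
        proxy = random.choice(self.proxies)
        return {"http": proxy, "https": proxy}
```

A call site would then look like `requests.get(url, proxies=pool.get())`, picking a fresh proxy per request.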
- write engine implementation
- use BeautifulSoup
- Engine()
- proxy_pool implementation
- write etools scraping implementation
- do pagination to retrieve all results
- load_more_results, pagination, get_source, search
- combine those 4
- search, open etools and search for the query
- also give the user the ability to get the next result
- def has_more_results()
- if has_more_results, show "press Enter to load the next page" in the CLI
- if not, don't show it
- on Enter, call load_more_results()
- get source and return
- do pagination to retrieve all results
- implement proxy pool in engine.py
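One way the four engine methods above could fit together, sketched with canned pages instead of live requests. The real implementation would fetch etools and parse HTML with BeautifulSoup; everything here is illustrative:

```python
class Engine:
    """Paginated search sketch wiring together search, get_source,
    has_more_results, and load_more_results from the notes above."""

    def __init__(self, pages):
        # `pages` stands in for live HTTP responses: a list of
        # result lists, one entry per results page.
        self._pages = pages
        self._index = 0

    def search(self, query):
        # query is unused in this offline sketch; a real engine would
        # build the etools request from it.
        self._index = 0
        return self.get_source()

    def get_source(self):
        return self._pages[self._index]

    def has_more_results(self) -> bool:
        return self._index + 1 < len(self._pages)

    def load_more_results(self):
        if self.has_more_results():
            self._index += 1
        return self.get_source()
```

The CLI loop then follows the notes directly: after showing a page, check `has_more_results()`, prompt for Enter only if it is true, and call `load_more_results()` on Enter.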
- make it into a cli using colorama and rich maybe?
- make header "dorktuah"
- make subheader "Dork across search engines."
- put credentials (author, github, discord)
- make a textbox to ask for the query, using rich maybe
- add config in cli (write config.json file)
- add proxy support
- enable proxy (y/n)
- use custom proxy (y/n)
- proxy type (socks4/socks5/http/all)
- proxy path
- add path checking to check if it exists
- source_limit (1-100)
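The config questions above might be validated and written to config.json along these lines. The field names and the `write_config` helper are hypothetical:

```python
import json
import os


def write_config(
    enable_proxy: bool,
    use_custom_proxy: bool,
    proxy_type: str,
    proxy_path: str,
    source_limit: int,
    path: str = "config.json",
) -> dict:
    """Validate the CLI answers and persist them to config.json."""
    if proxy_type not in ("socks4", "socks5", "http", "all"):
        raise ValueError(f"unknown proxy type: {proxy_type}")
    # Path checking from the notes: only required for custom lists.
    if use_custom_proxy and not os.path.exists(proxy_path):
        raise FileNotFoundError(proxy_path)
    if not 1 <= source_limit <= 100:
        raise ValueError("source_limit must be between 1 and 100")
    config = {
        "enable_proxy": enable_proxy,
        "use_custom_proxy": use_custom_proxy,
        "proxy_type": proxy_type,
        "proxy_path": proxy_path,
        "source_limit": source_limit,
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```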
- add proxy support
- add scrape proxies and check proxies to ProxyPool to get newest proxies
- make the proxy checker faster
- check proxies with asynchronous functions
- refactor to follow the SOLID principle
- fix error:
  Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
  handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
  Traceback (most recent call last):
    File "C:\Users\cantc\AppData\Local\Programs\Python\Python312\Lib\asyncio\events.py", line 88, in _run
      self._context.run(self._callback, *self._args)
    File "C:\Users\cantc\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
      self._sock.shutdown(socket.SHUT_RDWR)
  ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
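That traceback is a known quirk of the Windows proactor event loop when a peer resets the connection during shutdown. One commonly suggested workaround (whether it suits Dorktuah is an assumption; the selector loop lacks some proactor features such as subprocess pipe support) is to switch to the selector-based loop on Windows before any async code runs:

```python
import asyncio
import sys

# The ProactorEventLoop on Windows can emit noisy ConnectionResetError
# tracebacks in _call_connection_lost when the remote side drops the
# socket. Falling back to the selector event loop avoids that code path.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
```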