Release August 2021 - Make the code nicer, urlscan.io integration. · Lookyloo/lookyloo

New Features:

Integration with urlscan.io - Documentation
Trigger a capture from the URL - #248
Archiving: captures more than 6 months old (configurable) are moved to an archive directory so they're no longer listed on the index, but they can still be accessed by UUID (doesn't break permanent URLs)
One index file per directory, covering every capture in it (archived or not). Greatly reduces the I/O when initializing the known captures in redis.
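
As a rough illustration of the archiving rule above, here is a minimal sketch of the age check. It is not Lookyloo's actual code: the function names and the `archive_after_months` parameter are hypothetical, and treating a capture as archivable once at least that many whole calendar months have passed is an assumption.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # The last month only counts once the day of month has been reached.
    if later.day < earlier.day:
        months -= 1
    return months


def should_archive(capture_date: date, today: date,
                   archive_after_months: int = 6) -> bool:
    """Hypothetical helper: True when the capture is old enough to archive.

    `archive_after_months` stands in for the configurable threshold
    mentioned in the release notes; its real name is not given there.
    """
    return months_between(capture_date, today) >= archive_after_months
```

An archived capture would only move on disk. Lookups by UUID keep working, which is why permanent URLs stay valid.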

Fixes:

Changes - This release implements a lot of back-end changes:

Captures are now stored by year and month (instead of in a single directory) to avoid having too many entries in the same directory (ext4 dislikes it). All new captures follow this layout, but you need to run tools/change_captures_dir.py to move the existing ones to the new format (only useful if you feel restarting the app takes too much time)
Move all the capture-related code from Lookyloo to AsyncCapture
Move all the services management code to abstractmanager
Use redis pooling to manage connections to the database in Lookyloo and Indexing
New process to trigger occasional actions, currently: generate the daily user-agent file if Lookyloo is using the UAs of its own users.
Reinitialize the list of capture UUIDs when starting the app instead of in the website itself
Improvements in process handling (TL;DR: don't stop redis until all the async capture processes are down)
Move some methods from Lookyloo to the helpers
Simplify code in Lookyloo to make it more readable, remove dead code.
Bump dependencies, add hiredis to speed up redis interactions
Return proper HTTP error codes (mostly 4XX) when appropriate
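
The year-and-month storage layout from the first item in this list can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the `capture_dir` helper, the zero-padded month, and the use of the UUID as the leaf directory name are all hypothetical, and the real on-disk naming may differ.

```python
from datetime import datetime
from pathlib import Path


def capture_dir(root: Path, capture_time: datetime, uuid: str) -> Path:
    """Hypothetical helper: build <root>/<year>/<month>/<uuid>.

    Splitting by year and month keeps the number of entries in any
    single directory bounded, which is the motivation given above.
    """
    return root / f"{capture_time.year}" / f"{capture_time.month:02}" / uuid


# Example: a capture taken on 3 August 2021.
path = capture_dir(Path("captures"), datetime(2021, 8, 3, 12, 0), "example-uuid")
print(path)
```

A migration in the spirit of tools/change_captures_dir.py would compute this target path for each existing capture and move the capture there.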
