August 2021 - Make the code nicer, urlscan.io integration.
New Features:
- Integration with urlscan.io - Documentation
- Trigger a capture from the URL - #248
- Archiving: captures more than 6 months old (configurable) are moved to an archive directory so they are no longer listed on the index, but they can still be accessed by UUID (permanent URLs keep working)
- One index file per directory of captures (archived or not). This greatly reduces the I/O when initializing the known captures in redis.
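
The archiving and per-directory index features can be pictured with a minimal sketch. This is not the project's actual code. The index file name, the `uuid` file inside each capture directory, the date-based directory names, and the 180-day default are all assumptions made for illustration.

```python
import csv
import shutil
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List

INDEX_NAME = "index"  # hypothetical index file name


def write_index(directory: Path) -> Dict[str, str]:
    """Write one index file mapping capture UUID -> capture directory name.

    Assumption: every capture directory contains a small 'uuid' file.
    """
    index = {}
    for capture in sorted(p for p in directory.iterdir() if p.is_dir()):
        uuid_file = capture / "uuid"
        if uuid_file.exists():
            index[uuid_file.read_text().strip()] = capture.name
    with (directory / INDEX_NAME).open("w", newline="") as f:
        csv.writer(f).writerows(index.items())
    return index


def load_index(directory: Path) -> Dict[str, str]:
    """Read the index back: one file read instead of one read per capture."""
    with (directory / INDEX_NAME).open(newline="") as f:
        return {uuid: name for uuid, name in csv.reader(f)}


def archive_old_captures(captures_dir: Path, archive_dir: Path,
                         now: datetime, max_age_days: int = 180) -> List[str]:
    """Move captures older than max_age_days to archive_dir, then reindex both.

    Assumption: capture directories are named after an ISO date or timestamp.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    cutoff = now - timedelta(days=max_age_days)
    archived = []
    for capture in sorted(p for p in captures_dir.iterdir() if p.is_dir()):
        try:
            captured_at = datetime.fromisoformat(capture.name)
        except ValueError:
            continue  # not a capture directory, leave it alone
        if captured_at < cutoff:
            shutil.move(str(capture), str(archive_dir / capture.name))
            archived.append(capture.name)
    # Both directories get a fresh index, so a UUID lookup still works
    # for archived captures.
    write_index(captures_dir)
    write_index(archive_dir)
    return archived
```

Because the archive directory keeps its own index, a lookup by UUID can fall back to it when the capture is no longer in the main directory, which is how permanent URLs could keep resolving in a design like this.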
Fixes:
- Missing 3rd party web dependencies in docker (thanks to @FafnerKeyZee)
Changes - This release implements a lot of back-end changes:
- The captures are now stored by year and month (instead of in a single directory) to avoid having too many entries in the same directory (ext4 dislikes it). All new captures follow this layout, but you need to run tools/change_captures_dir.py to move the existing ones to the new format (only useful if you feel restarting the app takes too much time)
- Move all the capture-related code from Lookyloo to AsyncCapture
- Move all the services management code to abstractmanager
- Use redis pooling to manage connections to the database in Lookyloo and Indexing
- New process to trigger occasional actions, currently: generate the daily user-agent file if Lookyloo is using the UAs of its own users.
- Reinitialize the list of capture UUIDs when starting the app instead of in the website itself
- Improvements in process handling (TL;DR: don't stop redis until all the async capture processes are down)
- Move some methods from Lookyloo to the helpers
- Simplify code in Lookyloo to make it more readable, remove dead code.
- Bump dependencies, add hiredis to speed up redis interactions
- Return proper HTTP error codes (mostly 4XX), when appropriate
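
To illustrate the year-and-month storage layout from the first change in this list, here is a minimal sketch of moving a flat capture directory into that structure. It is not the contents of tools/change_captures_dir.py. The function names are hypothetical, and it assumes capture directories are named after an ISO timestamp.

```python
from datetime import datetime
from pathlib import Path


def capture_month_dir(root: Path, captured_at: datetime) -> Path:
    """Return root/YYYY/MM, the directory holding one month of captures."""
    return root / str(captured_at.year) / f"{captured_at.month:02d}"


def migrate_flat_layout(root: Path) -> int:
    """Move root/<timestamp> directories to root/YYYY/MM/<timestamp>.

    Assumption: directory names parse with datetime.fromisoformat().
    Returns the number of capture directories moved.
    """
    moved = 0
    # Materialize the listing first, since we create directories in root below.
    for capture in sorted(p for p in root.iterdir() if p.is_dir()):
        if capture.name.isdigit():
            continue  # already a year directory from the new layout
        try:
            captured_at = datetime.fromisoformat(capture.name)
        except ValueError:
            continue  # not a capture directory, leave it alone
        target_parent = capture_month_dir(root, captured_at)
        target_parent.mkdir(parents=True, exist_ok=True)
        capture.rename(target_parent / capture.name)
        moved += 1
    return moved
```

Running the function a second time moves nothing, since only year directories remain at the top level.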