Optimize repository and maintainer+repository events #58
Labels
Effort: simple
A few lines fix
Priority: someday
Normal priority
Topic: database
SQL code mostly
Topic: performance
Both optimizations and perf monitoring
Type: refactoring
Internal improvements which do not change behavior
We store a lot of events per repository (up to 1.2M as of now) and per maintainer+repository (up to 112k as of now), but no more than 500 of them are ever visible, because feeds have no pagination (and nobody has ever asked for it). Although we delete events older than a year, a lot of useless events still remain. We may want to introduce a better cleanup procedure for these tables.
This would save a lot of space: events currently use around 1G of disk space (including indexes), and keeping only the newest 500 events per repository and per maintainer+repository would drop 97% of repository entries and 52% of maintainer+repository entries. We may want to introduce pagination though (esp. for repository events) and bump the limit to e.g. 10k; even then we could still drop 80% and 30% respectively.
Quick & dirty query to extract event ids to remove:
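A minimal sketch of such a query, assuming a hypothetical `repository_events` table with `id`, `repository_id` and `ts` columns (the real schema and names may differ). It ranks events per repository from newest to oldest with `ROW_NUMBER()` and selects everything past the first `KEEP` rows. It is wrapped in Python's `sqlite3` only so that it can be run as-is; the SQL itself is standard window-function syntax, so it should carry over to the production database with little change.

```python
import sqlite3

KEEP = 3  # stand-in for the 500 (or 10k) newest events kept per repository

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute(
    """
    CREATE TABLE repository_events (   -- hypothetical table and column names
        id INTEGER PRIMARY KEY,
        repository_id INTEGER NOT NULL,
        ts INTEGER NOT NULL
    )
    """
)

# Repository 1 gets five events (ids 1..5), repository 2 gets two (ids 6..7).
rows = [(1, ts) for ts in range(1, 6)] + [(2, ts) for ts in range(1, 3)]
conn.executemany(
    "INSERT INTO repository_events (repository_id, ts) VALUES (?, ?)", rows
)

# Rank events inside each repository, newest first; id breaks timestamp ties.
query = """
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY repository_id
                ORDER BY ts DESC, id DESC
            ) AS rn
        FROM repository_events
    )
    WHERE rn > ?
"""

ids_to_remove = sorted(row[0] for row in conn.execute(query, (KEEP,)))
print(ids_to_remove)  # the two oldest events of repository 1
```

For maintainer+repository events the same shape would apply with `PARTITION BY maintainer_id, repository_id` (again, assumed column names).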
While here, we may consider adding id columns to event table indexes, which would remove the need to sort a lot of data in the rare cases where many events share a single timestamp.
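To illustrate the index idea, here is a small sketch, again on a hypothetical `repository_events` table and run through Python's `sqlite3` for convenience. With `id` appended to the index, a feed query ordered by `ts DESC, id DESC` can be read straight from the index, even when many events share one timestamp. The index name and columns are assumptions, and the planner output shown is SQLite's; the production database's planner should be checked separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema; the real tables and index names may differ.
    CREATE TABLE repository_events (
        id INTEGER PRIMARY KEY,
        repository_id INTEGER NOT NULL,
        ts INTEGER NOT NULL
    );
    -- id is part of the index, so (ts, id) ordering comes from the index itself.
    CREATE INDEX repository_events_feed_idx
        ON repository_events (repository_id, ts DESC, id DESC);
    """
)

# Five events with the same timestamp (ids 1..5) and one newer event (id 6).
conn.executemany(
    "INSERT INTO repository_events (repository_id, ts) VALUES (1, ?)",
    [(100,)] * 5 + [(200,)],
)

feed_query = """
    SELECT id
    FROM repository_events
    WHERE repository_id = ?
    ORDER BY ts DESC, id DESC
    LIMIT ?
"""

feed_ids = [row[0] for row in conn.execute(feed_query, (1, 4))]

# The fourth column of EXPLAIN QUERY PLAN rows is the human-readable detail.
plan_details = [
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + feed_query, (1, 4))
]

print(feed_ids)
print(plan_details)
```

If the plan contained a `TEMP B-TREE` step, the rows would be sorted after being fetched; with the composite index no such step should be needed.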