Improved code utilities and data validation
- Added a check that labels match the precursor
- Added a check for data leakage
- Added verification of the residue vocabulary
- Added better residue support
- Fine-tuning trainer automatically resizes model weights to match new sizes
- Added support for Flash attention, torch.compile(), and AMP (fp16)
- Added an improved fast greedy search
- Improved test coverage
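
As a rough illustration of the data leakage and residue vocabulary checks listed above, here is a minimal sketch in plain Python. It is not the project's implementation: the function names `find_leaked_sequences` and `find_unknown_residues`, and the representation of a peptide as a tuple of residue tokens, are assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

Peptide = Tuple[str, ...]  # hypothetical representation: one token per residue


def find_leaked_sequences(
    train: Iterable[Peptide], test: Iterable[Peptide]
) -> Set[Peptide]:
    """Return peptide sequences that occur in both the training and test sets."""
    return set(train) & set(test)


def find_unknown_residues(
    peptides: Iterable[Peptide], vocabulary: Set[str]
) -> Set[str]:
    """Return residue tokens that are used in the data but missing from the vocabulary."""
    seen = {residue for peptide in peptides for residue in peptide}
    return seen - vocabulary


# Hypothetical toy data, for illustration only.
train_set = [("P", "E", "P"), ("T", "I", "D", "E")]
test_set = [("T", "I", "D", "E"), ("S[Phospho]", "K")]
vocab = {"P", "E", "T", "I", "D", "K"}

leaked = find_leaked_sequences(train_set, test_set)
unknown = find_unknown_residues(train_set + test_set, vocab)
```

Here `leaked` contains the one peptide shared by both splits, and `unknown` contains the modified residue token that the vocabulary does not cover. A real check would likely report or raise on non-empty results rather than just returning them.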
Added Spectrum Data Handler
- Supports lazy loading with asynchronous prefetching
- Filtering and sampling performed non-destructively (by updating the row filter)
- Two-fold shuffling strategy for training ensures optimal load times
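
The two ideas above, non-destructive filtering through a row filter and a two-fold shuffle, can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the Spectrum Data Handler's actual API: the class `RowFilteredTable`, the function `two_level_shuffle`, and the interpretation of "two-fold" as "shuffle the shard order, then shuffle rows within each shard" are all assumptions.

```python
import random
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Row = Dict[str, int]  # hypothetical row type for this sketch


class RowFilteredTable:
    """Keeps all rows in place and tracks which ones are active in a boolean mask."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self._rows: List[Row] = list(rows)
        self._row_filter: List[bool] = [True] * len(self._rows)

    def filter(self, predicate: Callable[[Row], bool]) -> None:
        # Narrow the mask; the underlying rows are never deleted.
        self._row_filter = [
            keep and predicate(row)
            for keep, row in zip(self._row_filter, self._rows)
        ]

    def sample(self, n: int, seed: int = 0) -> None:
        # Keep a random subset of the currently active rows.
        active = [i for i, keep in enumerate(self._row_filter) if keep]
        chosen = set(random.Random(seed).sample(active, min(n, len(active))))
        self._row_filter = [i in chosen for i in range(len(self._rows))]

    def reset(self) -> None:
        # Undo all filtering and sampling by re-enabling every row.
        self._row_filter = [True] * len(self._rows)

    def __len__(self) -> int:
        return sum(self._row_filter)

    def __iter__(self) -> Iterator[Row]:
        return (row for keep, row in zip(self._row_filter, self._rows) if keep)


def two_level_shuffle(
    shard_sizes: Sequence[int], seed: int = 0
) -> Iterator[Tuple[int, int]]:
    """Yield (shard, row) pairs: shards in random order, rows shuffled within each shard."""
    rng = random.Random(seed)
    shard_order = list(range(len(shard_sizes)))
    rng.shuffle(shard_order)
    for shard in shard_order:
        rows = list(range(shard_sizes[shard]))
        rng.shuffle(rows)
        for row in rows:
            yield shard, row
```

Because every row of a shard is visited before moving to the next shard, each shard only needs to be loaded once per epoch, which is one plausible way such a scheme keeps load times low while still randomizing the order.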
Released an extended model checkpoint, trained on 32M spectra with additional PTMs:
- AC-PT
- Additional PRIDE dataset
- Additional phosphorylation dataset