Fix aggregate inclusion binding threshold and add new --aggregate-inclusion-count-limit parameter #1147
This PR fixes a recently discovered issue: when the --allele-specific-binding-thresholds parameter was set, the --aggregate-inclusion-binding-threshold was ignored and entries were instead selected for inclusion using the --binding-threshold as the cutoff. With this PR, the --aggregate-inclusion-binding-threshold is respected even if --allele-specific-binding-thresholds is set.

However, after fixing this bug we discovered that it leads to a large number of peptide candidates being included with the default cutoff of 5000 (particularly for frameshift variants). This exceeded GitHub file size limits for the pVACview demo data (specifically, the metrics file) and made loading the data in pVACview slow.
To solve this, this PR also adds a new parameter, --aggregate-inclusion-count-limit (default: 15). For variants with more than 15 peptides passing the --aggregate-inclusion-binding-threshold, it limits the included entries to those of the 15 best peptides (as defined by our algorithm for selecting the best peptide per variant).

By limiting the number of peptides included for each variant, there are fewer outliers with a large number of candidates. Such outliers previously led us to limit the number of candidates displayed in the pVACview anchor heatmap. With this update, that limitation has been removed and all included peptide candidates are now shown in the anchor heatmap.
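
For reviewers, the per-variant inclusion logic described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the `Candidate` class, the `select_included` function, and the strict `<` comparison are hypothetical, and ranking by lowest IC50 is a stand-in for the actual best-peptide selection algorithm, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one peptide candidate of a single variant."""
    peptide: str
    ic50: float  # predicted binding affinity in nM; lower means stronger binding


def select_included(candidates, inclusion_threshold=5000.0, count_limit=15):
    """Return the candidates to include for one variant.

    Step 1: keep candidates passing the aggregate inclusion binding threshold.
    Step 2: rank them and keep at most `count_limit` of the best ones.
    Assumption: "best" is approximated here by lowest IC50.
    """
    passing = [c for c in candidates if c.ic50 < inclusion_threshold]
    ranked = sorted(passing, key=lambda c: c.ic50)
    return ranked[:count_limit]


# A variant with many passing candidates, e.g. a frameshift variant.
many = [Candidate(f"PEP{i}", 100.0 * i) for i in range(1, 21)]
many.append(Candidate("WEAK", 6000.0))  # fails the inclusion threshold
included = select_included(many)
print(len(included))
```

With the defaults, a variant that has 20 candidates under the threshold keeps only its 15 top-ranked entries, while a variant with fewer passing candidates than the limit keeps all of them.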