Decide what to do with coverage reporting in presence of large deletions. #1193

Open
Donaim opened this issue Nov 1, 2024 · 1 comment

Donaim commented Nov 1, 2024

Decide what to do with coverage reporting in presence of large deletions.

Currently, we can have the following two cases:

  1. A query aligned as 100M600D100M somewhere in the reference: coverage values for the big deletion in the middle are missing (the reference region is not covered by the query).
  2. A query aligned as 100M599D100M somewhere in the reference: coverage values for the big deletion in the middle are present (the reference region is covered by the query).

The threshold of 600 deleted bases is somewhat arbitrary.

We would like to develop a better decision procedure for what to report as "coverage".
Possibly one that looks at the individual reads (from the FASTQ files) to see whether reads actually spanned the big deletion, or whether the query is two separate consensus sequences "stitched" together.
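
To make the current behaviour concrete, here is a minimal sketch (not the project's actual implementation; the function name and the strict less-than comparison are assumptions inferred from the 599D/600D examples above) of how a fixed gap threshold decides which reference positions a single CIGAR alignment "covers":

```python
import re

MAX_GAP_SIZE = 600  # current threshold; deletions of this size or larger break coverage

def covered_reference_positions(cigar: str, ref_start: int = 0) -> set:
    """Return the reference positions reported as covered by one alignment."""
    covered = set()
    pos = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "M=X":
            # Aligned bases consume the reference and are covered.
            covered.update(range(pos, pos + length))
            pos += length
        elif op in "DN":
            # Deletions/skips consume the reference; only deletions shorter
            # than MAX_GAP_SIZE are still reported as covered.
            if op == "D" and length < MAX_GAP_SIZE:
                covered.update(range(pos, pos + length))
            pos += length
        # Insertions (I), clips (S/H) and padding (P) do not consume the reference.
    return covered
```

A smarter decision procedure could replace the fixed threshold here with a check of how many individual reads actually span the gap.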

Donaim added this to the far future milestone Nov 1, 2024

Donaim commented Nov 1, 2024

The current threshold is defined here:

MAX_GAP_SIZE = 600 # TODO: make this smaller?
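
With the strict less-than reading assumed in the sketch above, the two example alignments from the description behave as reported:

```python
# Hypothetical check of the two CIGAR strings from the issue, using the
# covered_reference_positions() sketch above.
print(len(covered_reference_positions("100M600D100M")))  # 200: the 600-base gap is left uncovered
print(len(covered_reference_positions("100M599D100M")))  # 799: the 599-base gap still counts as covered
```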
