Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use racon for de novo variant discovery #299

Merged
merged 138 commits into from
Nov 17, 2022
Merged

Use racon for de novo variant discovery #299

merged 138 commits into from
Nov 17, 2022

Conversation

mbhall88
Copy link
Member

@mbhall88 mbhall88 commented Oct 17, 2022

TODO

  • Sort out the merge conflicts
  • Define SAM file format (SAM file discussion #300)
  • Finish producing the debugging extra files (we create four extra files when mapping for each sample: a minimizer PAF file, a .minimatches file, and two files describing the clusters definition and filtering. These are debugging files, they seem very useful, but there is no need to thorough unit test. Their production will be hidden by default, and can be activate just with an env variable);
  • Check new index param: indexing_upper_bound;
  • Check and test racon discovery;
  • Test out the rectified version for clear runtime bugs
  • Write any further unit tests
  • Update documentation

leoisl added 30 commits May 14, 2021 11:55
Gathering forward declarations in a single file to avoid include loop, some method renaming, and logging refactoring
…e (more lenient)

bad clusters will be filtered out anyway
…descriptions...

... regarding strict insertions and deletions
As soon as we see a previously polished sequence, we stop
@leoisl
Copy link
Collaborator

leoisl commented Oct 31, 2022

Hey @mbhall88 , sorry, I just corrected two big bugs in the two most recent commits.

Bug 1: In this commit we started outputting SAM lines with flag 16, where the read maps as reverse complemented. However, when running discovery, we would just pull out reads where the flag was 0, i.e. I introduced a bug in this commit and de novo would just consider reads that mapped in the forward strand, effectively losing around half of the reads. This was fixed in this commit. This probably affects your results.

Bug 2: In this commit, I changed the order of them SAM extra fields so that is more pleasant to look at, so that very long extra fields (i.e. the flanks and the PRG path) are the last fields. Unfortunately, de novo racon code would find the left and the right flank fields by using predefined indexes, and not looking by the LF:Z and RF:Z tag. This was just fixed in this commit. This severely impacts your results, as you basically lost the left and right flank, but this commit was on 27th October, maybe you had not pulled these changes when you ran pandora...

@leoisl
Copy link
Collaborator

leoisl commented Oct 31, 2022

Finished memfd implementation (see #302) and improved the performance on some critical regions, which can yield improved runtime if pandora discover is ran with several threads. Just realised I have to do at least a few tests with different number of threads to see if the current code works as expected with several threads and how it scales with more threads

@mbhall88
Copy link
Member Author

mbhall88 commented Nov 1, 2022

pandora-e164973.gz

@mbhall88
Copy link
Member Author

mbhall88 commented Nov 1, 2022

Ooops I assume the tip of this PR is not functional yet? With that binary I got the following error

[racon::createPolisher] error: file /proc/874910/fd/4 has unsupported format extension (valid extensions: .fasta, .fasta.gz, .fna, .fna.gz, .fa, .fa.gz, .fastq, .fastq.gz, .fq, .fq.gz)!

Will retry with the commit you suggested in #303 9bb67c6

@mbhall88
Copy link
Member Author

mbhall88 commented Nov 1, 2022

pandora-9bb67c6.gz

@mbhall88
Copy link
Member Author

mbhall88 commented Nov 1, 2022

Ooops I assume the tip of this PR is not functional yet? With that binary I got the following error

[racon::createPolisher] error: file /proc/874910/fd/4 has unsupported format extension (valid extensions: .fasta, .fasta.gz, .fna, .fna.gz, .fa, .fa.gz, .fastq, .fastq.gz, .fq, .fq.gz)!

Will retry with the commit you suggested in #303 9bb67c6

Actually I get the same error with 9bb67c6 as well

@leoisl
Copy link
Collaborator

leoisl commented Nov 1, 2022 via email

@leoisl
Copy link
Collaborator

leoisl commented Nov 1, 2022

Hey @mbhall88 , I am also starting to run this code on the paper data for benchmarking purposes. You can find a precompiled binary for the latest commit attached.
pandora-linux-precompiled-glibc.gz

@leoisl
Copy link
Collaborator

leoisl commented Nov 1, 2022

Small update: started adding unit tests

@mbhall88
Copy link
Member Author

Are you happy with the testing I've done on this now @leoisl?

Anything else that needs to be done before merging?

@leoisl
Copy link
Collaborator

leoisl commented Nov 17, 2022

Sorry for the delay, but I think you were on vacations this week anyway, right?

Are you happy with the testing I've done on this now @leoisl?

Happy with your tests! Not happy with my lack of unit tests!

Anything else that needs to be done before merging?

Well, there are some misc stuff like fixing the example script, changelog, etc... Should we do for this alpha release?

@leoisl leoisl merged commit 945c025 into master Nov 17, 2022
@leoisl leoisl deleted the racon_denovo branch November 21, 2022 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants