Resistance gene not assembled in primary contig file, but is present in alternate contig file #30

Open
sosie100 opened this issue Nov 6, 2023 · 5 comments

sosie100 commented Nov 6, 2023

In a comparison of metagenomic assemblies built only from Illumina short-read data (metaSPAdes) with hifiasm-meta assemblies (the primary contig file, .p_ctg.gfa), we found an antibiotic resistance gene (ARG) assembled in the metaSPAdes assembly that was not present in the hifiasm-meta assembly of the same sample. About 20 HiFi reads align well to the ARG, yet it is not assembled in the hifiasm-meta primary contigs. However, the ARG is assembled in the alternate contig file produced by hifiasm-meta (.a_ctg.gfa), as well as in the .r_ctg.gfa and .p_utg.gfa files. How would you recommend we run hifiasm-meta so that the ARG content we care about lies entirely within the primary contig file? Or should we search for ARGs in both the primary and alternate contig files? ARGs are usually flanked by mobile genetic elements and usually belong to low-abundance species.

xfengnefx (Owner) commented

In your case, please use both the primary and alt contig files, i.e. p_ctg and a_ctg. The "alternative contig" label does not carry much meaning here; it is more of a remnant from forking hifiasm. Initially I wanted to put all popped bubble edges into the alt contigs, but then felt it did not matter either way because short contigs are less useful.
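Since the advice is to screen p_ctg and a_ctg together, one way to do that is to pull the sequences out of the S lines of both GFA files into a single FASTA for the ARG search. The sketch below is not part of hifiasm-meta; it assumes sequences are stored inline in the S lines (a `*` placeholder is skipped), and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def gfa_segments(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for every S line that carries an inline sequence."""
    for line in lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq != "*":  # "*" means the sequence was not stored in the GFA
            yield name, seq


def write_combined_fasta(gfa_paths: Dict[str, str], out_path: str) -> None:
    """Write the S-line sequences of several GFA files into one FASTA file.

    Keys of `gfa_paths` are short labels prefixed to each record name so that
    contigs from different files cannot collide.
    """
    with open(out_path, "w") as out:
        for label, path in gfa_paths.items():
            with open(path) as fh:
                for name, seq in gfa_segments(fh):
                    out.write(f">{label}_{name}\n{seq}\n")
```

For example, `write_combined_fasta({"p": "sample.p_ctg.gfa", "a": "sample.a_ctg.gfa"}, "sample.p_and_a_ctg.fa")` (hypothetical file names) produces one FASTA that an ARG screening tool can search in a single pass, with the `p_`/`a_` prefix recording where each hit came from.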

As for why the ARGs ended up in the alt contigs, my guess is that there were haplotypes without the ARGs, perhaps with higher coverage, so graph cleaning dropped the ARG-containing path. You can pick a couple of reads from an ARG contig and grep for them in the r_utg graph to check.
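The read check suggested above can be scripted by parsing the A lines, which list the reads placed in each segment. This is a sketch under an assumption: that A lines follow the hifiasm-style layout `A <segment> <segment_start> <strand> <read_name> <read_start> <read_end> [tags]`. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set


def _a_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-split fields of every A line.

    Assumed layout:
    A <segment> <segment_start> <strand> <read_name> <read_start> <read_end> [tags]
    """
    for line in lines:
        if line.startswith("A\t"):
            yield line.rstrip("\n").split("\t")


def reads_of_segment(lines: Iterable[str], segment: str) -> Set[str]:
    """Return the names of reads that the A lines place in `segment`."""
    return {f[4] for f in _a_lines(lines) if f[1] == segment}


def segments_for_reads(lines: Iterable[str], reads: Set[str]) -> Dict[str, Set[str]]:
    """Map each queried read that occurs in the graph to the segments containing it."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for f in _a_lines(lines):
        if f[4] in reads:
            hits[f[4]].add(f[1])
    return dict(hits)
```

In practice one would call `reads_of_segment` on the a_ctg file with the name of the ARG contig, then pass a few of those reads to `segments_for_reads` on the r_utg file to see which unitigs they land in. A read missing from the result may simply not be listed in the A lines (for example, if it was treated as contained), so absence alone is weak evidence.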

sosie100 (Author) commented Nov 7, 2023

Our samples are fecal metagenomes containing only haploid bacterial DNA, so haplotypes are not relevant in our case.

We found the ARG sequence within a 554 kb segment in the r_utg.gfa file. About 370 kb of that segment is absent from the p_ctg.gfa file, while the full 554 kb is present in the a_ctg.gfa file. Why is this sequence left out of the .p_ctg.gfa file?

xfengnefx (Owner) commented

> with only haploid bacterial DNA

Sorry for my confusing comment; I meant to say "there were closely related strains with and without the ARGs". For genomes with less than 1% whole-genome diversity, hifiasm-meta currently will usually not separate them.

> a 554 kb segment in the r_utg.gfa file

Is that a unitig? (And is it divided up in the contig graphs?) What are the coverages of the segment and of the contigs? Coverage is the dp:f field of the S lines in the GFA files; Bandage will also show it on the right side when you select a node.
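To answer the coverage question for many segments at once, the dp:f tag can be read straight from the S lines. A minimal sketch follows; it assumes segment length is given by an LN:i tag when present (falling back to the inline sequence length), and `segment_coverage` is a hypothetical helper, not a hifiasm-meta tool.

```python
from typing import Dict, Iterable, Optional, Tuple


def segment_coverage(lines: Iterable[str]) -> Dict[str, Tuple[Optional[int], Optional[float]]]:
    """Return {segment_name: (length, depth)} from the S lines of a GFA file.

    Length comes from the LN:i tag if present, otherwise from the inline sequence;
    depth comes from the dp:f tag. Missing values are reported as None.
    """
    result: Dict[str, Tuple[Optional[int], Optional[float]]] = {}
    for line in lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        length: Optional[int] = None if seq == "*" else len(seq)
        depth: Optional[float] = None
        for tag in fields[3:]:  # optional SAM-style tags follow the sequence
            if tag.startswith("LN:i:"):
                length = int(tag[5:])
            elif tag.startswith("dp:f:"):
                depth = float(tag[5:])
        result[name] = (length, depth)
    return result
```

Running this on the r_utg, p_ctg and a_ctg files and comparing the depth of the 554 kb unitig with the depths of the contigs that cover it would show whether a higher-coverage path was kept in the primary contigs.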

If you don't mind sharing the {r_utg, p_ctg, a_ctg}.gfa files via email, I can have a look.

sosie100 (Author) commented Nov 9, 2023 via email

xfengnefx (Owner) commented

Got the mail, thanks! Will check tonight.
