Is there a way to separate genomes from .genomic.fna.gz files? #20

jmwhitha · 2020-08-31T19:55:06Z

Hi Vitor,

Thanks for making this application.

I was wondering whether there is a way to use it to separate the genomes once I've downloaded the genomic.fna.gz files. I have tried awk, but the formatting of the descriptions varies considerably between genomes. As you probably know, the descriptions sometimes contain "sp." or "strain", sometimes "Scaffolds" or "contigs", etc., which makes it hard, though not impossible, to separate individual genomes.

If your application cannot separate the genomes either, are you familiar with any applications or scripts that can?

Thank you,
Jason

pirovc · 2020-09-01T07:19:57Z

Hi Jason,

There's currently no way to do that with genome_updater.

I believe you could parse the assembly_summary.txt file of the current version to get the information you need to separate the files. Check fields 9 and 12; more info here: ftp://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt

In assembly_summary.txt, the first column is the assembly accession, which points you to the file downloaded by genome_updater if you use the pattern: {output_dir}/{version}/files/{assembly_accession}*genomic.fna.gz
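The lookup described above could be sketched roughly as follows. This is a minimal illustration and not part of genome_updater: the function names `read_summary` and `files_by_accession` are hypothetical, and it assumes the summary is tab-separated with the accession in column 1 and that any header lines start with "#".

```python
import csv
import glob
import os


def read_summary(summary_path):
    """Yield (accession, field 9, field 12) for each data row of the summary."""
    with open(summary_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank lines, "#" header lines, and rows too short to index.
            if len(row) < 12 or row[0].startswith("#"):
                continue
            # Fields are counted from 1 in the thread, so 9 and 12 are indexes 8 and 11.
            yield row[0], row[8], row[11]


def files_by_accession(summary_path, files_dir):
    """Map each accession to its two summary fields and its downloaded files."""
    result = {}
    for accession, field9, field12 in read_summary(summary_path):
        # Same pattern as above: {files_dir}/{assembly_accession}*genomic.fna.gz
        pattern = os.path.join(
            glob.escape(files_dir),
            glob.escape(accession) + "*genomic.fna.gz",
        )
        result[accession] = {
            "field9": field9,
            "field12": field12,
            "files": sorted(glob.glob(pattern)),
        }
    return result
```

For example, `files_by_accession("out/v1/assembly_summary.txt", "out/v1/files")` would return one entry per accession, where both paths are placeholders for your own {output_dir}/{version} layout. Note that the wildcard may also match other files ending in genomic.fna.gz for the same accession, if any were downloaded, so the `files` list can hold more than one path.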

I hope that helps. I will leave this issue open and mark it as an enhancement, so I may include some of those features in the next release.

Best
Vitor

jmwhitha · 2020-09-02T11:42:41Z

Thank you so much for pointing me to the assembly_summary.txt. This seems like a good starting point for a solution.

Looking forward to the enhancement!

pirovc added the enhancement label Sep 1, 2020
