Hi! I am not sure if this is a question for nextclade or augur.
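I am using augur ancestral as part of a pipeline to create a tree for use in a Nextclade dataset: https://github.com/anna-parker/marburg-virus-tree/tree/main. To simplify matters, I decided to use the same GFF3 file that I use in my Nextclade dataset, with the goal of having the CDS regions named the same in the Nextclade tree and alignment. However, I realized that nextclade and augur ancestral appear to read the GFF3 file differently. Take, for example, my annotation for the NP CDS (full file here: https://github.com/GenSpectrum/nextclade-datasets/blob/add_marburg/data/marburg/unreleased/genome_annotation.gff3). It is the same as the GFF3 file from GenBank except that I renamed the CDS by adding a Name=NP field to the start of its attributes.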
When creating a Nextclade dataset, only the CDS feature is used, and it is given the name NP. This makes sense according to the Nextclade docs: "When a linked gene and CDS are present (CDSs specify their parents by listing the gene's ID in the Parent attribute), the gene is effectively ignored for all purposes but display in the web UI. CDS segments are joined if they have the same ID, otherwise they are treated as independent."
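To make the quoted rule concrete, here is a minimal Python sketch of it. It is a toy parser written only for this illustration, not Nextclade's implementation; the function names, the seqid `seq1`, and the coordinates (gene 50 to 2100, CDS 100 to 2000) are hypothetical.

```python
# Hypothetical sketch of the rule quoted from the Nextclade docs; not Nextclade's code.
# Genes that a CDS names as its Parent are ignored, and CDS rows sharing an ID are
# joined into one multi-segment CDS; CDS rows without an ID stay independent.
# Only the linked gene + CDS case from the quote is modelled here.


def parse_gff3(text):
    """Return (type, start, end, attributes) for every non-comment GFF3 row."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        rows.append((cols[2], int(cols[3]), int(cols[4]), attrs))
    return rows


def ignored_genes(rows):
    """IDs of genes that some CDS lists in its Parent attribute."""
    parents = set()
    for ftype, _, _, attrs in rows:
        if ftype == "CDS" and "Parent" in attrs:
            parents.update(attrs["Parent"].split(","))
    return {a["ID"] for t, _, _, a in rows if t == "gene" and a.get("ID") in parents}


def nextclade_like_cdses(rows):
    """Map CDS name -> list of (start, end) segments, joining rows with the same ID."""
    cdses = {}
    for i, (ftype, start, end, attrs) in enumerate(rows):
        if ftype != "CDS":
            continue
        key = attrs.get("ID", f"unnamed-{i}")  # no ID: keep the row independent
        entry = cdses.setdefault(key, {"name": attrs.get("Name", key), "segments": []})
        entry["segments"].append((start, end))
    return {e["name"]: e["segments"] for e in cdses.values()}


TOY_GFF = "\n".join([
    "##gff-version 3",
    "seq1\tsrc\tgene\t50\t2100\t.\t+\t.\tID=gene-NP;gene=NP",
    "seq1\tsrc\tCDS\t100\t2000\t.\t+\t0\tName=NP;ID=cds-NP;Parent=gene-NP",
])

rows = parse_gff3(TOY_GFF)
print(ignored_genes(rows))         # {'gene-NP'}
print(nextclade_like_cdses(rows))  # {'NP': [(100, 2000)]}
```

The joined-segment case works the same way: two CDS rows sharing one ID come back as a single entry with two segments.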
However, when I use this same file with augur ancestral, the gene rather than the CDS is used (I can tell because the gene is longer and I get many more mutations). When I removed the gene and kept only the CDS feature, augur ancestral did no ancestral reconstruction at all. Only when I changed the CDS feature's type to gene (see https://github.com/anna-parker/marburg-virus-tree/blob/main/config/reference.gff3) did augur ancestral reconstruct the CDS the same way as Nextclade.
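One hypothesis consistent with all three observations above is a reader that only considers rows whose type is gene. The sketch below is not Augur's code: the selector, its naming order (Name, then gene, then ID), and the toy coordinates are hypothetical, chosen only to show why the longer gene wins, why a CDS-only file yields nothing, and why retyping the CDS as gene gives the CDS coordinates.

```python
# Hypothetical "gene-only" selector; NOT Augur's implementation.
# It assumes the reader keeps only GFF3 rows of type "gene" and ignores CDS rows.


def parse_gff3(text):
    """Return (type, start, end, attributes) for every non-comment GFF3 row."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        rows.append((cols[2], int(cols[3]), int(cols[4]), attrs))
    return rows


def gene_only_features(text):
    """Map feature name -> (start, end), looking at gene rows only."""
    selected = {}
    for ftype, start, end, attrs in parse_gff3(text):
        if ftype != "gene":
            continue  # CDS rows are never considered
        name = attrs.get("Name") or attrs.get("gene") or attrs.get("ID")
        selected[name] = (start, end)
    return selected


GENE = "seq1\tsrc\tgene\t50\t2100\t.\t+\t.\tID=gene-NP;gene=NP"
CDS = "seq1\tsrc\tCDS\t100\t2000\t.\t+\t0\tName=NP;ID=cds-NP;Parent=gene-NP"
CDS_AS_GENE = "seq1\tsrc\tgene\t100\t2000\t.\t+\t.\tName=NP;ID=cds-NP"

print(gene_only_features(GENE + "\n" + CDS))  # {'NP': (50, 2100)}: the longer gene
print(gene_only_features(CDS))                # {}: nothing to reconstruct
print(gene_only_features(CDS_AS_GENE))        # {'NP': (100, 2000)}: CDS coordinates
```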
Is this expected behavior? Does augur ancestral only perform ancestral reconstruction on genes? I couldn't find any documentation on how augur ancestral expects GFF3 files to be formatted.
Side note: I was previously using a GenBank file rather than a GFF3 file, and there augur ancestral used the CDS, not the gene. My main reason for switching to a GFF3 file was to rename the CDS.
Yes, Nextclade and Augur process GFFs differently. Ivan has done a lot of good work on the Nextclade side, and we've had a bunch of discussions about how to leverage that work in Augur rather than maintaining our own version in parallel, but I don't think any changes are imminent here. Currently, here's how Augur does it: