forked from aseetharam/common_scripts
Extract exon sequence based on GFF3 and FASTA #3
Comments
On 23/04/20 at 05:29, Andrew Severin wrote:
You might be looking for the gff2fasta.pl script located here: https://github.com/ISUGIFsingularity/utilities/tree/master/utilities
Please let me know if that is the case.
Hi Andrew, thank you for the prompt reply.
I found that script on the Biostar forum. I ran it and got the output files.
However, the sequences in output.exon.fasta [1] don't match the exons in the GFF3 [2].
Could you please help me?
For example, the GFF3 file contains:
Chr01 phytozomev10 exon 2787014 2787767 . - . ID=Eucgr.A00001.1.v2.0.exon.12;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2787803 2787834 . - . ID=Eucgr.A00001.1.v2.0.exon.11;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2788190 2788300 . - . ID=Eucgr.A00001.1.v2.0.exon.10;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2789313 2789399 . - . ID=Eucgr.A00001.1.v2.0.exon.9;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2789765 2789884 . - . ID=Eucgr.A00001.1.v2.0.exon.8;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2789985 2790162 . - . ID=Eucgr.A00001.1.v2.0.exon.7;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2790477 2790694 . - . ID=Eucgr.A00001.1.v2.0.exon.6;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2790774 2790880 . - . ID=Eucgr.A00001.1.v2.0.exon.5;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2790969 2791089 . - . ID=Eucgr.A00001.1.v2.0.exon.4;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2791278 2791373 . - . ID=Eucgr.A00001.1.v2.0.exon.3;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2791468 2791696 . - . ID=Eucgr.A00001.1.v2.0.exon.2;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
Chr01 phytozomev10 exon 2792210 2792340 . - . ID=Eucgr.A00001.1.v2.0.exon.1;Parent=Eucgr.A00001.1.v2.0;pacid=32049109
However, the gff2fasta.pl script retrieves only one exon, instead of 12, for
gene Eucgr.A00001.1.v2.0.
Thank you so much!
1. https://www.dropbox.com/s/dicbsvo7hsuznq5/output.exon.fasta?dl=0
2. https://www.dropbox.com/s/b8ge28sa6nwcuhl/Egrandis_297_v2.0.gene_exons.gff3?dl=0
3. Reference genome (~700 MB) https://www.dropbox.com/s/4p0mxqak9erjil8/Egrandis_297_v2.0.fa?dl=0
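One way to check the mismatch described above is to count how many exon features the GFF3 lists per Parent and compare that with the number of records in the FASTA output. The following is only a minimal sketch, not part of gff2fasta.pl: the function names `parse_attributes` and `exons_per_parent` are hypothetical, and it assumes a standard tab-separated, nine-column GFF3 file.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn 'ID=a;Parent=b' into {'ID': 'a', 'Parent': 'b'}."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def exons_per_parent(gff3_lines):
    """Count exon features for each Parent attribute value."""
    counts = defaultdict(int)
    for line in gff3_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")  # assumption: tab-separated columns
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        counts[attrs.get("Parent", "(no Parent)")] += 1
    return dict(counts)
```

Called as `exons_per_parent(open("Egrandis_297_v2.0.gene_exons.gff3"))`, it should report 12 for Eucgr.A00001.1.v2.0 if the file contains the twelve exon lines listed above.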
Hi, after a long search online I found this awesome script, and it works nicely. However, I need to extract all exon sequences from a genome based on a GFF3 file and a FASTA file. Please find attached a sample GFF3 file.
From that file I need to extract these sequences:
>Eucgr.A00001.1.v2.0.exon.1
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.2
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.3
ACTGTGACA......
(...)
>Eucgr.A00001.1.v2.0.exon.12
ACTGTGACA......
(...)
Could you help me?
Thank you so much!
sample_GFF3_tsv.txt
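For reference, the per-exon output requested above could be produced along these lines. This is a rough sketch under stated assumptions, not the repository's script: the helper names `read_fasta`, `reverse_complement` and `exon_records` are hypothetical, the GFF3 is assumed to be tab-separated with an ID attribute on each exon, the whole genome is held in memory, and exons on the minus strand are reverse-complemented.

```python
def read_fasta(lines):
    """Load FASTA records into {name: sequence}; name is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def exon_records(gff3_lines, genome):
    """Yield (exon_id, sequence) for every exon feature in the GFF3 lines."""
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")  # assumption: tab-separated columns
        if len(cols) != 9 or cols[2] != "exon":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        seq = genome[seqid][start - 1:end]  # GFF3 coordinates are 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        # Fall back to coordinates when an exon has no ID attribute.
        yield attrs.get("ID", f"{seqid}:{start}-{end}"), seq
```

Each yielded pair can then be written as `>exon_id` followed by the sequence, which gives one FASTA record per exon line in the GFF3, named as in the example above.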