Gff output from infernal-tblout2gff.pl #1

Open
sanyalab opened this issue Aug 17, 2020 · 3 comments

@sanyalab

Hi Eric,

I have a cmscan tblout file that I am converting with the infernal-tblout2gff.pl script. Is the GFF3 output simply a direct rearrangement of the tblout columns? Would it be possible to get GFF3 output that adheres to the Sequence Ontology, with the level 1 and level 2 features described?

Thanks
Abhijit

@nawrockie
Owner

I'm not sure what you mean by level 1 and 2 features. If you provide an example of a cmscan tblout file and the corresponding GFF file in the format you want with the info you want, I can provide a better answer. Thanks.

@sanyalab
Author

Hi Eric,

Thank you for writing back. Here is cmscan output processed with the cmscan-to-GFF script:

Chr04 GSAP LSU_rRNA_eukarya 1049382 1052765 3229.7 + . evalue=0;idx=1;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.2;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1041221 1044604 3227.9 + . evalue=0;idx=2;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1057543 1060926 3227.9 + . evalue=0;idx=3;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1082027 1085410 3227.9 + . evalue=0;idx=4;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1105876 1109259 3227.9 + . evalue=0;idx=5;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
LSU_rRNA_eukarya is not an SO term, but the feature is an rRNA. The entry should therefore read "rRNA_gene" for the parent feature and "rRNA" for the child. I was wondering how difficult it would be to code the output that way.

Thanks
Abhijit

@nawrockie
Owner

That information (rRNA and rRNA_gene) is not in the cmsearch tblout output, so you'd need to write a script that adds that information to the GFF file after you run infernal-tblout2gff.pl. You'll likely need another input file to your script that maps the RNA families to the SO terms you want to add.
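
As a rough illustration of the post-processing step described above, here is a minimal Python sketch. It is not part of Infernal. The function names (`load_so_map`, `add_so_features`), the mapping-file layout (three whitespace-separated columns: family name, parent SO term, child SO term), and the `ID`/`Parent` naming scheme are all assumptions made for this example. It also assumes the family name sits in column 3 of the GFF, as in the output posted earlier in this thread.

```python
def load_so_map(path):
    """Read a hypothetical mapping file: family, parent SO term, child SO term."""
    so_map = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            family, parent_type, child_type = line.split()
            so_map[family] = (parent_type, child_type)
    return so_map


def add_so_features(gff_lines, so_map):
    """Yield GFF3 lines, expanding each mapped hit into a parent and a child.

    Lines whose column 3 is not in so_map (and comment or blank lines)
    are passed through unchanged.
    """
    hit = 0
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line
            continue
        # Split on any whitespace so both tab- and space-separated input work;
        # maxsplit=8 keeps the attribute column in one piece.
        fields = line.split(None, 8)
        if len(fields) != 9 or fields[2] not in so_map:
            yield line
            continue
        hit += 1
        family = fields[2]
        parent_type, child_type = so_map[family]
        gene_id = f"{family}.{hit}"  # assumed ID scheme, unique per hit
        attrs = fields[8].rstrip(";")  # original attributes, copied as-is
        coords = fields[3:8]  # start, end, score, strand, phase
        # Parent (level 1) feature, e.g. rRNA_gene.
        yield "\t".join(
            fields[:2] + [parent_type] + coords
            + [f"ID={gene_id};Name={family}"]
        )
        # Child (level 2) feature, e.g. rRNA, linked through Parent=.
        yield "\t".join(
            fields[:2] + [child_type] + coords
            + [f"ID={gene_id}.rna;Parent={gene_id};{attrs}"]
        )
```

With a mapping file containing the line `LSU_rRNA_eukarya rRNA_gene rRNA`, passing the open GFF file and the result of `load_so_map` to `add_so_features` turns each hit above into an `rRNA_gene` line followed by an `rRNA` line whose `Parent` attribute points at it. The original attributes are carried over untouched, so anything in them that a strict GFF3 validator rejects would still need separate cleanup.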
