
GTF file format #2

Open
bdeonovic opened this issue Aug 18, 2015 · 5 comments


@bdeonovic

I'm getting the following error when I run prepare-emase:

Parsing refFlat_20150603.gtf...
Traceback (most recent call last):
  File "/Users/bdeonovic/miniconda/envs/emase/bin/prepare-emase", line 4, in <module>
    __import__('pkg_resources').run_script('emase==0.9.5', 'prepare-emase')
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 735, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1652, in run_script
  File "/Users/bdeonovic/miniconda/envs/emase/lib/python2.7/site-packages/emase-0.9.5-py2.7.egg/EGG-INFO/scripts/prepare-emase", line 279, in <module>
    sys.exit(main())
  File "/Users/bdeonovic/miniconda/envs/emase/lib/python2.7/site-packages/emase-0.9.5-py2.7.egg/EGG-INFO/scripts/prepare-emase", line 228, in main
    gdb, tdb = parse_gtf(gtffile)
  File "/Users/bdeonovic/miniconda/envs/emase/lib/python2.7/site-packages/emase-0.9.5-py2.7.egg/EGG-INFO/scripts/prepare-emase", line 131, in parse_gtf
    tdb[tid][feature].append((s, e))
KeyError: 'start_codon'
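The traceback suggests that `tdb[tid]` is a dictionary with no entry for the `start_codon` feature. Below is a minimal sketch of how that kind of failure can arise. It assumes (a guess, since the prepare-emase source is not shown in this thread) that each transcript's entry is pre-seeded with a fixed set of feature names. `EXPECTED_FEATURES`, `add_interval_strict` and `add_interval_tolerant` are hypothetical names, and the code is written for Python 3.

```python
from collections import defaultdict

# Hypothetical: the fixed set of feature keys a strict parser might pre-seed.
# This is an assumption, not taken from the prepare-emase source.
EXPECTED_FEATURES = ("exon", "CDS", "UTR")

def add_interval_strict(tdb, tid, feature, s, e):
    """Store (s, e) under a pre-seeded feature key; unknown features fail."""
    if tid not in tdb:
        tdb[tid] = {f: [] for f in EXPECTED_FEATURES}
    # Raises KeyError for any feature not seeded above, e.g. 'start_codon'
    tdb[tid][feature].append((s, e))

def add_interval_tolerant(tdb, tid, feature, s, e):
    """Store (s, e); the nested defaultdict creates a list for any feature."""
    tdb[tid][feature].append((s, e))

strict = {}
add_interval_strict(strict, "NM_014406", "exon", 16590758, 16592810)
try:
    add_interval_strict(strict, "NM_014406", "start_codon", 16592548, 16592550)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'start_codon'

tolerant = defaultdict(lambda: defaultdict(list))
add_interval_tolerant(tolerant, "NM_014406", "start_codon", 16592548, 16592550)
```

The tolerant version records every feature type it sees; a parser that only needs exon coordinates could instead skip feature types it does not recognize.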

It looks like my GTF is not of the proper form. Here are the first few lines:

chr22   refFlat exon    16590758    16592810    .   -   .   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";
chr22   refFlat CDS 16590880    16592550    .   -   0   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";
chr22   refFlat start_codon 16592548    16592550    .   -   0   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";
chr22   refFlat stop_codon  16590877    16590879    .   -   0   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";

If this is not the format your program expects, please let me know what format the file should be in. An example of the first few lines of a GTF file you use would be helpful.
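For reference, each GTF line has nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), and column 9 holds `key "value";` pairs. The following is a rough sketch of reading lines like the ones above and grouping their intervals by `transcript_id` and feature type. `parse_attributes` and `parse_gtf_lines` are hypothetical helpers, not EMASE's `parse_gtf`, and the optional `keep_features` filter is just one assumed way of dropping features such as `start_codon`.

```python
from collections import defaultdict

def parse_attributes(field):
    """Parse a GTF column-9 string 'key "value"; key "value";' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs

def parse_gtf_lines(lines, keep_features=None):
    """Group (start, end) intervals by transcript_id, then by feature type.

    keep_features: optional set of feature names to keep (e.g. {"exon"});
    None keeps every feature. Hypothetical helper for illustration only.
    """
    tdb = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
        feature, start, end, attr = cols[2], cols[3], cols[4], cols[8]
        if keep_features is not None and feature not in keep_features:
            continue
        tid = parse_attributes(attr)["transcript_id"]
        tdb[tid][feature].append((int(start), int(end)))
    return tdb

sample = [
    'chr22\trefFlat\texon\t16590758\t16592810\t.\t-\t.\tgene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1";',
    'chr22\trefFlat\tstart_codon\t16592548\t16592550\t.\t-\t0\tgene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1";',
]
tdb = parse_gtf_lines(sample)
print(sorted(tdb["NM_014406"]))         # ['exon', 'start_codon']
exons_only = parse_gtf_lines(sample, keep_features={"exon"})
print(sorted(exons_only["NM_014406"]))  # ['exon']
```

Note that the sketch splits on tabs, so lines whose tabs were converted to spaces during copy and paste would be rejected by the column-count check.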

@narayananr
Contributor

I think the GTF file is the issue. We have tested extensively with Ensembl GTF files (up to release-68, ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus). Which GTF file did you use? Can you try an Ensembl GTF?

Thanks

@GlastonburyC

I'm getting the same error. I'm using Homo_sapiens.GRCh37.68.gtf, and prepare-emase gives me the following error:

KeyError: 'start_codon'

@narayananr
Contributor

Thanks for trying out EMASE. Let me try to reproduce the error and get back to you.

@narayananr
Contributor

Hi

Is there a specific reason you want to use Homo_sapiens.GRCh37.68.gtf? That annotation is from 2012, and I would suggest using a newer version. prepare-emase works fine with Homo_sapiens.GRCh37.75.gtf, the last annotation release for the GRCh37 build.

Thanks
Narayanan

@27NRussell

Hi,

I am trying to run EMASE and am getting the same error:

File "/uufs/chpc.utah.edu/sys/installdir/anaconda/5.3.0/envs/emase/lib/python2.7/site-packages/emase-0.10.16-py2.7.egg-info/scripts/prepare-emase", line 132, in parse_gtf
    tdb[tid][feature].append((s, e))
KeyError: 'start_codon'

You suggested earlier that the GTF file might be the problem; however, I am using the exact GTF file (ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus) that you said you had tested. I am also using the corresponding genome.

Any advice as to how I can fix this would be greatly appreciated.

Thanks so much,

Nikki
