Installation troubles #7

Open
satyakiprv opened this issue Jun 12, 2019 · 9 comments

@satyakiprv

Hi,
Thank you for all the effort you have put into this technique and the code.

I have unfortunately run into a few issues with installation. I ran the installation with your annotation and genome files (containing just Chr2) and it works fine. However, it fails to write a transcriptome.fasta file if the paths to the genome or GFF files are specified using the -A and -G options.

I then replaced the annotation and FASTA files in the resources folder with a full-length TAIR10 genome and GFF annotation file. It now writes a transcriptome.fasta file but throws the following error:
"### GENERATE ANNOTATION CLASS REFERENCE FILES ###
Getting transcript-level exons
Getting 5'-most exons
Converting transcript-level to gene-level annotations
Reformatting to GFF files to BED files
Differing number of BED fields encountered at line: 444. Exiting...
Reformatting to GFF files to BED files
Differing number of BED fields encountered at line: 457. Exiting...
Cleaning up temporary files
Setup complete."

In this case, the class.exons_by_gene.bed, class.single_exon_genes.bed and class.terminal_exons_by_gene.bed are empty.

It's not clear to me whether I am doing something wrong. Please let me know if you have any pointers.

Thanks

Satyaki

@maschon0
Collaborator

Hello Satyaki,

Thank you for bringing this to my attention. In order for me to replicate the problem, can you send me a link to the exact full-length TAIR10 files you are using? I will take a look at the issue.

Additionally, I overlooked this in the README file, but the files passed with -A and -G need to be absolute file paths. In my hands, passing -A resources/annotation.gff failed, but -A /full/path/to/nanoPARE/resources/annotation.gff ran just as it does when no arguments are passed. I'll also work on a fix for this.
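
Until that fix lands, one way to avoid the relative-path problem is to resolve each path before handing it to the script. The helper below is a minimal sketch, not part of nanoPARE: abs_path is a hypothetical name, and the example invocation assumes that -G takes the genome FASTA (the thread only confirms that -A takes the annotation file).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nanoPARE): turn a possibly relative
# path into an absolute one using only cd, pwd, dirname and basename.
abs_path() {
    local target="$1"
    local dir
    # Resolve the containing directory; fail if it does not exist.
    dir="$(cd "$(dirname "$target")" && pwd)" || return 1
    # Avoid printing a double slash for files directly under "/".
    [ "$dir" = "/" ] && dir=""
    printf '%s/%s\n' "$dir" "$(basename "$target")"
}

# Example invocation (file names are illustrative, -G is assumed to be the genome):
# ./nanoPARE_setup.sh -A "$(abs_path resources/annotation.gff)" \
#                     -G "$(abs_path resources/genome.fasta)"
```

The helper only resolves the directory part, so it works even when the file itself has not been created yet, as long as its parent directory exists.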

@maschon0 maschon0 self-assigned this Jun 13, 2019
@satyakiprv
Author

satyakiprv commented Jun 13, 2019 via email

@maschon0
Collaborator

This part of nanoPARE_setup.sh completed without errors on my machine using the two files you provided. I tried both passing them by command-line and swapping them with resources/annotation.gff and resources/genome.fasta.

Can you please share the full log file produced by nanoPARE_setup.sh, along with a description of your Bash environment and the software versions of Python, STAR and BEDtools you are using?

I am testing the script on a machine with Bash version 4.2.47, using Python 3.6.6, STAR 2.6.1c and bedtools v2.27.1.
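
For anyone gathering the same details for a report, the snippet below is a small sketch that prints the version of each tool named above, or "not found" if it is missing from PATH. The function names report_version and report_all_versions are made up for this example; the --version flags are the standard ones for python3, STAR and bedtools.

```shell
#!/usr/bin/env bash
# Print "<tool>: <first line of its version output>", or "<tool>: not found".
report_version() {
    local tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        # Some tools print their version on stderr, so merge both streams.
        printf '%s: %s\n' "$tool" "$("$tool" "$@" 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$tool"
    fi
}

# Summarize the environment relevant to nanoPARE_setup.sh.
report_all_versions() {
    printf 'bash: %s\n' "$BASH_VERSION"
    report_version python3 --version
    report_version STAR --version
    report_version bedtools --version
}

# Usage: report_all_versions > environment.txt
```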

@satyakiprv
Author

satyakiprv commented Jun 13, 2019 via email

@maschon0
Collaborator

The culprit appears to be BEDtools! I got the same two error messages after rolling back to v2.26.
It looks like the default behavior of bedtools sort became more lenient about column formatting. The way I was handling gene names for sense-overlapping genes erroneously left a small number of lines in a temporary BED file with 4 columns instead of 5. Apparently BEDtools v2.27 does not complain about this, but it is fatal for v2.26.
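
To spot this kind of problem in any BED file before bedtools sees it, the sketch below lists every line whose tab-separated field count differs from that of the first line. It is an illustration only: find_ragged_bed_lines is a hypothetical name, and since the setup script removes its temporary files, it would have to be run on a BED file you have kept yourself.

```shell
#!/usr/bin/env bash
# Report lines whose number of tab-separated fields differs from line 1.
# Prints nothing when every line has the same number of fields.
find_ragged_bed_lines() {
    awk -F'\t' '
        NR == 1 { expected = NF }
        NF != expected {
            printf "line %d: %d fields (expected %d)\n", NR, NF, expected
        }
    ' "$1"
}

# Usage (file name is illustrative):
# find_ragged_bed_lines my_annotations.bed
```

An empty result means the column counts are consistent, which is the condition the older bedtools sort was enforcing.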

You may be able to fix your problem by updating BEDtools, but alternatively I have pushed a fix to nanoPARE_setup.sh that corrects the improperly formatted lines.

I also noticed from the STAR Log.out file you sent me that STAR was expecting a GTF file by default but received a GFF3 file. STAR infers splice junctions from the annotation file to aid alignment, but it failed to find any (hence the nearly 850,000 warning lines). This can be corrected by changing the strings STAR searches for to indicate parent-child relationships in the GFF3 file. In this last commit I also added new command-line arguments for this. To properly process the TAIR10 GFF3 file, run:

./nanoPARE_setup.sh --gtf_transcript_tag Parent  

This will pass the correct tags to STAR's --sjdbGTFtagExonParentTranscript. The TAIR10 GFF3 still has improperly formatted exon lines (they are missing an "ID="/"gene_id=" attribute), but with the command above STAR will correctly find splice junctions.
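
A quick way to confirm that a GFF3 file fits this setup is to count how many exon lines carry a Parent= attribute and how many carry an ID= attribute. The function below is a rough sketch under the assumption that the file is standard tab-delimited GFF3 with the feature type in column 3 and attributes in column 9; count_exon_attributes is a hypothetical name and the function is not part of nanoPARE.

```shell
#!/usr/bin/env bash
# Count exon features in a GFF3 file and how many of them have
# Parent= and ID= keys in the attribute column (column 9).
count_exon_attributes() {
    awk -F'\t' '
        /^#/ { next }                          # skip header and comment lines
        $3 == "exon" {
            total++
            if ($9 ~ /(^|;)Parent=/) with_parent++
            if ($9 ~ /(^|;)ID=/) with_id++
        }
        END { printf "exons=%d parent=%d id=%d\n", total, with_parent, with_id }
    ' "$1"
}

# Usage (file name is illustrative):
# count_exon_attributes resources/annotation.gff
```

If the parent count matches the exon count, the Parent tag is available on every exon line, which is what the --gtf_transcript_tag Parent setting relies on.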

Did this resolve the issue?

@satyakiprv
Author

satyakiprv commented Jun 14, 2019 via email

@maschon0
Collaborator

I will take a closer look at how the pipeline handles GTF files soon.

As for EndClass, try passing the "Sample Type" from the reference table, rather than the "Library Type":

./endClass.sh -T flower

Sorry if that was unclear! I'm slowly putting together more thorough documentation with better details on how to operate these scripts. Hopefully this will clear things up down the road.

--Michael Schon

@satyakiprv
Author

satyakiprv commented Jun 14, 2019 via email

@satyakiprv
Author

satyakiprv commented Jun 14, 2019 via email
