Installation troubles #7

satyakiprv · 2019-06-12T19:16:13Z

Hi,
Thank you for all the effort you have put into this technique and the code.

I have unfortunately run into a few issues with installation. I ran the installation with your annotation and genome files (containing just Chr2) and it works fine. However, it fails to write a transcriptome.fasta file if the paths to the genome or GFF files are specified using the -A and -G options.

I then replaced the annotation and FASTA files in the resource folder with the full-length TAIR10 genome and GFF annotation files. It now writes a transcriptome.fasta file but throws the following error:
"### GENERATE ANNOTATION CLASS REFERENCE FILES ###
Getting transcript-level exons
Getting 5'-most exons
Converting transcript-level to gene-level annotations
Reformatting to GFF files to BED files
Differing number of BED fields encountered at line: 444. Exiting...
Reformatting to GFF files to BED files
Differing number of BED fields encountered at line: 457. Exiting...
Cleaning up temporary files
Setup complete."

In this case, the class.exons_by_gene.bed, class.single_exon_genes.bed and class.terminal_exons_by_gene.bed are empty.

It's not clear to me whether I am doing something wrong. Please let me know if you have any pointers.

Thanks

Satyaki

maschon0 · 2019-06-13T08:24:40Z

Hello Satyaki,

Thank you for bringing this to my attention. In order for me to replicate the problem, can you send me a link to the exact full-length TAIR10 files you are using? I will take a look at the issue.

Additionally, it looks like I overlooked this in the README file, but the files passed by -A and -G need to be absolute filepaths. In my hands, passing -A resources/annotation.gff failed, but -A /full/path/to/nanoPARE/resources/annotation.gff ran in the way that passing no arguments would. I'll also work on a fix for this.
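As a stopgap, relative paths can be resolved to absolute ones before they are handed to -A and -G. The following is a minimal sketch, assuming Bash; `to_abs` is a hypothetical helper and is not part of nanoPARE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nanoPARE): print the absolute path of a
# file given a relative or absolute path. The containing directory must exist.
to_abs() {
    local target="$1"
    local dir
    # Resolve the containing directory in a subshell so the caller's
    # working directory is left untouched.
    dir="$(cd "$(dirname -- "$target")" && pwd -P)" || return 1
    printf '%s/%s\n' "$dir" "$(basename -- "$target")"
}

# Usage sketch, run from inside a nanoPARE checkout (commented out here
# because it needs the repository and its resources/ folder):
# ./nanoPARE_setup.sh -A "$(to_abs resources/annotation.gff)" \
#                     -G "$(to_abs resources/genome.fasta)"
```

The helper only rewrites the path string; it does not check that the file itself exists, so a typo in the file name will still surface later in the setup script.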

satyakiprv · 2019-06-13T15:18:49Z

Hello Michael,

Thanks for your prompt reply. Our lab uses the standard TAIR10 annotations from the website. However, I have sent you a Google Drive link to both of these files. I have tried changing the Chr notations here to Ath_chr to match your sample file, but it has not helped.

Thanks again for your help,
Satyaki

TAIR10.fa <https://drive.google.com/a/wi.mit.edu/file/d/1urkdqflbrM9ZqOG5ARMLe8zrTZkqLfMX/view?usp=drive_web>
annotation_v1.gff <https://drive.google.com/a/wi.mit.edu/file/d/1htV7vC7Xw3qk1pG9y-1w1c-LdCZc-AjJ/view?usp=drive_web>

-- Satyaki, Post-doctoral associate, Gehring Lab.

maschon0 · 2019-06-13T16:16:03Z

This part of nanoPARE_setup.sh completed without errors on my machine using the two files you provided. I tried both passing them by command-line and swapping them with resources/annotation.gff and resources/genome.fasta.

Can you please share the full log file produced by nanoPARE_setup.sh, along with a description of your Bash environment and the software versions of Python, STAR and BEDtools you are using?

I am testing the script on a machine with Bash version 4.2.47, using Python 3.6.6, STAR 2.6.1c and bedtools v2.27.1.
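The version details requested above can be collected in one go. This is a small sketch, assuming the tools are invoked as `bash`, `python3`, `STAR` and `bedtools` on the PATH; `report_versions` is a hypothetical name, and tools installed under other names would need the list adjusted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of each tool's --version output,
# or a note when the tool is not found on PATH.
report_versions() {
    local tool
    for tool in bash python3 STAR bedtools; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Some tools print their version on stderr, so merge both streams.
            printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found on PATH\n' "$tool"
        fi
    done
}

report_versions
```

Pasting the four lines it prints into a report makes version mismatches between machines easy to spot.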

satyakiprv · 2019-06-13T17:40:13Z

My Bash version is 4.4.19, my Python version is 3.6.7, my STAR version is 2.7.1 and my bedtools version is 2.26. I have attached the log file. It's really weird, because I am able to install just fine with your short Chr2 file.

Log.out <https://drive.google.com/a/wi.mit.edu/file/d/1uoUXPRABfTn79TSnP4sJwpea1b3pZeYw/view?usp=drive_web>

maschon0 · 2019-06-13T19:15:51Z

The culprit appears to be BEDtools! I got the same two error messages after rolling back to v2.26.
It looks like the default behavior of bedtools sort changed to be more lenient in the column formatting. The way I was handling gene names for sense-overlapping genes erroneously resulted in a small number of lines of a temporary BED file having 4 columns instead of 5. Apparently BEDtools v2.27 will not complain about this, but it is fatal for v2.26.

You may be able to fix your problem by updating BEDtools, but alternatively I have pushed a fix to nanoPARE_setup.sh that corrects the improperly formatted lines.
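To see whether a given BED file has the kind of inconsistent column count described above, each line's field count can be compared against the first line. This is a rough sketch using awk; `check_bed_columns` is a hypothetical name, and it is not the fix that was pushed to nanoPARE_setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every line of a tab-delimited BED file whose
# field count differs from that of the first line.
# Exit status is 0 when all lines agree and 1 otherwise.
check_bed_columns() {
    awk -F'\t' '
        NR == 1 { expected = NF }
        NF != expected {
            printf "line %d: %d fields (expected %d)\n", NR, NF, expected
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Small demonstration on a temporary file whose third line has 4 columns.
demo="$(mktemp)"
printf 'Chr1\t0\t10\tgeneA\t+\nChr1\t5\t20\tgeneB\t-\nChr1\t30\t40\tgeneC\n' > "$demo"
if ! check_bed_columns "$demo"; then
    echo "inconsistent column counts found"
fi
rm -f "$demo"
```

Running the check on the temporary BED files before they reach bedtools sort gives the same answer regardless of which BEDtools version is installed.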

I also noticed from the STAR log.out file you sent me that STAR was expecting a GTF file by default and got a GFF3 format. STAR infers splice junctions from the annotation file to aid alignment, but it failed to find any (hence the nearly 850,000 warning lines). This can be corrected by changing the strings STAR searches for to indicate parent-child relationships in the GFF3 file. In this last commit I also added new command-line arguments for this. To properly process the TAIR10 GFF3 file, run:

./nanoPARE_setup.sh --gtf_transcript_tag Parent

This will pass the correct tags to STAR's --sjdbGTFtagExonParentTranscript. The TAIR10 GFF3 still has improperly formatted exon lines (they are missing an "ID="/"gene_id=" attribute), but with the command above STAR will correctly find splice junctions.
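Before choosing a tag, it can help to check which attributes the exon lines of an annotation file actually carry. The sketch below counts exon lines and how many of them contain `Parent=`, `ID=` or a GTF-style `transcript_id` (STAR's default for --sjdbGTFtagExonParentTranscript). `summarize_exon_tags` is a hypothetical name and the script is an illustration, not part of nanoPARE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize which parent/ID attributes appear on the
# exon lines (column 3 == "exon") of a tab-delimited GFF3 or GTF file.
summarize_exon_tags() {
    awk -F'\t' '
        /^#/ { next }                              # skip header and comment lines
        $3 == "exon" {
            total++
            if ($9 ~ /(^|;)Parent=/)    parent++   # GFF3-style parent link
            if ($9 ~ /(^|;)ID=/)        id++       # GFF3-style feature ID
            if ($9 ~ /transcript_id /)  tid++      # GTF-style transcript link
        }
        END {
            printf "exon=%d Parent=%d ID=%d transcript_id=%d\n", total, parent, id, tid
        }
    ' "$1"
}

# Usage sketch (commented out because it needs an annotation file):
# summarize_exon_tags resources/annotation.gff
```

If every exon line has `Parent=` and none has `transcript_id`, passing `--gtf_transcript_tag Parent` as shown above is the matching choice.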

Did this resolve the issue?

satyakiprv · 2019-06-14T03:10:35Z

It's always samtools or bedtools that cause trouble! Thanks a ton! It installed correctly.

I had initially supplied a GTF file. It threw the following exception at the end:

Error with the gtf file
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gtf_to_fasta.py", line 94, in <module>
    ref_transcripts = gu.parse_annotation(args.reference_GFF)
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gff_utils.py", line 78, in parse_annotation
    return parse_gff3(path, mode)
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gff_utils.py", line 95, in parse_gff3
    return get_file_content(path, 'gff3', mode)
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gff_utils.py", line 243, in get_file_content
    transcript.add_sample_name(sample_name)

I therefore just went forward with the correct installation that used the GFF file.

I was running your sample data and endClass is crashing. It looks like the Python script is unable to find the input file.

/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE$ ./endClass.sh -T BODY
Config settings:

_____________________________
SETTINGS
-----------------------------
Basic configuration:
root_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE
bash_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/bash_scripts
python_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts
resource_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/resources
temp_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/temp
log_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/log
results_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/results

General settings:
GENOME_FASTA=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/resources/genome.fasta
ANNOTATION_GFF=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/resources/annotation.gff
LMOD=0
RAM=
CPUS=1

EndMap settings:
LINE_NUMBER=-1
ICOMP=

EndGraph settings:
SAMPLE_NAME=
RPM=
KERNEL=
BANDWIDTH=
FRAGLEN=

EndClass settings:
SAMPLE_TYPE=BODY
UUG=0.1

EndMask settings:
SAMPLE_TYPE=BODY
MASK_SOURCE=

_____________________________
Samples:
Sample type: BODY
Merging feature files...
Bedgraph files +:
Bedgraph files -:
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE ...]] [-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one argument
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE ...]] [-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one argument
Bedgraph uuG files +:
Bedgraph uuG files -:
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE ...]] [-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one argument
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE ...]] [-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one argument
Merged coverage files generated.
*****
***** ERROR: Requested column 6, but database file - only has fields 1 - 0.
Finding feature peak positions.
Traceback (most recent call last):
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_find_peaks.py", line 327, in <module>
    coverage_file = open(file)
FileNotFoundError: [Errno 2] No such file or directory: 'BODY.plus.bedgraph'
awk: fatal: cannot open file `BODY.peaks.bed' for reading (No such file or directory)
Locating nearest annotation to each feature:
overlapping single-exon transcripts
overlapping 5'-terminal exons
Between exons
Upstream of annotations
Contained within introns
Antisense to existing annotations
Antisense intronic
awk: fatal: cannot open file `BODY.peaks.bed' for reading (No such file or directory)
Traceback (most recent call last):
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py", line 116, in <module>
    print(output_best(get_unique(duplicate_lines),args.SELECT,args.STARTLINE,args.ENDLINE,args.STRANDLINE,args.SCORELINE))
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py", line 64, in output_best
    return bed_lines[sorted_order[-1]]
IndexError: list index out of range
Traceback (most recent call last):
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py", line 116, in <module>
    print(output_best(get_unique(duplicate_lines),args.SELECT,args.STARTLINE,args.ENDLINE,args.STRANDLINE,args.SCORELINE))
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py", line 64, in output_best
    return bed_lines[sorted_order[-1]]
IndexError: list index out of range
Splitting capped and noncapped features...
Loading all plus...
Traceback (most recent call last):
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_uug_filter.py", line 171, in <module>
    for line in open(all_plus):
FileNotFoundError: [Errno 2] No such file or directory: 'BODY.plus.bedgraph'
Error: The requested file (BODY.capped.bed) could not be opened. Error message: (No such file or directory). Exiting!
Error: The requested file (BODY.noncapped.bed) could not be opened. Error message: (No such file or directory). Exiting!
Cap masking bedgraph files...
Cap masking merged bedgraph...
Internal mask: BODY.capped.bed
Traceback (most recent call last):
  File "/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bedgraph_mask.py", line 118, in <module>
    mask_file=open(args.BED_INSIDE)
FileNotFoundError: [Errno 2] No such file or directory: 'BODY.capped.bed'
Error: Unable to open file BODY.capped.bed. Exiting.
grep: BODY.capped.bed: No such file or directory
Calculating gene-level capped and noncapped read coverage...
usage: bed_feature_coverage.py [-h] -L LENGTHS [-I INPUT [INPUT ...]] [-N NAMES [NAMES ...]] -F FEATURES [-O OUTPUT] [--bed_out] [-G GENOME] [--g_content G_CONTENT]
bed_feature_coverage.py: error: argument -I/--input: expected at least one argument
usage: bed_feature_coverage.py [-h] -L LENGTHS [-I INPUT [INPUT ...]] [-N NAMES [NAMES ...]] -F FEATURES [-O OUTPUT] [--bed_out] [-G GENOME] [--g_content G_CONTENT]
bed_feature_coverage.py: error: argument -I/--input: expected at least one argument

Sorry for all the trouble!
Satyaki

maschon0 · 2019-06-14T10:29:42Z

I will take a closer look at how the pipeline handles GTF files soon.

As for EndClass, try passing the "Sample Type" from the reference table, rather than the "Library Type":

./endClass.sh -T flower

Sorry if that was unclear! I'm slowly putting together more thorough documentation that will give better details on actually operating these scripts. Hopefully this will clear things up down the road.

--Michael Schon

satyakiprv · 2019-06-14T14:20:20Z

Thanks! I should have tried that before bugging you! Satyaki

satyakiprv · 2019-06-14T15:30:03Z

One other issue. Endcut seems to be written with lots of paths local to the Nodine lab, such as:

export WSEQ_s=mRNA/180228/
export WSEQ=/lustre/scratch/users/michael.nodine/seq/$WSEQ_s
export dataRoot=WSEQ
export TEST=anno.mir.tas.fa.GSTAr
export SHUFF=anno.mir.tas
export BG=$WSEQ/bedFiles/$outDir_s/${NAME1}.$outDir_a.bedgraph
export BG_norm=$BG.norm
export BG_norm_overlap=$BG_norm.overlap
export BG_norm_overlap_up=$BG_norm.overlap.up
export BG_norm_overlap_down=$BG_norm.overlap.down
export outDir=$WSEQ/results/$outDir_s/$NAME1/

I was wondering if you are planning to write a version of this script that can be implemented in other places.

Thanks again,
Satyaki

maschon0 self-assigned this Jun 13, 2019
