Installation troubles #7
Hello Satyaki, Thank you for bringing this to my attention. In order for me to replicate the problem, can you send me a link to the exact full-length TAIR10 files you are using? I will take a look at the issue. Additionally, it looks like I overlooked this in the README file, but the files passed by -A and -G need to be absolute filepaths. In my hands, passing -A resources/annotation.gff failed, but -A /full/path/to/nanoPARE/resources/annotation.gff ran in the way that passing no arguments would. I'll also work on a fix for this. |
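A minimal sketch of the workaround described above: expand both paths to absolute paths before handing them to nanoPARE_setup.sh. The helper name `build_setup_command` is hypothetical and not part of nanoPARE, and the sketch assumes -A takes the annotation GFF and -G takes the genome FASTA.

```python
import os


def build_setup_command(annotation, genome, script="./nanoPARE_setup.sh"):
    """Return an argument list with the -A/-G values made absolute.

    Assumption: -A is the annotation file and -G is the genome FASTA.
    """
    return [
        script,
        "-A", os.path.abspath(os.path.expanduser(annotation)),
        "-G", os.path.abspath(os.path.expanduser(genome)),
    ]


if __name__ == "__main__":
    cmd = build_setup_command("resources/annotation.gff", "resources/genome.fasta")
    # Print the command so it can be copied into a shell.
    print(" ".join(cmd))
```

Relative paths are resolved against the directory the sketch is run from, so it should be run from the nanoPARE root if the inputs live under resources/.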
Hello Michael,
Thanks for your prompt reply. Our lab uses the standard TAIR10 annotations
from the website.
However, I have sent you a google drive link to both these files.
I have tried changing the Chr notations here to Ath_chr to match your
sample file but it has not helped.
Thanks again for your help,
Satyaki
TAIR10.fa
<https://drive.google.com/a/wi.mit.edu/file/d/1urkdqflbrM9ZqOG5ARMLe8zrTZkqLfMX/view?usp=drive_web>
annotation_v1.gff
<https://drive.google.com/a/wi.mit.edu/file/d/1htV7vC7Xw3qk1pG9y-1w1c-LdCZc-AjJ/view?usp=drive_web>
--
Satyaki,
Post-doctoral associate,
Gehring Lab.
|
This part of nanoPARE_setup.sh completed without errors on my machine using the two files you provided. I tried both passing them by command-line and swapping them with resources/annotation.gff and resources/genome.fasta. Can you please share the full log file produced by nanoPARE_setup.sh, along with a description of your Bash environment and the software versions of Python, STAR and BEDtools you are using? I am testing the script on a machine with Bash version 4.2.47, using Python 3.6.6, STAR 2.6.1c and bedtools v2.27.1. |
My bash version is 4.4.19.
My Python version is 3.6.7
STAR version is 2.7.1 and bedtools is 2.26.
I have attached the log file.
It's really weird because I am able to install just fine with your short
chr2 file.
Log.out
<https://drive.google.com/a/wi.mit.edu/file/d/1uoUXPRABfTn79TSnP4sJwpea1b3pZeYw/view?usp=drive_web>
|
The culprit appears to be BEDtools! I got the same two error messages after rolling back to v2.26. You may be able to fix your problem by updating BEDtools, but alternatively I have pushed a fix to nanoPARE_setup.sh that corrects the improperly formatted lines. I also noticed from the STAR log.out file you sent me that STAR was expecting a GTF file by default and got a GFF3 format. STAR infers splice junctions from the annotation file to aid alignment, but it failed to find any (hence the nearly 850,000 warning lines). This can be corrected by changing the strings STAR searches for to indicate parent-child relationships in the GFF3 file. In this last commit I also added new command-line arguments for this. To properly process the TAIR10 GFF3 file, run:
./nanoPARE_setup.sh --gtf_transcript_tag Parent
This will pass the correct tags to STAR's --sjdbGTFtagExonParentTranscript. The TAIR10 GFF3 still has improperly formatted exon lines (they are missing an "ID="/"gene_id=" attribute), but with the command above STAR will correctly find splice junctions. Did this resolve the issue? |
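Since the failure turned out to depend on the BEDtools release, a quick version check before running setup may save time. The sketch below is not part of nanoPARE. It parses a version string such as the ones quoted in this thread and compares it against a threshold; the 2.27 threshold is an assumption, because the thread only establishes that v2.26 fails and v2.27.1 works.

```python
import re


def parse_version(text):
    """Extract the first dotted version number from a string as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def bedtools_is_recent_enough(version_text, minimum=(2, 27)):
    """Return True if the reported version is at least `minimum`.

    Assumption: 2.27 is used as the cutoff only because v2.26 failed and
    v2.27.1 worked in this thread.
    """
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    for reported in ("bedtools v2.27.1", "bedtools 2.26"):
        print(reported, "->", bedtools_is_recent_enough(reported))
```

Tuples of integers compare element by element, which avoids the pitfall of comparing version strings lexically.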
It's always samtools or bedtools that cause trouble! Thanks a ton! It
installed correctly. I had initially supplied a GTF file. It threw the
following exception at the end:
Error with the gtf file
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gtf_to_fasta.py",
line 94, in <module>
ref_transcripts = gu.parse_annotation(args.reference_GFF)
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gff_utils.py",
line 78, in parse_annotation
return parse_gff3(path, mode)
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gff_utils.py",
line 95, in parse_gff3
return get_file_content(path, 'gff3', mode)
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/gff_utils.py",
line 243, in get_file_content
transcript.add_sample_name(sample_name)
==
I therefore just went forward with the correct installation that used the
GFF file. I was running your sample data and endClass is crashing. It looks
like the python script is unable to find the input file.
/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE$ ./endClass.sh -T
BODY
Config settings:
_____________________________
SETTINGS
-----------------------------
Basic configuration:
root_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE
bash_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/bash_scripts
python_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts
resource_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/resources
temp_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/temp
log_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/log
results_dir=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/results
General settings:
GENOME_FASTA=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/resources/genome.fasta
ANNOTATION_GFF=/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/resources/annotation.gff
LMOD=0
RAM=
CPUS=1
EndMap settings:
LINE_NUMBER=-1
ICOMP=
EndGraph settings:
SAMPLE_NAME=
RPM=
KERNEL=
BANDWIDTH=
FRAGLEN=
EndClass settings:
SAMPLE_TYPE=BODY
UUG=0.1
EndMask settings:
SAMPLE_TYPE=BODY
MASK_SOURCE=
_____________________________
Samples:
Sample type: BODY
Merging feature files...
Bedgraph files +:
Bedgraph files -:
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE
...]]
[-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one
argument
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE
...]]
[-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one
argument
Bedgraph uuG files +:
Bedgraph uuG files -:
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE
...]]
[-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one
argument
usage: bedgraph_combine.py [-h] [-i INPUT [INPUT ...]] [-s SCALE [SCALE
...]]
[-o OUTPUT]
bedgraph_combine.py: error: argument -i/--input: expected at least one
argument
Merged coverage files generated.
*****
***** ERROR: Requested column 6, but database file - only has fields 1 - 0.
Finding feature peak positions.
Traceback (most recent call last):
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_find_peaks.py",
line 327, in <module>
coverage_file = open(file)
FileNotFoundError: [Errno 2] No such file or directory: 'BODY.plus.bedgraph'
awk: fatal: cannot open file `BODY.peaks.bed' for reading (No such file or
directory)
Locating nearest annotation to each feature:
overlapping single-exon transcripts
overlapping 5'-terminal exons
Between exons
Upstream of annotations
Contained within introns
Antisense to existing annotations
Antisense intronic
awk: fatal: cannot open file `BODY.peaks.bed' for reading (No such file or
directory)
Traceback (most recent call last):
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py",
line 116, in <module>
print(output_best(get_unique(duplicate_lines),args.SELECT,args.STARTLINE,args.ENDLINE,args.STRANDLINE,args.SCORELINE))
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py",
line 64, in output_best
return bed_lines[sorted_order[-1]]
IndexError: list index out of range
Traceback (most recent call last):
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py",
line 116, in <module>
print(output_best(get_unique(duplicate_lines),args.SELECT,args.STARTLINE,args.ENDLINE,args.STRANDLINE,args.SCORELINE))
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_deduplicate.py",
line 64, in output_best
return bed_lines[sorted_order[-1]]
IndexError: list index out of range
Splitting capped and noncapped features...
Loading all plus...
Traceback (most recent call last):
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bed_uug_filter.py",
line 171, in <module>
for line in open(all_plus):
FileNotFoundError: [Errno 2] No such file or directory: 'BODY.plus.bedgraph'
Error: The requested file (BODY.capped.bed) could not be opened. Error
message: (No such file or directory). Exiting!
Error: The requested file (BODY.noncapped.bed) could not be opened. Error
message: (No such file or directory). Exiting!
Cap masking bedgraph files...
Cap masking merged bedgraph...
Internal mask: BODY.capped.bed
Traceback (most recent call last):
File
"/lab/solexa_gehring/satyaki/nanoPARE_install/nanoPARE/scripts/python_scripts/bedgraph_mask.py",
line 118, in <module>
mask_file=open(args.BED_INSIDE)
FileNotFoundError: [Errno 2] No such file or directory: 'BODY.capped.bed'
Error: Unable to open file BODY.capped.bed. Exiting.
grep: BODY.capped.bed: No such file or directory
Calculating gene-level capped and noncapped read coverage...
usage: bed_feature_coverage.py [-h] -L LENGTHS [-I INPUT [INPUT ...]]
[-N NAMES [NAMES ...]] -F FEATURES [-O
OUTPUT]
[--bed_out] [-G GENOME] [--g_content
G_CONTENT]
bed_feature_coverage.py: error: argument -I/--input: expected at least one
argument
usage: bed_feature_coverage.py [-h] -L LENGTHS [-I INPUT [INPUT ...]]
[-N NAMES [NAMES ...]] -F FEATURES [-O
OUTPUT]
[--bed_out] [-G GENOME] [--g_content
G_CONTENT]
bed_feature_coverage.py: error: argument -I/--input: expected at least one
argument
Sorry for all the trouble!
Satyaki
On Thu, Jun 13, 2019 at 3:15 PM Michael Schon ***@***.***> wrote:
The culprit appears to be BEDtools! I got the same two error messages
after rolling back to v2.26.
It looks like the default behavior of *bedtools sort* changed to be more
lenient in the column formatting. The way I was handling gene names for
sense-overlapping genes erroneously resulted in a small number of lines of
a temporary BED file having 4 columns instead of 5. Apparently BEDtools
v2.27 will not complain about this, but it is fatal for v2.26.
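The condition described above (a few lines with 4 columns in a file that otherwise has 5) can be checked independently of BEDtools. This is a minimal sketch, not the fix that was pushed to nanoPARE_setup.sh; the function name is hypothetical, and it treats the first data line as the reference column count.

```python
def first_inconsistent_bed_line(lines):
    """Find the first line whose tab-separated field count differs.

    Returns (line_number, n_fields, expected_fields), or None if every
    data line has the same number of fields as the first data line.
    """
    expected = None
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and the header lines BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        n_fields = len(line.split("\t"))
        if expected is None:
            expected = n_fields
        elif n_fields != expected:
            return number, n_fields, expected
    return None


if __name__ == "__main__":
    example = [
        "Chr1\t100\t200\tgeneA\t+\n",
        "Chr1\t300\t400\tgeneB\t-\n",
        "Chr1\t500\t600\t+\n",  # name column missing: 4 fields instead of 5
    ]
    print(first_inconsistent_bed_line(example))
```

Passing an open file handle works as well as a list, because the function only iterates over lines.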
You may be able to fix your problem by updating BEDtools, but
alternatively I have pushed a fix to *nanoPARE_setup.sh* that corrects
the improperly formatted lines.
I also noticed from the STAR log.out file you sent me that STAR was
expecting a GTF file by default and got a GFF3 format. STAR infers splice
junctions from the annotation file to aid alignment, but it failed to find
any (hence the nearly 850,000 warning lines). This can be corrected by
changing the strings STAR searches for to indicate parent-child
relationships in the GFF3 file. In this last commit I also added new
command-line arguments for this. To properly process the TAIR10 GFF3 file,
run:
./nanoPARE_setup.sh --gtf_transcript_tag Parent
This will pass the correct tags to STAR's
--sjdbGTFtagExonParentTranscript. The TAIR10 GFF3 still has improperly
formatted exon lines (they are missing an "ID="/"gene_id=" attribute), but
with the command above STAR will correctly find splice junctions.
Did this resolve the issue?
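To see which tag an annotation file actually uses on its exon lines, the attribute column can be inspected directly. The sketch below is an illustration rather than part of nanoPARE: it counts exon lines carrying `Parent` (GFF3 style) and `transcript_id` (GTF style, which is STAR's documented default for --sjdbGTFtagExonParentTranscript). The function name and return shape are hypothetical.

```python
def count_exon_tags(gff_lines, tags=("Parent", "transcript_id")):
    """Count exon lines and how many of them carry each attribute tag."""
    counts = {tag: 0 for tag in tags}
    exons = 0
    for line in gff_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 is the feature type, column 9 holds the attributes.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        exons += 1
        keys = set()
        for attribute in fields[8].split(";"):
            attribute = attribute.strip()
            if not attribute:
                continue
            # GFF3 uses key=value, GTF uses key "value"; handle both.
            keys.add(attribute.replace("=", " ", 1).split(" ", 1)[0])
        for tag in tags:
            if tag in keys:
                counts[tag] += 1
    return exons, counts


if __name__ == "__main__":
    sample = ["Chr1\tTAIR10\texon\t3631\t3913\t.\t+\t.\tParent=AT1G01010.1\n"]
    print(count_exon_tags(sample))
```

If every exon line reports `Parent` and none reports `transcript_id`, the file is GFF3 style and the --gtf_transcript_tag Parent option above applies.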
|
I will take a closer look at how the pipeline handles GTF files soon. As for EndClass, try passing the "Sample Type" from the reference table, rather than the "Library Type":
./endClass.sh -T flower
Sorry if that was unclear! I'm slowly putting together more thorough documentation that will give better details on actually operating these scripts. Hopefully this will clear things up down the road. --Michael Schon |
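In the log above, the lists after "Bedgraph files +:" and "Bedgraph files -:" are empty, so every later step fails on a missing file. A guard like the following would stop at the first empty list with a clearer message. It is a hypothetical sketch and not code from nanoPARE; the function name and the wording of the message are assumptions.

```python
def require_inputs(paths, description):
    """Return `paths` as a list, or raise a descriptive error if it is empty."""
    paths = list(paths)
    if not paths:
        raise ValueError(
            f"No {description} found; check that the value passed to -T "
            "matches a Sample Type in the reference table."
        )
    return paths


if __name__ == "__main__":
    try:
        require_inputs([], "plus-strand bedgraph files")
    except ValueError as error:
        print(error)
```

Failing early keeps the first error message next to its cause instead of burying it under follow-on tracebacks.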
Thanks! I should have tried that before bugging you!
Satyaki
On Fri, Jun 14, 2019 at 6:29 AM Michael Schon ***@***.***> wrote:
I will take a closer look at how the pipeline handles GTF files soon.
As for EndClass, try passing the "Sample Type" from the reference table,
rather than the "Library Type":
./endClass.sh -T flower
Sorry if that was unclear! I'm slowly putting together more thorough
documentation that will give better details on actually operating these
scripts. Hopefully this will clear things up down the road.
--Michael Schon
|
One other issue: Endcut seems to be written with lots of paths local to
the Nodine lab, such as:
export WSEQ_s=mRNA/180228/
export WSEQ=/lustre/scratch/users/michael.nodine/seq/$WSEQ_s
export dataRoot=WSEQ
export TEST=anno.mir.tas.fa.GSTAr
export SHUFF=anno.mir.tas
export BG=$WSEQ/bedFiles/$outDir_s/${NAME1}.$outDir_a.bedgraph
export BG_norm=$BG.norm
export BG_norm_overlap=$BG_norm.overlap
export BG_norm_overlap_up=$BG_norm.overlap.up
export BG_norm_overlap_down=$BG_norm.overlap.down
export outDir=$WSEQ/results/$outDir_s/$NAME1/
I was wondering if you are planning to write a version of this script that
can be implemented in other places.
Thanks again,
Satyaki
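One common way to make paths like the ones quoted above portable is to read the site-specific root from the environment and derive everything else from it. The sketch below only illustrates that idea and is not how nanoPARE does or will handle it. The variable names are taken from the excerpt, while the function name and every default value are placeholders.

```python
import os


def endcut_paths(env=None):
    """Derive the paths from the excerpt using overridable settings.

    Assumption: all defaults below are placeholders, not nanoPARE defaults.
    """
    env = os.environ if env is None else env
    wseq = env.get("WSEQ", os.path.join(os.getcwd(), "seq"))
    name = env.get("NAME1", "sample")
    out_s = env.get("outDir_s", "run")
    out_a = env.get("outDir_a", "all")
    # Mirrors BG=$WSEQ/bedFiles/$outDir_s/${NAME1}.$outDir_a.bedgraph
    bedgraph = os.path.join(wseq, "bedFiles", out_s, f"{name}.{out_a}.bedgraph")
    return {
        "WSEQ": wseq,
        "BG": bedgraph,
        "BG_norm": bedgraph + ".norm",
        "BG_norm_overlap": bedgraph + ".norm.overlap",
        "outDir": os.path.join(wseq, "results", out_s, name),
    }


if __name__ == "__main__":
    for key, value in endcut_paths().items():
        print(f"{key}={value}")
```

With this layout, only WSEQ needs to change when the data live on a different file system.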
|
Hi,
Thank you for all the effort you have put into this technique and the code.
I have unfortunately run into a few issues with installation. I ran installation with your annotation and genome files (containing just Chr2) and it works fine. However, it fails to write a transcriptome.fasta file if the paths to the genome or GFF files are specified using the -A and -G options.
I then replaced the annotation and FASTA files in the resources folder with the full-length TAIR10 genome and GFF annotation files. It now writes a transcriptome.fasta file but throws the following error:
"### GENERATE ANNOTATION CLASS REFERENCE FILES ###
Getting transcript-level exons
Getting 5'-most exons
Converting transcript-level to gene-level annotations
Reformatting to GFF files to BED files
Differing number of BED fields encountered at line: 444. Exiting...
Reformatting to GFF files to BED files
Differing number of BED fields encountered at line: 457. Exiting...
Cleaning up temporary files
Setup complete."
In this case, the class.exons_by_gene.bed, class.single_exon_genes.bed and class.terminal_exons_by_gene.bed are empty.
It's not clear to me whether I am doing something wrong. Please let me know if you have any pointers.
Thanks
Satyaki
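Because the setup script prints "Setup complete." even when the three class files named above end up empty, a quick size check after setup can catch the problem. This is a minimal sketch with a hypothetical function name; the directory that holds the files is an assumption and must be supplied by the caller.

```python
import os

# File names as reported in the issue text above.
CLASS_FILES = (
    "class.exons_by_gene.bed",
    "class.single_exon_genes.bed",
    "class.terminal_exons_by_gene.bed",
)


def empty_or_missing(directory, names=CLASS_FILES):
    """Return the names that are absent from `directory` or have zero size."""
    problems = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Assumption: the files are written to ./resources; adjust as needed.
    print(empty_or_missing("resources"))
```

An empty list means all three files exist and contain data.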