Initial Binning fails #7

Open
winterlich opened this issue Apr 28, 2023 · 16 comments

@winterlich

Hi there,
I just tried Nanophase, both with one of my datasets and with the example dataset.
The assembly using flye --meta works fine, but the pipeline keeps terminating at the initial binning step. The logfile of MetaBat2 shows only this:
MetaBAT 2 (v2.12.1) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.

I tried versions 0.2.2 and 0.2.3, but neither worked with my datasets or with the example dataset.

The NanoPhase check shows this information:

Check software availability and locations
The following packages have been found
#package location
flye /home/xxx/anaconda3/envs/nanophase0.2.2/bin/flye
metabat2 /home/xxx/anaconda3/envs/nanophase0.2.2/bin/metabat2
maxbin2 /home/xxx/anaconda3/envs/nanophase0.2.2/bin/run_MaxBin.pl
metawrap /home/xxx/anaconda3/envs/nanophase0.2.2/bin/metawrap
checkm /home/xxx/anaconda3/envs/nanophase0.2.2/bin/checkm
racon /home/xxx/anaconda3/envs/nanophase0.2.2/bin/racon
medaka /home/xxx/anaconda3/envs/nanophase0.2.2/bin/medaka
polypolish /home/xxx/anaconda3/envs/nanophase0.2.2/bin/polypolish
POLCA /home/xxx/anaconda3/envs/nanophase0.2.2/bin/polca.sh
bwa /home/xxx/anaconda3/envs/nanophase0.2.2/bin/bwa
seqtk /home/xxx/anaconda3/envs/nanophase0.2.2/bin/seqtk
minimap2 /home/xxx/software/ont-guppy/bin/minimap2
BBMap /home/xxx/anaconda3/envs/nanophase0.2.2/bin/BBMap
parallel /home/xxx/anaconda3/envs/nanophase0.2.2/bin/parallel
perl /home/xxx/anaconda3/envs/nanophase0.2.2/bin/perl
samtools /home/xxx/anaconda3/envs/nanophase0.2.2/bin/samtools
gtdbtk /home/xxx/anaconda3/envs/nanophase0.2.2/bin/gtdbtk
fastANI /home/xxx/anaconda3/envs/nanophase0.2.2/bin/fastANI
blastp /home/xxx/anaconda3/envs/nanophase0.2.2/bin/blastp
All required packages have been found in the environment. If the above certain packages integrated into nanophase were used in your investigation, please give them credit as well :)
grep: warning: stray \ before /
Warning: [flye metabat2 maxbin2 metawrap checkm racon medaka polypolish POLCA bwa seqtk BBMap parallel perl samtools gtdbtk fastANI blastp minimap2] has not been installed in the [nanophase] env. Strongly recommend intalling all packages in the nanophase env, or it may result in a failure

This message is confusing, since the required packages are installed and found, but the pipeline keeps warning about missing software.

Anyway, I would love to test your pipeline. Please let me know if I can provide any additional information for this issue.
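
One way to see where the mismatch in the warning above might come from is to compare each tool's resolved path against the active conda env. This is a minimal sketch, assuming `conda activate` has set `CONDA_PREFIX`; the function names `path_in_env` and `check_tools` are made up for illustration and are not part of nanophase.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of nanophase.

# Succeed if a resolved tool path lives inside a given environment prefix.
# Both arguments are plain strings, so no conda installation is needed.
path_in_env() {
    local tool_path="$1" env_prefix="${2%/}"
    case "$tool_path" in
        "$env_prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report, for each named tool, whether its first hit on PATH is inside
# the active conda env (CONDA_PREFIX is set by `conda activate`).
check_tools() {
    local prefix="${CONDA_PREFIX:-}" tool resolved
    if [ -z "$prefix" ]; then
        echo "no conda env is active (CONDA_PREFIX is empty)"
        return 1
    fi
    for tool in "$@"; do
        resolved="$(command -v "$tool" 2>/dev/null || true)"
        if [ -z "$resolved" ]; then
            echo "$tool: not found on PATH"
        elif path_in_env "$resolved" "$prefix"; then
            echo "$tool: ok ($resolved)"
        else
            echo "$tool: OUTSIDE env ($resolved)"
        fi
    done
}

# Example (tool list is only a sample):
#   check_tools flye metabat2 minimap2 samtools
```

In the listing above, for instance, minimap2 resolves to the ont-guppy directory rather than to the conda env, which a check like this would flag.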

@Hydro3639
Owner

Hi, could you provide the command that you used?

@winterlich
Author

winterlich commented Apr 28, 2023

Sure:
For the example dataset, I used this command:
nanophase meta -l lr.fa.gz -t 24 -o ont-nanophase-out

For my own datasets, I used the same command, with the input files and output folder changed accordingly.

@Hydro3639
Owner

I suspect the confusing message you mentioned comes from an installation issue. For example, the conda env is named nanophase0.2.2, but judging from the log file, you activated an env called nanophase (with a command like conda activate nanophase) while the nanophase command that was invoked lives in the nanophase0.2.2 env. These are only warning messages, so there is no need to worry about them.

Before I can identify the potential issues, could you run the following command (after activating the nanophase env) to see what exactly happens during metabat2 binning:
metabat2 -t 16 -i ont-nanophase-out/01-LongAssemblies/assembly.fasta -o ont-nanophase-out/02-LongBins/INITIAL_BINNING/metabat2/metabat2-bins/bin -a ont-nanophase-out/02-LongBins/INITIAL_BINNING/metabat2/metabat2_abun.txt --cvExt

@winterlich
Author

Thanks for your answer.
I ran the analysis again, using the correct environments for nanophase 0.2.2 and nanophase 0.2.3, but still got the same error. The metabat2 command you suggested produces no additional output; it just ends with a "segmentation fault".
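
When a command dies like this, its exit status can confirm that it was killed by a signal rather than exiting with an error of its own: by shell convention, a status above 128 means "128 plus the signal number", so 139 corresponds to signal 11 (SIGSEGV). The following is a small sketch for wrapping any command; `describe_exit` and `run_and_report` are hypothetical helper names, not part of nanophase or MetaBAT 2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for interpreting a command's exit status.

# Turn a numeric exit status into a short human-readable description.
describe_exit() {
    local status="$1" sig
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Statuses above 128 encode "terminated by signal (status - 128)".
        sig=$((status - 128))
        echo "killed by signal $sig ($(kill -l "$sig" 2>/dev/null || echo unknown))"
    else
        echo "exited with status $status"
    fi
}

# Run a command with its arguments, then describe how it ended.
run_and_report() {
    local status=0
    "$@" || status=$?
    describe_exit "$status"
}

# Example, using the command suggested earlier in this thread:
#   run_and_report metabat2 -t 16 \
#     -i ont-nanophase-out/01-LongAssemblies/assembly.fasta \
#     -o ont-nanophase-out/02-LongBins/INITIAL_BINNING/metabat2/metabat2-bins/bin \
#     -a ont-nanophase-out/02-LongBins/INITIAL_BINNING/metabat2/metabat2_abun.txt --cvExt
```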

@Hydro3639
Owner

That is strange; I can't reproduce this error with the example dataset. I would suggest removing the whole nanophase 0.2.2 package and reinstalling it to see whether that resolves the problem.

@aljazdzy

I am having this exact same issue, with the exact same results as in this thread. @winterlich, did you ever solve the problem?

@winterlich
Author

winterlich commented Jun 12, 2023

Okay, that is interesting. I reinstalled the package as suggested, but this did not resolve the problem. I haven't been able to dive deeper into this so far, but I would be happy for any suggestions.

@aljazdzy

Hmm, what is the general size of your reads? Mine are admittedly kind of small for nanopore, and it's possible that flye is filtering out so many of them that metabat2 does not have enough information to work with.
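
A quick way to test this idea is to count how many assembled contigs are long enough for binning at all. The MetaBAT 2 log line quoted at the top of this thread reports minContig 2500, so the sketch below uses 2500 bp as its default cutoff. `count_long_contigs` is a hypothetical helper name, and the assembly path in the example comment is the one from the command suggested earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of nanophase or MetaBAT 2.

# Print the number of FASTA records whose sequence length is at least
# min_len (default 2500). Handles sequences wrapped over several lines.
count_long_contigs() {
    local fasta="$1" min_len="${2:-2500}"
    awk -v min="$min_len" '
        /^>/ {
            # A new header closes the previous record, if there was one.
            if (seen && len >= min) n++
            len = 0
            seen = 1
            next
        }
        { len += length($0) }
        END {
            if (seen && len >= min) n++
            print n + 0
        }
    ' "$fasta"
}

# Example:
#   count_long_contigs ont-nanophase-out/01-LongAssemblies/assembly.fasta
```

A very small count would be consistent with the binner having little to work with, although it would not by itself explain a segmentation fault.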

@winterlich
Author

That's a good idea; my read sets are also rather small. I will try another, larger dataset in the next few days and report back.

@Hydro3639
Owner

Thank you both for your contributions!

If only a small long-read dataset is provided, genome binning is pretty challenging. If you want to try nanophase with a larger long-read dataset, we sequenced a mock community with nanopore sequencing (you can find more details about the mock community in our paper) and uploaded the data to NCBI. The dataset can be downloaded with the following command (you may need to install sra-tools):

fastq-dump SRR17913199

Please don't hesitate to let me know if I can help.

Best

@aljazdzy

So I ran it using the provided practice data set from your setup page and this was my result:

All required packages have been found in the environment. If the above certain packages integrated into nanophase were used in your investigation, please give them credit as well :)
[2023-06-14 12:08:04] TASK: Long-read assembly starts (be patient)
[2023-06-14 12:16:40] DONE: long-read assembly finished sucessfully: detailed log file is miniconda3envsNanophasedir/01-LongAssemblies/flye.log
[2023-06-14 12:16:40] TASK: Initial binning::metabat2 binning starts
/root/miniconda3/envs/nanophase-v0.2.2/bin/nanophase.meta: line 245: 3997 Segmentation fault metabat2 -t $N_threads -i $OutDIR/01-LongAssemblies/assembly.fasta -o $OutDIR/02-LongBins/INITIAL_BINNING/metabat2/metabat2-bins/bin -a $OutDIR/02-LongBins/INITIAL_BINNING/metabat2/metabat2_abun.txt --cvExt > $OutDIR/02-LongBins/INITIAL_BINNING/metabat2/bin.log
[2023-06-14 12:16:40] ERROR: Something wrong with metabat2 binning, please also check miniconda3envsNanophasedir/02-LongBins/INITIAL_BINNING/metabat2/bin.log, terminating...

So I would guess that the issue goes beyond just the data we are providing, though what it is I am not sure yet. I'm not running this on the world's most powerful computer either; is it possible I'm hitting a CPU bottleneck? I'm running it on a laptop with an i7-1360P (12 cores, 5 GHz) and 32 GB of RAM. The RAM is definitely not the bottleneck, but I'm noticing my CPU hitting 100% utilization during this run.

@Hydro3639
Owner

Hydro3639 commented Jun 14, 2023

Did you mean the lr.fa.gz in the Example dataset?

@aljazdzy

Yes! Is there a better one I should run?

@Hydro3639
Owner

I am still unsure what happened. I would expect the command to exit at the SemiBin stage rather than at metabat2 if you use the provided lr.fa.gz. If you want to try v0.2.3, you can download the long-read dataset SRR17913199, as I mentioned earlier. Is it possible for you to run it on a Linux workstation?

@aljazdzy

I actually did run this on Ubuntu on a Windows subsystem; I don't have a workstation, though. I originally did this on v0.2.3 and got the same output with my data, but I didn't try it with the practice data. I can try it with the specific long-read dataset as well.

@aljazdzy

aljazdzy commented Jun 21, 2023

OK, so I ran the specified dataset on v0.2.3 and this was my output:
(nanophase) root@Andrew:~/miniconda3/envs/nanophase# nanophase meta -l SRR17913199.fastq -t 16 -o Practice
[2023-06-21 13:26:03] INFO: nanophase (meta) starts
[2023-06-21 13:26:03] INFO: Command line: /root/miniconda3/envs/nanophase/bin/nanophase meta -l SRR17913199.fastq -t 16 -o Practice
[2023-06-21 13:26:03] INFO: long_read_only model was selected, only Nanopore long reads will be used
[2023-06-21 13:26:03] CHECK: Nanopore long-read (fastq) file has been found
[2023-06-21 13:26:03] CHECK: Check software availability and locations
[2023-06-21 13:26:03] INFO: The following packages have been found
#package location
nanophase /root/miniconda3/envs/nanophase/bin/nanophase
flye /root/miniconda3/envs/nanophase/bin/flye
metabat2 /root/miniconda3/envs/nanophase/bin/metabat2
maxbin2 /root/miniconda3/envs/nanophase/bin/run_MaxBin.pl
SemiBin /root/miniconda3/envs/nanophase/bin/SemiBin
metawrap /root/miniconda3/envs/nanophase/bin/metawrap
checkm /root/miniconda3/envs/nanophase/bin/checkm
racon /root/miniconda3/envs/nanophase/bin/racon
medaka /root/miniconda3/envs/nanophase/bin/medaka
polypolish /root/miniconda3/envs/nanophase/bin/polypolish
POLCA /root/miniconda3/envs/nanophase/bin/polca.sh
bwa /root/miniconda3/envs/nanophase/bin/bwa
seqtk /root/miniconda3/envs/nanophase/bin/seqtk
minimap2 /root/miniconda3/envs/nanophase/bin/minimap2
BBMap /root/miniconda3/envs/nanophase/bin/BBMap
parallel /root/miniconda3/envs/nanophase/bin/parallel
perl /root/miniconda3/envs/nanophase/bin/perl
samtools /root/miniconda3/envs/nanophase/bin/samtools
gtdbtk /root/miniconda3/envs/nanophase/bin/gtdbtk
fastANI /root/miniconda3/envs/nanophase/bin/fastANI
All required packages have been found in the environment. If the above certain packages integrated into nanophase were used in your investigation, please give them credit as well :)
[2023-06-21 13:26:03] TASK: Long-read assembly starts (be patient)
[2023-06-21 13:28:49] ERROR: Something wrong with long-read (metaflye) assembly, please also check Practice/01-LongAssemblies/tmp/flye.log.debug for more information, terminating...

This is different from our previous outputs. I looked into flye.log.debug and it looks like my system just ran out of memory (oops), so I didn't get much data out of that attempt. Mayhaps I shall try again. The bin log shows contigs being created in my previous attempts with my own data; I think flye might just be set to too high an overlap.
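
Since the assembly step ran out of memory here, it may help to check how much memory Linux actually reports as available before starting a run; under a Windows subsystem this can be less than the RAM installed in the machine. The sketch below reads the `MemAvailable` field from `/proc/meminfo` (reported in kB). The helper names are hypothetical, and the 16 GiB threshold in the example comment is an arbitrary placeholder, not a documented requirement of flye or nanophase.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of nanophase or flye.

# Print available memory in GiB (one decimal), read from a meminfo-style
# file. Defaults to /proc/meminfo, where MemAvailable is given in kB.
mem_available_gib() {
    local meminfo="${1:-/proc/meminfo}"
    LC_ALL=C awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "$meminfo"
}

# Succeed if at least need_gib GiB are reported as available.
has_memory() {
    local need_gib="$1" meminfo="${2:-/proc/meminfo}"
    LC_ALL=C awk -v need="$need_gib" '
        /^MemAvailable:/ { ok = ($2 / 1048576 >= need) }
        END { exit (ok ? 0 : 1) }
    ' "$meminfo"
}

# Example (16 is an arbitrary placeholder threshold):
#   has_memory 16 || echo "only $(mem_available_gib) GiB available"
```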
