N50 etc. are not calculated #34

Closed
HirokiK0 opened this issue Dec 15, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@HirokiK0

Thank you for developing such a useful tool.
I am having trouble with the following error, which occurs only for specific bins.
Because of the terms of our data agreement, I cannot share the bins. However, the error occurs with multiple bin files, so I do not think it is caused by a particular binner.

Also, running these bin files through checkm2 directly works normally and returns all the scores, so the problem is unlikely to be due to missing files.

Based on this, I suspect that it is a bug in binette that only occurs under certain conditions.
If you have any solutions or things to check, please let me know.
Thank you.

io_manager.py", line 139, in write_bin_info
for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True):
TypeError: '>' not supported between instances of 'float' and 'NoneType'
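
For reference, here is a minimal sketch of how a sort like the one in the traceback can fail. The `Bin` class is a hypothetical stand-in, not Binette's actual class; only the attribute names `score`, `N50`, and `id` are taken from the traceback. It assumes that some bins reach the sort without ever having been scored, so their values are still `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bin:
    # Hypothetical stand-in for a bin object; not Binette's real class.
    id: int
    score: Optional[float] = None
    N50: Optional[int] = None


def sort_key(b):
    # Same key shape as the lambda shown in the traceback.
    return (b.score, b.N50, -b.id)


# Bin 2 was never scored, so its score and N50 are still None.
bins = [Bin(1, 87.5, 12000), Bin(2), Bin(3, 42.0, 8000)]

try:
    sorted(bins, key=sort_key, reverse=True)
    error = None
except TypeError as exc:
    # Tuples are compared element by element, so None meets a float.
    error = str(exc)

# Listing the bins with missing values points at the ones breaking the sort.
unscored_ids = [b.id for b in bins if b.score is None or b.N50 is None]
print(error)
print(unscored_ids)
```

Checking which bins have missing values before sorting, as in the last lines, is one way to narrow down where the scores were lost.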

@JeanMainguy
Member

Hi,

Thank you for bringing this to our attention. This does seem like it could be a bug, but I'm having trouble identifying the root cause at the moment.

Could you please share the full error message along with the exact command you ran? If possible, it would also be helpful if you could run Binette with the --debug flag and share the output.

Thank you!

@JeanMainguy added the bug label Dec 16, 2024
@HirokiK0
Author

Thank you for your quick response.
I apologise for the lack of information.

The following is the result output in debug mode.
The results for a specific bin in input_bins_quality_reports are not being output. Presumably, an error occurs in this part and the process stops.
To reiterate, no specific binner always causes the error. Rather, the bin that triggers it differs from sample to sample (and some samples run normally).

We suspected that N50, completeness, and contamination were failing to be calculated for specific bins, so we modified the source code so that it could sort even with missing values. With that change, we found that only two of the bins were missing these scores.
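
A workaround of the kind described above might look like the following sketch. It is not the change the reporter made and not Binette's code; the `Bin` class and `none_safe_key` are hypothetical names used for illustration. The idea is to pair each value with a "has a value" flag so that `None` is never compared against a number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bin:
    # Hypothetical stand-in for a bin object; not Binette's real class.
    id: int
    score: Optional[float] = None
    N50: Optional[int] = None


def none_safe_key(b):
    # Each value becomes (is_present, value_or_placeholder).
    # With reverse=True, bins that have a value (flag True) sort first.
    score = (b.score is not None, b.score if b.score is not None else 0.0)
    n50 = (b.N50 is not None, b.N50 if b.N50 is not None else 0)
    return (score, n50, -b.id)


bins = [Bin(1, 87.5, 12000), Bin(2), Bin(3, 42.0, 8000), Bin(4, 87.5, 15000)]
ordered = sorted(bins, key=none_safe_key, reverse=True)
ordered_ids = [b.id for b in ordered]
print(ordered_ids)
```

Note that a key like this only masks the symptom: the unscored bins still end up in the report with empty values, which is how the two affected bins became visible.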

Also, strangely enough, no error occurs when only the problematic bins are given as input (in this analysis, we used 6 bins, and we also tried some combinations of 2 bins for debugging, but the same error occurred in all cases).

I am sorry that I cannot provide you with actual data, but I will provide you with as much information as possible.

0%| | 0/31643 [00:00<?, ?it/s]
0%| | 1/31643 [00:00<4:24:43, 1.99it/s]
0%| | 157/31643 [00:00<02:15, 231.62it/s]
13%|█▎ | 4067/31643 [00:00<00:03, 7007.19it/s]
20%|██ | 6373/31643 [00:01<00:02, 10206.57it/s]
26%|██▋ | 8372/31643 [00:01<00:03, 7267.78it/s]
37%|███▋ | 11592/31643 [00:01<00:01, 11177.27it/s]
44%|████▍ | 13902/31643 [00:01<00:01, 9641.09it/s]
53%|█████▎ | 16732/31643 [00:01<00:01, 12602.68it/s]
59%|█████▉ | 18741/31643 [00:02<00:00, 13587.21it/s]
65%|██████▌ | 20664/31643 [00:02<00:00, 13754.44it/s]
71%|███████ | 22433/31643 [00:02<00:01, 6815.77it/s]
76%|███████▌ | 23933/31643 [00:02<00:00, 7825.45it/s]
82%|████████▏ | 25851/31643 [00:03<00:00, 9397.57it/s]
89%|████████▉ | 28141/31643 [00:03<00:00, 11692.10it/s]
96%|█████████▌| 30251/31643 [00:03<00:00, 13567.06it/s]
100%|██████████| 31643/31643 [00:03<00:00, 9391.07it/s]

0%| | 0/31643 [00:00<?, ?contig/s]
59%|█████▉ | 18812/31643 [00:00<00:00, 188086.55contig/s]
100%|██████████| 31643/31643 [00:00<00:00, 193459.20contig/s]

0%| | 0/31643 [00:00<?, ?contig/s]
84%|████████▍ | 26733/31643 [00:00<00:00, 267299.03contig/s]
100%|██████████| 31643/31643 [00:00<00:00, 244348.56contig/s]
Traceback (most recent call last):
File "path/to/binette", line 10, in <module>
sys.exit(main())
File "path/to/main.py", line 555, in main
io.write_original_bin_metrics(bin_set_name_to_bins, original_bin_report_dir)
File "path/to/io_manager.py", line 263, in write_original_bin_metrics
write_bin_info(bins, bins_metric_file)
File "path/to/io_manager.py", line 139, in write_bin_info
for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True):
TypeError: '>' not supported between instances of 'float' and 'NoneType'

@JeanMainguy
Member

Hi,

Thanks for providing the additional information!

How many threads are you allocating to Binette? The scoring step uses multiprocessing, so something unexpected might be happening. Could you try rerunning the analysis with just one thread (--threads 1) to see if the issue persists? To save time, you might want to use the --resume flag. This allows Binette to reuse the existing intermediate pyrodigal and DIAMOND results found in the output directory.

When you mention:

in this analysis, we used 6 bins

Are you referring to 6 individual bins, or 6 bin sets?

Also, it looks like the log you shared was generated without the --debug flag. With --debug enabled, you’ll get more detailed logging that might help pinpoint what’s going on.

@HirokiK0
Author

Unfortunately, the same error occurs even with --threads 1.

Also, the log I sent earlier was not generated in debug mode. I apologise for this.
The following is the debug output when running with one thread.
By "6 bins" I meant six bin sets: we are inputting six directories, each output by a different binner.
According to the log, the error occurred while processing the bins output by semibin2.
However, as I mentioned in my first post, checkm2 works fine when run on its own, so I think it is unlikely that the problem is caused by a file error.

[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'concoct' to file: path/to/input_bins_2.concoct.tsv
[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'maxbin2' to file: path/to/input_bins_3.maxbin2.tsv
[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'metabat2' to file: path/to/input_bins_4.metabat2.tsv
[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'semibin2' to file: path/to/input_bins_5.semibin2.tsv
Traceback (most recent call last):
File "path/to/binette", line 10, in <module>
sys.exit(main())
File "path/to/main.py", line 555, in main
io.write_original_bin_metrics(bin_set_name_to_bins, original_bin_report_dir)
File "path/to/io_manager.py", line 263, in write_original_bin_metrics
write_bin_info(bins, bins_metric_file)
File "path/to/io_manager.py", line 139, in write_bin_info
for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True):
TypeError: '>' not supported between instances of 'float' and 'NoneType'

@JeanMainguy
Member

I was able to reproduce the error on my side, so I will be able to dig into it more easily.
At the moment, I think it was introduced by some changes made in version 1.0.2.
Binette 1.0.1 seems to work just fine.

@HirokiK0
Author

It is great that so much progress has been made towards resolving the problem.
Thank you very much!!

I have only tried versions 1.0.3 and 1.0.4, so using version 1.0.1, which you found to work, is likely to solve it!

I look forward to doing my analysis with this great tool, which can process multiple bin sets at such high speed!

@JeanMainguy
Member

JeanMainguy commented Dec 18, 2024

The problem occurred because two input bins from different sets shared the same contig content, which is not unexpected. While the bins were correctly dereplicated during scoring, Binette used the non-dereplicated bins when writing the input bins per set. These included unscored bins, which caused the error.
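
The mechanism described above can be illustrated with a rough sketch. This is neither Binette's implementation nor the actual fix in PR #36; `Bin`, `dereplicate`, `score_bins`, and the placeholder scoring are all hypothetical, and the bin set names are only borrowed from the log for readability.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Bin:
    # Hypothetical stand-in for a bin object; not Binette's real class.
    id: int
    origin: str
    contigs: FrozenSet[str]
    score: Optional[float] = None


def dereplicate(bin_sets: Dict[str, List[Bin]]) -> Dict[FrozenSet[str], Bin]:
    # Keep one representative object per distinct contig content.
    unique: Dict[FrozenSet[str], Bin] = {}
    for bins in bin_sets.values():
        for b in bins:
            unique.setdefault(b.contigs, b)
    return unique


def score_bins(unique: Dict[FrozenSet[str], Bin]) -> None:
    # Placeholder scoring (contig count); only the representatives are scored.
    for b in unique.values():
        b.score = float(len(b.contigs))


bin_sets = {
    "metabat2": [Bin(1, "metabat2", frozenset({"c1", "c2"}))],
    "semibin2": [
        Bin(2, "semibin2", frozenset({"c1", "c2"})),  # same content as bin 1
        Bin(3, "semibin2", frozenset({"c3"})),
    ],
}

unique = dereplicate(bin_sets)
score_bins(unique)

# Problem: the per-set lists still hold the duplicate object, never scored.
unscored = [b.id for b in bin_sets["semibin2"] if b.score is None]

# One possible remedy: resolve each per-set bin to its scored representative.
resolved = {
    name: [unique[b.contigs] for b in bins] for name, bins in bin_sets.items()
}
all_scored = all(b.score is not None for bins in resolved.values() for b in bins)
print(unscored, all_scored)
```

In this sketch, the duplicate in the second set stays unscored because only the first object with a given contig content is kept as the representative. This would also be consistent with the error not appearing when a single bin set is given as input, since no cross-set duplicates exist in that case.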

I made a fix in PR #36 and will release a new version asap.

Thanks again for reporting this bug!

@HirokiK0
Author

Thank you for your help!

When the revised version is released, I will test it with my data too!

@HirokiK0
Author

Thank you for updating the software.
I have confirmed that it works without errors.

Thank you for fixing it so quickly this time!

2 participants