N50 etc. are not calculated #34

HirokiK0 · 2024-12-15T23:17:43Z

Thank you for developing such a useful tool.
I am having trouble with the following error, which only occurs with specific bins.
Due to the terms of our agreement, we cannot send the bins, but the error occurs with multiple bin files, so I do not think it is caused by the binner.

Also, checkm2 runs normally on these bin files and returns all the scores, so it is also unlikely that the problem is due to missing files.

Based on this, I suspect that it is a bug in binette that only occurs under certain conditions.
If you have any solutions or things to check, please let me know.
Thank you.

io_manager.py", line 139, in write_bin_info
for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True):
TypeError: '>' not supported between instances of 'float' and 'NoneType'

JeanMainguy · 2024-12-16T14:18:13Z

Hi,

Thank you for bringing this to our attention. This does seem like it could be a bug, but I'm having trouble identifying the root cause at the moment.

Could you please share the full error message along with the exact command you ran? If possible, it would also be helpful if you could run Binette with the --debug flag and share the output.

Thank you!

HirokiK0 · 2024-12-17T07:25:54Z

Thank you for your quick response.
I apologise for the lack of information.

The following is the result output in debug mode.
The results for one bin set are missing from input_bins_quality_reports. Presumably an error occurs at this step and the process stops.
To repeat, it is not the case that one particular binner always causes the error; the bin set that triggers it differs from sample to sample (and some samples run normally).

We suspected that N50, completeness and contamination were failing to be calculated for specific bins, so we modified the source code so that it could sort even with missing values. With that change, we found that only two of the bins had no values for these scores.
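For readers following along, here is a minimal sketch of the kind of change described above. It is not the reporter's actual patch and not Binette's code: the `Bin` class and the `none_safe_key` helper are hypothetical, and only the sort key `(x.score, x.N50, -x.id)` is taken from the traceback. It shows why a bin with missing metrics breaks the sort and one way to make the sort tolerate `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bin:
    # Hypothetical stand-in for Binette's bin object; only the attributes
    # used by the sort key in the traceback are modelled here.
    id: int
    score: Optional[float] = None
    N50: Optional[int] = None


def none_safe_key(b: Bin):
    # Each metric becomes a (has_value, value) pair so that None is never
    # compared with a number; with reverse=True, unscored bins sort last.
    return (
        (b.score is not None, b.score if b.score is not None else 0.0),
        (b.N50 is not None, b.N50 if b.N50 is not None else 0),
        -b.id,
    )


# Bin 2 has no score and no N50, like the unscored bins in this report.
bins = [Bin(1, 85.0, 12000), Bin(2), Bin(3, 92.5, 30000)]

try:
    # Same key as in the traceback: tuples containing None cannot be ordered.
    sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True)
    raised = False
except TypeError:
    raised = True

ordered_ids = [b.id for b in sorted(bins, key=none_safe_key, reverse=True)]
print(raised, ordered_ids)
```

A key like this only hides the symptom: it lets the report be written and makes the unscored bins visible at the bottom, which is how missing values can be spotted, but it does not explain why they are missing.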

Also, strangely enough, no error occurs if only the bin set that causes the problem is given as input (in this analysis we used 6 bins, and we also tried several combinations of 2 bins for debugging, but the same error occurred in every case).

I am sorry that I cannot provide you with actual data, but I will provide you with as much information as possible.

0%| | 0/31643 [00:00<?, ?it/s]
0%| | 1/31643 [00:00<4:24:43, 1.99it/s]
0%| | 157/31643 [00:00<02:15, 231.62it/s]
13%|█▎ | 4067/31643 [00:00<00:03, 7007.19it/s]
20%|██ | 6373/31643 [00:01<00:02, 10206.57it/s]
26%|██▋ | 8372/31643 [00:01<00:03, 7267.78it/s]
37%|███▋ | 11592/31643 [00:01<00:01, 11177.27it/s]
44%|████▍ | 13902/31643 [00:01<00:01, 9641.09it/s]
53%|█████▎ | 16732/31643 [00:01<00:01, 12602.68it/s]
59%|█████▉ | 18741/31643 [00:02<00:00, 13587.21it/s]
65%|██████▌ | 20664/31643 [00:02<00:00, 13754.44it/s]
71%|███████ | 22433/31643 [00:02<00:01, 6815.77it/s]
76%|███████▌ | 23933/31643 [00:02<00:00, 7825.45it/s]
82%|████████▏ | 25851/31643 [00:03<00:00, 9397.57it/s]
89%|████████▉ | 28141/31643 [00:03<00:00, 11692.10it/s]
96%|█████████▌| 30251/31643 [00:03<00:00, 13567.06it/s]
100%|██████████| 31643/31643 [00:03<00:00, 9391.07it/s]

0%| | 0/31643 [00:00<?, ?contig/s]
59%|█████▉ | 18812/31643 [00:00<00:00, 188086.55contig/s]
100%|██████████| 31643/31643 [00:00<00:00, 193459.20contig/s]

0%| | 0/31643 [00:00<?, ?contig/s]
84%|████████▍ | 26733/31643 [00:00<00:00, 267299.03contig/s]
100%|██████████| 31643/31643 [00:00<00:00, 244348.56contig/s]
Traceback (most recent call last):
File "path/to/binette", line 10, in <module>
sys.exit(main())
File "path/to/main.py", line 555, in main
io.write_original_bin_metrics(bin_set_name_to_bins, original_bin_report_dir)
File "path/to/io_manager.py", line 263, in write_original_bin_metrics
write_bin_info(bins, bins_metric_file)
File "path/to/io_manager.py", line 139, in write_bin_info
for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True):
TypeError: '>' not supported between instances of 'float' and 'NoneType'

JeanMainguy · 2024-12-17T10:30:32Z

Hi,

Thanks for providing the additional information!

How many threads are you allocating to Binette? The scoring step uses multiprocessing, so something unexpected might be happening. Could you try rerunning the analysis with just one thread (--threads 1) to see if the issue persists? To save time, you might want to use the --resume flag. This allows Binette to reuse the existing intermediate pyrodigal and DIAMOND results found in the output directory.

When you mention:

"in this analysis, we used 6 bins"

Are you referring to 6 individual bins, or 6 bin sets?

Also, it looks like the log you shared was generated without the --debug flag. With --debug enabled, you’ll get more detailed logging that might help pinpoint what’s going on.

HirokiK0 · 2024-12-17T13:10:31Z

Unfortunately, the same error occurs even when threads is set to 1.

Also, the log I sent you was not generated in debug mode. I apologise for this.
The following is the output when running with one thread.
By "six bins" I mean that we are inputting six directories, each produced by a different binner.
According to the log, the error occurred while writing the metrics for the bins output by semibin2.
However, as I mentioned in my first post, checkm2 works fine when used on its own, so I think it is unlikely that the problem is caused by a file error.

[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'concoct' to file: path/to/input_bins_2.concoct.tsv
[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'maxbin2' to file: path/to/input_bins_3.maxbin2.tsv
[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'metabat2' to file: path/to/input_bins_4.metabat2.tsv
[2024-12-17 22:11:56] DEBUG - Writing metrics for bin set 'semibin2' to file: path/to/input_bins_5.semibin2.tsv
Traceback (most recent call last):
File "path/to/binette", line 10, in <module>
sys.exit(main())
File "path/to/main.py", line 555, in main
io.write_original_bin_metrics(bin_set_name_to_bins, original_bin_report_dir)
File "path/to/io_manager.py", line 263, in write_original_bin_metrics
write_bin_info(bins, bins_metric_file)
File "path/to/io_manager.py", line 139, in write_bin_info
for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True):
TypeError: '>' not supported between instances of 'float' and 'NoneType'

JeanMainguy · 2024-12-18T11:13:27Z

I was able to reproduce the error on my side, so I will be able to dig into it more easily.
At the moment I think it was introduced by some changes made in version 1.0.2.
Binette 1.0.1 seems to work just fine.

HirokiK0 · 2024-12-18T13:33:37Z

It is very good to see such great progress towards resolving the problem.
Thank you very much!!

The versions I have tried are 1.0.3 and 1.0.4, so the solution you presented is likely to work!

I look forward to doing the analysis with this great tool that allows multiple bins to be entered at this high speed!

JeanMainguy · 2024-12-18T16:12:19Z

The problem occurred because two input bins from different sets shared the same contig content, which is not unexpected. While the bins were correctly dereplicated during scoring, Binette used the non-dereplicated bins when writing the input bins per set. These included unscored bins, which caused the error.
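The mechanism described above can be illustrated with a small sketch. This is not the code from PR #36 or from Binette itself: the `Bin` class and the functions `dereplicate` and `scored_bins_per_set` are hypothetical names, and the score is a placeholder. The sketch only assumes what the explanation states, namely that bins with identical contig content are collapsed to one representative before scoring.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Bin:
    # Hypothetical stand-in for a bin: its identity is its set of contigs.
    origin: str
    contigs: FrozenSet[str]
    score: Optional[float] = None


def dereplicate(bin_sets: Dict[str, List[Bin]]) -> Dict[FrozenSet[str], Bin]:
    # Keep one representative object per distinct contig set.
    unique: Dict[FrozenSet[str], Bin] = {}
    for bins in bin_sets.values():
        for b in bins:
            unique.setdefault(b.contigs, b)
    return unique


def scored_bins_per_set(
    bin_sets: Dict[str, List[Bin]], unique: Dict[FrozenSet[str], Bin]
) -> Dict[str, List[Bin]]:
    # Replace every input bin with its scored representative, so that
    # duplicates no longer reach the report writer without a score.
    return {
        name: [unique[b.contigs] for b in bins]
        for name, bins in bin_sets.items()
    }


bin_sets = {
    "metabat2": [Bin("metabat2", frozenset({"c1", "c2"}))],
    "semibin2": [
        Bin("semibin2", frozenset({"c1", "c2"})),  # same contigs as the metabat2 bin
        Bin("semibin2", frozenset({"c3"})),
    ],
}

unique = dereplicate(bin_sets)
for rep in unique.values():
    rep.score = float(len(rep.contigs))  # placeholder score, not a real quality metric

# Only the representatives were scored; the duplicate object is still unscored.
unscored_before = [b.origin for bins in bin_sets.values() for b in bins if b.score is None]

resolved = scored_bins_per_set(bin_sets, unique)
unscored_after = [b for bins in resolved.values() for b in bins if b.score is None]
print(unscored_before, len(unscored_after))
```

In this toy setup the duplicate sits in the second bin set, so writing that set from the original objects would hit a `None` score, while writing from the resolved mapping would not. It also suggests why supplying a single bin set alone raised no error: with no other set to share contig content with, no bin is dropped as a duplicate.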

I made a fix in PR #36 and will release a new version asap.

Thanks again for reporting this bug!

HirokiK0 · 2024-12-18T23:57:14Z

Thank you for your help!

When the revised version is released, I will test it with my data too!

HirokiK0 · 2024-12-20T04:16:50Z

Thank you for updating the software.
I have confirmed that it works without errors.

Thank you for fixing it so quickly this time!

JeanMainguy added the "bug" label Dec 16, 2024

JeanMainguy mentioned this issue Dec 18, 2024

Fix missing score in input bins #36

Merged

HirokiK0 closed this as completed Dec 20, 2024

JeanMainguy mentioned this issue Jan 4, 2025

for bin_obj in sorted(bins, key=lambda x: (x.score, x.N50, -x.id), reverse=True): TypeError: '>' not supported between instances of 'float' and 'NoneType' #39

Closed
