Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disagreement between Intervene and BEDtools #62

Open
lovelycatZ opened this issue May 20, 2024 · 4 comments
Open

Disagreement between Intervene and BEDtools #62

lovelycatZ opened this issue May 20, 2024 · 4 comments

Comments

@lovelycatZ
Copy link

Hello,

Thank you for providing such a convenient tool!

I have installed Intervene via conda and tested with my own dataset hap1.bed and hap2.bed to generate a Venn diagram using command line intervene venn -i hap*.bed --names hap1,hap2 --save-overlaps. Finally, in the Intervene_result folder, there were three files 1)01_hap2.bed; 2)10_hap1.bed; 3)11_hap1_hap2.bed. For the third file 11_hap1_hap2.bed, it contained 5206 intervals same with the intersection part of the Venn diagram that were found in both A and B, right? But it was strange that when I use BEDtools with command line bedtools intersect -a hap1.bed -b hap2.bed, there were 5773 records in the result file.

How to explain the the difference?

@Sarsaparella
Copy link

I want to add that there's also difference in results depending on what bed file you put first.
For example:
With intervene venn -i Aaa.bed Bbb.bed I get 1930 intersecting intervals
With intervene venn -i Bbb.bed Aaa.bed I get 1923 intersecting intervals
And bedtools intersect -a Aaa.bed -b Bbb.bed | wc -l gives me 1947 intervals

@lovelycatZ
Copy link
Author

Is there anybody can answer this question?

@asntech
Copy link
Owner

asntech commented May 30, 2024

Hi @lovelycatZ @Sarsaparella thanks for your interest and posting this.

This is a known issue when doing genomic interval intersections. This is because, for genomic regions, you will not always have a one-to-one overlap. That's why you slight difference when changing the file order: -i Aaa.bed Bbb.bed != -i Aaa.bed Bbb.bed. For example, the first combination prints how many intervals in A overlap with those in B.

For example, if you've three BED files (a, b and c) and when you run the pybedtools:

(a + b + c) != (b + c + a)

This is also explained well enough here by the pybedtools developer as posted by @amizeranschi #27 daler/pybedtools#45 (comment)

Also, Intervene is using pybedtools in the background with u=True and v=True. You can run it with --bedtools-options u=False

I hope this helps.

@amizeranschi
Copy link

Hi @asntech

Please consider implementing bedtools multiinter as an alternative to bedtools intersect, which is not very intuitive to interpret and it leads to people coming here with questions like the one above.

Implementing bedtools multiinter would make the intersections commutative, which means that the order of the sets would not matter, so (a + b + c) == (b + c + a). This would also make the Venn and UpSet diagrams more intuitive to interpret.

You would just need to use this function from pybedtools:
https://daler.github.io/pybedtools/autodocs/pybedtools.bedtool.BedTool.multi_intersect.html

which is a wrapper for bedtools multiinter:
https://bedtools.readthedocs.io/en/latest/content/tools/multiinter.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants