-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
intersect result is wrong compare to bedtools intersect #27
Comments
I want to get the intersection region of A, B, C. Orange track from your tools. can you explain what happened? |
When you intersect with intervene the order of your files does matter and you will end up having slightly different results with a different order. Also, make sure that for Intervene we are using pybedtoost with |
Hi! I'm having the same issue and i would like to know how to add the option u=True or v=True in my command line. |
Hi @asntech Regarding your earlier comment that Why is this so? Set intersection should be a commutative operation, even when evaluating the overlaps between sets of genomic regions. Based on this older remark (daler/pybedtools#45 (comment)), it would make sense to run The way I see it, genomic regions are also sets themselves, with the unit element being the base pair. Thus, intersecting (sets of) genomic regions should accurately output the base-pairs common to those regions, even if the resulting number of overlapping elements won't match with the numbers of regions from the original BED files. |
Hi!! I find this tool fairly helpful. I tested intervene on my files and it works fine for me. I actually manually calculated the overlapping fraction of regions between two files and intervene calculated the correct percentage. When finding intersecting regions what important is to know which file will act as your reference. Above the diagonal of the matrix file that intervene produce are the scenarios where the files representing the column names are taken as reference (i.e. -a $column_header_files). Below diagonals are the scenarios where the row names represent the files taken as a reference (-a). It is fair enough to calculate what fraction of regions of the reference file overlaps with the other bed file. Using -u option will prevent the overcounting of features present in the reference files that overlap with other bed files. For two files at a time intervene performs two intersections. As indicated by (daler/pybedtools#45 (comment)) If the amount of overlap is your concern you can use 50% of overlap as threshold and -r option of bedtools intersect which does not count the overlaps discussed by (daler/pybedtools#45 (comment)) by using --bedtools-options f=0.50,r function |
Thanks for your thoughts. Keep in mind, I'm not interested in counting the overlapping regions as separate entities. Instead, I'm interested in getting the total length (in bp) of the overlapping regions (1 bp minimum overlap) from two or more BED files. From this point of view, the intersection operation should be commutative, i.e. the order of input BED files shouldn't matter (see the example below). I've also compared I would like to somehow get
However, I don't see a way of doing this with Out of curiosity, have you used It seems that |
Thanks @Rohit-Satyam for your interest and input. @amizeranschi For the You can also provide additional |
@asntech I was hoping to use |
Hi @asntech I have some more doubts regarding the fractions of overlap reported by intervene. I will try to talk this with an example: I had two-bed files adipose-tissue.csv and blood.csv and the following is the number of features (coordinates) in them
Gives me 354 overlapping features. If I had to calculate the fraction of features of adipose-tissue.csv overlapping with the blood.csv, I would calculate 354/978 which will give me 0.3619. However, Intervene is calculating 354/3326 which I think is quite misleading. Can you help me understand why Intervene does that? |
Thanks so much for the great tool of intervene! I have some doubts and wish can have your guidance! It seems the --bedtools-options r ("bedtools intersect -r ") is default from the venn plot result, which in the Venn plot may has less overlap, while 90% of bed1 file may has overlap with 30% of bed2, in the -r mode, it may only display 10% overlap in the Venn plot. So which is better to display in the Venn plot, with or without the -r mode? Thanks so much for your guidance! @asntech Appreciate! |
I compare R function intersect in GenomicRanges package(terms RI), linux bedtools intersect(LI) and your tools, find that RI is the same as LI, but some wrong in your tools.
I can not send you an example pic in here.
The text was updated successfully, but these errors were encountered: