rhomap results with extremely high 4Ner/kb #21

Open
masen1991 opened this issue May 25, 2022 · 9 comments

masen1991 commented May 25, 2022

Thank you for your software. I'm trying to use it to identify recombination hotspots in the MHC region. I used the SNP data for the chr6 MHC region from the YRI population in KGP. After phasing, I ran rhomap directly on the whole 5 Mb region.

The running parameters are as follows:

rhomap -lk new_lk.txt -burn 100000 -its 1100000 -samp 100 -seq ldhat.sites -loc ldhat.locs

The likelihood lookup table was the n=192, theta=0.001 per site file.
After the run, I want to identify the hotspot regions in the MHC region, and I have the following questions.
1. Can I plot the results directly from summary.txt, or do I need to run stat on rates.txt to get res.txt and plot that?
2. Do you have recommended parameters for rhomap? The LDhat 2.2 manual only gives recommended parameters for interval.
3. The weirdest thing is that my results are extremely high in 4Ner/kb. In some regions the estimate reaches 867 (4Ner/kb), and when I ran rhomap on another population's data I even saw 2000 (4Ner/kb). Are these values accurate? Also, I used the SNPs from the whole 5 Mb MHC region to estimate the recombination rate. Do I need to split it into smaller regions?

@auton1
Owner

auton1 commented May 26, 2022 via email

@masen1991
Author

Thank you very much for your reply; it is a great help to me.

@masen1991
Author

@auton1
I want to confirm the accuracy of the following results.
I plotted the YRI results (96 samples, so n=192) for interval and rhomap as follows:
Screen Shot 2022-06-05 at 3 43 13 PM
and rhomap:
Screen Shot 2022-06-05 at 3 43 53 PM
Both plots cover the MHC region.
1. Can both plots be used to identify recombination hotspot regions?
2. Without knowing Ne, if I only want to locate recombination hotspot regions, which one would you recommend?

@auton1
Owner

auton1 commented Jun 5, 2022 via email

@masen1991
Author

masen1991 commented Jun 6, 2022

@auton1
I used vcftools to generate ldhat.sites and ldhat.locs, then ran
lkgen -lk lk_n192_t0.001 -nseq 96
to get a new lookup table, new_lk.txt.
With rhomap:
rhomap -lk new_lk.txt -burn 100000 -its 1000000 -samp 100 -seq ldhat.sites -loc ldhat.locs
With interval:
interval -lk new_lk.txt -samp 2000 -its 1000000 -bpen 5 -seq ldhat.sites -loc ldhat.locs
stat -input rates.txt
And yes, the y-axes look different, so I want to know why the rhomap results show such extremely high 4Ner/kb values. Is that correct?

@ekg

ekg commented Jun 6, 2022

The recombination rate "hotspots" correspond to segmental duplication loci. These might be violating the model assumptions. On the other hand, it is highly possible based on what we're seeing in the HPRC that there is a lot of recombination-like activity in these loci.

This is the subgraph of the MHC in the HPRC pggb graph:

Screenshot from 2022-06-06 10-28-49

It approximately corresponds to your reference range.

From annotation using gfaestus, it's clear that the big bubble, which would correspond to the highest peak in your plot, corresponds to the MHC Class-II genes.

Screenshot from 2022-06-06 10-28-46

This is taken from three slides starting here https://docs.google.com/presentation/d/1qDSHpi1i2esmnIuiBA0g5EOGzvq8jUnkIw1yFFrp4hw/edit#slide=id.g12e89fff832_0_117

@masen1991
Author

masen1991 commented Jun 7, 2022

@ekg @auton1
Thank you very much for your explanation.
I know that recombination in the MHC Class-II genes may be very complicated to estimate. What I want to do is make sure that the results make sense. As you can see, there are also peaks in other regions, such as from 30000 kb upward and between 31 and 32 Mb, which may contain the MHC Class-I genes (HLA-A/C/B). Do those peaks also indicate recombination hotspots in those areas?

There are many papers that use LDhat to estimate recombination rates, such as https://www.nature.com/articles/ng1885 and https://doi.org/10.1016/j.jgg.2022.03.006.
Screen Shot 2022-06-07 at 10 19 14 AM

Their results may be very different from my rhomap plot. How can I find population-specific recombination hotspot regions? The result does not need to be expressed in cM/Mb; I just want to confirm where the recombination rate is higher.
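One way to look for regions of relatively higher rate without knowing Ne is to compare each interval against the regional background: if Ne is taken as constant across the region, the ratio of two 4Ner/kb values equals the ratio of the underlying r values. The sketch below is only a simple thresholding heuristic, not a method provided by LDhat. The function name, the `fold` cutoff, and the toy numbers are all hypothetical, and it assumes that `rates[i]` applies to the interval between `positions_kb[i]` and `positions_kb[i + 1]`.

```python
from statistics import median


def flag_high_rate_intervals(positions_kb, rates, fold=10.0):
    """Return (start_kb, end_kb, rate) for intervals whose rate exceeds
    `fold` times the median rate of the region.

    Assumption: rates[i] is the estimate (in 4Ner/kb) for the interval
    between positions_kb[i] and positions_kb[i + 1].
    """
    if len(positions_kb) != len(rates) + 1:
        raise ValueError("expected one rate per interval between adjacent positions")
    background = median(rates)       # regional background rate
    threshold = fold * background    # hypothetical cutoff, tune as needed
    return [
        (positions_kb[i], positions_kb[i + 1], rate)
        for i, rate in enumerate(rates)
        if rate > threshold
    ]


# Toy values for illustration only (not real LDhat output).
toy_positions_kb = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
toy_rates = [0.5, 0.4, 50.0, 0.6, 0.5]
print(flag_high_rate_intervals(toy_positions_kb, toy_rates))
```

Because the comparison is relative, the flagged intervals do not change if every rate is multiplied by the same constant, which is why no value of Ne is needed for this step.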

@masen1991
Author

@auton1
I have another question about converting rho to centimorgans, following what I saw in #18.
Suppose I use the likelihood lookup file for n=192, theta=0.001 per site, and assume an effective population size of 10000 (Ne = 10000).
To convert from 4Ner/kb to cM/Mb: if a region has a recombination rate of x (in units of 4Ner/kb), this would be 2.5x in units of cM/Mb.
Is that correct?
Also, do you have a recommended method for estimating Ne (how can I quickly obtain the Ne required for the conversion)?
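The arithmetic behind the 2.5x factor can be written out as a short sketch. The function name is hypothetical, and Ne = 10000 is only the assumed value from the question above: dividing by 4Ne gives r per kb (in Morgans), multiplying by 1000 gives Morgans per Mb, and multiplying by 100 gives cM per Mb.

```python
def rho_per_kb_to_cm_per_mb(rho_per_kb, ne):
    """Convert a population-scaled rate (4*Ne*r per kb) to cM/Mb.

    Assumption: `ne` is a known, constant effective population size.
    """
    r_per_kb = rho_per_kb / (4.0 * ne)   # per-generation rate per kb (Morgans/kb)
    morgans_per_mb = r_per_kb * 1000.0   # 1 Mb = 1000 kb
    return morgans_per_mb * 100.0        # 1 Morgan = 100 cM


# With the assumed Ne = 10000, the conversion factor works out to 2.5.
print(f"{rho_per_kb_to_cm_per_mb(1.0, 10_000):.3f}")
```

The factor scales inversely with Ne, so a different assumed Ne changes the cM/Mb values but not the relative shape of the map.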

@auton1
Owner

auton1 commented Oct 11, 2022 via email
