Standalone ancestry module #246
Comments
This is a nice idea, but one of the important stages of the ancestry adjustment is to calculate PGS for the reference panel and target cohort using the same set of variants (the intersection), so my feeling is that recalculating PGS is almost always worth doing. What do you think @smlmbrt 🧙?
Why do you need to calculate PGS for the reference panel? I understand why you need the intersection of reference/target variants for the PCA, but for PGS my understanding is that one could use all the target variants that intersect with the specific PGS variants, regardless of the reference (e.g. 1000G) panel?
Hi @iranmdl, what you're describing is what happens in the normal mode (without To get the ancestry-adjusted Z-scores or compare PGS from your target to a reference panel, the PGS (weighted sum of variants*weights) will need to be calculated on an identical set of variants. If the target PGS includes 20 high-effect, high-frequency variants that are not included in the reference panel PGS, it would bias the comparison (and the regression fit PGS ~ PCs). The intersections are cached ( If you want to just run the ancestry adjustment on your own data we have that implemented with the
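To make the point about identical variant sets concrete, here is a minimal sketch in Python with NumPy. It is not the pipeline's code: the function names, the dictionary-based inputs, and the simple mean-only adjustment (ordinary least squares of PGS on PCs in the reference panel, then standardised residuals) are all assumptions made for illustration.

```python
import numpy as np


def pgs_on_shared_variants(weights, target_dosages, reference_dosages):
    """Score target and reference on the same variants (hypothetical helper).

    weights: dict of variant ID -> effect weight
    target_dosages / reference_dosages: dict of variant ID -> per-sample dosage array
    """
    # Only variants present in the scoring file, the target AND the reference are used.
    shared = sorted(set(weights) & set(target_dosages) & set(reference_dosages))
    if not shared:
        raise ValueError("no variants shared between score, target and reference")
    w = np.array([weights[v] for v in shared])
    target_pgs = np.column_stack([target_dosages[v] for v in shared]) @ w
    reference_pgs = np.column_stack([reference_dosages[v] for v in shared]) @ w
    return shared, target_pgs, reference_pgs


def adjust_pgs(ref_pgs, ref_pcs, target_pgs, target_pcs):
    """Fit PGS ~ PCs on the reference, return Z-scored residuals for the target."""
    x_ref = np.column_stack([np.ones(len(ref_pgs)), ref_pcs])
    beta, *_ = np.linalg.lstsq(x_ref, ref_pgs, rcond=None)
    # Scale by the spread of the reference residuals.
    residual_sd = (ref_pgs - x_ref @ beta).std(ddof=1)
    x_target = np.column_stack([np.ones(len(target_pgs)), target_pcs])
    return (target_pgs - x_target @ beta) / residual_sd
```

If the target were instead scored on variants missing from the reference, the fitted line and the residual spread would describe a different score than the one being adjusted, which is the bias described above.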
Thank you for your response, @smlmbrt. To confirm my understanding based on your explanation, when adjusting a study individual's PGS for ancestry using the Wouldn't it be more efficient to consider the intersection of all genotyped variants of the individual with those in the reference panel, irrespective of whether they are involved in the PGS of interest? This way, principal components (PCs) can be calculated just once from the complete set of variants shared by the individual and the reference panel, and then used to adjust whichever PGS the user specifies, rather than being recomputed separately for each PGS.
Correct.
Yes, this is what the pipeline actually does - the PCA is calculated from an LD-thinned subset of variants that intersect between your target genotyped variants (1) and the reference panel (3). We are going to improve the caching to make sure it doesn't do this every run (#239). The caching that is currently implemented should already do this, but it's slightly unreliable.
The caching has been fixed by @nebfield in the next release.
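As a rough illustration of computing PCs once and reusing them for any score, the sketch below fits a PCA on a reference genotype matrix and projects target samples into the same space. It assumes the matrices are already restricted to the shared, LD-thinned variants; the function names, the plain SVD-based PCA, and the simulated genotypes are illustrative assumptions, not the pipeline's implementation.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components=2):
    """Fit a PCA on a (samples x variants) reference dosage matrix (hypothetical helper)."""
    mean = ref_genotypes.mean(axis=0)
    sd = ref_genotypes.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for monomorphic variants
    standardized = (ref_genotypes - mean) / sd
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    loadings = vt[:n_components].T  # variants x components
    return mean, sd, loadings


def project(genotypes, mean, sd, loadings):
    """Project samples into the reference PC space using the reference scaling."""
    return ((genotypes - mean) / sd) @ loadings


# Simulated dosages (0/1/2) standing in for real genotype data.
rng = np.random.default_rng(0)
ref = rng.integers(0, 3, size=(100, 20)).astype(float)
target = rng.integers(0, 3, size=(5, 20)).astype(float)

model = fit_reference_pca(ref)          # done once per target/reference pair
ref_pcs = project(ref, *model)          # reusable for every score
target_pcs = project(target, *model)
```

The fitted `model` depends only on the genotypes, not on any particular score, so it is a natural thing to cache between runs.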
Description of feature
Is it possible to run ancestry adjustment on a set of previously calculated scores without recalculating the scores?
The idea would be to be able to run --run_ancestry in standalone mode, giving as input the raw PGS values, reference panel, and target genotyped data.