Phenotypes are simulated under various genetic architectures for 487,409 individuals (including individuals from diverse ancestral groups) and 1,054,151 HapMap3 SNPs.
PRS and their uncertainty will be estimated by running LDpred2 on the simulated data, using a training subset of 270,000 European individuals.
- PLINK genotype data for all 487,409 individuals and 1,054,151 HapMap3 SNPs:
DATA/PLINK/chr{chrom}
- Individual partition:
DATA/INDIVLIST/{group}.indiv
- Real phenotypes and covariates:
DATA/REAL-PHENO
- Simulated phenotype / genetic value / effect sizes:
DATA/SIM-PHENO/{setup}/sim.[pheno | pheno_g | beta].tsv.gz
- Phenotypes for each simulation replicate:
DATA/SIM-PHENO/{setup}/sim_{sim_i}.pheno.tsv.gz
- Simulated PRS run:
TODO
(will be done by Yi)
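The path templates listed above can be expanded programmatically. The following is a minimal sketch, not part of the repository: the helper names (`plink_prefix`, `indiv_list`, `sim_file`, `sim_replicate_pheno`) are hypothetical, and the example values for `{group}` and `{setup}` are placeholders, since the actual group and setup names are not listed here.

```python
from pathlib import Path

# Root of the data directory, as written in the templates above.
DATA_ROOT = Path("DATA")

# File kinds named in the sim.[pheno | pheno_g | beta] template.
SIM_KINDS = ("pheno", "pheno_g", "beta")


def plink_prefix(chrom):
    """PLINK file prefix for one chromosome: DATA/PLINK/chr{chrom}."""
    return DATA_ROOT / "PLINK" / f"chr{chrom}"


def indiv_list(group):
    """Individual list for one partition: DATA/INDIVLIST/{group}.indiv."""
    return DATA_ROOT / "INDIVLIST" / f"{group}.indiv"


def sim_file(setup, kind):
    """Simulated phenotype, genetic value, or effect-size file for one setup."""
    if kind not in SIM_KINDS:
        raise ValueError(f"kind must be one of {SIM_KINDS}, got {kind!r}")
    return DATA_ROOT / "SIM-PHENO" / setup / f"sim.{kind}.tsv.gz"


def sim_replicate_pheno(setup, sim_i):
    """Phenotype file for one simulation replicate of one setup."""
    return DATA_ROOT / "SIM-PHENO" / setup / f"sim_{sim_i}.pheno.tsv.gz"


# "example-setup" and "example-group" are placeholders, not real names.
print(plink_prefix(22).as_posix())
print(indiv_list("example-group").as_posix())
print(sim_file("example-setup", "beta").as_posix())
print(sim_replicate_pheno("example-setup", 3).as_posix())
```

Keeping the templates in one place like this avoids retyping path strings in each script, and rejecting unknown `kind` values catches typos early.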
- Select a set of SNPs (the overlap between HapMap3 and UK Biobank).
- Use PLINK to subset the SNPs by chromosome.
- Compile the phenotypes (not used in the simulation studies, but may be used in the accompanying real-data analyses).
- Simulate phenotypes under various genetic architectures.
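As an illustration of the last step, the sketch below simulates one phenotype from a toy genotype matrix. It is not the simulation code used in this project. It assumes one common architecture, in which a proportion `p_causal` of SNPs receive Gaussian effects on standardized genotypes and noise is added to reach heritability `h2`. The function name and both parameters are hypothetical. It uses only the standard library so that it runs anywhere; at the scale described here, a vectorized implementation would be needed.

```python
import random
import statistics


def simulate_phenotype(genotypes, h2, p_causal, seed=0):
    """Toy simulation returning (beta, pheno_g, pheno).

    genotypes: one row per individual, one allele count (0/1/2) per SNP.
    h2:        target heritability, 0 < h2 <= 1 (assumed parameter).
    p_causal:  proportion of SNPs with a non-zero effect (assumed parameter).
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    n_indiv, n_snp = len(genotypes), len(genotypes[0])

    # Standardize each SNP to mean 0 and variance 1 (monomorphic SNPs become 0).
    columns = []
    for j in range(n_snp):
        col = [row[j] for row in genotypes]
        mean, sd = statistics.fmean(col), statistics.pstdev(col)
        columns.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])

    # Draw effect sizes for a random subset of causal SNPs.
    n_causal = max(1, round(p_causal * n_snp))
    causal = rng.sample(range(n_snp), n_causal)
    beta = [0.0] * n_snp
    for j in causal:
        beta[j] = rng.gauss(0.0, (h2 / n_causal) ** 0.5)

    # Genetic value = standardized genotypes times effect sizes.
    pheno_g = [sum(columns[j][i] * beta[j] for j in causal) for i in range(n_indiv)]

    # Rescale so the genetic variance equals h2 in this sample.
    var_g = statistics.pvariance(pheno_g)
    if var_g == 0:
        raise ValueError("genetic value has zero variance")
    scale = (h2 / var_g) ** 0.5
    beta = [b * scale for b in beta]
    pheno_g = [g * scale for g in pheno_g]

    # Add environmental noise with variance 1 - h2.
    noise_sd = (1.0 - h2) ** 0.5
    pheno = [g + rng.gauss(0.0, noise_sd) for g in pheno_g]
    return beta, pheno_g, pheno


# Toy data: 200 individuals, 50 SNPs, allele frequency 0.3 (all made up).
toy_rng = random.Random(1)
toy_geno = [[sum(toy_rng.random() < 0.3 for _ in range(2)) for _ in range(50)]
            for _ in range(200)]
beta, pheno_g, pheno = simulate_phenotype(toy_geno, h2=0.25, p_causal=0.1)
```

The three return values mirror the `beta`, `pheno_g`, and `pheno` files listed in the data layout. Here `beta` is on the standardized-genotype scale, which may differ from the convention used in the actual files.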
Run LDpred2 on simulated data.
- prepare-ld.sh: precomputes the LD matrices.
- prepare-plink.sh: merges the PLINK files.
- run-prs-weights.sh: computes the PRS weights.
TODO
- Run LDpred2 on real data.