Plink File Formats
- A data set in text mode is divided into 2 files: the `PED` file, which describes the individuals and their genotypes, and the `MAP` file, which describes the genetic variants.
![Text Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[PED Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype;Genotype 1 Allele 1;Genotype 1 Allele 2;...;Genotype N Allele 1;Genotype N Allele 2], [MAP Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position], [PED Record] - [MAP Record])
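A minimal sketch of parsing one line of each text-mode file. It assumes whitespace-delimited columns laid out as in the diagram: a PED line has six leading columns followed by two allele columns per variant, and the k-th allele pair corresponds to the k-th line of the MAP file. The class and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    fid: str
    iid: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    # One (allele 1, allele 2) pair per variant, in MAP file order.
    genotypes: List[Tuple[str, str]]


class MapRecord(NamedTuple):
    chromosome: str
    variant_id: str
    genetic_distance: float
    bp_position: int


def parse_ped_line(line: str) -> PedRecord:
    """Split a PED line into its 6 leading columns and its allele pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("PED line needs 6 leading columns plus 2 alleles per variant")
    alleles = fields[6:]
    genotypes = [(alleles[k], alleles[k + 1]) for k in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


def parse_map_line(line: str) -> MapRecord:
    """Split a 4-column MAP line into a typed record."""
    chrom, variant_id, distance, position = line.split()
    return MapRecord(chrom, variant_id, float(distance), int(position))
```

For example, `parse_ped_line("FAM1 IND1 0 0 1 2 A G C C")` yields two genotypes, `("A", "G")` and `("C", "C")`, matching the first two lines of the MAP file.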
- A data set in binary mode is divided into 3 files: the `FAM` file, which describes the individuals; the `BED` file, which describes the genotypes; and the `BIM` file, which describes the genetic variants.
![Binary Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[FAM Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype], [BED Record|IID: Individual ID;Variant Identifier;Genotype Allele 1;Genotype Allele 2], [BIM Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position;Reference allele;Alternate allele], [FAM Record] 1-* [BED Record], [BED Record] *-1 [BIM Record])
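The diagram shows the `BED` file conceptually as (individual, variant, genotype) records; on disk, a PLINK 1 `BED` file is a packed bit array. A minimal sketch of decoding it, assuming the standard SNP-major layout: three header bytes (0x6C, 0x1B, 0x01), then one block of ceil(N/4) bytes per `BIM` variant, with each individual's genotype (in `FAM` order) stored in two bits, lowest-order bits first. The codes are 00 = homozygous for the first `BIM` allele column, 01 = missing, 10 = heterozygous, 11 = homozygous for the second `BIM` allele column. The function and constant names are hypothetical.

```python
from typing import List, Optional

# PLINK 1 .bed magic number followed by the SNP-major mode byte.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit code -> number of copies of the first BIM allele (None = missing).
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_individuals: int, n_variants: int) -> List[List[Optional[int]]]:
    """Decode a SNP-major .bed payload into one list of allele counts per variant."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK 1 .bed file")
    bytes_per_variant = (n_individuals + 3) // 4  # 4 genotypes per byte, last byte padded
    expected = 3 + bytes_per_variant * n_variants
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")

    variants = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = data[start:start + bytes_per_variant]
        calls = []
        for i in range(n_individuals):
            byte = block[i // 4]
            code = (byte >> (2 * (i % 4))) & 0b11  # lowest-order bit pair comes first
            calls.append(CODE_TO_A1_COUNT[code])
        variants.append(calls)
    return variants
```

For 3 individuals and 1 variant, the single genotype byte `0x18` (bit pairs 00, 10, 01, then padding) decodes to `[[2, 1, None]]`.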
- An IBS matrix is stored in a `Genome` file, which contains one line for each possible pair of individuals. Thus, with N individuals, the file has N(N-1)/2 rows. The columns store pairwise statistics such as identity-by-descent (IBD) estimates and identity-by-state (IBS) counts and distances.
![Pairwise IBS Metrics Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Genome Record|FID1: Family ID of individual 1;IID1: Individual ID of individual 1;FID2: Family ID of individual 2;IID2: Individual ID of individual 2;RT: Relationship type given PED file;EZ: Expected IBD sharing given PED file;Z0: P%28IBD=0%29; Z1: P%28IBD=1%29; Z2: P%28IBD=2%29; PI_HAT: Proportion IBD; PHE: Pairwise phenotypic code; DST: IBS distance; PPC: IBS binomial test; RATIO: of HetHet; IBS0; IBS1; IBS2; HOMHOM; HETHET])
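The row count and the `PI_HAT` column can be checked with a few lines of Python; `PI_HAT = Z2 + 0.5 * Z1` is PLINK's definition of the proportion IBD. The function names are hypothetical, and the sketch does not claim that PLINK writes the pairs in this order.

```python
from itertools import combinations
from typing import List, Tuple


def genome_row_count(n_individuals: int) -> int:
    """Rows in a Genome file: one per unordered pair, n(n-1)/2."""
    return n_individuals * (n_individuals - 1) // 2


def genome_pairs(ids: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of individuals, i.e. one entry per Genome row."""
    return list(combinations(ids, 2))


def pi_hat(z1: float, z2: float) -> float:
    """Proportion IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1
```

For example, 1,000 individuals give 499,500 rows, so the file grows quadratically with sample size; a parent/offspring pair (Z1 = 1, Z2 = 0) has a PI_HAT of 0.5.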
- An IBS clustering analysis generates 4 cluster files, hereafter referred to as `Cluster 0`, `Cluster 1`, `Cluster 2`, and `Cluster 3`. In some cases, an `HH` file is also created.
TODO: diagram to be completed by the MGL804 team
![Cluster Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Cluster 0], [Cluster 1|CID: Cluster ID;FID1_IID1;...;FIDN_IIDN], [Cluster 2|FID: Family ID;IID: Individual ID;CID: Cluster ID], [Cluster 3|FID: Family ID;IID: Individual ID;...], [HH])
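According to the diagram, `Cluster 1` holds one line per cluster (a cluster ID followed by its members written as FID_IID), while `Cluster 2` holds one line per individual (FID, IID, CID). A minimal sketch converting the `Cluster 2` layout into the `Cluster 1` layout, assuming whitespace-delimited columns and the FID_IID naming shown in the diagram; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_cluster2(text: str) -> List[Tuple[str, str, str]]:
    """Read (FID, IID, CID) rows from Cluster 2 style text."""
    rows = []
    for line in text.strip().splitlines():
        fid, iid, cid = line.split()
        rows.append((fid, iid, cid))
    return rows


def cluster2_to_cluster1(rows: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group rows by cluster ID; each member is written as FID_IID."""
    clusters: Dict[str, List[str]] = {}
    for fid, iid, cid in rows:
        clusters.setdefault(cid, []).append(f"{fid}_{iid}")
    return clusters  # dicts keep first-seen cluster order (Python 3.7+)


def format_cluster1(clusters: Dict[str, List[str]]) -> List[str]:
    """Render each cluster as one line: CID followed by its members."""
    return [" ".join([cid, *members]) for cid, members in clusters.items()]
```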
- An MDS analysis generates different files depending on the option used:
  - `--mds-plot` generates an `MDS` file. This file can then be plotted in R to visualize the data.
  - `--matrix` generates an `MIBS` file.
  - `--distance-matrix` generates an `MDIST` file.
TODO: diagram to be completed by the MGL804 team
![MDS Class Diagram](http://yuml.me/diagram/plain;dir:TB/class/[MDS], [MIBS], [MDIST])
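These outputs can also be loaded outside R. A minimal sketch in Python, assuming the `MDS` file is whitespace-delimited with a header row in which FID and IID are followed by coordinate columns named C1, C2, and so on, and that `MIBS`/`MDIST` files are square, whitespace-delimited matrices with one row per individual and no header. These layouts and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def parse_mds(text: str) -> Dict[Tuple[str, str], List[float]]:
    """Map (FID, IID) to MDS coordinates; assumes a header like 'FID IID SOL C1 C2'."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    fid_col, iid_col = header.index("FID"), header.index("IID")
    # Coordinate columns are assumed to be named C1, C2, ...
    coord_cols = [i for i, name in enumerate(header)
                  if name.startswith("C") and name[1:].isdigit()]
    return {(r[fid_col], r[iid_col]): [float(r[i]) for i in coord_cols] for r in rows}


def parse_square_matrix(text: str) -> List[List[float]]:
    """Parse a whitespace-delimited N x N matrix (assumed MIBS/MDIST layout)."""
    rows = [[float(x) for x in ln.split()] for ln in text.strip().splitlines()]
    if any(len(r) != len(rows) for r in rows):
        raise ValueError("matrix is not square")
    return rows
```

The coordinates returned by `parse_mds` can then be passed to any plotting library, in the same way the `MDS` file is plotted in R.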
- An outlier detection analysis is performed with the `--neighbour` option and generates a `NEAREST` file.
  - Command for PLINK 1.07: `plink --file data --cluster --neighbour n1 n2`
  - Command for PLINK 1.9: `plink --neighbour n1 n2` (no need to specify `--cluster`)

  Here, `n1` and `n2` are the offsets (ranks) of the nearest neighbours to report, from the n1-th to the n2-th nearest neighbour.

  The command generates a file with the following structure.
![IBS Outliers Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[NEAREST|FID: Family ID;IID: Individual ID;NN: Nearest neighbour level;MIN_DST: IBS distance of nth nearest neighbour;Z: MIN_DST converted to a Z score;FID2: Family ID of the nth nearest neighbour;IID2: Individual ID of the nth nearest neighbour;PROP_DIFF: Proportion of significantly different others])
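A minimal sketch of the idea behind the `NN`, `MIN_DST` and `Z` columns, assuming a distance matrix in which smaller values mean closer individuals: for each individual it finds the n-th nearest other individual and its distance, then standardizes those distances across all individuals into Z scores. Using the mean and sample standard deviation for the Z score is an assumption for illustration; PLINK's exact computation may differ. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def nth_nearest(dist: List[List[float]], n: int) -> List[Tuple[int, float]]:
    """For each individual i, return (index, distance) of its n-th nearest other individual (n >= 1)."""
    result = []
    for i, row in enumerate(dist):
        # Sort the other individuals by distance, ignoring the diagonal entry.
        others = sorted((d, j) for j, d in enumerate(row) if j != i)
        d, j = others[n - 1]
        result.append((j, d))
    return result


def z_scores(values: List[float]) -> List[float]:
    """Standardize values with their mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Usage: distances to each individual's 1st nearest neighbour, then their Z scores.
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
nearest = nth_nearest(dist, 1)
z = z_scores([d for _, d in nearest])
```

In this toy matrix, individual 2 is farthest from its nearest neighbour, so it receives the only positive Z score, which is the kind of signal the outlier analysis looks for.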
TODO: to be completed by the MGL804 team (if necessary)