PLINK File Formats

1. Genotype & Variant formats (text mode)

A data set in text mode is split into two files: the PED file, which describes the individuals and their genotypes, and the MAP file, which describes the genetic variants.

![Text Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[PED Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype;Genotype 1 Allele 1;Genotype 1 Allele 2;...;Genotype N Allele 1;Genotype N Allele 2], [MAP Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position], [PED Record] - [MAP Record])
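
The record layouts in the diagram above can be sketched as a small parser. This is a minimal illustration, assuming whitespace-delimited lines without a header row; the names `MapRecord`, `PedRecord`, `parse_map_line`, and `parse_ped_line` are hypothetical and not part of PLINK.

```python
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    chromosome: str
    variant_id: str
    genetic_distance: float
    bp_position: int


class PedRecord(NamedTuple):
    fid: str
    iid: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele 1, allele 2) pair per variant


def parse_map_line(line: str) -> MapRecord:
    """Parse one MAP line: chromosome, variant ID, genetic distance, position."""
    chrom, variant_id, distance, position = line.split()
    return MapRecord(chrom, variant_id, float(distance), int(position))


def parse_ped_line(line: str) -> PedRecord:
    """Parse one PED line: six leading columns, then two alleles per variant."""
    fields = line.split()
    head, alleles = fields[:6], fields[6:]
    if len(head) != 6 or len(alleles) % 2 != 0:
        raise ValueError("malformed PED line")
    # Pair up consecutive allele columns: (G1A1, G1A2), (G2A1, G2A2), ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(*head, genotypes)


# Made-up example lines, for illustration only.
map_rec = parse_map_line("1 rs123 0 1000")
ped_rec = parse_ped_line("FAM1 IND1 0 0 1 2 A A G T")
```

The N-th genotype pair of a PED record corresponds to the N-th line of the MAP file, which is how the two files are linked.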

2. Genotype & Variant formats (binary mode)

A data set in binary mode is split into three files: the FAM file, which describes the individuals; the BED file, which describes the genotypes; and the BIM file, which describes the genetic variants.

![Binary Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[FAM Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype], [BED Record|IID: Individual ID;Variant Identifier;Genotype Allele 1;Genotype Allele 2], [BIM Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position;Reference allele;Alternate allele], [FAM Record] 1-* [BED Record], [BED Record] *-1 [BIM Record])
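
Unlike FAM and BIM, the BED file is not text, so the "BED Record" in the diagram is a logical view rather than a line in a file. The sketch below shows how such records might be decoded. It rests on assumptions that are not stated on this page and should be checked against the official PLINK format documentation: a 3-byte header (`0x6C 0x1B 0x01`, variant-major), 2 bits per genotype read from the low-order bits first, and the code table in `CODES`. The function name `decode_bed` is hypothetical.

```python
from typing import List

# Assumed header: two magic bytes followed by the variant-major flag.
MAGIC = bytes([0x6C, 0x1B, 0x01])

# Assumed meaning of each 2-bit genotype code.
CODES = {0b00: "hom_a1", 0b01: "missing", 0b10: "het", 0b11: "hom_a2"}


def decode_bed(data: bytes, n_samples: int, n_variants: int) -> List[List[str]]:
    """Return one list of genotype calls per variant.

    n_samples is the number of lines in the FAM file and
    n_variants is the number of lines in the BIM file.
    """
    if data[:3] != MAGIC:
        raise ValueError("not a variant-major BED file")
    bytes_per_variant = (n_samples + 3) // 4  # 4 genotypes per byte, rounded up
    if len(data) != 3 + bytes_per_variant * n_variants:
        raise ValueError("file size does not match FAM/BIM counts")
    result = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = data[start:start + bytes_per_variant]
        calls = []
        for s in range(n_samples):
            # Sample s sits in byte s // 4, at bit offset 2 * (s % 4).
            code = (block[s // 4] >> (2 * (s % 4))) & 0b11
            calls.append(CODES[code])
        result.append(calls)
    return result


# Made-up data: one variant, four samples packed into a single byte.
demo = MAGIC + bytes([0b11100100])
calls = decode_bed(demo, n_samples=4, n_variants=1)
```

The sample and variant counts are not stored in the BED file itself, which is why the three files have to be read together.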

3. Pairwise IBS matrix format

An IBS matrix is stored in a GENOME file, which contains one line for each possible pair of individuals. Thus, if there are N individuals, there will be N(N-1)/2 rows. The columns store the pairwise statistics.

![Pairwise IBS Metrics Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Genome Record|FID1: Family ID of individual 1;IID1: Individual ID of individual 1;FID2: Family ID of individual 2;IID2: Individual ID of individual 2;RT: Relationship type given PED file;EZ: Expected IBD sharing given PED file;Z0: P%28IBD=0%29; Z1: P%28IBD=1%29; Z2: P%28IBD=2%29; PI_HAT: Proportion IBD; PHE: Pairwise phenotypic code; DST: IBS distance; PPC: IBS binomial test; RATIO: of HetHet; IBS0; IBS1; IBS2; HOMHOM; HETHET])
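
The N(N-1)/2 row count can be checked by enumerating the unordered pairs directly. This is a small sketch with made-up individual IDs; `pair_count` is a hypothetical helper name.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n individuals: N(N-1)/2."""
    return n * (n - 1) // 2


individuals = ["IND1", "IND2", "IND3", "IND4"]

# Each unordered pair appears exactly once, as in the GENOME file.
pairs = list(combinations(individuals, 2))

print(len(pairs), pair_count(len(individuals)))
```

The row count grows quadratically, so a data set with 1,000 individuals already yields 499,500 rows.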

4. IBS clustering formats

An IBS clustering analysis generates four cluster files, hereafter referred to as Cluster 0, Cluster 1, Cluster 2, and Cluster 3. In some cases, an HH file is also created.

TODO: diagram to be completed by the MGL804 team

![Cluster Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Cluster 0], [Cluster 1|CID: Cluster ID;FID1_IID1;...;FIDN_IIDN], [Cluster 2|FID: Family ID;IID: Individual ID;CID: Cluster ID], [Cluster 3|FID: Family ID;IID: Individual ID;...], [HH])
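
As an illustration of the Cluster 2 layout shown above (FID, IID, CID), the following sketch groups individuals by cluster ID. It assumes one whitespace-delimited record per line and no header row; `group_by_cluster` is a hypothetical name and the sample lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_cluster(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each cluster ID to the (FID, IID) pairs assigned to it."""
    clusters = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fid, iid, cid = line.split()
        clusters[cid].append((fid, iid))
    return dict(clusters)


sample = ["FAM1 IND1 0", "FAM1 IND2 0", "FAM2 IND3 1"]
clusters = group_by_cluster(sample)
```

Grouping Cluster 2 this way yields the same information that Cluster 1 stores row by row: a cluster ID followed by its members.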

5. MDS analysis formats

An MDS analysis generates different files depending on the option used:
- --mds-plot generates an MDS file. This file can then be plotted in R to visualize the data.
- --matrix generates an MIBS file.
- --distance-matrix generates an MDIST file.

TODO: diagram to be completed by the MGL804 team

![MDS Class Diagram](http://yuml.me/diagram/plain;dir:TB/class/[MDS], [MIBS], [MDIST])

6. IBS Clustering outliers format

An outlier detection analysis is performed with the --neighbour option and generates a NEAREST file:
- Command for PLINK 1.07: plink --file data --cluster --neighbour n1 n2
- Command for PLINK 1.9: plink --neighbour n1 n2 (no need to specify --cluster)

Here, n1 and n2 are the offsets of the nearest neighbours. The command generates a file with the following structure.

![IBS Outliers Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[NEAREST|FID: Family ID;IID: Individual ID;NN: Nearest neighbour level;MIN_DST: IBS distance of nth nearest neighbour;Z: MIN_DST converted to a Z score;FID2: Family ID of the nth nearest neighbour;IID2: Individual ID of the nth nearest neighbour;PROP_DIFF: Proportion of significantly different others])
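
A NEAREST file with the columns listed in the diagram could be read and screened roughly as follows. This is a sketch under several assumptions: the file starts with a header row naming the columns, a strongly negative Z marks a candidate outlier, and the cutoff of -4.0 is an arbitrary illustrative value. The names `read_nearest` and `flag_outliers` are hypothetical and the sample rows are made up.

```python
from typing import Dict, Iterable, List, Tuple

# Columns converted to numbers; the rest stay as strings.
INT_COLUMNS = ("NN",)
FLOAT_COLUMNS = ("MIN_DST", "Z", "PROP_DIFF")


def read_nearest(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse NEAREST lines, assuming the first line is a header row."""
    rows = iter(lines)
    header = next(rows).split()
    records = []
    for line in rows:
        if not line.strip():
            continue  # skip blank lines
        rec: Dict[str, object] = dict(zip(header, line.split()))
        for col in INT_COLUMNS:
            rec[col] = int(rec[col])
        for col in FLOAT_COLUMNS:
            rec[col] = float(rec[col])
        records.append(rec)
    return records


def flag_outliers(records, z_threshold: float = -4.0) -> List[Tuple[str, str]]:
    """Return (FID, IID) for records whose Z falls below the assumed cutoff."""
    return [(r["FID"], r["IID"]) for r in records if r["Z"] < z_threshold]


sample = [
    "FID IID NN MIN_DST Z FID2 IID2 PROP_DIFF",
    "FAM1 IND1 1 0.80 0.5 FAM2 IND2 0.0",
    "FAM3 IND3 1 0.60 -4.5 FAM1 IND1 0.9",
]
records = read_nearest(sample)
outliers = flag_outliers(records)
```

The sign convention and a sensible cutoff depend on the data set, so both should be checked against the PLINK documentation before relying on this kind of filter.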

7. Other formats

TODO: to be completed by the MGL804 team (if necessary)
