karim73 edited this page Jun 7, 2015 · 36 revisions

PLINK File Formats

1. Genotype & Variant formats (text mode)

  • A dataset in text mode is split into two files: the PED file, which describes the genotypes, and the MAP file, which describes the genetic variants.

![Text Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[PED Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype;Genotype 1 Allele 1;Genotype 1 Allele 2;...;Genotype N Allele 1;Genotype N Allele 2], [MAP Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position], [PED Record] - [MAP Record])
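As a minimal sketch of the two record types in the diagram, the following Python parses one whitespace-delimited line of each file. The class and function names (`PedRecord`, `MapRecord`, `parse_ped_line`, `parse_map_line`) and the sample values are hypothetical and chosen for illustration only; the field order follows the diagram above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MapRecord:
    chromosome: str
    variant_id: str
    genetic_distance: float
    position: int  # base-pair position


@dataclass
class PedRecord:
    fid: str
    iid: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele 1, allele 2) pair per variant


def parse_map_line(line: str) -> MapRecord:
    """Parse one MAP line: chromosome, variant ID, genetic distance, position."""
    chromosome, variant_id, distance, position = line.split()
    return MapRecord(chromosome, variant_id, float(distance), int(position))


def parse_ped_line(line: str) -> PedRecord:
    """Parse one PED line: six leading fields, then two alleles per variant."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a PED line needs at least six leading fields")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("alleles must come in pairs")
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


# Illustrative (made-up) lines, not taken from a real dataset.
sample_map = parse_map_line("1 rs0001 0 1000")
sample_ped = parse_ped_line("FAM1 IND1 0 0 1 2 A A G T")
```

The N-th genotype pair of a PED record corresponds to the N-th line of the MAP file, which is how the two files are linked.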

2. Genotype & Variant formats (binary mode)

  • A dataset in binary mode is split into three files: the FAM file, which describes the individuals, the BED file, which describes the genotypes, and the BIM file, which describes the genetic variants.

![Binary Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[FAM Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype], [BED Record|IID: Individual ID;Variant Identifier;Genotype Allele 1;Genotype Allele 2], [BIM Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position;Reference allele;Alternate allele], [FAM Record] 1-* [BED Record], [BED Record] *-1 [BIM Record])
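The diagram shows the logical content of a BED record, but this page does not describe how the file stores it on disk. The sketch below assumes the standard PLINK 1 variant-major layout: three magic bytes (0x6C, 0x1B, 0x01), then ceil(N/4) bytes per variant, with two bits per individual and the lowest-order bits first. The function name `decode_bed` and the genotype labels are hypothetical. The number of individuals and variants would come from the FAM and BIM line counts.

```python
# Assumed PLINK 1 variant-major header; not stated on this page.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# Two-bit genotype codes (assumed PLINK 1 encoding); labels are illustrative.
CODES = {
    0b00: "hom_a1",   # homozygous for the first allele listed in the BIM file
    0b01: "missing",
    0b10: "het",
    0b11: "hom_a2",   # homozygous for the second allele listed in the BIM file
}


def decode_bed(data: bytes, n_individuals: int, n_variants: int):
    """Return one list of genotype labels per variant."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 BED file")
    bytes_per_variant = (n_individuals + 3) // 4  # ceil(N / 4)
    rows = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = data[start:start + bytes_per_variant]
        if len(block) != bytes_per_variant:
            raise ValueError("file is shorter than expected")
        row = []
        for i in range(n_individuals):
            byte = block[i // 4]
            code = (byte >> (2 * (i % 4))) & 0b11  # lowest bits = first individual
            row.append(CODES[code])
        rows.append(row)
    return rows


# One variant, four individuals packed into a single byte (made-up data).
example = BED_MAGIC + bytes([0b11100100])
genotypes = decode_bed(example, n_individuals=4, n_variants=1)
```

Each row of the result lines up with one BIM line, and each position in a row lines up with one FAM line, which matches the relationships drawn in the diagram.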

3. Pairwise IBS matrix format

  • An IBS matrix is stored in a Genome file, which contains one line for each pair of patients. Thus, if there are N patients, there will be N(N-1)/2 rows. The columns store statistics computed for each pair.

![Pairwise IBS Metrics Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Genome Record|FID1: Family ID of individual 1;IID1: Individual ID of individual 1;FID2: Family ID of individual 2;IID2: Individual ID of individual 2;RT: Relationship type given PED file;EZ: Expected IBD sharing given PED file;Z0: P%28IBD=0%29; Z1: P%28IBD=1%29; Z2: P%28IBD=2%29; PI_HAT: Proportion IBD; PHE: Pairwise phenotypic code; DST: IBS distance; PPC: IBS binomial test; RATIO: of HetHet; IBS0; IBS1; IBS2; HOMHOM; HETHET])
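The row count N(N-1)/2 can be checked with a short sketch that enumerates every unordered pair of patients. The function names and the patient identifiers are hypothetical and only illustrate the arithmetic.

```python
from itertools import combinations


def expected_pair_count(n_patients: int) -> int:
    """Number of rows expected in the Genome file: N(N-1)/2."""
    return n_patients * (n_patients - 1) // 2


def all_pairs(patient_ids):
    """Every unordered pair of patients, one per Genome row."""
    return list(combinations(patient_ids, 2))


# Made-up identifiers for illustration.
patients = ["IND1", "IND2", "IND3", "IND4"]
pairs = all_pairs(patients)
assert len(pairs) == expected_pair_count(len(patients))  # 4 * 3 / 2 = 6
```

Because the count grows quadratically, a dataset of 1,000 patients would already produce 499,500 rows.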

4. IBS clustering formats

TODO: diagram to be completed by the MGL804 team

![Cluster Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Cluster 0], [Cluster 1|CID: Cluster ID;FID1_IID1;...;FIDN_IIDN], [Cluster 2|FID: Family ID;IID: Individual ID;CID: Cluster ID], [Cluster 3|FID: Family ID;IID: Individual ID;...], [HH])

5. MDS analysis formats

  • An MDS analysis generates different files depending on the option used:
    • --mds-plot generates an MDS file. This file can then be plotted in R to visualize the data.
    • --matrix generates a MIBS file.
    • --distance-matrix generates an MDIST file.

TODO: diagram to be completed by the MGL804 team

![MDS Class Diagram](http://yuml.me/diagram/plain;dir:TB/class/[MDS], [MIBS], [MDIST])

6. IBS clustering outliers format

Command for PLINK 1.07: plink --file data --cluster --neighbour [n1] [n2]

Command for PLINK 1.9: plink --neighbour [n1] [n2] (no need to specify --cluster)

Where: n1 and n2 represent the offset of the

![IBS Outliers Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[NEAREST], [Nearest|FID: Family ID;IID: Individual ID;NN: Nearest neighbour level (see below);MIN_DST: IBS distance of nth nearest neighbour (see below);Z : MIN_DST converted to a Z score (see below);FID2: Family ID of the nth nearest neighbour;IID2: Individual ID of the nth nearest neighbour;PROP_DIFF: Proportion of significantly different others])
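As a rough sketch of how the Nearest file might be used to spot outliers, the following Python reads a whitespace-delimited table and lists the individuals whose Z score falls below a cutoff. It assumes the file starts with a header row naming the columns shown in the diagram; the function names, the sample rows, and the cutoff of -4 are hypothetical choices for illustration, not values taken from this page.

```python
def read_nearest(text: str):
    """Parse a header row plus data rows into a list of dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def flag_outliers(records, z_cutoff: float = -4.0):
    """Return (FID, IID, NN) for rows whose Z score is below the cutoff.

    The cutoff is an illustrative assumption; pick one suited to the data.
    """
    flagged = []
    for record in records:
        try:
            z_score = float(record["Z"])
        except ValueError:
            continue  # skip rows whose Z value is not numeric
        if z_score < z_cutoff:
            flagged.append((record["FID"], record["IID"], record["NN"]))
    return flagged


# Made-up rows for illustration only.
sample = """
FID IID NN MIN_DST Z FID2 IID2 PROP_DIFF
F1 I1 1 0.78 0.35 F2 I2 0.00
F3 I3 1 0.61 -5.20 F1 I1 0.40
"""
outliers = flag_outliers(read_nearest(sample))
```

An individual whose nearest neighbour is unusually distant gets a strongly negative Z score, so filtering on Z is one simple way to shortlist candidates for closer inspection.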

7. Other formats

TODO: to be completed by the MGL804 team (if necessary)
