naceur mhenni edited this page Jul 29, 2015 · 36 revisions

PLink File Formats

1. Genotype & Variant formats (text mode)

  • A data set in text mode is divided into 2 files: the PED file, which describes the genotypes, and the MAP file, which describes the genetic variants.

![Text Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[PED Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype;Genotype 1 Allele 1;Genotype 1 Allele 2;...;Genotype N Allele 1;Genotype N Allele 2], [MAP Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position], [PED Record] - [MAP Record])
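The record layouts in the diagram above can be sketched in Python as follows. This is only an illustrative sketch: it assumes whitespace-delimited fields, and the names `MapRecord`, `PedRecord`, `parse_map_line` and `parse_ped_line`, as well as the sample lines, are hypothetical and not part of PLINK.

```python
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    chromosome: str
    variant_id: str
    genetic_distance: float
    bp_position: int


class PedRecord(NamedTuple):
    fid: str
    iid: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele 1, allele 2) pair per variant


def parse_map_line(line: str) -> MapRecord:
    """Split one MAP line into its four fields."""
    chrom, variant_id, distance, position = line.split()
    return MapRecord(chrom, variant_id, float(distance), int(position))


def parse_ped_line(line: str) -> PedRecord:
    """Split one PED line into six leading fields plus allele pairs."""
    fields = line.split()
    head, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("odd number of allele columns")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(*head, genotypes)


# Made-up sample data: two variants, one individual.
map_lines = ["1 rs123 0 1000", "1 rs456 0 2000"]
ped_line = "FAM1 IND1 0 0 1 2 A A G T"

variants = [parse_map_line(line) for line in map_lines]
person = parse_ped_line(ped_line)
```

The number of genotype pairs in each PED record should match the number of MAP records, which gives a cheap consistency check between the two files.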

2. Genotype & Variant formats (binary mode)

  • A data set in binary mode is divided into 3 files: the FAM file, which describes the individuals, the BED file, which describes the genotypes, and the BIM file, which describes the genetic variants.

![Binary Dataset Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[FAM Record|FID: Family ID;IID: Individual ID;Paternal ID;Maternal ID;Sex;Phenotype], [BED Record|IID: Individual ID;Variant Identifier;Genotype Allele 1;Genotype Allele 2], [BIM Record|Chromosome;Variant Identifier;Genetic Distance;Base-pair position;Reference allele;Alternate allele], [FAM Record] 1-* [BED Record], [BED Record] *-1 [BIM Record])
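The FAM and BIM files are plain text, so their records can be read the same way as in text mode. The sketch below assumes whitespace-delimited fields in the order shown in the diagram above; the field-name lists, the `parse_record` helper and the sample lines are hypothetical. It does not decode the BED file, which is a binary encoding.

```python
# Field names follow the diagram above; the Python identifiers are illustrative.
FAM_FIELDS = ["fid", "iid", "paternal_id", "maternal_id", "sex", "phenotype"]
BIM_FIELDS = [
    "chromosome",
    "variant_id",
    "genetic_distance",
    "bp_position",
    "reference_allele",
    "alternate_allele",
]


def parse_record(line, field_names):
    """Map one whitespace-delimited line onto the given field names."""
    values = line.split()
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(values)}"
        )
    return dict(zip(field_names, values))


# Made-up sample lines.
fam = parse_record("FAM1 IND1 0 0 1 2", FAM_FIELDS)
bim = parse_record("1 rs123 0 1000 A G", BIM_FIELDS)
```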

3. Pairwise IBS matrix format

  • The pairwise IBS matrix is stored in a GENOME file, which contains 1 line for each possible pair of individuals. Thus, if there are N individuals, there will be N(N-1)/2 rows. The columns store statistics computed for each pair.

![Pairwise IBS Metrics Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[Genome Record|FID1: Family ID of individual 1;IID1: Individual ID of individual 1;FID2: Family ID of individual 2;IID2: Individual ID of individual 2;RT: Relationship type given PED file;EZ: Expected IBD sharing given PED file;Z0: P%28IBD=0%29; Z1: P%28IBD=1%29; Z2: P%28IBD=2%29; PI_HAT: Proportion IBD; PHE: Pairwise phenotypic code; DST: IBS distance; PPC: IBS binomial test; RATIO: of HetHet; IBS0; IBS1; IBS2; HOMHOM; HETHET])
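The N(N-1)/2 row count can be checked with a short Python sketch that enumerates every unordered pair of individuals. The function name `expected_genome_rows` and the sample IDs are hypothetical.

```python
from itertools import combinations


def expected_genome_rows(n_individuals: int) -> int:
    """Number of unordered pairs among n individuals: N(N-1)/2."""
    return n_individuals * (n_individuals - 1) // 2


# Made-up (FID, IID) identifiers for four individuals.
individuals = [
    ("FAM1", "IND1"),
    ("FAM1", "IND2"),
    ("FAM2", "IND1"),
    ("FAM2", "IND2"),
]

# Each pair corresponds to one row (FID1, IID1, FID2, IID2, ...).
pairs = list(combinations(individuals, 2))
```

With 4 individuals this yields 6 pairs, and the count grows quadratically, so the file becomes large quickly as N increases.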

4. IBS clustering formats

![Cluster Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/ [Cluster 1|CID: Cluster ID;FID1_IID1 : Family ID1_Individual ID1 ;...;...;...;FIDN_IIDM : : Family ID N_Individual ID M], [Cluster 2|FID: Family ID;IID: Individual ID;CID: Cluster ID], [Cluster 3|FID: Family ID;IID: Individual ID;List of cluster distances])

5. MDS analysis formats

  • An MDS Analysis generates different files depending on the option being used:
    • --mds-plot generates an MDS file. This file can then be plotted in R to visualize the data.
    • --matrix generates a MIBS file.
    • --distance-matrix generates a MDIST file.
  • The --mds-plot command generates the MDS file, which contains the fields below:

![MDS Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[MDS|FID: Family ID;IID: Individual ID;SOL: Assigned solution code (from --cluster);C1: Position on first dimension;C2: Position on second dimension;C3: Position on third dimension;C4: Position on fourth dimension])
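As an alternative to R, the MDS file can be loaded in Python before plotting. The sketch below assumes the file is whitespace-delimited with a header row naming the fields listed above; the `read_mds` function and the sample values are hypothetical.

```python
def read_mds(text):
    """Parse MDS file content into a list of dicts, one per individual.

    Assumes the first non-empty line is a header such as
    'FID IID SOL C1 C2'. Columns named C1, C2, ... become floats.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for key in header:
            if key.startswith("C"):
                row[key] = float(row[key])
        rows.append(row)
    return rows


# Made-up sample content with two dimensions.
sample = """
FID   IID   SOL  C1     C2
FAM1  IND1  0    0.01   -0.02
FAM2  IND1  0    -0.03  0.04
"""

rows = read_mds(sample)
c1 = [row["C1"] for row in rows]
c2 = [row["C2"] for row in rows]
```

The `c1` and `c2` lists can then be passed to any plotting library as x and y coordinates.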

  • The --matrix command generates the MIBS file, which contains a square, symmetric matrix of IBS distances for all pairs of individuals.


  • The --distance-matrix command generates the MDIST file. Versions earlier than v1.00 do not have the --distance-matrix and --matrix commands that generate the plink.mdist and plink.mibs files; the values are similarities, not distances.


6. IBS Clustering outliers format

  • An outlier detection analysis is performed with the --neighbour option and generates a NEAREST file.
  • Command for plink 1.07: plink --file data --cluster --neighbour n1 n2
  • Command for plink 1.9: plink --neighbour n1 n2 (no need to specify --cluster)

Where n1 and n2 are the offsets of the nearest neighbours.

The command generates a file with the following structure.

![IBS Outliers Class Diagram](http://yuml.me/diagram/plain;dir:LR/class/[NEAREST|FID: Family ID;IID: Individual ID;NN: Nearest neighbour level;MIN_DST: IBS distance of nth nearest neighbour;Z: MIN_DST converted to a Z score;FID2: Family ID of the nth nearest neighbour;IID2: Individual ID of the nth nearest neighbour;PROP_DIFF: Proportion of significantly different others])
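The Z field is described as MIN_DST converted to a Z score. The sketch below shows the conventional standardisation, (value - mean) / standard deviation, applied across individuals at one neighbour level. It is an assumption that plink uses the sample standard deviation; the exact estimator is not stated here, and the `z_scores` function and sample values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values: subtract the mean, divide by the sample SD."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(value - mu) / sd for value in values]


# Made-up MIN_DST values for three individuals at one neighbour level.
min_dst = {"IND1": 0.9, "IND2": 0.8, "IND3": 0.7}

scores = dict(zip(min_dst, z_scores(list(min_dst.values()))))
```

Individuals whose Z score lies far from 0 differ most from the rest of the sample at that neighbour level.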

7. Other formats

TODO: to be completed by the MGL804 team (if necessary)
