Machine Learning predictor of the continental origin using SNP data

tropicalberto/snp_continental_predictor

Predicting continental origin from genomic information on human Chromosome 1

Is Machine Learning racist?


Repository for the Machine Learning course (VU University, 2017)

Description

A machine learning predictor of continental origin, trained on single-nucleotide polymorphism (SNP) data.
The repository contains the source code and the full report.

The trained algorithms include:

  • Random Forest (RF)
  • Naive Bayes classifier
  • Support Vector Machine
  • Ensemble (stacking approach)

Feature selection uses an Information Gain criterion.
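
As an illustration of the Information Gain criterion, the sketch below scores each SNP by how much it reduces the entropy of the continental labels. It is not the repository's code: the function names, the 0/1/2 genotype coding, and the toy data are assumptions made for the example, and it uses only the Python standard library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction in `labels` after grouping samples by `feature` value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: genotypes coded as 0/1/2 copies of the alternate allele.
labels = ["AFR", "AFR", "EUR", "EUR"]
snps = {
    "snp_a": [0, 0, 2, 2],  # separates the two groups perfectly
    "snp_b": [1, 1, 1, 1],  # constant, so it carries no information
}

gains = {name: information_gain(values, labels) for name, values in snps.items()}
ranked = sorted(gains, key=gains.get, reverse=True)  # most informative SNP first
```

Keeping only the top-ranked SNPs is one common way to turn such scores into a feature subset; the cutoff is a tuning choice.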

Model evaluation is based on Receiver Operating Characteristic (ROC) plots and a confusion matrix.
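
Both evaluation tools can be computed from a list of true labels, predicted labels, and class scores. The following is a minimal sketch in plain Python, not the repository's implementation; the helper names and the toy labels and scores are hypothetical. For more than two continents, the ROC part would be applied one class at a time (one-vs-rest).

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def roc_points(y_true, scores, positive):
    """(FPR, TPR) pairs obtained by sweeping a threshold over the scores."""
    pos = sum(1 for t in y_true if t == positive)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == positive)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t != positive)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy example with two classes.
y_true = ["AFR", "AFR", "EUR", "EUR"]
y_pred = ["AFR", "EUR", "EUR", "EUR"]
afr_scores = [0.9, 0.4, 0.6, 0.1]  # assumed classifier scores for class "AFR"

cm = confusion_matrix(y_true, y_pred, classes=["AFR", "EUR"])
curve = roc_points(y_true, afr_scores, positive="AFR")
area = auc(curve)
```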

The statistical significance of the models was assessed with a permutation test.
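
A permutation test compares the observed score with the scores obtained after the labels have been shuffled. The sketch below is a simplified variant that keeps the predictions fixed and shuffles only the true labels; the report may instead retrain the model on each permutation. The function names, the number of permutations, and the toy data are assumptions.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 samples per class.
y_true = ["AFR"] * 10 + ["EUR"] * 10
p_informative = permutation_p_value(y_true, list(y_true))    # perfect predictions
p_constant = permutation_p_value(y_true, ["AFR"] * 20)       # uninformative predictions
```

A small p-value indicates that the score is unlikely to arise when labels and predictions are unrelated. A classifier that always predicts the same class scores identically on every permutation, so its p-value is 1.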

Dataset

SNP genotypes from the 1000 Genomes Project.

Dependencies

  • NumPy
  • pandas

Authors

All authors contributed equally to this work.
