This repo contains the material for the course Feature selection in GWAS, given on day 4 of the Machine Learning in Genomics intensive week. The slides are by Chloé-Agathe Azencott and the practicals are adapted from https://github.com/chagaz/ds3-2018-genetics.
It includes the lecture slides and the Jupyter notebooks for the practical sessions. The notebooks cover the same tools as the lecture:
- practical1:
  - T-test and Manhattan plots
  - Linear regression
  - Lasso
- practical2:
  - Elastic-net
  - Multi-task lasso
  - Network-constrained lasso
The practicals require writing very little code: most questions ask you to comment on the results. Corrected versions of the practicals are provided.
- Clone the repository: `git clone https://github.com/goepp/ml-in-genomics-2021/`
- You need to download the heavy files `athaliana_small.X.txt` and `athaliana_small.W.txt` here and place them in `practical/data/`. Alternatively, you can just run the notebook cells which download these two files.
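Before opening the notebooks, it can help to confirm that both data files ended up in the right place. The snippet below is a minimal sketch, not part of the course material: the helper `missing_data_files` is a hypothetical name, and it assumes you run it from the root of the cloned repository so that the relative path `practical/data` resolves.

```python
from pathlib import Path

# File names and target directory as given in the instructions above.
DATA_FILES = ("athaliana_small.X.txt", "athaliana_small.W.txt")


def missing_data_files(data_dir="practical/data"):
    """Return the names of the expected data files not found in data_dir.

    Hypothetical helper for illustration; data_dir is assumed to be
    relative to the repository root.
    """
    data_dir = Path(data_dir)
    return [name for name in DATA_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("Both data files are in place.")
```

If a file is reported missing, either download it manually or run the notebook cells that fetch it.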
You need Python 3, conda, and Jupyter Notebook. An easy way to set things up from scratch is:
- Create a conda environment: `conda env create --file=environment.yml`.
- Activate the conda env: `conda activate mlgen`.
- Run the jupyter notebook from within the conda env: `jupyter notebook`, and your notebook should open in a web browser. You're good to go!
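
If something does not start as expected, a quick check of the prerequisites can narrow things down. The following is a rough sketch under stated assumptions: `check_setup` is a hypothetical helper, and it only tests that the interpreter is Python 3 and that the `conda` and `jupyter` executables are on your `PATH`. It does not verify that the `mlgen` environment is active or that its packages are installed.

```python
import shutil
import sys


def check_setup(tools=("conda", "jupyter")):
    """Map each requirement to True/False depending on whether it was found.

    Hypothetical helper for illustration only.
    """
    status = {"python3": sys.version_info.major == 3}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'ok' if found else 'not found'}")
```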