ArnovanHilten edited this page Apr 25, 2022 · 24 revisions

GenNet

Framework for Interpretable Neural Networks for genetics

  1. Getting started
  2. Tutorial
  3. GenNet command line.

2. Tutorial

Install GenNet according to the README. TIP: if you are using GenNet on a cluster, precompiled modules are often available. Create a virtual environment and load the precompiled modules (for example: module load TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4) before running pip3 install -r requirements_GenNet.txt.

To test GenNet you can run the example study. To run the classification example:

  1. Activate your virtual environment and navigate to the GenNet folder.

  2. Train the network on the example data: python GenNet.py train -path ./examples/example_classification/ -ID 1. The first argument is the path to the example_classification folder. The second argument is the job ID, a unique number for each experiment. If you ran an experiment successfully and reuse the same job ID, the network will load the trained network from the previous experiment and use it to evaluate performance on the validation and test sets. More information about the required and optional arguments can be inspected with python GenNet.py train --help. After running the command, it first shows information about your GPU, followed by an overview of the network and its training process. Training the example network should take a couple of minutes.

  3. Use the built-in plot functions to visualize your results. To see your options, use python GenNet.py plot --help or see the plot section in Modules. Visualizing the example study:

    • python GenNet.py plot -ID 1 -type manhattan_relative_importance creates a Manhattan plot using the relative importance (the multiplication of all the weights from the output to the input).
    • python GenNet.py plot -ID 1 -type sunburst sums the relative importances over genes, pathways or tissues and displays them in a sunburst plot.

    Or plot the weights of the network per layer:

    • python GenNet.py plot -ID 1 -type layer_weight -layer_n 0
    • python GenNet.py plot -ID 1 -type layer_weight -layer_n 1
    • python GenNet.py plot -ID 1 -type layer_weight -layer_n 2

The Manhattan plot with the relative importance of all input SNPs is shown below. All plots for the classification example can be found here: https://github.com/ArnovanHilten/GenNet/tree/master/figures/classification_example
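The relative importance used for these plots (multiplying the weights on every connection from input to output) can be illustrated with a toy three-layer sparse network. The SNP-to-gene mapping and the weight values below are invented purely for illustration:

```python
# Toy sparse network: 3 SNPs -> 2 genes -> 1 output (all values hypothetical).
snp_to_gene = {0: 0, 1: 0, 2: 1}          # which gene node each SNP feeds into
w_input_gene = {0: 0.8, 1: -0.3, 2: 0.5}  # SNP -> gene weights
w_gene_out = {0: 1.2, 1: -0.4}            # gene -> output weights

# Relative importance of a SNP: the product of the weights along its
# path from the input to the output node.
relative_importance = {
    snp: w_input_gene[snp] * w_gene_out[snp_to_gene[snp]]
    for snp in snp_to_gene
}
print(relative_importance)
```

In the real network there can be more layers, so the product runs over every weight on the path; the sunburst plot then sums these per-SNP values over the gene, pathway or tissue nodes they pass through.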

3. GenNet command line.

Preparing the data

As seen in the overview, the command line takes 3 inputs:

  1. genotype.h5 - a genotype matrix, each row is an example (subject) each column is a feature (e.g. genetic variant).
  2. subject.csv - a .csv file with the following columns:
    • patient_id: an ID for each patient
    • labels: the phenotype (zeros and ones for classification, continuous values for regression)
    • genotype_row: the row of the subject in the genotype.h5 file
    • set: the set the patient belongs to (1 = training set, 2 = validation set, 3 = test set, others = ignored)
  3. topology - each row is a "path" of the network, from input to output node.
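As a minimal sketch, the subject.csv file described above can be written with Python's standard library; the patient IDs, labels and row numbers below are made-up example data:

```python
import csv

# Hypothetical example rows for subject.csv.
# labels: 0/1 for classification; genotype_row: row index in genotype.h5;
# set: 1 = training, 2 = validation, 3 = test.
rows = [
    {"patient_id": "S001", "labels": 1, "genotype_row": 0, "set": 1},
    {"patient_id": "S002", "labels": 0, "genotype_row": 1, "set": 1},
    {"patient_id": "S003", "labels": 1, "genotype_row": 2, "set": 2},
    {"patient_id": "S004", "labels": 0, "genotype_row": 3, "set": 3},
]

with open("subject.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["patient_id", "labels", "genotype_row", "set"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

The column names must match exactly; genotype_row ties each subject to the corresponding row of the genotype matrix in genotype.h5.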

Topology example (from GenNet/processed_data/example_study) :

layer0_node  layer0_name  layer1_node  layer1_name  layer2_node  layer2_name
0            SNP0         0            HERC2        0            Causal_path
5            SNP5         1            BRCA2        0            Causal_path
76           SNP76        6            EGFR         1            Control_path

NOTE: It is important to name the column headers exactly as shown in the table. In the second row, input 5 is connected to node 1 in layer 1; that node is connected to node 0 in layer 2. Since layer 2 is the last layer given, this node is also connected to the output. The network will have as many layers as there are columns named layer.._node: for example, columns layer0_node through layer9_node will result in a network with 10 layers.
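The topology table above can be reproduced with a short standard-library sketch; the output filename topology.csv is an assumption here, so use whatever name your setup expects:

```python
import csv

# Column headers must match GenNet's expected layerN_node / layerN_name names.
header = ["layer0_node", "layer0_name", "layer1_node", "layer1_name",
          "layer2_node", "layer2_name"]

# Each row is one path through the network: SNP -> gene node -> end node.
paths = [
    [0, "SNP0", 0, "HERC2", 0, "Causal_path"],
    [5, "SNP5", 1, "BRCA2", 0, "Causal_path"],
    [76, "SNP76", 6, "EGFR", 1, "Control_path"],
]

with open("topology.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(paths)
```

Adding another pair of layerN_node / layerN_name columns to every row adds another layer to the network.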

Tip: Use as example the example study found in the processed_data folder.

Modules

usage: GenNet.py [-h] {convert,train,plot,topology} ...

GenNet: Interpretable neural networks for phenotype prediction.

positional arguments:
  {convert,train,plot,topology}
                        GenNet main options
    convert             Convert genotype data to hdf5
    train               Trains the network
    plot                Generate plots from a trained network
    topology            Create standard topology files

optional arguments:
  -h, --help            show this help message and exit

GenNet.py convert

The current pipeline works well for small (< 100 GB) datasets; for larger datasets please contact [email protected]

example: python GenNet.py convert -g /media/charlesdarwin/plink/ -o /media/charlesdarwin/processed_data/ -study_name name_of_plink_files -step all

usage: GenNet.py convert [-h] [-g GENOTYPE [GENOTYPE ...]] -study_name
                         STUDY_NAME [STUDY_NAME ...] [-variants VARIANTS]
                         [-o OUT] [-ID] [-vcf] [-tcm TCM]
                         [-step {all,hase_convert,merge,impute,exclude,transpose,merge_transpose,checksum}]
                         [-n_jobs N_JOBS]

optional arguments:
  -h, --help            show this help message and exit
  -g GENOTYPE [GENOTYPE ...], --genotype GENOTYPE [GENOTYPE ...]
                        path/paths to genotype data folder
  -study_name STUDY_NAME [STUDY_NAME ...]
                        Name for saved genotype data, without ext
  -variants VARIANTS    Path to file with row numbers of variants to include,
                        if none is given all variants will be used
  -o OUT, --out OUT     path for saving the results, default ./processed_data
  -ID                   Flag to convert minimac data to genotype per subject
                        files first (default False)
  -vcf                  Flag for VCF data to convert
  -tcm TCM              Modifier for the chunk size during transposing; lower
                        it if you run out of memory during transposing
  -step {all,hase_convert,merge,impute,exclude,transpose,merge_transpose,checksum}
                        Modifier to choose step to do
  -n_jobs N_JOBS        Choose jobs > 1 for multiple job submission on a
                        cluster

GenNet.py train

Trains the neural network. The first argument is the path to the folder with the three required files. The second argument is the experiment identifier.

Example: python GenNet.py train ./processed_data/example_study/ 1

Usage: GenNet.py train [-h] [-problem_type {classification,regression}] [-wpc weight positive class] [-lr learning rate] [-bs batch size] [-epochs number of epochs] [-L1] path ID

Positional arguments:
  path                  path to the data
  ID                    ID of the experiment


optional arguments:
  -h, --help            show this help message and exit
  -problem_type {classification,regression}
                        Type of problem, choices are: classification or
                        regression
  -wpc weight positive class
                        Hyperparameter:weight of the positive class
  -lr learning rate, --learning_rate learning rate
                        Hyperparameter: learning rate of the optimizer
  -bs batch size, --batch_size batch size
                        Hyperparameter: batch size
  -epochs number of epochs
                        Hyperparameter: number of epochs
  -L1                   Hyperparameter: value for the L1 regularization
                        penalty, similar to lasso; enforces sparsity

GenNet.py plot

Generate plots from results

For the latest info, use python GenNet.py plot --help

Example: python GenNet.py plot 1 -type layer_weight -layer_n 0

Example: python GenNet.py plot 1 -type sunburst

Example: python GenNet.py plot 1 -type manhattan_relative_importance

Usage: GenNet.py plot [-h] [-type {layer_weight,sunburst,manhattan_relative_importance}] [-layer_n Layer_number:] ID

positional arguments:
  ID                    ID of the experiment

optional arguments:
  -h, --help            show this help message and exit
  -type {layer_weight,sunburst,manhattan_relative_importance}
  -layer_n Layer_number:
                        Only for layer_weight: number of the layer to be
                        plotted