Framework for Interpretable Neural Networks for genetics

Install GenNet according to the readme. TIP: if you are using GenNet on a cluster, precompiled modules are often available. Create a virtual environment and load the precompiled modules (for example: module load TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4) before running pip3 install -r requirements_GenNet.txt.

To test GenNet you can run the example study. To run the classification example:

  1. Activate your virtual environment and navigate to the GenNet folder.

  2. Train the network on the example data: python train ./examples/example_classification/ 1. The first argument is the path to the example_classification folder. The second argument is the jobid, a unique number for each experiment. If you reuse the jobid of an experiment that already ran successfully, GenNet loads the trained network from that experiment and uses it to evaluate performance on the validation and test sets. For more information about the required and optional arguments, run python train --help. After you run the command, it first shows information about your GPU, followed by an overview of the network and the training progress. Training the example network should take a couple of minutes.

  3. Use the built-in plot functions to visualize your results. To see your options, use python plot --help or see the plot section in Modules. Visualizing the example study:

    • python plot 1 -type manhattan_relative_importance: Manhattan plot of the relative importance (the product of all weights on the path from the output to the input)
    • python plot 1 -type sunburst: the relative importances are summed over genes, pathways or tissues and displayed in a sunburst plot.

    Or plot the weights of the network per layer:

    • python plot 1 -type layer_weight -layer_n 0
    • python plot 1 -type layer_weight -layer_n 1
    • python plot 1 -type layer_weight -layer_n 2
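
The relative importance described above (the product of the weights along a path, then summed per gene or pathway) can be sketched in a few lines of Python. This is an illustration only, not GenNet's plotting code: the weights are made-up numbers, the node named "output" and both helper functions are hypothetical, and taking the absolute value before summing is an assumption.

```python
from collections import defaultdict

# Hypothetical paths through a three-layer network: (SNP, gene, pathway).
paths = [
    ("SNP0", "HERC2", "Causal_path"),
    ("SNP5", "BRCA2", "Causal_path"),
    ("SNP76", "EGFR", "Control_path"),
]

# Made-up trained weights, keyed by (source node, target node).
# "output" is a hypothetical name for the single output node.
weights = {
    ("SNP0", "HERC2"): 0.5,
    ("SNP5", "BRCA2"): -2.0,
    ("SNP76", "EGFR"): 0.1,
    ("HERC2", "Causal_path"): 1.5,
    ("BRCA2", "Causal_path"): 0.5,
    ("EGFR", "Control_path"): 2.0,
    ("Causal_path", "output"): 2.0,
    ("Control_path", "output"): -1.0,
}


def path_importance(path, weights, output="output"):
    """Multiply the weights along one path from the input node to the output."""
    nodes = list(path) + [output]
    product = 1.0
    for source, target in zip(nodes, nodes[1:]):
        product *= weights[(source, target)]
    return product


def summed_importance(paths, weights, level):
    """Sum absolute path importances per node at one level (1 = gene, 2 = pathway).

    Using the absolute value is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    for path in paths:
        totals[path[level]] += abs(path_importance(path, weights))
    return dict(totals)


snp_importance = {path[0]: path_importance(path, weights) for path in paths}
gene_importance = summed_importance(paths, weights, level=1)
pathway_importance = summed_importance(paths, weights, level=2)
print(snp_importance)
print(pathway_importance)
```

The per-SNP values correspond to what a Manhattan-style plot would show, and the per-gene and per-pathway sums to the rings of a sunburst plot.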

3. GenNet command line.

Preparing the data

As seen in the overview, the command line takes three inputs:

  1. genotype.h5 - a genotype matrix; each row is an example (subject) and each column is a feature (e.g. a genetic variant).
  2. subject.csv - a .csv file with the following columns:
    • patient_id: an ID for each patient
    • labels: phenotype (zeros and ones for classification, continuous values for regression)
    • genotype_row: the row of the genotype.h5 file that holds this subject
    • set: the set the patient belongs to (1 = training set, 2 = validation set, 3 = test set, others = ignored)
  3. topology - each row is a "path" of the network, from input to output node.
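
As a rough sketch of the subject.csv layout described above, the following Python writes the four columns and counts how many subjects land in each set. The patient IDs and labels are invented, and write_subjects and count_sets are hypothetical helpers, not part of GenNet.

```python
import csv
import io
from collections import Counter

# Invented subjects: (patient_id, labels, genotype_row, set).
subjects = [
    ("P001", 1, 0, 1),
    ("P002", 0, 1, 1),
    ("P003", 1, 2, 2),
    ("P004", 0, 3, 3),
]

COLUMNS = ["patient_id", "labels", "genotype_row", "set"]
SET_NAMES = {1: "train", 2: "validation", 3: "test"}


def write_subjects(rows, handle):
    """Write the header and one row per subject in the column order above."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)


def count_sets(handle):
    """Count subjects per set; any value other than 1, 2 or 3 counts as ignored."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[SET_NAMES.get(int(row["set"]), "ignored")] += 1
    return dict(counts)


# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_subjects(subjects, buffer)
buffer.seek(0)
set_counts = count_sets(buffer)
print(set_counts)
```

To write a real file, pass open("subject.csv", "w", newline="") to write_subjects instead of the in-memory buffer.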

Topology example (from GenNet/processed_data/example_study):

layer0_node  layer0_name  layer1_node  layer1_name  layer2_node  layer2_name
0            SNP0         0            HERC2        0            Causal_path
5            SNP5         1            BRCA2        0            Causal_path
76           SNP76        6            EGFR         1            Control_path

NOTE: It is important to name the column headers as shown in the table. In the second row, input 5 is connected to node 1 in layer 1, which in turn is connected to node 0 in layer 2. Layer 2 is the last layer given, so its nodes are also connected to the output. The network will have as many layers as there are columns named layer.._node: creating 10 columns named layer0_node, layer1_node, ..., layer9_node results in 10 layers.
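
To make the note concrete, here is a small Python sketch that reads the example rows, counts the layer.._node columns, and lists the connections between consecutive layers. It parses whitespace-separated text as shown in the table; the on-disk format of the topology file may differ, and parse_topology, node_columns and layer_connections are hypothetical helpers rather than GenNet functions.

```python
import re

# The example rows from the table, whitespace-separated as shown in this document.
TOPOLOGY = """\
layer0_node layer0_name layer1_node layer1_name layer2_node layer2_name
0 SNP0 0 HERC2 0 Causal_path
5 SNP5 1 BRCA2 0 Causal_path
76 SNP76 6 EGFR 1 Control_path
"""


def parse_topology(text):
    """Return one dict per path, keyed by the column headers."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def node_columns(header):
    """Return the layer.._node columns, ordered by layer number."""
    cols = [name for name in header if re.fullmatch(r"layer\d+_node", name)]
    return sorted(cols, key=lambda name: int(re.search(r"\d+", name).group()))


def layer_connections(paths):
    """For each pair of consecutive layers, list the unique (source, target) node pairs."""
    cols = node_columns(paths[0].keys())
    connections = []
    for source_col, target_col in zip(cols, cols[1:]):
        edges = {(int(p[source_col]), int(p[target_col])) for p in paths}
        connections.append(sorted(edges))
    return connections


paths = parse_topology(TOPOLOGY)
n_layers = len(node_columns(paths[0].keys()))
connections = layer_connections(paths)
print(n_layers)
for index, edges in enumerate(connections):
    print(f"layer{index} -> layer{index + 1}: {edges}")
```

The connections from the last layer to the output are implicit and therefore not listed.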

Tip: Use the example study in the processed_data folder as a template.


Open the command line and navigate to the GenNet folder, then choose one of the modes:

convert

Work in progress.

train

Trains the neural network. The first argument is the path to the folder with the three required files. The second argument is the experiment identifier.

Example: python train ./processed_data/example_study/ 1

Usage: train [-h] [-problem_type {classification,regression}] [-wpc weight positive class] [-lr learning rate] [-bs batch size] [-epochs number of epochs] [-L1] path ID

Positional arguments:
  path                  path to the data
  ID                    ID of the experiment

optional arguments:
  -h, --help            show this help message and exit
  -problem_type {classification,regression}
                        Type of problem, choices are: classification or
                        regression
  -wpc weight positive class
                        Hyperparameter: weight of the positive class
  -lr learning rate, --learning_rate learning rate
                        Hyperparameter: learning rate of the optimizer
  -bs batch size, --batch_size batch size
                        Hyperparameter: batch size
  -epochs number of epochs
                        Hyperparameter: number of epochs
  -L1                   Hyperparameter: value for the L1 regularization
                        penalty, similar to lasso; enforces sparsity

plot

Generate plots from results

For the latest information, run python plot --help.

Example: python plot 1 -type layer_weight -layer_n 0

Example: python plot 1 -type sunburst

Example: python plot 1 -type manhattan_relative_importance

Usage: plot [-h] [-type {layer_weight,sunburst,manhattan_relative_importance}] [-layer_n Layer_number:] ID

positional arguments:
  ID                    ID of the experiment

optional arguments:
  -h, --help            show this help message and exit
  -type {layer_weight,sunburst,manhattan_relative_importance}
  -layer_n Layer_number:
                        Only for layer weight: number of the layer to be plotted
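
For readers who want to script around the train and plot modes, the usage texts above can be mirrored with argparse subcommands. The sketch below is a reconstruction from the help output in this document, not GenNet's actual parser: the program name, the argument types, and the absence of default values are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the train and plot usage texts."""
    parser = argparse.ArgumentParser(prog="gennet-sketch")
    modes = parser.add_subparsers(dest="mode", required=True)

    train = modes.add_parser("train", help="Trains the neural network")
    train.add_argument("path", help="path to the data")
    train.add_argument("ID", type=int, help="ID of the experiment")
    train.add_argument("-problem_type", choices=["classification", "regression"])
    # Types below are assumptions; the real defaults are not stated in the help text.
    train.add_argument("-wpc", type=float, help="weight of the positive class")
    train.add_argument("-lr", "--learning_rate", type=float)
    train.add_argument("-bs", "--batch_size", type=int)
    train.add_argument("-epochs", type=int, help="number of epochs")
    train.add_argument("-L1", type=float, help="L1 regularization penalty")

    plot = modes.add_parser("plot", help="Generate plots from results")
    plot.add_argument("ID", type=int, help="ID of the experiment")
    plot.add_argument(
        "-type",
        choices=["layer_weight", "sunburst", "manhattan_relative_importance"],
    )
    plot.add_argument("-layer_n", type=int, help="only for layer_weight")
    return parser


parser = build_parser()
train_args = parser.parse_args(
    ["train", "./processed_data/example_study/", "1", "-lr", "0.01"]
)
plot_args = parser.parse_args(["plot", "1", "-type", "layer_weight", "-layer_n", "0"])
print(train_args)
print(plot_args)
```

Options that are not passed come back as None here, which makes it easy to see which hyperparameters were set explicitly.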