This repository contains the code for the NetAurHPD model, based on the paper "Network Auralization Hyperlink Prediction Model to Identify Metabolic Pathways from Metabolomics Data" by Tamir Bar-Tov, Rami Puzis, and David Toubiana. Link to the paper
NetAurHPD was originally developed as a framework that relies on (1) graph auralization to extract and aggregate representations of nodes in metabolite correlation networks and (2) a data augmentation method that generates metabolite correlation networks given a subset of chemical reactions defined as hyperlinks.
Network Auralization is an innovative application of sound recognition neural networks to predict centrality measures of nodes within a network by learning from the "sound" emitted by network nodes. Networks can be likened to resonating chambers, where sound propagates through nodes and links, generating a waveform-based representation for every node. The core process of network auralization is the propagation of energy from nodes to their neighbors until the total energy is evenly distributed throughout the entire network. In NetAurHPD, we average the waveforms of the nodes in a hyperlink to represent the hyperlink as a single signal. We then train M5 (a very deep convolutional neural network) on these hyperlink waveforms as a classification model.
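The energy propagation described above can be illustrated with a minimal sketch. This is not the repository's SignalPropagation implementation: it assumes a plain diffusion rule in which neighboring nodes exchange a fixed fraction of their energy difference at every step, and it omits the momentum term. The function name `auralize_sketch` and the `rate` parameter are hypothetical, chosen only for illustration.

```python
def auralize_sketch(adjacency, source, steps=50, rate=0.2):
    """Record a per-node energy trace under a simple diffusion rule.

    adjacency: dict mapping node -> list of neighbor nodes (undirected graph).
    source:    node that receives one unit of energy at step 0.
    rate:      fraction of the energy difference exchanged per edge per step
               (an assumed parameter; keep it below 1 / max_degree).
    """
    energy = {node: 0.0 for node in adjacency}
    energy[source] = 1.0
    # Each node's waveform is the sequence of energy values it holds over time.
    waveforms = {node: [energy[node]] for node in adjacency}
    for _ in range(steps):
        updated = {}
        for node, neighbors in adjacency.items():
            # Net energy flowing into this node from its neighbors.
            flow = sum(energy[nb] - energy[node] for nb in neighbors)
            updated[node] = energy[node] + rate * flow
        energy = updated
        for node in adjacency:
            waveforms[node].append(energy[node])
    return waveforms


# A 4-node path graph: 0 - 1 - 2 - 3
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
waves = auralize_sketch(graph, source=0, steps=200)
final = {node: trace[-1] for node, trace in waves.items()}
```

In this toy graph the total energy is conserved, and after enough steps every node holds roughly the same share, which mirrors the "evenly distributed" stopping condition. Nodes in different positions reach that state along different trajectories, and those trajectories are what the waveforms capture.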
In this repository, we present NetAurHPD results on common hyperlink prediction tasks, as demonstrated in "A Survey on Hyperlink Prediction".
Performing prediction using NetAurHPD requires three steps:
- Run network auralization to obtain a waveform for each node.
- Average the node waveforms to represent hyperlinks.
- Train M5 as a classifier.
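The averaging step can be sketched as follows. This is a minimal illustration in plain Python, assuming node waveforms are stored as equal-length lists keyed by node ID; the helper name `hyperlink_waveform` is hypothetical and the repository may implement this step differently (for example, on tensors).

```python
def hyperlink_waveform(node_waveforms, nodes):
    """Average the waveforms of the given nodes element-wise."""
    traces = [node_waveforms[n] for n in nodes]
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all node waveforms must have the same length")
    # zip(*traces) groups the values of all nodes at the same time step.
    return [sum(values) / len(traces) for values in zip(*traces)]


# Toy waveforms of length 3 for two nodes of one hyperlink.
node_waveforms = {
    61: [1.0, 0.5, 0.25],
    108: [0.0, 0.5, 0.75],
}
signal = hyperlink_waveform(node_waveforms, [61, 108])
```

The result has the same length as the node waveforms, so hyperlinks of any size map to a fixed-length input for the classifier.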
Default configurations in the config file:
- alpha = 0.5, proportion of genuine nodes to retain in the negative samples.
- beta = 1, ratio between positive and negative samples.
- l = 10,000, waveform length.
- train_size = 0.6
- stride = 8, sliding window step in M5.
- n_channel = 32, number of output channels for the M5 layer.
- epochs = 50, training iterations.
- lr = 0.01, learning rate.
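For reference, the defaults above could be grouped in a single object like the sketch below. The layout of the actual config file is not shown here, so the `Config` dataclass is a hypothetical arrangement; only the names and values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical container; values mirror the documented defaults.
    alpha: float = 0.5       # proportion of genuine nodes kept in negative samples
    beta: float = 1          # ratio between positive and negative samples
    l: int = 10_000          # waveform length
    train_size: float = 0.6  # assumed to be the fraction of samples used for training
    stride: int = 8          # sliding window step in M5
    n_channel: int = 32      # number of output channels for the M5 layer
    epochs: int = 50         # training iterations
    lr: float = 0.01         # learning rate


config = Config()
```

With this layout, the values are read as attributes, for example `config.stride` and `config.n_channel`.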
Each sample should have the form ID: {'label': 'positive', 'nodes': [1, 2]}, where the label is either 'positive' or 'negative'.
For example:
{1185: {'label': 'positive', 'nodes': [61, 108]},
793: {'label': 'positive', 'nodes': [58, 78]},
1139: {'label': 'negative', 'nodes': [101, 112]}}
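A dictionary in this shape can be unpacked into parallel lists of IDs, node lists, and numeric labels, as in the sketch below. The helper `split_samples` and the 1/0 label encoding are assumptions made for illustration; the repository may build `y_train` and `y_test` differently.

```python
samples = {
    1185: {'label': 'positive', 'nodes': [61, 108]},
    793: {'label': 'positive', 'nodes': [58, 78]},
    1139: {'label': 'negative', 'nodes': [101, 112]},
}


def split_samples(samples):
    """Return (ids, node_lists, labels) with labels encoded as 1/0."""
    ids, node_lists, labels = [], [], []
    for sample_id, sample in samples.items():
        if sample['label'] not in ('positive', 'negative'):
            raise ValueError(f"unexpected label for sample {sample_id}")
        ids.append(sample_id)
        node_lists.append(list(sample['nodes']))
        # Assumed encoding: positive -> 1, negative -> 0.
        labels.append(1 if sample['label'] == 'positive' else 0)
    return ids, node_lists, labels


ids, node_lists, labels = split_samples(samples)
```

The label list can then be converted into a tensor before it is passed to the model.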
Example showing how to perform prediction using NetAurHPD:
""" Required inputs:
train_positive_hyperlink_dict
train_hyperlink_dict
test_hyperlink_dict
y_train (tensor)
y_test (tensor)
nodes (list)
"""
import torch

# Assumption: device is a torch device and is passed by keyword.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 1 and 2: auralize the network and build hyperlink waveforms.
instrument = SignalPropagation(momentum=0.999, response_len=10000, tqdm=lambda x: x, device=device)
train_hyperlinks_waveforms, test_hyperlinks_waveforms = instrument.networkx_auralization(
    train_positive_hyperlink_dict, train_hyperlink_dict, test_hyperlink_dict, nodes, how_graph=True)

# Step 3: train M5 on the training waveforms and predict on the test waveforms.
NetAurHPD_DL_architecture = NetAurHPD_M5(n_input=1, n_output=1, stride=config.stride,
                                         n_channel=config.n_channel)
y_pred = NetAurHPD_DL_architecture.predict(train_hyperlinks_waveforms, y_train,
                                           test_hyperlinks_waveforms, y_test, lr=config.lr,
                                           total_iters=config.epochs)
The data_preprocess and create_train_and_test_sets functions load and transform data into suitable training and test sets for model training.
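As a rough illustration of the splitting step, the sketch below shuffles sample IDs and divides them according to a train_size fraction. It is not the repository's create_train_and_test_sets function; the helper `split_ids` and its `seed` parameter are hypothetical.

```python
import random


def split_ids(sample_ids, train_size=0.6, seed=0):
    """Shuffle sample IDs and split them into train and test lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    cut = round(train_size * len(ids))
    return ids[:cut], ids[cut:]


train_ids, test_ids = split_ids(range(10))
```

With ten samples and train_size = 0.6, six IDs go to the training set and four to the test set.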
The SignalPropagation class implements the Network Auralization method to learn the underlying graph structure. This module is also responsible for applying auralization to networkx graphs.
The component averages node signals into hyperlink waveforms for further analysis.
The M5 architecture is a very deep convolutional neural network designed for sound tasks. In this case, it is structured for binary classification tasks. This module is also responsible for training the M5 model and evaluating its performance on the dataset.
The config module contains various configurations and hyperparameters used throughout the project.
The utilities module includes the negative_sampling function, which generates negative hyperlinks to enhance the training dataset.
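One way negative hyperlinks could be generated is sketched below. It is an assumption-laden illustration, not the repository's negative_sampling function: for each positive hyperlink it keeps a fraction alpha of the genuine nodes (rounded down), replaces the rest with random nodes from outside the hyperlink, and produces beta candidates per positive, treating beta as an integer count. The name `negative_sampling_sketch` and the `seed` parameter are hypothetical.

```python
import random


def negative_sampling_sketch(positive_hyperlinks, all_nodes, alpha=0.5, beta=1, seed=0):
    """Corrupt each positive hyperlink to create negative candidates."""
    rng = random.Random(seed)
    existing = {frozenset(nodes) for nodes in positive_hyperlinks}
    negatives = []
    for nodes in positive_hyperlinks:
        n_keep = int(alpha * len(nodes))  # genuine nodes to retain, rounded down
        # Replacement nodes must come from outside the original hyperlink.
        pool = [n for n in all_nodes if n not in nodes]
        for _ in range(beta):
            kept = rng.sample(nodes, n_keep)
            # Raises ValueError if the pool is too small to fill the hyperlink.
            replaced = rng.sample(pool, len(nodes) - n_keep)
            candidate = kept + replaced
            # Skip candidates that happen to match a known positive hyperlink.
            if frozenset(candidate) not in existing:
                negatives.append(candidate)
    return negatives


positives = [[61, 108], [58, 78]]
all_nodes = list(range(50, 120))
negatives = negative_sampling_sketch(positives, all_nodes)
```

Keeping part of each genuine hyperlink makes the negatives harder to distinguish from positives than fully random node sets would be.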
The code was implemented in Python 3.9. All requirements are listed in the requirements.txt file.