Skip to content

Latest commit

 

History

History

data

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

README - Datasets

The data folder contains all the information ocncerning the datasets used in the study (i.e., drug-target interactions, drug-substructure annotations and drug-drug chemical similarity). The directory structure has the following convention:

  • The DT folder contains the drug-target interaction matrices for each dataset
  • The DS folder contains the drug-substructure interaction matrices for each combination of dataset and molecular descriptor
  • The DD folder contains the drug-drug similarity matrices for each combination of dataset and molecular descriptor/similarity measure
  • The SMILES folder contains the SMILES for the drugs present in each dataset
  • The LUT folder contains the look-up tables for drugs and targets of each dataset

The folder data/yamanishi2008 contains all the information of the 4 gold-standard datasets proposed by Yamanishi, et al (2008): Enzyme (denoted as e throughout the files), Ion Channel (denoted as ic throughout the files), GPCR (denoted as gpcr throughout the files) and Nuclear Receptor (denoted as nr throughout the files). The folder data/wu2017 contains all the information for the Global dataset proposed by Wu, et al (2017). The folder data/chembl contains all the information for the ChEMBL Clinical Candidates & Drugs dataset used in the time-split validation.