VeRNAml

The overall goal of this project is to provide a benchmark dataset for 2.5D RNA graphs. We demonstrate its use with baseline results on predicting RNA binding-site nodes, and we use motif fingerprints to compare motif sets generated by automated motif-finding programs. We rely on interpretable prediction models at first (i.e., decision trees) to determine motif importance. The intended results are cleaned and curated data sources; tuned parameters for VeRNAl motif extraction; trained classification models for RNA-{protein/RNA/small molecule/function} prediction; and novel functional insights on conserved RNA structural patterns.

Associated Repositories:

VeRNAl RNAMigos

The training data for VeRNAml consists of networkx graphs that are sliced into portions containing RNA interfaces and their respective complements. The graphs are '2.5D': tertiary structure information is preserved by retaining a discrete set of edge types corresponding to the different possible base-pairing geometries. Here is an example of one overlaid on a PDB structure. Backbones are in white, canonical Watson-Crick bonds are in green and non-canonical bonds are in red.
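As a rough illustration of what "a discrete set of edge types" means, the sketch below stores a toy graph as a list of labelled edges and groups the labels into the three classes used in the colouring above. It uses plain Python rather than networkx so that it runs on its own. The node identifiers and the label names (B53, CWW, TSH) are assumptions made for the example and are not read from the dataset.

```python
from collections import Counter

# Toy 2.5D graph: each edge is (node_u, node_v, label).
# Node names and labels are hypothetical; the labels imitate a
# Leontis-Westhof-style vocabulary (assumed, not taken from the data):
#   B53 = backbone, CWW = canonical cis Watson-Crick pair,
#   anything else = non-canonical pair.
TOY_EDGES = [
    ("A.1", "A.2", "B53"),
    ("A.2", "A.3", "B53"),
    ("A.3", "A.4", "B53"),
    ("A.4", "A.5", "B53"),
    ("A.5", "A.6", "B53"),
    ("A.1", "A.6", "CWW"),
    ("A.2", "A.5", "TSH"),
]


def edge_class(label):
    """Map a discrete edge label to one of three coarse classes."""
    if label == "B53":
        return "backbone"
    if label == "CWW":
        return "canonical"
    return "non-canonical"


def class_counts(edges):
    """Count how many edges fall into each coarse class."""
    return Counter(edge_class(label) for _, _, label in edges)


counts = class_counts(TOY_EDGES)
print(dict(counts))
```

Because the edge labels are categorical rather than geometric coordinates, two graphs can be compared by their label patterns alone.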

1. FR3D Data

To generate this data:

Retrieve a representative set of RCSB PDB structures.
Find all interfaces within structures.
Slice native RNA graphs into interface and complement parts.
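The slicing step can be sketched as follows, under the assumption that the interface part is the subgraph induced by the interface residues and the complement is the subgraph induced by all remaining residues. The function names and the toy data are hypothetical, and the scripts in prepare_data may define the complement differently.

```python
def induced_edges(edges, nodes):
    """Return the edges whose two endpoints both lie in `nodes`."""
    return [(u, v, label) for u, v, label in edges if u in nodes and v in nodes]


def slice_graph(edges, interface_nodes):
    """Split a labelled edge list into (interface, complement) edge lists.

    Assumption: the complement is induced by every node that is not an
    interface node, so edges crossing the boundary are dropped from both parts.
    """
    all_nodes = {n for u, v, _ in edges for n in (u, v)}
    interface = all_nodes & set(interface_nodes)
    complement = all_nodes - interface
    return induced_edges(edges, interface), induced_edges(edges, complement)


# Hypothetical five-residue chain with one base pair closing it.
edges = [
    (1, 2, "B53"), (2, 3, "B53"), (3, 4, "B53"), (4, 5, "B53"), (1, 5, "CWW"),
]
interface_part, complement_part = slice_graph(edges, interface_nodes={1, 2})
print(interface_part)
print(complement_part)
```

With networkx graphs the same induced-subgraph operation is available as G.subgraph(nodes).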

The prepare_data package contains all the scripts for these tasks. The process can take some time, so alternatively the following pre-built datasets can be downloaded from MEGA:

Dataset	Graphs	Nodes	Edges	Avg. Nodes	Avg. Edges	Links
ALL	2679	447225	641968	166.9	239.6	link
ALL complement	9034	195395	228261	21.6	25.3
RNA-Protein	2750	411487	587961	149.6	213.8	link
RNA-Protein complement	8265	241611	322324	29.2	39.0
RNA-RNA	2737	59333	79116	21.7	28.9	link
RNA-RNA complement	2483	55001	70551	22.2	28.4
RNA-Small_Mol.	166	981	1004	5.9	6.0	link
RNA-Small_Mol. complement	140	973	1038	7.0	7.4
RNA-Ion	572	3490	3764	6.1	6.6	link
RNA-Ion complement	493	3691	3993	7.5	8.1
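The summary columns can be recomputed for any collection of graphs. The sketch below assumes each graph has already been reduced to a (number of nodes, number of edges) pair; the function name and the input format are hypothetical. The averages are totals divided by the number of graphs, which agrees with the table: for the ALL row, 447225 / 2679 ≈ 166.9 and 641968 / 2679 ≈ 239.6.

```python
def summarize(sizes):
    """Summarize a dataset given one (n_nodes, n_edges) pair per graph."""
    sizes = list(sizes)
    n_graphs = len(sizes)
    total_nodes = sum(n for n, _ in sizes)
    total_edges = sum(e for _, e in sizes)
    return {
        "graphs": n_graphs,
        "nodes": total_nodes,
        "edges": total_edges,
        # Averages are per graph, rounded to one decimal as in the table.
        "avg_nodes": round(total_nodes / n_graphs, 1) if n_graphs else 0.0,
        "avg_edges": round(total_edges / n_graphs, 1) if n_graphs else 0.0,
    }


# Hypothetical sizes for three small graphs.
summary = summarize([(6, 7), (5, 5), (10, 12)])
print(summary)
```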

1.1 Retrieve a Representative Set of PDB Structures

To avoid redundancy in the training data, the BGSU representative set of RNA structures is used. It can be downloaded from here [1].

Make a directory to store the structures

mkdir data/structures

Then run the following command to retrieve the PDB structures from the RCSB database

python prepare_data/retrieve_structures.py <BGSU file> data/structures
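For orientation, the retrieval step amounts to reading PDB identifiers from the representative-set file and fetching each structure. The sketch below covers only the first half and is not the code in retrieve_structures.py. It assumes a CSV whose second column holds a representative such as 1ABC|1|A (PDB ID, model, chain), possibly with several chains joined by +, and it assumes the RCSB download URL pattern shown in the code. Both assumptions should be checked against the file you actually downloaded.

```python
import csv
import io

# Assumed URL pattern for RCSB structure downloads (mmCIF format).
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def representative_pdb_ids(csv_text):
    """Collect unique PDB IDs from the second column of a BGSU-style CSV."""
    ids = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # "1ABC|1|A+1ABC|1|B" -> keep the ID before the first "|"
        pdb_id = row[1].split("|", 1)[0].strip().upper()
        if pdb_id and pdb_id not in ids:
            ids.append(pdb_id)
    return ids


# Made-up example rows; a real file is much longer.
example = (
    '"NR_4.0_00001.1","1ABC|1|A","1ABC|1|A,2XYZ|1|B"\n'
    '"NR_4.0_00002.1","3DEF|1|A+3DEF|1|B","3DEF|1|A+3DEF|1|B"\n'
)
urls = [RCSB_URL.format(pdb_id=i) for i in representative_pdb_ids(example)]
print(urls)
```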

1.2 Find Interfaces in the PDB structures and Slice their RNA graphs

Make a directory for the native graphs and the interface graphs

mkdir data/graphs

mkdir data/graphs/interfaces

mkdir data/graphs/native

Download the set of native RNA graphs from here and extract the compressed files into the native directory.

Now run prepare_data/main.py to find all the interfaces and slice the graphs. This process will take a few hours.

python prepare_data/main.py data/graphs/interfaces

Note

An optional parameter -t can be added to specify the RNA interaction type. The default is all, but it can be any of rna, protein, ion or ligand. For multiple interaction types, pass a quoted string with the types separated by spaces.
Once the PDB interfaces have been found, the script can be rerun with the -interface_list_input interface_residues_list.csv option to reuse the interfaces computed by the previous call and speed up execution.
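To make the quoting of multiple interaction types concrete, the sketch below assembles the command line in Python. Only the script path, the output directory, -t and -interface_list_input come from the text above; the helper itself and the order in which the options are appended are assumptions.

```python
import shlex


def build_command(out_dir, types=None, interface_list=None):
    """Assemble an argument list for prepare_data/main.py (hypothetical helper)."""
    cmd = ["python", "prepare_data/main.py", out_dir]
    if types:
        # Several types are passed as ONE argument, separated by spaces.
        cmd += ["-t", " ".join(types)]
    if interface_list:
        cmd += ["-interface_list_input", interface_list]
    return cmd


cmd = build_command("data/graphs/interfaces", types=["rna", "protein"])
print(shlex.join(cmd))  # shell-quoted form of the same command
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the repository root.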

2. DSSR Data

The code base for preparing the DSSR data is stored in another repository, RNAGlib, which has not been published yet. For now, the most recent version of the data can be downloaded here.

References

[1] Leontis, N. B., & Zirbel, C. L. (2012). Nonredundant 3D Structure Datasets for RNA Knowledge Extraction and Benchmarking. In N. Leontis & E. Westhof (Eds.), RNA 3D Structure Analysis and Prediction (Vol. 27, pp. 281–298). Springer Berlin Heidelberg. doi:10.1007/978-3-642-25740-7_13
[2] Lu, X. J., & Olson, W. K. (2008). 3DNA: A versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nature Protocols, 3, 1213–1227. ISSN 17542189. http://3dna.rutgers.edu/
