examples/
). The following dependencies are needed to start the pipeline. To start the pipeline, several parameters are needed, given in a file (inputvalues.dat; see the respective example folder). Before starting, the paths to Ernwin, SimRNA and the pipeline itself must be updated in the inputvalues.dat file.
./start.sh examples/from_2D/test/inputvalues_test.dat
example/
folders. The raw pipeline result consists of numerous SimRNA trajectories containing each sampled 3D structure. To facilitate the analysis of this large collection of 3D structures, we translate each structure back to 2D and generate a csv file for each ncluster x nrun x nsim. These csv files provide detailed information, particularly regarding the formation of base pairs:
- nstep
- dotbracket representation
- SimRNA based values:
  - energy
  - energy plus constraint
  - temperature
- "constancy": nstep until bps change in the structure
- basepairlist of
  - whole structure
  - the interaction site
  - intramolecular bps of chain-A
  - intramolecular bps of chain-B
  - difference to the start structure for this extension step
  - difference to the constrained structure
  - difference to the constrained interaction site
  - difference to the nstep structure before
  - intermolecular bps that do not belong to the main interaction, e.g. separated by intramolecular bps
  - multiplets
  - bps that occur neither in the start nor in the target structure
- basepairs count of
  - chain-A
  - chain-B
  - interaction length (perfect helix)
  - interaction length with loops allowed
  - intermolecular bps that do not belong to the main interaction
  - bps that occur neither in the start nor in the target structure
- count of bps that differ to
  - the start structure for this expansion step
  - the nstep structure before
  - the constrained structure
  - the constrained interaction
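The "difference" and "count of bps that differ" columns compare two base pair lists. The following is a minimal sketch of such a comparison, assuming each structure is available as a list of 1-based [i, j] pairs. The function names `bp_set` and `compare_bps` are hypothetical and are not taken from the pipeline scripts, which may define the difference differently.

```python
def bp_set(pairs):
    """Normalise a base pair list to a set of (i, j) tuples with i < j."""
    return {tuple(sorted(pair)) for pair in pairs}


def compare_bps(reference, current):
    """Return the base pairs lost and gained relative to a reference structure."""
    ref, cur = bp_set(reference), bp_set(current)
    return {
        "lost": sorted(ref - cur),      # present in the reference only
        "gained": sorted(cur - ref),    # present in the current structure only
        "n_different": len(ref ^ cur),  # size of the symmetric difference
    }


if __name__ == "__main__":
    start = [[1, 8], [2, 7]]    # hypothetical start structure
    sampled = [[1, 8], [3, 6]]  # hypothetical sampled structure
    print(compare_bps(start, sampled))
```

The same helper can be applied to the start structure, the constrained structure, or the structure of the previous step, depending on which reference is of interest.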
Additionally, for each ncluster x nrun, two summaries are provided. The first summary, stored in a .csv_bp file, includes all occurring base pairs and their frequencies. The second summary focuses on the frequency of dotbracket structures.
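To illustrate what these two summaries contain, here is a minimal sketch that counts how often each dotbracket string and each base pair occurs in a set of sampled structures. The input layout (a list of `(dotbracket, base_pair_list)` tuples) and the function name `summarise` are assumptions made for this example, not the format used by the pipeline.

```python
from collections import Counter


def summarise(structures):
    """Summarise sampled structures.

    structures: list of (dotbracket, base_pair_list) tuples, one per sample.
    Returns (dotbracket_counts, base_pair_frequencies), where the frequency
    of a base pair is the fraction of samples that contain it.
    """
    n = len(structures)
    if n == 0:
        return Counter(), {}
    dotbracket_counts = Counter(db for db, _ in structures)
    bp_counts = Counter(tuple(bp) for _, bps in structures for bp in bps)
    bp_frequencies = {bp: count / n for bp, count in bp_counts.items()}
    return dotbracket_counts, bp_frequencies


if __name__ == "__main__":
    samples = [
        ("((..))", [[1, 6], [2, 5]]),
        ("((..))", [[1, 6], [2, 5]]),
        ("(....)", [[1, 6]]),
        ("......", []),
    ]
    counts, freqs = summarise(samples)
    print(counts.most_common())
    print(freqs)
```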
Furthermore, after each extension step, the .interaction-csv file contains all structures that are considered for further extension. The best structure, which is the first entry in the .csv file, is selected. This selected structure is then translated into a full-atom PDB format. For more details on the selection process, please see selectnext.py.
start.sh (e.g. for all examples in examples/from_2D/). Second, it is also possible to start from an already existing 3D structure in PDB format. The script startexpansion.sh is used for this purpose (e.g. for the HIV kissing hairpin interaction in examples/from_pdb/). Both start options allow you to customise the pipeline conditions of the interaction extension through the parameters defined in the inputvalues.dat file.
-
inputvalues.dat
| VARIABLE | VALUE/SAMPLE | DESCRIPTION | more Details |
|---|---|---|---|
| START | /pathto/RRI-3D/examples/from_2D | Input path for all structure conditional files which are needed for the start, and output path | [required files] |
| BASENAME | test0 | File/Structure name for the RNAdesign | |
| NAME | test0 | Core name of the file/structure | |
| PROGS | /pathto/RRI-3D/src | Path to this git repository and its scripts | [scripts] |
| DESIGNS | 3 | Specifies how many different RNA designs of this structure should be created and calculated | [RNAblueprint] |
| ERNWIN | /pathto/ernwin | Path to ernwin | [dependency] |
| ERNITERATIONS | 100000 | Number of structures to generate during the ernwin simulation | [ernwin] |
| ERNROUND | 10000 | Save the best (lowest rmsd) n structures during an ernwin simulation | [ernwin] |
| FALLBACKSTATES | true \| false | Additional short artificial structures that can be used as fallback fragments if few or no examples of a secondary structure element (ernwin) could be found in the PDB. | [ernwin] |
| CLUSTER | 10 | ncluster; Cluster the ernwin structures based on the used coarse grained elements | [ernwin-script] |
| SIMRNA | /pathto/simrna | Path to simrna | [dependency] |
| WHERE | local \| cluster | Run the SimRNA simulation locally or on a slurm cluster | [dependency] |
| SIMROUND | 5 | nrun; Number of SimRNA runs with the same setting but a different seed. | [SimRNA] |
| TREESEARCH | true \| false | | |
| SEED | step \| random | Setting the SimRNA seed; step corresponds to the respective SIMROUND | [SimRNA] |
| TYPE | expand | | [SimRNA] |
| RELAX | relax_test | SimRNA settings for a first relaxed run. E.g. examples/from_2D/config_relax_test.dat. | [SimRNA] |
| EXTEND | expand_test | SimRNA settings for the expansion mode. E.g. examples/from_2D/config_expand_test.dat. | [SimRNA] |
| ROUND | 0 | Start with Round 0 | [expansion settings] |
| ROUNDS | 100 | Number of expansion rounds. A value > the possible interaction length corresponds to an automatic extension up to the longest continuous (perfect helix) interaction (exception: TARGET = true). | [expansion settings] |
| TARGET | true \| false | Instead of an expansion until there is no more complementarity, it is based on a target interaction. | |
| BUFFER | 2 | Length of linker/buffer region around the interaction site | [expansion settings] |
| EXPANDBMODE | 1-7 | - right and left at once<br>- only right<br>- only left<br>- alternate right and left<br>- alternate left and right<br>- first right then left ; 1 and then 2<br>- first left then right ; 2 and then 1<br>- provided dotbracket notation with all intermediates | [expansion settings] |
| CONSECUTIVEPERFECT | true \| false | Selection of the best/longest interaction for the next expansion based on a consecutive interaction / an interaction incl. bulges | [selectnext] |
| CONTSEARCH1 | force \| interaction \| | Structure selection type after the relax-run | [selectnext] |
| CONTSEARCH2 | force \| interaction \| | Structure selection type for the next expansion step | [selectnext] |

- Structure information
- Each filename must consist of the BASENAME and the respective ending (see below) and must be stored in the START directory.
The nucleotide sequence must be written in capital letters. Allowed are the four nucleobases: adenine, guanine, cytosine, uracil.
- *.fa
- FASTA-FILE for ernwin_start
- *.seq
- Usage: SimRNA & pipeline-scripts
- *.ss
- Usage: SimRNA & pipeline-scripts
Represents the secondary structure constraint for the current extension round - *.ss_cc
- Usage: SimRNA & pipeline-scripts
Secondary structure constraint from the last extension round - *.il
- *_target.ss
- Usage: expansion settings
Secondary structure to be reached
>Name >test0
NAME = BASE NAME Sequence CUUGCUGAAGUGCACACAGCAAG&CUUGCUGAAGUGCACACAGCAAG
The separator between two sequences is a & character Dotbracket (((((((..[[.....)))))))&.............]]........
Single line dotbracket notation; each pseudoknot/interaction is represented by a new bracket type, e.g. [ ], { }, < >
Sequence CUUGCUGAAGUGCACACAGCAAG CUUGCUGAAGUGCACACAGCAAG
Same sequence as in fasta file but the separator between two sequences is a whitespace Dotbracket ((((((.........))))))) (((((((.........)))))))
........((((........... .........)))).........
Dotbracket notation with classical round brackets and dots.
A "bracket" crossing requires the start of a new line, e.g. 1st line intramolecular structure, 2nd line interaction.
For the start of the simulation the .ss-file contains the native start dotbracket notation.
Dotbracket ((((((.........))))))) (((((((.........)))))))
........((............ ..........))..........
In the previous extension step achieved secondary structure.
For the first run it must conform to the .ss dotbracket notation by default.
- If no extension (no longer interaction) compared to the previous run is recorded in this file, the respective run stops.
Control file which specifies how many base pairs make up the extended interaction of the "best" structure of a run.
Must contain a 0 at the beginning.
.ss (((((((.........))))))) (((((((.........)))))))
.........((((((........ .........))))))........
_target.ss (((((((..((((((.((((((( )))))))..)))))).)))))))
Without a target structure (TARGET = FALSE), the extension stops when no more complementary base pairing is possible. A target structure is therefore needed if the extended target interaction contains bulges.
- config.dat
- The config.dat file contains parameters for the SimRNA simulations, e.g. how many nstep should be made per nsim. SimRNA comes with a default config.dat file (see dependencies), but it is recommended to customise it for use with the pipeline. This can be done separately for the relaxation run after simulating the start structure in Ernwin (input variable: RELAX) and for the runs that extend the interaction site (input variable: EXTEND).
In the folder src/SimRNA_config you can find several example .dat files. If you want to use these configurations, please copy them into the original SimRNA folder, or adapt the config.dat file in the original SimRNA folder individually and according to the pipeline.
The expansion can be started from a dotbracket structure (SimRNA format), as well as from a base pair list. Allowed are complementary (A-U, G-C) as well as G-U base pairings. By default, the interaction is extended by the closest base pair (without bulge). If no extension is possible in the respective step, the simulation stops. If a bulge is desired or structurally necessary, it is recommended to specify a target structure to extend to.
An extension can be done to both sides of the interaction simultaneously, as well as in only one chain direction. Another option is to extend the interaction by several base pairs in one step. Furthermore, a buffer/linker region without base pairing between intramolecular and intermolecular structure can be specified.
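The extension rule described above can be sketched as follows. This is an illustration only, not the pipeline's implementation: the function names, the 1-based `(i, j)` pair convention and the assumption of a single antiparallel intermolecular helix are all choices made for this example, and neither the buffer region nor existing intramolecular base pairs are taken into account.

```python
# Watson-Crick pairs plus the G-U wobble pair, as allowed by the pipeline.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """Return True if nucleotides x and y may form a base pair."""
    return (x, y) in ALLOWED_PAIRS


def extend_interaction(seq_a, seq_b, pairs, stepsize=1):
    """Propose new base pairs directly adjacent to an intermolecular helix.

    pairs holds 1-based (i, j) tuples with i in seq_a and j in seq_b.
    An empty result means no extension is possible without a bulge.
    """
    i_min = min(i for i, _ in pairs)
    i_max = max(i for i, _ in pairs)
    j_min = min(j for _, j in pairs)
    j_max = max(j for _, j in pairs)
    new_pairs = []
    # Grow towards lower indices of seq_a and higher indices of seq_b.
    for k in range(1, stepsize + 1):
        i, j = i_min - k, j_max + k
        if i < 1 or j > len(seq_b) or not can_pair(seq_a[i - 1], seq_b[j - 1]):
            break
        new_pairs.append((i, j))
    # Grow towards higher indices of seq_a and lower indices of seq_b.
    for k in range(1, stepsize + 1):
        i, j = i_max + k, j_min - k
        if i > len(seq_a) or j < 1 or not can_pair(seq_a[i - 1], seq_b[j - 1]):
            break
        new_pairs.append((i, j))
    return new_pairs


if __name__ == "__main__":
    seq = "CUUGCUGAAGUGCACACAGCAAG"  # test0 sequence, used for both chains
    start = [(10, 15), (11, 14)]     # the [[ ]] interaction of the test0 example
    print(extend_interaction(seq, seq, start, stepsize=1))
```

Stopping at the first position that cannot pair mirrors the default behaviour without bulges; an extension across a bulge would need a target structure instead.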
The following parsing options can be selected:
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -d | --dotbracket | path to file | none | Dotbracket structure in SimRNA style |
| -x | --basepairlist | path to file | none | Basepair list, e.g. ((....)) --> [[1,8],[2,7]] |
| -n | --nucleotides | path to file | none | Nucleotide sequence |
| -o | --output | path to file/filename | none | Path and name of the output file |
| -t | --target | path to file | none | End/Target structure |
| -s | --stepsize | int | 1 | How many nucleotides should be added to the interaction (on one side). |
| -r | --right | boolean | default both TRUE | Expand right |
| -l | --left | boolean | default both TRUE | Expand left |
| -b | --buffer | int | 0 | Length of the buffer/linker region, no intra- and interaction allowed, before and after the interaction site. |
| -v | --verbose | store_true | FALSE | Be verbose |
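The two input representations (`-d` and `-x`) carry the same information. The sketch below converts a multi-line, round-bracket dotbracket structure into a base pair list. It assumes that positions are numbered consecutively across both chains and that the whitespace between chains is not counted; the function name is hypothetical, and the script itself may number positions differently.

```python
def dotbracket_to_pairs(lines):
    """Convert round-bracket dotbracket lines to a sorted base pair list.

    lines: iterable of strings, one per dotbracket line. Each line is
    parsed on its own, so crossing pairs may be placed on separate lines.
    Positions are 1-based; whitespace (the chain separator) is skipped.
    """
    pairs = []
    for line in lines:
        stack = []
        position = 0
        for char in line:
            if char.isspace():
                continue
            position += 1
            if char == "(":
                stack.append(position)
            elif char == ")":
                if not stack:
                    raise ValueError(f"unbalanced ')' at position {position}")
                pairs.append([stack.pop(), position])
        if stack:
            raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(dotbracket_to_pairs(["((....))"]))
    # Two chains of six nucleotides: line 1 intramolecular, line 2 interaction.
    print(dotbracket_to_pairs(["((..)) ((..))", "..(... ...).."]))
```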
Further Descriptions & Examples
Expand the interaction right (-r) or left (-l):
((((.............)))) ((((.(((...............)))))))
....R(((((((((((L.... .........L)))))))))))R........
-b 2 / --buffer 2
(---............---)) ((((.((---...........---))))))
....R(((((((((((L.... .........L)))))))))))R........
RNAblueprint is a library for designing sequences that are compatible with multiple structural constraints. This allows us to generate multi-stable RNAs, i.e. RNAs that switch between several pre-defined structures.
The main function performs a simple optimization using simulated annealing. The crucial part is the objective() function, which is now designed such that it becomes minimal when the Boltzmann ensemble is dominated by the two target structures.
The following parsing options can be selected:
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -i | --input | path to file | testinput (used if no input is given) | Secondary Structure - SimRNA format |
| -o | --output | filename | design | Name of the output files. The designs will be saved with the following filename: name + 'design' + consecutive designnumber.seq |
| -n | --number | int | 10 | Number of designs |
| -s | --selection | int | 5 | Number of selected designs that will be saved as .seq file |
| -v | --verbose | store_true | FALSE | Be verbose |
Further Descriptions & Examples
- The input:
The first structure (1) describes the two separate hairpins with a connection element (A); the second structure (2) should ensure the complementarity cleavage of the two hairpins. With the objective2 function, every designed hairpin is evaluated separately.
1 (((((((.........))))))) (((((((.........)))))))
2 ((((((((((((((((((((((( )))))))))))))))))))))))
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -p | --path | path to files | none | Path to input files: *_0.ss, *.seq |
| -n | --name | filename | none | BASENAME |
| -c | --count | int | none | Number of samples/designs |
| -v | --verbose | store_true | FALSE | Be verbose |
- Testinput
> python formattranslation.py -p PATHtoINPUTFILES -n NAMEofINPUTFILES -c 100
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -i | --input | path to files | none | Path to the ernwin .coord files |
| -n | --number | int | none | Number of saved ernwin structures = --save-n-best in ernwin call |
| -c | --cluster | int | none | Number of clusters |
| -v | --verbose | store_true | FALSE | Be verbose |
Output: number of the ernwin sample
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -i | --input | path to files | none | Path to the ernwin out.log file |
| -v | --verbose | store_true | FALSE | Be verbose |
out.log
file: Step, Sampling_Energy, Constituing_Energies, ROG, ACC, Asphericity, Anisotropy, Local-Coverage, Tracked Energy, Tracked Energy, Tracked Energy, time, Sampling Move, Rej.Clashes, Rej.BadMls
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -i | --input | name of the input file | none | Input .trafl, an output file from SimRNA |
| -p | --path | path for the input/output file | none | |
| -o | --output | name of the output file | none | Output .trafl |
| -v | --verbose | store_true | FALSE | Be verbose |
trafl
file: consec_write_number, replica_number, energy_value_plus_restraints_score, energy_value, current_temperature, datapoints
SimRNA also provides a function to read/write the structure with the minimum free energy from a trafl file, but there the energy_value is used without the constraint, whereas this script mainly uses the energy value plus the constraint score. Also, the SimRNA script is only available in a python3 environment. Alternatively, this script can be used.
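The selection of a minimum-energy frame can be sketched as below. The sketch assumes that each frame in a trafl file occupies two lines: a header with the five values listed above, followed by one line of datapoints. The function names are hypothetical; this is neither the SimRNA script nor the script from this repository.

```python
FIELDS = ("consec_write_number", "replica_number",
          "energy_value_plus_restraints_score", "energy_value",
          "current_temperature")


def read_trafl_frames(lines):
    """Yield (header, datapoints) tuples; a frame spans two non-empty lines."""
    content = (line.strip() for line in lines if line.strip())
    for header_line in content:
        datapoints = next(content, None)
        if datapoints is None:
            raise ValueError("input ends with a header but no datapoints")
        values = header_line.split()
        if len(values) != len(FIELDS):
            raise ValueError(f"unexpected header line: {header_line!r}")
        header = {
            "consec_write_number": int(values[0]),
            "replica_number": int(values[1]),
            "energy_value_plus_restraints_score": float(values[2]),
            "energy_value": float(values[3]),
            "current_temperature": float(values[4]),
        }
        yield header, datapoints


def lowest_energy_frame(lines, key="energy_value_plus_restraints_score"):
    """Return the frame whose header field `key` is minimal."""
    return min(read_trafl_frames(lines), key=lambda frame: frame[0][key])


if __name__ == "__main__":
    demo = [  # made-up values, for illustration only
        "1 1 -10.5 -12.0 1.35", "0.0 0.0 0.0",
        "2 1 -20.25 -15.0 1.35", "1.0 1.0 1.0",
        "3 1 -18.0 -30.0 1.35", "2.0 2.0 2.0",
    ]
    print(lowest_energy_frame(demo)[0])
    print(lowest_energy_frame(demo, key="energy_value")[0])
```

Because the function accepts any iterable of lines, an open file handle can be passed directly. Switching `key` to `energy_value` selects by energy without the restraint score.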
The first output is a csv-file with the following information for each nsim in an extension step:
number, sequence, count_constraint, count_start, count_before, constancy, dif_constraint, dif_start, dif_before, bp, time, energy_values_plus_restraint_score, energy_value, current_temp, interaction, len_interaction, count_interaction_constraint, dif_interaction_cc
The second output is a csv file with all unique structures collected over all nsim in an extension step:
sequence, count_how_often, count_constraint, count_start, dif_constraint, dif_start, bp, bpstr, interaction, len_interaction, count_interaction_constraint, dif_interaction_cc
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -p | --path | path to file | none | Path to SimRNA files for input |
| -i | --input | filename | none | Input ss-sequence |
| -c | --constraint | filename | none | Constrained ss-sequence |
| -t | --trafl | filename | none | Trafl file |
| -o | --output | filename | none | Name of the output file |
| -m | --outputmode | choices = 'w', 'a' | 'w' | Overwrite ('w') or append ('a') |
| -u | --uniqueoutput | filename | none | Name of the unique output file, or of the already existing one |
| -v | --verbose | store_true | FALSE | Be verbose |
Further Descriptions & Examples
> python comparison.py -p /place/with/all/ss-sequences -i ss-constrain -c ssstart -o firstoutput.csv -u secondoutput.csv -m 'w' -t traflfile
| FLAG | NAME | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|---|
| -p | --path | path to file | none | Path to input files |
| | --printout | store_true | none | Print a csv file with all minEnergy relevant files |
| -f | --force | store_true | none | Instead of the most common secondary structure: find the secondary structures most similar to the constrained one |
| | --interaction | store_true | none | Instead of the most common secondary structure: find the interaction structure most similar to the constrained one |
| | --first | name | none | Verify the first line in the dataframe - FILENAME for the first line, e.g. test0_00.ss |
| | --second | name | none | Verify the second line in the dataframe - FILENAME for the second line, e.g. test0_00.ss_cc |
| -i | --initialname | name | none | e.g. test0_01, test0_02, ... |
| -c | --consecutive | boolean | none | true/false, given through CONSECUTIVEPERFECT |
| -v | --verbose | store_true | FALSE | Be verbose |
Further Descriptions & Examples
>python selectnext.py -p 00/surface/analyse/ --print --first test0_00.ss --second test0_00.ss_cc -f
- python V3.11
- SimRNA V3.2
- Ernwin V1.2
- for RNA design:
- To open the PyMOL sessions (.pse) in the /example folder with selected 3D structures:
- - Standard packages: argparse, collections, csv, distutils, glob, itertools, json, logging, operator, optparse, os, random, re, sys, math
- - more-itertools V.9.0.0
- - numpy V.1.24.2
- - pandas V.1.5.3
- - scikit-learn V.1.2.1
- Note: The files supplied with the RRI-3D package under
src/SimRNA_config/config*
are example SimRNA configurations for this pipeline. If you want to use these please copy them into the original SimRNA folder or adapt the config.dat
file in the SimRNA folder individually and according to the pipeline, e.g. see section config.dat - - forgi V2.2.2
- - Note: incl. setup for all-atom reconstruction and fallbackstates
- - PyMol
Our pipeline's runtime is determined both by the product ncluster x nrun x nsim x nstep and by how many nrun are allowed to make a further extension step.
As an example of the runtime of our pipeline, consider the CopA--CopT simulation mentioned in our publication. It used the following parameters: ncluster=10, nrun=5, nsim=5, nstep=10000. The simulation ran for approximately 19 hours on up to 10 cores.
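For a rough feeling of the workload behind these numbers, the product of the four parameters can be computed directly. The sketch below assumes that the quantities simply multiply, as the expression ncluster x nrun x nsim x nstep suggests, and that every run proceeds; it gives a count of sampled steps per extension step, not a runtime prediction. The function name is hypothetical.

```python
def sampled_steps(ncluster, nrun, nsim, nstep):
    """Upper bound on sampled steps for one extension step."""
    return ncluster * nrun * nsim * nstep


if __name__ == "__main__":
    # Parameters of the CopA--CopT example.
    print(sampled_steps(ncluster=10, nrun=5, nsim=5, nstep=10000))  # 2500000
```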
- If you use this software package, please cite the following publication:
- For the pipeline presented here, parts of the following already published software-features are used: