NTF2Gen: An enumerative algorithm for full-atom models of proteins belonging to the NTF2-like superfamily
If you find the scripts in this repository useful, please cite https://doi.org/10.1073/pnas.2005412117.
This algorithm samples a wide diversity of protein structures by carrying out backbone
sampling at two levels. At the top level, sampling is carried out in the space of high-level
parameters that define the overall properties of the NTF2 fold: for example, the overall sheet length and
curvature, the lengths of the helices that complement the sheet, the placement of the pocket opening and
the presence or absence of C-terminal elements. We then convert each choice of high-level
parameters into structure blueprint/constraints pairs, which guide
backbone structure sampling at successive stages of fold assembly. In a final sequence design step,
low-energy sequences are identified for each generated backbone through combinatorial sequence
optimization using RosettaDesign.
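The top-level enumeration described above can be pictured as a Cartesian product over discrete parameter choices. The following is only a minimal sketch of that idea: the parameter names and values in `PARAMETER_SPACE` are hypothetical placeholders and are not taken from the NTF2Gen scripts, which define the real parameter set.

```python
from itertools import product

# Hypothetical parameter names and values, for illustration only.
# The actual high-level parameters are defined in the NTF2Gen scripts.
PARAMETER_SPACE = {
    "sheet_length": ["short", "long"],
    "helix3_length": [10, 12, 14],
    "has_cterm_element": [False, True],
}


def enumerate_parameter_sets(space):
    """Yield one dict per combination of high-level parameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


combos = list(enumerate_parameter_sets(PARAMETER_SPACE))
print(len(combos))  # 2 * 3 * 2 = 12 combinations
```

In the real protocol, each such combination would then be converted into blueprint/constraints pairs for backbone sampling; that step is not shown here.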
The NTF2Gen repository contains all the tools for de novo design of NTF2-like proteins. The main script
is CreateBeNTF2_backbone.py, which manages the construction of NTF2 backbones. It is followed by
DesignBeNTF2.py (in BeNTF2seq/Nonbinding; to design using PSSMs, use DesignBeNTF2_test1.py in BeNTF2seq/design_with_PSSM), which designs sequences for a backbone generated by the previous script. To
generate backbones from a specific set of parameters, use CreateBeNTF2PDBFromDict.py.
The fundamental building blocks of the backbone generation protocol are Rosetta XML protocols (included in the repository) that are specialized instances of the BlueprintBDRMover Rosetta fragment assembly mover. All backbone quality checks and filters applied prior to design are implemented either in the XML files or in the Python scripts. The design script is also based on a set of XML protocols, one for each design stage.
Glycine placement at highly curved strand positions and the selection of pocket positions are managed by DesignBeNTF2.py (or by DesignBeNTF2_test1.py in BeNTF2seq/design_with_PSSM when designing with PSSMs). Pocket positions are selected by placing a virtual atom at the midpoint between the H3-S3 connection and the S6 bulge, and choosing all positions whose Cα-Cβ vector points towards the virtual atom (the Vatom-Cα-Cβ angle is smaller than 90°) and whose Cα is closer than 8 Å to it.
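The geometric criterion for pocket positions can be sketched in plain Python. This is an illustration of the rule as described above, not the code used in DesignBeNTF2.py: the function names, the input format (a dict mapping residue number to Cα and Cβ coordinates) and the two reference points used to build the virtual atom are assumptions made for the example.

```python
import math


def midpoint(a, b):
    """Midpoint of two 3D points, used here to place the virtual atom."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def select_pocket_positions(residues, vatom, max_dist=8.0, max_angle_deg=90.0):
    """Return residue numbers whose CA is within max_dist of the virtual atom
    and whose Vatom-CA-CB angle (vertex at CA) is below max_angle_deg.

    residues: {resnum: (ca_xyz, cb_xyz)} -- hypothetical input format.
    Residues without a CB (e.g. glycine) are assumed to be left out.
    """
    selected = []
    for resnum, (ca, cb) in sorted(residues.items()):
        to_vatom = _sub(vatom, ca)   # CA -> virtual atom
        ca_cb = _sub(cb, ca)         # CA -> CB
        dist = _norm(to_vatom)
        if dist >= max_dist:
            continue
        denom = dist * _norm(ca_cb)
        if denom == 0.0:
            continue  # degenerate geometry, angle undefined
        cos_angle = max(-1.0, min(1.0, _dot(to_vatom, ca_cb) / denom))
        if math.degrees(math.acos(cos_angle)) < max_angle_deg:
            selected.append(resnum)
    return selected


# Toy coordinates: the two reference points stand in for the
# H3-S3 connection and the S6 bulge.
vatom = midpoint((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
toy_residues = {
    10: ((5.0, 0.0, 0.0), (4.0, 0.0, 0.0)),    # close, CB points at vatom
    11: ((5.0, 0.0, 0.0), (6.0, 0.0, 0.0)),    # close, CB points away
    12: ((12.0, 0.0, 0.0), (11.0, 0.0, 0.0)),  # CB points at vatom, too far
}
pocket = select_pocket_positions(toy_residues, vatom)
print(pocket)
```

With the 90° cutoff, the angle test is equivalent to requiring a positive dot product between the Cα→Vatom and Cα→Cβ vectors; the explicit angle is computed only so that the cutoff can be changed.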
pyrosetta*
pandas
*pyrosetta is free (with a subscription) for academic use: http://www.pyrosetta.org/dow
As the overarching goal of this work is to expand the set of available protein structures with pockets, we generated a final set of scaffolds that incorporates all of the lessons from this study. Here we present proteins from 1,619 unique parameter combinations with improved stability-related metrics (see SI Appendix, Supplementary Methods and Figs. S33, S34, and S40 for pocket diversity). We have made this set of 32,380 scaffolds (20 models with different sequences per parameter combination) available for general use as starting points for ligand binding and enzyme design.
To access them, download this repo and go to ./BeNTF2seq/design_with_PSSM/final_set
Then run:
cat final_set.tar.gz.part?? > final_set.tar.gz && tar -xzf final_set.tar.gz
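On systems without `cat` and `tar`, the same reassembly can be done with the Python standard library. This is a sketch that mirrors the shell command above; the function name `reassemble_and_extract` is made up for this example, and it assumes the part files sort correctly by name, as the shell glob does.

```python
import glob
import shutil
import tarfile


def reassemble_and_extract(part_pattern, archive_path, dest_dir="."):
    """Concatenate split archive parts in name order, then extract the result."""
    parts = sorted(glob.glob(part_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {part_pattern!r}")
    # Equivalent of: cat parts... > archive_path
    with open(archive_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    # Equivalent of: tar -xzf archive_path
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return parts
```

From inside the final_set directory, this would be called as `reassemble_and_extract("final_set.tar.gz.part??", "final_set.tar.gz")`.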