This repository contains scripts and results related to structure- and homology-directed protein recombination experiments conducted with the Schreiter Lab.
SCHEMA-RASPP is a protocol for structure-directed recombination using a parent structure and multiple homologous sequences. We have forked the original CalTech repository and initialized the forked version as a submodule in this repository (origin is hbhargava/SCHEMA-RASPP). It is important to keep the submodule up to date by periodically running git submodule foreach git pull origin master
to fetch the latest version.
The Python script compute_chimeras
takes as input a single directory containing a subdirectory for each homolog to be recombined (each subdirectory contains a FASTA sequence file and, optionally, a PDB structure file). Several additional parameters are specified by the user in the course of the computation. The overall output is a list of chimeras with corresponding SCHEMA energies and mutation scores.
To begin a computation, run the following:
python compute_chimeras.py -i /full/path/to/input/dir -o /full/path/to/output/dir
This command will initiate the chimera generation process. The steps are outlined below.
- Search input directory for potential parent sequences and structures and ask user which sequence is to be used for structure guidance and which are to be used as normal parent structures.
- Build FASTA files for all selected sequences (including structure parent).
- Compute sequence alignment of all parent sequences using ClustalOmega.
- Build FASTA file with structure parent sequence and sequence from PDB file.
- Compute sequence alignment for both parent sequences using ClustalOmega.
- [In progress]: Compute the sequence identity for all possible pairs of homologs.
- Copy parent PDB structure file to output folder for use by SCHEMA algorithm.
- Use
SCHEMA_RASPP.schemacontacts
viaSR_interlink
to perform radial search and determine contacts present in structure. - Obtain desired number of crossovers from user and use
SCHEMA_RASPP.rasppcurve
viaSR_interlink
to compute the RASPP curve. - Obtain desired crossover locations from user and use
SCHEMA_RASPP.schemaenergy
viaSR_interlink
to compute SCHEMA energies and mutation scores for all chimeras.
numpy
for miscellaneous computationbiopython
for FASTA parsing and moreclustalomega
for sequence alignmentpicker
for user multi-selection