miguelcorrea/PKSpop

PKSpop

A pipeline to predict the order of type I polyketide synthase (PKS) proteins in polyketide biosynthetic assembly lines.

PKSpop comprises three main steps to infer protein order:

  1. Identify the compatibility class of each query docking domain and align the sequences.
  2. Pair each class I Ndd (N-terminal docking domain) with all class I Cdds (C-terminal docking domains) and use Ouroboros to predict the interaction probability of each pair. The probabilities are collected into a matrix.
  3. Infer the protein order with a greedy probability-matrix-filling method that takes assembly-line constraints and compatibility classes into account.
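Steps 2 and 3 can be illustrated with the toy sketch below. It is not the PKSpop implementation. `fill_matrix`, `greedy_order` and the `score` callback are hypothetical names, and the callback only stands in for an Ouroboros prediction. Compatibility classes are ignored, so all proteins are treated as class I. The only assembly-line constraints assumed are that each protein docks with at most one upstream and one downstream partner, and that the chain contains no cycles.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

# (upstream protein, downstream protein) -> interaction probability
Matrix = Dict[Tuple[str, str], float]


def fill_matrix(proteins: Sequence[str],
                score: Callable[[str, str], float]) -> Matrix:
    """Step 2 (sketch): score every ordered pair of distinct proteins.

    `score(up, down)` is a hypothetical stand-in for an Ouroboros estimate
    of how likely the Cdd of `up` is to dock with the Ndd of `down`.
    """
    return {(up, down): score(up, down)
            for up, down in permutations(proteins, 2)}


def greedy_order(proteins: Sequence[str], prob: Matrix) -> List[str]:
    """Step 3 (sketch): accept links from the most to the least probable."""
    successor: Dict[str, str] = {}
    predecessor: Dict[str, str] = {}

    def head_of(p: str) -> str:
        # Walk upstream to the first protein of the partial chain containing p.
        while p in predecessor:
            p = predecessor[p]
        return p

    for (up, down), _ in sorted(prob.items(), key=lambda kv: kv[1], reverse=True):
        if len(successor) == len(proteins) - 1:
            break  # a single linear chain now covers every protein
        if up in successor or down in predecessor:
            continue  # at most one partner on each side of a protein
        if head_of(up) == down:
            continue  # this link would close a cycle
        successor[up] = down
        predecessor[down] = up

    # The assembly line starts at the only protein without an upstream partner.
    start = next(p for p in proteins if p not in predecessor)
    order = [start]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


if __name__ == "__main__":
    toy = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.85}
    prob = fill_matrix(["A", "B", "C"], lambda up, down: toy.get((up, down), 0.1))
    print(greedy_order(["A", "B", "C"], prob))  # ['C', 'A', 'B']
```

Every accepted link joins the end of one partial chain to the start of another, so the loop always ends with one linear order. Like any greedy method, it can still miss the order with the highest total probability.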

Run PKSpop

python PKSpop/code/run_analysis.py input_json_file.json

Input

The input to PKSpop is a JSON file containing the following fields:

  • gbk_path: path to the antiSMASH .gbk file of the query PKS gene cluster
  • protein_id: list of identifiers of the query proteins whose order will be predicted
  • id_category: the category of the protein identifiers: "gene", "protein_id" or "locus_tag"
  • result_path: path where the results will be saved
  • Ouroboros_path: path to the Ouroboros repository
  • Ouroboros_int_frac: list of int_frac parameters for Ouroboros (default [0.9, 0.8]). Adding values below 0.8 is recommended when there are more than 10 query proteins.
  • n_repeat: number of times Ouroboros is run with each int_frac value

An example input file can be found in data/test.
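The input file can also be generated from Python, as in the minimal sketch below. The helper `write_pkspop_input` and all example values are hypothetical placeholders. Its checks only mirror the rules stated above: the three allowed `id_category` values, and the suggestion to add int_frac values below 0.8 for more than 10 query proteins. No default for `n_repeat` is documented here, so the caller must supply it.

```python
import json
import warnings
from pathlib import Path
from typing import List, Sequence

# id_category values listed in this README.
ALLOWED_ID_CATEGORIES = {"gene", "protein_id", "locus_tag"}


def write_pkspop_input(json_path: str, gbk_path: str, protein_ids: Sequence[str],
                       id_category: str, result_path: str, ouroboros_path: str,
                       n_repeat: int,
                       int_frac: Sequence[float] = (0.9, 0.8)) -> dict:
    """Hypothetical helper: validate the fields and write the PKSpop input JSON."""
    if id_category not in ALLOWED_ID_CATEGORIES:
        raise ValueError(f"id_category must be one of {sorted(ALLOWED_ID_CATEGORIES)}")
    fracs: List[float] = list(int_frac)
    # README recommendation: with more than 10 proteins, add values below 0.8.
    if len(protein_ids) > 10 and all(f >= 0.8 for f in fracs):
        warnings.warn("More than 10 query proteins: consider adding "
                      "Ouroboros_int_frac values below 0.8.")
    config = {
        "gbk_path": gbk_path,
        "protein_id": list(protein_ids),
        "id_category": id_category,
        "result_path": result_path,
        "Ouroboros_path": ouroboros_path,
        "Ouroboros_int_frac": fracs,
        "n_repeat": n_repeat,
    }
    Path(json_path).write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    # Placeholder paths and identifiers; replace them with your own.
    write_pkspop_input(
        "input_json_file.json",
        gbk_path="path/to/cluster.gbk",
        protein_ids=["protein_1", "protein_2", "protein_3"],
        id_category="protein_id",
        result_path="path/to/results",
        ouroboros_path="path/to/Ouroboros",
        n_repeat=3,
    )
```

The resulting file is passed to PKSpop as usual: python PKSpop/code/run_analysis.py input_json_file.json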

Output

The prediction results are written to result_path/output/.

  • prediction_result.txt gives the predicted order
  • int_prob_mtx.csv is the matrix of pairwise interaction probabilities between query proteins, as predicted by Ouroboros
  • int_prob_mtx.png shows a plot of the interaction probability matrix
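This README does not describe the layout of int_prob_mtx.csv. The sketch below assumes a table whose first row and first column hold the query protein identifiers, and it skips blank cells, for example on the diagonal. The mapping of rows and columns to Cdds and Ndds is also not stated here. So the hypothetical helpers `load_prob_matrix` and `top_pairs` simply key each value by (row ID, column ID).

```python
import csv
from typing import Dict, List, Tuple

PairMatrix = Dict[Tuple[str, str], float]


def load_prob_matrix(path: str) -> PairMatrix:
    """Read the probability matrix CSV into {(row_id, col_id): probability}.

    Assumed layout: header row of column IDs after an empty corner cell,
    then one row per protein starting with its ID; blank cells are skipped.
    """
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    col_ids = rows[0][1:]
    matrix: PairMatrix = {}
    for row in rows[1:]:
        if not row:
            continue  # ignore empty lines
        row_id, values = row[0], row[1:]
        for col_id, cell in zip(col_ids, values):
            if cell.strip():
                matrix[(row_id, col_id)] = float(cell)
    return matrix


def top_pairs(matrix: PairMatrix, k: int = 5) -> List[Tuple[Tuple[str, str], float]]:
    """Return the k pairs with the highest interaction probability."""
    return sorted(matrix.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For example, `top_pairs(load_prob_matrix("result_path/output/int_prob_mtx.csv"))` lists the five most probable pairs, with result_path standing for the directory given in the input JSON.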

Intermediate files produced during the prediction process:

  • dd_raw.fasta contains the raw docking domain sequences extracted from the input .gbk file
  • dd_class_*.fasta contain the sequences of the three compatibility classes
  • dd_hmmscan_oupt.txt is the hmmscan output, which contains the class assignment of each sequence
  • dd_class_1_aln.afa contains the aligned class 1 sequences
  • dd_class_1.afa contains the conserved region of the aligned sequences
  • dd_class_1_paired.afa contains the paired sequences of all query proteins
  • dd_class_1_ouro_inpt.fasta is the FASTA file passed to Ouroboros as input
  • Ouroboros_class_1_ouro_inpt_soft_warm contains the Ouroboros output

License

This project is licensed under the BSD-3 license. See the LICENSE file for details.

Requirements

PKSpop requires Python 3.6+. The following tools should be installed/downloaded before running PKSpop:
