Tutorial

Requirement

Installation and Configuration steps must be completed before this part.

Preface

This tutorial describes how to run ASMC on a family of homologous proteins, named Amine Dehydrogenases (AmDHs), when the active site residues are known (cf. ASMC with user-refined pocket).

A directory named tutorial/ is available at ASMC/docs/ and contains the following input files:

ADH4.pdb : PDB ID 6G1M, chain B.
DH35.pdb : PDB ID 6IAU, chain B.
DHP6.pdb : PDB ID 6IAQ, chain A.
MATA.pdb : PDB ID 7ZBO, chain A.
pocket.csv : list of amino acid residues considered as part of the active site.
sequences.fasta : a set of 954 protein sequences in FASTA format (950 AmDHs + 4 reference AmDHs).

The last file required is reference_file which must be written as follows, replacing <path_to_ASMC> with the path to where the ASMC repository was downloaded:

<path_to_ASMC>/ASMC/docs/tutorial/ADH4.pdb
<path_to_ASMC>/ASMC/docs/tutorial/DH35.pdb
<path_to_ASMC>/ASMC/docs/tutorial/DHP6.pdb
<path_to_ASMC>/ASMC/docs/tutorial/MATA.pdb
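
Rather than typing the four paths by hand, reference_file can be generated from the shell. The snippet below is a minimal sketch, not part of ASMC: the function name write_reference_file is hypothetical, and it assumes the four PDB files keep the names listed above.

```shell
# Hypothetical helper (not shipped with ASMC): write one PDB path per line.
# $1 = tutorial directory, e.g. <path_to_ASMC>/ASMC/docs/tutorial
# $2 = output file, e.g. reference_file
write_reference_file() {
    tutorial_dir="$1"
    out="$2"
    # The four reference structures named in this tutorial.
    for name in ADH4 DH35 DHP6 MATA; do
        printf '%s/%s.pdb\n' "$tutorial_dir" "$name"
    done > "$out"
}
```

For example, from inside the tutorial directory, `write_reference_file "$(pwd)" reference_file` writes the paths using the current location.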

NB: if the active site is unknown, please consider this section.

Usage

Change the working directory to <path_to_ASMC>/ASMC/docs/tutorial/ and run ASMC, passing reference_file, pocket.csv and sequences.fasta with the -r, -p and -s options, respectively.

cd <path_to_ASMC>/ASMC/docs/tutorial/
asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta
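
Before launching the command above, it can be worth confirming that every path listed in reference_file points to an existing file, since a wrong <path_to_ASMC> is an easy mistake to make. The following is a sketch under that assumption; check_reference_file is a hypothetical helper name and is not an ASMC command.

```shell
# Hypothetical pre-flight check (not part of ASMC).
# $1 = reference file containing one PDB path per line.
# Prints each missing path to stderr; succeeds only if all paths exist.
check_reference_file() {
    ref="$1"
    missing=0
    # The '|| [ -n "$path" ]' part handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue          # skip blank lines
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$ref"
    [ "$missing" -eq 0 ]
}
```

A possible usage is `check_reference_file reference_file && asmc run ...`, so that the run only starts when all reference structures are found.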

The whole process can be verified by checking the file run_asmc.log.

Once completed, the following output files are available:

models.txt : list of models generated by MODELLER.
identity_targets_refs.tsv : identity percentage between each protein sequence and its reference.
active_site_alignment.fasta : active site sequences for each protein, in FASTA format.
groups_0.12_min_5.tsv : clustering computed by DBSCAN, here with eps=0.12 and min_samples=5 (both values are computed automatically if not provided by the user).
GX.fasta : FASTA file for each DBSCAN group, here with X = [-1:3].
groups_logo.png : sequence logos for all DBSCAN groups, with the number of sequences per group indicated in the bottom right-hand corner.
models/ : directory including all the 3D models listed in models.txt.
pairwise/ : directory including all the structural pairwise alignments, computed by US-align, in FASTA format.
superposition/ : directory including all the PDB models with 3D coordinates aligned on their reference.

Proteins belonging to group -1 (G-1.fasta) must be considered as "outliers" since DBSCAN was unable to assign them to a cluster of at least 5 members. This does not mean these proteins are not interesting, and users are advised to consider them.
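
To see how many sequences ended up in each group, outliers included, one option is to count the FASTA headers in each GX.fasta file. The sketch below assumes every record starts with a ">" header line (standard FASTA); count_group_sizes is a hypothetical helper name, not an ASMC command.

```shell
# Hypothetical helper (not part of ASMC): print "<group>\t<number of sequences>"
# for every GX.fasta file found in the given directory (default: current one).
count_group_sizes() {
    dir="${1:-.}"
    for f in "$dir"/G*.fasta; do
        [ -e "$f" ] || continue             # no match: the glob stays literal
        # Each FASTA record has exactly one header line starting with '>'.
        printf '%s\t%s\n' "$(basename "$f" .fasta)" "$(grep -c '^>' "$f")"
    done
}
```

Running `count_group_sizes .` in the output directory lists one line per group, with G-1 being the outlier group.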

If a group is large enough, users can try to generate sub-clusters using the Re-Clustering procedure, by adjusting the --eps parameter.

Several Python scripts were designed to further analyze ASMC clusters; more details are given in the section How to deal with ASMC outputs.

Tutorial output
