Commit

Woosub-Kim committed Nov 23, 2023
2 parents 28a4a7f + e00a3dc commit 7180ed4
Showing 29 changed files with 73 additions and 31 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -16,7 +16,7 @@ set(FRAMEWORK_ONLY 1 CACHE INTERNAL "" FORCE)
include(MMseqsSetupDerivedTarget)
add_subdirectory(lib/mmseqs EXCLUDE_FROM_ALL)

set(FOLDSEEK_FRAMEWORK_ONLY 0 CACHE INTERNAL "" FORCE)
set(FOLDSEEK_FRAMEWORK_ONLY 0 CACHE BOOL "Framework mode (don't create foldseek executable)")
if (FOLDSEEK_FRAMEWORK_ONLY)
set(FRAMEWORK_ONLY 1 CACHE INTERNAL "" FORCE)
endif()
46 changes: 34 additions & 12 deletions README.md
@@ -27,7 +27,7 @@ Foldseek enables fast and sensitive comparisons of large structure sets.
- [Output](#output-cluster)
- [Important Parameters](#important-cluster-parameters)
- [Complexsearch](#complexsearch)
- [Output](#output-complexsearch)
- [Output](#complex-search-output)
- [Main Modules](#main-modules)
- [Examples](#examples)

@@ -218,14 +218,26 @@ MCAR...Q
| --tmscore-threshold | Alignment | accept alignments with an alignment TMscore > thr |
| --lddt-threshold | Alignment | accept alignments with an alignment LDDT score > thr |

### Complexsearch
The `easy-complexsearch` module allows to search single or multiple query protein complexes, formatted in PDB/mmCIF format (flat or gzipped), against a target database, folder or single protein complexes. In default it outputs the alignment information as a [tab-separated file](#tab-separated-complex) but we support also [report](#report). <!-- or a HTML output. -->

foldseek easy-complexsearch example/1tim.pdb.gz example/8tim.pdb.gz aln tmpFolder
### Complexsearch
The `easy-complexsearch` module searches single or multiple query protein complexes (PDB/mmCIF, flat or gzipped) against a target database of protein complexes. It reports similarity metrics for the complexes, such as the TM-score.

#### Output Complexsearch
#### Using Complexsearch
To compare two complexes pairwise with `easy-complexsearch`, run the following command:
```
foldseek easy-complexsearch example/1tim.pdb.gz example/8tim.pdb.gz result tmpFolder
```
This command searches the protein complex `1tim.pdb.gz` against `8tim.pdb.gz` and produces alignment information.
Foldseek `easy-complexsearch` can also be used to search full databases:
```
foldseek databases PDB100 pdb tmp
foldseek easy-complexsearch example/1tim.pdb.gz pdb result tmpFolder
```

#### Complex Search Output
##### Tab-separated-complex
The default fields are containing the following fields: `query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,complexassignid` but they can be customized with the `--format-output` option e.g. `--format-output "query,target,complexqtmscore,complexttmscore,complexassignid"` returns the query and target accession, the tm scores of complex alignment normalized with query and target lengthes, and assignment id. You can choose many different output columns.
By default, `easy-complexsearch` outputs the alignment as a tab-separated file. The standard fields are `query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits, complexassignid`. The output can be customized with the `--format-output` option. For example, `--format-output "query,target,complexqtmscore,complexttmscore,complexassignid"` returns the query and target accessions, the complex-alignment TM-scores normalized by query and target length, and the assignment ID.

| Code | Description |
| --- | --- |
| **Commons** |
@@ -237,22 +249,31 @@ The default fields are containing the following fields: `query,target,fident,aln
|complexu | Rotation matrix of Complex alignment (computed by TM-score) |
|complext | Translation vector of Complex alignment (computed by TM-score) |
|complexassignid| Index of Complex alignment |

**Example Output:**
```
1tim.pdb.gz_A 8tim.pdb.gz_A 0.967 247 8 0 1 247 1 247 5.412E-43 1527 0
1tim.pdb.gz_B 8tim.pdb.gz_B 0.967 247 8 0 1 247 1 247 1.050E-43 1551 0
```
##### Report
Reports are containing the following fields:

##### Complex Report
`easy-complexsearch` also generates a report (written to a file with the `_report` suffix) that summarizes the inter-complex chain matching, including identifiers, chains, TM-scores, rotation matrices, translation vectors, and assignment IDs. Reports contain the following fields:
| Column | Description |
| --- | --- |
| (1,2) | Identifiers for query and target complex |
| (3,4) | Chains of query complex and target complex |
| (5,6) | TM scores based on query and target residue length |
| 1 | Identifier of the query complex |
| 2 | Identifier of the target complex |
| 3 | Matched chains of query complex |
| 4 | Matched chains of target complex |
| 5 | TM scores normalized by query length |
| 6 | TM scores normalized by target length |
| (7,8) | Rotation matrix (u) and translation vector (t) |
| (9) | Assignment id |
| 9 | Complex Assignment Id |

**Example Output:**
```
1tim.pdb.gz 8tim.pdb.gz A,B A,B 0.98941 0.98941 0.999983,0.000332,0.005813,-0.000373,0.999976,0.006884,-0.005811,-0.006886,0.999959 0.298992,0.060047,0.565875 0
```
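
As a rough illustration of how the nine report columns shown in the example might be consumed, the following C++ sketch splits one report line into typed fields. `ComplexReportRow`, `parseReportLine`, `splitOn`, and `parseDoubles` are hypothetical names and not part of Foldseek. The sketch also assumes that columns are separated by whitespace and that identifiers contain no whitespace.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type; field names are illustrative only.
struct ComplexReportRow {
    std::string query;                      // column 1
    std::string target;                     // column 2
    std::vector<std::string> queryChains;   // column 3, comma-separated
    std::vector<std::string> targetChains;  // column 4, comma-separated
    double queryTm = 0.0;                   // column 5
    double targetTm = 0.0;                  // column 6
    std::array<double, 9> u{};              // column 7, nine values in file order
    std::array<double, 3> t{};              // column 8
    int assignId = 0;                       // column 9
};

// Split a string on a single delimiter character.
static std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, delim)) {
        parts.push_back(item);
    }
    return parts;
}

// Parse exactly N comma-separated floating point values.
template <std::size_t N>
static std::array<double, N> parseDoubles(const std::string& text) {
    const std::vector<std::string> parts = splitOn(text, ',');
    if (parts.size() != N) {
        throw std::runtime_error("unexpected number of values in: " + text);
    }
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = std::stod(parts[i]);
    }
    return out;
}

// Parse one report line (assumption: whitespace-separated columns).
ComplexReportRow parseReportLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> cols;
    std::string col;
    while (in >> col) {
        cols.push_back(col);
    }
    if (cols.size() != 9) {
        throw std::runtime_error("expected 9 columns in report line");
    }
    ComplexReportRow row;
    row.query = cols[0];
    row.target = cols[1];
    row.queryChains = splitOn(cols[2], ',');
    row.targetChains = splitOn(cols[3], ',');
    row.queryTm = std::stod(cols[4]);
    row.targetTm = std::stod(cols[5]);
    row.u = parseDoubles<9>(cols[6]);
    row.t = parseDoubles<3>(cols[7]);
    row.assignId = std::stoi(cols[8]);
    return row;
}
```

Calling `parseReportLine` on the example line above would yield two query chains (`A`, `B`), two target chains, nine rotation values, and three translation values. Lines with a different column count raise `std::runtime_error` rather than being silently skipped.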

<!--
##### Interactive HTML
Foldseek can locally generate a search result HTML similar to the [webserver](https://search.foldseek.com) by specifying the format mode `--format-mode 3`
@@ -301,6 +322,7 @@ foldseek createtsv db db clu clu.tsv
### Query centered multiple sequence alignment
Foldseek can generate a3m based multiple sequence alignments using the following commands.
a3m can be converted to fasta format using [reformat.pl](https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl) (`reformat.pl in.a3m out.fas`).

```
foldseek createdb example/ targetDB
foldseek createdb example/ queryDB
7 changes: 5 additions & 2 deletions lib/3di/CMakeLists.txt
@@ -5,6 +5,9 @@ add_library(3di
structureto3diseqdist.cpp
)
mmseqs_setup_derived_target(3di)

target_include_directories(3di PRIVATE ..) # needed for kerasify/keras_model.h
target_include_directories(3di
PRIVATE
.. # kerasify/keras_model.h
${PROJECT_BINARY_DIR}/generated # encoder_weights_3di.kerasify.h
)
add_dependencies(3di local-generated)
8 changes: 4 additions & 4 deletions lib/mmseqs/src/commons/Parameters.h
@@ -59,6 +59,8 @@ struct MMseqsParameter {
}
};

void initParameterSingleton(void);
#define DEFAULT_PARAMETER_SINGLETON_INIT void initParameterSingleton() { new Parameters; }

class Parameters {
public:
@@ -712,13 +714,11 @@ class Parameters {
static Parameters& getInstance()
{
if (instance == NULL) {
initInstance();
initParameterSingleton();
}
return *instance;
}
static void initInstance() {
new Parameters;
}
friend void initParameterSingleton(void);

void setDefaults();
void initMatrices();
3 changes: 3 additions & 0 deletions lib/mmseqs/src/mmseqs.cpp
@@ -1,6 +1,7 @@
#include "Command.h"
#include "DownloadDatabase.h"
#include "Prefiltering.h"
#include "Parameters.h"

const char* binary_name = "mmseqs";
const char* tool_name = "MMseqs2";
@@ -19,6 +20,8 @@ void init() {
}
void (*initCommands)(void) = init;

DEFAULT_PARAMETER_SINGLETON_INIT

std::vector<DatabaseDownload> externalDownloads = {};
std::vector<KmerThreshold> externalThreshold = {};

1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestAlignment.cpp
@@ -20,6 +20,7 @@
#include "Matcher.h"

const char* binary_name = "test_alignment";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
const size_t kmer_size=6;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestAlignmentPerformance.cpp
@@ -24,6 +24,7 @@
#include "StripedSmithWaterman.h"

const char* binary_name = "test_alignmentperformance";
DEFAULT_PARAMETER_SINGLETON_INIT

#define MAX_FILENAME_LIST_FILES 4096

1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestAlignmentTraceback.cpp
@@ -18,6 +18,7 @@
#include "Parameters.h"

const char* binary_name = "test_alignmenttraceback";
DEFAULT_PARAMETER_SINGLETON_INIT

struct scores{
short H;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestAlp.cpp
@@ -43,6 +43,7 @@ Contents: pairwise alignment algorithms
#include "sls_alignment_evaluer.hpp"

const char* binary_name = "test_alp";
DEFAULT_PARAMETER_SINGLETON_INIT

using namespace Sls;
using namespace std;
1 change: 0 additions & 1 deletion lib/mmseqs/src/test/TestBacktraceTranslator.cpp
@@ -1,5 +1,4 @@
#include "BacktraceTranslator.h"
#include "Parameters.h"

const char* binary_name = "test_backtracetranslator";

1 change: 0 additions & 1 deletion lib/mmseqs/src/test/TestBestAlphabet.cpp
@@ -14,7 +14,6 @@
#include "DBReader.h"
#include "Sequence.h"
#include "Indexer.h"
#include "Parameters.h"

const char* binary_name = "test_bestalphabet";

1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestCompositionBias.cpp
@@ -9,6 +9,7 @@
#include "Parameters.h"

const char* binary_name = "test_compositionbias";
DEFAULT_PARAMETER_SINGLETON_INIT

void calcLocalAaBiasCorrection(Sequence* seq, SubstitutionMatrix * m){
const int windowSize = 40;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestDiagonalScoring.cpp
@@ -14,6 +14,7 @@
#include "Parameters.h"

const char* binary_name = "test_diagonalscoring";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
size_t kmer_size = 6;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestDiagonalScoringPerformance.cpp
@@ -20,6 +20,7 @@ KSEQ_INIT(int, read)
#include "Parameters.h"

const char* binary_name = "test_diagonalscoringperformance";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
size_t kmer_size = 6;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestKmerGenerator.cpp
@@ -15,6 +15,7 @@
#include "Parameters.h"

const char* binary_name = "test_kmergenerator";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
const size_t kmer_size=6;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestKmerNucl.cpp
@@ -6,6 +6,7 @@
#include "Orf.h"

const char* binary_name = "test_kmernucl";
DEFAULT_PARAMETER_SINGLETON_INIT

std::string kmerToSting(size_t idx, int size) {
char output[32];
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestKmerScore.cpp
@@ -8,6 +8,7 @@
#include "Parameters.h"

const char* binary_name = "test_kmerscore";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
const size_t kmer_size = 6;
1 change: 0 additions & 1 deletion lib/mmseqs/src/test/TestKwayMerge.cpp
@@ -8,7 +8,6 @@

#include "SubstitutionMatrix.h"
#include "Sequence.h"
#include "Parameters.h"

const char* binary_name = "test_kwaymerge";
struct KmerEntry{
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestMultipleAlignment.cpp
@@ -14,6 +14,7 @@
#include "Parameters.h"

const char* binary_name = "test_multiplealignment";
DEFAULT_PARAMETER_SINGLETON_INIT

int main(int, const char**) {
Parameters& par = Parameters::getInstance();
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestPSSM.cpp
@@ -15,6 +15,7 @@
#include "MultipleAlignment.h"

const char* binary_name = "test_pssm";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
Parameters& par = Parameters::getInstance();
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestPSSMPrune.cpp
@@ -16,6 +16,7 @@
#include <string.h>

const char* binary_name = "test_pssmprune";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
Parameters& par = Parameters::getInstance();
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestProfileAlignment.cpp
@@ -19,6 +19,7 @@
#include "Parameters.h"

const char* binary_name = "test_profilealignment";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
const size_t kmer_size=6;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestReduceMatrix.cpp
@@ -12,6 +12,7 @@
#include "Parameters.h"

const char* binary_name = "test_reducematrix";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
const int reductionAlphabetSize = 17;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestScoreMatrixSerialization.cpp
@@ -5,6 +5,7 @@
#include "Debug.h"

const char* binary_name = "test_scorematrixserialization";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
Parameters& par = Parameters::getInstance();
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestSequenceIndex.cpp
@@ -11,6 +11,7 @@
#include "Parameters.h"

const char* binary_name = "test_sequenceindex";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
size_t kmer_size = 6;
1 change: 1 addition & 0 deletions lib/mmseqs/src/test/TestTanTan.cpp
@@ -11,6 +11,7 @@
#include "Parameters.h"

const char* binary_name = "test_tantan";
DEFAULT_PARAMETER_SINGLETON_INIT

int main (int, const char**) {
const size_t kmer_size = 6;
6 changes: 4 additions & 2 deletions src/CMakeLists.txt
@@ -3,7 +3,6 @@ set(HAVE_GCS 0 CACHE BOOL "Have Google Cloud Storage SDK")
include_directories(commons)
add_subdirectory(strucclustutils)
add_subdirectory(commons)
add_subdirectory(version)
add_subdirectory(workflow)

add_library(foldseek-framework
@@ -14,7 +13,7 @@ add_library(foldseek-framework
FoldseekBase.cpp
)
mmseqs_setup_derived_target(foldseek-framework)
target_link_libraries(foldseek-framework version gemmiwrapper 3di pulchra kerasify tmalign block-aligner-c)
target_link_libraries(foldseek-framework gemmiwrapper 3di pulchra kerasify tmalign block-aligner-c)
add_dependencies(foldseek-framework local-generated)

if(HAVE_GCS)
@@ -23,7 +22,10 @@ if(HAVE_GCS)
endif()

if (NOT FOLDSEEK_FRAMEWORK_ONLY)
add_subdirectory(version)
add_executable(foldseek foldseek.cpp)
mmseqs_setup_derived_target(foldseek foldseek-framework)
target_link_libraries(mmseqs-framework version)
target_link_libraries(foldseek version)
install(TARGETS foldseek DESTINATION bin)
endif()
7 changes: 1 addition & 6 deletions src/commons/LocalParameters.h
@@ -15,13 +15,10 @@ struct FoldSeekDbValidator : public DbValidator {

class LocalParameters : public Parameters {
public:
static void initInstance() {
new LocalParameters;
}
LocalParameters();
static LocalParameters& getLocalInstance() {
if (instance == NULL) {
initInstance();
initParameterSingleton();
}
return static_cast<LocalParameters&>(LocalParameters::getInstance());
}
@@ -137,9 +134,7 @@ class LocalParameters : public Parameters {


private:

LocalParameters(LocalParameters const&);
~LocalParameters() {};
void operator=(LocalParameters const&);
};
#endif
4 changes: 3 additions & 1 deletion src/foldseek.cpp
@@ -1,5 +1,6 @@
#include <cstddef>
#include "Command.h"
#include "LocalParameters.h"

const char* binary_name = "foldseek";
const char* tool_name = "foldseek";
@@ -17,4 +18,5 @@ void init() {
registerCommands(&baseCommands);
registerCommands(&foldseekCommands);
}
void (*initCommands)(void) = init;
void (*initCommands)(void) = init;
void initParameterSingleton() { new LocalParameters; }
