Input database "./GCA_019458185.1.faa" has the wrong type (Generic) #923

Closed
fengqingling opened this issue Dec 18, 2024 · 2 comments

Comments

@fengqingling

I want to use mmseqs to annotate an faa file with Pfam.
I am sure the faa file is in FASTA format and contains amino acid sequences.

Steps to Reproduce (for bugs)

First, I downloaded the Pfam seed database.
mmseqs databases Pfam-A.seed pfam_seed/pfam tmp --threads 10

Then I created the index.
mmseqs createindex pfam_seed/pfam tmp -k 5 -s 7

But when I run mmseqs search to annotate the file with Pfam, it fails with an error.
mmseqs search ./GCA_019458185.1.faa pfam_seed/pfam mmseq_result.txt tmp

MMseqs Output (for bugs)

MMseqs Version: 15.6f452
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 128
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Prefilter mode 0
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

Input database "./GCA_019458185.1.faa" has the wrong type (Generic)
Allowed input:

  • Index
  • Nucleotide
  • Profile
  • Aminoacid
@fengqingling
Author

OK, I can use easy-search to get the result.
mmseqs easy-search file pfam_seed/pfam result.txt tmp
But I would still like to know what the difference is between the results of easy-search and search, and why search fails with an error.

@martin-steinegger
Member

mmseqs search can only read MMseqs2 databases, not FASTA files, which is why ./GCA_019458185.1.faa is rejected. You would need to call createdb on the GCA_019458185 FASTA file first and pass the resulting database to search.
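
A rough sketch of that workflow, for anyone landing here with the same error: convert the FASTA file with createdb, run search on the resulting database, then turn the result database into a readable table with convertalis. The database names queryDB and resultDB, the helper functions run and search_fasta, and the DRY_RUN switch are illustrative assumptions, not part of MMseqs2. As far as I understand, easy-search accepts FASTA input because it performs these steps internally, which would explain why it works on the same file.

```shell
#!/bin/sh
# Sketch only: FASTA -> MMseqs2 database -> search -> tabular output.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: search_fasta QUERY_FASTA TARGET_DB OUT_TSV TMP_DIR
search_fasta() {
    query_fasta=$1
    target_db=$2
    out_tsv=$3
    tmp_dir=$4
    query_db="${tmp_dir}/queryDB"    # hypothetical name for the query database
    result_db="${tmp_dir}/resultDB"  # hypothetical name for the result database

    # Make sure the directory that will hold the databases exists.
    run mkdir -p "$tmp_dir" &&
    # 1. Turn the FASTA file into an MMseqs2 sequence database.
    run mmseqs createdb "$query_fasta" "$query_db" &&
    # 2. search expects databases as input and writes a result database.
    run mmseqs search "$query_db" "$target_db" "$result_db" "$tmp_dir" &&
    # 3. Convert the result database into a tab-separated text file.
    run mmseqs convertalis "$query_db" "$target_db" "$result_db" "$out_tsv"
}
```

With the files from this issue, the call would be something like `search_fasta ./GCA_019458185.1.faa pfam_seed/pfam mmseq_result.txt tmp`.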
