GPU version has corrupted sequence output #912

Open
yab-fsp opened this issue Dec 2, 2024 · 1 comment

yab-fsp commented Dec 2, 2024

Expected Behavior

For the GPU version, when running easy-search mode and using --format-output with "tseq" to get the sequences for the hits, the amino acid sequences should be printed properly.

Current Behavior

Instead, the amino acid sequence appears as a bunch of other characters (see below).

Steps to Reproduce (for bugs)

Command line: mmseqs easy-search $INPUT.fasta /mnt/ephemeral/dbmm/nr_gpu RESULT /mnt/ephemeral/tmp2 --gpu 1 --num-iterations 3 -s 8 --max-seqs 999999 --format-mode 4 --format-output "query,target,evalue,fident,nident,qstart,qend,qlen,tstart,tend,tlen,alnlen,bits,qcov,tcov,tseq"

MMseqs Output (for bugs)

(QUERY and TARGET anonymized)
query target evalue fident nident qstart qend qlen tstart tend tlen alnlen bits qcov tcov tseq
QUERY TARGET 7.770E-159 1.000 238 1 238 238 1 238 238 238 504 1.000 1.000
^O^H^E^C^C ^D^P^E^Q^Q^L^G ^Q^C ^B^E^B^Q^K^E^F^H^D^O^Q^O^E^C^E^C^E^B^@^P^S^E^H ^P ^D^G^A^P^P^E^H ^L^Q^L^R^L^P ^Q^P^P^D^O^S^E^Q^M^A^D^O^N^S^L^B^F
^H^M^F^B^D^D^H^O^@
^L^C^E^S^Q^M^C^N^P^G^D^D^H^B^B^E^K^S^H^P^N^@^C^Q^H^D^C^E^B^P ^Q^K^N^G^C ^E^G^B^D^H^C^B^E^K^G ^E^F^H ^C^S^K^S^K^O^F^K^Q^S^G
^@^B^H^M^H^K^E^G^H^Q^K^D^H^G^N^F^K^G^C^B^E^O^Q^M ^@^B^F^S^M^K^P^L^G^E^B^E^L^Q ^L^B^K^F^S ^O^P^M^O^@ ^O^H^B^L^K^C^H^N^B^F
^Q ^C^D^Q^P^@^@^E^G^P^F^E
^B^C ^S^H

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
    ** 59016d2
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
    ** compiled binary provided by soedinglab on mmseqs2 website
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
    ** EC2 instance type g5.12xlarge (192GB memory, 4x A10 GPU with 24GB RAM a piece)
  • Operating system and version:
    ** Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-1072-aws x86_64)

milot-mirdita (Member) commented

Good catch, I did not think this through completely. It's not actually corrupted; this is the new byte encoding necessary for the GPU search (with byte values 0 to 64 encoding masked or unmasked amino acids). I will fix the display issue asap.
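
As a rough illustration of why the output above looks like control characters, the caret notation shown by the terminal (`^O`, `^H`, ...) can be converted back to numeric byte values. This is only a sketch: `caret_to_bytes` is a hypothetical helper, not part of MMseqs2, and it does not attempt to map the values back to amino acid letters, since that mapping is not given in this thread.

```python
def caret_to_bytes(text: str) -> list[int]:
    """Convert terminal caret notation (e.g. "^O^H") to byte values.

    Hypothetical helper for illustration only. In caret notation,
    "^X" stands for the byte ord("X") XOR 0x40, so "^@" is 0 and
    "^O" is 15. Any other character is taken as its own byte value.
    """
    values = []
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            values.append(ord(text[i + 1]) ^ 0x40)
            i += 2
        else:
            values.append(ord(text[i]))
            i += 1
    return values


# First few characters of the reported tseq field.
sample = "^O^H^E^C^C ^D^P"
decoded = caret_to_bytes(sample)
print(decoded)
# Check that every value falls in the 0 to 64 range described above.
print(all(0 <= v <= 64 for v in decoded))
```

The decoded values are small integers rather than printable ASCII letters, which is consistent with the field holding the internal byte encoding instead of the sequence text.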
