FileNotFoundError: [Errno 2] No such file or directory #913

Ryosuke-254 · 2024-12-03T09:25:27Z

I wrote the following code in Google Colab to select sequences with less than 0.8 (80%) sequence identity, but I get the error "FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'". I am also unable to get the GPU working in Colab. Ideally, I would like to use the Colab GPU to select the sequences with less than 0.8 sequence identity. I would appreciate any advice.

# Install the required libraries and tools

!apt-get update
!apt-get install -y mmseqs2 wget
!pip install biopython
!apt-get install -y nvidia-cuda-toolkit

# Check GPU availability using PyTorch

import torch
print(torch.cuda.is_available()) # should print True
print(torch.cuda.get_device_name(0)) # print the name of the available GPU
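MMseqs2 does not use PyTorch, so it can also help to check GPU visibility through the NVIDIA driver utility `nvidia-smi` instead. In Colab a GPU is only attached when the runtime type is set to a GPU accelerator (Runtime > Change runtime type). A minimal sketch, assuming `nvidia-smi` is on the PATH whenever a GPU runtime is attached; `list_gpus` is a hypothetical helper name:

```python
import shutil
import subprocess


def list_gpus():
    """Return the lines printed by `nvidia-smi -L`, or [] if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools on PATH (e.g. a CPU-only runtime).
        return []
    proc = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if proc.returncode != 0:
        # nvidia-smi exists but could not reach a driver/GPU.
        return []
    return [line for line in proc.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU visible: check Runtime > Change runtime type in Colab")
```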

# Install CUDA-related tools (optional, if needed)

!apt-get install -y cuda-toolkit-12-2 # change to match the CUDA version you want to use
!apt-get install -y cmake
!mkdir build && cd build
!cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. -DENABLE_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90" ..
!make -j8
!make install
!pip install -q condacolab
import condacolab
condacolab.install()
!pip install pycuda
import pycuda.driver as cuda
cuda.init()
print(f"CUDA device count: {cuda.Device.count()}") # print the number of CUDA devices

# Create the MMseqs2 working directory

import os

work_dir = "./mmseqs_work"
os.makedirs(work_dir, exist_ok=True)

# Specify the input FASTA file

input_fasta = "/content/38181dh_c.fasta" # path to an existing FASTA file

# Create the MMseqs2 database (only once)

!mmseqs createdb {input_fasta} {work_dir}/db

# All-vs-all pairwise search of the database against itself (using the GPU)

search_result_path = os.path.join(work_dir, "search_result")
tmp_dir = os.path.join(work_dir, "tmp")
!mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --gpu-only --search-type 3 --gpu only

# Parse the output

import pandas as pd
from Bio import SeqIO

search_result_m8 = f"{search_result_path}.m8" # MMseqs2 output file path

# Read the MMseqs2 output format

columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
results = pd.read_csv(search_result_m8, sep="\t", names=columns)

# Extract query sequences with less than 80% sequence identity

filtered_results = results[results["pident"] < 80]
unique_query_ids = set(filtered_results["query"])
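Two details affect this filter. First, because the search was run with `--min-seq-id 0.8`, the alignments it reports should all be at or above 0.8 identity, and every sequence also reports a hit to itself. Second, if the `.m8` file comes from `mmseqs convertalis` with its default output format, the third column is, as I understand it, `fident`, a fraction between 0 and 1 rather than a percentage (`pident` in percent is reported only when requested via `--format-output`), so `pident < 80` would be true for every row. A minimal stdlib-only sketch of reading such a file and dropping self-hits; the column list is my assumption about the default format, and `read_m8` is a hypothetical helper:

```python
import csv
import io

# Assumed default column order of `mmseqs convertalis` (no --format-output given).
# The third column, fident, is a fraction in [0, 1], not a percentage.
M8_COLUMNS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
              "qstart", "qend", "tstart", "tend", "evalue", "bits"]


def read_m8(handle):
    """Parse tab-separated MMseqs2 results into dicts, skipping self-hits."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # ignore blank lines
        row = dict(zip(M8_COLUMNS, fields))
        row["fident"] = float(row["fident"])
        if row["query"] != row["target"]:  # every sequence trivially hits itself
            rows.append(row)
    return rows


example = io.StringIO(
    "seqA\tseqA\t1.000\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "seqA\tseqB\t0.850\t100\t15\t0\t1\t100\t1\t100\t1e-40\t180\n"
)
hits = read_m8(example)
print([(h["query"], h["target"], h["fident"]) for h in hits])
```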

# Extract the matching sequences from the original FASTA

filtered_sequences = {rec.id: rec for rec in SeqIO.parse(input_fasta, "fasta") if rec.id in unique_query_ids}
output_fasta = "/content/filtered_sequences.fasta"

with open(output_fasta, "w") as f:
    SeqIO.write(filtered_sequences.values(), f, "fasta")

print(f"Saved filtered sequences to: {output_fasta}")
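The stated goal (a set in which no two sequences share 80% or more identity) is a redundancy-reduction problem, which the query-based filter above does not directly express. One simple way to get such a set from all-vs-all hits is greedy selection: walk the sequences in order and keep one only if it has no hit at or above the threshold to an already-kept sequence. This is a swapped-in technique for illustration, not what the original code does. `greedy_nonredundant` is a hypothetical helper, hits are assumed to be `(query, target, identity)` tuples with identity as a fraction, and the input order decides which member of a similar pair survives. MMseqs2 also provides clustering workflows (for example `easy-cluster` with `--min-seq-id`) whose representative sequences serve a similar purpose.

```python
def greedy_nonredundant(seq_ids, hits, threshold=0.8):
    """Keep seq_ids in order, skipping any sequence that has a hit with
    identity >= threshold to a sequence that was already kept."""
    # Undirected neighbour map built only from hits at or above the threshold.
    similar = {}
    for query, target, ident in hits:
        if query == target or ident < threshold:
            continue
        similar.setdefault(query, set()).add(target)
        similar.setdefault(target, set()).add(query)

    kept = []
    kept_set = set()
    for seq_id in seq_ids:
        if similar.get(seq_id, set()) & kept_set:
            continue  # too similar to a sequence we already kept
        kept.append(seq_id)
        kept_set.add(seq_id)
    return kept


ids = ["seqA", "seqB", "seqC"]
hits = [("seqA", "seqB", 0.85), ("seqA", "seqC", 0.60)]
print(greedy_nonredundant(ids, hits))  # -> ['seqA', 'seqC']
```

The kept IDs can then be used in place of `unique_query_ids` when extracting records from the FASTA file.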

milot-mirdita · 2024-12-03T09:29:14Z

You can download the GPU-enabled binary from:
https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

You don't need to compile it yourself.
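The prebuilt binary can be fetched and unpacked from Python instead of compiling. A minimal sketch, assuming the archive unpacks to `mmseqs/bin/mmseqs` (as the CPU release tarballs do); `install_mmseqs` is a hypothetical helper. Putting that directory first on `PATH` matters because the `mmseqs2` package installed earlier with `apt-get` may be an older release without the `--gpu` option, and `!mmseqs` would otherwise keep calling it.

```python
import os
import stat
import tarfile
import urllib.request

# Release URL from the reply above.
MMSEQS_GPU_URL = (
    "https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/"
    "mmseqs-linux-gpu.tar.gz"
)


def install_mmseqs(tar_path, dest="/content", url=None):
    """Download (if url is given) and unpack an MMseqs2 release tarball.

    Assumption: the archive unpacks to mmseqs/bin/mmseqs.
    Returns the path to that executable.
    """
    if url is not None:
        urllib.request.urlretrieve(url, tar_path)
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)
    binary = os.path.join(dest, "mmseqs", "bin", "mmseqs")
    # Ensure the execute bits are set on the unpacked binary.
    mode = stat.S_IMODE(os.stat(binary).st_mode)
    os.chmod(binary, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return binary


# Colab usage (downloads the archive, so it is left commented out here):
# binary = install_mmseqs("/content/mmseqs-linux-gpu.tar.gz", url=MMSEQS_GPU_URL)
# os.environ["PATH"] = os.path.dirname(binary) + os.pathsep + os.environ["PATH"]
```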

The relevant parameter to enable the GPU search mode is --gpu 1:

mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --search-type 3 --gpu 1

Please refer to the wiki for additional details:
https://github.com/soedinglab/MMseqs2/wiki#gpu-accelerated-search
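Independently of the parameter error, `mmseqs search` writes its results as an MMseqs2 database (several files sharing the `search_result` prefix) rather than as a tab-separated text file; a `.m8` file is produced by a separate `mmseqs convertalis` step. In addition, as I understand the GPU section of the wiki, the target database has to be converted with `mmseqs makepaddedseqdb` before `--gpu 1` can be used. A minimal sketch that assembles these steps as argument lists, assuming the paths from the question; `gpu_search_commands` and the `db_pad` name are hypothetical. The original `--search-type 3` (nucleotide search) is left out on the assumption that the input is protein; check the wiki for which search types the GPU mode supports.

```python
import os


def gpu_search_commands(fasta, work_dir, min_seq_id=0.8, threads=2):
    """Return mmseqs calls (as argv lists) for a GPU all-vs-all search
    that ends with a BLAST-tab (.m8) text file."""
    db = os.path.join(work_dir, "db")
    db_pad = os.path.join(work_dir, "db_pad")          # padded copy for the GPU
    result = os.path.join(work_dir, "search_result")   # a result *database*
    tmp = os.path.join(work_dir, "tmp")
    m8 = result + ".m8"
    return [
        ["mmseqs", "createdb", fasta, db],
        ["mmseqs", "makepaddedseqdb", db, db_pad],
        ["mmseqs", "search", db, db_pad, result, tmp,
         "--min-seq-id", str(min_seq_id), "--threads", str(threads), "--gpu", "1"],
        # search does not write a .m8 file; convertalis turns the result DB into one
        ["mmseqs", "convertalis", db, db_pad, result, m8],
    ]


for cmd in gpu_search_commands("/content/38181dh_c.fasta", "./mmseqs_work"):
    print(" ".join(cmd))
```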

Ryosuke-254 · 2024-12-03T10:29:48Z

Thank you for your response. Even after replacing '--gpu-only' with '--gpu 1' as you suggested, I still get a similar error:

'Unrecognized parameter "--gpu-only". Did you mean "--gap-open" (Gap open cost)?'
FileNotFoundError Traceback (most recent call last)
in <cell line: 52>()
50 # Read the MMseqs2 output format
51 columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
---> 52 results = pd.read_csv(search_result_m8, sep="\t", names=columns)
53
54 # Extract query sequences with less than 80% sequence identity

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'

How should I fix this? Also, is it possible to run MMseqs2 on Google Colab? I would appreciate any advice you can provide.
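In Colab, a `!mmseqs ...` line that fails only prints its error message (here, the "Unrecognized parameter" message) and the notebook keeps going, so the failure surfaces later as the `FileNotFoundError` from `pd.read_csv`. Running each step through Python's `subprocess.run` with `check=True` stops at the first failing command instead. A minimal sketch; `run_step` is a hypothetical helper, and the commented usage assumes `mmseqs` is on the PATH and the database already exists:

```python
import subprocess
import sys


def run_step(cmd):
    """Echo a command, run it, and raise CalledProcessError if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# Usage with the command from the reply above:
# run_step(["mmseqs", "search", "./mmseqs_work/db", "./mmseqs_work/db",
#           "./mmseqs_work/search_result", "./mmseqs_work/tmp",
#           "--min-seq-id", "0.8", "--threads", "2", "--search-type", "3", "--gpu", "1"])

# Demonstration with a command that always fails:
failed_code = None
try:
    run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print("step failed with exit code", failed_code)
```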
