FileNotFoundError: [Errno 2] No such file or directory #913

Open
Ryosuke-254 opened this issue Dec 3, 2024 · 2 comments

Comments

@Ryosuke-254

I wrote the following code in Google Colab to select sequences with sequence identity below 0.8, but I get the error "FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'". I also cannot get the GPU to work in Colab. Ideally, I would like to use Colab's GPU to select the sequences with sequence identity below 0.8. Any advice would be appreciated.

# Install the required libraries and tools
!apt-get update
!apt-get install -y mmseqs2 wget
!pip install biopython
!apt-get install -y nvidia-cuda-toolkit

# Check GPU availability with PyTorch
import torch
print(torch.cuda.is_available()) # should print True
print(torch.cuda.get_device_name(0)) # print the name of the available GPU

# Install CUDA-related tools (optional, only if needed)
!apt-get install -y cuda-toolkit-12-2 # change to match the CUDA version you want to use
!apt-get install -y cmake
!mkdir build && cd build
!cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. -DENABLE_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90" ..
!make -j8
!make install
!pip install -q condacolab
import condacolab
condacolab.install()
!pip install pycuda
import pycuda.driver as cuda
cuda.init()
print(f"CUDA device count: {cuda.Device.count()}") # print the number of CUDA devices

# Create the MMseqs2 working directory
import os

work_dir = "./mmseqs_work"
os.makedirs(work_dir, exist_ok=True)

# Specify the input FASTA file
input_fasta = "/content/38181dh_c.fasta" # path to an existing FASTA file

# Create the MMseqs2 database (only once)
!mmseqs createdb {input_fasta} {work_dir}/db

# All-vs-all pairwise search of the database against itself (using the GPU)
search_result_path = os.path.join(work_dir, "search_result")
tmp_dir = os.path.join(work_dir, "tmp")
!mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --gpu-only --search-type 3 --gpu only

# Parse the output
import pandas as pd
from Bio import SeqIO

search_result_m8 = f"{search_result_path}.m8" # MMseqs2 output file path

# Read the MMseqs2 output format
columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
results = pd.read_csv(search_result_m8, sep="\t", names=columns)

# Extract query sequences with sequence identity below 80%
filtered_results = results[results["pident"] < 80]
unique_query_ids = set(filtered_results["query"])

# Extract the matching sequences from the original FASTA
filtered_sequences = {rec.id: rec for rec in SeqIO.parse(input_fasta, "fasta") if rec.id in unique_query_ids}
output_fasta = "/content/filtered_sequences.fasta"

with open(output_fasta, "w") as f:
    SeqIO.write(filtered_sequences.values(), f, "fasta")

print(f"Saved the filtered sequences to: {output_fasta}")
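Separately from the missing file, the selection logic in the question may not produce what is intended. The search keeps only hits with at least 80% identity, and every sequence also hits itself, so keeping rows with `pident < 80` does not yield a set of sequences that are all below 80% identity to each other. As far as I know, MMseqs2's own clustering (for example `mmseqs easy-cluster input.fasta clu tmp --min-seq-id 0.8`, which writes the representatives to `clu_rep_seq.fasta`) is the usual tool for this kind of redundancy reduction. Purely to illustrate the idea on an existing hit table, here is a hypothetical greedy filter in plain Python; it assumes identities on a 0 to 100 scale, and pairs the search did not report are treated as dissimilar.

```python
from collections import defaultdict
from typing import Iterable


def greedy_nonredundant(ids: list[str],
                        hits: Iterable[tuple[str, str, float]],
                        max_identity: float = 80.0) -> list[str]:
    """Greedily keep sequences so that no two kept ones share >= max_identity.

    ids:  all sequence IDs, in order of preference (e.g. FASTA order).
    hits: (query, target, pident) triples on a 0-100 scale, e.g. from an
          all-vs-all search. Self-hits are ignored; unreported pairs are
          treated as dissimilar.
    """
    similar = defaultdict(set)
    for query, target, pident in hits:
        if query != target and pident >= max_identity:
            # Treat similarity as symmetric so A~B also removes B when A is kept.
            similar[query].add(target)
            similar[target].add(query)

    kept, removed = [], set()
    for seq_id in ids:
        if seq_id in removed:
            continue  # too similar to a sequence that was already kept
        kept.append(seq_id)
        removed |= similar[seq_id]
    return kept
```

With the question's variables, `greedy_nonredundant([rec.id for rec in SeqIO.parse(input_fasta, "fasta")], zip(results["query"], results["target"], results["pident"]))` would return the IDs to write out, since the `--min-seq-id 0.8` search reports exactly the pairs the function needs. If the `.m8` file was written with MMseqs2's default columns, the third column is, as far as I know, `fident` (a 0 to 1 fraction) rather than `pident`, so the threshold would have to be 0.8 instead.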

@milot-mirdita
Member

You can download the GPU-enabled binary from:
https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

You don't need to compile it yourself.
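A minimal sketch of doing that from a Colab cell with only the Python standard library. It assumes the archive unpacks to `mmseqs/bin/mmseqs` (the layout used by the official static builds) and that `/content` is the working directory; the function name `install_prebuilt_mmseqs` is hypothetical.

```python
import os
import tarfile
import urllib.request
from pathlib import Path

# Release asset linked in the reply above.
GPU_TARBALL_URL = (
    "https://github.com/soedinglab/MMseqs2/releases/download/"
    "16-747c6/mmseqs-linux-gpu.tar.gz"
)


def install_prebuilt_mmseqs(url: str = GPU_TARBALL_URL, dest: str = "/content") -> Path:
    """Download a prebuilt MMseqs2 tarball, unpack it, and put its bin/ first on PATH.

    Assumption: the archive unpacks to mmseqs/bin/mmseqs; adjust if it differs.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "mmseqs.tar.gz"
    urllib.request.urlretrieve(url, archive)  # also accepts file:// URLs

    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")  # safer extraction where supported
        else:
            tar.extractall(dest_dir)

    binary = dest_dir / "mmseqs" / "bin" / "mmseqs"
    if not binary.is_file():
        raise FileNotFoundError(f"expected the MMseqs2 binary at {binary}")

    # Shell commands started from the notebook (including `!mmseqs`) inherit
    # this PATH, so the downloaded binary wins over an apt-installed one.
    os.environ["PATH"] = f"{binary.parent}{os.pathsep}{os.environ.get('PATH', '')}"
    return binary
```

After calling `install_prebuilt_mmseqs()` in a cell, `!which mmseqs` should point into `/content/mmseqs/bin` rather than at the `mmseqs2` package installed with apt-get.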

The relevant parameter to enable the GPU search mode is --gpu 1:

mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --search-type 3 --gpu 1 
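Two further points seem necessary for the original script to produce the file it reads. First, `mmseqs search` writes a result database rather than a `.m8` file; `mmseqs convertalis` converts that database into the tab-separated format. Second, a `!` line in Colab does not stop the notebook when the command fails, so an error from `mmseqs` is easy to miss and only surfaces later as the pandas `FileNotFoundError`. The sketch below builds the commands as argument lists and runs them with `subprocess.run(..., check=True)`. The padded target database (`makepaddedseqdb`) reflects my reading of the GPU section of the wiki and should be checked there; the question's `--search-type 3` and `--threads 2` are left out, and both helper names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Explicit column list so the third column really is pident (0-100 scale).
M8_FIELDS = "query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits"


def gpu_search_commands(fasta: str, work_dir: str, min_seq_id: float = 0.8) -> list[list[str]]:
    """Return, in order, the MMseqs2 command lines for an all-vs-all GPU search."""
    w = Path(work_dir)
    db, db_pad = str(w / "db"), str(w / "db_pad")
    res, tmp = str(w / "search_result"), str(w / "tmp")
    return [
        ["mmseqs", "createdb", fasta, db],
        # Assumption: GPU mode expects a padded target database (see the wiki).
        ["mmseqs", "makepaddedseqdb", db, db_pad],
        ["mmseqs", "search", db, db_pad, res, tmp,
         "--min-seq-id", str(min_seq_id), "--gpu", "1"],
        # `search` writes a result database; convertalis writes the .m8 table.
        ["mmseqs", "convertalis", db, db_pad, res, res + ".m8",
         "--format-output", M8_FIELDS],
    ]


def run_checked(commands: list[list[str]]) -> None:
    """Run each command in turn, stopping at the first non-zero exit status."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        # check=True raises CalledProcessError instead of carrying on
        # the way a `!cmd` line in Colab does.
        subprocess.run(cmd, check=True)
```

With the question's variables this would be `run_checked(gpu_search_commands(input_fasta, work_dir))`; if any step fails, the cell stops with `CalledProcessError` and that step's own error message is printed above it.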

Please refer to the wiki for additional details:
https://github.com/soedinglab/MMseqs2/wiki#gpu-accelerated-search
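The `FileNotFoundError` in the question is only a symptom: pandas cannot find the `.m8` file because no earlier step created it. A small guard that checks for the file before parsing makes that cause explicit. This is a hypothetical helper using the standard-library `csv` module; it assumes the file has the twelve columns listed in the question, in that order.

```python
import csv
from pathlib import Path

# Column order assumed for the .m8 file (the list used in the question).
M8_COLUMNS = ["query", "target", "pident", "alnlen", "mismatch", "gapopen",
              "qstart", "qend", "tstart", "tend", "evalue", "bits"]


def load_m8(path: str) -> list[dict[str, str]]:
    """Read a tab-separated MMseqs2 result, failing early with a useful message."""
    p = Path(path)
    if not p.is_file():
        # Point at the real cause: an earlier mmseqs command failed or never
        # wrote this file (search alone produces a database, not an .m8 table).
        raise FileNotFoundError(
            f"{p} was not created. Check the output of the preceding mmseqs "
            "commands for errors and make sure a convertalis step wrote the .m8 file."
        )
    with p.open(newline="") as fh:
        # One dict per alignment; values are strings, convert as needed.
        return [dict(zip(M8_COLUMNS, row)) for row in csv.reader(fh, delimiter="\t")]
```

Values come back as strings, so a filter would use e.g. `float(row["pident"])`.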

@Ryosuke-254
Author

Thank you for your response. Even after replacing '--gpu-only' with '--gpu 1' as per your advice, I am encountering a similar error. The error is:

'Unrecognized parameter "--gpu-only". Did you mean "--gap-open" (Gap open cost)?'
FileNotFoundError Traceback (most recent call last)
in <cell line: 52>()
50 # Read the MMseqs2 output format
51 columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
---> 52 results = pd.read_csv(search_result_m8, sep="\t", names=columns)
53
54 # Extract query sequences with sequence identity below 80%

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'

How should I fix this? Also, is it possible to run MMseqs2 on Google Colab? I would appreciate any advice you can provide.
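On the Colab question: `--gpu 1` can only help if the runtime actually has an NVIDIA GPU attached, which is selected under Runtime > Change runtime type. A quick check before running the search may save time. This sketch asks `nvidia-smi` for the names of the visible GPUs; the function name is hypothetical, and an empty list means no GPU is visible to the runtime.

```python
import shutil
import subprocess


def colab_gpu_names() -> list[str]:
    """Return the names of visible NVIDIA GPUs, or an empty list if there are none."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools on PATH: typically a CPU-only runtime.
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return []  # driver present but no usable GPU
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]
```

Running `print(colab_gpu_names())` in a cell should list one GPU name on a GPU runtime and print `[]` on a CPU-only one.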
