I wrote the following code in Google Colab to select sequences with sequence identity below 0.8, but it fails with the error `FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'`. I am also unable to use the GPU in Colab. Ideally, I would like to run this selection on the Colab GPU. I would appreciate any advice you could provide.
# Install the required libraries and tools
!apt-get update
!apt-get install -y mmseqs2 wget
!pip install biopython
!apt-get install -y nvidia-cuda-toolkit
# Check GPU availability with PyTorch
import torch
print(torch.cuda.is_available())  # should return True
print(torch.cuda.get_device_name(0))  # show the name of the available GPU
# Install CUDA-related tools (optional, if needed)
!apt-get install -y cuda-toolkit-12-2  # change to match the CUDA version you want to use
!apt-get install -y cmake
!mkdir build && cd build
!cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. -DENABLE_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90" ..
!make -j8
!make install
!pip install -q condacolab
import condacolab
condacolab.install()
!pip install pycuda
import pycuda.driver as cuda
cuda.init()
print(f"CUDA device count: {cuda.Device.count()}")  # show the number of CUDA devices
# Create the MMseqs2 working directory
import os
work_dir = "./mmseqs_work"
os.makedirs(work_dir, exist_ok=True)
# Specify the input FASTA file
input_fasta = "/content/38181dh_c.fasta"  # path to an existing FASTA file
# Create the MMseqs2 database (only once)
!mmseqs createdb {input_fasta} {work_dir}/db
# Pairwise search of the database against itself (using the GPU)
search_result_path = os.path.join(work_dir, "search_result")
tmp_dir = os.path.join(work_dir, "tmp")
!mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --gpu-only --search-type 3 --gpu only
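
A note on the step above, offered as a sketch rather than a confirmed diagnosis: `mmseqs search` writes a result database under the given prefix, not a tab-separated `.m8` file, so `search_result.m8` would only exist after a separate `mmseqs convertalis` call (or if `mmseqs easy-search` were used instead). The helper below is hypothetical (`build_mmseqs_commands` is not part of MMseqs2); it only assembles the two command lines with the same paths as the script. GPU options are deliberately left out: `--gpu-only` and `--gpu only` are not flags I can confirm, and `--search-type 3` selects nucleotide search. As far as I know, recent releases document `--gpu 1` together with a padded target database built by `makepaddedseqdb`, and the `apt` package may be an older build without GPU support, so please check `mmseqs search -h` on the installed version.

```python
import os
import subprocess


def build_mmseqs_commands(work_dir="./mmseqs_work", min_seq_id=0.8, threads=2):
    """Assemble a search command and the convertalis command that follows it.

    Hypothetical helper for illustration; the paths mirror the script above.
    """
    db = os.path.join(work_dir, "db")
    result = os.path.join(work_dir, "search_result")
    tmp = os.path.join(work_dir, "tmp")

    # Step 1: all-vs-all search; this writes a result database, not a text file.
    search = [
        "mmseqs", "search", db, db, result, tmp,
        "--min-seq-id", str(min_seq_id),
        "--threads", str(threads),
    ]
    # Step 2: convert the result database into BLAST-style tabular output.
    convert = ["mmseqs", "convertalis", db, db, result, result + ".m8"]
    return [search, convert]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_mmseqs_commands()
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_all(commands)` after `mmseqs createdb` has been run would execute both steps; the sketch itself only prints them.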
# Parse the output
import pandas as pd
from Bio import SeqIO
search_result_m8 = f"{search_result_path}.m8"  # MMseqs2 output file path
# Read the MMseqs2 output format
columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
results = pd.read_csv(search_result_m8, sep="\t", names=columns)
# Extract query sequences with sequence identity below 80%
filtered_results = results[results["pident"] < 80]
unique_query_ids = set(filtered_results["query"])
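
A note on the filter above: because the search is run with `--min-seq-id 0.8`, the table should contain only hits at or above that identity, so `pident < 80` may match nothing. Also, a query that has one hit below 80% can still be highly similar to some other sequence. One possible alternative, sketched here under assumptions, is a greedy redundancy filter: keep a sequence only if no already-kept sequence matches it at or above the threshold. The function name `select_nonredundant` and the toy hits are hypothetical, and identities are assumed to be fractions in [0, 1]; if the table holds percentages, divide by 100 first. MMseqs2's own clustering workflows (for example `easy-cluster` with `--min-seq-id`) may be the more standard route to a representative set.

```python
def select_nonredundant(ids, hits, threshold=0.8):
    """Greedy filter: keep an ID only if no already-kept ID is similar to it.

    ids:  sequence IDs in the order they should be considered.
    hits: iterable of (query, target, identity) with identity in [0, 1].
    """
    # Record, for every sequence, the other sequences it matches at >= threshold.
    similar = {seq_id: set() for seq_id in ids}
    for query, target, identity in hits:
        if query != target and identity >= threshold:
            similar.setdefault(query, set()).add(target)
            similar.setdefault(target, set()).add(query)

    kept = []
    kept_set = set()
    for seq_id in ids:
        # Drop the sequence if it is similar to anything already kept.
        if similar[seq_id].isdisjoint(kept_set):
            kept.append(seq_id)
            kept_set.add(seq_id)
    return kept


# Toy example: "a" and "b" are 95% identical, "c" and "d" only 50%.
example_ids = ["a", "b", "c", "d"]
example_hits = [
    ("a", "a", 1.0),
    ("a", "b", 0.95),
    ("b", "a", 0.95),
    ("c", "d", 0.5),
]
kept_ids = select_nonredundant(example_ids, example_hits)
print(kept_ids)
```

With real data, `hits` could be built from the `query`, `target`, and identity columns of the results table, and the returned IDs used in place of `unique_query_ids`.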
# Extract the matching sequences from the original FASTA
filtered_sequences = {rec.id: rec for rec in SeqIO.parse(input_fasta, "fasta") if rec.id in unique_query_ids}
output_fasta = "/content/filtered_sequences.fasta"
with open(output_fasta, "w") as f:
    SeqIO.write(filtered_sequences.values(), f, "fasta")
print(f"Saved the filtered sequences: {output_fasta}")