Problems with running Liftoff on HPC environment #162

zgb963 · 2024-01-24T19:01:02Z

Hello,

I've been having issues running Liftoff. It runs for days and then terminates. I'm running it on an HPC cluster using 100 GB of memory on a compute node that has 2000 cores. Below is the command I'm using to run Liftoff. The target genome is the rheMac10 FASTA, and the reference inputs are the human genome hg38 FASTA and the human genome annotation GFF.

liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz -g liftoff/GRCh38_latest_genomic.gff.gz -p 32 -o liftoff/update_rhemac10_lifted.gtf

Here is the bsub command I used to submit my script:

bsub -q long -R rusage[mem=25G] -R span[hosts=1] -W 96:00 -n 4 -o ~/macaque_snRNAseq/liftoff/my_out.%J -e ~/macaque_snRNAseq/liftoff/my_err.%J ~/macaque_snRNAseq/scripts/update_liftoff.sh

And here is my script:

#!/bin/bash

#activate liftoff
conda activate liftoff

#run liftoff

liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz -g liftoff/GRCh38_latest_genomic.gff.gz  -p 10 -o liftoff/update_rhemac10_lifted.gtf

echo liftoff finished running!
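
One thing that may be worth checking: the bsub line requests 4 slots (-n 4), while the script asks liftoff for 10 threads and the logged minimap2 command shows -t 32. Below is a minimal sketch of deriving -p from the job allocation instead of hard-coding it. It assumes LSF exports the slot count in LSB_DJOB_NUMPROC (worth verifying on your cluster); the function names job_threads and build_liftoff_cmd are hypothetical, and the sketch only prints the command rather than running it.

```shell
#!/bin/bash
# Sketch: take liftoff's -p value from the LSF allocation.
# Assumption: LSF sets LSB_DJOB_NUMPROC to the number of slots granted;
# if the variable is missing, fall back to a single thread.

job_threads() {
    echo "${LSB_DJOB_NUMPROC:-1}"
}

build_liftoff_cmd() {
    # Same arguments as the script above; only the -p value changes.
    echo "liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz" \
         "-g liftoff/GRCh38_latest_genomic.gff.gz -p $(job_threads)" \
         "-o liftoff/update_rhemac10_lifted.gtf"
}

# Print the command for inspection before running it for real.
build_liftoff_cmd
```

With -n 4 this would print a command ending in "-p 4 -o liftoff/update_rhemac10_lifted.gtf". Whether the thread mismatch explains the stall is not established in this thread.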

However, it has been running for several days and appears to be stuck at the "lifting features" step. Here is the log so far:

extracting features
2024-01-23 11:57:09,016 - INFO - Populating features
2024-01-23 12:04:20,319 - INFO - Populating features table and first-order relations: 4900134 features
2024-01-23 12:04:20,319 - INFO - Updating relations
2024-01-23 12:05:01,905 - INFO - Creating relations(parent) index
2024-01-23 12:05:05,589 - INFO - Creating relations(child) index
2024-01-23 12:05:10,210 - INFO - Creating features(featuretype) index
2024-01-23 12:05:14,158 - INFO - Creating features (seqid, start, end) index
2024-01-23 12:05:19,103 - INFO - Creating features (seqid, start, end, strand) index
2024-01-23 12:05:24,253 - INFO - Running ANALYZE features
aligning features
[M::main::16.311*0.41] loaded/built the index for 2939 target sequence(s)
[M::mm_mapopt_update::17.590*0.45] mid_occ = 596
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2939
[M::mm_idx_stat::18.371*0.48] distinct minimizers: 101324913 (39.04% are singletons); average occurrences: 5.469; average spacing: 5.362; total length: 2971331530
[M::worker_pipeline::226.359*3.67] mapped 10628 sequences
[M::worker_pipeline::382.816*3.79] mapped 10362 sequences
[M::worker_pipeline::555.968*3.84] mapped 12280 sequences
[M::worker_pipeline::711.785*3.85] mapped 14834 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -o intermediate_files/reference_all_to_target_all.sam -a --end-bonus 5 --eqx -N 50 -p 0.5 -t 32 liftoff/rheMac10.fa.gz.mmi intermediate_files/reference_all_genes.fa
[M::main] Real time: 712.151 sec; CPU: 2743.497 sec; Peak RSS: 27.401 GB
lifting feature

Am I using enough memory or cores/threads for liftoff? Is there a typical runtime for lifting over features from one large genome to another?
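
On the cores/threads part of the question, the minimap2 summary in the log gives a rough hint: dividing the reported CPU time by the wall-clock time estimates how many cores were actually kept busy. A small sketch using only the two numbers from the "Real time" line (the function name effective_cores is made up for illustration):

```shell
# Effective cores = CPU seconds / wall-clock seconds.
effective_cores() {
    awk -v cpu="$1" -v real="$2" 'BEGIN { printf "%.2f\n", cpu / real }'
}

# Values from "[M::main] Real time: 712.151 sec; CPU: 2743.497 sec"
effective_cores 2743.497 712.151
```

This prints 3.85, which is close to the 4 slots requested with bsub -n 4 rather than the 32 threads minimap2 was started with. That would be consistent with the job being held to its allocated slots, though this is an inference from the log and not something confirmed here. The same line reports a peak RSS of 27.401 GB, so it may also be worth checking whether rusage[mem=25G] is applied per slot or per job on your cluster.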


yeeus · 2024-02-29T03:31:52Z

I also encountered this problem, have you solved it?

Agamoni · 2024-02-29T19:52:55Z

Hi, I'm also having the same issue; any advice?

zgb963 · 2024-03-26T17:12:56Z

> I also encountered this problem, have you solved it?

@yeeus Not yet. I heard from someone that Liftoff needs to be run with a GTF file rather than a GFF file, so I tried that, but I got the following error: 'GFF does not contain any gene features. Use -f to provide a list of other feature types to lift over.'
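
Since the error says the file has no gene features and points to -f, a quick way to see which feature types an annotation file actually contains is to tally its third column. This is a minimal sketch; the function name count_feature_types is hypothetical, and it assumes a standard tab-separated, 9-column GFF/GTF.

```shell
# Count the feature types (column 3) in a GFF/GTF stream read from stdin.
# Comment lines and lines with fewer than 9 tab-separated fields are skipped.
count_feature_types() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print n[t] "\t" t }' \
        | sort -k1,1nr -k2,2
}

# Example (path taken from the command earlier in the thread):
#   gzip -dc liftoff/GRCh38_latest_genomic.gff.gz | count_feature_types
```

If the output has no "gene" rows, the types that are present are the candidates for the -f option mentioned in the error message; liftoff -h should show the format that option expects.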

salzberg · 2024-05-20T21:35:18Z

We'll look into this, but Liftoff usually runs in no more than an hour or two on a mammalian genome, so if it's running for many hours, something is wrong. It doesn't need that much memory.
However, it seems you are lifting human annotation onto rhesus macaque, which is pretty distant from human (at the DNA level). This means that minimap2 will likely have trouble mapping many genes. You might instead try our newer LiftOn program, which is designed for more distant mapping problems. It uses Liftoff as a module, and also miniprot. Check it out here: https://github.com/Kuanhao-Chao/LiftOn/blob/main/README.md