Problems with running Liftoff on HPC environment #162

Open
zgb963 opened this issue Jan 24, 2024 · 4 comments

zgb963 commented Jan 24, 2024

Hello,

I've been having issues running Liftoff: it runs for days and then terminates. I'm running it in an HPC environment with 100 GB of memory on a compute node that has 2000 cores. Below is the command I'm using to run Liftoff. The target genome is the rheMac10 FASTA, and I've also input the human genome hg38 FASTA and the human genome annotation GFF.

liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz -g liftoff/GRCh38_latest_genomic.gff.gz -p 32 -o liftoff/update_rhemac10_lifted.gtf

Here is the bsub command I used to submit my script:

bsub -q long -R rusage[mem=25G] -R span[hosts=1] -W 96:00 -n 4 -o ~/macaque_snRNAseq/liftoff/my_out.%J -e ~/macaque_snRNAseq/liftoff/my_err.%J ~/macaque_snRNAseq/scripts/update_liftoff.sh 

And here is my script:

#!/bin/bash

#activate liftoff
conda activate liftoff

#run liftoff

liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz -g liftoff/GRCh38_latest_genomic.gff.gz  -p 10 -o liftoff/update_rhemac10_lifted.gtf

echo liftoff finished running!

However, it has been running for several days and it's stuck on lifting features.

extracting features
2024-01-23 11:57:09,016 - INFO - Populating features
2024-01-23 12:04:20,319 - INFO - Populating features table and first-order relations: 4900134 features
2024-01-23 12:04:20,319 - INFO - Updating relations
2024-01-23 12:05:01,905 - INFO - Creating relations(parent) index
2024-01-23 12:05:05,589 - INFO - Creating relations(child) index
2024-01-23 12:05:10,210 - INFO - Creating features(featuretype) index
2024-01-23 12:05:14,158 - INFO - Creating features (seqid, start, end) index
2024-01-23 12:05:19,103 - INFO - Creating features (seqid, start, end, strand) index
2024-01-23 12:05:24,253 - INFO - Running ANALYZE features
aligning features
[M::main::16.311*0.41] loaded/built the index for 2939 target sequence(s)
[M::mm_mapopt_update::17.590*0.45] mid_occ = 596
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2939
[M::mm_idx_stat::18.371*0.48] distinct minimizers: 101324913 (39.04% are singletons); average occurrences: 5.469; average spacing: 5.362; total length: 2971331530
[M::worker_pipeline::226.359*3.67] mapped 10628 sequences
[M::worker_pipeline::382.816*3.79] mapped 10362 sequences
[M::worker_pipeline::555.968*3.84] mapped 12280 sequences
[M::worker_pipeline::711.785*3.85] mapped 14834 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -o intermediate_files/reference_all_to_target_all.sam -a --end-bonus 5 --eqx -N 50 -p 0.5 -t 32 liftoff/rheMac10.fa.gz.mmi intermediate_files/reference_all_genes.fa
[M::main] Real time: 712.151 sec; CPU: 2743.497 sec; Peak RSS: 27.401 GB
lifting feature
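
A side note on reading the minimap2 summary line in the log: dividing CPU time by real time gives a rough estimate of how many cores were busy on average during alignment. A minimal sketch using the two numbers from the log above (the variable names are only illustrative):

```shell
# Values copied from the "[M::main] Real time: ...; CPU: ..." log line.
real_sec=712.151
cpu_sec=2743.497

# CPU seconds divided by wall-clock seconds approximates the average
# number of cores that were busy.
ratio=$(awk -v c="$cpu_sec" -v r="$real_sec" 'BEGIN { printf "%.2f", c / r }')
echo "average busy cores: $ratio"
```

A ratio close to 4 even though minimap2 was started with `-t 32` would be consistent with the job having only the 4 slots requested by `bsub -n 4`, although other causes (for example I/O waits) cannot be ruled out from the log alone.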

Am I using enough memory or cores/threads for liftoff? Is there a typical runtime for lifting over features from one large genome to another?
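
On the cores/threads part of the question: the bsub line requests 4 slots (`-n 4`), the script passes `-p 10`, and the logged minimap2 command used `-t 32`. One way to keep these in step is to derive the thread count from the scheduler instead of hard-coding it. The sketch below assumes LSF exports `LSB_DJOB_NUMPROC` inside the job and falls back to 1 otherwise; the `DRY_RUN` switch is a hypothetical convenience, not a Liftoff or LSF feature. It is an illustration, not a tested fix for the hang.

```shell
#!/bin/bash
# Sketch only: take the thread count from the LSF allocation.
# Assumption: LSB_DJOB_NUMPROC is set by LSF inside a job; use 1 if it is not.
threads="${LSB_DJOB_NUMPROC:-1}"

# Paths as given in the post above.
target=liftoff/rheMac10.fa.gz
reference=liftoff/GRCh38_latest_genomic.fna.gz
annotation=liftoff/GRCh38_latest_genomic.gff.gz
output=liftoff/update_rhemac10_lifted.gtf

cmd="liftoff $target $reference -g $annotation -p $threads -o $output"

# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
else
    # Report success only if liftoff actually exited cleanly.
    $cmd && echo "liftoff finished running!"
fi
```

Unlike the original script, this version prints the "finished" message only when the command succeeds, which makes a silent termination easier to spot in the job output.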

@yeeus
Copy link

yeeus commented Feb 29, 2024

I also encountered this problem, have you solved it?

@Agamoni
Copy link

Agamoni commented Feb 29, 2024

Hi, I'm also having the same issue; any advice?

zgb963 commented Mar 26, 2024

> I also encountered this problem, have you solved it?

@yeeus Not yet. I heard from someone that Liftoff needs to be run with a GTF file rather than a GFF file, so I tried that, but I got the following error: 'GFF does not contain any gene features. Use -f to provide a list of other feature types to lift over.'
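
To see what the error above is complaining about, it can help to list which feature types (column 3) an annotation file actually contains. A minimal sketch on a tiny made-up sample; the sample rows are invented for illustration, and a real file would be streamed in with `zcat` or `cat` instead:

```shell
# Made-up, tab-separated GTF-like rows (illustration only, not real data).
sample=$(printf 'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tx\nchr1\tsrc\texon\t1\t50\t.\t+\t.\tx\nchr1\tsrc\texon\t60\t100\t.\t+\t.\tx\n')

# Count feature types in column 3, skipping comment lines.
counts=$(printf '%s\n' "$sample" \
    | awk -F '\t' '!/^#/ { n[$3]++ } END { for (t in n) print t, n[t] }' \
    | sort)
echo "$counts"
```

If no `gene` rows show up in such a listing, the error message points at the `-f` option, which takes a list of the other feature types to lift over.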

@salzberg

We'll look into this, but Liftoff usually runs in no more than an hour or two on a mammalian genome, so if it's running for many hours, something is wrong. It doesn't need that much memory.
However, it seems you are lifting human annotation onto rhesus macaque, which is pretty distant from human (at the DNA level). This means that minimap2 will likely have trouble mapping many genes. You might instead try our newer LiftOn program, which is designed for more distant mapping problems. It uses Liftoff as a module, along with miniprot. Check it out here: https://github.com/Kuanhao-Chao/LiftOn/blob/main/README.md
