std::bad_alloc is thrown even though memory usage is very low #18

Open
altairwei opened this issue Jan 14, 2025 · 3 comments

altairwei commented Jan 14, 2025

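Description:

While using the basevar tool, I observed that system memory usage was very low (about 30G / 756G), yet the program occasionally throws a std::bad_alloc exception, reporting a failed memory allocation. Of the dozen or so basevar jobs I run concurrently, most finish, but one or two always hit a memory allocation error.

Details and steps to reproduce follow:

Steps to reproduce

  1. Run the following command: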
./bin/basevar basetype -t 20 -L output/calls/all.bam.list \
    --filename-has-samplename -R data/reference/Homo_sapiens_assembly38.fasta \
    -r chr9:1-5000000 --min-af=0.001 \
    --output-vcf output/calls/chr9_1_5000000.vcf.gz \
    --output-cvg output/calls/chr9_1_5000000.cvg.tsv.gz \
    > log/20250114_075951/chr9_1_5000000.log
  2. After running for a while, the program throws the following error:
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/bin/bash: line 1: 154874 Aborted                 ./bin/basevar basetype -t 20 -L output/calls/all.bam.list --filename-has-samplename -R data/reference/Homo_sapiens_assembly38.fasta -r chr9:1-5000000 --min-af=0.001 --output-vcf output/calls/chr9_1_5000000.vcf.gz --output-cvg output/calls/chr9_1_5000000.cvg.tsv.gz > log/20250114_075951/chr9_1_5000000.log

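Expected behavior

The program should run normally and complete the data processing job without crashing on a failed memory allocation.

Actual behavior

The program occasionally throws a std::bad_alloc exception even though system memory usage is very low.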

altairwei (Author) commented

I run the basevar commands through Snakemake. Strangely, when I rerun a failed job by hand on its own, it succeeds. Is this really a memory allocation problem?
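
Since the same command succeeds by hand but sometimes fails under Snakemake, one thing worth ruling out is that the two environments run with different per-process limits. An address-space limit, for example, can make an allocation fail while the machine still has plenty of free RAM. The following is a minimal sketch, assuming Linux and Python 3; the choice of which limits to print is my own, and nothing here confirms that a limit is the actual cause.

```python
import resource

# Limits that most directly affect memory allocation and thread creation.
# Which limits matter for basevar is an assumption, not something verified.
LIMITS = {
    "address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "data segment (RLIMIT_DATA)": resource.RLIMIT_DATA,
    "stack (RLIMIT_STACK)": resource.RLIMIT_STACK,
    "processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limits():
    """Return {name: (soft, hard)} for the current process, as strings."""
    report = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[name] = (fmt(soft), fmt(hard))
    return report


if __name__ == "__main__":
    for name, (soft, hard) in describe_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Running this once from the interactive shell and once from inside a Snakemake job, then comparing the two outputs, would show whether the environments differ.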

altairwei commented Jan 14, 2025

Below is the log of one of the failed jobs:

Program start on Tue Jan 14 08:39:03 2025

[INFO] BaseVar arguments:
basevar basetype -R data/reference/Homo_sapiens_assembly38.fasta \ 
   -L output/calls/all.bam.list \ 
   -q 10 \ 
   -m 0.001 \ 
   -B 200 \ 
   -t 10 \ 
   -r chr5:115000001-120000000 \ 
   --output-vcf output/calls/chr5_115000001_120000000.vcf.gz \ 
   --output-vcg output/calls/chr5_115000001_120000000.cvg.tsv.gz \ 
   --filename-has-samplename

[INFO] Finish loading arguments and we have 10 BAM/CRAM files for variants calling.

---- Calling Intervals ----
1 - chr5:115000001-120000000

[INFO] BaseVar'll load samples id from filename directly, becuase you set --filename-has-samplename.
[INFO] Tue Jan 14 08:39:03 2025. Done for loading all samples' id from alignment files, 0 seconds elapsed.

[INFO] Tue Jan 14 08:40:06 2025. Done for creating batchfile output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.1_1.bf.gz, 62 (CPU time: 62.08) seconds elapsed.
[INFO] Tue Jan 14 08:40:06 2025. Done for creating all 1 batchfiles in chr5_115000001-120000000 and start to call variants, 63 (CPU time: 62.64) seconds elapsed in total.

[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 20000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 30000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 40000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 50000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 60000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 70000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 80000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 90000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 100000 lines.
[INFO] Tue Jan 14 08:40:08 2025. Done for creating [output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.vcf.gz.2_50, output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.2_50], 2 (CPU time: 14.16) seconds elapsed.
[INFO] Tue Jan 14 08:40:08 2025. Done for creating [output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.vcf.gz.1_50, output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.1_50], 2 (CPU time: [INFO] Tue Jan 14 08:40:08 2025. Done for creating [output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.vcf.gz.5_50, output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.5_50], 2 (CPU time: 14.17) seconds elapsed.
14.17) seconds elapsed.
[INFO] Tue Jan 14 08:40:08 2025. Done for creating [output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.vcf.gz.9_50, output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.9_50], 2 (CPU time: 14.29) seconds elapsed.
[INFO] Have been loaded 100000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Have been loaded 10000 lines.
[INFO] Tue Jan 14 08:40:08 2025. Done for creating [output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.vcf.gz.7_50, output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.7_50], 2 (CPU time: 14.74) seconds elapsed.
[INFO] Tue Jan 14 08:40:08 2025. Done for creating [output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.vcf.gz.10_50, output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.10_50], g/output/calls/cache_chr5_115000001_120000000/chr5_115000001_120000000.chr5_115000001-120000000.cvg.gz.7_50], 2 (CPU time: 14.74) seconds elapsed.
14.78) seconds elapsed.
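
In the last lines of this log, messages from different threads are spliced into each other: one line carries two [INFO] tags, and fragments such as "14.17) seconds elapsed." appear on a line of their own. That only shows that the threads write to the log without coordination; it does not by itself explain the std::bad_alloc. To check other failed logs for the same pattern, a small scanner could look like the sketch below. The two heuristics are assumptions drawn from this one log, and the sample lines are shortened stand-ins modeled on it, not real output.

```python
TAG = "[INFO]"


def find_interleaved(lines):
    """Return (line_number, reason) pairs for lines that look spliced together."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.count(TAG) > 1:
            # Two messages ended up on the same line.
            suspects.append((number, "more than one [INFO] tag"))
        elif text.endswith("seconds elapsed.") and not text.startswith(TAG):
            # The tail of a message was separated from its head.
            suspects.append((number, "message tail without [INFO] head"))
    return suspects


if __name__ == "__main__":
    # Shortened stand-ins for the kind of lines seen in the log above.
    sample = [
        "[INFO] Have been loaded 100000 lines.",
        "[INFO] Done for creating [a], 2 (CPU time: [INFO] Done for creating [b], 2 (CPU time: 14.17) seconds elapsed.",
        "14.17) seconds elapsed.",
    ]
    for number, reason in find_interleaved(sample):
        print(f"line {number}: {reason}")
```

To scan a real log, pass the open file object, since the function accepts any iterable of lines.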

altairwei (Author) commented

When I ran 482 single-threaded basevar jobs concurrently, all of them finished successfully.
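
The contrast between the multi-threaded jobs (one or two abort) and the 482 single-threaded jobs (none abort) could be checked more systematically by launching the same set of jobs at different `-t` values and counting how many end in an abort. An uncaught C++ exception terminates the process with SIGABRT, which is the "Aborted" line in the shell output above, and Python reports a signal death as a negative return code. The following is a minimal sketch, assuming Linux and Python 3; the commands it runs are stand-ins so that it works anywhere, and would need to be replaced by the real basevar command lines.

```python
import signal
import subprocess
import sys


def run_concurrently(commands):
    """Start every command at once and return their exit codes in order."""
    procs = [
        # Output is discarded here; redirect to files instead to keep the logs.
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in commands
    ]
    return [proc.wait() for proc in procs]


def count_aborts(return_codes):
    """Count processes that were killed by SIGABRT."""
    return sum(1 for code in return_codes if code == -signal.SIGABRT)


if __name__ == "__main__":
    # Stand-in commands (not basevar): two exit cleanly, one aborts itself.
    ok = [sys.executable, "-c", "pass"]
    aborting = [sys.executable, "-c", "import os; os.abort()"]
    codes = run_concurrently([ok, aborting, ok])
    print("exit codes:", codes)
    print("aborted:", count_aborts(codes))
```

Keeping the number of concurrent jobs fixed and changing only `-t` between rounds would show whether the abort rate follows the thread count rather than the total load on the machine.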
