Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freemuxlet fails during popscle dsc-pileup due to different order of bam and vcf. #25

Closed
bednarsky opened this issue Nov 6, 2023 · 5 comments

Comments

@bednarsky
Copy link

bednarsky commented Nov 6, 2023

Sorry for having so many questions and thank you in advance for your help!

Freemuxlet fails with this error:

NOTICE [2023/11/05 16:04:12] - Finished loading 6249 droplet/cell barcodes to consider
NOTICE [2023/11/05 16:04:12] - Initializing BCF reader..
NOTICE [2023/11/05 16:04:12] - Finished identifying 0 samples to load from VCF/BCF
NOTICE [2023/11/05 16:04:12] - Initializing SAM reader..

FATAL ERROR -
[E:int32_t cmdCramDigitalPileup(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have di
fferent ordering of chromosomes. SAM/BAM/CRAM file has GL000225.1 before GL000194.1, but VCF/BCF file
has GL000225.1 after GL000194.1

terminate called after throwing an instance of 'pException'
what(): Exception was thrown
.command.sh: line 36: 1042790 Aborted (core dumped) popscle dsc-pileup --sam sorted
.bam --tag-group CB --tag-UMI UB --vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf --sam-verbose 10000
00 --vcf-verbose 10000 --cap-BQ 40 --min-BQ 13 --min-MQ 20 --min-TD 0 --excl-flag 3844 --group-list b
arcodes.tsv.gz --min-total 0 --min-uniq 0 --min-snp 0 --out freemuxlet_391/plp/freemuxlet_out

Work dir:
/projects/project_dir/resources/demultiplexing/hadge/work/6b/502461a7f29a
e39aff076e44409a7d

Do you know this issue? What is your recommended solution? I tried popscle_helper_tools to reorder the vcf file, but it is currently still failing, see issue on that repo.

Also note this line Finished identifying 0 samples to load from VCF/BCF → is this usual, or does this point to an earlier error?

This issue occured for both hg38 (which matches the ref genome used for demultiplexing) and hg19 (which I used first by accident).

Really appreciate all your work!

@wxicu
Copy link
Collaborator

wxicu commented Nov 6, 2023

Hey here are my suggestions:

  1. Reference files and chromosome ordering statgen/popscle#25 (comment) Maybe try to remove the config completely?
  2. I dont think Finished identifying 0 samples to load from VCF/BCF is really a problem. bcftools is trying to identify the samples from a vcf of common SNPs which don't have any sample lists. So it can not identify any sample.
  3. Could you please try to run the command
popscle dsc-pileup --sam sorted
.bam --tag-group CB --tag-UMI UB --vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf --sam-verbose 10000
00 --vcf-verbose 10000 --cap-BQ 40 --min-BQ 13 --min-MQ 20 --min-TD 0 --excl-flag 3844 --group-list b
arcodes.tsv.gz --min-total 0 --min-uniq 0 --min-snp 0 --out freemuxlet_391/plp/freemuxlet_out

outside the nextflow pipeline? Maybe tabix indeed solves the problem but due to the reason that hadge doesnt mount the tbi into the working directory, freemuxlet can not find this file and fails again. That could also be possible.

@bednarsky
Copy link
Author

Thanks again for the fast response, amazing support! I will let you know what worked.

@Zethson
Copy link
Member

Zethson commented Nov 17, 2023

Keep them coming @bednarsky ! Thank you for playing with hadge. We're happy to provide additional assistance if necessary.

@wxicu
Copy link
Collaborator

wxicu commented Nov 17, 2023

Please feel free to reopen this issue if you are still running into issues and we would be happy to help troubleshoot further.

@wxicu wxicu closed this as completed Nov 17, 2023
@bednarsky
Copy link
Author

bednarsky commented Nov 21, 2023

I had an issue with the header and the table of vcf files using different formatting. I changed all formatting to the one in the bam files, namely ch1, chr2 etc, and then the reordering with popscle_helper_tools worked, and the process continued.

These are my commands to download and prepare the files.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=13aebUpEKrtjliyT9rYzRijtkNJVUk5F_' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=13aebUpEKrtjliyT9rYzRijtkNJVUk5F_" -O common_variants_grch38.vcf && rm -rf /tmp/cookies.txt

# rename chromosomes to match bams, need to change header back first so that then bcftools works
sed 's/ID=chr/ID=/g' common_variants_grch38.vcf > common_variants_grch38.no_chr.vcf

# create file for renaming, then do renaming
for i in {1..22} X Y; do echo -e "$i\tchr$i"; done > rename_chrs.txt
bcftools annotate --rename-chrs rename_chrs.txt -o common_variants_grch38.vcf common_variants_grch38.no_chr.vcf

# adapt for scSplit like done in example from repo
bcftools query -f '%CHROM:%POS\n' common_variants_grch38.vcf > common_variants_grch38_list.vcf

# freemuxlet and cellsnp
wget https://master.dl.sourceforge.net/project/cellsnp/SNPlist/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz
gunzip genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz

# do the renaming
bcftools annotate --rename-chrs rename_chrs.txt -o genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf

# reorder
popscle_helper_tools/sort_vcf_same_as_bam.sh \
    tmp_dir_example_bam/example_bam.bam \
     genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.vcf > genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.reordered.vcf

Removing the contig lines, as proposed here, statgen/popscle#25 (comment), led to different downstream errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants