Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wildcards in group subsetting? #164

Open
phillip-richmond-umoja opened this issue Feb 6, 2024 · 3 comments
Open

Wildcards in group subsetting? #164

phillip-richmond-umoja opened this issue Feb 6, 2024 · 3 comments

Comments

@phillip-richmond-umoja
Copy link

Question not bug:

Can you select wildcards for subsetting across samples for a genotype?

My use case:
Single ancestor line, multiple clones derived from it. Alias file:

#ancestor	clone
NH50191	C3
NH50191	C56
NH50191	C3-1
NH50191	C56-1

Code chunk with subset expression:

## slivar filter 
	slivar expr \
	--vcf $vcf \
	--alias $alias_samplesheet \
	--pass-only \
	--info 'INFO.impactful' \
	--group-expr "Clonal_impact: INFO.gnomad_popmax_af <= 0.0001 && ancestor.AD[1] <= $max_ancestor_reads_for_clonal  && clone.AD[1] >= $min_clone_reads_for_clonal" \
       -o Clonal_impactful_snvindel.vcf.gz	

If I want any clone.AD[1] >= value; or all clone.AD[1] is that possible with the --group-expr function?

One workaround could be changing the alias file to

#ancestor	clone1    clone2    clone3    clone4
NH50191	C3    C3-1    C56    C56-1

and expanding the group expression to

--group-expr "Clonal_impact: INFO.gnomad_popmax_af <= 0.0001 && ancestor.AD[1] <= $max_ancestor_reads_for_clonal  && clone1.AD[1] >= $min_clone_reads_for_clonal || clone2.AD[1] >= $min_clone_reads_for_clonal || clone3.AD[1] >= $min_clone_reads_for_clonal || clone4.AD[1] >= $min_clone_reads_for_clonal ||" \

But that requires that || (or) is implemented in your subsetting code.

Any advice?

Thanks,
Phil

For example, I want to access AD[1] or DP acro

@brentp
Copy link
Owner

brentp commented Feb 6, 2024

Hi Phil,
does your last expression not work as expected? The expressions are javascript so || should work fine.

@phillip-richmond-umoja
Copy link
Author

Matter of fact it does.
This works fine

	slivar expr \
	--vcf $vcf \
	--alias $alias_samplesheet_separate \
	--pass-only \
	--info 'INFO.impactful' \
	--group-expr "Clonal_impact: INFO.gnomad_popmax_af <= 0.0001 && ancestor.AD[1] <= $max_ancestor_reads_for_clonal && \
	( clone1p1.AD[1] >= $min_clone_reads_for_clonal || clone1p2.AD[1] >= $min_clone_reads_for_clonal || clone2p1.AD[1] >= $min_clone_reads_for_clonal || clone2p2.AD[1] >= $min_clone_reads_for_clonal)" \
       -o Clonal_impactful_snvindel_withOR.vcf.gz	

But problem here is I have to know the names of each clone, as opposed to a more generalized "clone" label. Just curious if there is any such operation as clone.each or clone.any or clone.all when referring back to a sample from the column titled "clone". If not I can keep it hard coded in the pipeline...such is the challenge of pipeline flexibility to differing inputs.

If no such operation exists it's a bummer and you can close the thread...it's been tough to move away from GEMINI in python2...and I'm missing operations like https://gemini.readthedocs.io/en/latest/content/querying.html#gt-filter-wildcard-filtering-on-genotype-columns

Thanks for a great tool though!
-Phil

@brentp
Copy link
Owner

brentp commented Feb 9, 2024

I see. I think you can use:

ancestor.every(function(a) { a.AD[1] < 1}) && clone.some(function(c) { c.AD[1] < 0 })

that expression probably doesn't do anything useful, but shows how you'd use every and some

I'm still confused about exactly what you want to do. usually you'd have:

#ancestors    clones
A1,A2,A3        C1,C2,C3,C4...

then ancesors and clones will both be javascript arrays (because column ends in 's'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants