Wildcards in group subsetting? #164

phillip-richmond-umoja · 2024-02-06T19:06:08Z

Question not bug:

Can you select wildcards for subsetting across samples for a genotype?

My use case:
Single ancestor line, multiple clones derived from it. Alias file:

#ancestor	clone
NH50191	C3
NH50191	C56
NH50191	C3-1
NH50191	C56-1

Code chunk with subset expression:

## slivar filter 
	slivar expr \
	--vcf $vcf \
	--alias $alias_samplesheet \
	--pass-only \
	--info 'INFO.impactful' \
	--group-expr "Clonal_impact: INFO.gnomad_popmax_af <= 0.0001 && ancestor.AD[1] <= $max_ancestor_reads_for_clonal  && clone.AD[1] >= $min_clone_reads_for_clonal" \
       -o Clonal_impactful_snvindel.vcf.gz

If I want any clone.AD[1] >= value; or all clone.AD[1] is that possible with the --group-expr function?

One workaround could be changing the alias file to

#ancestor	clone1    clone2    clone3    clone4
NH50191	C3    C3-1    C56    C56-1

and expanding the group expression to

--group-expr "Clonal_impact: INFO.gnomad_popmax_af <= 0.0001 && ancestor.AD[1] <= $max_ancestor_reads_for_clonal  && clone1.AD[1] >= $min_clone_reads_for_clonal || clone2.AD[1] >= $min_clone_reads_for_clonal || clone3.AD[1] >= $min_clone_reads_for_clonal || clone4.AD[1] >= $min_clone_reads_for_clonal ||" \

But that requires that || (or) is implemented in your subsetting code.

Any advice?

Thanks,
Phil

For example, I want to access AD[1] or DP acro

The text was updated successfully, but these errors were encountered:

brentp · 2024-02-06T19:59:50Z

Hi Phil,
does your last expression not work as expected? The expressions are javascript so || should work fine.

phillip-richmond-umoja · 2024-02-06T22:47:54Z

Matter of fact it does.
This works fine

	slivar expr \
	--vcf $vcf \
	--alias $alias_samplesheet_separate \
	--pass-only \
	--info 'INFO.impactful' \
	--group-expr "Clonal_impact: INFO.gnomad_popmax_af <= 0.0001 && ancestor.AD[1] <= $max_ancestor_reads_for_clonal && \
	( clone1p1.AD[1] >= $min_clone_reads_for_clonal || clone1p2.AD[1] >= $min_clone_reads_for_clonal || clone2p1.AD[1] >= $min_clone_reads_for_clonal || clone2p2.AD[1] >= $min_clone_reads_for_clonal)" \
       -o Clonal_impactful_snvindel_withOR.vcf.gz

But problem here is I have to know the names of each clone, as opposed to a more generalized "clone" label. Just curious if there is any such operation as clone.each or clone.any or clone.all when referring back to a sample from the column titled "clone". If not I can keep it hard coded in the pipeline...such is the challenge of pipeline flexibility to differing inputs.

If no such operation exists it's a bummer and you can close the thread...it's been tough to move away from GEMINI in python2...and I'm missing operations like https://gemini.readthedocs.io/en/latest/content/querying.html#gt-filter-wildcard-filtering-on-genotype-columns

Thanks for a great tool though!
-Phil

brentp · 2024-02-09T10:12:43Z

I see. I think you can use:

ancestor.every(function(a) { a.AD[1] < 1}) && clone.some(function(c) { c.AD[1] < 0 })

that expression probably doesn't do anything useful, but shows how you'd use every and some

I'm still confused about exactly what you want to do. usually you'd have:

#ancestors    clones
A1,A2,A3        C1,C2,C3,C4...

then ancesors and clones will both be javascript arrays (because column ends in 's'.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wildcards in group subsetting? #164

Wildcards in group subsetting? #164

phillip-richmond-umoja commented Feb 6, 2024

brentp commented Feb 6, 2024

phillip-richmond-umoja commented Feb 6, 2024

brentp commented Feb 9, 2024

Wildcards in group subsetting? #164

Wildcards in group subsetting? #164

Comments

phillip-richmond-umoja commented Feb 6, 2024

brentp commented Feb 6, 2024

phillip-richmond-umoja commented Feb 6, 2024

brentp commented Feb 9, 2024