At the end of the regenotyping pipeline, it would be very easy to dump the following:
- Some kind of summary/signature of all the SNPs/indels in the VCF (might just be an md5).
- For each sample, a JSON with two entries: a bitfield and an integer array, each as long as the VCF has records (i.e. one bit/integer per record). In these we put:
  - for each record, set the bit to 1 if the genotype is either ./. or het;
  - for each record, set the integer to the (haploid) genotype.
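A minimal sketch of what this dump could look like, in Python. Everything named here is an assumption for illustration, not part of the pipeline: the function names `site_signature` and `encode_sample`, the JSON keys `mask` and `genotype`, storing the bitfield as a list of 0/1 (JSON has no native bitfield), and using -1 as a placeholder integer for records that are missing or het.

```python
import hashlib
import json


def site_signature(sites):
    """MD5 over the ordered (chrom, pos, ref, alt) tuples of the VCF records.

    Two samples are only comparable if they were genotyped at the same
    ordered list of sites, so the signature covers order as well as content.
    """
    h = hashlib.md5()
    for chrom, pos, ref, alt in sites:
        h.update(f"{chrom}\t{pos}\t{ref}\t{alt}\n".encode())
    return h.hexdigest()


def encode_sample(gt_strings):
    """Turn one sample's GT strings (e.g. "0/0", "1/1", "./.", "0/1")
    into a mask and a haploid genotype vector, one entry per record."""
    mask = []
    genotype = []
    for gt in gt_strings:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or len(set(alleles)) != 1:
            # Missing or het: flag the record and store a placeholder (-1).
            mask.append(1)
            genotype.append(-1)
        else:
            # Homozygous (or haploid) call: store the single allele index.
            mask.append(0)
            genotype.append(int(alleles[0]))
    return {"mask": mask, "genotype": genotype}


if __name__ == "__main__":
    sites = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "GA")]
    print(site_signature(sites))
    print(json.dumps(encode_sample(["0/0", "0/1"])))
```

Keeping the mask separate from the genotype vector means the placeholder value is never compared: any record flagged in the mask is skipped before the integers are looked at.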
Once these are stored at the end of regenotyping, we can just "cat" all the per-sample bitfields (./. or het) and all the integer vectors, and then measuring the distance between any two samples is trivial:
for each pair of samples (A, B)
    dist=0
    for k= 0 to number of records-1
        if bitfield_A[k]==0 and bitfield_B[k]==0 (meaning the record is neither missing nor het in either sample)
            if intvector_A[k] != intvector_B[k]
                dist++
Ought to be very fast.
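The distance computation above could be sketched in Python as follows. This assumes each sample has been loaded as a dict with a `mask` list (1 where the record is ./. or het, else 0) and a `genotype` list of integers, both with one entry per VCF record; those key names and the function names `distance` and `distance_matrix` are hypothetical, chosen only for this example.

```python
from itertools import combinations


def distance(a, b):
    """Count records where both samples have a usable call and the calls differ."""
    if len(a["mask"]) != len(b["mask"]) or len(a["genotype"]) != len(b["genotype"]):
        raise ValueError("samples were not genotyped at the same number of records")
    return sum(
        1
        for ma, mb, ga, gb in zip(a["mask"], b["mask"], a["genotype"], b["genotype"])
        # Skip any record that is missing or het in either sample.
        if ma == 0 and mb == 0 and ga != gb
    )


def distance_matrix(samples):
    """Pairwise distances for a dict of {sample_name: encoded_sample}."""
    names = sorted(samples)
    return {
        (x, y): distance(samples[x], samples[y])
        for x, y in combinations(names, 2)
    }


if __name__ == "__main__":
    samples = {
        "a": {"mask": [0, 0, 1, 0], "genotype": [0, 1, -1, 1]},
        "b": {"mask": [0, 0, 0, 1], "genotype": [0, 0, 1, -1]},
    }
    print(distance_matrix(samples))
```

The work is one pass over the records per pair of samples, so the cost grows with (number of records) × (number of sample pairs). With real bit-packed arrays the mask test could be done a machine word at a time, but the plain-list version shows the logic.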