Dump binary encoded genotypes after regenotyping #98

Open
iqbal-lab opened this issue Feb 1, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@iqbal-lab
Contributor

iqbal-lab commented Feb 1, 2020

At the end of the regenotyping pipeline, it would be very easy to dump the following:

  1. Some kind of summary/signature of all the SNPs/indels in the VCF (might just be an md5)
  2. For each sample, a JSON with two entries: a bitfield and an integer array, each as long as the VCF has records (i.e. one bit/integer per record). In these we put:
  • for each record, set the bit to 1 if the genotype is either ./. or het
  • for each record, set the integer to the (haploid) genotype.
    Once stored at the end of regenotyping, this will make distance measuring trivial.
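As a rough illustration of the two outputs listed above, here is a minimal Python sketch. Everything named in it is an assumption made for the example and not part of the pipeline: the function names `site_signature` and `encode_sample`, the JSON keys `masked` and `genotypes`, the use of `-1` as a placeholder integer for missing or het calls, and the choice to hash chrom/pos/ref/alt for the signature. The bitfield is written as a list of 0/1 values for readability; a packed representation would be more compact.

```python
import hashlib
import json


def site_signature(records):
    """md5 over the (chrom, pos, ref, alt) of every record, in VCF order.

    Which fields go into the signature is an assumption of this sketch.
    """
    digest = hashlib.md5()
    for chrom, pos, ref, alt in records:
        digest.update(f"{chrom}\t{pos}\t{ref}\t{alt}\n".encode())
    return digest.hexdigest()


def encode_sample(genotypes):
    """Encode one sample's GT strings as a mask plus haploid calls.

    masked[r] is 1 if record r is missing (./.) or het, else 0.
    genotypes[r] is the haploid allele index, or -1 (hypothetical
    placeholder) wherever masked[r] is 1.
    """
    masked, calls = [], []
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or len(set(alleles)) != 1:
            masked.append(1)
            calls.append(-1)
        else:
            masked.append(0)
            calls.append(int(alleles[0]))
    return {"masked": masked, "genotypes": calls}


if __name__ == "__main__":
    records = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "GA")]
    print(site_signature(records))
    print(json.dumps(encode_sample(["0/0", "0/1"])))
```

Two samples are only comparable if their signatures match, since the arrays are indexed by record position.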

Then at the end we can just "cat" all the bitarrays for ./. or het, and cat all the intvectors, and then the distance between any two samples a and b is trivial to compute:

dist=0
for r = 0 to number of records-1
    if bitfield_a[r]==0 and bitfield_b[r]==0 (meaning neither sample is missing or het at this record)
        if intvector_a[r] != intvector_b[r]
            dist++

Repeat for each pair of samples.

Ought to be v fast
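A minimal Python sketch of this distance calculation, under stated assumptions: each sample is a dict with a `masked` list (1 where the genotype is ./. or het) and a `genotypes` list of haploid calls, one entry per VCF record. Those key names and the helper names `distance` and `pairwise_distances` are hypothetical, chosen only for this example. Pure Python is used to keep it self-contained; a vectorised version over packed arrays would be the faster option in practice.

```python
from itertools import combinations


def distance(a, b):
    """Count records where both samples are unmasked and their calls differ."""
    if len(a["masked"]) != len(b["masked"]):
        raise ValueError("samples must cover the same VCF records")
    dist = 0
    for mask_a, mask_b, gt_a, gt_b in zip(
        a["masked"], b["masked"], a["genotypes"], b["genotypes"]
    ):
        # Skip the record if either sample is missing or het there.
        if mask_a == 0 and mask_b == 0 and gt_a != gt_b:
            dist += 1
    return dist


def pairwise_distances(samples):
    """Distance for every unordered pair of sample names."""
    return {
        (x, y): distance(samples[x], samples[y])
        for x, y in combinations(sorted(samples), 2)
    }


if __name__ == "__main__":
    samples = {
        "s1": {"masked": [0, 0, 1, 0], "genotypes": [0, 1, -1, 1]},
        "s2": {"masked": [0, 0, 0, 1], "genotypes": [0, 0, 1, -1]},
    }
    print(pairwise_distances(samples))
```

In the toy data only the second record counts towards the distance: the first record agrees, and the last two are masked in one sample or the other.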

iqbal-lab added the enhancement (New feature or request) label Feb 1, 2020
@iqbal-lab
Contributor Author

Note that these could be merged as the VCFs are merged.

@iqbal-lab
Contributor Author

This is just an idea for the future.

@iqbal-lab
Contributor Author

maybe this is now redundant?


2 participants