Fallback parsing for Genotype files (esp. 23andMe) #428

BenjaminHCCarr · 2017-08-25T00:28:45Z

It would be useful (though possibly CPU-expensive) to have a fallthrough tree for file types if the indicated one does not work.

My biggest request is that 23andme-format would fall back to 23andMe-EXOME, VCF-format (or the reverse) in the case of:
"We are sorry to inform you that there was at least one line in your genotyping file, excluding the header, that could not be parsed correctly."
The fallback should be tried prior to emailing the user.
This would greatly improve the user experience (UX), likely at a low cost for parsing the file. Judging by the speed, the current check appears to be a regexp check, so you could even do a stupid simple regexp (not sure which expression engine you are using).
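A minimal sketch of the fallthrough idea, in Python purely for illustration (the actual workers may look nothing like this). The names `ParseError`, `parse_with_fallback`, `parsers`, and `fallbacks` are all hypothetical, not taken from the project:

```python
class ParseError(Exception):
    """Raised by a parser when a line cannot be parsed (hypothetical)."""


def parse_with_fallback(lines, indicated, parsers, fallbacks):
    """Try the indicated format first, then each configured fallback.

    parsers:   dict mapping a format name to a callable that parses lines
    fallbacks: dict mapping a format name to the formats to try next
    Returns (format_name, result) for the first parser that succeeds.
    """
    tried = []
    for name in [indicated, *fallbacks.get(indicated, [])]:
        try:
            return name, parsers[name](lines)
        except ParseError:
            tried.append(name)
    # Only at this point would the user need to be emailed about the failure.
    raise ParseError(f"no parser succeeded; tried {tried}")
```

With this shape, the "could not be parsed" email is sent only once every candidate format has failed, not after the first failure.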

If 23andme-format fails, check the first lines after the header (heck, do a head -n 10). If
^1\ matches, then try parsing with 23andMe-EXOME, VCF-format.

The reverse also applies: if EXOME fails, check the first lines after the header for
^rs[digits]\
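The check above could be sketched roughly as follows. This is Python for illustration only, and it rests on some assumptions: that header lines start with `#`, that fields are separated by whitespace, and that the truncated patterns end in a whitespace escape. The function name `sniff_format` and the returned labels are hypothetical.

```python
import re
from itertools import islice

# Assumption: both patterns are followed by a whitespace separator.
RSID_LINE = re.compile(r"^rs\d+\s")  # 23andme-format rows start with an rsID
VCF_LINE = re.compile(r"^1\s")       # sorted VCF rows start with chromosome 1


def sniff_format(lines, sample_size=10):
    """Guess the format from the first few non-header lines, or return None."""
    # Skip blank lines and header lines (assumed to start with '#').
    body = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    sample = list(islice(body, sample_size))
    if not sample:
        return None
    if all(RSID_LINE.match(ln) for ln in sample):
        return "23andme"
    if all(VCF_LINE.match(ln) for ln in sample):
        return "23andme-exome-vcf"
    return None
```

Because only the first `sample_size` data lines are read, the check stays cheap even for large files; a `None` result simply means no fallback is suggested.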

It also saves bandwidth on both ends, so depending on hosting, it may end up saving you more in transfer costs than it costs in CPU cycles.

deCODEme has a unique header:

Name,Variation,Chromosome,Position,Strand,YourCode
rs[digits]\

as does FamilyTreeDNA

RSID,CHROMOSOME,POSITION,RESULT
\"rs[digits]\"\,
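Since these two formats announce themselves in their first line, a lookup on the header alone might be enough. A small Python sketch, where the function name `format_from_header` and the labels `"decodeme"` and `"ftdna"` are hypothetical:

```python
# Header strings as quoted above; the format labels are made up for this sketch.
KNOWN_HEADERS = {
    "Name,Variation,Chromosome,Position,Strand,YourCode": "decodeme",
    "RSID,CHROMOSOME,POSITION,RESULT": "ftdna",
}


def format_from_header(first_line):
    """Return the format label for a recognised header line, else None."""
    return KNOWN_HEADERS.get(first_line.strip())
```

This is an exact-match lookup, so a header with extra columns or different capitalisation would not be recognised and would fall through to the other checks.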

Though I would think 23andMe is the most likely failure case.
If you can point me to the right place in the src tree I would be happy to write a PR for this feature.


gedankenstuecke · 2017-08-25T06:59:24Z

Thanks for the suggestions! The code is in two different sidekiq workers that handle our parsing.

In the preparsing step, some general checks are done to see whether the file should be parsed further. If these indicate that everything's fine and the file is not a duplicate, it is passed on to the parsing step, which handles writing the data into our database.

Hope that's a good pointer for the start, let us know if we can offer further explanations. 🙂

gedankenstuecke · 2018-01-18T20:32:43Z

Is this still something you want to tackle? 😃

gedankenstuecke added the feature label Aug 25, 2017
