A problem that arose while consuming data from public-facing sites was that the same data was published under different field names, and it was non-trivial to identify which names corresponded to standard measurable values. The problem compounds when multiple providers upload data and the platform cannot work out where that data belongs. One approach would be to publish a standard list and ask uploaders to convert their data from the format they have into the format we define. But, as past efforts have shown, this is unsustainable: companies and countries are not incentivised to do the conversion and, as a result, will not do it.
Assume there are three inputs, Input1, Input2, and Input3, each with three fields to report:
Input1 defines them to be Field1, Field2, Field3
Input2 defines them to be F1, F2, F3
Input3 defines them to be f1, f2, f3
Assume that the platform expects these fields to be named field1, field2, field3. The platform must be able to infer, by parsing the names, that each provider's fields map to their correct domains. This inference could be powered by a simple text parser, an ML-based learning algorithm, or something else entirely. The idea is that the parsing layer acts as a black box: everything fed into it must come out cleanly formatted.
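As a concrete illustration of the "simple text parser" end of that spectrum, here is a minimal sketch of a rule-based normalizer. The canonical names, the separator-stripping rule, and the fuzzy-match cutoff are all assumptions made for this example, not part of any existing platform API:

```python
import difflib

# Canonical field names the platform expects (from the example above).
CANONICAL = ["field1", "field2", "field3"]

def normalize_field(name, canonical=CANONICAL, cutoff=0.4):
    """Map a provider's raw field name onto the closest canonical name.

    A deliberately simple sketch: lowercase and strip separators, try an
    exact match, then handle abbreviations, then fall back to fuzzy
    matching. An ML model could sit behind this same interface instead.
    """
    key = name.lower().replace("_", "").replace("-", "")
    # Exact hit after normalisation, e.g. "Field1" -> "field1".
    if key in canonical:
        return key
    # Abbreviations such as "F1"/"f1": same leading letter, same suffix.
    for cand in canonical:
        if cand.startswith(key[0]) and cand.endswith(key[1:]):
            return cand
    # Last resort: closest string match, or None if nothing is close.
    matches = difflib.get_close_matches(key, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# The three example providers all collapse onto the platform's names.
for provider_fields in (["Field1", "Field2", "Field3"],
                        ["F1", "F2", "F3"],
                        ["f1", "f2", "f3"]):
    print([normalize_field(f) for f in provider_fields])
```

All three provider schemas come out as `['field1', 'field2', 'field3']`, which is exactly the "everything put in comes out cleanly formatted" property described above.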
This black box could also potentially be used in other places where we might need inferential analysis (API endpoints, names, etc.). It would make a nice side project that plugs into the platform without requiring any platform changes: one could write a parser that works on 100 examples and then run it against the platform.
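The "develop against examples, then plug in" workflow suggested above can be sketched as a pluggable interface plus a small evaluation harness. The `Inferrer` type, the scoring function, and the rule table below are hypothetical names invented for this sketch:

```python
from typing import Callable, Dict, List, Tuple

# An inferrer is just a function from a raw name to a canonical one, so
# any implementation (regex rules, an ML model, ...) plugs in unchanged.
Inferrer = Callable[[str], str]

def evaluate(inferrer: Inferrer, examples: List[Tuple[str, str]]) -> float:
    """Score an inferrer on labelled (raw_name, expected) pairs.

    This mirrors the idea above: develop the parser offline against a
    set of examples, then drop it into the platform once it scores well.
    """
    hits = sum(inferrer(raw) == expected for raw, expected in examples)
    return hits / len(examples)

# A trivial rule-based inferrer (lookup table is purely illustrative).
RULES: Dict[str, str] = {"field1": "field1", "f1": "field1",
                         "field2": "field2", "f2": "field2",
                         "field3": "field3", "f3": "field3"}

def rule_inferrer(raw: str) -> str:
    # Fall back to the raw name if no rule matches.
    return RULES.get(raw.lower(), raw)

examples = [("Field1", "field1"), ("F2", "field2"), ("f3", "field3")]
print(evaluate(rule_inferrer, examples))  # -> 1.0
```

Because the platform would only depend on the `Inferrer` signature, the rule table could later be swapped for a learned model without any platform changes, which is the decoupling the paragraph above argues for.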
Data inferences