lib-cove can only go so fast because Python's JSON Schema validators are all slow. (There are fast ones in Java and JavaScript.)
Given the size of OCDS datasets, it does not make sense to validate 100% of the data. We should instead validate a sample.
I suppose the process would be for the analyst to set a sample rate. (If no sample rate is explicitly set, then the check step should be skipped.) Then, the worker would "roll the dice" on receiving each message, and only process the message if the dice roll succeeds according to the sample rate.
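A minimal sketch of the per-message "dice roll", assuming a Python worker that handles one message at a time. `should_check` and its parameters are hypothetical names for illustration, not part of lib-cove or any existing worker: a missing rate skips the check step, and otherwise a uniform random draw decides whether this message is validated.

```python
import random
from typing import Optional


def should_check(sample_rate: Optional[float], rng: Optional[random.Random] = None) -> bool:
    """Decide whether the check step should process this message.

    Hypothetical helper: `sample_rate` is the analyst-chosen fraction (0.0 to 1.0)
    of messages to validate; None means no rate was set, so the check is skipped.
    """
    if sample_rate is None:
        return False  # No explicit sample rate: skip the check step entirely.
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError(f"sample_rate must be between 0 and 1, got {sample_rate}")
    # Uniform float in [0.0, 1.0), so a rate of 1.0 always passes and 0.0 never does.
    roll = rng.random() if rng is not None else random.random()
    return roll < sample_rate


# Simulated worker loop: each incoming message gets its own independent roll.
if __name__ == "__main__":
    rng = random.Random(0)  # Seeded only to make the simulation repeatable.
    checked = sum(should_check(0.1, rng) for _ in range(10_000))
    print(f"checked {checked} of 10000 messages")
```

Because each message is rolled independently, the worker needs no shared state or coordination, at the cost of the sample size only approximating the rate rather than matching it exactly.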
We can provide guidance on an appropriate sample rate based on the size of the dataset. For some datasets, it's possible to determine the size (for example, because the API offers a count). For others, we might need to rely on prior knowledge or other means.
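One way such guidance could work, sketched under the assumption that the goal is to validate a roughly fixed number of messages per dataset: divide a target sample size by the dataset size, and validate everything when the dataset is smaller than the target. The helper name `suggest_sample_rate` and the 10,000 target are illustrative assumptions, not agreed values.

```python
from typing import Optional

# Hypothetical default: aim to validate roughly this many messages per dataset.
DEFAULT_TARGET_SAMPLE_SIZE = 10_000


def suggest_sample_rate(dataset_size: Optional[int],
                        target: int = DEFAULT_TARGET_SAMPLE_SIZE) -> Optional[float]:
    """Suggest a rate so that about `target` messages are checked (hypothetical helper)."""
    if dataset_size is None:
        # Size unknown (e.g. the API offers no count): leave the choice to the analyst.
        return None
    if dataset_size < 0 or target <= 0:
        raise ValueError("dataset_size must be >= 0 and target must be > 0")
    if dataset_size <= target:
        return 1.0  # Small dataset: validating everything is affordable.
    return target / dataset_size


if __name__ == "__main__":
    print(suggest_sample_rate(2_000_000))  # 0.005
```

With per-message dice rolls, the number of messages actually checked varies around `rate × dataset_size` instead of hitting the target exactly, which is usually acceptable for a quality check.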
jpmckinney added the "feature" label ("Relating to loading data from the web API or CLI command") and removed the "steps" label ("Relating to specific steps (transforms)") on Jun 8, 2022.