introduce different dataloader categories #8
Comments
We should also consider that not all data loaders have simple update logic, i.e. they have to perform complex operations to define the updates. Example: The loading script that generates Btw, gene databases are not static 😄
I was already afraid that's the case, but my brain just wouldn't come up with a good example at 1am :D
IMHO the dataloader is the problem in this case :) What about a flag for fragmented text, or a simple logic like "when text fragments are on the node, no fragmenting is needed anymore"?
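The "fragments already on the node" check suggested above could look roughly like the following. This is only a minimal sketch: the `Node` class, the `fragment_text` function, and the fixed-size splitting are hypothetical stand-ins, not the project's actual data model or fragmenting logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a graph node holding a text body."""
    text: str
    fragments: List[str] = field(default_factory=list)


def fragment_text(node: Node, size: int = 20) -> bool:
    """Split node.text into chunks unless the node already has fragments.

    Returns True if fragments were created, False if the step was skipped.
    """
    if node.fragments:
        # Fragments are already on the node, so a rerun changes nothing.
        return False
    node.fragments = [
        node.text[i:i + size] for i in range(0, len(node.text), size)
    ]
    return True
```

With a guard like this, running the loader a second time leaves an already fragmented node untouched, which is what would make the step safe to repeat.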
To prevent messed-up data and to enable possible new features, we need to categorize dataloaders:
non-idempotent dataloader
Dataloaders that run only once, initially. These are for static data like gene databases.
idempotent dataloader
Dataloaders whose data will evolve and probably change, like publication data in the CORD19 dataset, which is updated from time to time.
Whether a rerun is necessary could be decided by a changed Docker Hub hash (i.e. a changed dataloader image).
service dataloaders
Data that will change regularly in any case, like COVID case statistics.
These dataloaders should run periodically.
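The three categories above could be turned into a run decision along these lines. This is a sketch under assumptions: the `LoaderCategory` enum, the `should_run` function, and the one-day default interval are all hypothetical, and the image digests are assumed to be supplied by the caller (how they are fetched from Docker Hub is left out).

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class LoaderCategory(Enum):
    NON_IDEMPOTENT = "non-idempotent"  # static data, runs once initially
    IDEMPOTENT = "idempotent"          # rerun when the dataloader image changes
    SERVICE = "service"                # runs periodically


def should_run(
    category: LoaderCategory,
    last_run: Optional[datetime],
    last_digest: Optional[str],
    current_digest: str,
    now: datetime,
    interval: timedelta = timedelta(days=1),  # assumed default period
) -> bool:
    """Decide whether a dataloader should be started now."""
    if last_run is None:
        # Every category runs at least once.
        return True
    if category is LoaderCategory.NON_IDEMPOTENT:
        return False
    if category is LoaderCategory.IDEMPOTENT:
        # A different image digest means the dataloader itself changed.
        return current_digest != last_digest
    # SERVICE: rerun once the interval has elapsed.
    return now - last_run >= interval
```

The orchestrator would have to store `last_run` and `last_digest` per dataloader after each successful run for this to work.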