
Generic class definitions for handling upload of data from table specs #24

Open
azimov opened this issue Mar 29, 2023 · 0 comments

azimov (Collaborator) commented Mar 29, 2023

Currently, the uploadResults function offers little extensibility, which makes it hard for packages to implement customisable upload functions.

The idea is that a set of generic classes would let users modify data before upload, validate data, or perform other tasks in the upload pipeline. This should work well in the generic case while letting packages add complexity in a consistent, customisable way.
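One way such classes could be structured is the template-method pattern: a base class fixes the order of the pipeline steps (modify, validate, write), and packages override only the steps they need. The sketch below is a Python illustration under that assumption. TableSpec, TableUploader, RoundingUploader and their methods are hypothetical names that belong to no existing package, and an in-memory list stands in for a database table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class TableSpec:
    # Hypothetical minimal table specification: a table name and its columns.
    table_name: str
    columns: List[str]


class TableUploader:
    """Hypothetical generic uploader; each pipeline step is an overridable hook."""

    def __init__(self, spec: TableSpec):
        self.spec = spec
        self.uploaded: List[Row] = []  # in-memory stand-in for the target table

    def modify(self, rows: List[Row]) -> List[Row]:
        # Default: pass the data through unchanged.
        return rows

    def validate(self, rows: List[Row]) -> None:
        # Default: every row must have exactly the columns named in the spec.
        expected = set(self.spec.columns)
        for i, row in enumerate(rows):
            if set(row) != expected:
                raise ValueError(
                    f"row {i} of {self.spec.table_name} does not match "
                    f"spec columns {sorted(expected)}"
                )

    def write(self, rows: List[Row]) -> None:
        # Default: append to the in-memory stand-in table.
        self.uploaded.extend(rows)

    def upload(self, rows: List[Row]) -> int:
        # Template method: the step order is fixed, individual steps can be overridden.
        rows = self.modify(rows)
        self.validate(rows)
        self.write(rows)
        return len(rows)


class RoundingUploader(TableUploader):
    """Hypothetical subclass that modifies data before upload."""

    def modify(self, rows: List[Row]) -> List[Row]:
        return [{**row, "value": round(row["value"], 2)} for row in rows]


if __name__ == "__main__":
    spec = TableSpec("performance", ["model_id", "value"])
    uploader = RoundingUploader(spec)
    n = uploader.upload([{"model_id": 1, "value": 0.12345}])
    print(n, uploader.uploaded)  # 1 [{'model_id': 1, 'value': 0.12}]
```

With this structure, the default behaviour lives in one place. A package that only needs to reshape data overrides modify and inherits validation and writing unchanged.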

The initial use case is to support a complicated example from the requirements of PLP here, which works but could be implemented more consistently.

Requirements to implement gradually:

  • Define which generic classes are needed
  • Implement default behaviour that matches the current behaviour (upload data, then validate or overwrite it according to the table specifications)
  • Support complex modifications of the data being uploaded
  • Improve on the current implementation by supporting better loading strategies (load tables, table partitioning for large tables, cross-platform indexes)
  • Allow loading from object storage such as AWS S3
  • Fast loading using multiprocess operations
  • Support transactional loading of data: on error, roll back specific tables or, if desired, entire result sets
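The transactional requirement can be illustrated with a minimal sketch that assumes an SQLite backend, used here only because Python's standard sqlite3 module can demonstrate it. The idea is to insert every table of a result set inside one transaction, so that a failure in any table rolls back all of them. load_result_set and the model/performance tables are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Any, Dict, List, Sequence


def load_result_set(conn: sqlite3.Connection,
                    tables: Dict[str, List[Sequence[Any]]]) -> None:
    """Hypothetical helper: insert rows for several tables atomically.

    Either every table is loaded or none is. Table names are assumed to come
    from a trusted table spec, because SQL identifiers cannot be bound as
    query parameters.
    """
    with conn:  # commits on success, rolls back if any exception is raised
        for table, rows in tables.items():
            if not rows:
                continue
            placeholders = ", ".join("?" for _ in rows[0])
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model (model_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE performance (model_id INTEGER, auc REAL NOT NULL)")
    conn.commit()

    try:
        # The "model" insert succeeds, but the NULL auc violates NOT NULL,
        # so the whole result set (including "model") is rolled back.
        load_result_set(conn, {"model": [(1, "lasso")],
                               "performance": [(1, None)]})
    except sqlite3.IntegrityError as err:
        print("rolled back:", err)

    print(conn.execute("SELECT COUNT(*) FROM model").fetchone()[0])  # 0
```

Rolling back only certain tables, rather than the entire result set, could be built on SQL SAVEPOINTs, one per table, inside the outer transaction. That variant is left out here to keep the sketch short.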