Skip to content

Set of methods and solutions for dealing with task scheduling and dependencies in ML data pipelines

License

Notifications You must be signed in to change notification settings

davorborcic/pipelineConfig

Repository files navigation

Data Pipeline Configurator

Set of methods and solutions for dealing with task scheduling and dependencies in ML data pipelines

The main module is data_pipeline.py and it has 2 classes:

  • Pipeline: represents data pipeline (a DAG structure)
  • PipelineElement: represents individual elements of the data pipeline, equivalent to a vertix / node in a graph

A pipeline is defined by specifying a list of pipeline elements and then providing edges, indexed (starting with 0) by the sequence in which the elements have been entered.

See analyze_dependencies.py for the usage example, for the workflow in the figure below

data pipeline graph example with numbered elements Example of the data pipeline workflow, with numbered elements

References

Sedgewick, Robert, and Kevin Wayne, Algorithms, Addison-Wesley, 2014

About

Set of methods and solutions for dealing with task scheduling and dependencies in ML data pipelines

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages