Currently, the pudl_etl and ferc_to_sqlite CLI commands use the dagster.build_reconstructable_job method to execute multi-process Dagster jobs. build_reconstructable_job is an experimental method and is kind of confusing to work with. We can likely replace our pudl_etl and ferc_to_sqlite CLI command code entirely by creating preconfigured jobs and executing them with the Dagster CLI:
dagster job execute <name of job>
The jobs will likely be the ones we currently have, plus a nightly_build_etl_full and a nightly_build_ferc_to_sqlite_full job. If we move to this, we'll need to define the preconfigured jobs in Python. We mostly do this now, with the exception of the args people pass to the pudl_etl and ferc_to_sqlite CLI commands.
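To make the idea concrete, here is a minimal sketch of what a thin wrapper around the Dagster CLI could look like once the jobs are preconfigured. It only builds the command line and does not run it. The -m, -j and -c flags are the module, job and config-file options of dagster job execute as I understand them; the pudl.etl module name and the build_execute_command helper are assumptions for illustration, not existing PUDL code.

```python
import shlex


def build_execute_command(job_name, module="pudl.etl", config_files=()):
    """Build the argv for running one preconfigured job with the Dagster CLI.

    Assumptions: `module` is the importable module holding the job
    definitions, and each entry in `config_files` is a YAML run config file.
    """
    cmd = ["dagster", "job", "execute", "-m", module, "-j", job_name]
    for path in config_files:
        # Each extra YAML config file is passed with its own -c flag.
        cmd += ["-c", path]
    return cmd


# Job name taken from this issue; the module name is a guess.
command = build_execute_command("nightly_build_etl_full")
print(shlex.join(command))
```

If this works out, the nightly builds could call the printed command directly, and the custom CLI code would no longer need build_reconstructable_job.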
How can we incorporate the pudl_etl arguments into the Dagster configuration system? The args that aren't covered right now are loglevel and logfile. The same goes for ferc_to_sqlite, with the addition of the dataset_only arg.
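One possible direction, sketched under assumptions: translate the remaining CLI args into a run config dictionary. The loggers / console / log_level layout follows Dagster's built-in console logger config as I understand it. The runtime_settings resource key and the build_run_config helper are hypothetical and do not exist in PUDL today.

```python
def build_run_config(loglevel="INFO", logfile=None, dataset_only=False):
    """Translate the leftover CLI args into a Dagster-style run config dict."""
    run_config = {
        # Assumed layout for the built-in console logger.
        "loggers": {"console": {"config": {"log_level": loglevel.upper()}}}
    }
    extra = {}
    if logfile is not None:
        extra["logfile"] = logfile
    if dataset_only:
        extra["dataset_only"] = True
    if extra:
        # "runtime_settings" is a hypothetical resource that would have to be
        # defined before Dagster would accept this config.
        run_config["resources"] = {"runtime_settings": {"config": extra}}
    return run_config


etl_config = build_run_config(loglevel="debug", logfile="pudl_etl.log")
ferc_config = build_run_config(dataset_only=True)
```

Whether logfile and dataset_only belong in a resource, an op config, or somewhere else is exactly the open question here; the sketch only shows that the args can live in config rather than in argparse.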
How do we want to generate the configurations? 90% of our config is generated in pudl.etl.__init__.py via a few strategies:
Loading the default configuration of Dagster resources (pudl/src/pudl/etl/__init__.py, lines 262 to 266 in 548401f)
Using the default configuration plus an asset selection (pudl/src/pudl/etl/__init__.py, lines 267 to 272 in 548401f)
Loading configuration from a YAML file (pudl/src/pudl/etl/__init__.py, lines 273 to 284 in 548401f)
We also have a default_config dictionary that should be shared by all jobs (pudl/src/pudl/etl/__init__.py, lines 211 to 222 in 548401f).

zaneselvans added the cli and dagster labels on Jul 3, 2024.
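If every job is supposed to share default_config, one way to generate the per-job configurations is to deep-merge job-specific overrides onto it. This is a rough sketch only: the merge_config helper is hypothetical, and the dictionary contents below are placeholders rather than the real contents of default_config.

```python
import copy


def merge_config(base, override):
    """Return a new dict with `override` recursively merged onto `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts, so merge one level deeper.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists and new keys simply replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder values, NOT the real default_config from pudl.etl.
default_config = {
    "execution": {"config": {"multiprocess": {"max_concurrent": 4}}},
    "resources": {"example_resource": {"config": {"mode": "full"}}},
}
fast_overrides = {"resources": {"example_resource": {"config": {"mode": "fast"}}}}

fast_config = merge_config(default_config, fast_overrides)
```

The same helper would also work for overrides loaded from a YAML file, since those end up as plain dictionaries too.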
Tasks