[BUG]: Config.yaml step config only used in first step when calling step multiple times #2145

Open
1 task done
christianversloot opened this issue Dec 13, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@christianversloot
Contributor

Contact Details [Optional]

No response

System Information

zenml 0.50.0

What happened?

We have a pipeline with a step named run_model:

@step
def run_model(X_train: np.ndarray, y_train: np.ndarray, X_test: np.ndarray,
              y_test: np.ndarray, name: str, configuration: Dict):
    ...  # step body omitted in the original report

Using the new pipeline/step syntax, it is called multiple times:

for model in models:
    if model_config[model]['active']:
        run_model(X_train, y_train, X_test, y_test, model, configuration, id=model)

We're using a config.yaml based configuration for the step:

run_model:
    enable_cache: false
    experiment_tracker: "trackername"
    settings:
      experiment_tracker.mlflow:
        experiment_name: "experimentname"
        nested: True

However, the configuration is only applied to the first invocation, run_model. It is not applied to run_model_2, run_model_3 and run_model_4, whose names are generated automatically.
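
For anyone trying to picture the behaviour: it is consistent with the config being looked up by the exact invocation name. Below is a toy sketch in plain Python, not ZenML's actual code. The helpers `invocation_ids` and `resolve_configs` are made up for illustration, and the `_2`, `_3` suffix scheme is simply taken from the names reported above.

```python
from typing import Dict, List


def invocation_ids(step_name: str, calls: int) -> List[str]:
    """Toy model of the naming seen in this issue: run_model, run_model_2, ..."""
    return [step_name if i == 1 else f"{step_name}_{i}" for i in range(1, calls + 1)]


def resolve_configs(ids: List[str], config: Dict[str, dict]) -> Dict[str, dict]:
    """Look each invocation up by its exact ID; unmatched IDs get an empty config."""
    return {inv_id: config.get(inv_id, {}) for inv_id in ids}


# Hypothetical config with a single entry keyed by the plain step name.
config = {"run_model": {"enable_cache": False, "experiment_tracker": "trackername"}}
ids = invocation_ids("run_model", 4)
resolved = resolve_configs(ids, config)

for inv_id, step_config in resolved.items():
    print(inv_id, "->", step_config or "<defaults>")
```

Under this assumption only the key `run_model` matches, so the three later invocations fall back to defaults, which is what we observe.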

Is this a bug?
If not, how can we avoid this other than by manually specifying the config multiple times (which would be redundant and not DRY)?

Thanks!


Reproduction steps

...

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@christianversloot christianversloot added the bug Something isn't working label Dec 13, 2023
@Vishal-Padia
Contributor

To use the same step instance and configuration, don't specify the id parameter when calling run_model. This will reuse the same step instance each time:

for model in models:
  if model_config[model]['active']:
    run_model(X_train, y_train, X_test, y_test, model, configuration)

Or I think you can create the step instance once and reuse it:

model_step = run_model.with_id("model")

for model in models:
  if model_config[model]['active']:
    model_step(X_train, y_train, X_test, y_test, model, configuration)

This isn't a bug - it's just creating new step instances each time run_model is called with a different id. To reuse the configuration, you need to reuse the same step instance.

@strickvl strickvl self-assigned this Feb 5, 2024
@ConX

ConX commented Aug 22, 2024

It's not clear whether this is considered a bug or not.

Like @christianversloot, I expected the configuration to be applied to all invocations of a step, but it isn't. I think it would be best if, by default, all invocations used the same configuration regardless of their dynamic suffix (i.e., _1, _2, etc.).

In my case, I have a step that I invoke twice with two different values for one of its parameters, expecting all other parameters to be shared as defined in the configuration. My current workaround is to use YAML anchor notation. For example:

steps:
  my_step:
    parameters: &my_step_params
      shared_param_1: "value_1"
      shared_param_2: "value_2"
  my_step_2:
    parameters: *my_step_params
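
The anchor and alias above make `my_step_2` reuse the mapping defined under `my_step`. If the number of invocations varies, the same repetition could also be generated instead of written by hand. A minimal sketch, assuming the invocation names follow the `my_step`, `my_step_2`, ... pattern seen in this thread; `expand_step_config` is a hypothetical helper, not part of ZenML:

```python
import copy
import json
from typing import Any, Dict


def expand_step_config(step_name: str, calls: int, shared: Dict[str, Any]) -> Dict[str, Any]:
    """Repeat one shared config under step_name, step_name_2, ..., step_name_<calls>."""
    steps: Dict[str, Any] = {}
    for i in range(1, calls + 1):
        inv_id = step_name if i == 1 else f"{step_name}_{i}"
        steps[inv_id] = copy.deepcopy(shared)  # independent copy per invocation
    return {"steps": steps}


shared = {"parameters": {"shared_param_1": "value_1", "shared_param_2": "value_2"}}
expanded = expand_step_config("my_step", 2, shared)
print(json.dumps(expanded, indent=2))
```

The printed JSON should also load as YAML, since YAML 1.2 is designed as a superset of JSON, so the output could be written to the config file before the pipeline runs.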

@schustmi
Contributor

This is on our roadmap: we will either apply the configuration to all invocations by default or provide a wildcard syntax that can match multiple step invocations. Until that is done, the suggested solution is the YAML anchors mentioned by @ConX.
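
To make the wildcard idea concrete, here is a purely illustrative sketch of what glob-style matching of invocation names could look like. It is not an existing ZenML feature and not the planned syntax: the `run_model*` pattern and the `resolve_with_wildcards` helper are assumptions, built on Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def resolve_with_wildcards(inv_id: str, config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge every config entry whose key pattern matches inv_id, in config order."""
    merged: Dict[str, Any] = {}
    for pattern, step_config in config.items():
        if fnmatchcase(inv_id, pattern):
            merged.update(step_config)  # later entries override earlier ones
    return merged


# Hypothetical config: one wildcard entry plus one exact-name override.
config = {
    "run_model*": {"enable_cache": False, "experiment_tracker": "trackername"},
    "run_model_3": {"enable_cache": True},
}

for inv_id in ["run_model", "run_model_2", "run_model_3"]:
    print(inv_id, "->", resolve_with_wildcards(inv_id, config))
```

In this sketch a single wildcard entry covers every invocation, while an exact name listed later can still override individual settings.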

5 participants