Align slicing pipeline behavior between Kedro Viz and Kedro Framework #2187

Open
Huongg opened this issue Nov 12, 2024 · 8 comments

Huongg commented Nov 12, 2024

Description:

Ensure that the from-nodes and to-nodes slicing functionality behaves consistently across both the CLI and Viz interfaces.

Currently, Kedro Viz and the Kedro Framework do not slice pipelines in the same way, and neither behaves exactly as the documentation for the slicing functionality describes.

[SCREENSHOT 1]

For example:
When selecting from shuttles to combine step, Kedro Viz currently displays everything upstream of combine step (SCREENSHOT 1), and the run command shown in the Kedro Viz UI is kedro run --to-nodes="combine step"

(SCREENSHOT 2 below) However, Kedro Viz should ideally display only the nodes between from-nodes and to-nodes, such as from companies to combine step. I have created a quick proof of concept demonstrating how this can be achieved on the front-end. The expected run command would therefore be kedro run --from-nodes="companies" --to-nodes="combine step".

[SCREENSHOT 2]

In the first MVP, we did not implement the full solution (SCREENSHOT 2) due to time constraints and potential CLI errors when attempting to execute a node that requires inputs that haven't been generated. Further discussion and details can be found here.
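The difference between the two behaviours can be sketched in plain Python. This is only an illustration, not Kedro's or Kedro-Viz's actual implementation: the `EDGES` graph and the helper names are hypothetical, and it assumes that combining from-nodes and to-nodes is meant to yield the intersection of "everything downstream of the start" and "everything upstream of the end".

```python
from collections import deque

# Hypothetical toy graph: node name -> nodes it feeds directly.
# Node names come from the example above; the edges are assumed.
EDGES = {
    "companies": ["combine step"],
    "shuttles": ["combine step"],
    "combine step": ["split data"],
    "split data": [],
}


def _reach(start, neighbours):
    """Breadth-first search returning the start nodes plus everything reachable."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def from_nodes(edges, *names):
    """The given nodes and everything downstream of them."""
    return _reach(names, edges)


def to_nodes(edges, *names):
    """The given nodes and everything upstream of them."""
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return _reach(names, reverse)


def slice_between(edges, start, end):
    """Nodes that are both downstream of `start` and upstream of `end`."""
    return from_nodes(edges, start) & to_nodes(edges, end)
```

On this toy graph, `to_nodes(EDGES, "combine step")` includes shuttles, while `slice_between(EDGES, "companies", "combine step")` does not. The second slice therefore only runs if the shuttles output already exists, which is the CLI error case mentioned above.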

Task Scope

The goal of this task is to work with the Kedro Framework team to define the best approach for enabling Viz to work the same way as the Kedro Framework when running from-nodes and to-nodes. This will ensure consistent user experience across both interfaces.

Potential Solutions

Three potential solutions have been identified:

  1. Display Notification/Error Message in Kedro-Viz:
    • Kedro-Viz can display a notification or error message warning that the run may fail if a user attempts to execute a node whose inputs have not been generated beforehand.
    • In this case, the run command should ideally show the full syntax for the command that would trigger the error (e.g., kedro run --from-nodes="..." --to-nodes="...").
  2. Kedro Framework Syntax Enhancement:
    • The Kedro Framework could consider supporting alternative syntax to handle these edge cases.
  3. Update the documentation to make clear the conditions under which kedro run --from-nodes --to-nodes can be run successfully.
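
As a rough sketch of what the check behind solution 1 might look like: given the nodes in a slice, collect the inputs that no node in the slice produces and that are not already persisted. Everything here is hypothetical (the `NODES` table, the dataset names, the function names and the message text), and it is written in Python only for illustration; it does not use any Kedro or Kedro-Viz API.

```python
# Hypothetical node table: node name -> (input datasets, output datasets).
NODES = {
    "companies": (["raw_companies"], ["preprocessed_companies"]),
    "shuttles": (["raw_shuttles"], ["preprocessed_shuttles"]),
    "combine step": (
        ["preprocessed_companies", "preprocessed_shuttles"],
        ["model_input_table"],
    ),
}


def external_inputs(nodes, selected):
    """Datasets the selected nodes read but none of them produces."""
    produced = {out for name in selected for out in nodes[name][1]}
    needed = {inp for name in selected for inp in nodes[name][0]}
    return needed - produced


def slice_warning(nodes, selected, persisted):
    """Return a warning string, or None if every external input is persisted."""
    missing = sorted(external_inputs(nodes, selected) - set(persisted))
    if not missing:
        return None
    return "This slice may fail, inputs not available: " + ", ".join(missing)
```

For the companies to combine step slice, with only raw_companies persisted, this flags preprocessed_shuttles as the input that would make the run fail.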
@Huongg Huongg added this to Kedro-Viz Nov 12, 2024
@Huongg Huongg moved this to Inbox in Kedro-Viz Nov 12, 2024
@datajoely
Contributor

Crazy idea - you could run pyodide and just run Kedro in the browser and not reimplement the logic

@rashidakanchwala
Contributor

I discussed this with @idanov, and here are the key takeaways:

  • The slicing logic for both the front-end and back-end should be consistent. If we want to avoid making back-end calls, we can try replicating the back-end logic on the front-end.

  • The flowchart doesn’t need to be "smart." In cases with MemoryDatasets or if there’s no dataset (e.g., when kedro run hasn't been executed), copying and pasting the command from the terminal will cause it to fail and display relevant messages on the terminal, which is acceptable. It’s the user’s responsibility to handle these cases. Kedro-Viz should simply provide a command for running the sliced pipeline, without trying to interpret or adjust for such conditions.
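
A minimal sketch of the "just provide a command" approach, assuming the UI already knows the selected node names. `build_run_command` is a hypothetical helper, not part of Kedro-Viz; it only formats the existing --from-nodes and --to-nodes options and makes no attempt to check whether the run will succeed.

```python
def build_run_command(from_nodes=(), to_nodes=()):
    """Format a kedro run command for the selected slice (no validation)."""
    parts = ["kedro run"]
    if from_nodes:
        # Multiple node names are joined with commas.
        parts.append('--from-nodes="' + ",".join(from_nodes) + '"')
    if to_nodes:
        parts.append('--to-nodes="' + ",".join(to_nodes) + '"')
    return " ".join(parts)


# Example: the slice discussed in the issue description.
print(build_run_command(["companies"], ["combine step"]))
```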

@datajoely
Contributor

datajoely commented Nov 13, 2024

Annoyingly we can't install Kedro on Pyodide because of OmegaConf's use of the ANTLR C library.

>>> await micropip.install('kedro', keep_going=True)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/lib/python3.12/site-packages/micropip/_commands/install.py", line 146, in install
    raise ValueError(
ValueError: Can't find a pure Python 3 wheel for: 'antlr4-python3-runtime==4.9.*'
See: https://pyodide.org/en/stable/usage/faq.html#why-can-t-micropip-find-a-pure-python-wheel-for-a-package

@astrojuanlu
Member

We'll have a TD session about this.

@astrojuanlu
Member

Some early thoughts: it will probably make sense to split the use case between reporting and development. People navigating a pipeline will likely not be interested in the run command, whereas this will be useful information for people during the development phase.

@datajoely
Contributor

I really, really recommend that people use dbt for a bit in anger, because the tight linking between the CLI command and the UI is such a productivity boost. Sometimes we can just steal and take competitor/comparator validation in place of our own user research from first principles.

@astrojuanlu
Member

I think the goal here is quite clear. The main blockers are how kedro run commands work, how to align that with the UI without copy-pasting the slicing logic to the frontend while also avoiding backend calls (kedro-org/kedro#4113), and finally deciding where and when we should do this.

@Huongg
Contributor Author

Huongg commented Nov 19, 2024

I like your suggestion to split the use cases between reporting and development, @astrojuanlu. Here are some of my thoughts for further discussion in the TD.

  1. Separating Run Command and Slicing
    • These are distinct functionalities and can be tackled independently to simplify the development.
  2. Slicing via the UI for the Reporting Use Case
    • We should conduct quick user testing with two alternative solutions we discussed yesterday:
      • Option 1: Display only from-nodes and to-nodes, with no additional context. (I already have a POC ready for testing this approach.)
      • Option 2: Show from-nodes and to-nodes, along with faded-out “Dependencies” above them, similar to the focus mode view. (Creating a POC for this should also be simple, as it mainly requires adjusting the dependency colours.)
    • Additionally, we could offer users the ability to turn off/on the visibility of dependencies in the UI.
  3. Run Command for the Development Use Case
    • We need to decide how and when we should show the kedro run commands, not just when slicing a pipeline.
    • Improving error messages for the kedro run command would be highly beneficial, especially for debugging during development.

Since these features are not tightly coupled, I can see potential for reusing them in the VS Code extension as well.

cc @stephkaiser @noklam @rashidakanchwala
