Commit

ravi-kumar-pilla committed Nov 15, 2024
2 parents b0a8575 + 5a7f11a commit 41c93cb
Showing 14 changed files with 245 additions and 14 deletions.
Binary file added .github/img/backend-architecture.png
Binary file added .github/img/frontend-architecture.png
48 changes: 48 additions & 0 deletions .github/workflows/label-community-issues.yml
@@ -0,0 +1,48 @@
name: Label Community Issues

on:
  issues:
    types:
      - opened

jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      - name: Check if issue author is a member of Kedro org
        uses: actions/github-script@v6
        id: membership
        with:
          github-token: ${{ secrets.GH_TAGGING_TOKEN }}
          result-encoding: string
          script: |
            try {
              const result = await github.rest.orgs.getMembershipForUser({
                org: "kedro-org",
                username: '${{ github.actor }}'
              })
              console.log(result?.data?.state)
              if (result?.data?.state == "active"){
                console.log("%s: detected as an active member of Kedro org", '${{ github.actor }}')
                return "member";
              } else {
                console.log("%s: not detected as active member of Kedro org", '${{ github.actor }}')
                return "notMember";
              }
            } catch (error) {
              console.log("%s: Error occurred and marked user as notMember", '${{ github.actor }}')
              console.log("Error", error.stack);
              console.log("Error", error.name);
              console.log("Error", error.message);
              return "notMember";
            }
      - name: Label issue if author is from community
        if: ${{ steps.membership.outputs.result == 'notMember' }}
        uses: actions-ecosystem/action-add-labels@v1
        with:
          github_token: ${{ secrets.GH_TAGGING_TOKEN }}
          labels: 'Community'
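The membership step reduces to a small decision rule: only an `active` membership counts as `member`, while any other state or any API error falls through to `notMember`, which is what triggers the `Community` label. A minimal Python model of that rule follows; `classify_author`, `should_label_community` and the `fetch_state` callable are hypothetical names standing in for the `getMembershipForUser` call, not part of the workflow itself.

```python
from typing import Callable, Optional


def classify_author(fetch_state: Callable[[], Optional[str]]) -> str:
    """Model of the workflow's membership step (hypothetical helper).

    `fetch_state` is an assumed stand-in for the `getMembershipForUser`
    call: it returns the membership state (for example "active" or
    "pending") or raises on an API error.
    """
    try:
        state = fetch_state()
    except Exception:
        # Like the script's `catch` branch: any error marks the author as community
        return "notMember"
    # Only an active membership counts; "pending" and anything else do not
    return "member" if state == "active" else "notMember"


def should_label_community(result: str) -> bool:
    # Mirrors `if: ${{ steps.membership.outputs.result == 'notMember' }}`
    return result == "notMember"
```

For example, `classify_author(lambda: "pending")` returns `"notMember"`, so under this rule an invited member who has not yet accepted the invitation would still receive the `Community` label.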
20 changes: 20 additions & 0 deletions .github/workflows/no-response.yml
@@ -0,0 +1,20 @@
name: No Response

on:
  issue_comment:
    types: [created]
  schedule:
    # Run every day at 9am (UTC time)
    - cron: '0 9 * * *'

jobs:
  noResponse:
    runs-on: ubuntu-latest
    steps:
      - uses: lee-dohm/[email protected]
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          responseRequiredLabel: "support: needs more info"
          daysUntilClose: 28
          closeComment: >-
            This issue has been closed due to lack of information. Feel free to re-open this issue if you're facing a similar problem. Please provide as much information as possible so we can help resolve your issue.
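The workflow runs on two triggers: new comments, and a daily schedule that closes stale issues. The following is a simplified Python model of the close rule the configuration describes. It assumes that the 28 days are counted from when the `support: needs more info` label was applied and that a comment from the issue author clears the label. The `Issue` record and both functions are hypothetical and only illustrate the idea; they do not reproduce the action's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set

RESPONSE_REQUIRED_LABEL = "support: needs more info"
DAYS_UNTIL_CLOSE = 28


@dataclass
class Issue:
    # Hypothetical, simplified issue record used only for this model
    labels: Set[str] = field(default_factory=set)
    labelled_at: Optional[datetime] = None  # when the label was applied
    is_open: bool = True


def on_author_comment(issue: Issue) -> None:
    """issue_comment trigger: assume an author reply clears the label."""
    issue.labels.discard(RESPONSE_REQUIRED_LABEL)
    issue.labelled_at = None


def close_if_stale(issue: Issue, now: datetime) -> bool:
    """Daily schedule trigger: close labelled issues left unanswered too long."""
    if not issue.is_open or RESPONSE_REQUIRED_LABEL not in issue.labels:
        return False
    if issue.labelled_at is None:
        return False
    if now - issue.labelled_at >= timedelta(days=DAYS_UNTIL_CLOSE):
        issue.is_open = False  # the configured action would also post `closeComment`
        return True
    return False
```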
9 changes: 8 additions & 1 deletion ARCHITECTURE.md
@@ -6,6 +6,7 @@ For further information, see also:

- [Kedro-Viz contributing documentation](CONTRIBUTING.md), which covers how to start development on the project
- [Kedro-Viz style guide](STYLE_GUIDE.md), which walks through our standards and recommended best practices for our codebase
- [Kedro-Viz Architecture Diagram](https://miro.com/app/board/uXjVKhNg1RE=/?moveToWidget=3458764606468376036&cot=10), which gives a high-level overview of the back-end and front-end and how they are connected

## High-level Overview

@@ -62,7 +63,7 @@ The `localStorage` state is updated automatically on every Redux store update, v

## Data ingestion

![Kedro-Viz data flow diagram](/.github/img/app-architecture-data-flow.png)
![Kedro-Viz data flow diagram](/.github/img/frontend-architecture.png)

Kedro-Viz currently utilizes two different methods of data ingestion: the Redux setup for the pipeline and flowchart-view related components, and GraphQL via Apollo Client for the experiment tracking components.

@@ -147,3 +148,9 @@ Kedro-Viz includes a graph layout engine, for details see the [layout engine doc
Our layout engine runs inside a web worker, which asynchronously performs these expensive calculations in a separate CPU thread, in order to avoid this blocking other operations on the main thread (e.g. CSS transitions and other state updates).

The app uses [redux-watch](https://github.com/ExodusMovement/redux-watch) with a graph input selector to watch the store for state changes relevant to the graph layout. If the layout needs to change, this listener dispatches an asynchronous action which sends a message to the web worker to instruct it to calculate the new layout. Once the layout worker completes its calculations, it returns a new action to update the store's `state.graph` property with the new layout. Updates to the graph input state during worker calculations will interrupt the worker and cause it to start over from scratch.

## Backend Architecture

![Kedro-Viz backend architecture](/.github/img/backend-architecture.png)

The backend of Kedro-Viz is the data provider and API layer: it interacts with Kedro projects and manages data access for the visualisations in the frontend. It offers both REST and GraphQL APIs through which the frontend retrieves pipeline structures, node-specific details, and experiment tracking data. Key components include the `DataAccessManager`, which interfaces with data `Repositories` to fetch and structure data. The CLI lets users launch Kedro-Viz from the command line, while the deploy and build options enable sharing pipeline visualisations on any static website hosting platform.
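The manager/repository split described above can be sketched as follows. This is a hypothetical and heavily simplified illustration of the pattern, not Kedro-Viz's actual code. `NodesRepository`, `PipelinesRepository` and `ToyDataAccessManager` are invented names, and the real `DataAccessManager` exposes a different API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodesRepository:
    # Hypothetical repository: stores node metadata keyed by node id
    nodes: Dict[str, dict] = field(default_factory=dict)

    def add(self, node_id: str, data: dict) -> None:
        self.nodes[node_id] = data

    def get(self, node_id: str) -> dict:
        return self.nodes[node_id]


@dataclass
class PipelinesRepository:
    # Hypothetical repository: maps pipeline names to the node ids they contain
    pipelines: Dict[str, List[str]] = field(default_factory=dict)

    def add_node(self, pipeline: str, node_id: str) -> None:
        self.pipelines.setdefault(pipeline, []).append(node_id)

    def node_ids(self, pipeline: str) -> List[str]:
        return self.pipelines.get(pipeline, [])


class ToyDataAccessManager:
    """Single entry point an API layer would call; it owns the repositories."""

    def __init__(self) -> None:
        self.nodes = NodesRepository()
        self.pipelines = PipelinesRepository()

    def add_node(self, pipeline: str, node_id: str, data: dict) -> None:
        self.nodes.add(node_id, data)
        self.pipelines.add_node(pipeline, node_id)

    def pipeline_graph(self, pipeline: str) -> List[dict]:
        # Shape repository data into the structure an endpoint would return
        return [
            {"id": nid, **self.nodes.get(nid)}
            for nid in self.pipelines.node_ids(pipeline)
        ]
```

With this split, both a REST handler and a GraphQL resolver could call `pipeline_graph` on one shared manager without knowing how the repositories store the data.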
3 changes: 3 additions & 0 deletions RELEASE.md
@@ -21,6 +21,9 @@ Please follow the established format:
- Display full dataset type with library prefix in metadata panel (#2136)
- Enable SQLite WAL mode for Azure ML to fix database locking issues (#2131)
- Replace `flake8`, `isort`, `pylint` and `black` by `ruff` (#2149)
- Refactor `DatasetStatsHook` to avoid showing an error when a dataset doesn't have file size info (#2174)
- Fix 404 error when accessing the experiment tracking page on the demo site (#2179)
- Add check for port availability before starting Kedro Viz to prevent unintended browser redirects when the port is already in use (#2176)


# Release 10.0.0
37 changes: 37 additions & 0 deletions docs/source/kedro-viz_visualisation.md
@@ -194,6 +194,43 @@ The visualisation now includes the layers:

![](./images/pipeline_visualisation_with_layers.png)

Duplicated definitions like:

```yaml
metadata:
  kedro-viz:
    layer: raw
```

can be avoided by using YAML's native anchors and aliases.

First, use an anchor (`&`) to create a reusable piece of configuration. The leading underscore in `_raw_layer` tells Kedro not to instantiate the entry as a dataset:

```yaml
_raw_layer: &raw_layer
  metadata:
    kedro-viz:
      layer: 01_raw
```

Then use aliases (`*`) with the `<<` merge key to insert it into each dataset entry:

```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  <<: *raw_layer
reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  <<: *raw_layer
# Same for other datasets of the raw layer...
```

See [this example from the Kedro docs](https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html#load-multiple-datasets-with-similar-configuration-using-yaml-anchors) for more details.
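With the YAML merge key, `<<: *raw_layer` copies the keys of the anchored mapping into the dataset entry, and any key written directly in the entry takes precedence over a merged one. The merge is shallow. If an entry defines its own `metadata` key, that key replaces the whole merged `metadata` block instead of being combined with it. The following rough Python model shows that resolution with plain dictionaries. It does not parse YAML, and `merge_key` is a hypothetical helper name.

```python
from typing import Any, Dict

# The anchored block from `_raw_layer: &raw_layer`
raw_layer: Dict[str, Any] = {"metadata": {"kedro-viz": {"layer": "01_raw"}}}


def merge_key(anchor: Dict[str, Any], local: Dict[str, Any]) -> Dict[str, Any]:
    """Model of `<<: *anchor`: shallow merge where local keys take precedence."""
    return {**anchor, **local}


companies = merge_key(
    raw_layer,
    {"type": "pandas.CSVDataset", "filepath": "data/01_raw/companies.csv"},
)
```

Here `companies["metadata"]["kedro-viz"]["layer"]` resolves to `"01_raw"`. The dataset's own keys (`type`, `filepath`) sit alongside the merged `metadata`.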

## Share a pipeline visualisation

You can save a pipeline structure within a Kedro-Viz visualisation directly from the terminal as follows:
2 changes: 1 addition & 1 deletion package.json
@@ -13,7 +13,7 @@
},
"proxy": "http://localhost:4142/",
"scripts": {
"build": "cross-env GENERATE_SOURCEMAP=false react-scripts build",
"build": "cross-env GENERATE_SOURCEMAP=false react-scripts build && cp ./build/index.html ./build/404.html",
"postbuild": "rm -rf build/api",
"start": "REACT_APP_DATA_SOURCE=$DATA NODE_OPTIONS=\"--dns-result-order=ipv4first\" npm-run-all -p start:app start:lib",
"start:dev": "rm -rf node_modules/.cache && npm start",
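The new `build` script copies `index.html` to `404.html`. Some static hosts serve a custom `404.html` for unknown paths (GitHub Pages is one example). On those hosts, a direct request for a client-side route such as the experiment tracking page then loads the single-page app instead of a bare error page. This appears to be the fix listed in the release notes for #2179. Below is a toy Python model of that host behaviour. The `serve` function, the file contents and the `/experiment-tracking` path are illustrative assumptions.

```python
from typing import Dict, Tuple


def serve(path: str, files: Dict[str, str]) -> Tuple[int, str]:
    """Toy static host: exact file match, else fall back to 404.html if present."""
    key = path.lstrip("/") or "index.html"
    if key in files:
        return 200, files[key]
    if "404.html" in files:
        # The status stays 404, but the body is the custom page (here, the app)
        return 404, files["404.html"]
    return 404, "Not Found"


index = "<div id='root'></div><script src='app.js'></script>"
before = {"index.html": index}
after = {"index.html": index, "404.html": index}  # effect of `cp index.html 404.html`
```

In this model, `serve("/experiment-tracking", before)` returns the plain `"Not Found"` page, while with `after` the app markup is returned and the frontend router can render the requested route.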
28 changes: 18 additions & 10 deletions package/kedro_viz/integrations/kedro/hooks.py
@@ -7,6 +7,7 @@
from pathlib import Path, PurePosixPath
from typing import Any, Union

import fsspec
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro.io.core import get_filepath_str
@@ -141,19 +142,26 @@ def get_file_size(self, dataset: Any) -> Union[int, None]:
Args:
dataset: A dataset instance for which we need the file size
Returns: file size for the dataset if file_path is valid, if not returns None
Returns:
File size for the dataset if available, otherwise None.
"""

if not (hasattr(dataset, "_filepath") and dataset._filepath):
return None

try:
file_path = get_filepath_str(
PurePosixPath(dataset._filepath), dataset._protocol
)
return dataset._fs.size(file_path)
if hasattr(dataset, "filepath") and dataset.filepath:
filepath = dataset.filepath
# Fallback to private '_filepath' for known datasets
elif hasattr(dataset, "_filepath") and dataset._filepath:
filepath = dataset._filepath
else:
return None

fs, path_in_fs = fsspec.core.url_to_fs(filepath)
if fs.exists(path_in_fs):
file_size = fs.size(path_in_fs)
return file_size
else:
return None

except Exception as exc:
except Exception as exc: # pragma: no cover
logger.warning(
"Unable to get file size for the dataset %s: %s", dataset, exc
)
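The refactored `get_file_size` works in three steps. It resolves a path (public `filepath` first, then private `_filepath`), checks that the file exists, and only then asks for its size, so a missing file yields `None` instead of a logged error. The following stdlib-only sketch follows the same control flow but assumes local paths only. The real hook goes through `fsspec.core.url_to_fs` so that remote protocols also work, and `file_size_sketch` is a hypothetical name.

```python
import os
from typing import Any, Optional


def file_size_sketch(dataset: Any) -> Optional[int]:
    """Hypothetical local-only mirror of the hook's lookup order."""
    # Prefer the public `filepath`, fall back to the private `_filepath`
    filepath = getattr(dataset, "filepath", None) or getattr(dataset, "_filepath", None)
    if not filepath:
        return None
    path = os.fspath(filepath)  # accepts str or a PurePath such as PurePosixPath
    # Check existence first, so a missing file means "no size" rather than an error
    if not os.path.exists(path):
        return None
    try:
        return os.path.getsize(path)
    except OSError:
        # e.g. the file vanished between the existence check and the size call
        return None
```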
5 changes: 4 additions & 1 deletion package/kedro_viz/launchers/cli/run.py
@@ -115,6 +115,7 @@ def run(
from kedro_viz.launchers.utils import (
_PYPROJECT,
_check_viz_up,
_find_available_port,
_find_kedro_project,
_start_browser,
_wait_for,
@@ -145,6 +146,9 @@ def run(
"https://github.com/kedro-org/kedro-viz/releases.",
"yellow",
)

port = _find_available_port(host, port)

try:
if port in _VIZ_PROCESSES and _VIZ_PROCESSES[port].is_alive():
_VIZ_PROCESSES[port].terminate()
@@ -186,7 +190,6 @@ def run(
)

display_cli_message("Starting Kedro Viz ...", "green")

viz_process.start()

_VIZ_PROCESSES[port] = viz_process
29 changes: 29 additions & 0 deletions package/kedro_viz/launchers/utils.py
@@ -2,6 +2,8 @@
used in the `kedro_viz.launchers` package."""

import logging
import socket
import sys
import webbrowser
from pathlib import Path
from time import sleep, time
@@ -80,6 +82,33 @@ def _check_viz_up(host: str, port: int):
return response.status_code == 200


def _is_port_in_use(host: str, port: int):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0


def _find_available_port(host: str, start_port: int, max_attempts: int = 5) -> int:
    max_port = start_port + max_attempts - 1
    port = start_port
    while port <= max_port:
        if not _is_port_in_use(host, port):
            return port
        display_cli_message(
            f"Port {port} is already in use. Trying the next port...",
            "yellow",
        )
        port += 1
    display_cli_message(
        f"Error: All ports in the range {start_port}-{max_port} are in use.",
        "red",
    )
    display_cli_message(
        "Please specify a different port using the '--port' option.",
        "red",
    )
    sys.exit(1)


def _is_localhost(host: str) -> bool:
    """Check whether a host is a localhost"""
    return host in ("127.0.0.1", "localhost", "0.0.0.0")
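`_is_port_in_use` treats a port as taken when a TCP connect succeeds (`connect_ex` returns 0). `_find_available_port` probes `start_port` through `start_port + max_attempts - 1` in order and exits with status 1 if none is free. The following sketch runs the same scan but takes the probe as a parameter, so the loop can be exercised without real sockets. `find_port` is a hypothetical name, and it returns `None` where the original calls `sys.exit`.

```python
import socket
from typing import Callable, Optional


def is_port_in_use(host: str, port: int) -> bool:
    # A successful connect (return code 0) means something is already listening
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0


def find_port(
    host: str,
    start_port: int,
    max_attempts: int = 5,
    in_use: Callable[[str, int], bool] = is_port_in_use,
) -> Optional[int]:
    # Probe start_port .. start_port + max_attempts - 1; the first free port wins
    for port in range(start_port, start_port + max_attempts):
        if not in_use(host, port):
            return port
    return None  # the caller decides how to report "all ports busy"
```

For example, `find_port("127.0.0.1", 4141, in_use=lambda h, p: p in {4141, 4142})` returns `4143`. A check like this is best-effort rather than a reservation: another process can still take the port between the probe and the server start.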
41 changes: 41 additions & 0 deletions package/tests/test_integrations/test_hooks.py
@@ -137,3 +137,44 @@ def test_get_file_size(dataset, example_dataset_stats_hook_obj, example_csv_data
assert example_dataset_stats_hook_obj.get_file_size(
example_csv_dataset
) == example_csv_dataset._fs.size(file_path)


def test_get_file_size_file_does_not_exist(example_dataset_stats_hook_obj, mocker):
    class MockDataset:
        def __init__(self):
            self._filepath = "/non/existent/path.csv"

    mock_dataset = MockDataset()
    mock_fs = mocker.Mock()
    mock_fs.exists.return_value = False

    mocker.patch(
        "fsspec.core.url_to_fs",
        return_value=(mock_fs, "/non/existent/path.csv"),
    )

    # Call get_file_size and expect it to return None
    file_size = example_dataset_stats_hook_obj.get_file_size(mock_dataset)
    assert file_size is None


def test_get_file_size_public_filepath(example_dataset_stats_hook_obj, mocker):
    class MockDataset:
        def __init__(self):
            self.filepath = "/path/to/existing/file.csv"

    mock_dataset = MockDataset()

    # Mock fs.exists to return True
    mock_fs = mocker.Mock()
    mock_fs.exists.return_value = True
    mock_fs.size.return_value = 456

    mocker.patch(
        "fsspec.core.url_to_fs",
        return_value=(mock_fs, "/path/to/existing/file.csv"),
    )

    # Call get_file_size and expect it to return the mocked file size
    file_size = example_dataset_stats_hook_obj.get_file_size(mock_dataset)
    assert file_size == 456
19 changes: 18 additions & 1 deletion package/tests/test_launchers/test_cli/test_run.py
@@ -10,7 +10,7 @@
from kedro_viz.autoreload_file_filter import AutoreloadFileFilter
from kedro_viz.launchers.cli import main
from kedro_viz.launchers.cli.run import _VIZ_PROCESSES
from kedro_viz.launchers.utils import _PYPROJECT
from kedro_viz.launchers.utils import _PYPROJECT, _find_available_port
from kedro_viz.server import run_server


@@ -217,6 +217,9 @@ def test_kedro_viz_command_run_server(
"kedro_viz.launchers.utils._wait_for.__defaults__", (True, 1, True, 1)
)

# Mock _is_port_in_use to speed up test.
mocker.patch("kedro_viz.launchers.utils._is_port_in_use", return_value=False)

# Mock finding kedro project
mocker.patch(
"kedro_viz.launchers.utils._find_kedro_project",
@@ -394,3 +397,17 @@ def test_kedro_viz_command_with_autoreload(
kwargs={**run_process_kwargs},
)
assert run_process_kwargs["kwargs"]["port"] in _VIZ_PROCESSES

    # Test case to simulate port occupation and check available port selection
    def test_find_available_port_with_occupied_ports(self, mocker):
        mock_is_port_in_use = mocker.patch("kedro_viz.launchers.utils._is_port_in_use")

        # Mock ports 4141 and 4142 as occupied and 4143 as free
        mock_is_port_in_use.side_effect = [True, True, False]

        available_port = _find_available_port("127.0.0.1", 4141)

        # Assert that the function returns the first free port, 4143
        assert (
            available_port == 4143
        ), "Expected port 4143 to be returned as the available port"
18 changes: 18 additions & 0 deletions package/tests/test_launchers/test_utils.py
@@ -7,6 +7,7 @@

from kedro_viz.launchers.utils import (
_check_viz_up,
_find_available_port,
_find_kedro_project,
_is_project,
_start_browser,
@@ -99,3 +100,20 @@ def test_toml_bad_encoding(self, mocker):
def test_find_kedro_project(project_dir, is_project_found, expected, mocker):
mocker.patch("kedro_viz.launchers.utils._is_project", return_value=is_project_found)
assert _find_kedro_project(Path(project_dir)) == expected


def test_find_available_port_all_ports_occupied(mocker):
    mocker.patch("kedro_viz.launchers.utils._is_port_in_use", return_value=True)
    mock_display_message = mocker.patch("kedro_viz.launchers.utils.display_cli_message")

    # Check for SystemExit when all ports are occupied
    with pytest.raises(SystemExit) as exit_exception:
        _find_available_port("127.0.0.1", 4141, max_attempts=5)
    assert exit_exception.value.code == 1

    mock_display_message.assert_any_call(
        "Error: All ports in the range 4141-4145 are in use.", "red"
    )
    mock_display_message.assert_any_call(
        "Please specify a different port using the '--port' option.", "red"
    )
