Improve notebook execution performance #6

Merged
merged 21 commits into from
Sep 23, 2024
Changes from 17 commits
02df14d
update viewer version
molotgor Jul 9, 2024
e7049f6
Added papermill custom engine to reuse it for notebook execution
Nikita-Smirnov-Exactpro Jul 9, 2024
dad3a3c
Corrected j-sp image in compose
Nikita-Smirnov-Exactpro Jul 9, 2024
8df9fbe
update file path to absolute instead of relative
molotgor Jul 26, 2024
61bc47d
update versions in compose
molotgor Jul 26, 2024
f9c7795
update version of viewer in compose
molotgor Jul 29, 2024
5567dad
update viewer version
molotgor Aug 5, 2024
d12f90b
update viewer version
molotgor Aug 5, 2024
db4f850
add ability for getting customization file from result
molotgor Aug 15, 2024
bd25fe4
update version in compose
molotgor Aug 15, 2024
6fa424c
update viewer version
molotgor Aug 15, 2024
0467005
update viewer version
molotgor Aug 26, 2024
c45fa85
update viewer version
molotgor Sep 18, 2024
0956df7
Implemented logic to use a separate papermill notebook client for eac…
Nikita-Smirnov-Exactpro Sep 18, 2024
148436e
Implemented async_execute for papermill engine
Nikita-Smirnov-Exactpro Sep 20, 2024
6ec70be
Added EngineBusyError
Nikita-Smirnov-Exactpro Sep 20, 2024
5282e27
Updated j-sp image in compose.yml
Nikita-Smirnov-Exactpro Sep 23, 2024
73c7bb1
Configured compose.yml to build j-sp image instead of pulling from re…
Nikita-Smirnov-Exactpro Sep 23, 2024
1c66faf
Configured compose.yml to build j-sp image instead of pulling from re…
Nikita-Smirnov-Exactpro Sep 23, 2024
1c43f92
updated readme with changes in rpt-viewer
molotgor Sep 23, 2024
e84a588
Optimised Dockerfile
Nikita-Smirnov-Exactpro Sep 23, 2024
1 change: 1 addition & 0 deletions NOTICE
@@ -0,0 +1 @@
This project includes code from https://github.com/nteract/papermill/blob/2.6.0 which is licensed under the BSD License.
10 changes: 10 additions & 0 deletions README.md
@@ -9,6 +9,7 @@ This python server is made to launch Jupyter notebooks (*.ipynb) and get results
* `notebooks` (Default value: /home/jupyter-notebook/) - path to the directory with notebooks. `j-sp` searches the specified folder recursively for files with the `ipynb` extension.
* `results` (Default value: /home/jupyter-notebook/results) - path to the directory for run results. `j-sp` resolves result files with the `jsonl` extension against the specified folder.
* `logs` (Default value: /home/jupyter-notebook/logs) - path to the directory for run logs. `j-sp` writes run logs to the specified folder.
* `out-of-use-engine-time` (Default value: 3600) - out-of-use time interval in seconds. `j-sp` unregisters the engine related to a notebook when the notebook has not been run for longer than this interval.

### mounting:

@@ -37,6 +38,7 @@ spec:
notebooks: /home/jupyter-notebook/
results: /home/jupyter-notebook/j-sp/results/
logs: /home/jupyter-notebook/j-sp/logs/
out-of-use-engine-time: 3600
mounting:
- path: /home/jupyter-notebook/
pvcName: jupyter-notebook
@@ -119,6 +121,14 @@ docker compose build

## Release notes:

### 0.0.6

* Added papermill custom engine to reuse it for notebook execution.
  A separate engine is registered for each notebook and unregistered after 1 hour of out-of-use time by default.
* Updated local run with jupyter-notebook:
  * updated th2-rpt-viewer:
    * `JSON Reader` page polls execution status every 50 ms instead of every 1 sec

### 0.0.5

* added `umask 0007` to `~/.bashrc` file to provide rw file access for `users` group
259 changes: 259 additions & 0 deletions custom_engines.py
@@ -0,0 +1,259 @@
# Copyright 2024 Exactpro (Exactpro Systems Limited)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging.config
import time
from datetime import datetime

from papermill.clientwrap import PapermillNotebookClient
from papermill.engines import NBClientEngine, NotebookExecutionManager, PapermillEngines
from papermill.utils import remove_args, merge_kwargs, logger


class EngineKey:
    def __init__(self, client_id, notebook_file):
        self.client_id = client_id
        self.notebook_file = notebook_file

    def __hash__(self):
        # Hash the same attributes that __eq__ compares
        return hash((self.client_id, self.notebook_file))

    def __eq__(self, other):
        if isinstance(other, EngineKey):
            return self.client_id == other.client_id and self.notebook_file == other.notebook_file
        return False

    def __iter__(self):
        return iter((self.client_id, self.notebook_file))

    def __str__(self):
        return f"{self.client_id}:{self.notebook_file}"
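
Because `EngineKey` defines both `__hash__` and `__eq__` over the same two attributes, two keys built from the same client id and notebook path resolve to the same registry entry. A minimal sketch of that behaviour, using a hypothetical frozen dataclass `Key` as a stand-in (it is not part of this PR) and made-up notebook paths:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for EngineKey; a frozen dataclass generates __eq__ and __hash__."""
    client_id: str
    notebook_file: str

    def __str__(self) -> str:
        return f"{self.client_id}:{self.notebook_file}"


# A plain dict plays the role of the engine registry in this sketch.
registry = {}
registry[Key("", "/home/jupyter-notebook/a.ipynb")] = "engine-a"

# An equal key, built separately, finds the same entry; a different path does not.
same = Key("", "/home/jupyter-notebook/a.ipynb")
other = Key("", "/home/jupyter-notebook/b.ipynb")
found = registry.get(same)
missing = registry.get(other)
```

The hand-written class in the diff behaves the same way for dictionary lookups; the dataclass form is only a shorter way to illustrate it.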


class EngineHolder:
    _key: EngineKey
    _client: PapermillNotebookClient
    _last_used_time: float
    _busy: bool = False

    def __init__(self, key: EngineKey, client: PapermillNotebookClient):
        self._key = key
        self._client = client
        self._last_used_time = time.time()

    def __str__(self):
        return f"Engine(key={self._key}, last_used_time={self._last_used_time}, is_busy={self._busy})"

    async def async_execute(self, nb_man):
        if self._busy:
            raise EngineBusyError(
                f"Notebook client related to '{self._key}' has been busy since {self._get_last_used_date_time()}")

        try:
            self._busy = True
            # record the start of this run so that "busy since" reflects the current execution
            self._last_used_time = time.time()
            # accept new notebook into (possibly) existing client
            self._client.nb_man = nb_man
            self._client.nb = nb_man.nb
            # reuse client connection to existing kernel
            output = await self._client.async_execute(cleanup_kc=False)
            # renumber executions
            for i, cell in enumerate(nb_man.nb.cells):
                if 'execution_count' in cell:
                    cell['execution_count'] = i + 1

            return output
        finally:
            # refresh the timestamp so that the out-of-use interval is counted from the end of the run
            self._last_used_time = time.time()
            self._busy = False

    def get_last_used_time(self) -> float:
        return self._last_used_time

    def close(self):
        self._client = None

    def _get_last_used_date_time(self):
        return datetime.fromtimestamp(self._last_used_time)


class EngineBusyError(RuntimeError):
    pass
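
The busy flag in `EngineHolder.async_execute` rejects a second run of the same notebook while the first is still in progress. The sketch below illustrates that guard with `asyncio` only; `Holder` and `BusyError` are hypothetical stand-ins for `EngineHolder` and `EngineBusyError`, and `asyncio.sleep` stands in for the notebook execution.

```python
import asyncio


class BusyError(RuntimeError):
    """Hypothetical stand-in for EngineBusyError."""


class Holder:
    """Hypothetical stand-in for EngineHolder with only the busy guard."""

    def __init__(self):
        self._busy = False

    async def run(self, delay: float) -> str:
        # No await happens between the check and the assignment,
        # so two coroutines on one event loop cannot both pass the check.
        if self._busy:
            raise BusyError("holder is busy")
        self._busy = True
        try:
            await asyncio.sleep(delay)  # placeholder for the real execution
            return "done"
        finally:
            self._busy = False


async def demo():
    holder = Holder()
    first = asyncio.create_task(holder.run(0.05))
    await asyncio.sleep(0)  # yield once so the first task starts and sets the flag
    try:
        await holder.run(0)
        second = "done"
    except BusyError:
        second = "rejected"
    return await first, second


result = asyncio.run(demo())
```

A plain boolean is enough here only under the assumption that all callers share one event loop; calls from several threads would need a lock instead.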


class CustomEngine(NBClientEngine):
    out_of_use_engine_time: int = 60 * 60
    metadata_dict: dict = {}
    logger: logging.Logger

# The code of this method is derived from https://github.com/nteract/papermill/blob/2.6.0 under the BSD License.
# Original license follows:
#
# BSD 3-Clause License
#
# Copyright (c) 2017, nteract
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Modified by Exactpro for https://github.com/th2-net/th2-json-stream-provider-py
    @classmethod
    async def async_execute_notebook(
            cls,
            nb,
            kernel_name,
            output_path=None,
            progress_bar=True,
            log_output=False,
            autosave_cell_every=30,
            **kwargs,
    ):
        """
        A wrapper to handle notebook execution tasks.

        Wraps the notebook object in a `NotebookExecutionManager` in order to track
        execution state in a uniform manner. This is meant to help simplify
        engine implementations. This allows a developer to just focus on
        iterating and executing the cell contents.
        """
        nb_man = NotebookExecutionManager(
            nb,
            output_path=output_path,
            progress_bar=progress_bar,
            log_output=log_output,
            autosave_cell_every=autosave_cell_every,
        )

        nb_man.notebook_start()
        try:
            await cls.async_execute_managed_notebook(nb_man, kernel_name, log_output=log_output, **kwargs)
        finally:
            nb_man.cleanup_pbar()
            nb_man.notebook_complete()

        return nb_man.nb

    # this method has been copied from the issue comment
    # https://github.com/nteract/papermill/issues/583#issuecomment-791988091
    @classmethod
    async def async_execute_managed_notebook(
            cls,
            nb_man,
            kernel_name,
            log_output=False,
            stdout_file=None,
            stderr_file=None,
            start_timeout=60,
            execution_timeout=None,
            **kwargs
    ):
        """
        Performs the actual execution of the parameterized notebook locally.

        Args:
            nb_man (NotebookExecutionManager): Wrapper for execution state of a notebook.
            kernel_name (str): Name of kernel to execute the notebook against.
            log_output (bool): Flag for whether or not to write notebook output to the
                               configured logger.
            start_timeout (int): Duration to wait for kernel start-up.
            execution_timeout (int): Duration to wait before failing execution (default: never).
        """

        def create_client():  # TODO: should be static
            # Exclude parameters that are named differently downstream
            safe_kwargs = remove_args(['timeout', 'startup_timeout'], **kwargs)

            # Nicely handle preprocessor arguments prioritizing values set by engine
            final_kwargs = merge_kwargs(
                safe_kwargs,
                timeout=execution_timeout if execution_timeout else kwargs.get('timeout'),
                startup_timeout=start_timeout,
                kernel_name=kernel_name,
                log=logger,
                log_output=log_output,
                stdout_file=stdout_file,
                stderr_file=stderr_file,
            )
            cls.logger.info(f"Created papermill notebook client for {key}")
            return PapermillNotebookClient(nb_man, **final_kwargs)

        # TODO: pass client_id
        key = EngineKey("", nb_man.nb['metadata']['papermill']['input_path'])
        engine_holder: EngineHolder = cls.get_or_create_engine_metadata(key, create_client)
        return await engine_holder.async_execute(nb_man)

    @classmethod
    def create_logger(cls):
        cls.logger = logging.getLogger('engine')

    @classmethod
    def set_out_of_use_engine_time(cls, value: int):
        cls.out_of_use_engine_time = value

    @classmethod
    def get_or_create_engine_metadata(cls, key: EngineKey, func):
        cls.remove_out_of_date_engines(key)

        engine_holder: EngineHolder = cls.metadata_dict.get(key)
        if engine_holder is None:
            engine_holder = EngineHolder(key, func())
            cls.metadata_dict[key] = engine_holder

        return engine_holder

    @classmethod
    def remove_out_of_date_engines(cls, exclude_key: EngineKey):
        now = time.time()
        dead_line = now - cls.out_of_use_engine_time
        out_of_use_engines = [key for key, metadata in cls.metadata_dict.items() if
                              key != exclude_key and metadata.get_last_used_time() < dead_line]
        for key in out_of_use_engines:
            engine_holder: EngineHolder = cls.metadata_dict.pop(key)
            engine_holder.close()
            cls.logger.info(
                f"unregistered '{key}' papermill engine, last used time {now - engine_holder.get_last_used_time()} sec ago")
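
The eviction step in `remove_out_of_date_engines` computes a deadline, collects the stale keys first, and only then removes them, which avoids mutating the dictionary while iterating over it. A minimal stand-alone sketch of the same idea; `evict_idle` is a hypothetical helper, and the registry here maps notebook names directly to last-used timestamps rather than to `EngineHolder` objects:

```python
import time


def evict_idle(registry, max_idle_seconds, now=None, exclude_key=None):
    """Remove and return the entries whose last-used timestamp is older than the deadline.

    `registry` maps a key to a last-used timestamp in seconds (an assumption of this sketch).
    """
    now = time.time() if now is None else now
    deadline = now - max_idle_seconds
    # Collect stale keys first so the dict is not modified during iteration.
    stale = [key for key, last_used in registry.items()
             if key != exclude_key and last_used < deadline]
    return {key: registry.pop(key) for key in stale}


registry = {"a.ipynb": 1000.0, "b.ipynb": 4000.0, "c.ipynb": 500.0}
# With now=5000 and a 3600 s limit the deadline is 1400:
# "a.ipynb" is stale, "b.ipynb" is fresh, "c.ipynb" is stale but excluded.
removed = evict_idle(registry, max_idle_seconds=3600, now=5000.0, exclude_key="c.ipynb")
```

Excluding the key that is about to be used mirrors the `exclude_key` argument in the diff, so a notebook that is being run is never evicted by its own request.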


class CustomEngines(PapermillEngines):
    async def async_execute_notebook_with_engine(self, engine_name, nb, kernel_name, **kwargs):
        """Fetch a named engine and execute the nb object against it."""
        return await self.get_engine(engine_name).async_execute_notebook(nb, kernel_name, **kwargs)


# Instantiate a CustomEngines instance, register handlers and entry points
exactpro_papermill_engines = CustomEngines()
exactpro_papermill_engines.register(None, CustomEngine)
exactpro_papermill_engines.register_entry_points()
4 changes: 2 additions & 2 deletions local-run/with-jupyter-notebook/compose.yml
@@ -1,6 +1,6 @@
services:
json_stream_provider:
image: ghcr.io/th2-net/th2-json-stream-provider-py:0.0.5-dev
image: ghcr.io/th2-net/th2-json-stream-provider-py:0.0.6-notebooks-launch-time-10961501708-6ec70be
ports:
- "8081:8080"
volumes:
@@ -26,7 +26,7 @@ services:
- th2_network

th2_rpt_viewer:
image: ghcr.io/th2-net/th2-rpt-viewer:5.2.8
image: ghcr.io/th2-net/th2-rpt-viewer:5.2.9-notebooks-launch-time-10918718527
ports:
- "8083:8080"
networks:
@@ -1,5 +1,6 @@
{
"notebooks": "/home/jovyan",
"results": "/home/jovyan/j-sp/results",
"logs": "/home/jovyan/j-sp/logs"
"logs": "/home/jovyan/j-sp/logs",
"out-of-use-engine-time": 3600
}
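
How the server reads this setting is not shown in the visible hunks. As a rough sketch only, assuming the configuration is a JSON object like the one above and that a missing key falls back to the README default of 3600 seconds, the value could be obtained like this (`read_out_of_use_engine_time` is a hypothetical helper, not a function from this PR):

```python
import json

# Default taken from the README entry for `out-of-use-engine-time`.
DEFAULT_OUT_OF_USE_ENGINE_TIME = 3600  # seconds


def read_out_of_use_engine_time(config_text: str) -> int:
    """Return the out-of-use interval in seconds from a JSON config string."""
    config = json.loads(config_text)
    return int(config.get("out-of-use-engine-time", DEFAULT_OUT_OF_USE_ENGINE_TIME))


with_value = read_out_of_use_engine_time('{"notebooks": "/home/jovyan", "out-of-use-engine-time": 600}')
without_value = read_out_of_use_engine_time('{"notebooks": "/home/jovyan"}')
```

The result would then be passed to `CustomEngine.set_out_of_use_engine_time`, which the diff defines for this purpose.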
20 changes: 17 additions & 3 deletions log_configuratior.py
@@ -1,8 +1,24 @@
# Copyright 2024 Exactpro (Exactpro Systems Limited)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging.config
import os

log4py_file = '/var/th2/config/log4py.conf'
def configureLogging():


def configure_logging():
if os.path.exists(log4py_file):
logging.config.fileConfig(log4py_file, disable_existing_loggers=False)
logging.getLogger(__name__).info(f'Logger is configured by {log4py_file} file')
@@ -30,5 +46,3 @@ def configureLogging():
}
logging.config.dictConfig(default_logging_config)
logging.getLogger(__name__).info('Logger is configured by default')
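
The visible hunks show a two-step fallback: use the `log4py.conf` file when it exists, otherwise apply a built-in dictionary configuration. The default dictionary itself is collapsed in this view, so the sketch below substitutes an assumed minimal one; `configure_logging_sketch` and its return value are illustrative and not part of the PR.

```python
import logging
import logging.config
import os


def configure_logging_sketch(config_path: str) -> str:
    """Configure logging from a file if present, else from an assumed default dict."""
    if os.path.exists(config_path):
        logging.config.fileConfig(config_path, disable_existing_loggers=False)
        return "file"
    # Assumed default; the real default_logging_config is not visible in the diff.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"}},
        "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "plain"}},
        "root": {"level": "INFO", "handlers": ["console"]},
    })
    return "default"


source = configure_logging_sketch("/nonexistent/log4py.conf")
```

Passing `disable_existing_loggers=False` in both branches keeps loggers that were created before configuration, such as the `engine` logger, active.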


2 changes: 1 addition & 1 deletion package_info.json
@@ -1,4 +1,4 @@
{
"package_name": "th2-json-stream-provider",
"package_version": "0.0.5"
"package_version": "0.0.6"
}