Bump KFP Controller Python image #414

Merged
kimwnasptd merged 1 commit into track/2.0 from kimwnasptd-bump-python-image on May 8, 2024

kimwnasptd commented Mar 27, 2024

The KFP Profile Controller uses a Python 3.7 image, which has 7 Critical CVEs. This PR updates the image to a newer version to reduce the number of CVEs.

I tested the new image on upstream Kubeflow and confirmed that:

  1. The KFP Profile Controller pod runs as expected without errors.
  2. The workloads are successfully replicated to user namespaces.

In this PR I also switched to the Alpine flavor, which is not based on Debian and is more focused on security. The Alpine image also has significantly fewer CVEs reported on Docker Hub than the Debian-based one:
https://hub.docker.com/layers/library/python/3.11.9-alpine/images/sha256-3912f7fe31112ee0f747848328e1a2b225a3aad18d0800bac6e13042642fd202?context=explore
https://hub.docker.com/layers/library/python/3.11.9/images/sha256-106b12f51f3e577da3f1a230db914951e0a75402ed49eaeba391312ba1e3289b?context=explore

kimwnasptd commented Mar 27, 2024

The CI keeps failing, so this is not a transient error. I see the following:
https://github.com/canonical/kfp-operators/actions/runs/8451889002/job/23162127152

Logs
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.4.2, pluggy-1.3.0 -- /home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/bin/python
cachedir: .tox/bundle-integration-v2/.pytest_cache
rootdir: /home/runner/work/kfp-operators/kfp-operators
configfile: pyproject.toml
plugins: operator-0.29.0, asyncio-0.21.1, anyio-4.0.0
asyncio: mode=strict
collecting ... collected 6 items

tests/integration/test_kfp_functional_v2.py::test_build_and_deploy PASSED
tests/integration/test_kfp_functional_v2.py::test_upload_pipeline Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000
Handling connection for 8080
Handling connection for 8080
PASSED
tests/integration/test_kfp_functional_v2.py::test_create_and_monitor_run Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
error: lost connection to pod
Experiment details: http://localhost:8080/#/experiments/details/15b90f71-307b-4432-a677-4d3e56f64be9
Experiment details: http://localhost:8080/#/experiments/details/15b90f71-307b-4432-a677-4d3e56f64be9
Run details: http://localhost:8080/#/runs/details/015e4907-81fe-4ff9-a7b4-1f81ef470f29
FAILED
tests/integration/test_kfp_functional_v2.py::test_create_and_monitor_run ERROR
tests/integration/test_kfp_functional_v2.py::test_create_and_monitor_recurring_run ERROR
tests/integration/test_kfp_functional_v2.py::test_apply_sample_viewer FAILED
tests/integration/test_kfp_functional_v2.py::test_viz_server_healthcheck FAILED
tests/integration/test_kfp_functional_v2.py::test_viz_server_healthcheck ERROR

==================================== ERRORS ====================================
_______________ ERROR at teardown of test_create_and_monitor_run _______________
Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(

...

  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /apis/v2beta1/experiments/15b90f71-307b-4432-a677-4d3e56f64be9 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f214067f760>: Failed to establish a new connection: [Errno 111] Connection refused'))

Could this be related to kubernetes/kubectl#1169, considering we see "error: lost connection to pod" in the logs?
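One way to tell a dead port-forward apart from a failing API call is to probe the forwarded port before each request. The following is only a minimal sketch using the Python standard library; the helper name `wait_for_port` is hypothetical and is not part of this PR or of the test suite.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; not taken from the kfp-operators tests.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example ConnectionRefusedError,
            # as seen in the traceback above) when nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 8080, timeout=5)` right after the "lost connection to pod" message would return False if the forward is really gone, which would point at the port-forward rather than at the KFP API itself.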

kimwnasptd commented

Giving it a pass with self-hosted runners.

I'm still hitting the issue where httpx cannot talk to the MicroK8s API server. I remember that one solution was to set the NO_PROXY env var, but aproxy, which is enabled by default on the edge runners, should remove that need. See canonical/charmed-kubeflow-uats#27.
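For reference, the NO_PROXY workaround could look roughly like the sketch below. It assumes the client reads the proxy environment variables, which httpx does when `trust_env` is left at its default. The helper name `extend_no_proxy` and the host values in the usage note are hypothetical and not taken from this PR.

```python
import os
from typing import Iterable, MutableMapping


def extend_no_proxy(hosts: Iterable[str], env: MutableMapping[str, str] = os.environ) -> str:
    """Append hosts to NO_PROXY without dropping entries that are already set.

    Hypothetical helper for illustration only.
    """
    current = env.get("NO_PROXY") or env.get("no_proxy") or ""
    entries = [e.strip() for e in current.split(",") if e.strip()]
    for host in hosts:
        if host not in entries:
            entries.append(host)
    value = ",".join(entries)
    # Set both spellings, since tools differ in which one they read.
    env["NO_PROXY"] = value
    env["no_proxy"] = value
    return value
```

A test fixture could call something like `extend_no_proxy(["localhost", "127.0.0.1"])` before any client is created, adding the address of the MicroK8s API server as needed. Whether this is still required with aproxy enabled is exactly the open question here.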

Looking into it.

kimwnasptd force-pushed the kimwnasptd-bump-python-image branch 7 times, most recently from 7248c7b to f31f4d1, on March 31, 2024 09:13
kimwnasptd force-pushed the kimwnasptd-bump-python-image branch 3 times, most recently from 603bd67 to 4108b18, on April 10, 2024 13:53
kimwnasptd commented

I reverted everything and rebased on top of the latest branch, which includes the patch to increase the CI runner space.

kimwnasptd force-pushed the kimwnasptd-bump-python-image branch 3 times, most recently from 3d0ffcf to 6f4edf2, on April 26, 2024 09:31
DnPlas left a comment

Thanks @kimwnasptd

kimwnasptd force-pushed the kimwnasptd-bump-python-image branch from 6f4edf2 to 5be3abd on May 6, 2024 11:41
kimwnasptd merged commit 0e15f79 into track/2.0 on May 8, 2024
46 checks passed
2 participants