Configure mlflow-minio UATs to run behind a proxy #109

Closed
DnPlas opened this issue Aug 26, 2024 · 4 comments · Fixed by #113
Labels
enhancement New feature or request

Comments

@DnPlas
Contributor

DnPlas commented Aug 26, 2024

Context

When running the MLflow UATs from inside a Notebook, we need to be able to run them behind a proxy.

What needs to get done

Based on the exploration done in #76:

  • Add a PodDefault that defines the proxy env vars to the repo (a quick check of the injected variables is sketched after this list)
  • Add labels to the workload containers in each test to match the PodDefault
  • Add instructions in the README.md on how to run behind a proxy from inside a Notebook
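
As a quick sanity check from inside the notebook, something like the following could confirm the PodDefault was applied (a minimal sketch; the variable names below are the conventional ones and are an assumption here, the actual names are whatever the PodDefault defines):

import os

# If the PodDefault was matched via its label, the proxy variables it defines
# should already be present in the notebook's environment.
for var in ("http_proxy", "https_proxy", "no_proxy",
            "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"):
    print(f"{var}={os.environ.get(var, '<unset>')}")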

Definition of Done

UATs can be run behind a proxy from inside a Notebook.

@DnPlas DnPlas added the enhancement New feature or request label Aug 26, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6166.

This message was autogenerated

@DnPlas DnPlas changed the title from "Configure mlflow-minio UATs to run behind a proxy #1042" to "Configure mlflow-minio UATs to run behind a proxy" Aug 26, 2024
@DnPlas DnPlas transferred this issue from canonical/bundle-kubeflow Aug 26, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6169.

This message was autogenerated

@orfeas-k
Contributor

The first attempt to run the UAT behind a proxy, after setting up a proxied notebook as instructed in the README, failed at the following cell during pd.read_csv():

local = pd.read_csv(LOCAL_CSV, delimiter=";")
uploaded = pd.read_csv(f"s3://{BUCKET}/{UPLOADED_CSV}", delimiter=";",storage_options={
    "key": os.environ["AWS_ACCESS_KEY_ID"],
    "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
    "client_kwargs":{
        "endpoint_url": os.environ["MINIO_ENDPOINT_URL"]
    }
})

with the following error

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:

File /opt/conda/lib/python3.11/site-packages/aiobotocore/client.py:411, in AioBaseClient._make_api_call(self, operation_name, api_params)
    410     error_class = self.exceptions.from_code(error_code)
--> 411     raise error_class(parsed_response, operation_name)
    412 else:

ClientError: An error occurred (503) when calling the HeadObject operation (reached max retries: 4): Service Unavailable

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
Cell In[13], line 2
      1 local = pd.read_csv(LOCAL_CSV, delimiter=";")
----> 2 uploaded = pd.read_csv(f"s3://{BUCKET}/{UPLOADED_CSV}", delimiter=";",storage_options={
      3     "key": os.environ["AWS_ACCESS_KEY_ID"],
      4     "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
      5     "client_kwargs":{
      6         "endpoint_url": os.environ["MINIO_ENDPOINT_URL"]
      7     }
      8 })

File /opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
   1013 kwds_defaults = _refine_defaults_read(
   1014     dialect,
   1015     delimiter,
   (...)
   1022     dtype_backend=dtype_backend,
   1023 )
   1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File /opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
    617 _validate_names(kwds.get("names", None))
    619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
    622 if chunksize or iterator:
    623     return parser

File /opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
   1617     self.options["has_index_names"] = kwds["has_index_names"]
   1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File /opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
   1878     if "b" not in mode:
   1879         mode += "b"
-> 1880 self.handles = get_handle(
   1881     f,
   1882     mode,
   1883     encoding=self.options.get("encoding", None),
   1884     compression=self.options.get("compression", None),
   1885     memory_map=self.options.get("memory_map", False),
   1886     is_text=is_text,
   1887     errors=self.options.get("encoding_errors", "strict"),
   1888     storage_options=self.options.get("storage_options", None),
   1889 )
   1890 assert self.handles is not None
   1891 f = self.handles.handle

File /opt/conda/lib/python3.11/site-packages/pandas/io/common.py:728, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    725     codecs.lookup_error(errors)
    727 # open URLs
--> 728 ioargs = _get_filepath_or_buffer(
    729     path_or_buf,
    730     encoding=encoding,
    731     compression=compression,
    732     mode=mode,
    733     storage_options=storage_options,
    734 )
    736 handle = ioargs.filepath_or_buffer
    737 handles: list[BaseBuffer]

File /opt/conda/lib/python3.11/site-packages/pandas/io/common.py:432, in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    427     pass
    429 try:
    430     file_obj = fsspec.open(
    431         filepath_or_buffer, mode=fsspec_mode, **(storage_options or {})
--> 432     ).open()
    433 # GH 34626 Reads from Public Buckets without Credentials needs anon=True
    434 except tuple(err_types_to_retry_with_anon):

File /opt/conda/lib/python3.11/site-packages/fsspec/core.py:147, in OpenFile.open(self)
    140 def open(self):
    141     """Materialise this as a real open file without context
    142 
    143     The OpenFile object should be explicitly closed to avoid enclosed file
    144     instances persisting. You must, therefore, keep a reference to the OpenFile
    145     during the life of the file-like it generates.
    146     """
--> 147     return self.__enter__()

File /opt/conda/lib/python3.11/site-packages/fsspec/core.py:105, in OpenFile.__enter__(self)
    102 mode = self.mode.replace("t", "").replace("b", "") + "b"
    104 try:
--> 105     f = self.fs.open(self.path, mode=mode)
    106 except FileNotFoundError as e:
    107     if has_magic(self.path):

File /opt/conda/lib/python3.11/site-packages/fsspec/spec.py:1303, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1301 else:
   1302     ac = kwargs.pop("autocommit", not self._intrans)
-> 1303     f = self._open(
   1304         path,
   1305         mode=mode,
   1306         block_size=block_size,
   1307         autocommit=ac,
   1308         cache_options=cache_options,
   1309         **kwargs,
   1310     )
   1311     if compression is not None:
   1312         from fsspec.compression import compr

File /opt/conda/lib/python3.11/site-packages/s3fs/core.py:688, in S3FileSystem._open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, size, requester_pays, cache_options, **kwargs)
    685 if cache_type is None:
    686     cache_type = self.default_cache_type
--> 688 return S3File(
    689     self,
    690     path,
    691     mode,
    692     block_size=block_size,
    693     acl=acl,
    694     version_id=version_id,
    695     fill_cache=fill_cache,
    696     s3_additional_kwargs=kw,
    697     cache_type=cache_type,
    698     autocommit=autocommit,
    699     requester_pays=requester_pays,
    700     cache_options=cache_options,
    701     size=size,
    702 )

File /opt/conda/lib/python3.11/site-packages/s3fs/core.py:2182, in S3File.__init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays, cache_options, size)
   2180         self.details = s3.info(path)
   2181         self.version_id = self.details.get("VersionId")
-> 2182 super().__init__(
   2183     s3,
   2184     path,
   2185     mode,
   2186     block_size,
   2187     autocommit=autocommit,
   2188     cache_type=cache_type,
   2189     cache_options=cache_options,
   2190     size=size,
   2191 )
   2192 self.s3 = self.fs  # compatibility
   2194 # when not using autocommit we want to have transactional state to manage

File /opt/conda/lib/python3.11/site-packages/fsspec/spec.py:1742, in AbstractBufferedFile.__init__(self, fs, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
   1740         self.size = size
   1741     else:
-> 1742         self.size = self.details["size"]
   1743     self.cache = caches[cache_type](
   1744         self.blocksize, self._fetch_range, self.size, **cache_options
   1745     )
   1746 else:

File /opt/conda/lib/python3.11/site-packages/fsspec/spec.py:1755, in AbstractBufferedFile.details(self)
   1752 @property
   1753 def details(self):
   1754     if self._details is None:
-> 1755         self._details = self.fs.info(self.path)
   1756     return self._details

File /opt/conda/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File /opt/conda/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File /opt/conda/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File /opt/conda/lib/python3.11/site-packages/s3fs/core.py:1374, in S3FileSystem._info(self, path, bucket, key, refresh, version_id)
   1372 if key:
   1373     try:
-> 1374         out = await self._call_s3(
   1375             "head_object",
   1376             self.kwargs,
   1377             Bucket=bucket,
   1378             Key=key,
   1379             **version_id_kw(version_id),
   1380             **self.req_kw,
   1381         )
   1382         return {
   1383             "ETag": out.get("ETag", ""),
   1384             "LastModified": out.get("LastModified", ""),
   (...)
   1390             "ContentType": out.get("ContentType"),
   1391         }
   1392     except FileNotFoundError:

File /opt/conda/lib/python3.11/site-packages/s3fs/core.py:365, in S3FileSystem._call_s3(self, method, *akwarglist, **kwargs)
    363 logger.debug("CALL: %s - %s - %s", method.__name__, akwarglist, kw2)
    364 additional_kwargs = self._get_s3_method_kwargs(method, *akwarglist, **kwargs)
--> 365 return await _error_wrapper(
    366     method, kwargs=additional_kwargs, retries=self.retries
    367 )

File /opt/conda/lib/python3.11/site-packages/s3fs/core.py:145, in _error_wrapper(func, args, kwargs, retries)
    143         err = e
    144 err = translate_boto_error(err)
--> 145 raise err

OSError: [Errno 16] Service Unavailable

@orfeas-k
Contributor

Tailing the squid logs with sudo tail /var/log/squid/access.log -f, we see the following entries generated when re-running the read_csv command alone:

1724933362.913     12 10.0.132.52 TCP_MISS_ABORTED/503 353 HEAD http://mlflow-minio.kubeflow:9000/kf-testing-minio/uploaded-sample.csv - HIER_NONE/- text/html
1724933363.346      0 10.0.132.52 TCP_MISS_ABORTED/503 353 HEAD http://mlflow-minio.kubeflow:9000/kf-testing-minio/uploaded-sample.csv - HIER_NONE/- text/html
1724933364.358      0 10.0.132.52 TCP_MISS_ABORTED/503 353 HEAD http://mlflow-minio.kubeflow:9000/kf-testing-minio/uploaded-sample.csv - HIER_NONE/- text/html
1724933367.729      0 10.0.132.52 TCP_MISS_ABORTED/503 353 HEAD http://mlflow-minio.kubeflow:9000/kf-testing-minio/uploaded-sample.csv - HIER_NONE/- text/html
1724933372.152      0 10.0.132.52 TCP_MISS_ABORTED/503 353 HEAD http://mlflow-minio.kubeflow:9000/kf-testing-minio/uploaded-sample.csv - HIER_NONE/- text/html

which means that requests to the S3 storage go through the proxy instead of reaching the storage service inside the cluster directly.

After talking with @NohaIhab, it looks like for the kfp UATs we also needed to add .kubeflow to the no_proxy values here. Updating the PodDefault and creating a new notebook whose no_proxy also includes .kubeflow resolved the above issue, and the UAT runs without issues. Thus, we'll open a PR to update the PodDefault.
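
For reference, a simplified illustration of why the .kubeflow entry makes the difference (a sketch of the general no_proxy suffix matching only, not the exact logic the underlying HTTP clients use):

from urllib.parse import urlparse

def bypasses_proxy(url: str, no_proxy: str) -> bool:
    # Simplified no_proxy semantics: bypass the proxy when the host equals,
    # or is a subdomain of, one of the comma-separated entries.
    host = urlparse(url).hostname or ""
    entries = [e.strip().lstrip(".") for e in no_proxy.split(",") if e.strip()]
    return any(host == e or host.endswith("." + e) for e in entries)

minio_url = "http://mlflow-minio.kubeflow:9000"
# Without ".kubeflow" the in-cluster MinIO endpoint is sent to the proxy (the 503s above).
print(bypasses_proxy(minio_url, ".svc,.local"))            # False -> goes through the proxy
# With ".kubeflow" clients reach the service directly inside the cluster.
print(bypasses_proxy(minio_url, ".svc,.local,.kubeflow"))  # True  -> direct in-cluster access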

We also reran the kfp, katib, training, and kserve UATs to verify that this doesn't affect them, and they succeeded.
