Running the UATs behind a proxy #27
Labels: documentation
An important part of the Kubeflow UATs effort is their integration into the bundle CI, so that we can have automated checks in place whenever something changes in the Kubeflow bundle:
Due to the sheer size of this bundle, it can only be deployed on beefy self-hosted runners. Currently, our runners are configured to run behind a proxy meant to restrict access to external networks. Unfortunately, even if the proxy allows everything, configuring the UATs to work with it is not trivial. There are 2 main concerns which we'll dive into shortly:
Setup
The UATs are intended to run on a self-hosted VM with MicroK8s and Charmed Kubeflow deployed on top of it. As mentioned before, this VM runs behind a proxy, which is responsible for routing HTTP/S traffic to/from external networks. This is reflected in the `HTTP_PROXY` and `HTTPS_PROXY` environment variables (in our case both pointing to the same proxy), which most tools use when connecting to the internet.
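For illustration (the proxy address below is a placeholder, not our actual proxy), the host-level configuration amounts to:

```python
import os

# On the runner VM both variables point at the same proxy, e.g. (placeholder address):
#   HTTP_PROXY=http://proxy.internal:3128
#   HTTPS_PROXY=http://proxy.internal:3128
# Most HTTP clients (httpx included) read these automatically when reaching out to the internet.
print(os.environ.get("HTTP_PROXY"), os.environ.get("HTTPS_PROXY"))
```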
Access Internal Addresses
The UATs need to access the Kubernetes API server in order to create and manage the Profile and Job responsible for executing the test suite. In order to achieve that, we use `lightkube`. Connections to internal resources (such as the API server) do not need to go through the proxy. For this reason, we have to instruct `lightkube` to ignore it, for which we have 2 options:
1. add the internal addresses to the `NO_PROXY` list
2. instruct `lightkube` to ignore the environment variables (among which are `HTTP_PROXY` and `HTTPS_PROXY`)

Although the 1st option looked promising, it's actually harder to work with, due to limitations in the way the `httpx` package used by `lightkube` interprets the list of CIDRs provided through `NO_PROXY`.
Going with the 2nd option turned out to be more straightforward, since it only entails initialising the `lightkube.Client` with the `trust_env` option set to `False`. This is propagated to `httpx`, essentially instructing it to ignore environment variables. This way, the `lightkube` client ignores the configured proxy and attempts to access the K8s API server directly, which succeeds since it's an internal network.
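As a minimal sketch of what that looks like in practice (the namespace listing is only a hypothetical sanity check, not part of the UATs):

```python
from lightkube import Client
from lightkube.resources.core_v1 import Namespace

# trust_env=False is passed down to httpx, so HTTP_PROXY/HTTPS_PROXY are ignored
# and requests to the (internal) K8s API server are made directly.
client = Client(trust_env=False)

# Hypothetical sanity check: this call should succeed without touching the proxy.
for ns in client.list(Namespace):
    print(ns.metadata.name)
```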
Access External Addresses
Contrary to the situation described above, accessing external resources is only possible through the proxy, which enforces the following requirements:
1. the destination addresses must be allowed by the proxy
2. all requests must actually go through the proxy
Given that the self-hosted runners are still under active development, the first point is not expected to be an issue. More specifically, all destination addresses are allowed through the proxy at the moment. Later on, we might have to be more careful and deliberate about the resources we're accessing.
On the other hand, ensuring that all requests go through the proxy can be an arduous task. As mentioned before, resources on the VM are configured to work with the proxy through the `HTTP_PROXY` and `HTTPS_PROXY` environment variables. Things become a bit more complicated, though, when we start deploying workloads on the MicroK8s cluster; these workloads are expected to have internet access (e.g. for installing Python dependencies) but do not share the host configuration by default. Our goal, therefore, is to propagate this configuration (the 2 environment variables, essentially) to any workload created on the cluster.

Propagate Host Environment to Workloads
When it comes to propagating the environment variables into the created workloads, there are both fine- and coarse-grained approaches, which we'll briefly explore below.
Using a ConfigMap
The best and most fine-grained approach to injecting environment variables into K8s Pods is using a ConfigMap with the desired data and explicitly specifying where these are consumed when creating each workload. An example ConfigMap could look like this:
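For instance, a sketch using `lightkube` models (the client the UATs already rely on), with placeholder names, namespace, and proxy address:

```python
from lightkube import Client
from lightkube.models.meta_v1 import ObjectMeta
from lightkube.resources.core_v1 import ConfigMap

# Hypothetical ConfigMap carrying the proxy settings we want to inject into workloads.
proxy_config = ConfigMap(
    metadata=ObjectMeta(name="proxy-config", namespace="test-namespace"),
    data={
        "HTTP_PROXY": "http://proxy.internal:3128",   # placeholder proxy address
        "HTTPS_PROXY": "http://proxy.internal:3128",  # placeholder proxy address
    },
)

Client().create(proxy_config)
```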
Then, if we want a specific workload to get this configuration, we can do so by adding the following to the corresponding ContainerSpec:
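Expressed again with `lightkube` models (container name and image are placeholders), consuming the ConfigMap via `envFrom` would look roughly like this:

```python
from lightkube.models.core_v1 import ConfigMapEnvSource, Container, EnvFromSource

# Hypothetical container consuming the ConfigMap above as environment variables.
# optional=True lets the Pod start even if the ConfigMap does not exist.
container = Container(
    name="test-runner",    # placeholder name
    image="python:3.10",   # placeholder image
    env_from=[
        EnvFromSource(
            config_map_ref=ConfigMapEnvSource(name="proxy-config", optional=True),
        ),
    ],
)
```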
Note, however, that the ConfigMap has to be in the same namespace as the created workload.
We've already implemented this approach in this branch of the UATs, so that the driver Job that runs the test suite is able to access the internet. More specifically, if a file is provided through the `--env` option at invocation, the driver will use it to create a ConfigMap. Then, the deployed Job attempts to consume data from the ConfigMap (given that it exists, hence the optional part) and set them as environment variables in the Pod that will be created to run the suite.

An example can be found in this branch of the CKF bundle, where we add the 2 environment variables to a `params.env` file and pass that to the UATs invocation.

Limitations
Although this approach works for the driver Job, we need to take into account that many of the notebook tests could be deploying workloads themselves. These can be Argo Workflow Pods when running Kubeflow Pipelines, Katib Trial Pods, or KServe/Seldon deployment Pods, all of which could potentially require internet access. There are 2 issues here:
Using a MutatingAdmissionWebhook
A more global, coarse-grained approach would be through the implementation and deployment of a MutatingAdmissionWebhook that would inject the required data into any arbitrary workload deployed. An example of this can be found in the Kubeflow Admission Webhook that watches for submitted Pods and patches their environment based on the available PodDefaults. Although this could solve our issues, it is a major undertaking and is therefore unlikely to be prioritised if the Kubeflow team is the only one that ends up needing it.
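To make the mechanism concrete, here is a rough, hypothetical sketch of the JSONPatch response such a webhook could return (not taken from the Kubeflow Admission Webhook; a real implementation would also handle Pods whose containers have no existing `env` list, TLS, and the full AdmissionReview plumbing):

```python
import base64
import json

# Placeholder proxy address; a real webhook would read this from its own configuration.
PROXY = "http://proxy.internal:3128"

# JSONPatch appending the proxy variables to the first container's existing env list.
patch = [
    {"op": "add", "path": "/spec/containers/0/env/-",
     "value": {"name": "HTTP_PROXY", "value": PROXY}},
    {"op": "add", "path": "/spec/containers/0/env/-",
     "value": {"name": "HTTPS_PROXY", "value": PROXY}},
]

# Minimal AdmissionReview response body (uid must echo the request's uid).
admission_response = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "response": {
        "uid": "<request-uid>",
        "allowed": True,
        "patchType": "JSONPatch",
        "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
    },
}
print(json.dumps(admission_response, indent=2))
```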