Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for overriding default dependency source #315

Merged
merged 9 commits into from
Mar 19, 2024

Conversation

bitgully
Copy link
Contributor

Summary

Using the introduced environment variable BP_DEPENDENCY_SOURCE_OVERRIDE we can override the default source (scheme://host:port) of dependencies. Excluding those explicitly defined in bindings.

Use Cases

In (air-gapped) corporate networks, it is common to work with mirror registries. Dependencies cannot be pulled directly from the public internet but must run through the private mirror for security and performance/resilience reasons.
In such cases, we would have to manually define several individual dependency-mapping bindings for each dependency version. It is more practical to set the mirror's address once for all dependencies.

Sample

Setting variable BP_DEPENDENCY_SOURCE_OVERRIDE=https://mirror.acme.com would lead to this URI replacement:
Before: https://github.com/bell-sw/Liberica/releases/download/11.0.8+10/bellsoft-jre11.0.8+10-linux-amd64.tar.gz
After: https://mirror.acme.com/bell-sw/Liberica/releases/download/11.0.8+10/bellsoft-jre11.0.8+10-linux-amd64.tar.gz

Checklist

  • I have viewed, signed, and submitted the Contributor License Agreement.
  • I have linked issue(s) that this PR should close using keywords or the Github UI (See docs)
  • I have added an integration test, if necessary.
  • I have reviewed the styleguide for guidance on my code quality.
  • I'm happy with the commit history on this PR (I have rebased/squashed as needed).

@bitgully bitgully requested a review from a team as a code owner February 23, 2024 20:29
@dmikusa
Copy link
Contributor

dmikusa commented Feb 26, 2024

This is something that we've resisted doing in the past. Dependency mappings would be the suggested way to do this instead, or build your own buildpack images and then you can point the dependencies to wherever you want.

Security and provenance have been the primary motivators for not doing this previously. What you've proposed is a little different though, only being able to modify the scheme://host:port. Off the top of my head, I'd like this more if it were just allowing you to change the host:port, because everything should always be HTTPS. Would that still work for you?

I appreciate the PR. I need to think about this some more and see what the other maintainers think as well.

Thanks!

@anthonydahanne anthonydahanne added semver:minor A change requiring a minor version bump type:enhancement A general enhancement labels Feb 26, 2024
Copy link
Member

@anthonydahanne anthonydahanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bitgully
Copy link
Contributor Author

Sure, HTTPS only would work for me too. Overriding the scheme was just meant for backwards compatibility or in case anyone has a use case without TLS (maybe for performance reasons).
Let me know your preference when you are ready, and I will adapt it accordingly.

@dmikusa
Copy link
Contributor

dmikusa commented Feb 26, 2024

Excellent. Yes, please update to require HTTPS.

Other thoughts:

  1. I hate naming things cause it's hard and I don't like being picky about names cause it's subject but I think we should discuss the env variable name a bit. The term "source" was a little confusing to me. My preference would be to try to name it using the same terminology that we're using now. We have "dependency mappings" to remap a particular dependency, so maybe we have a "dependency host mapping"? or a "dependency domain mapping"? or a "dependency server mapping"? To indicate you're remapping an entire host/domain/server. What do others think about this?

    I'd be OK using the "mirror" terminology as well. I know that buildpacks use this and it's a pretty standard concept.

  2. I'm curious about using an env variable versus a binding, like with a dependency mapping? Did you have a specific reason for env variable? Just thinking through things in my head. If we used a binding instead of an env variable, then the process would be more consistent with the existing dependency mapping process. It would also facilitate using embedded HTTP basic credentials, which is something that would be nice to support since you wouldn't want to put those in an env variable.

  3. HTTP basic credentials. I feel like this is something people will want/need. If you don't want to include this now, I'm fine with that, but it's something we should think through so that we can support it in the future.

  4. I think we should somehow indicate a change like this has happened to the user. Definitely at build time. Possibly in the SBOM information or possibly as a label on the generated container? 🤔 but I would prefer in the SBOM. I just want to make sure that it's clear how the container was created and that provenance is clear as well.

@loewenstein
Copy link

I don't know if the sbom needs any update. We are still sticking to the sha that the buildpacks have configured, no? So, I cannot come up with any potential tampering that could have happened.

I'd definitely put something in the build log though.

@dmikusa
Copy link
Contributor

dmikusa commented Feb 26, 2024

My thought in regards to the SBOM is that we've now got different potential sources for the files. You can't tamper with them, because of the SHA256 hash, but it would still be helpful to know exactly where the files were sourced from. It's probably something we should do with dependency mapping, but I don't believe we do at the moment.

I'm also OK with spinning that out into another issue, but just so that we're thinking it through and doing things in a way where it is possible.

Oh, the other thing I was thinking about. We need to support file:// as well as https://. A big driver of dependency mapping is to switch from external to local URLs (typically added via a volume mount). We should support that here, because it'll make using local files a lot simpler. If that's a lot more work, again, I'm OK spinning that out into a separate issue.

@bitgully A lot of this is me thinking out loud. When it comes to additional features/functionality, implement what you're comfortable implementing. We'll move the rest into other issues.

@bitgully
Copy link
Contributor Author

@loewenstein: A hint, that the original URI was overridden, is printed to the build log here.

@dmikusa:

  1. True, naming is hard. I also thought of things like: "dependency registry", "dependency mirror", "dependency mapping proxy", "dependency mapping location". But I'm happy with any other version really.
  2. The reason I chose an env variable was that I mostly work in Kubernetes environments, where the env variable can be injected into the build container from a Kubernetes Secret and therefore would only exist inside the container. RBAC rules define who can view/edit such Secrets. I agree, when not running on a container orchestration platform, using an env variable might not be feasible. But having embedded HTTP credentials saved inside a binding file seemed less than ideal too to me.
  3. The HTTP basic credentials would in general be supported in this change here.
  4. Regarding the SBOM: I simply considered that mirror more like an external cache and didn't see it as an alternative location since artefacts would only be proxied through this. However, one could also use this override mechanism to simply point to any other server that actually isn't set up as a mirror/proxy. As the same applies to any dependency mapping, it would make sense to move this into a separate issue though.

@bitgully
Copy link
Contributor Author

bitgully commented Mar 1, 2024

I have decided to use "BP_DEPENDENCY_MIRROR" as a name now and limited the support for the https:// and file:// schemes only (no more http:// allowed).
I'm planning to add the option of setting a mirror using a binding instead of an env variable at a later stage as well. But I'm tied up in a few other projects at the moment.
But I have introduced an additional feature "BP_DEPENDENCY_MIRROR_PRESERVE_HOST" too. This can be necessary for use cases, where it isn't sufficient to just replace the hostname but also preserve the original hostname in the path on the mirror. Depending on the mirror's setup, this can be vital to ensure uniqueness of the (mirrored/proxied) sources.

dependency_cache.go Outdated Show resolved Hide resolved
dependency_cache.go Outdated Show resolved Hide resolved
@bitgully
Copy link
Contributor Author

bitgully commented Mar 3, 2024

I have reworked this now. The changes are:

  • Placeholder {originalHost} can be used in arbitrary locations of the mirror path, eliminating BP_DEPENDENCY_MIRROR_PRESERVE_HOST.
  • Mirror URIs can also be set using a binding of type 'dependency-mirror' and the filename 'uri'.
  • A warning is logged if a dependency mapping is found when a mirror is defined at the same time.

@bitgully bitgully requested a review from dmikusa March 7, 2024 11:55
@dmikusa
Copy link
Contributor

dmikusa commented Mar 18, 2024

Thank you! Sorry for the delay. I will get back to this today or tomorrow.

@dmikusa
Copy link
Contributor

dmikusa commented Mar 19, 2024

I think this came out really awesome. Thanks so much for sticking through the review and incorporating feedback.

@anthonydahanne Are you 👍 also? If so, I'll merge.

Copy link
Member

@anthonydahanne anthonydahanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your contribution @bitgully ! Looks great!

@dmikusa dmikusa merged commit 7d4bdbc into paketo-buildpacks:main Mar 19, 2024
4 checks passed
@loewenstein
Copy link

Late to the game... I am currently trying to understand how one could for example mirror the SapMachine as used in the sap-machine buildpack and there doesn't seem to be any documentation of this feature to have been considered in the PR and review.

So

  1. Given the original dependency is https://github.com/SAP/SapMachine/releases/download/sapmachine-11.0.22/sapmachine-jdk-11.0.22_linux-x64_bin.tar.gz would have to take the full path into the mirror, i.e. /SAP/SapMachine/releases/download/sapmachine-11.0.22/sapmachine-jdk-11.0.22_linux-x64_bin.tar.gz, right?
  2. Should we consider to document this new feature at least in the repository's repo and maybe also on the paketo.io documentation?

@dmikusa
Copy link
Contributor

dmikusa commented Mar 21, 2024

Moving this to the RFC.

dmikusa added a commit that referenced this pull request Nov 12, 2024
Signed-off-by: Daniel Mikusa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver:minor A change requiring a minor version bump type:enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants