-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RFC for dependency mirrors #302
Conversation
Signed-off-by: Daniel Mikusa <[email protected]>
@dmikusa I think we should support cases, where the mirrors might not have a common root. Just imagine the platform team taking care of buildpacks in an air gapped environment is not the same team as the one maintaining the mirrors. Pointing Something like this for bindings
and something similar for the environment variable (probably the separators to use would need careful thought. |
Thanks for this RFC details and also the changes to libpak
Thank you |
@loewenstein The current implementation allows for setting mirror URIs using the {originalHost} placeholder. E.g.: "https://mirrors.example.org/{originalHost}". |
|
Yes @bitgully, I do not necessarily have control over the path, so |
Alright, @loewenstein, we could expand this according to your suggestion above and use the current "uri" binding as default/fallback to apply to all dependencies that don't have an explicit binding matching their hostname. If no "default" binding is set, we could simply override the ones where the hostname matches a binding and download all others from their original (public) location. |
What I'm taking away is that I'm not seeing that we need to be able to have a separate mirror for every dependency. I don't think we could support this because it's going to create a lot of complexity. I also think that bindings are uniquely positioned to meet this need because they already allow you to remap every individual dependency. What I think we could do and what I think meets the needs that @loewenstein mentioned is to offer some sort of So for the given example, we'd set Do you think that would work? I don't like that it requires JSON embedded into the env variable, but I couldn't think of another way to do that. Thoughts on that as well? |
@shresthaujjwal The other RFC is pretty large and ambitious. This RFC is something that we can implement pretty easily. In libpak, it required changes in two files, probably less than 100 LOC. The idea is that this is something we can give people now, pretty easily, that will help with managing dependencies & make the buildpacks more accessible. I also think that even when we move dependencies out of buildpack.toml, we'll still want to support options for dependency mirrors. This establishes mirrors as a first-class option, and we'll need to continue to support it regardless of where the dependency metadata lives. |
@loewenstein you'd also mentioned these two points on the original issue:
Yes, I don't think we can do any path mapping. We can insert a static prefix, we can insert the original host, and we can probably do something like I mentioned above where we do a mapping from the original host to some other value. I don't think we can remap the entire URL though. We have to assume that the mirror is going to have the same file structure as the original link, but with a different host/prefix. So you could set Does that work?
Yes, 100%. When the RFC is sorted, we'll document this on Paketo's Docs page. I was thinking under the config section where we have the dependency mapping documented. |
@dmikusa How is the additional environment variable make things any easier? Asked differently, how is a host map going to be complex? func (d DependencyCache) setDependencyMirror(urlD *url.URL) {
if d.DependencyMirror != "" {
d.Logger.Bodyf("%s Download URIs will be overridden.", color.GreenString("Dependency mirror found."))
urlOverride, err := url.ParseRequestURI(d.DependencyMirror)
if strings.ToLower(urlOverride.Scheme) == "https" || strings.ToLower(urlOverride.Scheme) == "file" {
urlD.Scheme = urlOverride.Scheme
urlD.User = urlOverride.User
urlD.Path = strings.Replace(urlOverride.Path, "{originalHost}", urlD.Hostname(), 1) + urlD.Path
urlD.Host = urlOverride.Host
} else {
d.Logger.Debugf("Dependency mirror URI is invalid: %s\n%w", d.DependencyMirror, err)
d.Logger.Bodyf("%s is ignored. Have you used one of the supported schemes https:// or file://?", color.YellowString("Invalid dependency mirror"))
}
}
} we would have access to the mapping via |
Sorry, let me clarify what I meant. I mean more complex for the user/from a UX perspective. With the map as you propose and as I understand what you propose (if I'm misunderstood, apologizes please correct me), everyone would need to do it that way. Even if you just have one host. With the second env variable, I do agree that the implementation is basically the same either way. My main concern is just hiding additional complexity from users who don't need it. I'm not locked into this way of doing it, but I think that's an important consideration. |
@dmikusa @bitgully i fully agree, this is nice solution. Awesome work thinking outside the box. If we are thinking of adding {
"github.com": {
"mirror": "public-github",
"path" : [
{
"regex-search-replace-description": "Replace /SAP/SapMachine/releases/download with /internal-SAP/generic/releases",
"regex-search": "",
"regex-replace": ""
},
{
"regex-search-replace-description": "Replace /SAP/SapMachine/releases/download with /internal-SAP/generic/releases",
"regex-search": "",
"regex-replace": ""
}
]
},
"nodejs.org": {
"mirror": "node-dist",
"path" : [
{
"regex-search-replace-description": "Replace /SAP/SapMachine/releases/download with /internal-SAP/generic/releases",
"regex-search": "",
"regex-replace": ""
},
{
"regex-search-replace-description": "Replace /SAP/SapMachine/releases/download with /internal-SAP/generic/releases",
"regex-search": "",
"regex-replace": ""
}
]
}
} |
I would see two options to keep simple things simple and make complex things possible:
With 1. we would have to think about separators to allow multiple entries, so I would tend to either 2 or 3 - slightly leaning towards 2, because it also keeps the more complex case a little simpler. The 2. could also be augmented with the proposed regex capabilities for even more. |
It's nice to see all the enthusiasm for adding more flexibility. It's just that the core intention was to give the user a simple string to redirect from the public internet to an on-prem location; kind of like a proxy setting in the OS. Whilst addressing some additional corner cases, I would consider dynamic path replacement out of scope here.
The proposed solution of using |
I agree. I was just hinting at the possibility to extend the solution if we choose 2. for the initial implementation.
I this that BP_DEPENDENCY_MIRROR_PREFIX_MAP is unnecessarily complex on the UX side. I.e. with BP_DEPENDENCY_MIRROR=https://mirror.example.org/{originalHost}
BP_DEPENDENCY_MIRROR_PREFIX_MAP={"github.com":"public-github") we need two variables instead of one and we need to understand that the PREFIX_MAP with influence the originalHost that get's injected. Instead with BP_DEPENDENCY_MIRROR={"github.com":"https://mirror.example.org/public-github/{originalHost}") this a single env variable and it is quite easy to grasp what it is doing. Note: I added At the same time, we can continue to support the folowing. BP_DEPENDENCY_MIRROR=https://mirror.example.org/{originalHost} The idea of separate env variables encoding the host in there has it's appeal. But it doesn't match too well with also supporting the binding, does it? |
I was hoping, we could solve the bindings like this: File Content
/platform
└── bindings
└── dependency-mirror
├── default https://mirror.example.org/{originalHost}
├── github.com https://mirror.example.org/public-github
├── nodejs.org https://mirror.example.org/node-dist
└── type dependency-mirror And the environment variables like that: BP_DEPENDENCY_MIRROR https://mirror.example.org/{originalHost}
BP_DEPENDENCY_MIRROR_GITHUB_COM https://mirror.example.org/public-github
BP_DEPENDENCY_MIRROR_NODEJS_ORG https://mirror.example.org/node-dist For most users, there wouldn't be any additional complexity as they have to set only one (default) value. For those needing more detailed separation, the configuration pattern stays the same. The thing with using JSON in environment variables is, that we might also have to escape those double quotes when setting them (e.g. in Kubernetes manifests). |
@bitgully 100%. I'd be fine with that approach. One implementation detail, if we were to take that approach. How do we get from I worry a little about edge cases. If a URL has There are also some limitations as to what you can put into an env variable's name. Could we get into a case where a URL has some character in it that's not a character someone could put into the env variable name? Do we care, since there's the option to fall back and use bindings? |
Maybe a variation on the env variable approach? It would avoid having the hostname in the env variable name. I think it would be OK as well because you can't have an For a single mapping:
For multiple mappings:
|
@shresthaujjwal - Sorry, I think full path replacement is out of scope. We can handle some prefix mapping, like what's been discussed with hostnames and mapping, but doing replacements to the actual URLs is a lot more work and a lot more configuration that has to be conveyed to the buildpack so it can do the mappings. I apologize if that doesn't fit into your current mirror structure and I acknowledge rebuilding a mirror can be a lot of work. For what it's worth, I do want to create some tooling to quickly create a mirror from the given buildpacks, something where you can give it a list of buildpacks and it'll go download all the dependencies and create the file structure. Once this RFC lands and we have all the details sorted out, I'll look into that. |
After digging out some documents from around the days when I was born, I'd argue it is sufficient. It seems RFC-921 and RFC-952 referenced here suggest hostnames should be considered case-insensitive and consist only of a-z, 0-9 or hyphens. No underscores, and periods are allowed only for separating domain levels. I also did a quick search of the paketo-buildpacks repos. It looks like there are around 20 different hostnames in use for dependencies, all of which comply with this naming convention. I'd prefer not to complicate this unnecessarily and refrain from using wildcards and mappings (=, :) in the actual values. |
@bitgully Hyphens are the ones that worry me. You can have them in hostnames, but they're not allowed in env variable names. Some shells might support it, but some don't. I've experienced this pain in the past. If there were to be a domain like Since you looked, thank you very much, and we don't have any dependencies that would be impacted by this then maybe we go ahead with the approach anyway. I do like it as the simplest approach. I do think we should maybe consider what we would do if a hostname comes up with a |
Oh, now I get it. Sorry. Yes, we already have hostnames with hyphens in existing paketo-buildpacks (e.g. download.bell-sw.com). The This would only affect users that
|
Ok, sounds like it's time for me to update the RFC. I'll take all this feedback and get it integrated this weekend. Please keep the feedback coming! Thanks all! |
Signed-off-by: Daniel Mikusa <[email protected]>
Ok, please take a second look. I've updated and added a section to include the additional hostname mapping capabilities. I also took out the unresolved question. I'll post it here if anyone has feedback on it.
Where I'm landing on this is:
Thanks for all the great feedback! |
The updated description looks good to me and I assume the mirror feature will become quite popular for those deploying paketo-buildpacks in corporate networks. Regarding the unresolved question: |
Why do we require a default mirror to be set? If none is specified for a specific host I'd say we can just try to access the original host. |
I was thinking about it from the security use case. If I'm setting a mirror to ensure that certain dependencies (perhaps ones our security team has approved) are consumed, then I don't want to have a case where the buildpack could accidentally pull other dependencies. Requiring the dependency mirror means that it would pull from a mapping or fall back to the default. Also, the way I've been thinking about mirrors and this feature is that using a dependency mirror signals that I want to redirect everything to my mirror. If I only wanted to redirect certain dependencies, then I'd use a dependency mapping like we have already. How are others thinking about this? Is there a concrete use case for being able to use this feature to only map certain things? |
It's not about a concrete use case for partial mapping actually. It's more about not having a meaningful default. |
Should we fail if there's no default then? So options would be:
I'd be OK with that as it still preserves the concept that a mirror remaps everything (whereas dependency mappings are used for partial remappings). I don't think we should do this for the RFC, but if we get an issue or a future use case surfaces where falling back to the original URLs is the right thing to do then we could add a configuration flag to make that work. I personally feel that the expectation if you use a mirror is that everything gets resolved from the mirror. If it can't resolve from the mirror, trying somewhere else feels magical and unexpected to me. If others feel differently please let me know. That's just my opinion. |
I guess it would be okay to fall back to the original URL if no default is set. Even though it might not be reachable from most networks. Maybe we cannot assume that all mirrors are resolving all types of artifacts. There might be cases where access to the internet would be available but certain host names should still be mapped to a local mirror. E.g.: There is a mirror that provides docker images only. All images should be pulled from the mirror for performance reasons or because vulnerability scanning and quarantining are done here. But no other artifacts (binaries etc.) are hosted on-prem due to a missing functionality or a lack of control over what is mirrored in a certain organization. |
@dmikusa I can see your point regarding unexpected magic happening. Maybe this is caused a little by our choice of the "dependency mirror" terminology and way of thinking now; assuming there either is one or there is not. I think we should bear in mind that the original URL (from buildpack.toml) is the true DEFAULT location. The proposed change would just give us an option The functionality is the same but I find conceptually shifting away from having a default mirror, host-specific mirrors and the original location as a potential fallback makes a difference. |
I'd prefer optional mirror behaviour like @bitgully describes over failing in case of some hosts are not mapped like @dmikusa describes. |
Thinking about it more, I'm fine with this. My concern was that calling it a mirror might set a false expectation, but given the way we're naming the env variables I think we're being as direct with what it's doing as we can be.
Let me update the RFC, and we can see how things look. Sounds like we're getting close to agreement on everything. Maybe we can have a call for votes next week. |
I'm working on a draft PR to packit with the above implementation. |
Signed-off-by: Daniel Mikusa <[email protected]>
RFC has been updated. Please all take a look and let me know if there's any more feedback. If it's OK, please add your votes! Thanks |
This looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The libpak implementation has been adapted accordingly: paketo-buildpacks/libpak#322 |
I am struggling the understand how the hostname specific proxy functions. If I have a |
@ForestEckhardt The back half is preserved and the URL turns into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Thanks everyone! We have enough votes to move this forward. This is a call for any final comments. You have until the end of the day Friday April 5th. |
Summary
Presently, you can either take the dependencies shipped with a Paketo buildpack or you can create a whole bunch of dependency mapping bindings to change each dependency you want to override individually. There is no option to override all dependencys and point them to a mirror that is convenient.
Big thanks to @bitgully for the idea and libpak implementation.
Readable.