Build/launch private repositories as the logged-in user (within an authenticated binderhub instance) #1154

Open
rprimet opened this issue Sep 30, 2020 · 5 comments


rprimet commented Sep 30, 2020

This issue is related to #1117, and possibly also to alan-turing-institute/hub23-deploy#272
See also https://discourse.jupyter.org/t/binderhub-with-private-gitlab-and-user-scopes/3502

Proposed change

Make it possible to fill in the repo provider / git credentials for a forge (e.g. a gitlab instance) dynamically, using the currently logged-in user's information. This would help organizations that run their own forge set up binderhub instances for use with private repositories.

Alternative options

The alternative option today is to have a 'technical' user (e.g. 'binderhub') on the forge (e.g. gitlab instance) that has (at least) read access to repositories. Then, a personal access token is created for that technical user and passed to the binderhub via the configuration system in GitLabRepoProvider.private_token. This token will be used to pull all repositories for all users.

This is not ideal:

  • from a usability perspective, it requires the user to grant access to their repos and revoke it as needed, and involves several steps that need to be documented on the private binderhub instance
  • from a security perspective, I am unsure of the implications. The project access level that has to be granted to the technical user is sometimes higher than simple read access (e.g. on gitlab, I believe that at least the 'reporter' access level is required) -- but even when running as the logged-in user, binder may require capabilities that are not strictly necessary, because of the somewhat coarse grain of the gitlab permissions model.

Who would use this feature?

Organizations who would like to run a private binderhub for their teams along with a forge (e.g. gitlab instance) and who want to build and launch private repositories.

(Optional): Suggest a solution

Since the repo provider will have to resolve private refs using the currently logged-in user identity, it might be useful to pass user information to the repo providers, at construction time. I'm unsure how clean it is design-wise (and looking for feedback/comments!) but an option might be to pass the handler to RepoProviders at init time.

Then, assuming that a repo provider can access the current_user for the handler, a sketch implementation of a private repo provider for a gitlab instance might look like this:

import requests
from traitlets import default

from binderhub.repoproviders import GitLabRepoProvider


class AuthGitLabProvider(GitLabRepoProvider):
    def __init__(self, *args, handler, **kwargs):
        super().__init__(*args, **kwargs)
        self.handler = handler
        # Look up the logged-in user's OAuth token through the Hub API.
        # This is a blocking call and assumes the Hub persists auth_state.
        ud = self.get_user_data(self.handler.get_current_user()['name'])
        self.access_token = ud['auth_state']['access_token']

    def get_user_data(self, username):
        # `c` is the config object available in the binderhub config file
        r = requests.get(
            c.HubOAuth.api_url + f'/users/{username}',
            headers={
                'Authorization': 'token %s' % c.HubOAuth.api_token,
            },
        )
        r.raise_for_status()
        return r.json()

    @default('git_credentials')
    def _default_git_credentials(self):
        if self.access_token:
            return r'username=oauth2\npassword={token}'.format(token=self.access_token)
        return ""

# Register the provider under the 'gl' prefix
c.BinderHub.repo_providers = {'gl': AuthGitLabProvider}
c.GitLabRepoProvider.hostname = "gitlab.example.com"
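
A detail worth noting in _default_git_credentials: because of the r prefix, the string holds a literal backslash followed by n, not a newline. That only works if whatever consumes git_credentials expands the escape before handing the value to git, which is an assumption here and worth verifying for a given deployment. The difference in plain Python, with a placeholder token:

```python
token = "fake-token"  # placeholder, not a real credential

# Raw string, as in the sketch: backslash + "n" stay as two characters.
raw = r'username=oauth2\npassword={token}'.format(token=token)

# Regular string: "\n" becomes a real newline.
cooked = 'username=oauth2\npassword={token}'.format(token=token)

# The raw variant is still a single line of text.
raw_lines = raw.splitlines()

# The regular variant splits into two key=value lines.
cooked_lines = cooked.splitlines()
```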

Of course, the sketch implementation here is just for discussion: it has blocking calls, etc. (and obviously the api_token and api_url need to be fetched properly somehow).
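
One possible way around the blocking call, sketched with the standard library only: run the synchronous lookup in a thread through run_in_executor so the event loop stays free. The function fetch_user_data_blocking is a hypothetical stand-in for the requests.get call above; it returns fake data and never contacts a Hub.

```python
import asyncio
import time


def fetch_user_data_blocking(username):
    """Hypothetical stand-in for the blocking Hub API request."""
    time.sleep(0.05)  # simulate network latency
    return {"name": username, "auth_state": {"access_token": "fake-token"}}


async def fetch_user_data(username):
    """Run the blocking lookup in the default thread pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fetch_user_data_blocking, username)


async def main():
    # Both lookups run in worker threads, so the event loop is not blocked.
    return await asyncio.gather(
        fetch_user_data("alice"),
        fetch_user_data("bob"),
    )


results = asyncio.run(main())
```

This only moves the call off the event loop; it still has to be awaited from a coroutine, so it cannot run inside a synchronous __init__.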

But I'd like to gather feedback on the design and possible issues?

Thanks!

rprimet changed the title from "Allow builds of private repositories, using the logged-in user's credentials" to "Build/launch private repositories as the logged-in user (within an authenticated binderhub instance)" on Sep 30, 2020

rprimet commented Oct 13, 2020

Any thoughts on this? I'm mostly looking for opinions about passing the handler to the RepoProviders at init time -- would the project be open to merging a PR that does this? (there would be no need to merge the specific RepoProviders e.g. AuthGitLabProvider, those could be maintained separately)

Also, any pitfalls around asynchronicity? (e.g. could there be a way of lazily fetching the access_token -- I guess fetching it at init time couldn't be done asynchronously)
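
To make the lazy-fetch idea concrete, here is a minimal sketch under stated assumptions: the LazyToken class and fake_fetch coroutine are hypothetical names, not part of binderhub. Constructing the object does no I/O; the token is fetched on the first awaited get() and cached afterwards.

```python
import asyncio


class LazyToken:
    """Fetch a token on first use and cache it (illustrative only)."""

    def __init__(self, fetch):
        self._fetch = fetch  # coroutine function that returns the token
        self._token = None
        self._lock = asyncio.Lock()

    async def get(self):
        # The lock keeps concurrent callers from triggering duplicate fetches.
        async with self._lock:
            if self._token is None:
                self._token = await self._fetch()
            return self._token


calls = []


async def fake_fetch():
    """Hypothetical stand-in for an async Hub API lookup."""
    calls.append(1)
    await asyncio.sleep(0)
    return "fake-token"


async def main():
    token = LazyToken(fake_fetch)  # no fetch happens at construction time
    return await asyncio.gather(token.get(), token.get())


first, second = asyncio.run(main())
```

With this shape, the provider's __init__ could stay synchronous, and only code paths that are already coroutines would await the token.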

CC @sgibson91

@sgibson91
Member

Hi @rprimet! This is something I'm very interested in solving (for GitHub repos) but haven't had the time to figure out or work on. A lot of the team are at JupyterCon this week, but we have our team meeting the week after, if you're available to attend and talk some more about it?


rprimet commented Oct 13, 2020

@sgibson91 great! I'll join the team meeting on the 22nd then!


jtpio commented Oct 13, 2020

At first glance, this means that enabling this use case would require a change in binderhub similar to this:

index aadf2aa..0140396 100644
--- a/binderhub/base.py
+++ b/binderhub/base.py
@@ -43,14 +43,14 @@ class BaseHandler(HubOAuthenticated, web.RequestHandler):
         spec = self.request.path[idx + len(prefix) + 1:]
         return spec

-    def get_provider(self, provider_prefix, spec):
+    def get_provider(self, provider_prefix, spec, handler):
         """Construct a provider object"""
         providers = self.settings['repo_providers']
         if provider_prefix not in providers:
             raise web.HTTPError(404, "No provider found for prefix %s" % provider_prefix)

         return providers[provider_prefix](
-            config=self.settings['traitlets_config'], spec=spec)
+            config=self.settings['traitlets_config'], spec=spec, handler=handler)

     def get_badge_base_url(self):
         badge_base_url = self.settings['badge_base_url']
diff --git a/binderhub/builder.py b/binderhub/builder.py
index bb72782..559de52 100644
--- a/binderhub/builder.py
+++ b/binderhub/builder.py
@@ -224,7 +224,7 @@ class BuildHandler(BaseHandler):

         # get a provider object that encapsulates the provider and the spec
         try:
-            provider = self.get_provider(provider_prefix, spec=spec)
+            provider = self.get_provider(provider_prefix, spec=spec, handler=self)
         except Exception as e:
             app_log.exception("Failed to get provider for %s", key)
             await self.fail(str(e))

cc @rprimet feel free to correct if the diff is different based on your local testing.

> a sketch implementation of a private repo provider for a gitlab instance might look like this:

And this logic can indeed stay in the config outside of BinderHub.


rprimet commented Oct 13, 2020

@jtpio Yes, pretty much. The only difference in my diff is that I changed only base.py and left the get_provider function signature unmodified, so the diff would be:

--- a/binderhub/base.py
+++ b/binderhub/base.py
@@ -50,7 +50,9 @@ class BaseHandler(HubOAuthenticated, web.RequestHandler):
             raise web.HTTPError(404, "No provider found for prefix %s" % provider_prefix)
 
         return providers[provider_prefix](
-            config=self.settings['traitlets_config'], spec=spec)
+            config=self.settings['traitlets_config'], 
+            spec=spec,
+            handler=self)
 
     def get_badge_base_url(self):
         badge_base_url = self.settings['badge_base_url']
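
The shape of this change can be tried out in isolation. The following is a minimal sketch with stand-in classes (StubHandler and StubProvider are hypothetical and are not binderhub's real classes): get_provider keeps its signature, and the handler passes itself to the provider constructor.

```python
class StubProvider:
    """Hypothetical stand-in for a repo provider that wants the handler."""

    def __init__(self, *, config, spec, handler=None):
        self.config = config
        self.spec = spec
        self.handler = handler

    def current_user_name(self):
        user = self.handler.get_current_user() if self.handler else None
        return user["name"] if user else None


class StubHandler:
    """Hypothetical stand-in for BaseHandler, reduced to what is needed."""

    def __init__(self, settings, user):
        self.settings = settings
        self._user = user

    def get_current_user(self):
        return self._user

    def get_provider(self, provider_prefix, spec):
        providers = self.settings["repo_providers"]
        if provider_prefix not in providers:
            raise KeyError("No provider found for prefix %s" % provider_prefix)
        # Signature unchanged; the handler passes itself along.
        return providers[provider_prefix](
            config=self.settings["traitlets_config"],
            spec=spec,
            handler=self,
        )


handler = StubHandler(
    settings={"repo_providers": {"gl": StubProvider}, "traitlets_config": {}},
    user={"name": "alice"},
)
provider = handler.get_provider("gl", spec="group%2Fproject/main")
```

Callers such as the build handler would not need to change in this variant, since only the constructor call inside get_provider differs.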

3 participants