[All] Authorize allowed_users, admin_users, _or_ other allowed/admin groups #594

Merged
merged 57 commits into from
Jun 23, 2023

Conversation

GeorgianaElena
Member

@GeorgianaElena GeorgianaElena commented Apr 11, 2023

Updated PR description

Previously authorization logic resided in OAuthenticator.user_is_authorized called by OAuthenticator.authenticate. The downside is that the Authenticator.get_authenticated_user that in turn called authenticate also ran Authenticator.check_allowed after authenticate, which included a check if the user was part of Authenticator.allowed_users. This led to authorization done in two places, one called from authenticate, and one later. That meant that even if for example GitHubOAuthenticator.allowed_organizations made a user authorized, it could still be denied access if allowed_users was configured by the later call to check_allowed.

Due to this, we decided to put authorization logic in check_allowed directly, allowing us to declare for example that either the user was part of allowed_users or part of allowed_organizations.

Not all logic in the OAuthenticator.user_is_authorized methods was relocated to the new overrides of Authenticator.check_allowed. Logic that gathers information about the user was mostly moved to OAuthenticator.update_auth_model, while the logic that makes authorization decisions based on that information was put in the check_allowed overrides.
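
To make the problem described above concrete, here is a minimal sketch of the two flows as plain functions. It is a simplified model, not the oauthenticator code: the function names `old_flow` and `new_flow` and their parameters are hypothetical, and the organization check stands in for any authenticator specific rule.

```python
def old_flow(username, user_orgs, allowed_users, allowed_organizations):
    """Simplified model of the old behavior: two separate gates, both must pass."""
    # Gate 1, run from authenticate: authenticator specific authorization.
    if allowed_organizations and not (user_orgs & allowed_organizations):
        return False
    # Gate 2, run later from check_allowed: only allowed_users is considered.
    if allowed_users and username not in allowed_users:
        return False
    return True


def new_flow(username, user_orgs, allowed_users, allowed_organizations):
    """Simplified model of the new behavior: one gate, either source suffices."""
    if not allowed_users and not allowed_organizations:
        # Nothing configured, so nobody is restricted.
        return True
    return username in allowed_users or bool(user_orgs & allowed_organizations)


allowed_users = {"alice"}
allowed_organizations = {"jupyterhub"}

# bob is in an allowed organization but not listed in allowed_users.
bob_before = old_flow("bob", {"jupyterhub"}, allowed_users, allowed_organizations)
bob_after = new_flow("bob", {"jupyterhub"}, allowed_users, allowed_organizations)
print(bob_before, bob_after)
```

In this model, bob is rejected by the second gate of the old flow even though his organization is allowed, while the single gate of the new flow accepts him.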

Breaking changes

The changelog is updated with the following breaking changes.

Breaking changes

  • [All] Users are now authorized based on either being part of Authenticator.admin_users, Authenticator.allowed_users, an Authenticator specific allowed team/group/organization, or declared in JupyterHub.load_roles or JupyterHub.load_groups.
  • [Google] If GoogleOAuthenticator.admin_google_groups is configured, users logging in not explicitly there or in Authenticator.admin_users will get their admin status revoked.
  • [Generic, Google] GenericOAuthenticator.allowed_groups, GenericOAuthenticator.admin_groups, GoogleOAuthenticator.allowed_google_groups, and GoogleOAuthenticator.admin_google_groups are now Set based configuration instead of List based configuration. It is still possible to set these with lists as they are converted to sets automatically, but anyone reading and adding entries must now use set logic and not list logic.
  • [Google] Authentication state's google_groups is now a set, not a list.
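
As an illustration of what "set logic and not list logic" means in practice, here is a small sketch in plain Python. The group names are made up, and the explicit `set(...)` call only stands in for the automatic conversion the changelog mentions.

```python
# Hypothetical value as someone might have written it in their config.
configured_as_list = ["staff", "students", "staff"]

# Stand-in for the automatic list-to-set conversion: duplicates collapse.
allowed_groups = set(configured_as_list)

# Code that extends the value must use set operations such as add or |=.
allowed_groups.add("instructors")
allowed_groups |= {"tas", "staff"}

# List methods such as append do not exist on sets.
has_append = hasattr(allowed_groups, "append")

# Membership checks against a user's groups become set intersections.
user_groups = {"students", "alumni"}
is_allowed = bool(user_groups & allowed_groups)
```

Config snippets that previously called `.append(...)` or indexed into these values are the ones that need updating.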

Related

Outdated PR description

Fixes #591 for GenericOAuthenticator, but also for the other oauthenticators with the same bug.

The main changes this PR introduces are:

See #594 (comment) for the latest decided changes. The list below might be outdated.

Changes planned initially
  • the authenticate function (through the update_auth_model) will return info about the admin status of the user only if user is not already in admin_users list and an admin_groups type of config was specified. Otherwise, the admin status will be added through get_authenticated_user
  • authorize access based on group membership if user was not already authorized through allowed_users
  • user is considered authorized in cases where they're member of an admin group, even though that admin group is not explicitly listed in the allowed_groups
  • move the function that checks if the users groups are among the allowed groups in the base class
  • add a test that catches the bug described in "Admins configured in admin_users lose admin status if admin_groups is not configured" (#591) for all the authenticators that had it

Related

Todos after merging PR:

  • open issue about standardizing a flag like populate_teams_in_auth_state in all authenticators and the naming for the teams key, holding the info about the teams the user is a member of. (E.g. in bitbucket.py the key is called user_teams and we set it by default without any flag, but in github.py we have a flag and call it teams). Consider allowed_orgs too for standardization.

@manics
Member

manics commented Apr 11, 2023

I haven't reviewed this, but I've marked it as a breaking bug fix since it changes who is authorised, and I think this should be highlighted to admins just in case.

What's the expected behaviour when both allowed_users and allowed_groups are specified, but some allowed_users aren't in allowed_groups? Is it the union or intersection that's allowed? I think that's worth documenting to avoid confusion.

The docs for allowed_users says

Use this with supported authenticators to restrict which users can log in. This is an
additional list that further restricts users, beyond whatever restrictions the
authenticator has in place. Any user in this list is granted the 'user' role on hub startup.

which makes sense when there are no allowed_groups or equivalent, but it's less obvious when allowed_groups is defined.

@GeorgianaElena
Member Author

What's the expected behaviour when both allowed_users and allowed_groups are specified, but some allowed_users aren't in allowed_groups? Is it the union or intersection that's allowed? I think that's worth documenting to avoid confusion.

In this PR, if both allowed_users and allowed_groups are set, then allowed_groups is ignored, which I now realize is not the right thing to do.

The LocalAuthenticator, for example, says it does the reverse, i.e. it completely ignores allowed_users when allowed_groups is specified.

Proposed solution

Since we're using the union of admin_users and admin_groups to decide whether a user is an admin, what do you think about doing the same to decide whether a user is allowed?
For allowing a user, I believe we should be looking at the union of all four lists: allowed_users, allowed_groups, admin_users, admin_groups.
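
The difference between the union proposed here and the intersection asked about earlier can be sketched with plain sets. Both function names are hypothetical and only model the decision, not any authenticator's actual code.

```python
def is_allowed_union(username, user_groups, *, allowed_users, allowed_groups,
                     admin_users, admin_groups):
    """Allowed if named in either user list or a member of either group list."""
    by_name = username in (allowed_users | admin_users)
    by_group = bool(user_groups & (allowed_groups | admin_groups))
    return by_name or by_group


def is_allowed_intersection(username, user_groups, *, allowed_users, allowed_groups):
    """The stricter alternative: must be named AND be in an allowed group."""
    return username in allowed_users and bool(user_groups & allowed_groups)


config = dict(
    allowed_users={"alice"},
    allowed_groups={"students"},
    admin_users={"root"},
    admin_groups={"ops"},
)

# alice is listed in allowed_users but is not in any allowed group.
alice_union = is_allowed_union("alice", set(), **config)
alice_intersection = is_allowed_intersection(
    "alice", set(), allowed_users={"alice"}, allowed_groups={"students"}
)
```

Under the union, alice is allowed; under the intersection she would be denied until she also joins an allowed group.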

@manics
Member

manics commented Apr 11, 2023

Intuitively I think the union of allowed_users and allowed_groups makes sense, especially as it's easy to understand. However I also think we should try to be consistent across the JupyterHub maintained authenticators.

Do you think we should treat the LocalAuthenticator behaviour as an anomaly and use the union behaviour here?

Regarding the union of users and admins..... having thought about this a bit more I'm not sure. Ideally with RBAC you only have "users", and admin permissions are then given to selected user groups.

According to the JupyterHub docs
https://github.com/jupyterhub/jupyterhub/blob/4.0.0b2/jupyterhub/auth.py#L102-L141

  • admin_users: Set of users that will have admin rights on this JupyterHub
  • allowed_users: Set of usernames that are allowed to log in

which implies admin_users is a subset of allowed_users. I think we need more input from others!

@GeorgianaElena
Member Author

@manics, after doing a bit more reading, I realized that:

  • the get_authenticated_user in the jupyterhub package takes care of all the logic behind admin_users and allowed_users. That function will call the authenticate function specific to each authenticator and add extra logic on top of it, after the user was successfully authenticated.

So maybe we shouldn't care too much about acting on admin_users or allowed_users in authenticate, but we also shouldn't dismiss the info there. For example: if the user didn't specify admin_groups, then we shouldn't dismiss additional info in admin_users.

  • admin_groups is part of the base Authenticator class in jupyterhub, with a clear description saying that it won't override what's in admin_users, but it will add to it instead
  • allowed_groups is authenticator specific I believe, as it's not part of the base Authenticator class and not even the name is fixed. Which I believe "makes it ok" (?) for LocalAuthenticator to do whatever with the info there, as long as that's documented?

For the oauthenticator's case, wondering what should we be doing?

  1. Follow what's in LocalAuthenticator, i.e. completely ignore allowed_users when allowed_groups is specified, in order to be consistent across authenticators
  2. or do the same we're doing for deciding the admin status, i.e. check if the user is member of any allowed_groups only if not already allowed through allowed_users?

I would incline towards option 2 (which is what I ended up doing in this PR), as it makes more sense in my head and it's what I would be expecting if I wasn't reading the docs.

(cc @minrk, what do you think?)

@consideRatio
Member

I think I align with your thinking @GeorgianaElena!

  1. to authorize users in allowed_users or allowed_groups, or for getting admin status set
  2. to set admin status if users are part of admin_users or admin_groups
  3. to unset admin status if they aren't part of admin_users or admin_groups and admin_groups is configured
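
Rules 2 and 3 above amount to a three-valued admin decision, which could be sketched as follows. The function name `resolve_admin` is hypothetical, and treating None as "leave the existing status unchanged" is an assumption about how the value is consumed downstream.

```python
def resolve_admin(username, user_groups, admin_users, admin_groups):
    """Return True, False, or None (no opinion) for a user's admin status."""
    # Rule 2: listed admins are always admins.
    if username in admin_users:
        return True
    # Rules 2 and 3: when admin_groups is configured, group membership decides,
    # so leaving the group actively revokes admin status.
    if admin_groups:
        return bool(user_groups & admin_groups)
    # admin_groups is not configured: do not set or unset anything.
    return None


admin_users = {"root"}
admin_groups = {"ops"}

first_login = resolve_admin("dana", {"ops", "staff"}, admin_users, admin_groups)
after_removal = resolve_admin("dana", {"staff"}, admin_users, admin_groups)
no_groups_configured = resolve_admin("dana", {"staff"}, admin_users, set())
```

The explicit False in the second case is what makes removal from an admin group take effect on the next login.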

I'm in the process of developing a PR to this PR with some fixes, I'll keep working on it tomorrow.

@consideRatio
Member

consideRatio commented Apr 23, 2023

Technical background

A key challenge for us is that we implement authenticate that by its name is meant to identify/authenticate a user, but we are also using it to authorize to some degree. The base class get_authenticated_user that we don't re-implement also includes authorization logic though.

I look to provide a technical background to help answer if we can do what we want from the implementation of authenticate alone, or if we can or should do something beyond this - such as influencing get_authenticated_user.

About get_authenticated_user

The Authenticator base class defines get_authenticated_user and authenticate.

  • get_authenticated_user is provided an implementation in the Authenticator base class that calls authenticate, normalize_username, validate_username, check_blocked_users, check_allowed, run_post_auth_hook.
  • authenticate isn't implemented by the Authenticator base class, and we are required to implement it in this project.
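
The call order listed above can be pictured with a toy class. This is a simplified stand-in, not the JupyterHub implementation: the class name, the hard-coded user sets, and the way failures return None are assumptions made for illustration.

```python
import asyncio


class SketchAuthenticator:
    """Toy model of the call order only; not the JupyterHub Authenticator."""

    allowed_users = {"alice"}
    blocked_users = {"mallory"}

    def __init__(self):
        self.calls = []

    async def authenticate(self, handler, data):
        self.calls.append("authenticate")
        return {"name": data["username"]}

    def normalize_username(self, username):
        self.calls.append("normalize_username")
        return username.lower()

    def validate_username(self, username):
        self.calls.append("validate_username")
        return bool(username)

    def check_blocked_users(self, username, auth_model):
        self.calls.append("check_blocked_users")
        return username not in self.blocked_users

    def check_allowed(self, username, auth_model):
        self.calls.append("check_allowed")
        return not self.allowed_users or username in self.allowed_users

    def run_post_auth_hook(self, handler, auth_model):
        self.calls.append("run_post_auth_hook")
        return auth_model

    async def get_authenticated_user(self, handler, data):
        # Identification first, then the checks that decide authorization.
        auth_model = await self.authenticate(handler, data)
        if auth_model is None:
            return None
        auth_model["name"] = self.normalize_username(auth_model["name"])
        if not self.validate_username(auth_model["name"]):
            return None
        if not self.check_blocked_users(auth_model["name"], auth_model):
            return None
        if not self.check_allowed(auth_model["name"], auth_model):
            return None
        return self.run_post_auth_hook(handler, auth_model)


auth = SketchAuthenticator()
result = asyncio.run(auth.get_authenticated_user(None, {"username": "Alice"}))
```

The point of the sketch is that check_allowed runs after authenticate and after username normalization, so anything authenticate decides about authorization is decided before the normalized name is available.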

About get_authenticated_user called methods

Technical choices

Regarding implementing our own get_authenticated_user

  1. We don't implement our own get_authenticated_user
  2. We implement our own get_authenticated_user
    • a) by calling super().get_authenticated_user and doing some extra things
    • b) by replacing it entirely

I see 1 as the default choice until we better understand the need to consider 2a or 2b.

What we do in authenticate

  • About admin
    We must set auth_model["admin"] to False sometimes as we need to declare that a user is no longer an admin by being removed from a group of admin users. But, we shouldn't do that if the user is part of admin_users, so we must check for that as part of authenticate. Assuming we rely on the Authenticator.get_authenticated_user, we have no other choice.
  • Scope of things to do in authenticate that is returning either None or a dictionary about a user
    Either we return None or an auth_model dictionary. If we return an auth_model dictionary, the username will be normalized, verified, and then if it passes checks against allowed_users and blocked_users it will be considered authorized.
    Based on this, can we manage to authorize a user if it's part of either allowed_users or allowed_groups, while also letting the check for being part of allowed_users be done by get_authenticated_user? No, I don't think so. If we look to do this in authenticate alone, we will need to consider allowed_users there as well, and get_authenticated_user would then run an additional check (after username normalization) that we would also need to do as part of authenticate.

Can we override other Authenticator functions?

Yes I think specifically we need to implement check_allowed.

  1. To allow users either in allowed_users or allowed_groups, like in the LocalAuthenticator implementation of check_allowed that combines a check of allowed_users with a check of allowed_groups.
  2. To allow users with auth_model["admin"] set to True access

Conclusion

Implementing #594 (comment) still seems like a good idea, that is:

  • to authorize users in allowed_users or allowed_groups, or for getting admin status set
  • to set admin status if users are part of admin_users or admin_groups
  • to unset admin status if they aren't part of admin_users or admin_groups and admin_groups is configured

To do so though would motivate another refactoring where we extract authorization logic out from authenticate and put it in check_allowed instead.

Vision of implementing check_allowed ourselves

  • We would reduce the complexity of authenticate significantly by extracting authorization logic.
  • We would avoid taking on the complexity of doing get_authenticated_user ourselves
  • We would systematically be able to say that the list of allowed_users and blocked_users refers to normalized and mapped usernames.
  • We could fix a not yet reported bug for authenticators providing allowed_groups
    Right now LocalGenericOAuthenticator and LocalOpenShiftOAuthenticator have the LocalAuthenticator's implementation of check_allowed that calls check_allowed_groups and treats allowed_groups as the UNIX groups rather than groups from the OAuth2 provider.
    This would require us to define class LocalGenericOAuthenticator(GenericOAuthenticator, LocalAuthenticator) instead of putting LocalAuthenticator before GenericOAuthenticator though, which seems fine.
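
Why the base class order matters can be shown with Python's method resolution order. The classes below are toy stand-ins with hypothetical names; only the inheritance pattern mirrors the situation described above.

```python
class AuthenticatorSketch:
    def check_allowed(self, username, auth_model):
        return "allowed_users only"


class LocalAuthenticatorSketch(AuthenticatorSketch):
    def check_allowed(self, username, auth_model):
        return "unix groups"


class GenericOAuthenticatorSketch(AuthenticatorSketch):
    def check_allowed(self, username, auth_model):
        return "oauth provider groups"


# Local class listed first: its check_allowed wins.
class LocalFirst(LocalAuthenticatorSketch, GenericOAuthenticatorSketch):
    pass


# OAuth class listed first: its check_allowed wins.
class OAuthFirst(GenericOAuthenticatorSketch, LocalAuthenticatorSketch):
    pass


local_first_result = LocalFirst().check_allowed("alice", None)
oauth_first_result = OAuthFirst().check_allowed("alice", None)
mro_names = [cls.__name__ for cls in OAuthFirst.__mro__]
```

Python looks up check_allowed left to right through the listed bases, so swapping the order is enough to pick the OAuth flavored implementation without touching either parent class.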

@manics
Member

manics commented Apr 23, 2023

How much of this is specific to OAuthenticator, and how much is applicable to all remote authenticators? Can we push any of this logic into jupyterhub.auth instead, either in the existing abstract Authenticator, or a new abstract RemoteAuthenticator? It's more work, but it means we can standardise and document the behaviour across more authenticators instead of having each authenticator package doing its own thing.

@consideRatio
Member

How much of this is specific to OAuthenticator, and how much is applicable to all remote authenticators?

  • I think the check_allowed function could be made to authorize users found in admin_users.
    I'm not fully confident about this though and would like to pursue it separately to avoid bundling too much work. If done, we could reduce complexity in this package a bit further.
  • I think this package could let authenticators share logic for handling groups, and that such logic could also make its way towards jupyterhub.auth, but these thoughts are immature and not actionable for me at the moment.

@consideRatio
Member

consideRatio commented Apr 23, 2023

@GeorgianaElena I wanted to expedite this work to get 16.0.0 out faster, and ended up immersed in the logic and worked it onwards from your latest commit (bece7b8).

@GeorgianaElena is it okay if I add commits or make a PR to this PR to keep working this, and switching the roles a bit - letting you review?

Strategy in progress

  1. Make OAuthenticator.authenticate be authentication only - no authorization logic
    • Practically by replacing the internal user_is_authorized with an override of Authenticator.check_allowed:
      • user_is_authorized was called by authenticate
      • check_allowed is called by Authenticator.get_authenticated_user
  2. Make OAuthenticator.authenticate initialize auth_model["admin"] based on users being listed in admin_users
    • get_authenticated_user doesn't determine admin status until it's too late for us to make use of it for authorization decisions otherwise.
    • A pre-requisite for this is to do username = self.normalize_username(username)
      Otherwise username in self.admin_users checks would be done without the normalized username.
  3. To systematically let update_auth_model be about fetching relevant information for an authorization decision, and check_allowed be about making use of it from auth state.
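
The pre-requisite in step 2 can be illustrated with a small sketch. It assumes normalization lowercases the name and then applies a username map, which is my understanding of the JupyterHub default; the map contents and function names here are hypothetical.

```python
ADMIN_USERS = {"alice"}
USERNAME_MAP = {"alice.smith": "alice"}  # hypothetical mapping


def normalize_username(username):
    """Assumed default behavior: lowercase, then apply the username map."""
    username = username.lower()
    return USERNAME_MAP.get(username, username)


def initial_admin(username):
    """Mirror of the auth_model initialization: True for admins, else None."""
    return True if username in ADMIN_USERS else None


raw_name = "Alice.Smith"  # as it might come back from the OAuth provider

admin_without_normalizing = initial_admin(raw_name)
admin_with_normalizing = initial_admin(normalize_username(raw_name))
```

Without normalizing first, the membership test compares the provider's raw name against a set of normalized names and silently misses.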

Strategy exemplified

OAuthenticator.authenticate
    async def authenticate(self, handler, data=None, **kwargs):
        """
        A JupyterHub Authenticator's authenticate method's job is:

        - return None if the user isn't successfully authenticated
        - return a dictionary if authentication is successful with name, admin
          (optional), and auth_state (optional)

        Subclasses should not override this method.

        ref: https://jupyterhub.readthedocs.io/en/stable/reference/authenticators.html#authenticator-authenticate-method
        ref: https://github.com/jupyterhub/jupyterhub/blob/4.0.0/jupyterhub/auth.py#L581-L611
        """
        # build the parameters to be used in the request exchanging the oauth code for the access token
        access_token_params = self.build_access_tokens_request_params(handler, data)
        # exchange the oauth code for an access token and get the JSON with info about it
        token_info = await self.get_token_info(handler, access_token_params)
        # use the access_token to get userdata info
        user_info = await self.token_to_user(token_info)
        # extract the username out of the user_info dict and normalize it
        username = self.user_info_to_username(user_info)
        username = self.normalize_username(username)

        # check if there is any refresh_token in the token_info dict
        refresh_token = token_info.get("refresh_token", None)
        if self.enable_auth_state and not refresh_token:
            self.log.debug(
                "Refresh token was empty, will try to pull refresh_token from previous auth_state"
            )
            refresh_token = await self.get_prev_refresh_token(handler, username)
            if refresh_token:
                token_info["refresh_token"] = refresh_token

        # build the auth model to be read if authentication goes right
        auth_model = {
            "name": username,
            "admin": True if username in self.admin_users else None,
            "auth_state": self.build_auth_state_dict(token_info, user_info),
        }

        # update the auth_model with info to later authorize the user in
        # check_allowed, such as admin status and group memberships
        return await self.update_auth_model(auth_model)
OAuthenticator.check_allowed
    async def check_allowed(self, username, auth_model):
        """
        Returns True for users allowed to be authorized

        Overrides Authenticator.check_allowed that is called from
        `Authenticator.get_authenticated_user` after
        `OAuthenticator.authenticate` has been called, and therefore also after
        `update_auth_model` has been called.

        Subclasses with authorization logic involving allowed groups should
        override this.
        """
        # authorize users to become admins by admin_users or logic in
        # update_auth_model
        if auth_model["admin"]:
            return True

        # if allowed_users is configured, authorize/unauthorize based on that
        if self.allowed_users:
            return username in self.allowed_users

        # otherwise, authorize all users
        return True
GenericOAuthenticator.update_auth_model
    async def update_auth_model(self, auth_model):
        """
        Set the admin status based on finding the username in `admin_users` or
        finding a user group part of `admin_groups`.
        """
        user_info = auth_model["auth_state"][self.user_auth_state_key]

        username = auth_model["name"]
        if username in self.admin_users:
            auth_model["admin"] = True
        elif self.admin_groups:
            # if admin_groups is configured, we must either set or unset admin
            # status and never leave it at None, otherwise removing a user from
            # the admin_groups won't have an effect
            user_groups = self.get_user_groups(user_info)
            auth_model["admin"] = any(user_groups & self.admin_groups)

        return auth_model
GenericOAuthenticator.check_allowed
    async def check_allowed(self, username, auth_model):
        """
        Returns True for users allowed to be authorized.

        Overrides the OAuthenticator.check_allowed implementation to allow users
        either part of `allowed_users` or `allowed_groups`, and not just those
        part of `allowed_users`.
        """
        # allow admin users recognized via admin_users or update_auth_model
        if auth_model["admin"]:
            return True

        # if allowed_users or allowed_groups is configured, we deny users not
        # part of either
        if self.allowed_users or self.allowed_groups:
            user_info = auth_model["auth_state"][self.user_auth_state_key]
            user_groups = self.get_user_groups(user_info)
            if username in self.allowed_users:
                return True
            if any(user_groups & self.allowed_groups):
                return True
            return False

        # otherwise, authorize all users
        return True
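
The decision order of the check_allowed override above can be exercised in isolation with a standalone sketch. The function below is not the oauthenticator API: its name and signature are hypothetical, and the user's groups are read from a made-up "groups" key instead of auth state. It also uses bool() rather than any() on the intersection, since any() tests the truthiness of each element and would treat an overlap consisting only of an empty group name as no overlap.

```python
def check_allowed_sketch(username, auth_model, allowed_users, allowed_groups):
    """Standalone mirror of the decision order; not the oauthenticator API."""
    # Admins are always authorized.
    if auth_model["admin"]:
        return True
    # If any allow list is configured, the user must match one of them.
    if allowed_users or allowed_groups:
        if username in allowed_users:
            return True
        # bool() checks that the intersection is non-empty.
        return bool(auth_model["groups"] & allowed_groups)
    # Nothing configured: authorize everyone.
    return True


allowed_users = {"alice"}
allowed_groups = {"students"}

outcomes = {
    "admin": check_allowed_sketch(
        "root", {"admin": True, "groups": set()}, allowed_users, allowed_groups
    ),
    "listed user": check_allowed_sketch(
        "alice", {"admin": None, "groups": set()}, allowed_users, allowed_groups
    ),
    "group member": check_allowed_sketch(
        "bob", {"admin": None, "groups": {"students"}}, allowed_users, allowed_groups
    ),
    "neither": check_allowed_sketch(
        "eve", {"admin": False, "groups": {"guests"}}, allowed_users, allowed_groups
    ),
    "nothing configured": check_allowed_sketch(
        "eve", {"admin": None, "groups": set()}, set(), set()
    ),
}

# The any() versus bool() difference on a set holding one empty string.
overlap = {""}
any_result = any(overlap)
bool_result = bool(overlap)
```

Whether an empty group name can occur in practice depends on the provider, so the any() versus bool() point is an edge case rather than a known bug.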

I pushed my additions to https://github.com/consideRatio/oauthenticator/commits/fix-membership for now, still not confident this strategy pans out but optimistic it could pan out well.

@consideRatio
Member

consideRatio commented Apr 26, 2023

Status update

I've had a video call with @GeorgianaElena about this where we decided to git push changes I've partially developed on top of this PR for now.

We agreed on the goal of authorizing users either part of admin_users or allowed_users or other allowed groups across all authenticators, and that the implementation of doing this involves refactoring to remove the OAuthenticator defined function user_is_authorized in favor of overriding the Authenticator defined function check_allowed.

I need to switch focus to https://github.com/2i2c-org/binderhub-service until my vacation starting at the end of the week, I'll be able to consider focusing on this again after May 13. I'm hands off until after my vacation ending after May 8th at least.

OAuthenticator base class

I've updated the OAuthenticator base class' authenticate function to:

  • call normalize_username
  • not call user_is_authorized
  • initialize auth_model["admin"]
    None by default and True for users in admin_users
  • overridden check_allowed
    Authenticator.check_allowed didn't authorize users with auth_model["admin"], only users in allowed_users. This is changed in this override that now authorizes users part of either. Only users not part of a non-empty allowed_users set are denied authorization, like before.

GenericOAuthenticator

This is a good authenticator to look at more closely, because other authenticators will make very similar changes for the additional features they provide over the OAuthenticator base class. For this authenticator, those are the concepts of allowed_groups and admin_groups.

  • generic
    • allowed_groups, admin_groups

All other authenticators

The comments below indicate noteworthy complexities touched.

  • auth0
  • azuread
  • bitbucket
    • allowed_teams
  • cilogon
    • allowed_idps, allowed_idps>allowed_domains
  • github
    • allowed_organizations
  • gitlab
    • allowed_project_ids, allowed_gitlab_groups
  • globus
    • allowed_globus_groups, admin_globus_groups, identity_provider
  • google (some changes made)
    • allowed_google_groups, admin_google_groups, verified_email, hosted_domain
  • mediawiki
    • normalize_username
  • okpy
  • openshift
    • allowed_groups, admin_groups

Breaking changes

  • Google, generic: previously the groups used List based config, but they now use Set based config. It's possible to pass Python lists as they are converted to Python sets automatically, but anyone reading config from these and adding something to them must start using set logic and not list logic.

Member

@consideRatio consideRatio left a comment

Wieee I think we are starting to reach the goal here!!

@consideRatio consideRatio changed the title [Bugfix: multiple oauthenticators] Fix admin status when admin_groups is set [All] Refactor to authorize users part of allowed users _or_ allowed groups Jun 14, 2023
@consideRatio
Member

@GeorgianaElena I updated the PR title and description, and classified this as an enhancement.

@minrk thank you soo much for helping out discussing check_allowed, this PR is now ready for review and is returning True when check_allowed is called and passed None as authentication dictionary.
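
The None handling mentioned above could look roughly like the following sketch. The function name and signature are hypothetical, and the reading that a None auth_model means there is no login information to base a decision on is an assumption, not something stated in this thread.

```python
def check_allowed_sketch(username, auth_model, allowed_users):
    """Standalone sketch of a check_allowed that tolerates a missing auth_model."""
    # Assumption: without an auth_model there is no admin status or group
    # information to inspect, so the check does not deny anyone here.
    if auth_model is None:
        return True
    if auth_model.get("admin"):
        return True
    if allowed_users:
        return username in allowed_users
    return True


allowed_users = {"alice"}

without_auth_model = check_allowed_sketch("bob", None, allowed_users)
with_auth_model = check_allowed_sketch("bob", {"admin": None}, allowed_users)
```

The guard has to come first, because every later branch reads from auth_model and would fail on None.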

@yuvipanda yuvipanda requested a review from minrk June 20, 2023 17:07
Member

@consideRatio consideRatio left a comment

I'd like to do a final review pass, and I plan to do so today - submitting this dummy review to indicate that.

@consideRatio
Member

consideRatio commented Jun 23, 2023

@GeorgianaElena I pushed a few commits and structured them to be relatively easy to review.

  • I pushed 26d4c75 to include things unrelated to logic which could break something by mistake
  • I pushed three refactor commits, not supposed to change any behavior:
    • 8556e27, openshift
    • 3c62101, github - I refactored away a helper function to retain some contexts in docstring/comments which seemed like a good idea, but I ended up needing to remove a test of the helper function which wasn't great - but perhaps acceptable.
    • fd06b47, google
  • I pushed 8acf905 with a breaking fix, and a breaking refactor
    • I observed a different behavior for admin_google_groups compared to the equivalent in globus and generic, where removing a user from the admin_google_groups wouldn't revoke the user's admin rights as it now will.
    • I did a breaking refactor, making the "google_groups" a set instead of a list, and added a changelog entry about it.
  • I pushed 8b138c7 with a non-breaking change of letting globus save group membership in auth state like other authenticators typically do.

I've looked at the code in detail to make sure we avoid breaking changes, and I think it looks good! If you agree, I think we should go for a merge.

@consideRatio consideRatio changed the title [All] Refactor to authorize users part of allowed users _or_ allowed groups [All] Authorize allowed_users, admin_users, _or_ allowed groups Jun 23, 2023
@consideRatio consideRatio changed the title [All] Authorize allowed_users, admin_users, _or_ allowed groups [All] Authorize allowed_users, admin_users, _or_ other allowed/admin groups Jun 23, 2023
@GeorgianaElena
Member Author

@GeorgianaElena I pushed a few commits and structured them to be relatively easy to review.

Thank you @consideRatio! I did a quick check and the changes look ok. Let's merge 🚀

@consideRatio
Member

Wieeeeeeeeee nice work @GeorgianaElena!!!! Your work with this has made this project far more easy to think about and maintain!!

@floriandeboissieu

Many thanks, I tested it with gitlab and it solves #545!

Just a small question: is there any reason I missed for limiting the members of allowed_project_ids to those with Developer and above access level to the project?

Personally, I would also allow reporters, as they would have read-only access to the content of a private gitlab project (which would be the case for the members of a tutorial, for example).

As a workaround, it can still be done creating a group with these reporters.
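
For context, GitLab expresses member roles as numeric access levels (Guest 10, Reporter 20, Developer 30, Maintainer 40, Owner 50), so the question above is about where a threshold sits. The sketch below is hypothetical: the function name and the `minimum` parameter are made up for illustration and are not an existing GitLabOAuthenticator option.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """GitLab member access levels."""

    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40
    OWNER = 50


def has_project_access(member_level, minimum=AccessLevel.DEVELOPER):
    """Hypothetical threshold check on a project member's access level."""
    return member_level >= minimum


reporter_with_default = has_project_access(AccessLevel.REPORTER)
reporter_with_lower_bar = has_project_access(
    AccessLevel.REPORTER, minimum=AccessLevel.REPORTER
)
```

With a Developer threshold a Reporter is turned away, and lowering the threshold to Reporter would admit read-only members such as tutorial participants.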
