Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend handling of setting user-role, user-group, and group-role associations #18777

Merged
merged 44 commits into from
Sep 24, 2024

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Sep 5, 2024

Fixes #18636

REQUIRES #18493

To do prior to merging:

  • Fill in bodies of scripts to automatically fix inconsistent data in database for violated not null constraints and unique constraints.
    (I think these scripts should be optional, as opposed to triggered automatically from the migration script. Instead, the script will display a helpful error message with the name of the script the admin can run.)
  • Rebase on top of @jmchilton's PR(s) that contain db migrations; update migration scripts.

Harden database schema of user_role_association, user_group_association, group_role_association tables:

  1. Add not null constraint for user_id, group_id, role_id fields
    (A null value for any of these fields makes the record meaningless + it could only result from a bug, which now will be detected on first occurrence.)

  2. Add unique constraint for combinations of values: user_id/group_id, user_id/role_id, group_id/role_id.
    (Helps fix linked issue; duplicate associations are meaningless and may cause bugs.)

Rewrite the backend handling of setting user-role, user-group, and group-role associations.

The bug described in the issue is largely due to the inefficient handling of these associations (see discussion in issue). Here's a step-by-step pseudo-code description of what the code did when assigning users to a group (the example from the linked issue):

- for each selected user:
  - retrieve user from database
- for each selected role:
  - retrieve role from database
- retrieve current group role associations for group
- retrieve current user group associations for group
- for each retrieved group/role and user/group association:
  - delete association  (implicitly starts new transaction)
  - commit transaction
- for each role in retrieved roles:
  - create new group role association and add to session
  - (implicitly start new transaction)
  - commit transaction
- for each user in retrieved users:
  - create new user group association and add to session  (implicitly starts new transaction)
  - commit transaction

The above has linear complexity. If we apply the numbers from the example in the issue (add 1000 users to a group) + use some modest assumptions (there are currently 10 group role associations and 100 user group associations for this group), we end up with approximately the following database accesses:

- 1000 select user statement executions
- 10 select role statement executions
- 1 select group-role-association statement execution
- 1 select user-group-association statement execution
- 10 transactions: delete group-role-association
- 100 transactions: delete user-group-association
- 10 transactions: insert group-role-association
- 1000 transactions: insert user-group-association

TOTAL: 1110 transactions (inserts and deletes) including 1022 selects (as part of first transaction)

In the new version we bypass the ORM and use SQLAlchemy Core instead. This results in constant complexity. Running the same update operation looks like this:

- 1 flush if group is a new record and needs an id
- 1 delete group-role-associations by group
- 1 delete user-group-associations by group
- 1 insert group-role-associations w/multiple values
- 1 insert user-group-association w/multiple values
- 1 commit

TOTAL: 1 transaction with 2 delete and 2 insert statements. (setting user roles is an exception since we do select each role from the database to verify it's not a private role; there's still only 1 transaction though).

Unlike the previous version, the new version wraps everything in one transaction, so there can be no errors from interrupted partial updates (e.g. old associations deleted, new associations not added).

User private roles are handled with care: they can be neither assigned, nor deleted (see unit tests).

Old unused code, as well as redundant logic has been removed (refreshing the session, flushing conditionally, checking if a transaction is started, etc.). I've also tried to simplify the logic (nested loops where the outer loop was guaranteed to have only 1 element removed, etc.)

Database unit tests:

For each association class, we test the following:

  • add associations to new and to existing item
  • update item associations (add some, drop some)
  • drop all item associations
  • verify error when adding an invalid id for an associated item
  • verify error when adding a duplicate association

In addition:

  • test setting default user permissions and default history permissions when updating role users
  • verify private roles cannot be assigned
  • verify user associations with private roles are not affected by updating user roles
  • verify user associations with private roles are not affected by updating role users

For follow-up PRs:

  • Add same constraints + apply same refactoring to similar models (DatasetPermission, UserQuotaAssociation, etc.)

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. Login as admin, use the admin UI to add/drop user/role/group associations via the "edit permissions" link.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/bug kind/refactoring cleanup or refactoring of existing code, no functional changes area/testing area/admin area/database Galaxy's database or data access layer area/scripts area/security highlight/admin Included in admin/dev release notes labels Sep 5, 2024
@jdavcs jdavcs marked this pull request as ready for review September 6, 2024 17:01
@github-actions github-actions bot added this to the 24.2 milestone Sep 6, 2024
@jdavcs
Copy link
Member Author

jdavcs commented Sep 6, 2024

This is ready for review, but not for merging: waiting for a db migration that has been run on main to be merged first.

@mvdbeek mvdbeek removed the highlight/admin Included in admin/dev release notes label Sep 9, 2024
@mvdbeek
Copy link
Member

mvdbeek commented Sep 9, 2024

Can you please break out the performance improvements which looks solid to me (and for all intents and purposes fixes #18636) and the refactoring from the migration ? The migration itself should not involve manual admin action. Either we're certain we're dropping duplicates and we remove the duplicates during the migration, or we're not. If we're not I don't think we should do anything here.

@jdavcs
Copy link
Member Author

jdavcs commented Sep 9, 2024

Can you please break out the performance improvements which looks solid to me (and for all intents and purposes fixes #18636) and the refactoring from the migration ? The migration itself should not involve manual admin action. Either we're certain we're dropping duplicates and we remove the duplicates during the migration, or we're not. If we're not I don't think we should do anything here.

I can't break out the performance improvements from the migration because they utilize the new constraints in the database: we let the database check the validity of the data instead of doing it in the app for every single row, one row at a time. So they have to go together. However, we can run the data fix-up scripts as part of the migration. The scripts are quite simple: the not null constraint fix removes records that are completely useless (e.g. user-role-association with a user id but without a role id is useless), whereas the unique constraint fix removes the newer duplicate associations, so the only information we'd lose is when a duplicate association was added (those records do not contain any additional information). I'll fix this.

@jdavcs jdavcs force-pushed the dev_assoc_handling2 branch 2 times, most recently from 4c7ac82 to 139f1ca Compare September 10, 2024 05:36
@jdavcs
Copy link
Member Author

jdavcs commented Sep 10, 2024

Done. I've removed the scripts and any interactive stuff (user confirmation, etc.), and moved the code that fixes the data into the migrations directory since it's only called from migration scripts and should not be called from anywhere else.

delete_stmt = delete(UserRoleAssociation).where(UserRoleAssociation.user_id == user.id)
private_role = get_private_user_role(user, self.sa_session)
if not private_role:
log.warning("User %s does not have a private role assigned", user)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This likely won't be seen by anyone. If we do need this we should raise an exception here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that an exception is more appropriate. However, it appears there's something going on in user private role processing: we either don't assign a private role in some edge cases, or we delete them. As a result, we have 551 users without a private role on main:

galaxy_main=> select count(*) from galaxy_user u where u.id not in (select ura.user_id from user_role_association ura inner join role r on ura.role_id = r.id and r.type = 'private');
 count 
-------
   511

I'll post the probable cause on the committers channel. This PR solves it (if that indeed is the cause), but since there are records of users w/o a private role in the database, I think we should fix those records before raising an error here.

@@ -107,7 +107,9 @@ def test_update(self):
another_user_id = self.dataset_populator.user_id()
another_role_id = self.dataset_populator.user_private_role_id()
assert another_user_id is not None
update_response = self._put(f"groups/{group_id}", data={"user_ids": [another_user_id]}, admin=True, json=True)
update_response = self._put(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a breaking API change, right ? This will likely need double checking in the UI and in bioblend.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvdbeek Yes, but that change does not affect the UI. I have removed the delete_existing_assocs=True argument from the 3 set_entity_FOO_associations methods of the model/security.py module (with FOO=group/user/role). The behavior of those method has not changed: existing associations are deleted; that's why no changes to the client code were necessary. There was only one place in the code base where one of those methods was called with delete_existing_assocs=False - and that was the api test, which is why I added the existing user, so that the test passes the full set of user ids that should be associated with the group after the update (which is what the UI does). That said, it is a change nevertheless. I'll make sure to add this to the release notes - would that be sufficient? @nsoranzo do we need to make any changes in bioblend? I've tested the UI, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look like a breaking change since delete_existing_assocs=False was used since this API endpoint was introduced in 2012.
On the other hand, the BioBlend documentation is actually consistent with what you are implementing here, and would fix a deficiency of the API already found in #18374 (comment)
Currently BioBlend tests only check that adding a new user to an empty group works, so we would need to extend the tests to verify the impact of this change.

Copy link
Member Author

@jdavcs jdavcs Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also noted in a TODO inline comment in the api test. I can add new API tests to test this for group, as well as API tests for testing basic create/delete/update for the user and role associations. That said, the unit db tests in this PR already test this extensively.

@jdavcs jdavcs force-pushed the dev_assoc_handling2 branch from 99bb7ba to 01319d6 Compare September 10, 2024 13:50
Comment on lines +1475 to +1476
If the provided list is empty, existing associations will be removed.
If the provided value is None, existing associations will not be updated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think user_ids and role_ids are ever None, see https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/schema/groups.py#L65-L70 . So maybe restore the user_ids = user_ids or [] and role_ids = role_ids or [] and drop the ifs?

Suggested change
If the provided list is empty, existing associations will be removed.
If the provided value is None, existing associations will not be updated.
If the provided list is empty or None, existing associations will be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are never None simply because the UI currently provides for updating two types of associations at the same time, sending the full set of associated items for each relationship, which is then processed via the same endpoint on the backend. This applies equally to users, groups, and roles. If we wanted to add a UI feature that allows to update only one relationship (for example, user groups, but not user roles), we'd still have to retrieve all associations for groups AND roles, and send them back to the API. Which is wasteful and, I think, not great for many reasons. If we distinguish between an empty list and null, we provide for two options: "delete all associations not in the list" or "don't change anything", which is impossible with the current API. Finally, this makes testing easier and more straightforward: you only specify and pass what's being tested (I've just added a commit to actually do this!). And, finally, to second this comment, a null argument here, in my opinion, should be interpreted as "make no change".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is this possibility is not exposed via the API, so since this will already likely break the current API it would make sense to rewrite GroupUpdatePayload so that user_ids and role_ids are Optional lists with default None.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsoranzo does this look right? 5d5065a

lib/galaxy/model/security.py Show resolved Hide resolved
lib/galaxy/model/security.py Show resolved Hide resolved
@@ -107,7 +107,9 @@ def test_update(self):
another_user_id = self.dataset_populator.user_id()
another_role_id = self.dataset_populator.user_private_role_id()
assert another_user_id is not None
update_response = self._put(f"groups/{group_id}", data={"user_ids": [another_user_id]}, admin=True, json=True)
update_response = self._put(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look like a breaking change since delete_existing_assocs=False was used since this API endpoint was introduced in 2012.
On the other hand, the BioBlend documentation is actually consistent with what you are implementing here, and would fix a deficiency of the API already found in #18374 (comment)
Currently BioBlend tests only check that adding a new user to an empty group works, so we would need to extend the tests to verify the impact of this change.

jdavcs and others added 22 commits September 23, 2024 16:11
Add unique constraint, set not nullable for user-id, role-id
Add unique constraint, set not nullable for user-id, group-id
Add unique constraint, set not nullable for group-id, role-id
In the previous version,  we only passed NEW associations, which were
ADDED to the current associations. The new version implements a true
UPDATE operation, which replaces the existing set of associations
with the provided set of associations.
Co-authored-by: Marius van den Beek <[email protected]>
Co-authored-by: Marius van den Beek <[email protected]>
@jdavcs jdavcs force-pushed the dev_assoc_handling2 branch from 9daa07c to 2fc21f9 Compare September 23, 2024 21:05
@jdavcs
Copy link
Member Author

jdavcs commented Sep 24, 2024

I've added tests for migration data fixes: same as here #18493 (comment)

@jdavcs jdavcs merged commit f420f53 into galaxyproject:dev Sep 24, 2024
54 checks passed
@jdavcs jdavcs added the highlight/admin Included in admin/dev release notes label Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/admin area/database Galaxy's database or data access layer area/scripts area/security area/testing highlight/admin Included in admin/dev release notes kind/bug kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Editing a group through admin UI can lead to duplication of user_group_association entries
4 participants