HPCC-31761 LDAP only uses first cached default file scope permission #18651

Merged
merged 1 commit into from
Jun 19, 2024

@kenrowland kenrowland commented May 14, 2024

Added a per user default file scope permission cache

Signed-Off-By: Kenneth Rowland [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31761

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@kenrowland kenrowland changed the title HPCC-31761 LDAP caches first user's default file scope permission and… HPCC-31761 LDAP only uses first cached default file scope permission May 14, 2024
@kenrowland kenrowland requested review from ghalliday and jakesmith May 14, 2024 17:45
@kenrowland
Copy link
Contributor Author

@ghalliday @jakesmith preliminary review. Will add a flag to operate as it did. Also need to add read and write locks for the default file scope permission cache. Both will be in the next commit.

Copy link
Member

@jakesmith jakesmith left a comment

@kenrowland - looks good in principle.

But it needs to be made thread safe. Afaics, as it stands, one thread could be clearing or altering m_userDefaultFileScopePermissions whilst another reads from it, which would lead to a crash.

I think if the clear() goes in removeAllManagedFileScopes (with its WriteLockBlock lock), and the query is protected by a ReadLockBlock on m_scopesRWLock, it should make it thread safe.

It looks like m_managedFileScopesMap can change whilst it is being processed by another thread
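
For illustration, the locking scheme suggested above can be sketched with standard C++ primitives. This is only a sketch under assumptions, not the HPCC code: `std::shared_mutex` stands in for the platform's read/write lock, and `Access`, `DefaultPermissionCache`, `lookup`, `store`, and `clear` are hypothetical names.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for SecAccessFlags.
enum class Access { None, Read, Full };

// Hypothetical per-user cache: readers share the lock, writers take it exclusively.
class DefaultPermissionCache
{
public:
    std::optional<Access> lookup(const std::string &user) const
    {
        std::shared_lock<std::shared_mutex> readLock(rwLock); // many readers may hold this at once
        auto it = perms.find(user);
        if (it == perms.end())
            return std::nullopt;
        return it->second;
    }
    void store(const std::string &user, Access access)
    {
        std::unique_lock<std::shared_mutex> writeLock(rwLock); // excludes readers and other writers
        perms[user] = access;
    }
    void clear()
    {
        std::unique_lock<std::shared_mutex> writeLock(rwLock); // a reader can never see a half-cleared map
        perms.clear();
    }
private:
    mutable std::shared_mutex rwLock;
    std::map<std::string, Access> perms;
};
```

The point of the sketch is that every path touching the map, including the clear, goes through the same lock.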

m_defaultPermission = SecAccess_None;
defaultPermission = m_secMgr->queryDefaultPermission(user);
std::string userName(username);
m_userDefaultFileScopePermissions.insert(std::pair<std::string, SecAccessFlags>(userName, defaultPermission));
Copy link
Member

trivial: more concisely expressed (and more efficient) as:

m_userDefaultFileScopePermissions.emplace(username, defaultPermission);

NB: no need for explicit std::string conversion.
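
A small sketch of the difference, for illustration only. `Access`, `addWithInsert`, and `addWithEmplace` are hypothetical names, not the HPCC types. `emplace` forwards its arguments to the element's constructor, so a `const char *` key is converted to `std::string` in place.

```cpp
#include <map>
#include <string>
#include <utility>

enum class Access { None, Read, Full }; // hypothetical stand-in for SecAccessFlags

// Original form: build a std::string, then build the pair, then insert it.
inline bool addWithInsert(std::map<std::string, Access> &perms, const char *username, Access access)
{
    std::string userName(username);
    return perms.insert(std::pair<std::string, Access>(userName, access)).second;
}

// Suggested form: the key is constructed directly from the const char *.
inline bool addWithEmplace(std::map<std::string, Access> &perms, const char *username, Access access)
{
    return perms.emplace(username, access).second;
}
```

Both forms leave an existing entry untouched and report `false` when the key is already present.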

@kenrowland
Copy link
Contributor Author

@jakesmith Don't know if you saw my previous comment, but locks were something I am adding.

@jakesmith
Copy link
Member

@jakesmith Don't know if you saw my previous comment, but locks were something I am adding.

No, sorry, I missed it.
Okay, initial commit looks okay.

@kenrowland kenrowland requested a review from jakesmith May 21, 2024 15:06
Copy link
Member

@jakesmith jakesmith left a comment

@kenrowland - please see comments.

@@ -664,6 +664,7 @@ class CLdapConfig : implements ILdapConfig, public CInterface
m_sdfieldname.append("aci");
else if(m_serverType == OPEN_LDAP)
m_sdfieldname.append("aci");

Copy link
Member

trivial: unintended new newline (?) - nice to not alter this file for git history if so.

Copy link
Contributor Author

Hmm, not sure why, but removed it.

system/security/shared/caching.cpp Outdated Show resolved Hide resolved
defaultScopesReadLock.clear();
WriteLockBlock defaultScopesWriteLock(m_defaultScopesRWLock);
defaultPermission = m_secMgr->queryDefaultPermission(user);
std::string userName(username);
Copy link
Member

at least in my setup, this line is not necessary, i.e. you can pass const char * to the emplace and it will implicitly construct a std::string

Copy link
Contributor Author

The find, in my experience, has to construct an object of the key type. So, I left the username std::string variable, but cleaned it up a bit.
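
To illustrate the point about `find`: with the default comparator, `std::map<std::string, T>::find` takes a `const std::string &`, so passing a `const char *` builds a temporary `std::string` on each call. Since C++14, a transparent comparator such as `std::less<>` allows lookup without that temporary. The sketch below is an aside with hypothetical names (`Access`, `PlainMap`, `TransparentMap`), not what this PR does.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

enum class Access { None, Read, Full }; // hypothetical stand-in for SecAccessFlags

using PlainMap = std::map<std::string, Access>;
using TransparentMap = std::map<std::string, Access, std::less<>>;

inline bool hasUserPlain(const PlainMap &perms, const char *username)
{
    return perms.find(username) != perms.end(); // a temporary std::string is constructed here
}

inline bool hasUserTransparent(const TransparentMap &perms, std::string_view username)
{
    return perms.find(username) != perms.end(); // keys are compared against the view directly
}
```

Keeping an explicit `std::string` variable, as the reply describes, avoids constructing the key twice when the same name is used for both `find` and `emplace`.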

@@ -711,8 +741,16 @@ void CPermissionsCache::flush()
delete (*ui).second;
m_userCache.clear();
}
if (m_useLegacyDefaultFileScopePermissionCache)
{
m_defaultPermission = SecAccess_None;
Copy link
Member

should this be :
m_defaultPermission = SecAccess_Unknown

.. to trigger the next call to queryDefaultPermission to get new permissions?
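
The distinction matters because a cached value needs a "not fetched yet" state that differs from every real answer. A minimal single-threaded sketch, assuming a hypothetical `Access` enum with an `Unknown` sentinel and a hypothetical `fetch` callback standing in for the security manager query:

```cpp
#include <functional>
#include <utility>

enum class Access { Unknown, None, Read, Full }; // hypothetical; Unknown is the sentinel

// Hypothetical legacy-style cache holding one value for all users (no locking shown).
class LegacyDefaultPermission
{
public:
    explicit LegacyDefaultPermission(std::function<Access()> fetchFn) : fetch(std::move(fetchFn)) {}

    Access query()
    {
        if (cached == Access::Unknown) // only the sentinel triggers a refetch
            cached = fetch();
        return cached;
    }
    // Resetting to None instead would be served as a genuine "no access" answer.
    void flush() { cached = Access::Unknown; }
private:
    std::function<Access()> fetch;
    Access cached = Access::Unknown;
};
```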

Copy link
Contributor Author

Yes, good catch.

if (m_secMgr)
m_defaultPermission = m_secMgr->queryDefaultPermission(user);
else
m_defaultPermission = SecAccess_None;
Copy link
Contributor Author

I changed this to SecAccess_Full. If there is no security manager, then access is full. I'm pretty sure the code essentially prevents this from being called if there is no security manager, so it may be moot, but it is correct nonetheless.

@kenrowland kenrowland requested a review from jakesmith May 22, 2024 15:29
Copy link
Member

@jakesmith jakesmith left a comment

@kenrowland - looks good. Please squash for final scan.

@kenrowland
Copy link
Contributor Author

@jakesmith squashed as requested. You mentioned final scan, so I re-requested review.

@kenrowland kenrowland requested a review from jakesmith May 24, 2024 13:37
Copy link
Member

@jakesmith jakesmith left a comment

@kenrowland Looks good.

@ghalliday - should this go into 9.6 ?

Copy link
Member

@ghalliday ghalliday left a comment

@kenrowland a couple of questions/comments. No need to change to a critical section since tested as-is.

if (m_secMgr)
{
const std::string username(user.getName());
ReadLockBlock defaultScopesReadLock(m_defaultScopesRWLock);
Copy link
Member

I would have used a simpler critical section.

  • The R/W lock complicates this code (relock and the potential for deadlock, which you have avoided).
  • This is unlikely to be heavily contended.
  • R/W lock is less efficient than a critical section.
  • If it is contended and needs a write then the other threads will need to wait anyway.
    The R/W blocks are good for situations where parallel reads can continue and not have the write block them. (Even then it is only likely to be helpful when slow and heavily contended.)

Copy link
Contributor Author

From what I have seen, it appears that default file scopes are used a lot. Making this code as efficient as possible is probably the right thing to do. I agree that lock for read, release, lock for write and check again, is not optimal. Since you mentioned a critical section, I think the time to change it is now. I will make that change in the interest of keeping default scope checking efficient.
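
The "lock for read, release, lock for write and check again" sequence looks roughly like the first class below, and the single-lock alternative like the second. Both are sketches using `std::shared_mutex` and `std::mutex` in place of the platform's lock classes. `Access`, `queryBackend`, `ReadWriteCache`, and `SingleLockCache` are hypothetical names.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

enum class Access { None, Read, Full }; // hypothetical stand-in for SecAccessFlags

// Hypothetical slow lookup standing in for a query to the security manager.
inline Access queryBackend(int &calls) { ++calls; return Access::Read; }

struct ReadWriteCache
{
    std::shared_mutex rwLock;
    std::map<std::string, Access> perms;
    int backendCalls = 0;

    Access get(const std::string &user)
    {
        {
            std::shared_lock<std::shared_mutex> readLock(rwLock);
            auto it = perms.find(user);
            if (it != perms.end())
                return it->second;
        } // read lock released; another thread may insert before the write lock is taken
        std::unique_lock<std::shared_mutex> writeLock(rwLock);
        auto it = perms.find(user); // so the lookup has to be repeated under the write lock
        if (it == perms.end())
            it = perms.emplace(user, queryBackend(backendCalls)).first;
        return it->second;
    }
};

struct SingleLockCache
{
    std::mutex lock;
    std::map<std::string, Access> perms;
    int backendCalls = 0;

    Access get(const std::string &user)
    {
        std::lock_guard<std::mutex> block(lock); // one lock, one lookup, no recheck
        auto it = perms.find(user);
        if (it == perms.end())
            it = perms.emplace(user, queryBackend(backendCalls)).first;
        return it->second;
    }
};
```

The single-lock version serialises readers, which is the trade-off being weighed in this thread.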

if (m_secMgr)
m_defaultPermission = m_secMgr->queryDefaultPermission(user);
else
m_defaultPermission = SecAccess_Full;
Copy link
Member

This is different from the code in 9.2.x/9.4.x. Is this deliberate? I suspect it is, but could do with an indication why.

Copy link
Contributor Author

@kenrowland kenrowland May 29, 2024

A previous outdated comment indicates why it is set to Full. If there is no security manager, access should be full for all users and all scopes. Technically, as currently implemented, this code will never be called if there is no security manager, but the case must still be handled. I will add a short comment on that line.

Copy link
Member

The case where the legacy cache is not used still returns SecAccess_None. It would be clearer to test for m_secMgr at the start of the function and return SecAccess_Full.
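
A sketch of the suggested shape, with the security manager check hoisted to the top so that the legacy and per-user paths share it. Every type and member name here is a hypothetical stand-in, and locking is omitted to keep the control flow visible.

```cpp
#include <map>
#include <string>

enum class Access { None, Read, Full }; // hypothetical stand-in for SecAccessFlags

struct SecurityManager // hypothetical
{
    Access queryDefaultPermission(const std::string &) const { return Access::Read; }
};

struct PermissionsCache // hypothetical
{
    const SecurityManager *secMgr = nullptr;
    bool useLegacyCache = false;
    bool legacyLoaded = false;
    Access legacyPermission = Access::None;
    std::map<std::string, Access> perUser;

    Access queryDefaultPermission(const std::string &user)
    {
        if (!secMgr)
            return Access::Full; // no security manager: full access on either path
        if (useLegacyCache)
        {
            if (!legacyLoaded)
            {
                legacyPermission = secMgr->queryDefaultPermission(user); // first caller decides for everyone
                legacyLoaded = true;
            }
            return legacyPermission;
        }
        auto it = perUser.find(user);
        if (it == perUser.end())
            it = perUser.emplace(user, secMgr->queryDefaultPermission(user)).first;
        return it->second;
    }
};
```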

else
m_defaultPermission = SecAccess_Full;

DBGLOG("Legacy default file scope permission set to %s(%d) for all users, based on User '%s'", getSecAccessFlagName(m_defaultPermission),
Copy link
Member

This looks like this tracing could be excessive.

Copy link
Contributor Author

That line will appear once whenever the cache is reset and when using the legacy code that has the error. I think it is important to leave in if the legacy option is chosen, then you can search back in the logs, if needed, to see what default scope permission is in effect.

{
defaultPermission = m_secMgr->queryDefaultPermission(user);
m_userDefaultFileScopePermissions.emplace(username, defaultPermission);
DBGLOG("Added user '%s' to default file scope permissions with access %s(%d)", username.c_str(), getSecAccessFlagName(defaultPermission),
Copy link
Member

Should this logging be protected with a flag e.g. traceLDAP?

Copy link
Contributor Author

As we move towards more secure permissions and removal of Authenticated users from default file scope permissions, I think this information is useful in helping lock down a cluster. In fact, I debated making it a PROGLOG so that the information on default scopes assigned to users is available. If you think a flag is more appropriate, I can add one. I think now is the time if we are going to do it.

Copy link
Member

This will be logged for the first lookup for each user every time the cache is cleared. Given the potential cost of logging that might add up. Keep as is for the moment, but review how much logging it is generating once this is being used on a live system.
(Simple support for trace flags is not yet available in esp, when it is it will be much simpler to add a guard.)

@kenrowland
Copy link
Contributor Author

@jakesmith @ghalliday I converted the synchronization protection for the default scope permission map to use a mutex. This was based on the complexity of having to release the read lock, acquire the write lock, and recheck for a user's permission. A critical section was suggested, but the map needed to be protected when cleared after a cache timeout and when flushed.

@kenrowland kenrowland requested review from ghalliday and jakesmith May 31, 2024 13:25
@jakesmith
Copy link
Member

A critical section was suggested, but the map needed to be protected when cleared after a cache timeout and when flushed.

curious, can you elaborate, why did this preclude using a CriticalSection ?

@kenrowland
Copy link
Contributor Author

The semantics of a critical section is to protect a section of code from being reentered. The scope permission map needed to be protected in three different places, so a critical section did not seem appropriate. Instead a mutex protecting access to the data structure seemed appropriate.

@ghalliday
Copy link
Member

The semantics of a critical section is to protect a section of code from being reentered. The scope permission map needed to be protected in three different places, so a critical section did not seem appropriate. Instead a mutex protecting access to the data structure seemed appropriate.

That is probably the original use for critical sections, but they provide very similar functionality to mutexes. I previously thought that the only difference was that Mutexes supported named inter-process mutexes. However it was a great question, because it made me look at the current implementation of mutexes.

They appear to be using condition variables in addition to the mutex. The code is ancient, but I am wondering why they are implemented like that.
So for interest I added them to the AtomicTimingStressTest. Here are the comparisons:

10:01:06.590708 1593187 UNK UNK CriticalSection,unsigned __int64@1/1 threads(1) 18ns/iteration lost(0)
10:01:06.691740 1593187 UNK UNK CriticalSection,unsigned __int64@1/1 threads(2) 50ns/iteration lost(0)
10:01:08.514772 1593187 UNK UNK CriticalSection,unsigned __int64@1/1 threads(32) 56ns/iteration lost(0)
10:01:12.619658 1593187 UNK UNK CriticalSection,unsigned __int64@1/1 threads(64) 64ns/iteration lost(0)
10:01:16.767319 1593187 UNK UNK CriticalSection,unsigned __int64@1/1 threads(65) 63ns/iteration lost(0)
10:01:24.923653 1593187 UNK UNK CriticalSection,unsigned __int64@1/1 threads(128) 63ns/iteration lost(0)

and

10:02:37.931778 1593187 UNK UNK Mutex,unsigned __int64@1/1 threads(1) 22ns/iteration lost(0)
10:02:39.980819 1593187 UNK UNK Mutex,unsigned __int64@1/1 threads(2) 1024ns/iteration lost(0)
10:04:20.179637 1593187 UNK UNK Mutex,unsigned __int64@1/1 threads(32) 3130ns/iteration lost(0)
10:07:42.497517 1593187 UNK UNK Mutex,unsigned __int64@1/1 threads(64) 3160ns/iteration lost(0)
10:11:06.593294 1593187 UNK UNK Mutex,unsigned __int64@1/1 threads(65) 3138ns/iteration lost(0)
10:17:39.614748 1593187 UNK UNK Mutex,unsigned __int64@1/1 threads(128) 3069ns/iteration lost(0)

i.e. they perform terribly. They perform even worse as the amount of work performed in the mutex increases.

I will open a jira to switch Mutexes to use the critical section code!
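
For readers who want to try a comparison of this kind, here is a rough sketch of a lock stress harness. It is not the AtomicTimingStressTest and `CondVarLock` is not the jlib Mutex: it is a hypothetical lock built from a mutex plus a condition variable, a guess at the general shape described above. `Result` and `stress` are also hypothetical names, and timings will vary by machine.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical lock: a flag guarded by a mutex, with waiters parked on a condition variable.
class CondVarLock
{
public:
    void lock()
    {
        std::unique_lock<std::mutex> guard(m);
        cv.wait(guard, [this] { return !held; });
        held = true;
    }
    void unlock()
    {
        {
            std::lock_guard<std::mutex> guard(m);
            held = false;
        }
        cv.notify_one(); // wake one waiter, which costs a signal on every release
    }
private:
    std::mutex m;
    std::condition_variable cv;
    bool held = false;
};

struct Result
{
    std::uint64_t total;   // final counter value; less than expected would mean lost updates
    double nsPerIteration; // rough wall-clock cost per locked increment
};

// Each thread increments a shared counter under the lock being measured.
template <class Lock>
Result stress(unsigned numThreads, std::uint64_t iterationsPerThread)
{
    Lock lock;
    std::uint64_t counter = 0;
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
        threads.emplace_back([&] {
            for (std::uint64_t i = 0; i < iterationsPerThread; ++i)
            {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    for (auto &th : threads)
        th.join();
    std::chrono::duration<double, std::nano> elapsed = std::chrono::steady_clock::now() - start;
    return { counter, elapsed.count() / double(numThreads * iterationsPerThread) };
}
```

Calling `stress<std::mutex>(n, k)` and `stress<CondVarLock>(n, k)` with the same arguments gives a like-for-like comparison of the two lock types.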

@ghalliday
Copy link
Member

@kenrowland having taken a closer look at the mutex implementation/performance this code should be switched to use a CriticalSection instead of a Mutex. (I agree we should really use Mutex in many places we use CriticalSection.)

Longer term we will rewrite the Mutex class so it is efficient.

@kenrowland
Copy link
Contributor Author

@ghalliday Switched to use a critical section

Copy link
Member

@jakesmith jakesmith left a comment

@kenrowland - 1 question, but looks fine.

@@ -549,7 +549,8 @@ inline void CPermissionsCache::removeAllManagedFileScopes()

etc. Until full scope path checked, or no read permissions hit on ancestor scope.
*/
-static CriticalSection msCacheSyncCS;//for managed scopes cache syncronization
+static CriticalSection msCacheSyncCS;//for managed scopes cache synchronization
+static CriticalSection syncDefaultScopePermissions;//for cached default file scope permissions
Copy link
Member

curious why this (and msCacheSyncCS) aren't members of CPermissionsCache?

Copy link
Contributor Author

The protection for managed scopes was implemented that way, so I kept the same paradigm. I'd have to look deeper, but perhaps it is related to the option to share the cache across multiple instances of the security manager (as is the case with an ESP with multiple services loaded).

@kenrowland
Copy link
Contributor Author

@ghalliday Any final comments?

@kenrowland
Copy link
Contributor Author

@ghalliday please merge if no further comments

Copy link
Member

@ghalliday ghalliday left a comment

@kenrowland I'm not sure. I think it is best to merge as is, and have a follow-on PR that moves the code out of the CriticalSection.


SecAccessFlags defaultPermission = SecAccess_None;
CriticalBlock defaultScopePermissionBlock(syncDefaultScopePermissions);
const std::string username(user.getName());
Copy link
Member

minor: This code doesn't need to be inside the critical section. Minimising the code executed while holding a lock helps reduce contention. I will merge as-is.
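
A short sketch of the reordering being described, with the string construction done before the lock is taken. `std::mutex` stands in for the CriticalSection, and `Access`, `syncPerms`, `userPerms`, and `lookupOrDefault` are hypothetical names.

```cpp
#include <map>
#include <mutex>
#include <string>

enum class Access { None, Read, Full }; // hypothetical stand-in for SecAccessFlags

static std::mutex syncPerms;                    // hypothetical stand-in for the CriticalSection
static std::map<std::string, Access> userPerms; // hypothetical cache

inline Access lookupOrDefault(const char *rawName, Access fallback)
{
    const std::string username(rawName);          // allocation and copy happen before locking
    std::lock_guard<std::mutex> block(syncPerms); // only the map access is serialised
    auto it = userPerms.find(username);
    return it == userPerms.end() ? fallback : it->second;
}
```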

{
defaultPermission = m_secMgr->queryDefaultPermission(user);
m_userDefaultFileScopePermissions.emplace(username, defaultPermission);
DBGLOG("Added user '%s' to default file scope permissions with access %s(%d)", username.c_str(), getSecAccessFlagName(defaultPermission),
Copy link
Member

It is best to avoid any logging inside a critical section - because it could be blocked on io. Better to set a flag and trace once the critical section is left.
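
The "set a flag and trace afterwards" pattern can be sketched as follows. This is an illustration under assumptions: `std::mutex` stands in for the CriticalSection, `std::fprintf` stands in for DBGLOG, and the permission is passed in as `fetched` rather than queried, to keep the example small. `Access`, `UserPermissions`, and `getOrAdd` are hypothetical names.

```cpp
#include <cstdio>
#include <map>
#include <mutex>
#include <string>

enum class Access { None, Read, Full }; // hypothetical stand-in for SecAccessFlags

class UserPermissions // hypothetical
{
public:
    // Returns true if the user was newly added; result receives the cached permission.
    bool getOrAdd(const std::string &user, Access fetched, Access &result)
    {
        bool added = false;
        {
            std::lock_guard<std::mutex> block(sync);
            auto inserted = perms.emplace(user, fetched); // leaves an existing entry untouched
            added = inserted.second;
            result = inserted.first->second;
        } // lock released here, before any I/O
        if (added)
            std::fprintf(stderr, "Added user '%s' to default file scope permissions with access %d\n",
                         user.c_str(), static_cast<int>(result));
        return added;
    }
private:
    std::mutex sync;
    std::map<std::string, Access> perms;
};
```

Because the values needed for the message are copied out while the lock is held, the trace line stays accurate even though it is written after the lock is released.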

@ghalliday
Copy link
Member

@kenrowland I have created a Jira to further clean up the code.
Should this target an earlier version (with the option disabled by default) and the default on in 9.8.x?

@kenrowland
Copy link
Contributor Author

kenrowland commented Jun 11, 2024

@ghalliday

First, do you want the logging code moved as part of a group of PRs to be merged at once?

Second, IMO, this fix should target both 9.6 and 9.8, both with the option to use the old code disabled. It fixes security holes. Plus any dependence on the older "way" is not predictable.

I sent a separate email with information about how we can handle the fix.

@kenrowland
Copy link
Contributor Author

@ghalliday request for merging may have gotten lost in the recent comments. Once merged, making the changes for the follow up Jira will be much easier.

@ghalliday
Copy link
Member

ok. @kenrowland please can you retarget this to 9.6.x with the option defaulting to legacy, and then when that is merged create a separate PR enabling it, targeting master.

@kenrowland kenrowland changed the base branch from master to candidate-9.6.x June 13, 2024 12:54
@kenrowland kenrowland changed the base branch from candidate-9.6.x to master June 13, 2024 12:55
@kenrowland kenrowland changed the base branch from master to candidate-9.6.x June 13, 2024 12:59
@kenrowland
Copy link
Contributor Author

@ghalliday Retargeted as requested

… uses it for future requests

Added a per user default file scope permission cache

Signed-Off-By: Kenneth Rowland [email protected]
@ghalliday ghalliday merged commit de2efba into hpcc-systems:candidate-9.6.x Jun 19, 2024
49 checks passed