-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conserve memory when resolving paths, recursively handle symlinks, check methods now more verbose and granular. #18
Open
LucasFaudman
wants to merge
20
commits into
pixee:main
Choose a base branch
from
LucasFaudman:main
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…entation by properly handling globbing, chained commands via other binaries, directory traversal and missing of banned binaries accessed in unconventional ways. Added checking for owner/group and uncommon path types that should not be in args. Added typeing
… use precomputed abs_path_strings for .endswith check
…age>) not == since my execptions have slighly different message format but the check is still the exact same
…t and FuzzDB tests
…d shell syntax with recursive_shlex_split. Compatible with Python3.8
…t and FuzzDB tests
…_EXPANSION_OPERATORS. - Handle arithmetic expansion of parameters using -+ and/or nesting and shell variables. - Track env vars set/modified by expansion and use them in the final command. - Use expanuser to handle ~ and ~user tilde expansion when resolving paths. - Correctly hand alphanumeric sequences expansions - Block process substitution via input redirection. - More tests, refactoring and comments. My _shell_expand implementation is still in progress but is neccessary since there are no other viable solutions in Python. The best I have seen is https://github.com/kojiromike/parameter-expansion but it cannot be used in a security context because they say: "All the standard shell expansions are supported, including some level of nested expansion, as long as this is not too complex or ambiguous." and ambigous cases are exactly what needs to be handled.
…D_CHAINING_EXECUTABLES
…d if shell expansion is needed. - _shell expand is only used when shell=True or the executable is a shell. (list of shells derived from https://www.in-ulm.de/~mascheck/various/shells/) but reading /etc/shells could be used to get a more accurate/concise list on a per-system basis. - The executable Popen kwarg is now handled correctly and takes precedence over the shell kwarg in determining if shell expansion is needed. See subprocess.py 1593-1596: When an executable is given and shell is True, the shell executable is replaced with the given executable. - the initial venv state is a copy of the env Popen kwarg to not modify the original Popen kwargs. - _resolve_executable_path now takes venv into account and uses the PATH env variable to find the executable if env is set in the Popen kwargs since this is how the subprocess module behaves. - test_popen_kwargs added to test the Popen kwargs and the initial env state. Other tests adjusted accordingly and refactored EXCEPTIONS outside class so they can be used as pytest params.
…y converting _resolve_path_in_parsed_command to a generator of Path objects that takes a single cmd_part and yields all the resolved paths for it. - Split check_* methods based on if they are checking the expanded_command, a single command part, a single path string, or single path. - First the command is expanded and expanded_command is checked. Then the if expanded_command passes checks it is used to create a generator of cmd_parts. - Checks are then preformed on each part and its paths as they are generated, so all the parts/paths do not need to be stored in memory at once. - Reduces the space complexity from O(n) to O(1) where n is the number of command parts or paths. - More loops over the BANNED_* are needed which does slightly increase the time complexity, but this trade off is worth it since: - This can actually reduce the overall check execution time by raising an exception sooner before fully resolving all the parts/paths. - The number of BANNED_* is relatively small and constant. - The number of command parts and more importantly paths could be very large and unpredictable. - Recursively handle symlinks to symlinks, and/or symlinks to directories which may contain symlinks. - _path_is_executable now will not return True for dirs since dirs can have the x bit but they are not executables. - max_resolved_paths, and rglob_dirs kwargs can be used to control this behavior. - SecurityException raised if the number of resolved paths exceeds max_resolved_paths. - updated test_resolve_paths_in_parsed_command and added test_max_resolved_paths to test this behavior. - Underlying check implmentation logic is unchanged.
Split BANNED_COMMAND_CHAINING_ARGUMENTS from BANNED_COMMAND_CHAINING_EXECUTABLES. Add -ok -okdir system( (for awk and similar) to BANNED_COMMAND_CHAINING_ARGUMENTS Refactor check_path_is_chaining_executable and check_path_is_shell to path checks.
…memory at once. - Checks performed on paths as they are generated to catch banned paths as early as possible. - _recursive_resolve_symlinks handles symlinks to symlinks and symlinks to directories which may contain symlinks - max_resolved_paths kwarg can be used to limit the number of paths resolved to prevent the case where a large directory is injected into a command causing long rglob run-times needed to accurately preform checks which could allow the injection to be used as a DDoS mechanism. - rglobdirs kwarg can be used to control the behavior of whether files in directories are considered when preforming checks. This is useful to reduce check run-time when a user knows the command they are using does not need to consider files in directories since it does not have recursive capabilities.
https://sonarcloud.io/project/issues?open=AY4rNwVcynvJrFFtuQvF&id=pixee_python-security - Remove or correct this useless self-assignment. - Use concise character class syntax '\w' instead of '[a-zA-Z0-9_]'. - Add replacement fields or use a normal string instead of an f-string. - Replace the unused local variable expanded_command with _. - Rename this local variable Popen_env to match the regular expression ^[_a-z][a-z0-9_]*$.
Quality Gate passedIssues Measures |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
To understand why this is neccessary consider a worst case scenario where this module is user on a system with low RAM like a t2.nano or similar that contains symlinks to NFS mounted directories on other systems in its domain with very large directory trees.
If an attacker was able to get a path through the developers input filters that resolved to a symlink to a directory on the NFS mount the module would need to resolve all of the paths in the directory tree to preform checks on them which could take a very long time and consume a lot of memory potentially crashing the system.
This approach prevents this and other similar attacks against the check mechanism since it only needs to store one path at a time in memory and can stop resolving paths if the number of resolved paths exceeds the max_resolved_paths kwarg.
Additionally, once I add the ability to customize the
BANNED_*
sets using kwargs in the future, this approach will allow for directories to be added toSENSITIVE_FILE_PATHS
and raise a SecurityException before resolving the its children.