-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #1313 added artifacts filter support for files and Windows regi… #1732
Issue #1313 added artifacts filter support for files and Windows regi… #1732
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial round of comments
docs/plaso.engine.rst
Outdated
@@ -1,4 +1,5 @@ | |||
plaso\.engine package | |||
plaso\.engine package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo these changes, these will happen on merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know how this changed, but I think I got it back to the original.
'--artifacts_filter_file', '--artifacts-filter-file', | ||
dest='artifacts_filter_file', type=str, default=None, | ||
action='store', help=( | ||
'Path to a directory containing artifact filter definitions, which ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
directory ? I assume the artifacts filter file is a single file containing names of artifact definitions, please correct the help description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
raise errors.BadConfigObject( | ||
'Configuration object is not an instance of CLITool') | ||
|
||
artifacts_filter_file = cls._ParseStringOption(options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please break line after ( and continue with a 4 space continuation indent (to match the style of the rest of the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plaso/cli/image_export_tool.py
Outdated
filter_file_object = filter_file.FilterFile(filter_file_path) | ||
find_specs = filter_file_object.BuildFindSpecs( | ||
environment_variables=environment_variables) | ||
if artifacts_filter_file_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this to a separate method like _BuildFilterFindSpecs or equiv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to the tools.py so that it can be shared by image_export, log2timeline and psteal.
plaso/cli/image_export_tool.py
Outdated
find_specs = self._knowledge_base.GetValue( | ||
artifacts_filter_file.ARTIFACTS_FILTER_FILE)[ | ||
artifact_types.TYPE_INDICATOR_FILE] | ||
elif filter_file_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise if about filter_filter and artifacts_filter_file are set, and indicate in the help that they are mutual exclusive
or merge the results of both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to the tools.py so that it can be shared by image_export, log2timeline and psteal.
environment variables. | ||
|
||
Returns: | ||
path_attributes dict[str]: Dictionary containing the path attributes, per |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path_attributes dict[str] => dict[str]
also 1 space after :
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -170,23 +198,25 @@ def _ParseRecurseKeys(self, parser_mediator, root_key): | |||
if parser_mediator.abort: | |||
break | |||
|
|||
matching_plugin = None | |||
self._ParseKey(parser_mediator, registry_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the value of replacing this method with another method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I build _ParseKey to avoid duplication of the same code in _ParseRecurseKeys and _ParseKeysFromFindSepcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
plaso/parsers/winreg.py
Outdated
@@ -195,6 +225,7 @@ def ParseFileObject(self, parser_mediator, file_object, **kwargs): | |||
parser_mediator (ParserMediator): parser mediator. | |||
file_object (dfvfs.FileIO): a file-like object. | |||
""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this white line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plaso/parsers/winreg.py
Outdated
find_specs = parser_mediator.knowledge_base.GetValue( | ||
artifacts_filter_file.ARTIFACTS_FILTER_FILE) | ||
|
||
if (find_specs and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flatten this if else by changing the order in which things are checked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored this a bit
tests/parsers/winreg.py
Outdated
with tempfile.NamedTemporaryFile(delete=False) as temp_file: | ||
test_filter_file = artifacts_filter_file.ArtifactsFilterFile( | ||
temp_file.name, knowledge_base) | ||
temp_file.write(b'name: TestRegistryKey\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the data to write to a string or list and write this in a single call, also looks like you can move the creation of the test data out of the with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Just one outstanding after the initial round now. |
I've refactored this to now use the artifacts registry that is built as part of the pre-existing artifacts_definitions work. I've also fixed a few other comments. This will now accept artifacts as follows:
This probably needs a bit more polish, but wanted you to check in on the approach if possible. |
please rebase from upstream master:
|
@jnettesheim please rebase from upstream master:
|
Rebased it, not sure if there is a better way to do it so that I didn't suddenly have all of the commits in my PR. |
did you use:
with this upstream: that should realign with your branch with upstream master so that you don't see all the commits that have happened in the mean time |
I used that command with that upstream, but fro my own fork of the repo .... Not sure if that impacted it. I think it's probably easiest to just create a new PR now as it isn't obvious to Adam or myself how to fix this. Does that work for you? |
Likely you'll need to do this on your feature branch that contains your PR. @jnettesheim please do not create a new PR try this:
Or I can (likely) do this remotely as well |
I've run a manual rebase on my own clone of j's fork, and there are several dozen conflicts. I'm not certain the cleanest way out of this, but given the size of this PR and the delta between masters, it won't be easy no matter how it's done. |
let me have a look |
6334810
to
1f1d100
Compare
Codecov Report
@@ Coverage Diff @@
## master #1732 +/- ##
==========================================
+ Coverage 84.17% 84.21% +0.04%
==========================================
Files 436 437 +1
Lines 31321 31439 +118
==========================================
+ Hits 26365 26477 +112
- Misses 4956 4962 +6
Continue to review full report at Codecov.
|
Done this:
|
Yeah, no resets for me. Not my code. :P |
it was that or resolving merge conflicts manually, the squash just reapplies the changes ontop of the rebased branch |
@jnettesheim let's chat about this PR, it is getting large I have a lot of nits ;) I opt we break out engine.ArtifactFilters into a separate PR first |
@@ -86,11 +95,19 @@ def ParseOptions(cls, options, configuration_object): | |||
raise errors.BadConfigOption( | |||
'Unable to determine path to artifact definitions.') | |||
|
|||
custom_artifacts_path = getattr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github does not allow me to create comment on 122 but you'll need to set custom_artifacts_path in the configuration object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I need to as I'm processing the file into the artifacts_registry within this helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, I see you just load the artifact definitions from the path
argparse group. | ||
""" | ||
argument_group.add_argument( | ||
'--artifact_filters', '--artifact-filters', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe split this in 2 different options, artifact_filters and artifact_filters_file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
'formats. (1) Directly on the command line (comma separated), in a' | ||
'in a file with one artifact name per line, or one operating system' | ||
'specific keyword which will process all artifacts supporting that' | ||
'OS (windows, linux, darwin). Forensic artifacts are stored ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Linux, MacOS (Darwin), Windows are names, so please start them with a captial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
'specific keyword which will process all artifacts supporting that' | ||
'OS (windows, linux, darwin). Forensic artifacts are stored ' | ||
'in .yaml files that are directly pulled from the artifact ' | ||
'definitions project. You can also specify a custom artifacts yaml' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: plaso does not use the double spaces after the end of a sentence. Please use a single space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
'definitions project. You can also specify a custom artifacts yaml' | ||
'file (see --custom_artifact_definitions). Artifact definitions ' | ||
'can be used to describe and quickly collect data of interest, such' | ||
' as specific files or Windows Registry keys.')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency put the space at the end of the previous line (inside the string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plaso/engine/artifact_filters.py
Outdated
"""Initializes a filter file. | ||
|
||
Args: | ||
path (str): path to a file that contains one or more forensic artifacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
artifacts_registry missing from docstring
artifacts_registry (artifacts.ArtifactDefinitionsRegistry]): artifact
definitions registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged in #1883
plaso/engine/artifact_filters.py
Outdated
self._knowledge_base = knowledge_base | ||
|
||
def BuildFindSpecs(self, environment_variables=None): | ||
"""Build find specification from a forensic artifacts file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"from a forensic artifacts file." => "from artifact definitions."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged in #1883
plaso/engine/artifact_filters.py
Outdated
artifact_types.TYPE_INDICATOR_WINDOWS_REGISTRY_VALUE): | ||
# TODO: Handle Registry Values Once Supported in dfwinreg. | ||
logging.warning(('Unable to handle Registry Value, extracting ' | ||
'key only: {0:s} ').format(source.key_value_pairs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source.key_value_pairs is a list of tuples, so {:s} is likely the wrong formatter
use an explicit join e.g. ','.join(['(key: {0:s}, value: {1:s})'.format(key_path, value_name) for key_path, value_name in source.key_value_pairs])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged in #1883
plaso/engine/artifact_filters.py
Outdated
@@ -0,0 +1,239 @@ | |||
# -*- coding: utf-8 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe copy this to a separate PR and review this first?
This PR is getting big
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged in #1883
plaso/engine/artifact_filters.py
Outdated
environment_variables (Optional[list[EnvironmentVariableArtifact]]): | ||
environment variables. | ||
""" | ||
find_specs = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stopped my review here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged in #1883
134a34c
to
1f1d100
Compare
3517325
to
21ed290
Compare
0c6c993
to
747bc94
Compare
@joachimmetz - I think this is pretty close now, PTAL. |
Please make linter pass: https://travis-ci.org/log2timeline/plaso/jobs/396010009 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits and questions, I'll have closer look at winreg a bit later today
plaso/cli/extraction_tool.py
Outdated
@@ -75,6 +75,8 @@ def _CreateProcessingConfiguration(self, knowledge_base): | |||
""" | |||
# TODO: pass preferred_encoding. | |||
configuration = configurations.ProcessingConfiguration() | |||
configuration.artifact_filters = self._artifact_filters | |||
configuration.artifacts_registry = self._artifacts_registry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: artifacts_registry
Please don't pass complex objects via configuration configuration.artifacts_registry. Configuration is meant to pass settings. If you need pass complex objects do this via the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we were storing a complete ArtifactDefinitionsRegistry() in the configuration object, prior to these changes. However, I don't think it was ever used, so I've refactored the helper to store the artifact definitions path and customer artifact definitions path only. All tests are passing after refactoring, so I think I've accounted for any other impact.
plaso/engine/configurations.py
Outdated
@@ -211,6 +212,8 @@ class ProcessingConfiguration(interface.AttributeContainer): | |||
def __init__(self): | |||
"""Initializes a process configuration object.""" | |||
super(ProcessingConfiguration, self).__init__() | |||
self.artifacts_registry = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: artifacts_registry
Please don't pass complex objects via configuration configuration.artifacts_registry. Configuration is meant to pass settings. If you need pass complex objects do this via the methods.
also it is not defined in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we were storing a complete ArtifactDefinitionsRegistry() in the configuration object, prior to these changes. However, I don't think it was ever used, so I've refactored the helper to store the artifact definitions path and customer artifact definitions path only. All tests are passing after refactoring, so I think I've accounted for any other impact.
@@ -56,6 +65,7 @@ def ParseOptions(cls, options, configuration_object): | |||
BadConfigObject: when the configuration object is of the wrong type. | |||
BadConfigOption: if the required artifact definitions are not defined. | |||
""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove white line (-
white line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -86,11 +95,19 @@ def ParseOptions(cls, options, configuration_object): | |||
raise errors.BadConfigOption( | |||
'Unable to determine path to artifact definitions.') | |||
|
|||
custom_artifacts_path = getattr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, I see you just load the artifact definitions from the path
registry.ReadFromFile(reader, custom_artifacts_path) | ||
elif custom_artifacts_path and not os.path.isfile(custom_artifacts_path): | ||
raise errors.BadConfigOption( | ||
'No such artifacts filter file: {0:s}.'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this and the next line could fit on a single line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, they do not fit on a single line.
plaso/engine/engine.py
Outdated
Raises: | ||
RuntimeError: if no valid FindSpecs are built. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove white line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plaso/engine/engine.py
Outdated
environment_variables=environment_variables) | ||
|
||
if (artifact_filter_names or filter_file_path) and not find_specs: | ||
raise RuntimeError('Error processing filters, no valid specifications' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opt to break after (
for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -170,23 +198,25 @@ def _ParseRecurseKeys(self, parser_mediator, root_key): | |||
if parser_mediator.abort: | |||
break | |||
|
|||
matching_plugin = None | |||
self._ParseKey(parser_mediator, registry_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@@ -81,7 +81,7 @@ def testBuildFindSpecsWithFileSystem(self): | |||
test_filter_file.BuildFindSpecs( | |||
environment_variables=[environment_variable]) | |||
find_specs_per_source_type = knowledge_base.GetValue( | |||
test_filter_file._KNOWLEDGE_BASE_VALUE) | |||
test_filter_file.KNOWLEDGE_BASE_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just override pylint check for access to protected members in the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do access this in winreg.py as well
tests/parsers/winreg.py
Outdated
find_specs = { | ||
test_filter_file.KNOWLEDGE_BASE_VALUE : knowledge_base.GetValue( | ||
test_filter_file.KNOWLEDGE_BASE_VALUE)} | ||
storage_writer = self._ParseFile(['SYSTEM'], parser, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please break after (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plaso/parsers/winreg.py
Outdated
find_specs (dfwinreg.FindSpecs): Keys to search for. | ||
""" | ||
searcher = dfwinreg_registry_searcher.WinRegistrySearcher(win_registry) | ||
for registry_key_path in list(searcher.Find(find_specs=find_specs)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you use iter() instead of list() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
try: | ||
self._ParseRecurseKeys(parser_mediator, root_key) | ||
except IOError as exception: | ||
parser_mediator.ProduceExtractionError('{0:s}'.format(exception)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add 1 white line below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Mainly nits remaining. If you address them and the linter issues, this CL could be good to go. |
@joachimmetz - This should resolve the open comments and questions. I did change around the artifacts_registry functionality. I'm having trouble running this against a test image right now (not seeing my new command line arguments). I would also like to add an e2e test and some tests to my new engine functions. The e2e test could likely be another PR. |
I've made some minor changes, updated the branch via github. |
@Onager for awareness override of CodeFactor results (it indicated minor code duplication primarily) |
One line description of pull request
Adding forensic artifact based filters for files and Windows registry keys- issue #1313.
Description:
I've added a first pass at artifacts based filters. This will only get files and registry keys for now. I try to log warnings to catch instances where users are including other filters that aren't fully supported yet.
I'm storing the filter find_specs in the knowledge base, if you would prefer it not be there I can parse the file when needed, it would just lead to me parsing the file multiple times in some cases.
This probably also needs an end to end test and would be happy to get that in place.
Sincere apologies for the large PR.
Related issue (if applicable): adds to fixes for #1313
Notes:
One thing to please look at:
LIne 267 in winreg.py: win_registry._registry_files.clear() ---- I had to call this to clear file not being closed errors, but I don't think we want that.
Also, appreciate any notes on glob expansion as I currently have it.
All contributions to Plaso undergo code
review. This makes sure
that the code has appropriate test coverage and conforms to the Plaso style
guide.
One of the maintainers will examine your code, and may request changes.
Checklist:
If new dependencies are required: