Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouples Filesystem implementation #48

Merged
merged 15 commits into from
Jan 11, 2025

Conversation

dgmcdona
Copy link
Contributor

This refactors the way that the parser handles paths in general, removing the filesystem-specific implementation details from the parsing fuctions, and providing a new interface centered around providing the correct readers through the FileProvider trait. Default implementations for live-system and logarchive formats are provided, as are utilities for categorizing files by path. This requird updates throughout the repo's tests, and if merged, will require updates to the larger tests that are currently shipping in the test archive.

@dgmcdona dgmcdona force-pushed the dgmcdona/filesystem_decoupling branch from aec24f6 to 8a4a0e0 Compare December 20, 2024 05:34
@dgmcdona
Copy link
Contributor Author

@puffyCid The tests included in the crate are passing - paths.rs needs more tests but it's late and I figured I'd go ahead and put this forward for some initial review.

There are a couple of things that I'm not sure about right now:

  1. Should the FileProvider trait yield Result items instead of just Box<dyn Read>? Or should error handling when producing readers be left to whoever implements the trait?
  2. Should the trait's iterators also yield path information in addition to the readers? This lead to some changes when updating the examples, since access to the source file names is lost.

@dgmcdona dgmcdona force-pushed the dgmcdona/filesystem_decoupling branch from 9df54d3 to be73d05 Compare December 21, 2024 20:36
@puffyCid
Copy link
Collaborator

@dgmcdona This is looking good. Few thought so far:

  1. Result vs Box. Either one seem fine to me. I am incline to say let the users determine how they want handle trait error handling
  2. Lets try to include path info in the reader. I think that may be useful to keep

For tests, I uploaded test_data.zip to the releases page. I will open PR that will commit the tests/*.rs files and update CI to only pull down the test_data.zip that should then make it possible i think to make changes to those files in this PR

@puffyCid puffyCid mentioned this pull request Dec 21, 2024
@dgmcdona dgmcdona force-pushed the dgmcdona/filesystem_decoupling branch 2 times, most recently from 690bf72 to 7e2f60f Compare December 21, 2024 22:47
This refactors the way that the parser handles paths in general,
removing the filesystem-specific implementation details from the parsing
fuctions, and providing a new interface centered around providing the
correct readers through the `FileProvider` trait. Default
implementations for live-system and logarchive formats are provided, as
are utilities for categorizing files by path. This requird updates
throughout the repo's tests, and if merged, will require updates to the
larger tests that are currently shipping in the test archive.
Updates the `FileProvider` trait to return iterators of boxed
`SourceFile` trait objects. This preserves source file path information
for use by the parsers.
@dgmcdona dgmcdona force-pushed the dgmcdona/filesystem_decoupling branch from 7e2f60f to ae63ec0 Compare December 21, 2024 22:50
Not sure why this increased, but I'm assuming that its related to path
enumeration and is a positive change.
@dgmcdona dgmcdona force-pushed the dgmcdona/filesystem_decoupling branch from 08da0e7 to 1b309b2 Compare December 21, 2024 23:29
Another test value that needed a bump after incorporating changes.
@dgmcdona
Copy link
Contributor Author

Increased two check values in separate tests - more records were being returned, presumably because of the inclusion of the Signpost directory.

examples/unifiedlog_iterator/src/main.rs Outdated Show resolved Hide resolved
examples/unifiedlog_iterator/src/main.rs Outdated Show resolved Hide resolved
@puffyCid
Copy link
Collaborator

This is looking really good, im really happy the tests all pass.

I tried out the example files, looks like some minor issues may slipped through.
Could you make sure the examples still run?

  1. I left some comments for unifiedlog_iterator
  2. parse_tracev3 works fine
  3. Both unifiedlog_parser and unifiedlog_parser_json print errors below:
./unifiedlog_parser_json -l -o out4.csv
Starting Unified Log parser...
error: a value is required for '--input <INPUT>' but none was supplied

./unifiedlog_parser_json -i ~/Downloads/system_logs.logarchive -o out4.csv
Starting Unified Log parser...
thread 'main' panicked at unifiedlog_parser_json/src/main.rs:147:10:
called `Result::unwrap()` on an `Err` value: Os { code: 20, kind: NotADirectory, message: "Not a directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@puffyCid
Copy link
Collaborator

puffyCid commented Jan 9, 2025

@dgmcdona if u have time and get chance to at least fixup the unifiedlog_iterator example. I would be happy to merge.
Otherwise, i guess merge now and fix the examples in subsequent PR

@dgmcdona
Copy link
Contributor Author

dgmcdona commented Jan 9, 2025

Thanks, I'll push the remaining fixes shortly

This removes the other examples since all of the functionality is now
contained by the unifiedlog_iterator example. Also renames
unifiedlog_iterator to unifiedlog_parser
@dgmcdona
Copy link
Contributor Author

dgmcdona commented Jan 9, 2025

There was so much functionality shared between the examples that I just bundled it all together with a --mode flag in unifiedlog_iterator, renamed it to unifiedlog_parser, and removed the others. Fixed the bug with read -> read_to_end, as well as the warning being emitted because the dsc directory itself wasn't being filtered from the LiveSystemProvider's dsc file paths.

@puffyCid
Copy link
Collaborator

looks good thanks!

@puffyCid puffyCid merged commit 3881cdb into mandiant:main Jan 11, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants