shorter logpreamble parse implementation by more nom combinators usage #33

jrouaix · 2024-11-21T15:36:02Z

Hello @puffyCid

This is a revival of the too ambitious #22.

I plan to provide you with a lot of small, easy to review&merge PRs like this one.

So in this first one : LogPreamble

more nom combinators usage
- direct parsing integers (no need to take then parse)
- direct parsing 3 elements (using nom tuple combinator)
- direct final struct creation (no mut needed)
added anyhow dev dependency for easy error handling
- no panicable code (I tend to avoid unwraps even in test code when I can)
added a parse function the consumes the input.
- its not really used yet, cause the detect one is use
- but the detect one makes you re-parse preamble again in other chunks parser, it's a waste.
- so i'll probably slowly migrate to the parse/consuming function in the next submissions

Tell me what you think about this plan ?

jrouaix · 2024-11-21T15:54:51Z

Added a license Unicode-3.0 to passe the cargo deny check step

puffyCid · 2024-11-22T05:54:28Z

@jrouaix PR looks good, i like the nom tuple idea.

Only comment/question i have is the anyhow dev dependency. You mention no panicable code even in tests. However, due to the complexity of the log format i am inclined to prefer panics in tests, to catch any possible future errors/issues from updates.

I’ve generally try to reference this blog when determining when to use unwrap.

if u think anyhow will help make the project more maintainable, im fine with merging/keeping it.
Just curious do you know if there are any blog/reddit posts that talk about using anyhow in tests? Or any Rust applications that show its use? The small handful of Rust projects ive looked at seem to use unwrap in tests

jrouaix · 2024-11-22T09:54:58Z

Hi @puffyCid

About anyhow, I think I didn't explain myself well. It's mainly a code style preference.

The way rust check tests completion is through panics & error results too, both tests will fail.

In the example below, both test1 and test2 will fail because assert will panic.
If the_answer returns an error, both will fail too :

test1 will panic on unwrap and lead you to the exact line number
test2 will return an error and fail the test (you'll have to find the exact line, but usually a test function is super short)

  fn the_answer() -> Result<u32, MyLibError> {
    Ok(42)
  }

  #[test]
  fn test1() {
    let forty_two = the_answer().unwrap();
    assert_eq!(forty_two, 43);
  }

  #[test]
  fn test2() -> anyhow::Result<()> {
    let forty_two = the_answer()?;
    assert_eq!(forty_two, 43);
    Ok(())
  }

Personally I tend to consider unwrap & expect the same way I see unsafe : avoid until you can explain why it's way better.

So for instance :

static RE_EMAIL: once_cell::sync::Lazy<Regex> = once_cell::sync::Lazy::new(|| Regex::new(EMAIL_REGEX).expect("Regex should be valid"));

is actual production code because :

There is no way our unit tests won't catch the Regex creation failing
I really don't want to have handle this regex compilation failing at runtime, it's a non sens, it would definitively be UB.

In the end it's just me preferring chaining faillible function calls with ? rather than .unwrap().

at the cost of declaring tests with -> anyhow::Result<()> and ok wrapping on the last test line : Ok(())
that I prefer than having a brain alert when I see .unwrap() everywhere and then I see it is test code so it's fine.

Feel free to decide what style you like and i'll adapt this PR and future ones accordingly.

puffyCid · 2024-11-23T00:38:52Z

Hi @puffyCid

About anyhow, I think I didn't explain myself well. It's mainly a code style preference.

The way rust check tests completion is through panics & error results too, both tests will fail.

In the example below, both test1 and test2 will fail because assert will panic. If the_answer returns an error, both will fail too :
* `test1` will panic on `unwrap` and lead you to the exact line number

* `test2` will return an error and fail the test (you'll have to find the exact line, but usually a test function is super short)
  fn the_answer() -> Result<u32, MyLibError> {
    Ok(42)
  }

  #[test]
  fn test1() {
    let forty_two = the_answer().unwrap();
    assert_eq!(forty_two, 43);
  }

  #[test]
  fn test2() -> anyhow::Result<()> {
    let forty_two = the_answer()?;
    assert_eq!(forty_two, 43);
    Ok(())
  }
Personally I tend to consider unwrap & expect the same way I see unsafe : avoid until you can explain why it's way better.

So for instance :
static RE_EMAIL: once_cell::sync::Lazy<Regex> = once_cell::sync::Lazy::new(|| Regex::new(EMAIL_REGEX).expect("Regex should be valid"));
is actual production code because :
* There is `no way` our unit tests won't catch the Regex creation failing

* I really don't want to have handle this regex compilation failing at runtime, it's a non sens, it would definitively be UB.
In the end it's just me preferring chaining faillible function calls with ? rather than .unwrap().
* at the cost of declaring tests with `-> anyhow::Result<()>` and ok wrapping on the last test line : `Ok(())`

* that I prefer than having a brain alert when I see `.unwrap()` everywhere and then I see it is test code so it's fine.
Feel free to decide what style you like and i'll adapt this PR and future ones accordingly.

understood, thanks for the explanation.

shorter logpreamble parse implementation by more nom combinators usage

51e6026

jrouaix mentioned this pull request Nov 21, 2024

oxidize part 1 ... an Intro to an ambitious vNext ? #22

Draft

added Unicode-3.0 license allowed

ef252b1

puffyCid merged commit 7fd1d1a into mandiant:main Nov 23, 2024
5 checks passed

jrouaix deleted the oxidized_preamble_parsing branch November 25, 2024 09:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

shorter logpreamble parse implementation by more nom combinators usage #33

shorter logpreamble parse implementation by more nom combinators usage #33

jrouaix commented Nov 21, 2024 •

edited

Loading

jrouaix commented Nov 21, 2024 •

edited

Loading

puffyCid commented Nov 22, 2024 •

edited

Loading

jrouaix commented Nov 22, 2024

puffyCid commented Nov 23, 2024

shorter logpreamble parse implementation by more nom combinators usage #33

shorter logpreamble parse implementation by more nom combinators usage #33

Conversation

jrouaix commented Nov 21, 2024 • edited Loading

jrouaix commented Nov 21, 2024 • edited Loading

puffyCid commented Nov 22, 2024 • edited Loading

jrouaix commented Nov 22, 2024

puffyCid commented Nov 23, 2024

jrouaix commented Nov 21, 2024 •

edited

Loading

jrouaix commented Nov 21, 2024 •

edited

Loading

puffyCid commented Nov 22, 2024 •

edited

Loading