
Feature request: JSON/YML multi-pattern input and result output #76

Open
LaurensBrinker opened this issue Nov 14, 2022 · 1 comment

@LaurensBrinker

I'm quite new to Weggli, but as far as I can tell, it currently does not support reading patterns from an input file or writing results to an output file, and each pattern check requires a separate execution of Weggli.

For context - I've been playing around with Semgrep, which allows you to specify a patterns yml file with multiple patterns to check, and can output the findings to a .json file for easy parsing. Keen to hear thoughts, but it would be nice if Weggli could support something like this:

Provide a patterns.yml file containing multiple patterns like this:

  - id: double-free
    metadata:
      references:
        - https://cwe.mitre.org/data/definitions/415
        - https://github.com/struct/mms
        - https://www.sei.cmu.edu/downloads/sei-cert-c-coding-standard-2016-v01.pdf
        - https://docs.microsoft.com/en-us/cpp/sanitizers/asan-error-examples
        - https://dustri.org/b/playing-with-weggli.html
      confidence: MEDIUM
    message: >-
      The software calls free() twice on the same memory address,
      potentially leading to modification of unexpected memory locations.
    severity: ERROR
    languages:
      - c
      - cpp
    pattern: "{free($a); NOT: goto _; NOT: break; NOT: continue; NOT: $a = _; free($a);}" 
    extra_args:
      - "--unique"
  
  - id: uninit-pointers
   .....

Run something like weggli --input /path/to/patterns.yml --output /path/to/results.json /path/to/codebase
Weggli would then run all patterns on the specified codebase (if possible) and, for example, generate a JSON output file which looks something like this:

{
  "errors": [],
  "results": [{
      "id": "double-free",
      "start": { "col": 10, "line": 42, "offset": 701 },
      "end": { "col": 25, "line": 42, "offset": 716 },
      "extra": {
        "fingerprint": "79965871385669e43",
        "is_ignored": false,
        "lines": "  ...\nint alloc_and_free2()\n{\n    char *ptr = (char *)malloc(MEMSIZE);\n    free(ptr);\n    ptr = NULL;\n    free(ptr);\n}\n....",
        "message": "The software calls free() twice on the same memory address, potentially leading to modification of unexpected memory locations.",
        "metadata": {
          "confidence": "MEDIUM",
          "references": [
              "https://cwe.mitre.org/data/definitions/415",
              "https://github.com/struct/mms",
              "https://www.sei.cmu.edu/downloads/sei-cert-c-coding-standard-2016-v01.pdf",
              "https://docs.microsoft.com/en-us/cpp/sanitizers/asan-error-examples",
              "https://dustri.org/b/playing-with-weggli.html"
          ]
        },
        "metavars": {},
        "severity": "ERROR"
      },
      "path": "test-data/sample_inputs/c-and-cpp/double-free.c"
   }]
}

Again - I know that Weggli doesn't support this kind of behavior atm and that it runs for each individual pattern (afaik, specifying additional patterns with -p is an "AND", rather than an "OR"). But just wanted to see if this is something that has been considered already?
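
In the meantime, the flow above can be approximated with a small wrapper script. The following is only a rough sketch under several assumptions: the rules are stored as JSON rather than YAML (to stay within the Python standard library), weggli is invoked once per rule as weggli [OPTIONS] <PATTERN> <PATH>, and its text output is stored unparsed. The helper names (load_rules, build_command, run_rules) and the report fields are made up for illustration and are not part of Weggli.

```python
import json
import subprocess
from typing import Callable, Dict, List


def load_rules(path: str) -> List[Dict]:
    """Load a list of rule objects (id, pattern, extra_args, ...) from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_command(rule: Dict, target: str) -> List[str]:
    """Build one weggli invocation for a single rule.

    Assumes the usual CLI shape: weggli [OPTIONS] <PATTERN> <PATH>.
    """
    return ["weggli", *rule.get("extra_args", []), rule["pattern"], target]


def run_weggli(cmd: List[str]) -> str:
    """Run weggli and return whatever it printed to stdout."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return proc.stdout


def run_rules(
    rules: List[Dict],
    target: str,
    runner: Callable[[List[str]], str] = run_weggli,
) -> Dict:
    """Run every rule separately ("OR" semantics) and collect one report."""
    report: Dict = {"errors": [], "results": []}
    for rule in rules:
        try:
            output = runner(build_command(rule, target))
        except OSError as exc:  # e.g. the weggli binary is not on PATH
            report["errors"].append({"id": rule["id"], "error": str(exc)})
            continue
        if output.strip():
            # Hypothetical result shape: the raw text output is kept as-is,
            # since this sketch does not try to parse match locations.
            report["results"].append({
                "id": rule["id"],
                "severity": rule.get("severity"),
                "message": rule.get("message"),
                "output": output,
            })
    return report
```

Usage would be roughly json.dumps(run_rules(load_rules("patterns.json"), "/path/to/codebase"), indent=2), written to whatever results file is wanted. The runner argument exists only so the weggli call can be swapped out, for example in tests.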

@LordCasser

I added this feature in my fork; you may want to try it.
weggli-enhance
