-
-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON lines aggregate results #126
Conversation
@@ -122,7 +122,7 @@ func TestJSONOutput(t *testing.T) { | |||
filetype: filetype, | |||
errors: errors, | |||
infos: infos, | |||
want: `{"url":"http://test.com.pdf?id=5","method":"GET","status_code":200,"words":1,"lines":1,"content_type":"application/pdf","content_length":128,"matches":{"filetype":{"extension":"pdf","severity":7},"parameters":[{"name":"id","attacks":[]}],"errors":[{"name":"MySQL error","match":"it is a MySQL error happening"}],"infos":[{"name":"info1","match":"its my pleasure to inform you on this great day"}],"secrets":[{"name":"mysecret","match":"it's a random day for my secret regex to be found"}]}}`, //nolint:lll | |||
want: `{"url":"http://test.com.pdf?id=5","method":"GET","status_code":200,"words":1,"lines":1,"content_type":"application/pdf","content_length":128,"matches":{"filetype":{"extension":"pdf","severity":7},"parameters":[{"name":"id","attacks":[]}],"errors":{"MySQL error":["it is a MySQL error happening"]},"infos":{"info1":["its my pleasure to inform you on this great day"]},"secrets":{"mysecret":["it's a random day for my secret regex to be found"]}}}`, //nolint:lll |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo this new format is a bit harder to parse than the previous one, for instance with jq
you would need to know the error name in advance, or make 2 different calls to parse it. It also prevents adding more data later, which relates to my other comment about MatcherResult
struct.
That said, the general idea of combining multiple secrets of the same type is good.
Maybe something like:
"secrets":[
{
"type":"Regex",
"results": [
{"match": "it's a random day for my secret regex to be found", "source": "body", "location": "line 42"}
]
}
]
would be better, while still doing some grouping
type MatcherResult struct { | ||
Name string `json:"name"` | ||
Match string `json:"match"` | ||
Errors map[string][]string `json:"errors,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep the previous MatcherResult
struct for the output, so that we can evolve it later. Maybe there will be more keys than 'Name' and 'Match' later on, and we could add them here. For instance we could add details for each match (which line it was matched at, or the location it was found at (HTML body, URL, parameters,...).
Thanks for the advice :)) appreciated!! |
Adding another two cents from my usage of cariddi recently: I came across huge matches like: [
{
"name":"PHP error",
"match":"PHP error"
},
{
"name":"MySQL error",
"match":"warning_forbid_default_priv"<MORE THAN 20000 LINES HERE>"
}
] which completely destroy my terminal 😄 So we might think about either:
We could end up with a JSON format like: [
{
"name": "MySQL Error",
"results": [
{
"type": "Regex",
"details": {"match": "Warning: ...<truncated_output>mysqli error: need new cache refresh... <truncated_output>", "regex": "(?i)Warning.*?mysqli?", "location": "line 42", "source": "body"}
}
]
}
] Additionally, regexes have their limits - ideally we want to see one step further and create some kind of pattern-recognition algorithms, or using even using ML for this kind of tasks. It could be a good evolution for cariddi ;) The [
{
"type": "PatternFinder",
"details": {"match": "Warning: ...<truncated_output>mysqli error: need new cache refresh... <truncated_output>", "matcher": "error-finder", "version": "2.0.1"}
},
{
"type": "ML",
"details": {"model_name": "my-awesome-ml-model", "version": "0.0.1"}
}
] There is also room to improve the findings by filtering which ones are found important or not, for instance:
Those "rules" could be first hardcoded by us on a case-by-case and then learned by ML as well at some point, and a There might be a need to create separate issues for some of those points since it's not directly linked to the JSON lines aggregation. Feel free to copy-paste some of my comments there. |
Hi @ocervell . I've thought a lil bit before commenting on this. Imo the best thing to do is this:
|
This PR closes #115.
@ocervell what do u think?
This is a comparison test with the one shown in the issue: