-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend the authlog / secure log plugin #342
Extend the authlog / secure log plugin #342
Conversation
Extend code for Issue fox-it#286
_TS_REGEX = r"^[A-Za-z]{3}\s*[0-9]{1,2}\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}" | ||
RE_TS = re.compile(_TS_REGEX) | ||
RE_TS_AND_HOSTNAME = re.compile(_TS_REGEX + r"\s\S+\s") | ||
|
||
|
||
# New regex pattern for getting the log entries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some tests for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add some tests for this scenario.
Should i place this tests into tests/data/unix/logs/auth/auth.log (and zips) and tests/test_plugins_os_unix_log_auth.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the test data: tests/data/plugins/os/unix/log/auth/auth.log
would make more sense on how it is currently structured in the tests.
The test file path is fine :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no tests have been added yet. Just to confirm tests/data/plugins/os/unix/auth/auth.log
is a perfect location for the test data.
data.update(self.apply_regex_on_message(RE_SSH_ACCEPTED_PASSWORD, log_entry["message"])) | ||
data.update(self.apply_regex_on_message(RE_SSH_ACCEPTED_PUBLICKEY, log_entry["message"])) | ||
data.update(self.apply_regex_on_message(RE_SSH_PAM_UNIX, log_entry["message"])) | ||
data.update(self.apply_regex_on_message(RE_SSH_CONNECTION, log_entry["message"])) | ||
data.update(self.apply_regex_on_message(RE_SUDO_COMMAND, log_entry["message"])) | ||
data.update(self.apply_regex_on_message(RE_CRON_PAM_UNIX, log_entry["message"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of always doing reg expressions on the log_entry, would it not be better if you did some short per-processing first?
e.g. Check if the message starts with some unique identifiers.
Basically create a selector for the regex you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've try this with a loop and a break. It is not ideal but bettern than before. I will think about the identifier solution. But with the "pam unix" entries it will be a little bit difficult.
|
||
# New regex pattern for interpreting the SSH message "Accepted password" | ||
# Mar 29 10:43:01 my_unix_host sshd[1193]: Accepted password for test_user from 127.0.0.1 port 52942 ssh2 | ||
RE_SSH_ACCEPTED_PASSWORD = re.compile(r'^Accepted\spassword\sfor\s(?P<user>[\S\_]+)\sfrom\s(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})\sport\s(?P<port>[0-9]{1,5})\s(?P<authservice>\w+)$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some duplicate parts inside your regular expressions. Could you make it more concise and consistent?
For example I do see that the <user>
information is used in a lot of regular expressions, however they are all different, it might be nice to make them consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Accepted password and Accepted Publickey, both messages handle the same user. If I pump the data into a eks or somthing else, the keys could be filtered on all entries/entities. So i can match the User activity to other data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean more, that in this case you have made a lot of duplicate regexes. Another example would be the ip addresses regex that are duplicated everywhere.
It might also be an idea to split the regex over multiple lines. So instead of:
re.compile('^Accepted\spassword\sfor\s(?P<user>[\S\_]+)\sfrom\s(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})\sport\s(?P<port>[0-9]{1,5})\s(?P<authservice>\w+)$')
you can do:
re.compile(
'^Accepted\spassword\sfor\s' # Check for correct preamble
'(?P<user>[\S\_]+)\sfrom\s' # Retrieve user
'(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})'
'\sport\s(?P<port>[0-9]{1,5})\s(?P<authservice>\w+)$'
)
which makes it a bit easier to read and comment too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@damasch My apologies for the very long wait, I will pick this up right now and give a more in depth review today.
In addition I will do some more local tests with different distributions to see whether it works well.
|
||
# New regex pattern for interpreting the SSH message "Accepted password" | ||
# Mar 29 10:43:01 my_unix_host sshd[1193]: Accepted password for test_user from 127.0.0.1 port 52942 ssh2 | ||
RE_SSH_ACCEPTED_PASSWORD = re.compile(r'^Accepted\spassword\sfor\s(?P<user>[\S\_]+)\sfrom\s(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})\sport\s(?P<port>[0-9]{1,5})\s(?P<authservice>\w+)$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean more, that in this case you have made a lot of duplicate regexes. Another example would be the ip addresses regex that are duplicated everywhere.
It might also be an idea to split the regex over multiple lines. So instead of:
re.compile('^Accepted\spassword\sfor\s(?P<user>[\S\_]+)\sfrom\s(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})\sport\s(?P<port>[0-9]{1,5})\s(?P<authservice>\w+)$')
you can do:
re.compile(
'^Accepted\spassword\sfor\s' # Check for correct preamble
'(?P<user>[\S\_]+)\sfrom\s' # Retrieve user
'(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})'
'\sport\s(?P<port>[0-9]{1,5})\s(?P<authservice>\w+)$'
)
which makes it a bit easier to read and comment too
_TS_REGEX = r"^[A-Za-z]{3}\s*[0-9]{1,2}\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}" | ||
RE_TS = re.compile(_TS_REGEX) | ||
RE_TS_AND_HOSTNAME = re.compile(_TS_REGEX + r"\s\S+\s") | ||
|
||
|
||
# New regex pattern for getting the log entries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the test data: tests/data/plugins/os/unix/log/auth/auth.log
would make more sense on how it is currently structured in the tests.
The test file path is fine :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I see some redundant definitions, for example, all the PAM
regexes are pretty much the same. (besides the service it starts with) So i recommend unifying that information.
It might also be an idea to try and detect what you are working with dependent on:
service
and the start of the message
so you can select which regex to use that way. Instead of trying them all out.
But before doing all of that, I would recommend adding your tests first, so it becomes easier to refactor for you :)
("string", "userid"), | ||
("string", "useridassociate"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
("string", "userid"), | |
("string", "useridassociate"), | |
("string", "uid"), | |
("string", "associate_uid"), |
wouldn't this make it more readable?
'ts': ts, | ||
} | ||
|
||
log_entry = re.match(RE_ENTRY, line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a regex that also gets the message, would a line.split(":", 1)
not also work here?
As far as I can see, the :
is a constant that splits the service
and process_id
from the message
then it would look a bit like this:
prefix, message = line.split(":", 1)
# get service and process_id
log_entry = re.match(RE_ENTRY, prefix)
...
# Do other stuff with message here
|
||
for regex in RE_LIST: | ||
matches = self._apply_regex_on_message(regex, log_entry["message"]) | ||
if matches != {}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can but the accessing of message before the for loop, so it doesn't do it every iteration
And using the :=
in if conditions will tell it to only go into that condition if matches
is a not None
or "empty" such in the case of an empty dict.
for regex in RE_LIST: | |
matches = self._apply_regex_on_message(regex, log_entry["message"]) | |
if matches != {}: | |
message = log_entry["message"] | |
for regex in RE_LIST: | |
if matches := self._apply_regex_on_message(regex, message): |
# New regex pattern for interpreting the session message "cron" | ||
# Mar 29 17:07:25 my_unix_host sshd[4651]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0) | ||
RE_CRON_PAM_UNIX = re.compile(r'^pam_unix\(cron:(?P<cron>.*)\): +session +(?P<sessionmode>closed|opened)\sfor\suser\s(?P<user>\w+)(?:\(uid=(?P<useridassociate>\w+)\))?(?:\sby\s)?(?:\(uid=(?P<userid>\w+)\))?$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one not redundant? I see it is very specific for cron
, but basically does the same as RE_SSH_PAM_UNIX
.
I'd suggest changing it to RE_PAM_UNIX
with r"^pam_unix\(?(P<auth_service>\w+):(P<service_call>\w+)\):"
or something similar.
Then it is more flexible in regards of any future pam_unix
information.
|
||
# New regex pattern for interpreting the connection mode on close message "ssh" | ||
# Mar 29 17:07:19 my_unix_host sshd[4649]: Connection closed by 85.245.107.41 port 54790 [preauth] | ||
RE_SSH_CONNECTION = re.compile(r'^Connection\s(?P<conectionmode>closed)\sby\s(?P<remoteip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4}:[0-9a-zA-Z]{0,4})\sport\s(?P<port>[0-9]{1,5})$(?P<misc>.*)$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is $(?P<misc>.*)$
even used? as I recall that $
defines then end of the line if I remember correctly.
|
||
# New regex pattern for interpreting the PAM UNIX message "ssh" | ||
# Jul 5 13:20:15 test-VirtualBox sudo: pam_unix(sudo:session): session opened for user root(uid=0) by test(uid=0) | ||
RE_SSH_PAM_UNIX = re.compile(r'pam_unix\((?P<authservice>sshd):.*\): +session +(?P<sessionmode>closed|opened)\sfor\suser\s(?P<user>\w+)(?:\(uid=(?P<useridassociate>\w+)\))?(?:\sby\s)?(?:\(uid=(?P<userid>\w+)\))?$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about the pam for the ssh
here.
_TS_REGEX = r"^[A-Za-z]{3}\s*[0-9]{1,2}\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}" | ||
RE_TS = re.compile(_TS_REGEX) | ||
RE_TS_AND_HOSTNAME = re.compile(_TS_REGEX + r"\s\S+\s") | ||
|
||
|
||
# New regex pattern for getting the log entries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no tests have been added yet. Just to confirm tests/data/plugins/os/unix/auth/auth.log
is a perfect location for the test data.
It's been a infinity since I can work for this project. I have to apologize for this very long delay. Are you interested in a remote web talk about this request? Maybe we can organize a talk about this. It is ok for you? Thanks |
@damasch thank you for your contribution! As this is your first code contribution, please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information:
Contributor License Agreement
Contribution License AgreementThis Contribution License Agreement ("Agreement") governs your Contribution(s) (as defined below) and conveys certain license rights to Fox-IT B.V. ("Fox-IT") for your Contribution(s) to Fox-IT"s open source Dissect project. This Agreement covers any and all Contributions that you ("You" or "Your"), now or in the future, Submit (as defined below) to this project. This Agreement is between Fox-IT B.V. and You and takes effect when you click an “I Accept” button, check box presented with these terms, otherwise accept these terms or, if earlier, when You Submit a Contribution.
|
@damasch it seems our bot missed you when asking for a CLA.
As for your question, we can set something up to elaborate on the suggested changes. If you send us your email on [email protected] we'll schedule a call |
What is the status of this PR? |
This PR has been a bit back and forth unfortunately. @damasch, if you're still willing to accept the CLA, we can take it from here and process the suggestions ourselves. |
Hello @damasch, we haven't heard a response from you. Could you still sign the CLA? Otherwise, we will close this pr in 2 weeks due to inactivity. |
And re-implement it ourselves? |
yes |
Closed due to inactivity |
Extend the authlog / secure log plugin
#286