-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
B #6759: finalize the SG rules on sg rule error #6760
Conversation
instead of reseting the rules. Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected.
So the original code cleans up everything in case of error, the new patch just closes the processing but the chains can be inconsistent or partial. So probably it is better to keep the original behavior? |
Also note that for a VM with multiple NICs only some will be processed en case of error, so the others will be left unprotected. So yes probably having a deterministic state for the error condition is better |
Hello @rsmontero , in short, the current behavior is, when the administrator trusts OpenNebula to have the VM protected with SG and there is an error introduced in the SG, the VM is left entirely unprotected. Such VMs are easily discovered via services/ports that the administrator "believes" are blocked. So the patch tries to make OpenNebula better at protecting the VM when the SG is broken. I missed the case when there are multiple nics, because the customer had single NIC VM hacked due to SG not applied. I'll try to find time to fix this case. probably stash the error and raise it after the loop so the driver will send a note to opennebula about the issues. On SG error I believe it is better to set the VM "confined" instead of open to the internet. This way, enduser, monitoring, whatever will raise the issue of missing service that needs to be fixed which sounds much better than having a hacked VM due to SG error. |
Problem is that the rules will be semi applied, which means any subsequent update (e.g. in SG ) maybe broken as well or could not be recovered. So I still think that if the SG groups cannot be applied it is better to not apply some. The admin will not have any clue which port is protected or not. It may or may not be opened to the Internet, depending on the rule that failed. So the patch it self does not guarantee any protection, even worse it may lead to a false sense of security. In case of error it should be communicated maybe with a failed deployed if you will or in Sunstone with a warning symbol... etc. |
Let's clarify how SG works so we are on the same page. When an SG change occurs, all iptables rules are dropped and rebuilt following the logic encoded in the SG rules. Also the SG definitions whitelist what should be permitted, assuming everything else is considered blocked—that is the last "system" rule applied at the end of the SG. So, there is no unrecoverable state as everything is flushed...
Well isn't the current situation worse - it is guaranteed that in case of an SG error, the VM is left with at least one NIC entirely open to the internet?
I agree that it is clear that the patch does not consider the multiple NICs case and needs improvement to address this case. But that does not change the spirit of the patch which is my understanding of 'the false sense of security' - Leaving the VM open and vulnerable to the internet (what the ONE currently do!) instead of "locked down" in case of an error in the firewall. |
Yes, let's look into the SG. The drivers generate a set of iptables/ipset commands to implement the defined SG rules. Those commands are then executed, if one of them failed exception is raised. So, if we ignore this and leave the system with some of the rules applied it is clearly a worst situation that removing all the rules and role back:
So IMHO idempotence here is more important, and hence this patch is in the wrong direction We need to focus on the actual source of the problem, the error. Right now How is communicated to the admin, Can we improve that? Leaving the hypervisor in an inconsistent/non-deterministic state is simply not the solution for this. |
try SG on all nick before raising an error
So
Sorry but for me this is the expected behavior EDIT: extended point 2, and why this patch is not valid |
Just as a side note, all your scenario works because the wrong rule was the last one, if it was the first you will end up with an empty iptables set. The PR needs to be reformulated to validate the rules. (sorry hit the submit button before adding this) |
Updated #6759 with the right approach to this. TL;DR: The "bug" is actually in SecurityGroup::is_valid() . This needs to be extended when first port in range is greater than last, the range needs to be properly sorted. |
Sigh... First,
This assumption is not correct :) There will be Second,
True, if after editing the SG rules someone goes to the VMs list and looks for the warnings... and Third,
This is not the only case of erroneous configuration... p.s. |
So sorry, but you have to install like tens of VM any can fail, so yes ignoring the error simply DOES NOT guarantee any security As you see, we read iptables documentation and we do A LOT of validations in code, to inform the user. We are missing the multiport order. So the conclusion is the same, this PR DOES NOT solve any security issue and the proper solution is to validate the rules beforehand. |
Hi @rsmontero, I agree that this patch is not the complete solution for the issue. What I believe is, that it is an additional layer, a safeguard, against the case for the running VMs. It is an improvement of the current OpenNebula behaviour. And not affecting the normal and expected OpenNebula operation.
Here is the initial PR post: "Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected." I've never said or written that any error should be ignored. If such an assumption exists, it is probably due to translation, as I am not a native English speaker. The patch does not remove the The patch is about the impact in the case of error. It mitigates the effects if there is an erroneous case slipped through the rules validation. A VM will not start if there is an error. We are on common ground here. The user is protected from starting a vulnerable VM. The patch addresses the already running VMs, when the SG rule changed. In other words, we are discussing the case with the running VMs that are hit with the broken SG rule... about the time frame from the moment the error is introduced, until the issue is recognized and fixed by the operator. It could be a matter of minutes up to days, depending on various factors. I hope we can agree that in that time window, the running VM are currently left unprotected with no iptables applied... So the patch is doing the alternative to "powering off" the affected VMs - it is issuing the finalizing part of the SG rule algorithm and closing the gate to the open internet. ( Regarding the iptables docs... please put my shoes on and look what I see - you can set an ICMP rule with a port(range) for example and it will pass the validation leading to an error raised by the iptables... So it is not the port range only... I realize that it is a big, enormous work to keep in sync a validation of something outside your control, like any change in the iptables tools. I am trying to point out that there are more cases that should be covered by the validation. Also, I understand that such a validation could have bugs, that should be fixed. With Best Regards and no bad feelings, |
instead of reseting the rules.
Description
Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected.
Branches to which this PR applies