B #6759: finalize the SG rules on sg rule error #6760

atodorov-storpool · 2024-10-22T09:12:40Z

instead of reseting the rules.

Description

Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected.

Branches to which this PR applies

master
one-6.10

Check this if this PR should not be squashed

instead of reseting the rules. Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected.

rsmontero · 2024-11-20T15:46:16Z

Hi @atodorov-storpool

So the original code cleans up everything in case of error, the new patch just closes the processing but the chains can be inconsistent or partial.

So probably it is better to keep the original behavior?

rsmontero · 2024-11-20T15:48:38Z

Also note that for a VM with multiple NICs only some will be processed en case of error, so the others will be left unprotected.

So yes probably having a deterministic state for the error condition is better

atodorov-storpool · 2024-11-22T14:04:35Z

Hello @rsmontero ,
I wrote an answer yesterday but not submitted and got lost (obviously)

in short, the current behavior is, when the administrator trusts OpenNebula to have the VM protected with SG and there is an error introduced in the SG, the VM is left entirely unprotected. Such VMs are easily discovered via services/ports that the administrator "believes" are blocked.

So the patch tries to make OpenNebula better at protecting the VM when the SG is broken.

I missed the case when there are multiple nics, because the customer had single NIC VM hacked due to SG not applied. I'll try to find time to fix this case. probably stash the error and raise it after the loop so the driver will send a note to opennebula about the issues.

On SG error I believe it is better to set the VM "confined" instead of open to the internet. This way, enduser, monitoring, whatever will raise the issue of missing service that needs to be fixed which sounds much better than having a hacked VM due to SG error.

rsmontero · 2024-11-25T18:34:03Z

Problem is that the rules will be semi applied, which means any subsequent update (e.g. in SG ) maybe broken as well or could not be recovered.

So I still think that if the SG groups cannot be applied it is better to not apply some. The admin will not have any clue which port is protected or not. It may or may not be opened to the Internet, depending on the rule that failed. So the patch it self does not guarantee any protection, even worse it may lead to a false sense of security.

In case of error it should be communicated maybe with a failed deployed if you will or in Sunstone with a warning symbol... etc.

atodorov-storpool · 2024-11-26T07:39:35Z

Problem is that the rules will be semi applied, which means any subsequent update (e.g. in SG ) maybe broken as well or could not be recovered.

Let's clarify how SG works so we are on the same page. When an SG change occurs, all iptables rules are dropped and rebuilt following the logic encoded in the SG rules. Also the SG definitions whitelist what should be permitted, assuming everything else is considered blocked—that is the last "system" rule applied at the end of the SG. So, there is no unrecoverable state as everything is flushed...

So I still think that if the SG groups cannot be applied it is better to not apply some. The admin will not have any clue which port is protected or not. It may or may not be opened to the Internet, depending on the rule that failed.

Well isn't the current situation worse - it is guaranteed that in case of an SG error, the VM is left with at least one NIC entirely open to the internet?

So the patch it self does not guarantee any protection, even worse it may lead to a false sense of security.

I agree that it is clear that the patch does not consider the multiple NICs case and needs improvement to address this case. But that does not change the spirit of the patch which is my understanding of 'the false sense of security' - Leaving the VM open and vulnerable to the internet (what the ONE currently do!) instead of "locked down" in case of an error in the firewall.

rsmontero · 2024-11-26T08:17:22Z

Yes, let's look into the SG. The drivers generate a set of iptables/ipset commands to implement the defined SG rules. Those commands are then executed, if one of them failed exception is raised.

So, if we ignore this and leave the system with some of the rules applied it is clearly a worst situation that removing all the rules and role back:

You don't know which rules where applied so the this patch simply does not prevent leaving a VM "vulnerable to the internet", it may or may not depending on the failure.
Any manual recovery process is harder as the admin news to manually rollback the system and fix whatever

So IMHO idempotence here is more important, and hence this patch is in the wrong direction

We need to focus on the actual source of the problem, the error. Right now How is communicated to the admin, Can we improve that? Leaving the hypervisor in an inconsistent/non-deterministic state is simply not the solution for this.

try SG on all nick before raising an error

atodorov-storpool · 2024-11-26T11:47:10Z

Lets try with examples:

Initial state, the VM has 2x NICs and the following SG

and the following iptables:

An operator adds port range 80..443 (but the wrong way...)

After hitting the green Update button

The VM have the following ipptables rules as a "Security Groups applied"...
That is a false sense of security for me

Here the support have a call from an angry client because it's VM got hacked...
The administrator goes to the VM list and see that indeed there is an error?!? (what a surprise!)

##########################################
Here is the same scenario but with the patch v2 applied (just added to this PR)
##########################################

Here the operator has a call from the customer because a service is not working. (which is slightly different than a compromised VM...)

The error is clearly visible in the VM log (nf_tables): invalid portrange specified ... Command Error: sudo -n iptables -w 3 -W 20000 -A one-1-0-i -p tcp -m multiport --dports 443:80 -j RETURN

And here are the iptables rules on the host, as one can see the SG are not leaving the VM unprotected.

There is no impact on the ipset rules, no "unfixable" state, and no probably secured or probably not state... Just a VM with all vulnerable holes(ports) closed.

As could be seen there error message is there, the administrator could take actions. Are the SGs not completely applied - Yes, no argue here. BUT the VM is not left running completely open to the internet.
(As I like to say it is a devil state, yes, but with the small devil/trouble at the head...)

rsmontero · 2024-11-26T14:55:56Z

So

The VM has a clear warning sign
If the VM get this error on poweroff the VM is not allowed to boot and leaves the host in a clean state, with this patch it is not

Sorry but for me this is the expected behavior

EDIT: extended point 2, and why this patch is not valid

rsmontero · 2024-11-26T15:00:13Z

Just as a side note, all your scenario works because the wrong rule was the last one, if it was the first you will end up with an empty iptables set.

The PR needs to be reformulated to validate the rules.

(sorry hit the submit button before adding this)

rsmontero · 2024-11-26T15:11:10Z

Updated #6759 with the right approach to this.

TL;DR:

The "bug" is actually in SecurityGroup::is_valid() . This needs to be extended when first port in range is greater than last, the range needs to be properly sorted.

atodorov-storpool · 2024-11-26T15:34:33Z

Sigh...
Fine, just a three final notes on my side, though

First,

Just as a side note, all your scenario works because the wrong rule was the last one, if it was the first you will end up with an empty iptables set.

This assumption is not correct :) There will be -А one-1-0-i -j DROP iptables rule,effectively locking up the VM instead of leaving it open and unprotected to the internet...

Second,

The VM has a clear warning sign

True, if after editing the SG rules someone goes to the VMs list and looks for the warnings...

and Third,

This needs to be extended when first port in range is greater than last, the range needs to be properly sorted.

This is not the only case of erroneous configuration...

p.s.
Why someone do not just spin up a testing VM and throw some configurations or read the iptables man page?
(joker: there are no ports accepted when defining an ICMP rule, for example...)

rsmontero · 2024-11-27T10:46:33Z

So sorry, but you have to install like tens of VM any can fail, so yes ignoring the error simply DOES NOT guarantee any security

As you see, we read iptables documentation and we do A LOT of validations in code, to inform the user. We are missing the multiport order.

So the conclusion is the same, this PR DOES NOT solve any security issue and the proper solution is to validate the rules beforehand.

atodorov-storpool · 2024-11-27T12:30:22Z

Hi @rsmontero,
Re-reading my last comment now I realise that it sounds too harsh. My apologies! (Noted to not write comments at the end of the workday...)

I agree that this patch is not the complete solution for the issue. What I believe is, that it is an additional layer, a safeguard, against the case for the running VMs. It is an improvement of the current OpenNebula behaviour. And not affecting the normal and expected OpenNebula operation.

So sorry, but you have to install like tens of VM any can fail, so yes ignoring the error simply DOES NOT guarantee any security

Here is the initial PR post: "Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected."

I've never said or written that any error should be ignored. If such an assumption exists, it is probably due to translation, as I am not a native English speaker. The patch does not remove the rise error part of the logic. The error is raised so as to be "heared" by the opennebula service. It is not about replacing the input validation. It does not silence the error. Definitely, every error should be investigated in and timely fixed.

The patch is about the impact in the case of error. It mitigates the effects if there is an erroneous case slipped through the rules validation.

A VM will not start if there is an error. We are on common ground here. The user is protected from starting a vulnerable VM.

The patch addresses the already running VMs, when the SG rule changed. In other words, we are discussing the case with the running VMs that are hit with the broken SG rule... about the time frame from the moment the error is introduced, until the issue is recognized and fixed by the operator. It could be a matter of minutes up to days, depending on various factors. I hope we can agree that in that time window, the running VM are currently left unprotected with no iptables applied... So the patch is doing the alternative to "powering off" the affected VMs - it is issuing the finalizing part of the SG rule algorithm and closing the gate to the open internet. (iptables -A <chain> -j DROP).

Regarding the iptables docs... please put my shoes on and look what I see - you can set an ICMP rule with a port(range) for example and it will pass the validation leading to an error raised by the iptables... So it is not the port range only... I realize that it is a big, enormous work to keep in sync a validation of something outside your control, like any change in the iptables tools. I am trying to point out that there are more cases that should be covered by the validation. Also, I understand that such a validation could have bugs, that should be fixed.

With Best Regards and no bad feelings,
Anton Todorov

B OpenNebula#6759: finalise the SG rules on sg rule error

e18477a

instead of reseting the rules. Open Nebula will raise and log the rule error for further investigation, but the VM will not be left unprotected.

tinova linked an issue Oct 22, 2024 that may be closed by this pull request

VM is left unprotected when there is an error in the Security Group rules #6759

Open

3 tasks

rsmontero added this to the Release 6.10.1 milestone Oct 22, 2024

rsmontero added Type: Bug Category: Drivers - Network Sponsored Status: Accepted Priority: Normal labels Oct 22, 2024

rsmontero requested a review from 1gramos October 22, 2024 10:04

atodorov-storpool changed the title ~~B #6759: finalise the SG rules on sg rule error~~ B #6759: finalize the SG rules on sg rule error Oct 22, 2024

1gramos approved these changes Oct 23, 2024

View reviewed changes

dcarracedo modified the milestones: Release 6.10.1, Release 6.10.2 Nov 7, 2024

dcarracedo assigned 1gramos Nov 8, 2024

rsmontero self-requested a review November 11, 2024 19:07

rsmontero mentioned this pull request Nov 22, 2024

VM is left unprotected when there is an error in the Security Group rules #6759

Open

3 tasks

rsmontero added Status: Pending and removed Status: Accepted labels Nov 26, 2024

B OpenNebula#6759: finalise the SG rules on sg rule error (v2)

d1e4734

try SG on all nick before raising an error

rsmontero closed this Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

B #6759: finalize the SG rules on sg rule error #6760

B #6759: finalize the SG rules on sg rule error #6760

atodorov-storpool commented Oct 22, 2024 •

edited

Loading

rsmontero commented Nov 20, 2024

rsmontero commented Nov 20, 2024

atodorov-storpool commented Nov 22, 2024

rsmontero commented Nov 25, 2024 •

edited

Loading

atodorov-storpool commented Nov 26, 2024

rsmontero commented Nov 26, 2024

atodorov-storpool commented Nov 26, 2024

rsmontero commented Nov 26, 2024 •

edited

Loading

rsmontero commented Nov 26, 2024

rsmontero commented Nov 26, 2024

atodorov-storpool commented Nov 26, 2024

rsmontero commented Nov 27, 2024

atodorov-storpool commented Nov 27, 2024

B #6759: finalize the SG rules on sg rule error #6760

B #6759: finalize the SG rules on sg rule error #6760

Conversation

atodorov-storpool commented Oct 22, 2024 • edited Loading

Description

Branches to which this PR applies

rsmontero commented Nov 20, 2024

rsmontero commented Nov 20, 2024

atodorov-storpool commented Nov 22, 2024

rsmontero commented Nov 25, 2024 • edited Loading

atodorov-storpool commented Nov 26, 2024

rsmontero commented Nov 26, 2024

atodorov-storpool commented Nov 26, 2024

rsmontero commented Nov 26, 2024 • edited Loading

rsmontero commented Nov 26, 2024

rsmontero commented Nov 26, 2024

atodorov-storpool commented Nov 26, 2024

rsmontero commented Nov 27, 2024

atodorov-storpool commented Nov 27, 2024

atodorov-storpool commented Oct 22, 2024 •

edited

Loading

rsmontero commented Nov 25, 2024 •

edited

Loading

rsmontero commented Nov 26, 2024 •

edited

Loading