Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional verification of hostnames in TLS certificates #3078

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

jes
Copy link
Contributor

@jes jes commented May 5, 2023

Summary

This PR adds a "verify_hostname" modparam to TLS domains in tls_mgm, which turns on OpenSSL verification of hostnames when connecting to TLS servers. The option is off by default, which leaves the default behaviour unchanged.

Details

Without verifying hostnames, it is almost completely pointless to verify TLS certificates at all*, because anyone can get a valid TLS certificate very easily (e.g. from LetsEncrypt). The point of verifying the hostname is so that you know you're speaking to the correct peer.

Prior to this PR, OpenSIPS did not support verification of hostnames in TLS certificates.

(* unless you are running a custom CA, in which case you can of course make sure that your CA only issues certificates to people you trust)

Solution

The main complication with this was how to let tls_openssl know what hostname it is trying to connect to. Mostly IP addresses were passed around inside OpenSIPS with union sockaddr_unions. I have turned this into struct sockaddr_union_struct, with an extra char hostname[256] field. This means when OpenSIPS is passing IP addresses around, they keep the hostname attached to them.

Compatibility

The main downside is that there is an increase in memory usage of 256 bytes per instance of struct sockaddr_union_struct.

I did not check that all of the modules compile, so it is likely that some modules that I'm not using do not compile any more, so this PR is probably not ready to merge straight away. Where a piece of code tries to access a field of union sockaddr_union, that needs to change to go through the struct (e.g. foo.sin => foo.u.sin). Nothing else needs to change.

If this PR is likely to be accepted, I am happy to go through all the modules and fix them.

Closing issues

Closes #3064.

@jes
Copy link
Contributor Author

jes commented May 5, 2023

It would be really helpful to know where the OpenSIPS project stands on this, for example:

a.) Desirable and acceptable; would merge once the remaining modules are updated

b.) Desirable but not acceptable; the idea is fine in principle but this implementation is bad

c.) Undesirable; OpenSIPS will never verify hostnames

I'm happy to do more work on this if it would help, but if it will not get merged in a million years then we can cut our losses and I'll stop.

@jes
Copy link
Contributor Author

jes commented May 9, 2023

This code still has the bug that OpenSIPS can reuse a connection to a given IP address even if the hostname is different. So if an attacker can cause your server to connect to mybadname.com which resolves to 1.2.3.4, and then your server tries to connect to mygoodname.com which also resolves to 1.2.3.4, it can send it through the existing connection which is encrypted with the mybadname.com certificate.

@bogdan-iancu
Copy link
Member

Thank you @jes for your work here. I would say we are at a (b) right now considering that

  • the verification will be optional, so you can do it in both ways
  • we need to check a bit deeper the implementation alternatives in terms of impacting as few code as possible (ideally avoiding changing the sock_addr union) - but here we need to allocate some time and evaluate alternatives.

But the idea is Desirable, let us evaluate the approach for the implementation

@bogdan-iancu bogdan-iancu self-assigned this May 12, 2023
@jes
Copy link
Contributor Author

jes commented May 12, 2023

OK, that's encouraging, thanks for the update @bogdan-iancu !

This evening I had a thought about an alternative way to do this that does not require modifying everywhere else that touches sockaddr_union.

The original sockaddr_union was:

union sockaddr_union {
    struct sockaddr     s;
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
};

So one option is to add an extra type to this union, whose size is as big as the largest of those (i.e. as big as a sockaddr_in6), plus the char hostname[256].

One way to achieve this is:

union sockaddr_union {
    struct sockaddr     s;
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
    struct {
        union {
            struct sockaddr     s;
            struct sockaddr_in  sin;
            struct sockaddr_in6 sin6;
        } u; /* this field is unused, but just serves to position the hostname field past the end of the sockaddr_in6 */
        char hostname[256];
    } h;
};

So we'd add another member to the union which starts out with its own union that is the same as the other elements (just so that it ends up at the same size), plus it gains a hostname field. So to access the hostname of a sockaddr_union called u, it would just be u.h.hostname, and accessing the other fields is unchanged, and the type name is unchanged.

Or maybe this is more clear:

/* this must have the same size as the largest field of `sockaddr_union` */
union sockaddr_union_no_hostname {
    struct sockaddr     s;
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
};

union sockaddr_union {
    struct sockaddr     s;
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
    struct {
        union sockaddr_union_no_hostname _padding;
        char hostname[256];
    } h;
};

What do you think about this sort of idea?

@sobomax
Copy link
Contributor

sobomax commented May 17, 2023

The idea is good, but the implementation is too broad IMHO. The core function that does translation between hostname and sockaddr_union is hostent2su() and it is only referenced in couple of places. The function can be extended to store reference to a hostname into a struct that wraps sockaddr_union only in places where it's necessary (i.e. in the TLS path) and pass if through the call chain only in those places, not in other 150 places where sockaddr_union happens to be used.

@jes
Copy link
Contributor Author

jes commented May 17, 2023

I agree that would be simpler in principle, the issue is it's not obvious to me how to work out which places need to be able to do TLS and which ones don't. I thought it was more straightforward to apply the change everywhere.

Similarly, I put the char[256] inside the struct, instead of a pointer, because that way you don't need to worry about when the string can be free'd.

But maybe it's easy, and I just don't know the codebase well enough?

@jes
Copy link
Contributor Author

jes commented May 30, 2023

I didn't mean to close this; I've done the same again but now using the idea from this comment: #3078 (comment)

How are we looking now?

@jes jes reopened this May 30, 2023
@jes
Copy link
Contributor Author

jes commented Jul 10, 2023

hostent_shm_cpy didn't previously copy the h_name field, which is needed with the TLS hostname verification code otherwise it segfaults if it tries to access the h_name field of a hostent that was populated by hostent_shm_cpy, so I've added that now.

It still doesn't copy h_aliases which could lead to similar surprises in the future.

Maybe we'd want hostent_shm_cpy to be exactly the same as hostent_cpy in all cases, except for the shm/pkg distinction?

@sobomax
Copy link
Contributor

sobomax commented Jul 11, 2023

I agree that would be simpler in principle, the issue is it's not obvious to me how to work out which places need to be able to do TLS and which ones don't. I thought it was more straightforward to apply the change everywhere.

Similarly, I put the char[256] inside the struct, instead of a pointer, because that way you don't need to worry about when the string can be free'd.

But maybe it's easy, and I just don't know the codebase well enough?

Well, I have a conceptual problem with the approach you are taking: looks like you've started from the wrong end, i.e. you are pushing hostname into each and every name resolution the OpenSIPS performs, just so that you can consume that hostname at one or two very specific places. Having structure that can have different sizes is a Very Bad Idea[tm] as well, and is going to backfire in many places where sizeof() can be using for allocation, copying etc. You are just seeing some of it, much more is probably not visible just yet because we don't have any appreciable amount of code coverage, especially for code in in modules. And as evident from the V1 of the patch, it touches A LOT of that code.

And this is not the question of knowing the codebase well enough. In reality nobody probably does to a smaller or larger extent.

I still don't see much need in showing in hostname into sockaddr_union. Instead I'd suggest approaching it from the other end: i.e. see where do you need that hostname and how you would modify call chain to save the hostname ALONG with the resolved IP just for that specific code flow. Then create structure for that specific use case and push it in places where it really needs to be.

struct sockaddr_hostname {
  struct sockaddr_union addr;
  struct str hostname;
};

Yes, that's probably slightly longer route, but at the end it might be actually faster way for you to get this to the mergeable state.

-Max

@jes
Copy link
Contributor Author

jes commented Jul 11, 2023

I don't understand what you mean about structures that can be different sizes. Is it the sockaddr/sockaddr_in/sockaddr_in6 thing? That's just how sockaddr's work, it's already that way in the existing codebase. If you're talking about the way a hostent was copied with some fields missing if the destination is in shared memory, that's also already that way in the existing codebase. Otherwise, I'm not sure what you are referring to.

save the hostname ALONG with the resolved IP just for that specific code flow

The "specific code flow" is every single place that could send a SIP packet, which is like everywhere. If the SIP packet turns out to be transmitted over TLS, you need the hostname attached to the sockaddr so that you can check the hostname on the certificate. I think it's more complicated and surprising (and more likely to be done incorrectly) if everything that wants to send a SIP packet has to work out whether it is using TLS or not and then behave differently. It makes more sense to me if we just tack the hostname on to the sockaddr in all cases.

@@ -1881,8 +1881,18 @@ struct hostent* sip_resolvehost( str* name, unsigned short* port,
}

he = do_srv_lookup( tmp, port, dn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jes why don't we pass name->s in there directly (along with the length of the buffer, perhaps)? This way you won't need to memcpy() the result twice, first into tmp[] and then into the name->s.

@sobomax
Copy link
Contributor

sobomax commented Nov 9, 2023

@jes

I don't understand what you mean about structures that can be different sizes. Is it the sockaddr/sockaddr_in/sockaddr_in6 thing? That's just how sockaddr's work, it's already that way in the existing codebase. If you're talking about the way a hostent was copied with some fields missing if the destination is in shared memory, that's also already that way in the existing codebase. Otherwise, I'm not sure what you are referring to.

save the hostname ALONG with the resolved IP just for that specific code flow

The "specific code flow" is every single place that could send a SIP packet, which is like everywhere. If the SIP packet turns out to be transmitted over TLS, you need the hostname attached to the sockaddr so that you can check the hostname on the certificate. I think it's more complicated and surprising (and more likely to be done incorrectly) if everything that wants to send a SIP packet has to work out whether it is using TLS or not and then behave differently. It makes more sense to me if we just tack the hostname on to the sockaddr in all cases.

@jes yes, you are right, sorry for confusion. I checked the latest version of the patch and it looks much saner now. I am still wondering if extra allocation and copying can somehow be only limited to the TCP targets only. Stacking hostname in all cases has obvious disadvantage of increasing CPU usage as well as increasing memory usage, including shared memory, each time sockaddr_union is used or allocated. In 99.99999% of real-use cases that extra information is not going to be used. How about adding a flag into sockaddr_union to indicate whether it has a name recorded, and avoid any extra work if it has not? Then you can check that flag and emit an error in your checking code if somehow that got misused somehow. Then you can turn it on in sip_resolvehost() based on the value of a configuration flag. It would still cost is some CPU usage on that branch, but since it's going to be a constant, the cost should be properly absorbed by the branch prediction logic.

Thanks!

-Max

@sobomax
Copy link
Contributor

sobomax commented Nov 30, 2023

OK, I've bitten the bullet and went to re-implement the proposed change by introducing a new type: "struct sockaddr_hu", which explicitly combines normal sockaddr_union with semi-optional hostname field and propagated it through the stack. Some work in progress is here: master...sippy:opensips:pr_3078, any comments/suggestions are very welcome. Changes are big, but by no means unsurmountable.

@jes
Copy link
Contributor Author

jes commented Dec 1, 2023

@sobomax Nice one, thanks for looking at this.

I just added a commit that fixes a memory leak that I inadvertently introduced in 2efbf64

The problem is that the tm module free's the hostent's fields manually instead of using free_shm_hostent().

I don't know if you'd prefer to make tm use free_shm_hostent() rather than have it free the fields manually, but either way it needs to make sure h_name gets free'd if h_name is getting allocated.

@sobomax
Copy link
Contributor

sobomax commented Dec 3, 2023

@sobomax Nice one, thanks for looking at this.

I just added a commit that fixes a memory leak that I inadvertently introduced in 2efbf64

The problem is that the tm module free's the hostent's fields manually instead of using free_shm_hostent().

I don't know if you'd prefer to make tm use free_shm_hostent() rather than have it free the fields manually, but either way it needs to make sure h_name gets free'd if h_name is getting allocated.

Thanks @jes for the update. Please note, however, this change is irrelevant for my version of the patch since I've re-worked hostent_shm_cpy() to avoid doing extra allocation for the h_name, instead the buffer allocated for the h_addr_list is extended to also provide space for the h_name. As such, code that frees h_addr_list would also deallocate h_name.

--- a/proxy.c
+++ b/proxy.c
@@ -58,27 +58,29 @@ int disable_dns_failover=0;
 int hostent_shm_cpy(struct hostent *dst, struct hostent* src)
 {
        unsigned int len;
-       int  i;
+       int  hal_len;
        char *p;
+       size_t alen;

-       len = strlen(src->h_name)+1;
-       dst->h_name = (char*)shm_malloc(sizeof(char) * len);
-       if (dst->h_name) strncpy(dst->h_name, src->h_name, len);
-       else return -1;
+       len = strlen(src->h_name) + 1;

-       for( i=0 ; src->h_addr_list[i] ; i++ );
+       for( hal_len=0 ; src->h_addr_list[hal_len] ; hal_len++ ) {};

-       dst->h_addr_list = (char**)shm_malloc
-               (i * (src->h_length + sizeof(char*)) + sizeof(char*));
+       alen = hal_len * (src->h_length + sizeof(char*));
+       alen += sizeof(void*);
+
+       dst->h_addr_list = (char**)shm_malloc(alen + len);
        if (dst->h_addr_list==NULL) {
-               shm_free(dst->h_name);
                return -1;
        }

-       p = ((char*)dst->h_addr_list) + (i+1)*sizeof(char*);
-       dst->h_addr_list[i] = 0;
+       dst->h_name = ((char *)dst->h_addr_list) + alen;
+       memcpy(dst->h_name, src->h_name, len);
+
+       p = ((char*)dst->h_addr_list) + (hal_len+1)*sizeof(char*);
+       dst->h_addr_list[hal_len] = NULL;

-       for( i-- ; i>=0 ; i-- ) {
+       for( int i = hal_len - 1 ; i>=0 ; i-- ) {
                dst->h_addr_list[i] = p;
                memcpy( dst->h_addr_list[i], src->h_addr_list[i], src->h_length );
                p += src->h_length;
@@ -93,8 +95,6 @@ int hostent_shm_cpy(struct hostent *dst, struct hostent* src)

 void free_shm_hostent(struct hostent *dst)
 {
-       if (dst->h_name)
-               shm_free(dst->h_name);
        if (dst->h_addr_list)
                shm_free(dst->h_addr_list);
 }

@jes
Copy link
Contributor Author

jes commented Dec 5, 2023

Btw @sobomax I wonder if you could educate me - why is it that the shm hostent stuff doesn't allocate the name? Apparently this is the obvious behaviour, because tm doesn't bother freeing the name, and you prefer to put the name in the h_addr_list field instead of the h_name field. Could you help me understand?

@sobomax
Copy link
Contributor

sobomax commented Dec 6, 2023

Btw @sobomax I wonder if you could educate me - why is it that the shm hostent stuff doesn't allocate the name? Apparently this is the obvious behaviour, because tm doesn't bother freeing the name, and you prefer to put the name in the h_addr_list field instead of the h_name field. Could you help me understand?

Well, that's kind of easy. In the hostent_shm_cpy() we allocate buffer sufficient to hold h_addr_list entries (plus some extra for a terminating NULL), so instead of allocating another shared memory chunk I am just allocating a bigger chunk to also have sufficient storage for the hostname and point h_name to the start of that section in that bigger chunk.

So we are going from this:

                  +---shm_malloc#1----+
dst->h_addr_list->| h_addr_list[0..N] |
                  +-------------------+


                  +-shm_malloc#2-+
dst->h_name------>| h_name       |
                  +--------------+

To this:

                  +---shm_malloc#1----+
dst->h_addr_list->| h_addr_list[0..N] |
                  +-------------------+
dst->h_name------>| h_name       |
                  +--------------+

Not only it simplifies deallocation code, but is also going to improve cache locality as well as avoid additional shared memory fragmentation.

@sobomax
Copy link
Contributor

sobomax commented Dec 6, 2023

P.S. Lots of complexity of the current name resolution code steams from the unfortunate fact that OpenSIPS historically (ab)uses system-provided "struct hostent" to hold all this data and pass it between threads and that locks us into a certain structure. As such we have no control over the layout of that struct and also forced to use nil-terminated string instead of more safe and convenient str / str_const. I am really inclined to rip that off and roll our very own version of the said struct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] OpenSIPS does not verify hostnames in TLS certificates (?)
3 participants