Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement TLS support for incoming redis connections #141

Merged
merged 3 commits into from
Sep 7, 2021
Merged

Implement TLS support for incoming redis connections #141

merged 3 commits into from
Sep 7, 2021

Conversation

rukai
Copy link
Member

@rukai rukai commented Aug 17, 2021

Design

The plan is to specify paths to keys and certs like this:

sources:
  redis_prod:
    Redis:
      batch_size_hint: 1
      listen_addr: "127.0.0.1:6379"
      tls:
        certificate_path: "examples/redis-tls/tls_keys/redis.crt"
        private_key_path: "examples/redis-tls/tls_keys/redis.key"
chain_config:
  redis_chain:
    - RedisDestination:
        remote_address: "127.0.0.1:1111"
        tls:
          certificate_path: "examples/redis-tls/tls_keys/redis.crt"
          private_key_path: "examples/redis-tls/tls_keys/redis.key"
named_topics:
  testtopic: 5
source_to_chain_mapping:
  redis_prod: redis_chain

I think repeating the same paths when we are just doing a simple pass through like this is fine.
It keeps the configuration format simple and allows flexibility of seperate keys between incoming and outgoing connections and even between different sources and outgoing endpoints.
Open to other ideas though.

This PR

But the scope of this PR is to just implement redis support for incoming TLS connections.
After this PR I will do two more PRs for:

  1. redis outgoing
  2. cassandra ingoing and outgoing

Questions

I have used the key/certificate formats as described here https://redis.io/topics/encryption
Are there any other specific key/certificate formats I should/need to support and test with?

TODO

Currently the redis-tls/redis-cli.sh script can be used to create a connection when shotover is configured to use redis tls.
But I still need to do the following:

  • add integration test
  • address TODO comments
  • trim out unneeded keys from examples/redis-tls/tls_keys, not sure if we need any of these for different test cases.

@benbromhead
Copy link
Member

Looks good!

I think supporting different keys for the listener and upstream totally makes sense (though it makes me have terrible thoughts about yaml and templating...).

cc @johndelcastillo whats your feel on the key formats / having to specify a pub and private as seperate files. Some formats will allow us to just put both in a single file (e.g. pem iirc).

Additionally we should allow (irrespective of where we land above) the ability to configure a store of ca certs to trust when building the connection. That way we will have the ability to configure:

  • Shotovers identity on either side of the connection
  • What CAs shotover will trust and then accept connections from.

From a PCI DSS perspective we may need to also be able to allow/deny specific TLS algorithms or modes?? @johndelcastillo ?

cc @Claudenw -> Any input on cert config ergonomics?

@johndelcastillo
Copy link
Contributor

  • The structure looks good to me.
  • Separate files for certs and keys is pretty standard, so I'm fine with that.
  • There are minimum algorithm requirements, so being able to restrict them would be convenient.

@rukai
Copy link
Member Author

rukai commented Aug 19, 2021

Update on how this is going:
Implementing the integration test has proven to be more complicated than expected.
redis-rs does not support a way to use a custom TLS connection over the sync API. redis-rs/redis-rs#527
So my options for adding TLS integration tests are:

  • Implement redis::Connection::new as per the issue I raised. This looks quite involved and will need to rework a lot stuff in redis-rs
  • convert all the redis tests to be async - this seems undesirable, I think its better for the tests to be sync to keep them simple. But maybe its not so bad?
  • hack in async test logic for just this test - will need to recreate the connection retry logic...
  • write the test in sync code but hack it together without redis-rs, just directly send hardcoded redis message bytes over TLS - will need to recreate the connection retry logic...

I'll probably try one of the last 2 hacky solutions and see how it goes.

Edit:
An idea I had is to replace the connection retry logic with logic on DockerCompose::new to block until we have a port up.
If I can get this working it should make the last 2 solutions more appealing.

@benbromhead
Copy link
Member

Doing async tests isn't so bad with the tokio macro annotations (other tests have them) and it largely doesn't create additional complexity.

See https://docs.rs/tokio/1.10.0/tokio/attr.test.html

@XA21X XA21X linked an issue Aug 24, 2021 that may be closed by this pull request
@rukai
Copy link
Member Author

rukai commented Aug 31, 2021

It turns out that rustls, the TLS library I am using, does not support connecting to servers via an IP address, it only supports connecting via a domain name.
The problem is clearly seen in the TlsConnector::connect API which requires a DNSNameRef. https://docs.rs/tokio-rustls/0.22.0/tokio_rustls/struct.TlsConnector.html
And in this issue: rustls/rustls#184
But problem spans all the way down the stack to the webpki crate briansmith/webpki#54
There is a webpki PR for it, but who knows how long it will take to finish and then end up exposed all the way to the top of the TLS stack: briansmith/webpki#218

Is IP address connections a production requirement?
It would certainly simplify writing integration tests to have access to it. Not sure how we could assign a domain name to localhost from within an integration test...

Because rustls is pure rust, using it greatly simplifies building and deploying.
However given this issue I'll be swapping to tokio-native-tls instead, as that seems to be the only way forward.

@benbromhead
Copy link
Member

tl;dr yup go ahead with tokio-native-tls as it is indeed a req to connect via IP

long version:
We don't generate dns names for the all hosts in all scenarios and I honestly can't remember what the signed subject is for the certs we generate. <- @johndelcastillo any thoughts?

From a security perspective it would be great if we started leveraging certs for host identity but i know we don't atm.

@rukai we can use prod instaclustr redis deployments for testing if needed (we even have a terraform provider to help setup infra for tests).

@rukai
Copy link
Member Author

rukai commented Aug 31, 2021

I have now discovered that the native-tls crate only supports PKCS#12 sfackler/rust-native-tls#27
Apparently PKCS#12 is not well designed and something like PKCS#8 or PEM would be better.
@johndelcastillo thoughts on PKCS#12?

Should I try looking for another library?
the native-tls crate uses openssl on linux, schannel on windows and secure transport on mac os.
Maybe I can get away with just using openssl via https://github.com/sfackler/tokio-openssl

@rukai
Copy link
Member Author

rukai commented Aug 31, 2021

Looks like openssl covers all that we need (pem and ip address connections) so I'll go down that path for now https://docs.rs/openssl/0.10.36/openssl/ssl/index.html

@rukai
Copy link
Member Author

rukai commented Sep 1, 2021

Made a lot of progress but also hit a very complicated issue.
I have completely rewritten the PR with openssl instead of rustls.
I can connect to shotover over TLS with the redis-cli.sh script in the PR.

However when I try to connect via openssl in the integration test it panics with:
thread 'redis_int_tests::basic_driver_tests::test_tls_async' panicked at 'called Result::unwrap() on an Err value: Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 337047686, library: "SSL routines", function: "tls_process_server_certificate", reason: "certificate verify failed", file: "ssl/statem/statem_clnt.c", line: 1914 }]))) }', shotover-proxy/tests/helpers/mod.rs:97:51

I can reproduce this issue in the tiny test used in tokio-openssl when modified to use the keys in the way redis uses them.
The regular test passes fine:
https://github.com/sfackler/tokio-openssl/blob/1d22cbdd2104b64778ae1eacd98de65aa8affcd4/src/test.rs#L38

But when I change it to use certificates in the same way redis does, I get the same error I am getting in shotover:
rukai/tokio-openssl@36f9b6c

Using openssl has the nice advantage that I can look into what openssl calls redis and redis-cli are making and compare them to what shotover and the integration test are doing.
I can see that that when I change SSL_CTX_set_verify(SSL_VERIFY_PEER);, to SSL_CTX_set_verify(SSL_VERIFY_NONE); The issue stops occurring.
However after investigating I discovered that redis-cli uses SSL_CTX_set_verify(SSL_VERIFY_PEER); by default. So changing SSL_VERIFY is not the fix.

Tomorrow I will try compiling shotover with the same version of openssl that redis is using and see if that helps.
Edit: They were running the same version, I confused myself here over something silly.

@benbromhead
Copy link
Member

benbromhead commented Sep 1, 2021

Double check the certificates we are using can be verified (if they are the ones pulled directly from redis examples should be ok, or if they can be used with redis set to verify without shotover in the middle). Double check things like the Subject name in the cert.

Otherwise it looks like the error is in how we are setting up the openssl context to verify the peer / connection. From a cursory glance at tls.rs it "looks" ok, but tls stuff is always finicky.

We also appear to setting the tls security to modern which is slightly stricter than intermediate. What profile does the redis server use (if it matches one at all)?

@rukai
Copy link
Member Author

rukai commented Sep 2, 2021

Thanks to your tips I've figured it out.

Redis server and client work fine with the certificates I am trying to use.

Looking at the certificates generated by the redis supplied script, I can see that the CN is set to "Generic-cert" unlike the rust-openssl example certs which use "localhost"
image

And then sure enough, comparing the handling of address validation between redis and shotover I can see that only rust-openssl calls these functions: X509_VERIFY_PARAM_set1_host/X509_VERIFY_PARAM_set1_ip
And removing them from rust-openssl fixes the issue.

So either:

  • redis is not correctly implementing TLS and has a security flaw. We should look into fixing that and generate a proper cert for shotover.
  • rust-openssl needs to be patched to allow us to connect without a certificate setup with a proper CN.

@benbromhead
Copy link
Member

Great stuff!

So maybe double check if Redis has a cert mode of operation that will verify things (could be config), otherwise this is one of those wonderful scenarios where shotover aims to be protocol and implementation compatible... bugs and all.

Practically to move forward.

  • I would as part of the tls config, allow the user to turn off host/IP verification (have it on by default). Our Redis source should then not verify host/IP if Redis currently does not.
  • Double check redis config and their GH issues / PRs to see if there is existing chatter about it and raise an issue if it turns out to be a genuine one. If you (or someone else, maybe a good one for @Claude-at-Instaclustr ) wants to write a patch / PR for Redis go for it, but I wouldn't leave that as a blocker for this implementation.

@rukai
Copy link
Member Author

rukai commented Sep 2, 2021

The PR is now ready for review!

The integration test is pretty barebones because I cant access all the helper methods as they are dont work with an async connection.
I intend to make a separate PR to rewrite all the integration tests to use async once #162 is merged.
So we can either:

  • wait for the integration test async rewrite to be completed, then make use of it in this PR.
  • merge this PR as is and then fix up the test in the integration test async rewrite.

I have left a warning in to remind us to continue investigating the TLS issue after this is merged.
I'll make an issue to track it as well.

shotover-proxy/src/sources/redis_source.rs Show resolved Hide resolved
shotover-proxy/src/tls.rs Show resolved Hide resolved
shotover-proxy/src/server.rs Outdated Show resolved Hide resolved
@benbromhead
Copy link
Member

lgtm!

pub fn new(tls_config: TlsConfig) -> Result<TlsAcceptor> {
let mut builder = SslAcceptor::mozilla_modern_v5(SslMethod::tls())?;
builder.set_ca_file(tls_config.certificate_authority_path)?;
builder.set_private_key_file(tls_config.private_key_path, SslFiletype::PEM)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if these files are malformed? I think it should be similar to how we handle other malformed or invalid data in the topology.yaml config file.

})
}

pub async fn connect(&self, tcp_stream: TcpStream) -> Result<SslStream<TcpStream>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen when the TLS connection fails? I changed the key file to cause this and test_tls still passed.

Copy link
Member Author

@rukai rukai Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you change? Maybe you got lucky and changed something that wasnt part of the key?
I got a failure like this:

image

Copy link
Member Author

@rukai rukai Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is actually more relevant to your other comment.
The base64 was invalid and this is how the error is handled.

I'll try to investigate what happens if I modify the key to be incorrect not just malformed.

I'll also check if modifying the key like this affects redis.

Copy link
Member

@conorbros conorbros Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you change to get that output?

Edit: Oh I see now: "U" to "0"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its in the diff in the screenshot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its actually a deleted character which is why it failed to parse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@conorbros what exactly did you change and how did it pass? 😮

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed one of the characters in the key file.

@conorbros
Copy link
Member

Since using TLS with Redis can add big performance overheads perhaps we should add another benchmark.

@rukai
Copy link
Member Author

rukai commented Sep 3, 2021

I'm thinking we can add benchmarks in a follow up PR.
Even if the benchmark indicated that we had to optimize something, I would still want to merge as is and do the optimization in a follow up PR.
I created an issue: #170

let ssl = self
.connector
.configure()?
.verify_hostname(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense given the lack of hostname support in the context of cluster mode: redis/redis#2186

We can connect to a node by hostname to discover other nodes, but the other nodes are only referenced by IP, so unless you require reverse DNS records (with custom verifier), we don't have the hostnames to check.

We mitigate this for Cassandra by including iPAddress (and dNSName) in the subjectAltName.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh interesting, I added your comment to #167

@rukai
Copy link
Member Author

rukai commented Sep 7, 2021

@XA21X I have addressed your feedback

@rukai
Copy link
Member Author

rukai commented Sep 7, 2021

There has been only minor changes since bens review, so im going to go ahead and merge 🎉

@rukai rukai merged commit 39a0cba into shotover:main Sep 7, 2021
@Claude-at-Instaclustr
Copy link
Contributor

Sorry I am so late to this conversation. But I have a few questions.

  1. Why did we put private keys in Git? Seems like those should be stored somewhere else unless they are explicitly for testing and even then it is questionable practice to out keys and certs into Git.
  2. There doesn't seem to be any indication of acceptable ciphers in in the configuration file. It looks like that was skipped. I believe that we should open a ticket to add available ciphers.
  3. It is unclear to me from the code, but:
    1. does the OpenSSL library use the accepted certs from the system by default? openssl version -a will list the directories where the certs are stored
    2. does providing alternative certs disable the default.
    3. Are there mechanisms to disable default certs?
    4. I note a number of options in the builder classes. I think we should map configuration options to builder options. But I also think this might need some thought. Perhaps we should open a ticket to address how to simplify mapping between configuraiton options and properties in structs. Or, perhaps there is a rust idiom for doing this?

@rukai
Copy link
Member Author

rukai commented Sep 8, 2021

  1. the keys are purely for integration tests. What is your concern with committing them?
  2. I made add TLS configuration option to specify allowed ciphers #182
    1. I'm not sure but I can investigate. What is the effect of one or the other?
    2. It is currently mandatory for a custom cert to be provided in the TLS configuration
    3. There are no configuration options outside of the 3 paths to the ca cert, cert and key.
    4. I'm not sure how what you are suggesting differs from what we have currently?

@Claude-at-Instaclustr
Copy link
Contributor

Claude-at-Instaclustr commented Sep 8, 2021

the keys are purely for integration tests. What is your concern with committing them?

OK, I just want to be sure that we are not dependant upon them for anything else. Also, they will need to be replaced when they age out.

  1. I'm not sure how what you are suggesting differs fro m what we have currently?

https://docs.rs/openssl/0.10.29/openssl/ssl/struct.SslAcceptorBuilder.html has a series of functions to set cipher suites, set the cipher list, set options as defined in https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_options.html and similar. So there are a lot of knobs and moving parts. If we are probably going to need a way to set any/all of those. Though perhaps not right now. Does rust have a standard way to map a yaml file to properties on a struct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support TLS connections
6 participants