Outside access over hostport #194

Closed
Hermain opened this issue Jul 31, 2018 · 8 comments

Comments

@Hermain
Contributor

Hermain commented Jul 31, 2018

I have written an init container that uses the Python Kubernetes client to query the Kubernetes API server for the public DNS name of the node that runs Kafka. I also set up RBAC permissions that only allow the pod to query node information.

This name can be used as Kafka's advertised address together with a hostPort. With such a setup, all outside traffic goes directly to the desired broker. The disadvantage is that you can't run two Kafka instances on the same node (which is a bad idea anyway).

This works on Amazon AWS; I haven't tested it in other deployments.

Would this be a good change to try and upstream into this repo or is it too specific for my setup?
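The lookup described above might look roughly like the following minimal sketch. It is not the code from the eventual PR: the address preference order (`ExternalDNS` before `ExternalIP`), the listener name `OUTSIDE` and the helper names are assumptions for illustration, and `lookup_node_addresses` needs the `kubernetes` package plus RBAC `get` permission on nodes.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed preference order for a publicly reachable node address. On AWS,
# nodes usually report both types; other providers may report neither.
PREFERRED_TYPES = ("ExternalDNS", "ExternalIP")


def pick_external_address(addresses: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the first address whose type appears in PREFERRED_TYPES.

    `addresses` holds (type, address) pairs taken from node.status.addresses.
    """
    first_by_type = {}
    for addr_type, addr in addresses:
        first_by_type.setdefault(addr_type, addr)
    for wanted in PREFERRED_TYPES:
        if wanted in first_by_type:
            return first_by_type[wanted]
    return None


def advertised_listener(host: str, port: int, name: str = "OUTSIDE") -> str:
    """Format one entry for Kafka's advertised.listeners.

    The listener name is hypothetical; the broker must also declare it in
    `listeners` and `listener.security.protocol.map`.
    """
    return f"{name}://{host}:{port}"


def lookup_node_addresses(node_name: str) -> List[Tuple[str, str]]:
    """Read the node object from inside the pod (needs RBAC `get` on nodes)."""
    # Imported lazily so the pure helpers above work without the package.
    from kubernetes import client, config

    config.load_incluster_config()
    node = client.CoreV1Api().read_node(node_name)
    return [(a.type, a.address) for a in node.status.addresses]
```

In the init container one would call `lookup_node_addresses` with the pod's node name (for example from a `NODE_NAME` variable injected via the downward API), pass the result to `pick_external_address`, and write the formatted listener into the broker config together with the hostPort number.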

@solsson
Contributor

solsson commented Jul 31, 2018

It sounds interesting, though I'd guess that most clusters need more than a hostPort for outside access.

What do you mean by "all outside traffic goes to the desired broker directly"? How is that different from the current approach with individual services? Do you mean as opposed to going through iptables and firewalls? I'm asking because any setup where clients don't address individual brokers will most likely fail.

Edit: I think people are interested, but based on https://kubernetes.io/docs/concepts/configuration/overview/#services I wouldn't merge it. "Don’t specify a hostPort for a Pod unless it is absolutely necessary."

@Hermain
Contributor Author

Hermain commented Jul 31, 2018

The difference between a hostPort and a nodePort is that a setup with nodePort will forward traffic when it reaches a node that doesn't run the target pod.

I assumed that your setup does exactly that, which creates unnecessary traffic inside the cluster. Correct me if I'm wrong.

Can you explain to me how traffic from an outside client reaches a specific node?
What advertised address is used?

Let's split the discussion up:

Question 1: Could you use a mechanism to determine the public DNS name/IP of the node running a specific Kafka broker, so that the broker can advertise that address?

Question 2: Is using hostPort a good idea or not?

My motivation:

I'm sending data into Kafka from outside the cluster, from some "IoT devices" that generate up to 50 MB/s each. I want to eliminate any unnecessary traffic inside the cluster.
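On the hostPort side of question 2, the port is declared in the pod spec itself rather than in a Service. The sketch below builds such a fragment as a Python dict (printable as JSON, which kubectl also accepts); the container names and port 9094 are illustrative assumptions, and the fragment would have to be merged into a complete StatefulSet pod template.

```python
import json


def hostport_broker_fragment(outside_port: int) -> dict:
    """Pod spec fragment for hostPort-based outside access (illustrative)."""
    return {
        "initContainers": [{
            "name": "init-config",
            "env": [{
                # Downward API: the init container learns which node it runs on.
                "name": "NODE_NAME",
                "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}},
            }],
        }],
        "containers": [{
            "name": "broker",
            "ports": [{
                "name": "outside",
                "containerPort": outside_port,
                # Exposed only on the node running this pod; unlike a NodePort
                # Service, other nodes neither accept nor forward this traffic.
                "hostPort": outside_port,
            }],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(hostport_broker_fragment(9094), indent=2))
```

Because the scheduler will not place two pods requesting the same hostPort on one node, this is also what enforces the one-broker-per-node limitation mentioned at the start of the thread.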

@Hermain
Contributor Author

Hermain commented Jul 31, 2018

Specifically what I don't get is this line:
OUTSIDE_HOST=$(kubectl get node "$NODE_NAME" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')

In my cluster this returns the cluster-internal address, which can't be reached from outside. However, this is very similar to what I did with the Python library.

UPDATE:
OK, so when I change this line to ExternalIP, it does exactly what I suggested, so you don't need a separate mechanism to determine the public DNS name/IP of the node running a specific Kafka broker. That answers question 1.
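The jsonpath filter in that line selects every entry in the node's `status.addresses` whose `type` matches. A rough Python equivalent over the output of `kubectl get node <name> -o json` is sketched below, so the InternalIP/ExternalIP switch becomes a parameter; the function names are hypothetical, and `node_json` assumes kubectl is on the PATH with permission to read nodes.

```python
import json
import subprocess
from typing import List


def addresses_of_type(node_json: str, addr_type: str) -> List[str]:
    """Mimic {.status.addresses[?(@.type=="<addr_type>")].address}."""
    node = json.loads(node_json)
    return [
        entry["address"]
        for entry in node.get("status", {}).get("addresses", [])
        if entry.get("type") == addr_type
    ]


def node_json(node_name: str) -> str:
    """Fetch the node object as JSON via kubectl (needs RBAC `get` on nodes)."""
    result = subprocess.run(
        ["kubectl", "get", "node", node_name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

An empty result for `ExternalIP` is a hint that the cluster's nodes have no externally reachable address registered, in which case this approach alone will not give outside clients a usable advertised address.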

@solsson
Copy link
Contributor

solsson commented Jul 31, 2018

> Ok so when I change this line to ExternalIP it does exactly what I suggested

Looks like the suggestion in #187.

While nodePort is indeed exposed on all nodes, when it is combined with the init script's lookup, Kafka's client bootstrap process will redirect clients to the node where the pod runs.

Maybe question 2 then simply becomes a matter of taste, or of whether you want to bypass iptables or not :)
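The redirect works because a Kafka client uses its bootstrap address only to fetch cluster metadata, then opens data connections to each partition leader's advertised address. The toy model below, which is not the Kafka client API, shows that step; the hostnames, ports and topic name are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Address = Tuple[str, int]


@dataclass(frozen=True)
class ClusterMetadata:
    """What a metadata response conveys, heavily simplified."""
    brokers: Dict[int, Address]          # broker id -> advertised host:port
    leaders: Dict[Tuple[str, int], int]  # (topic, partition) -> leader id


def data_connection_target(metadata: ClusterMetadata, topic: str, partition: int) -> Address:
    """Address a producer sends records for this partition to.

    The bootstrap address only served the metadata request; produce
    requests go to the leader's advertised address, wherever that points.
    """
    leader_id = metadata.leaders[(topic, partition)]
    return metadata.brokers[leader_id]
```

So if broker 1 advertises node B's external address, a client that bootstrapped through node A's nodePort still sends its records straight to node B. The only traffic that crosses nodes is the small metadata exchange.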

Hermain closed this as completed Aug 2, 2018
Hermain reopened this Aug 3, 2018
@Hermain
Contributor Author

Hermain commented Aug 3, 2018

Using hostPort would also remove the need for the outside-services/ folder, which contains one service per broker.

This seems clumsy, as you might want your StatefulSet size to change over time (maybe automatically?), and that would require creating and deleting a service per broker.

Or is there some nice way to handle that?

@solsson
Contributor

solsson commented Aug 3, 2018

Since this repo is mostly an example that you fork, the purpose of the services is that you can change them to whatever service type fits your hosting scenario. It's easy to generate ten such service manifests. I agree the hostPort solution wins on simplicity.

How about we extend https://github.com/Yolean/kubernetes-kafka/blob/master/outside-services/README.md with the insights from this issue and #187? Then it'll be up to individual use cases to simply not apply the outside-services folder and go directly for node IPs instead.
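Generating the per-broker services for a given StatefulSet size can be scripted. The sketch below emits a `v1` `List` of NodePort Services as JSON; it is only loosely modelled on the repo's outside-services manifests, so the namespace, the `kafka-broker-id` selector label, the base nodePort 32400 and target port 9094 are all assumptions to adjust.

```python
import json
from typing import Dict, List


def outside_service(broker_id: int, base_node_port: int = 32400,
                    target_port: int = 9094, namespace: str = "kafka") -> Dict:
    """One NodePort Service selecting a single broker pod (illustrative values)."""
    node_port = base_node_port + broker_id
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"outside-{broker_id}", "namespace": namespace},
        "spec": {
            "type": "NodePort",
            # Assumes each broker pod carries a per-broker label. StatefulSet
            # pods also get statefulset.kubernetes.io/pod-name, which could be
            # used instead.
            "selector": {"app": "kafka", "kafka-broker-id": str(broker_id)},
            "ports": [{
                "protocol": "TCP",
                "port": node_port,
                "targetPort": target_port,
                "nodePort": node_port,
            }],
        },
    }


def outside_services(replicas: int) -> List[Dict]:
    return [outside_service(i) for i in range(replicas)]


def main(replicas: int = 3) -> None:
    # Print a List object; pipe the output to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": outside_services(replicas)}, indent=2))


if __name__ == "__main__":
    main()
```

Re-running it after scaling the StatefulSet regenerates the set, though services for removed brokers still have to be deleted separately, which is the clumsiness raised above.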

@Hermain
Contributor Author

Hermain commented Aug 3, 2018

Lovely, I'll rewrite the code and make sure it works first, then I'll write up an update to the README with the snippets and open a pull request 👍

@solsson
Copy link
Contributor

solsson commented Aug 13, 2018

Thanks a lot for the PR. Relevant to #196 as well.

solsson closed this as completed Aug 13, 2018

2 participants