The purpose of this project is to provide a name and path based router for Kubernetes. It started out as an ingress controller but has since been repurposed to allow for both ingress and other types of routing thanks to its configurability. From an ingress perspective, this router does things a little differently than your typical Kubernetes Ingress controller:
- This version does pod-level routing instead of service-level routing
- This version does not use the Kubernetes Ingress Resource definitions and instead uses pod-level annotations to wire things up (This is partially because the built-in ingress resource is intended for service-based ingress instead of pod-based ingress.)
But in the end, you have a routing controller capable of doing routing based on the combination of hostname/IP and path.
This router is written in Go and is intended to be deployed within a container on Kubernetes. Upon startup, this router will find the Pods marked for routing (using a configurable label selector) and the Secrets *(using a configurable location)* used to secure routing for those pods. (For more details on the role secrets play in this router, please see the Security section of this document.) The Pods marked for routing are then analyzed to identify the wiring information used for routing stored in the Pod's annotations:
- `routingHosts`: This is a space delimited array of hostnames and/or IP addresses that are expected to route to the Pod (Example: `test.github.com 192.168.0.1`)
- `routingPaths`: This is a space delimited array of request paths or path prefixes that are expected to route to the Pod and its appropriate container port. (The value's format is `{PORT}:{PATH}`, where `{PORT}` corresponds to the container port serving the traffic for the `{PATH}`. Example: `3000:/nodejs 8080:/java`.)
Once we've found all Pods and Secrets that are involved in routing, we generate an nginx configuration file and start nginx. At this point, we cache Pods and Secrets to avoid having to requery the full list each time and instead listen for Pod and Secret events. Any time a Pod or Secret event occurs that would have an impact on routing, we regenerate the nginx configuration and reload it. (The idea here was to allow for an initial hit to pull all pods but to then use the events for as quick a turnaround as possible.) Events are processed in 2 second chunks.
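For illustration only, here is a minimal Go sketch of what such an event-batching loop could look like. The `Event` type and `regenerateAndReload` function are hypothetical names rather than the router's actual API; the only behavior taken from the description above is the two-second batching window.

```go
package router // illustrative sketch, not the actual k8s-router package

import "time"

// Event stands in for a Pod or Secret change that may affect routing.
type Event struct {
	Kind string // e.g. "pod-added", "secret-updated"
}

// watchAndReload collects events as they arrive and, every two seconds,
// regenerates and reloads the nginx configuration once for the whole batch.
func watchAndReload(events <-chan Event, regenerateAndReload func([]Event)) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()

	var pending []Event
	for {
		select {
		case e := <-events:
			// Buffer the event instead of reloading nginx immediately.
			pending = append(pending, e)
		case <-ticker.C:
			// Apply all buffered events in one configuration regeneration.
			if len(pending) > 0 {
				regenerateAndReload(pending)
				pending = nil
			}
		}
	}
}
```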
Each Pod can expose one or more services by using one or more entries in the `routingPaths` annotation. All of the paths exposed via `routingPaths` are exposed for each of the hosts listed in the `routingHosts` annotation. (So if you have a `routingHosts` of `host1 host2` and a `routingPaths` of `80:/ 3000:/nodejs`, you would have 4 separate nginx location blocks: `host1/ -> {PodIP}:80`, `host2/ -> {PodIP}:80`, `host1/nodejs -> {PodIP}:3000` and `host2/nodejs -> {PodIP}:3000`. Right now there is no way to associate specific paths with specific hosts, but it may be something we support in the future.)
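As a small illustration of that cross product, here is a hedged Go sketch. The `Route` type and `expandRoutes` function are hypothetical and not part of the router's code; they simply mirror the expansion described above.

```go
package router // illustrative sketch only

import "strings"

// Route represents one generated nginx location: traffic for Host+Path is
// proxied to PodIP:Port.
type Route struct {
	Host, Path, Port, PodIP string
}

// expandRoutes builds every host/path combination for a single Pod from its
// routingHosts and routingPaths annotation values.
func expandRoutes(podIP, routingHosts, routingPaths string) []Route {
	var routes []Route
	for _, host := range strings.Fields(routingHosts) {
		for _, entry := range strings.Fields(routingPaths) {
			// Each routingPaths entry has the form {PORT}:{PATH}.
			parts := strings.SplitN(entry, ":", 2)
			if len(parts) != 2 {
				continue // skip malformed entries
			}
			routes = append(routes, Route{Host: host, Path: parts[1], Port: parts[0], PodIP: podIP})
		}
	}
	return routes
}
```

With a `routingHosts` of `host1 host2` and a `routingPaths` of `80:/ 3000:/nodejs`, this yields the four combinations listed above.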
All of the touch points for this router are configurable via environment variables:
- `API_KEY_HEADER`: This is the header name used by nginx to identify the API Key used (Default: `X-ROUTING-API-KEY`)
- `API_KEY_SECRET_LOCATION`: This is the location of the optional API Key to use to secure communication to your Pods. (The format for this key is `{SECRET_NAME}:{SECRET_DATA_FIELD_NAME}`. Default: `routing:api-key`)
- `CLIENT_MAX_BODY_SIZE`: Configures the max client request body size of nginx. (Default: `0`, which disables checking of the client request body size.)
- `HOSTS_ANNOTATION`: This is the annotation name used to store the space delimited array of hosts used for routing to your Pods (Default: `routingHosts`)
- `PATHS_ANNOTATION`: This is the annotation name used to store the space delimited array of routing path configurations for your Pods (Default: `routingPaths`)
- `PORT`: This is the port that nginx will listen on (Default: `80`)
- `ROUTABLE_LABEL_SELECTOR`: This is the label selector used to identify Pods that are marked for routing (Default: `routable=true`)
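As a hedged illustration of how these variables might be consumed, here is a minimal Go sketch that reads each one with its documented default. The `Config` type, `getEnv` helper, and `loadConfig` function are hypothetical names, not the router's actual configuration code.

```go
package router // illustrative sketch only

import "os"

// getEnv returns an environment variable's value, or a fallback when unset.
func getEnv(name, fallback string) string {
	if v := os.Getenv(name); v != "" {
		return v
	}
	return fallback
}

// Config mirrors the environment variables documented above.
type Config struct {
	APIKeyHeader          string
	APIKeySecretLocation  string
	ClientMaxBodySize     string
	HostsAnnotation       string
	PathsAnnotation       string
	Port                  string
	RoutableLabelSelector string
}

// loadConfig applies the documented defaults for anything not overridden.
func loadConfig() Config {
	return Config{
		APIKeyHeader:          getEnv("API_KEY_HEADER", "X-ROUTING-API-KEY"),
		APIKeySecretLocation:  getEnv("API_KEY_SECRET_LOCATION", "routing:api-key"),
		ClientMaxBodySize:     getEnv("CLIENT_MAX_BODY_SIZE", "0"),
		HostsAnnotation:       getEnv("HOSTS_ANNOTATION", "routingHosts"),
		PathsAnnotation:       getEnv("PATHS_ANNOTATION", "routingPaths"),
		Port:                  getEnv("PORT", "80"),
		RoutableLabelSelector: getEnv("ROUTABLE_LABEL_SELECTOR", "routable=true"),
	}
}
```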
While most routers will perform routing only, we have added a very simple mechanism to do API Key based authorization at the router level. Why might you want this? Imagine you've got multi-tenancy in Kubernetes where each namespace is specific to a single tenant. To avoid a Pod in namespace `X` configuring itself to receive traffic from namespace `Y`, this router allows you to create a specially named secret (`routing` in this case) with a specially named data field (`api-key`), and the value stored in this secret will be used to secure traffic to all Pods in your namespace wired up for routing. To do this, nginx is configured to ensure that all requests routed to your Pod have the `X-ROUTING-API-KEY` header provided, with its value being the base64-encoded value of your secret.
Here is an example of how you might create this secret so that all Pods wired up for routing in the `my-namespace` namespace are secured via API Key:
kubectl create secret generic routing --from-literal=api-key=supersecret --namespace=my-namespace
Based on the example, any route that points to Pods in the `my-namespace` namespace will be required to have `X-ROUTING-API-KEY: c3VwZXJzZWNyZXQ=` set in the request for the router to allow routing to the Pods. Otherwise, a `403` is returned. Of course, if your namespace does not have the specially named secret, you do not have to provide this header.
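For illustration, here is a minimal Go client sketch that supplies the header. It assumes the `supersecret` value from the example above and a route like the `test.k8s.local/nodejs` example later in this document; none of this is part of the router itself.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The router checks the base64-encoded secret value; for "supersecret"
	// this is "c3VwZXJzZWNyZXQ=", as shown above.
	apiKey := base64.StdEncoding.EncodeToString([]byte("supersecret"))

	req, err := http.NewRequest("GET", "http://test.k8s.local/nodejs", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-ROUTING-API-KEY", apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Without this header (or with the wrong value), a 403 would be returned.
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```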
Note: This feature is written assuming that each combination of `routingHosts` and `routingPaths` will only be configured such that the Pods servicing the traffic are from a single namespace. Once you start allowing Pods from multiple namespaces to consume traffic for the same host and path combination, this falls apart. While the routing will work fine in this situation, the router's API Key is namespace specific and the first seen API Key is the one that is used.
By default, nginx will buffer responses from proxied servers. Unfortunately, this can be a problem if you deploy streaming APIs. Thankfully, nginx makes it easy for proxied applications to disable proxy buffering by setting the `X-Accel-Buffering` header to `no`. Doing this will make your streaming API work as expected. For more details, view the nginx documentation: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_buffering
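As a minimal sketch (not part of this project), a proxied Go service could disable buffering for a streaming endpoint like this; the `/stream` path and port are illustrative only.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
		// Tell nginx not to buffer this response so each chunk reaches the
		// client immediately.
		w.Header().Set("X-Accel-Buffering", "no")
		w.Header().Set("Content-Type", "text/event-stream")

		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}

		for i := 0; i < 5; i++ {
			fmt.Fprintf(w, "data: tick %d\n\n", i)
			flusher.Flush()
			time.Sleep(time.Second)
		}
	})

	http.ListenAndServe(":3000", nil)
}
```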
Why are we bringing up WebSocket support? Well, nginx itself operates in a way that makes routing to Pods that are WebSocket servers a little difficult. For more details, read the nginx documentation located here: https://www.nginx.com/blog/websocket-nginx/ The way the k8s-router addresses this is that at each `location` block, we throw in some WebSocket configuration. It's very simple stuff, but since there is some reasoning behind the location where this is applied and the approach taken, it makes sense to explain it here.
The WebSocket configuration is at the `location` level, and it is there because nginx does not allow us to use the `set` directive at the `server` or `http` level. We have to use the `set` directive to properly handle the `Connection` header. See, nginx uses `close` as the default value for the `Connection` header when there is no `Connection` header provided. So if we just passed through the `Connection` header and there was no `Connection` header provided, instead of using the default value of `close` the value would be `''`, which would basically delete the `Connection` header, which is not how nginx operates. So we have to conditionally set a variable based on the `Connection` header value, and `set` is the only way.
The other part of this implementation that is worth documenting is that in the previously linked documentation for enabling WebSockets in nginx, you see they use `proxy_http_version 1.1;` to force HTTP 1.1. Well, for a generic server where not all `location` blocks are for WebSockets, we needed a way to conditionally enable HTTP 1.1. Well...there is no way to do this. `proxy_http_version` cannot be used in an `if` directive and `proxy_http_version` cannot be set to a string value, which is the only value type you can use for nginx variables. So since we do not want to force HTTP 1.1 on everyone, we just leave it up to the client to make an HTTP 1.1 request.
So when you look at the generated nginx configuration and see some duplicate configuration related to WebSockets, or you see that we are not forcing HTTP 1.1, now you know.
Let's assume you've already deployed the router controller. (If you haven't, feel free to look at the Building and Running section of the documentation.) When the router starts up, nginx is configured and started on your behalf. The generated `/etc/nginx/nginx.conf` that the router starts with looks like this (assuming you do not have any deployed Pods marked for routing):
# A very simple nginx configuration file that forces nginx to start as a daemon.
events {}
http {
# Default server that will just close the connection as if there was no server available
server {
listen 80 default_server;
return 444;
}
}
daemon on;
This configuration will tell nginx to listen on port `80`, and all requests for unknown hosts will be closed as if there was no server listening for traffic. (This approach is better than reporting a `404` because a `404` says someone is there but the request was for a missing resource, while closing the connection says that the request was for a server that didn't exist, or in our case, that a request was made to a host that our router is unaware of.)
Now that we know how the router spins up nginx initially, let's deploy a microservice to Kubernetes. We will package up a simple Node.js application that prints out the environment details of its running container, including the IP address(es) of its host. To do this, we will build a Docker image, publish it, and then create a Kubernetes ReplicationController that will deploy one Pod representing our microservice.
Note: All commands you see in this demo assume you are already within the `demo` directory. These commands are also written assuming you are running a Docker registry at `192.168.64.1:5000`, so please adjust your Docker commands accordingly.
First things first, let's build our Docker image using `docker build -t nodejs-k8s-env .`, tag the Docker image using `docker tag -f nodejs-k8s-env 192.168.64.1:5000/nodejs-k8s-env` and finally push the Docker image to your Docker registry using `docker push 192.168.64.1:5000/nodejs-k8s-env`. At this point, we have built and published a Docker image for our microservice.
The next step is to deploy our microservice to Kubernetes, but before we do, let's look at the ReplicationController configuration file to see what is going on. Here is the `rc.yaml` we'll be using to deploy our microservice (same as `demo/rc.yaml`):
apiVersion: v1
kind: ReplicationController
metadata:
name: nodejs-k8s-env
labels:
name: nodejs-k8s-env
spec:
replicas: 1
selector:
name: nodejs-k8s-env
template:
metadata:
labels:
name: nodejs-k8s-env
# This marks the pod as routable
routable: "true"
annotations:
# This says that only traffic for the "test.k8s.local" host will be routed to this pod
routingHosts: "test.k8s.local"
# This says that only traffic for the "/nodejs" path and its sub paths will be routed to this pod, on port 3000
routingPaths: "3000:/nodejs"
spec:
containers:
- name: nodejs-k8s-env
image: 192.168.64.1:5000/nodejs-k8s-env
env:
- name: PORT
value: "3000"
ports:
- containerPort: 3000
When we deploy our microservice using `kubectl create -f rc.yaml`, the router will notice that we now have one Pod running that is marked for routing. If you were tailing the logs, or you were to review the content of `/etc/nginx/nginx.conf` in the container, you should see that it now reflects that we have a new microservice deployed:
events {
worker_connections 1024;
}
http {
# http://nginx.org/en/docs/http/ngx_http_core_module.html
types_hash_max_size 2048;
server_names_hash_max_size 512;
server_names_hash_bucket_size 64;
# Force HTTP 1.1 for upstream requests
proxy_http_version 1.1;
# When nginx proxies to an upstream, the default value used for 'Connection' is 'close'. We use this variable to do
# the same thing so that whenever a 'Connection' header is in the request, the variable reflects the provided value
# otherwise, it defaults to 'close'. This is opposed to just using "proxy_set_header Connection $http_connection"
# which would remove the 'Connection' header from the upstream request whenever the request does not contain a
# 'Connection' header, which is a deviation from the nginx norm.
map $http_connection $p_connection {
default $http_connection;
'' close;
}
# Pass through the appropriate headers
proxy_set_header Connection $p_connection;
proxy_set_header Host $host;
proxy_set_header Upgrade $http_upgrade;
server {
listen 80;
server_name test.k8s.local;
location /nodejs {
# Pod nodejs-k8s-env-eq7mh
proxy_pass http://10.244.69.6:3000;
}
}
# Default server that will just close the connection as if there was no server available
server {
listen 80 default_server;
return 444;
}
}
This means that if someone requests `http://test.k8s.local/nodejs`, assuming you've got `test.k8s.local` pointed to the edge of your Kubernetes cluster, it should get routed to the proper Pod (`nodejs-k8s-env-eq7mh` in our example). If everything worked out properly, you should see output like this:
{
"env": {
"PATH": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"HOSTNAME": "nodejs-k8s-env-eq7mh",
"PORT": "3000",
"KUBERNETES_PORT": "tcp://10.100.0.1:443",
"KUBERNETES_PORT_443_TCP": "tcp://10.100.0.1:443",
"KUBERNETES_PORT_443_TCP_PROTO": "tcp",
"KUBERNETES_PORT_443_TCP_PORT": "443",
"KUBERNETES_PORT_443_TCP_ADDR": "10.100.0.1",
"KUBERNETES_SERVICE_HOST": "10.100.0.1",
"KUBERNETES_SERVICE_PORT": "443",
"VERSION": "v5.8.0",
"NPM_VERSION": "3",
"HOME": "/root"
},
"ips": {
"lo": "127.0.0.1",
"eth0": "10.244.69.6"
}
}
As you may have noticed in the example nginx configuration above, nginx is configured appropriately to reverse proxy to WebSocket servers. (The only reason we bring this up is that it's not something you get out of the box.) To test this using the previously-deployed application, use the following Node.js application:
var socket = require('socket.io-client')('http://test.k8s.local', {path: '/nodejs/socket.io'});
socket.on('env', function (env) {
console.log(JSON.stringify(env, null, 2));
});
// Emit the 'env' event to the server, which emits an 'env' event to the client with the server environment details.
socket.emit('env');
Now that's cool and all, but what happens when we scale our application? Let's scale our microservice to 3 instances using `kubectl scale --replicas=3 replicationcontrollers nodejs-k8s-env`. Your `/etc/nginx/nginx.conf` should look something like this:
events {
worker_connections 1024;
}
http {
# http://nginx.org/en/docs/http/ngx_http_core_module.html
types_hash_max_size 2048;
server_names_hash_max_size 512;
server_names_hash_bucket_size 64;
# Force HTTP 1.1 for upstream requests
proxy_http_version 1.1;
# When nginx proxies to an upstream, the default value used for 'Connection' is 'close'. We use this variable to do
# the same thing so that whenever a 'Connection' header is in the request, the variable reflects the provided value
# otherwise, it defaults to 'close'. This is opposed to just using "proxy_set_header Connection $http_connection"
# which would remove the 'Connection' header from the upstream request whenever the request does not contain a
# 'Connection' header, which is a deviation from the nginx norm.
map $http_connection $p_connection {
default $http_connection;
'' close;
}
# Pass through the appropriate headers
proxy_set_header Connection $p_connection;
proxy_set_header Host $host;
proxy_set_header Upgrade $http_upgrade;
# Upstream for /nodejs traffic on test.k8s.local
upstream upstream1866206336 {
# Pod nodejs-k8s-env-eq7mh
server 10.244.69.6:3000;
# Pod nodejs-k8s-env-yr1my
server 10.244.69.8:3000;
# Pod nodejs-k8s-env-oq9xn
server 10.244.69.9:3000;
}
server {
listen 80;
server_name test.k8s.local;
location /nodejs {
# Upstream upstream1866206336
proxy_pass http://upstream1866206336;
}
}
# Default server that will just close the connection as if there was no server available
server {
listen 80 default_server;
return 444;
}
}
The big change between the one Pod microservice and the N Pod microservice is that the nginx configuration now uses an nginx upstream to do load balancing across the N different Pods. And since the default load balancer in nginx is round-robin based, requests for `http://test.k8s.local/nodejs` should return a different payload for each request, showing that you are indeed hitting each individual Pod.
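To see that behavior, a quick Go sketch like the following (purely illustrative, not part of the project) can issue a few requests and print each response; with round-robin balancing the reported hostname/IP should vary across the three Pods.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hit the routed path several times; each response should come from a
	// different Pod thanks to nginx's round-robin upstream.
	for i := 1; i <= 6; i++ {
		resp, err := http.Get("http://test.k8s.local/nodejs")
		if err != nil {
			panic(err)
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("request %d:\n%s\n", i, body)
	}
}
```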
I hope this example gave you a better idea of how this all works. If not, let us know how to make it better.
As mentioned above, this project started out as an ingress controller with the sole purpose of routing traffic from the internet to Pods within the Kubernetes cluster. One of the use cases we have at work is that we need a general ingress but we also want to use this router as a simplistic service router. So essentially, we have a public ingress and a private...router. Here is an example deployment file where you use the configurability of this router to serve both purposes:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: k8s-pods-router
labels:
app: k8s-pods-router
spec:
template:
metadata:
labels:
app: k8s-pods-router
spec:
containers:
- image: thirtyx/k8s-router:latest
imagePullPolicy: Always
name: k8s-pods-router-public
ports:
- containerPort: 80
hostPort: 80
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# Use the configuration to use the public/private paradigm (public version)
- name: API_KEY_SECRET_LOCATION
value: routing:public-api-key
- name: HOSTS_ANNOTATION
value: publicHosts
- name: PATHS_ANNOTATION
value: publicPaths
- image: thirtyx/k8s-router:latest
imagePullPolicy: Always
name: k8s-pods-router-private
ports:
- containerPort: 81
# We should probably avoid using host port and if needed, at least lock it down from external access
hostPort: 81
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# Use the configuration to use the public/private paradigm (private version)
- name: API_KEY_SECRET_LOCATION
value: routing:private-api-key
- name: HOSTS_ANNOTATION
value: privateHosts
- name: PATHS_ANNOTATION
value: privatePaths
# Since we cannot have two containers listening on the same port, use a different port for the private router
- name: PORT
value: "81"
Based on this deployment, we have an ingress that serves `publicHosts` and `publicPaths` combinations and an internal router that serves `privateHosts` and `privatePaths` combinations. With this being the case, let's take our example Node.js application deployed above and deploy a variant of it that has both public and private paths:
apiVersion: v1
kind: ReplicationController
metadata:
name: nodejs-k8s-env
labels:
name: nodejs-k8s-env
spec:
replicas: 1
selector:
name: nodejs-k8s-env
template:
metadata:
labels:
name: nodejs-k8s-env
routable: "true"
annotations:
# Private routing information
privateHosts: "test.k8s.local"
privatePaths: "3000:/internal"
# Public routing information
publicHosts: "test.k8s.com"
publicPaths: "3000:/public"
spec:
containers:
- name: nodejs-k8s-env
image: thirtyx/nodejs-k8s-env
env:
- name: PORT
value: "3000"
ports:
- containerPort: 3000
Now if we were to curl `http://test.k8s.com/public` from outside of Kubernetes, assuming DNS was set up properly, the ingress router would route it properly, but if we were to curl `http://test.k8s.local/internal`, it wouldn't go anywhere. Not only that, if we were to curl `http://test.k8s.com/internal`, it also would not go anywhere. The only way to access the `/internal` path would be to be within Kubernetes, with DNS properly set up, and to curl `http://test.k8s.local/internal`.
Now I realize this is a somewhat convoluted example, but the purpose was to show how we could use the same code base to serve different roles using configuration alone. The network isolation and security required to do this properly is outside the scope of this example.
If you want to run `k8s-router` in mock mode, you can use `go build` followed by `./k8s-router`. Running in mock mode means that nginx is not actually started and managed, nor is any actual routing performed. `k8s-router` will use your `kubectl` configuration to identify the cluster connection details by using the current context `kubectl` is configured for. This is very useful in the event you want to connect to an external Kubernetes cluster and see the generated routing configuration.
If you're building this to run on Kubernetes, you'll need to do the following:
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags '-w' -o k8s-router .
docker build ...
docker tag ...
docker push ...
(The `...` are there because your Docker commands will likely be different than mine or someone else's.) We have an example DaemonSet for deploying the k8s-router as an ingress controller to Kubernetes located at `examples/ingress-daemonset.yaml`. Here is how I test locally:
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags '-w' -o k8s-router .
docker build -t k8s-router .
docker tag -f k8s-router 192.168.64.1:5000/k8s-router
docker push 192.168.64.1:5000/k8s-router
kubectl create -f examples/ingress-daemonset.yaml
Note: This router is written to be run within Kubernetes, but for testing purposes it can be run outside of Kubernetes. When run outside of Kubernetes, you will need to have a kube config file. When run outside the container, nginx itself will not be started and its configuration file will not be written to disk, only printed to stdout. This might change in the future but for now, this support exists only as a convenience.
This project was largely based on the `nginx-alpha` example in the kubernetes/contrib repository.