Unable to resolve request when attempting cross-cluster mTLS with a single trust_domain using spire #409

Closed
caleygoff-invitae opened this issue May 4, 2021 · 2 comments
Labels: bug (Something isn't working)

caleygoff-invitae commented May 4, 2021

Describe the bug
This is similar to the request in #407, except that it concerns a single trust_domain using SPIRE/SPIFFE. Would it be possible to illustrate how to achieve cross-cluster mTLS with App Mesh and SPIRE using a single trust_domain?

I attempted this with the colorapp mTLS tutorial and found that the single-cluster demo does not carry over for users who want to use SPIRE/SPIFFE with a single trust_domain spanning clusters. When attempting to connect to resources in the backend cluster I get the following error:

upstream reset: reset reason: connection failure, transport failure reason: TLS error: Secret is not supplied by SDS

When using SPIRE with a single trust_domain spanning multiple clusters, it's recommended to deploy SPIRE in a nested topology. In other words, there is a single root-spire-server, and each cluster that participates in the trust_domain runs a root-spire-agent in the same pod as that cluster's otherwise normal spire-server.
This cluster-specific spire-server is called a nested-spire-server. More details on how this is configured can be found here and here.

Some helpful hints: I used aws_iid for node attestation between the root-spire-server and the root-spire-agents. Between the nested-spire-server and the nested-spire-agents I use k8s_psat attestation, with the k8s-workload-registrar in reconcile mode registering the workload entries so no manual bash registration is needed. In my case the root-spire-server uses an UpstreamAuthority of aws_pca, and the nested-spire-servers use an UpstreamAuthority of spire pointing back to the root-spire-server.
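
For reference, a minimal sketch of how the two server configs could look in this topology (HCL trimmed to the relevant plugins; the server address, socket path, cluster name, and PCA ARN below are placeholders, not my actual values):

# root-spire-server server.conf (sketch): attests root-spire-agents with aws_iid
# and anchors the trust domain in ACM PCA via the aws_pca UpstreamAuthority
server {
    trust_domain = "non-prod.somecorp.io"
}

plugins {
    NodeAttestor "aws_iid" {
        plugin_data {}
    }
    UpstreamAuthority "aws_pca" {
        plugin_data {
            region                    = "us-east-1"
            certificate_authority_arn = "arn:aws:acm-pca:us-east-1:XXXXXXXXXXX:certificate-authority/example"
        }
    }
}

# nested-spire-server server.conf (sketch): attests in-cluster agents with k8s_psat
# and chains its CA back to the root through the co-located root-spire-agent's Workload API socket
server {
    trust_domain = "non-prod.somecorp.io"
}

plugins {
    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "demo1.dev.some.corp.net" = {
                    service_account_allow_list = ["spire:spire-agent"]
                }
            }
        }
    }
    UpstreamAuthority "spire" {
        plugin_data {
            server_address      = "root-spire-server.internal.example"
            server_port         = "8081"
            workload_api_socket = "/run/spire/sockets/root-agent.sock"
        }
    }
}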

Platform
EKS

To Reproduce
Steps to reproduce the behavior:

  1. Configure SPIRE/SPIFFE in a nested topology to serve a single root CA (I am using AWS PCA).
  2. Set up the colorapp demo using the mTLS SDS walkthrough, making sure the frontend is in one cluster and the backend is in another cluster.
  3. Modify the frontend and backend VirtualNodes in both clusters to use a single trust domain.
  4. curl the frontend with the proper headers and observe the error in the Envoy logs.

Details:
trust_domain: non-prod.somecorp.io
frontend SVID: spiffe://non-prod.somecorp.io/cross-cluster-test/front
backend color SVID: spiffe://non-prod.somecorp.io/cross-cluster-test/blue
Errors:
curl to the frontend returns a 503:

$ curl -H "color-header: blue" -vL front-color-cross-cluster-test.mesh.dev.some.corp.net
*   Trying XX.XX.XX.XX...
* TCP_NODELAY set
* Connected to front-color-cross-cluster-test.mesh.some.corp.net (XX.XX.XX.XX) port 80 (#0)
> GET / HTTP/1.1
> Host: front-color-cross-cluster-test.mesh.dev.some.corp.net
> User-Agent: curl/7.64.1
> Accept: */*
> color-header: blue
>
< HTTP/1.1 503 Service Unavailable
< Server: nginx/1.19.0
< Date: Mon, 03 May 2021 22:15:45 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 484
< Connection: keep-alive
< x-envoy-upstream-service-time: 74
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
        <title>Error response</title>
    </head>
    <body>
        <h1>Error response</h1>
        <p>Error code: 503</p>
        <p>Message: Service Unavailable.</p>
        <p>Error code explanation: 503 - The server cannot process the request due to a high load.</p>
    </body>
</html>

frontend envoy reaching out to blue backend

[2021-05-03 22:15:45.757][98][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C2633] new connection
[2021-05-03 22:15:45.758][98][debug][http] [source/common/http/conn_manager_impl.cc:225] [C2633] new stream
[2021-05-03 22:15:45.758][98][debug][http] [source/common/http/conn_manager_impl.cc:837] [C2633][S13270034231475148542] request headers complete (end_stream=true):
':authority', 'color.cross-cluster-test.svc.cluster.local:8080'
':path', '/'
':method', 'GET'
'accept-encoding', 'identity'
'user-agent', 'Python-urllib/3.9'
'color-header', 'blue'
'connection', 'close'

[2021-05-03 22:15:45.758][98][debug][http] [source/common/http/filter_manager.cc:721] [C2633][S13270034231475148542] request end stream
[2021-05-03 22:15:45.758][98][debug][router] [source/common/router/router.cc:429] [C2633][S13270034231475148542] cluster 'cds_egress_somecorp-dev_blue_cross-cluster-test_http_8080' match for URL '/'
[2021-05-03 22:15:45.758][98][debug][router] [source/common/router/router.cc:586] [C2633][S13270034231475148542] router decoding headers:
':authority', 'color.cross-cluster-test.svc.cluster.local:8080'
':path', '/'
':method', 'GET'
':scheme', 'https'
'accept-encoding', 'identity'
'user-agent', 'Python-urllib/3.9'
'color-header', 'blue'
'x-forwarded-proto', 'http'
'x-request-id', '2e92a8d8-8165-90f5-9695-8c05668a2dbd'
'x-envoy-expected-rq-timeout-ms', '15000'
'x-b3-traceid', 'a8769b8252a815df'
'x-b3-spanid', 'a8769b8252a815df'
'x-b3-sampled', '1'

[2021-05-03 22:15:45.758][98][debug][pool] [source/common/http/conn_pool_base.cc:71] queueing stream due to no available connections
[2021-05-03 22:15:45.758][98][debug][pool] [source/common/conn_pool/conn_pool_base.cc:104] creating a new connection
[2021-05-03 22:15:45.758][98][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:348] Create NotReadySslSocket
[2021-05-03 22:15:45.758][98][debug][client] [source/common/http/codec_client.cc:39] [C2634] connecting
[2021-05-03 22:15:45.758][98][debug][connection] [source/common/network/connection_impl.cc:769] [C2634] connecting to XX.XX.XX.XX:8080
[2021-05-03 22:15:45.758][98][debug][connection] [source/common/network/connection_impl.cc:785] [C2634] connection in progress
[2021-05-03 22:15:45.758][98][debug][connection] [source/common/network/connection_impl.cc:625] [C2634] connected
[2021-05-03 22:15:45.758][98][debug][connection] [source/common/network/connection_impl.cc:203] [C2634] closing socket: 0
[2021-05-03 22:15:45.758][98][debug][client] [source/common/http/codec_client.cc:96] [C2634] disconnect. resetting 0 pending requests
[2021-05-03 22:15:45.758][98][debug][pool] [source/common/conn_pool/conn_pool_base.cc:314] [C2634] client disconnected, failure reason: TLS error: Secret is not supplied by SDS
[2021-05-03 22:15:45.758][98][debug][router] [source/common/router/router.cc:1031] [C2633][S13270034231475148542] upstream reset: reset reason: connection failure, transport failure reason: TLS error: Secret is not supplied by SDS
[2021-05-03 22:15:45.771][98][debug][router] [source/common/router/router.cc:1533] [C2633][S13270034231475148542] performing retry
[2021-05-03 22:15:45.771][98][debug][pool] [source/common/http/conn_pool_base.cc:71] queueing stream due to no available connections
[2021-05-03 22:15:45.771][98][debug][pool] [source/common/conn_pool/conn_pool_base.cc:104] creating a new connection
[2021-05-03 22:15:45.771][98][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:348] Create NotReadySslSocket
[2021-05-03 22:15:45.771][98][debug][client] [source/common/http/codec_client.cc:39] [C2635] connecting
[2021-05-03 22:15:45.772][98][debug][connection] [source/common/network/connection_impl.cc:769] [C2635] connecting to XX.XX.XX.XX:8080
[2021-05-03 22:15:45.772][98][debug][connection] [source/common/network/connection_impl.cc:785] [C2635] connection in progress
[2021-05-03 22:15:45.772][98][debug][connection] [source/common/network/connection_impl.cc:625] [C2635] connected
[2021-05-03 22:15:45.772][98][debug][connection] [source/common/network/connection_impl.cc:203] [C2635] closing socket: 0
[2021-05-03 22:15:45.772][98][debug][client] [source/common/http/codec_client.cc:96] [C2635] disconnect. resetting 0 pending requests
[2021-05-03 22:15:45.772][98][debug][pool] [source/common/conn_pool/conn_pool_base.cc:314] [C2635] client disconnected, failure reason: TLS error: Secret is not supplied by SDS
[2021-05-03 22:15:45.772][98][debug][router] [source/common/router/router.cc:1031] [C2633][S13270034231475148542] upstream reset: reset reason: connection failure, transport failure reason: TLS error: Secret is not supplied by SDS
[2021-05-03 22:15:45.818][98][debug][router] [source/common/router/router.cc:1533] [C2633][S13270034231475148542] performing retry
[2021-05-03 22:15:45.818][98][debug][pool] [source/common/http/conn_pool_base.cc:71] queueing stream due to no available connections
[2021-05-03 22:15:45.818][98][debug][pool] [source/common/conn_pool/conn_pool_base.cc:104] creating a new connection
[2021-05-03 22:15:45.818][98][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:348] Create NotReadySslSocket
[2021-05-03 22:15:45.818][98][debug][client] [source/common/http/codec_client.cc:39] [C2636] connecting
[2021-05-03 22:15:45.818][98][debug][connection] [source/common/network/connection_impl.cc:769] [C2636] connecting to XX.XX.XX.XX:8080
[2021-05-03 22:15:45.818][98][debug][connection] [source/common/network/connection_impl.cc:785] [C2636] connection in progress
[2021-05-03 22:15:45.819][98][debug][connection] [source/common/network/connection_impl.cc:625] [C2636] connected
[2021-05-03 22:15:45.819][98][debug][connection] [source/common/network/connection_impl.cc:203] [C2636] closing socket: 0
[2021-05-03 22:15:45.819][98][debug][client] [source/common/http/codec_client.cc:96] [C2636] disconnect. resetting 0 pending requests
[2021-05-03 22:15:45.819][98][debug][pool] [source/common/conn_pool/conn_pool_base.cc:314] [C2636] client disconnected, failure reason: TLS error: Secret is not supplied by SDS
[2021-05-03 22:15:45.819][98][debug][router] [source/common/router/router.cc:1031] [C2633][S13270034231475148542] upstream reset: reset reason: connection failure, transport failure reason: TLS error: Secret is not supplied by SDS
[2021-05-03 22:15:45.819][98][debug][http] [source/common/http/filter_manager.cc:805] [C2633][S13270034231475148542] Sending local reply with details upstream_reset_before_response_started{connection failure,TLS error: Secret is not supplied by SDS}
[2021-05-03 22:15:45.819][98][debug][http] [source/common/http/conn_manager_impl.cc:1380] [C2633][S13270034231475148542] closing connection due to connection close header
[2021-05-03 22:15:45.819][98][debug][http] [source/common/http/conn_manager_impl.cc:1435] [C2633][S13270034231475148542] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '159'
'content-type', 'text/plain'
'date', 'Mon, 03 May 2021 22:15:45 GMT'
'server', 'envoy'
'connection', 'close'

[2021-05-03 22:15:45.819][98][debug][connection] [source/common/network/connection_impl.cc:107] [C2633] closing data_to_write=313 type=2
[2021-05-03 22:15:45.819][98][debug][connection] [source/common/network/connection_impl_base.cc:40] [C2633] setting delayed close timer with timeout 1000 ms
[2021-05-03 22:15:45.819][98][debug][connection] [source/common/network/connection_impl.cc:655] [C2633] write flush complete
[2021-05-03 22:15:45.820][99][debug][router] [source/common/router/router.cc:1178] [C2631][S8642108434781201783] upstream headers complete: end_stream=false
[2021-05-03 22:15:45.820][99][debug][http] [source/common/http/conn_manager_impl.cc:1435] [C2631][S8642108434781201783] encoding headers via codec (end_stream=false):
':status', '503'
'server', 'envoy'
'date', 'Mon, 03 May 2021 22:15:45 GMT'
'content-type', 'text/html;charset=utf-8'
'content-length', '484'
'x-envoy-upstream-service-time', '74'

blue envoy container responding to request from front envoy container

[2021-05-03 22:15:40.282][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:40.282][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:40.282][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C612] new connection
[2021-05-03 22:15:40.282][38][debug][connection] [source/common/network/connection_impl.cc:203] [C612] closing socket: 0
[2021-05-03 22:15:40.282][38][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C612] adding to cleanup list
[2021-05-03 22:15:42.229][1][debug][main] [source/server/server.cc:190] flushing stats
[2021-05-03 22:15:43.296][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.296][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.296][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C613] new connection
[2021-05-03 22:15:43.296][38][debug][connection] [source/common/network/connection_impl.cc:203] [C613] closing socket: 0
[2021-05-03 22:15:43.297][38][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C613] adding to cleanup list
[2021-05-03 22:15:43.319][36][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.319][36][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.319][36][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C614] new connection
[2021-05-03 22:15:43.319][36][debug][connection] [source/common/network/connection_impl.cc:203] [C614] closing socket: 0
[2021-05-03 22:15:43.319][36][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C614] adding to cleanup list
[2021-05-03 22:15:43.331][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.332][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.332][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C615] new connection
[2021-05-03 22:15:43.332][38][debug][connection] [source/common/network/connection_impl.cc:203] [C615] closing socket: 0
[2021-05-03 22:15:43.332][38][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C615] adding to cleanup list
[2021-05-03 22:15:43.347][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.347][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.347][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C616] new connection
[2021-05-03 22:15:43.347][38][debug][connection] [source/common/network/connection_impl.cc:203] [C616] closing socket: 0
[2021-05-03 22:15:43.347][38][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C616] adding to cleanup list
[2021-05-03 22:15:43.363][35][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.363][35][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.363][35][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C617] new connection
[2021-05-03 22:15:43.363][35][debug][connection] [source/common/network/connection_impl.cc:203] [C617] closing socket: 0
[2021-05-03 22:15:43.363][35][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C617] adding to cleanup list
[2021-05-03 22:15:43.395][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.395][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.395][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C618] new connection
[2021-05-03 22:15:43.395][38][debug][connection] [source/common/network/connection_impl.cc:203] [C618] closing socket: 0
[2021-05-03 22:15:43.395][38][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C618] adding to cleanup list
[2021-05-03 22:15:43.409][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.409][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.409][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C619] new connection
[2021-05-03 22:15:43.409][38][debug][connection] [source/common/network/connection_impl.cc:203] [C619] closing socket: 0
[2021-05-03 22:15:43.409][38][debug][conn_handler] [source/server/connection_handler_impl.cc:152] [C619] adding to cleanup list
[2021-05-03 22:15:43.432][38][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2021-05-03 22:15:43.432][38][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:389] Create NotReadySslSocket
[2021-05-03 22:15:43.432][38][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C620] new connection
[2021-05-03 22:15:43.432][38][debug][connection] [source/common/network/connection_impl.cc:203] [C620] closing socket: 0

Registered workload entries on the nested-spire-server in the frontend cluster:

/opt/spire/bin # ./spire-server entry show -registrationUDSPath /tmp/spire/spire-registration.sock
Found 12 entries
Entry ID         : 6e75166d-ddde-4f1c-b12e-244ce208d0b6
SPIFFE ID        : spiffe://non-prod.somecorp.io/cross-cluster-test/front
Parent ID        : spiffe://non-prod.somecorp.io/spire-k8s-registrar/demo1.dev.some.corp.net/node/ip-XX-XX-XX-XX.ec2.internal
Revision         : 0
TTL              : default
Selector         : k8s:ns:cross-cluster-test
Selector         : k8s:pod-name:front-67c778c749-jhh8w

Registered workload entries on the nested-spire-server in the backend cluster:

/opt/spire/bin # ./spire-server entry show -registrationUDSPath /tmp/spire/spire-registration.sock
Found 9 entries
Entry ID         : 2d54e0ea-e891-47c7-89d8-6837906e4d6f
SPIFFE ID        : spiffe://non-prod.somecorp.io/cross-cluster-test/blue
Parent ID        : spiffe://non-prod.somecorp.io/spire-k8s-registrar/demo2.dev.some.corp.net/node/ip-XX-XX-XX-XX.ec2.internal
Revision         : 0
TTL              : default
Selector         : k8s:ns:cross-cluster-test
Selector         : k8s:pod-name:blue-545564b695-l29t9
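
To double-check the workload side, fetching an X.509 SVID over the local agent's Workload API should show the SVID and the trust bundle for non-prod.somecorp.io being delivered to the pod (a sketch; the agent socket path below is an assumption based on a typical SPIRE deployment, not my actual path):

/opt/spire/bin # ./spire-agent api fetch x509 -socketPath /run/spire/sockets/agent.sock

If the returned SPIFFE ID and the bundle's trust domain match the entries above, the nested chain back to the root CA is working at the workload level.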

Expected behavior
I would expect that, even across individual clusters, if the SPIRE server local to each workload can issue an SVID chained to the same root CA, Envoy should allow the request to complete. Instead I get TLS error: Secret is not supplied by SDS from the SDS service via Envoy.

Config files, and API responses
frontend VirtualNode

apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: front
  namespace: cross-cluster-test
spec:
  podSelector:
    matchLabels:
      app: front
  listeners:
    - portMapping:
        port: 8080
        protocol: http
      healthCheck:
        protocol: http
        path: '/ping'
        healthyThreshold: 2
        unhealthyThreshold: 2
        timeoutMillis: 2000
        intervalMillis: 5000
  backends:
    - virtualService:
        virtualServiceARN: arn:aws:appmesh:us-east-1:XXXXXXXXXXX:mesh/somecorp-dev/virtualService/color.cross-cluster-test.svc.cluster.local
  backendDefaults:
    clientPolicy:
      tls:
        mode: STRICT
        certificate:
          sds:
            secretName: spiffe://non-prod.somecorp.io/cross-cluster-test/front
        validation:
          trust:
            sds:
              secretName: spiffe://non-prod.somecorp.io
          subjectAlternativeNames:
            match:
              exact:
                - spiffe://non-prod.somecorp.io/cross-cluster-test/blue
  serviceDiscovery:
    awsCloudMap:
      namespaceName: mesh.dev.some.corp.net
      serviceName: front

blue VirtualNode backend

apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: blue
  namespace: cross-cluster-test
spec:
  podSelector:
    matchLabels:
      app: color
      version: blue
  listeners:
    - portMapping:
        port: 8080
        protocol: http
      healthCheck:
        protocol: http
        path: '/ping'
        healthyThreshold: 2
        unhealthyThreshold: 2
        timeoutMillis: 2000
        intervalMillis: 5000
      tls:
        mode: STRICT
        certificate:
          sds:
            secretName: spiffe://non-prod.somecorp.io/cross-cluster-test/blue
        validation:
          trust:
            sds:
              secretName: spiffe://non-prod.somecorp.io
          subjectAlternativeNames:
            match:
              exact:
                - spiffe://non-prod.somecorp.io/cross-cluster-test/front
  serviceDiscovery:
    awsCloudMap:
      attributes:
        - key: color
          value: blue
      namespaceName: mesh.dev.some.corp.net
      serviceName: color-blue

frontend envoy config_dump snippets

  {
   "@type": "type.googleapis.com/envoy.admin.v3.SecretsConfigDump",
   "dynamic_active_secrets": [
    {
     "name": "spiffe://non-prod.somecorp.io",
     "version_info": "1",
     "last_updated": "2021-05-03T20:47:42.780Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "spiffe://non-prod.somecorp.io",
      "validation_context": {
       "trusted_ca": {
        "inline_bytes": "<...>=="
       }
      }
     }
    }
   ],
   "dynamic_warming_secrets": [
    {
     "name": "spiffe://non-prod.somecorp.io/cross-cluster-test/front",
     "version_info": "uninitialized",
     "last_updated": "2021-05-03T20:47:36.491Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "spiffe://non-prod.somecorp.io/cross-cluster-test/front"
     }
    }
   ]
  }

blue backend envoy config_dump snippets

{
   "@type": "type.googleapis.com/envoy.admin.v3.SecretsConfigDump",
   "dynamic_active_secrets": [
    {
     "name": "spiffe://non-prod.somecorp.io",
     "version_info": "2",
     "last_updated": "2021-05-03T22:02:14.657Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "spiffe://non-prod.somecorp.io",
      "validation_context": {
       "trusted_ca": {
        "inline_bytes": "<...>=="
       }
      }
     }
    }
   ],
   "dynamic_warming_secrets": [
    {
     "name": "spiffe://non-prod.somecorp.io/cross-cluster-test/blue",
     "version_info": "uninitialized",
     "last_updated": "2021-05-03T20:46:26.810Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "spiffe://non-prod.somecorp.io/cross-cluster-test/blue"
     }
    }
   ]
  }

Additional Context
If I remove the SPIFFE URIs from the SANs on the listeners for both the frontend and backend VirtualNodes, the uninitialized dynamic_warming_secrets entry goes away, but I end up with OpenSSL errors, e.g.:
frontend:

          subjectAlternativeNames:
            match:
              exact:
                - front-color-cross-cluster-test.mesh.dev.some.corp.net
                - color-blue.mesh.dev.some.corp.net
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/blue
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/red
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/green

backend

          subjectAlternativeNames:
            match:
              exact:
                - color-blue.mesh.dev.some.corp.net
                - front-color-cross-cluster-test.mesh.dev.some.corp.net
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/front

frontend OPENSSL errors

[2021-05-04 15:22:31.280][91][debug][pool] [source/common/conn_pool/conn_pool_base.cc:314] [C45] client disconnected, failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2021-05-04 15:22:31.280][91][debug][router] [source/common/router/router.cc:1031] [C43][S4995513198116003529] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2021-05-04 15:22:31.286][91][debug][router] [source/common/router/router.cc:1533] [C43][S4995513198116003529] performing retry
[2021-05-04 15:22:31.286][91][debug][pool] [source/common/http/conn_pool_base.cc:71] queueing stream due to no available connections

backend OPENSSL errors

[2021-05-04 15:22:31.278][29][debug][conn_handler] [source/server/connection_handler_impl.cc:476] [C29] new connection
[2021-05-04 15:22:31.280][29][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:215] [C29] TLS error: 268436502:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_UNKNOWN
caleygoff-invitae added the bug label May 4, 2021

caleygoff-invitae commented May 4, 2021

😅 I am not sure exactly what I changed, but I got the above working. It's still not clear from an example-walkthrough standpoint how to arrive where I ended up other than by failing forward 😛

On my frontend VirtualNode I needed the correct spiffe:// URIs in the subjectAlternativeNames field.

My backend entries have SVIDs registered as:

spiffe://non-prod.somecorp.io/cross-cluster-test/red
spiffe://non-prod.somecorp.io/cross-cluster-test/green
spiffe://non-prod.somecorp.io/cross-cluster-test/blue

The working frontend VirtualNode needed the following in its subjectAlternativeNames field:

subjectAlternativeNames:
  match:
    exact:
      - spiffe://non-prod.somecorp.io/cross-cluster-test/blue
      - spiffe://non-prod.somecorp.io/cross-cluster-test/red
      - spiffe://non-prod.somecorp.io/cross-cluster-test/green

It appears that when I created this ticket I was missing the above. After all, I did not need the FQDNs like below, just the spiffe:// URIs.

non-working frontend VirtualNode:

          subjectAlternativeNames:
            match:
              exact:
                - front-color-cross-cluster-test.mesh.dev.some.corp.net
                - color-blue.mesh.dev.some.corp.net
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/blue
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/red
#                - spiffe://non-prod.somecorp.io/cross-cluster-test/green

Last note: on my individual backend VirtualNodes I did not need to include anything in the subjectAlternativeNames field; I left the field off entirely on each backend VirtualNode, as sketched below.
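
For clarity, a sketch of the backend (blue) listener TLS block as it ended up working, identical to the config posted above except that the subjectAlternativeNames block is left off entirely:

      tls:
        mode: STRICT
        certificate:
          sds:
            secretName: spiffe://non-prod.somecorp.io/cross-cluster-test/blue
        validation:
          trust:
            sds:
              secretName: spiffe://non-prod.somecorp.io
          # no subjectAlternativeNames block on the backend VirtualNodes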

achevuru (Contributor) commented

@caleygoff-invitae Closing this issue. It appears that you were able to get past the issue. Feel free to reopen it if you're still running into any issues with the mTLS+SPIRE walkthrough.
