- Fix bug in tunneler source transparency when using UDP
- Added guidance under /quickstart for quickly launching a simplified, local environment suitable for local dev testing and learning
- Removed xtv framework from fabric and moved edge terminator identity validation to control channel handler. The
terminator:
section may be removed from the controller configuration file. - Listen for SIGINT for router shutdown
- Implement dial and listen identity options in go tunneler
- Edge REST API Deprecation Warnings
- Posture Check Process Multi
Upcoming changes will remove support for non-prefixed Edge REST API URLs. The correct API URL prefix has been edge/v1
for over a year and not using will become unsupported at a future date. Additionally, the Edge REST API will be splitting
into two separate APIs in the coming months:
/edge/management/v1
/edge/client/v1
These new prefixes are not currently live and will be released in a subsequent version.
A new posture check type has been introduced: PROCESS_MULTI
. This posture check
is meant to replace the posture check type PROCESS
and PROCESS
should be considered
deprecated. PROCESS_MULTI
covers all the uses that its predecessor provided with
additional semantic configuration options.
- semantic: Either
AllOf
orOneOf
. Determines which processes specified inprocesses
must pass - processes: An array of objects representing a process. Similar to
PROCESS
's fields but with the ability to specify multiple binary hashes- osType - Any of the standard posture check OS types (Android, iOS, macOS, Linux, Windows, WindowsServer)
- path - The absolute file path the process is expected to run from
- hashes - An array of sha512 hashes that are valid (optional, none allows any)
- signerFingerprints - An array of sha1 signer fingerprints that are valid (optional, none allows any)
- Revert dial error messages to what sdks are expecting. Add error codes so future sdks don't have to parse error text
- Add router events
- Allow filters with no predicate if sort or paging clauses are provided
- Ex: instead of
true limit 5
you could have justlimit 5
. Or instead oftrue sort by name
you could havesort by name
- Ex: instead of
- Corrected host.v1 configuration type schema to prevent empty port range objects
To enable:
events:
jsonLogger:
subscriptions:
- type: fabric.routers
Example JSON output:
{
"namespace": "fabric.routers",
"event_type": "router-online",
"timestamp": "2021-04-22T11:26:31.99299884-04:00",
"router_id": "JAoyjafljO",
"router_online": true
}
{
"namespace": "fabric.routers",
"event_type": "router-offline",
"timestamp": "2021-04-22T11:26:41.335114358-04:00",
"router_id": "JAoyjafljO",
"router_online": false
}
-
Add workaround for bbolt bug which caused some data to get left behind when deleting identities, seen when turning off tunneler capability on edge routers
-
Remove deprecated ziti-enroller command. Enrollement can be done using the ziti-tunnel, ziti-router and ziti commands
-
Fix UDP intercept handling
-
The host.v1 service configuration type has been changed as follows:
- Rename
dialIntercepted*
properties toforwardProtocol
,forwardAddress
,forwardPort
for better consistency with non-tunneler client applications. - Add
allowedProtocols
,allowedAddresses
, andallowedPortRanges
properties to whitelist destinations that are dialed viaforward*
. Theallowed*
properties are required for any correspondingforward*
property that istrue
. - Add
allowedSourceAddresses
, which serves as a whitelist for source IPs/CIDRs and informs the hosting tunneler of the local routes to establish when hosting a service.
- Rename
-
Ziti Controller will now report service posture query policy types (Dial/Bind)
-
Ziti Controller now supports enrollment extension for routers
-
Ziti Router now support forcing enrollment extension via
run -e
-
Ziti Routers will now automatically extend their enrollment before their certificates expire
-
ziti edge enroll
with a UPDB JWT now confirms and properly sets the password suppliedCaveats:
- Any existing host.v1 configurations that use will become invalid.
- ziti-tunnel and the converged router/tunneler creates local routes that are established for
allowedSourceAddresses
, but the routes are not consistently cleaned up whenziti-tunnel
exits. This issue will be addressed in a future release.
- Fixed issue where edge router renames didn't propagate to fabric
- Fixed issue where gateway couldn't dial after router rename
- Allowed parsing identities where values were not URIs
- Allow updating which configs are used by services
- Converged Tunneler/Router (Beta 2)
- intercept.v1 support (also in ziti-tunnel)
- support for setting per-service hosting cost/precedence on identity
- Identities and edge routers now support appData, which is tag data consumable by sdks
- Edge Router Policies now expose the
isSystem
flag for system managed policies - ziti-tunnel no longer supports tun mode. It has been superseded by tproxy mode
- Fix deadlock in ziti-router which would stop new connections from being established after an api session is removed
- Fix id extraction for data plane link latency metrics
- Fix id extraction for ctrl plane link latency metrics
- ziti-tunnel wasn't asking for host.v1/host.v2 configs
Previously support was adding for setting the default cost and precedence that an identity would use when hosting services. Now, in addition to setting default values, costs and precedences can be set per-service. These are exposed as maps keyed by service. There is one map for costs and another for precedences. There is also CLI support for setting these values.
Example creating an identity:
ziti edge create identity service test2 --default-hosting-cost 10 --default-hosting-precedence failed --service-costs loop=20,echo=30 --service-precedences loop=default,echo=required
When viewing the identity, these values can be seen:
"id": "-qJRZFqV8t",
"tags": {},
"updatedAt": "2021-04-05T16:54:42.763Z",
"authenticators": {},
"defaultHostingCost": 10,
"defaultHostingPrecedence": "failed",
"envInfo": {},
"hasApiSession": false,
"hasEdgeRouterConnection": false,
"isAdmin": false,
"isDefaultAdmin": false,
"isMfaEnabled": false,
"name": "test2",
"roleAttributes": null,
"sdkInfo": {},
"serviceHostingCosts": {
"vH3QndzRYt": 20,
"wAnuyO3PmI": 30
},
"serviceHostingPrecedences": {
"vH3QndzRYt": "default",
"wAnuyO3PmI": "required"
},
Note that this mechanism replaces setting cost and precedence via the host.v1/host.v2 config types listenOptions
. Those values will be ignored, and may in future be removed from the schemas.
We have an existing tags mechanism, which can be used by system administrators to annotate ziti entities in whatever is useful to them. Tags are an administrator function and are not meant to be visible to SDKs and SDK applications. If an administrator wants to provide custom data for services to the SDK they can use config types and configs for that purpose. Up until now however, there hasn't been a means to annotate identities and edge routers, which are the other two entities visible to SDKs, with data that the SDKs can consume.
0.19.9 introduces appData
on identities and edge-routers. appData
has the same structure as tags
, but is intended to allow administrators to push custom data on identities and edge routers to SDKs. An example use is for tunnelers. The sourceIp
can contain template information, which can refer back to the appData
for the tunneler's identity.
The CLI supports setting appData on identities and edge routers.
ziti edge create identity service myIdentity --tags office=Regional5,device=Laptop --app-data ip=1.1.1.1,QoS=voip
- Transwarp beta_1
- Converged Tunneler/Router (Beta)
- Updates to the host.v1 config type
- New host.v2 config type
- New health events
- Service Policies with no Posture Checks properly grant access even if an identity has another Service Policies with posture checks
- Removing entities with no referencing entities via attributes no longer panics
- API Sessions and Current Api Session now share the same modeling logic
- API Sessions now have lastActivityAt and cachedLastActivityAt
- Enrolling and unenrolling in MFA sets MFA posture data
See the Transwarp beta_1 guide here.
ziti-router can now run with the tunneler embedded. It has the same capabilities as ziti-tunnel. As ziti-tunnel gains new features, the combined ziti-router/tunnel should maintain feature parity.
This is a beta release. It should be relatively feature complete and bug-free for hosting services but intercept support is still nascent. Some features are likely to be changed based on user feedback.
Like ziti-tunnel, tproxy
mode will only work on linux. proxy
and host
modes should work on all operating systems.
The converged tunneler/router does not support running in tun mode. This mode may be deprecated for ziti-tunnel as well at some point in the future if no advantages over tproxy mode are evident.
The following configurations are supported:
ziti-tunneler-client.v1
ziti-tunneler-server.v1
host.v1
(updated)host.v2
(new)
Support for intercept.v1
will be added in a follow-up release.
When an edge router is marked as being tunneler enabled, a matching identity will be created, of type Router, as well as an edge router policy. The edge router policy ensures that the identity always has access to the edge router. The identity allows the router to be included in service policies, to configure which services will intercepted/hosted.
- The identity will have the same id and name as the edge router
- If an identity with the same name as the router already exists, the router create/update will fail
- When the router name is changed, the identity name will be updated as well.
- The identity name and type cannot be changed directly. The type may not be changed at all and the name may only be changed by changing the name of the router.
- The identity may not be deleted except by deleting the router or disabling tunneler support for the identity.
- When the router is deleted, the accompanying identity and edge router policy will also be deleted
- If tunneler support is disabled in the router, the accompanying identity and edge router policy will also be deleted.
- The edge router policy will have the same id as the router and have a name of the form
edge-router-<edge-router-id>-system
, where<edge-router-id>
is replaced by the id of the edge router. - The edge router policy is considered a
system
entity, and cannot be updated and cannot deleted except by the system when the associated router is deleted.
In order for a router instance to host a tunneler, it must meet the following criteria:
- It must be represented in the model by an edge router
- The edge router field
isTunnelerEnabled
must be set to true - Edge functionality in the router must be enabled, which means the
edge:
config section must be present. NOTE: The edge listener does not need to be enabled. - The tunnel listener must be enabled.
The ziti CLI can be used to enable/disable tunneler support on edge routers. When creating an edge router, the -t
flag can be passed in to enable running the tunneler.
ziti edge create edge-router myEdgeRouter --tunneler-enabled
or
ziti edge create edge-router myEdgeRouter -t
An existing edge router can be marked as tunneler enabled as follows:
ziti edge update edge-router myEdgeRouter --tunneler-enabled
or
ziti edge update edge-router myEdgeRouter -t
An existing edge router can be marked as not supporting the tunneler as follows:
ziti edge update edge-router myEdgeRouter -t=false
or
ziti edge update edge-router myEdgeRouter --tunneler-enabled=false
listeners:
- binding: tunnel
options:
mode: tproxy # mode to run in. Valid values [tproxy, host, proxy]. Default: tproxy
svcPollRate: 15s # How often to poll for service changes. Default: 15s
resolver: udp://127.0.0.1:53 # DNS resolve. Default: udp://127.0.0.1:53 for tproxy, blank for others
dnsSvcIpRange: 100.64.0.1/10 # cidr to use when assigning IPs to unresolvable intercept hostnames (default "100.64.0.1/10")
services: # services to intercept in proxy mode. Default: none
- echo:1977
lanIf: tun1 # if specified, INPUT rules for intercepted service addresses are assigned to this interface. Defaults to unspecified.
Health checks now have a new trigger option change
. This will trigger the action whenever the health state changes from pass to fail or fail to pass.
There's now a new action, send event
, which will send a health event to the controller. This will be reported via the recently introduced service events.
The change
option is designed to be sued with send event
as a way to send events only when the state changes, rather than sending every single pass/fail.
An example configuration:
{
"address": "localhost",
"port": 8171,
"protocol": "tcp",
"portChecks": [
{
"address": "localhost:8171",
"interval": "5s",
"timeout": "100ms",
"actions": [
{
"trigger": "change",
"action": "send event"
}
]
}
]
}
What the service events look like:
{
"namespace": "service.events",
"event_type": "service.health_check.failed",
"service_id": "vH3QndzRYt",
"count": 2,
"interval_start_utc": 1616703360,
"interval_length": 60
}
{
"namespace": "service.events",
"event_type": "service.health_check.failed",
"service_id": "vH3QndzRYt",
"count": 1,
"interval_start_utc": 1616703720,
"interval_length": 60
}
{
"namespace": "service.events",
"event_type": "service.health_check.passed",
"service_id": "vH3QndzRYt",
"count": 2,
"interval_start_utc": 1616703720,
"interval_length": 60
}
The host.v1 config type now support health checks. The configuration is the same for ziti-tunneler-server-v1
.
See here: https://github.com/openziti/edge/blob/v0.19.54/tunnel/entities/host.v1.json for the full schema
There is a new host configuration, host.v2
, which should supersede host.v1
. The primary difference from host.v1
is that it supports multiple terminators. This means that when a service is hosted by a router, it can connect to multiple service instances if the service is horizontally scaled, or in a primary/failover setup.
Each terminator definition is the same as a host.v1
configuration, which one exception: listOptions.connectTimeoutSeconds
is now listenOptions.connectTimeout
and is specified as a duration (5s
, 2500ms
, etc).
Example:
{
"terminators": [
{
"address": "localhost",
"port": 8171,
"protocol": "tcp",
"listenOptions": {
"cost": 50,
"precedence": "required"
},
"portChecks": [
{
"address": "localhost:8171",
"interval": "5s",
"timeout": "100ms",
"actions": [
{
"trigger": "change",
"action": "send event"
},
{
"trigger": "fail",
"action": "increase cost 25"
},
{
"trigger": "pass",
"action": "decrease cost 10"
}
]
}
]
},
{
"address": "localhost",
"port": 8172,
"protocol": "tcp",
"listenOptions": {
"cost": 51,
"precedence": "required"
},
"portChecks": [
{
"address": "localhost:8172",
"interval": "5s",
"timeout": "100ms",
"actions": [
{
"trigger": "change",
"action": "send event"
},
{
"trigger": "fail",
"action": "increase cost 25"
},
{
"trigger": "pass",
"action": "decrease cost 10"
}
]
}
]
}
]
}
- Update to Golang 1.16
- Idle route garbage collection: orphaned routing table entries will be garbage collected. New infrastructure for session confirmation facilitating additional types of garbage collection
- Configurable Xgress dial "dwell time"
- Database tracing support
- Immediately close router ctrl channel connection if fingerprint validation fails
- Immediately close router ctrl channel if no version info is provided
- Ziti Controller Remove All Ziti Controller Remove API Sessions and Edge Sessions API Sessions and Edge Sessions
- Fixed posture check error responses to include only failed checks
- Heartbeat Collection And Batching
- Add Service Request Failures for Posture Checks
- Remove database migration code for versions older than 0.17.1
The following router configuration stanza controls idle route garbage collection:
forwarder:
#
# After how many milliseconds of inactivity is a forwarding table entry considered idle?
#
idleSessionTimeout: 60000
#
# How frequently will we confirm idle sessions with the controller?
#
idleTxInterval: 60000
The following router configuration stanza controls Xgress dial "dwell time". You probably don't want to use this unless you're debugging a timing-related issue in the overlay:
forwarder:
#
# (Debugging) Xgress dial "dwell time". When dialing, the Xgress framework will wait this number of milliseconds
# before responding in the affirmative to the controller.
#
xgressDialDwellTime: 0
Enable database tracing using the dbTrace
controller configuration directive:
dbTrace: true
This will result in log output that describes the entrance into and exit from transactional functions operating against the underlying database:
[ 0.003] INFO fabric/controller/db.traceUpdateEnter: Enter Update (tx:18) [github.com/openziti/fabric/controller/db.createRoots]
[ 0.003] INFO fabric/controller/db.traceUpdateExit: Exit Update (tx:18) [github.com/openziti/fabric/controller/db.createRoots]
[ 0.006] INFO fabric/controller/db.traceUpdateEnter: Enter Update (tx:19) [github.com/openziti/foundation/storage/boltz.(*migrationManager).Migrate.func1]
[ 0.006] INFO foundation/storage/boltz.(*migrationManager).Migrate.func1: fabric datastore is up to date at version 4
[ 0.006] INFO fabric/controller/db.traceUpdateExit: Exit Update (tx:19) [github.com/openziti/foundation/storage/boltz.(*migrationManager).Migrate.func1]
Ziti Controller Remove All Ziti Controller Remove API Sessions and Edge Sessions API Sessions and Edge Sessions
With the Ziti Controller shutdown, it is now possible to clear out all API Sessions and Edge Sessions that were persisted prior to the controller being stopped. All connecting identities will need to re-authenticate when the controller is restarted.
This command is useful in situations where the number of sessions is large and the database is being copied and/or used for debugging.
Example Command:
ziti-controller delete-sessions
Example Output:
[ 0.017] INFO ziti/ziti-controller/subcmd.deleteSessions: {go-version=[go1.16] revision=[local] build-date=[2020-01-01 01:01:01] os=[windows] arch=[amd64] version=[v0.0.0]} removing API Sessions and Edge Sessions from ziti-controller
[ 9.469] INFO ziti/ziti-controller/subcmd.deleteSessions.func2: existing api Sessions: 2785
[ 18.274] INFO ziti/ziti-controller/subcmd.deleteSessions.func2: edge sessions bucket does not exist, skipping, count is: 4035
[ 47.104] INFO ziti/ziti-controller/subcmd.deleteSessions.func3: done removing api Sessions
[ 55.866] INFO ziti/ziti-controller/subcmd.deleteSessions.func3: done removing api session indexes
[ 58.325] INFO ziti/ziti-controller/subcmd.deleteSessions.func3: done removing edge session indexes
In previous versions heartbeats from REST API usage and discrete Edge Router connection would all cause writes for the same API Session as they were encountered. In situations where one or more REST API requests were issues and/or one or more Edge Router connections were held by a ZitI Application, multiple simultaneous heartbeats could occur for no apparent benefit and consume disk write I/O.
Heartbeats are now aggregated over a window of time in a cache and written to disk on an interval. The write interval defaults to 90s and the batch size (for write transactions) to 250. Additionally, all heartbeats are flush to disk when the controller is properly shut down.
These settings can be defined in the edge.api
section for the Ziti Controller configuration.
Example:
edge:
api:
...
#(optional, default 90s) Alters how frequently heartbeat and last activity values are persisted
activityUpdateInterval: 90s
#(optional, default 250) The number of API Sessions updated for last activity per transaction
activityUpdateBatchSize: 250
...
When a Ziti Identity (endpoint) requests a service that is provided via a Service Policy with
Posture Checks associated with it, failure to meet the Posture Check requirements results in an
error message with a code of INVALID_POSTURE
and data elaborating on what Service Policies ids and
Posture Check ids failed. Additionally, now the controller will log the most recent requests and
provide detailed error output for administrators.
The output will show every Service Policy that was checked for access and every Posture Check that failed. The failed Posture Checks include the Posture Data that was available for the identity and the requirements defined in the Posture Check at the time of the Edge Session request.
New Endpoint: GET /identities/{id}/failed-service-requests
Example Output:
{
"data": [
{
"apiSessionId": "ckmgcom680002cc61hggmcy1n",
"policyFailures": [
{
"policyId": "eqggCQlBm",
"policyName": "alldial",
"checks": [
{
"actualValue": {
"hash": "123",
"isRunning": true,
"signerFingerprints": [
"834f29a60152ce36eb54af37ca5f8ec029eccf01",
"123248b9e8b0dd41938018a871a13dd92bed4456"
]
},
"expectedValue": {
"hashes": [
"3af35956a71c2afefbfe356f86c9139725eeecb15f0de7d98557d4d696c434f51fbc2fa5f7543aef4f5f1afb83caa4a43619973bae52e1f4f92ec10c39b039e8"
],
"osType": "Windows",
"path": "C:\\Program Files\\TestApp\\test.exe",
"signerFingerprint": "834f29a60152ce36eb54af37ca5f8ec029eccf01"
},
"postureCheckId": "UF9aOqlD3",
"postureCheckName": "processCheck",
"postureCheckType": "PROCESS"
},
{
"actualValue": {
"type": "Windows",
"version": "6.0.18364"
},
"expectedValue": [
{
"type": "Windows",
"versions": [
">=10.0.18364 <=10.0.19041"
]
}
],
"postureCheckId": "fK0aOQFD3",
"postureCheckName": "osCheck",
"postureCheckType": "OS"
},
{
"actualValue": "wrong.com",
"expectedValue": [
"right.com"
],
"postureCheckId": "i2wgOQlBm",
"postureCheckName": "domainCheck",
"postureCheckType": "DOMAIN"
}
]
}
],
"serviceId": "ll-aOqFDm",
"serviceName": "test-service",
"sessionType": "Dial",
"when": "2021-03-19T09:41:10.117-04:00"
}
],
"meta": {}
}
- Service Event Counters
- New log message which shows local (router side) address, including port, when router dials are successful. This allows correlating server side access logs with ziti logs
There are new events emitted for service, with statistics for how many dials are successful, failed, timed out, or fail for non-dial related reasons. Stats are aggregated per-minute, similar to usage statistics. They can be enabled in the controller config as follows:
events:
jsonLogger:
subscriptions:
- type: services
handler:
type: file
format: json
path: /tmp/ziti-events.log
Example of the events JSON output:
{
"namespace": "service.events",
"event_type": "service.dial.fail",
"service_id": "HSfgHzIom",
"count": 8,
"interval_start_utc": 1615400160,
"interval_length": 60
}
{
"namespace": "service.events",
"event_type": "service.dial.success",
"service_id": "HSfgHzIom",
"count": 29,
"interval_start_utc": 1615400160,
"interval_length": 60
}
Event types:
service.dial.success
service.dial.fail
service.dial.timeout
service.dial.error_other
- fabric#206 Fix controller deadlock which can happen when links go down
- Use AtomicBitSet for xgress flags. Minimize memory use and contention
- edge router status wasn't getting set online on connect
- ziti-tunnel proxy wasn't working for services without a client config
- Add queue for metrics messages. Add config setting for metrics report interval and message queue
size
- metrics.reportInterval - how often to report metrics to controller. Default:
15s
- metrics.messageQueueSize - how many metrics message to allow to queue before blocking. Default: 10
- metrics.reportInterval - how often to report metrics to controller. Default:
Example stanza from router config file:
metrics:
reportInterval: 15s
messageQueueSize: 10
- Link latency probe timeout parameter in router configuration.
- Support configurable timeout on Xgress dial operations. Router terminated services can now specify a short timeout for dial operations.
- Fix 0.19 regression: updating terminator cost/precedence from the sdk was broken
- Fix 0.19 regression: fabric session client id was incorrectly set to edge session token instead of id
- Fix MFA secret length, lowered from 80 bytes to 80 bits
- Ensure that negative lengths in message headers are properly handled
- Fix panic when updating session activity for removed session
- Fix panic when shared router state is used when a router disconnects or reconnects
- Additional garbage collection for parallel route algorithm, removes successful routes created during failed attempts, after final successful attempt.
Control the link latency probe timeout parameter with the following router configuration syntax:
forwarder:
#
# After how many milliseconds does the link latency probe timeout?
# (default 10000)
#
latencyProbeTimeout: 10000
Specify the dial timeout for Xgress dialers using the following syntax:
dialers:
- binding: transport
options:
connectTimeout: 2s
You will need to specify Xgress options for each dialer binding that you want to use with your configuration.
- Metric events formatting has changed
Each now gets its own event. Here are two example events:
{
"metric": "xgress.tx_write_time",
"metrics": {
"xgress.tx_write_time.count": 0,
"xgress.tx_write_time.m1_rate": 0,
"xgress.tx_write_time.mean": 0,
"xgress.tx_write_time.p99": 0
},
"namespace": "metrics",
"source_event_id": "62c31ab9-e0ed-48f5-9907-2d2e8c76f393",
"source_id": "pTF3hzUQI",
"timestamp": "2021-02-23T19:33:39.017329033Z"
}
{
"metric": "link.rx.msgsize",
"metrics": {
"link.rx.msgsize.count": 3,
"link.rx.msgsize.mean": 0,
"link.rx.msgsize.p99": 0
},
"namespace": "metrics",
"source_entity_id": "8VEJ",
"source_event_id": "62c31ab9-e0ed-48f5-9907-2d2e8c76f393",
"source_id": "pTF3hzUQI",
"timestamp": "2021-02-23T19:33:39.017329033Z"
}
Changes of note:
- The metric name is now listed
- There's a new
source_event_id
which can be used to link together all the metrics that were reported at a given time - The timestamp format has been changed to match the other event times. Format is: RFC3339Nano
- Metrics which formerlly had an id in them, such as link and control channel metrics now have the
id extracted. The id is stored in the
source_entity_id
field.
- Fix edge router synchronization from stopping after workers exit
- Session validation intervals in the edge router were calculated incorrected
- Notes on the configuration values related to session validation were missing from the 0.19 release
notes. They have been added in the section called
Edge Session Validation
- Fix v0.18.x - v0.19.x API Session id incompatibility, all API Session and Sessions are deleted during this upgrade
- Fix Edge Router double connect leading to panics during Edge Router REST API rendering
-
Ziti CLI now has 'Let's Encrypt' PKI support to facilitate TLS connections to Controller from BrowZer-based apps that use the
ziti-sdk-js
.-
New command to Register a Let's Encrypt account, then create and install a certificate
Usage:
ziti pki le create -d domain -p path-to-where-data-is-saved [flags]
Flags:
-a, --acmeserver string ACME CA hostname (default "https://acme-v02.api.letsencrypt.org/directory") -d, --domain string Domain for which Cert is being generated (e.g. me.example.com) -e, --email string Email used for registration and recovery contact (default "[email protected]") -h, --help help for create -k, --keytype EC256|EC384|RSA2048|RSA4096|RSA8192 Key type to use for private keys (default RSA4096) -p, --path string Directory to use for storing the data -o, --port string Port to listen on for HTTP based ACME challenges (default "80") -s, --staging Enable creation of 'staging' Certs (instead of production Certs)
-
New command to Display Let's Encrypt certificates and accounts information
Usage:
ziti pki le list -p path-to-where-data-is-saved [flags]
Flags:
-a, --accounts Display Account info -h, --help help for list -n, --names Display Names info -p, --path string Directory where data is stored
-
New command to Renew a Let's Encrypt certificate
Usage:
ziti pki le renew -d domain -p path-to-where-data-is-saved [flags]
Flags:
-a, --acmeserver string ACME CA hostname (default "https://acme-v02.api.letsencrypt.org/directory") --days int The number of days left on a certificate to renew it (default 14) -d, --domain string Domain for which Cert is being generated (e.g. me.example.com) -e, --email string Email used for registration and recovery contact (default "[email protected]") -h, --help help for renew -k, --keytype EC256|EC384|RSA2048|RSA4096|RSA8192 Key type to use for private keys (default RSA4096) -p, --path string Directory where data is stored -r, --reuse-key Used to indicate you want to reuse your current private key for the renewed certificate (default true) -s, --staging Enable creation of 'staging' Certs (instead of production Certs)
-
New command to Revoke a Let's Encrypt certificate
Usage:
ziti pki le revoke -d domain -p path-to-where-data-is-saved [flags]
Flags:
-a, --acmeserver string ACME CA hostname (default "https://acme-v02.api.letsencrypt.org/directory") -d, --domain string Domain for which Cert is being generated (e.g. me.example.com) -e, --email string Email used for registration and recovery contact (default "[email protected]") -h, --help help for revoke -p, --path string Directory where data is stored -s, --staging Enable creation of 'staging' Certs (instead of production Certs)
-
- Edge session validation is now handled at the controller, not the edge router
- Routing across the overlay is now handled in parallel, rather than serially. This changes the
syntax and semantics of a couple of control plane messages between the controller and the
connected routers. See the section below on
Parallel Routing
for additional details. - API Session synchronization improvements and pluggability
- ziti ps now supports
router-disconnect
androuter-reconnect
, which disconnects/reconnects the router from the controller. This allows easier testing of various failure states. Requires that --debug-ops is passed toziti-router
on startup. - Golang SDK hosted service listeners are now properly closed when they receive close notifications
- Golang SDK now recovers if the session is gone
- Golang SDK now stops some go-routines that were previously left running after the SDK context was closed
- Fix session leak caused by using half close when tunneling UDP connections
- Fix connection leak caused by not closing the UDP connection when it's activity timer expires
- Fabric Xctrl instances are now notified when the control channel reconnects
- Fabric Xctrl instances may now provide message decoders for the trace infrastructure so that custom messages will be properly displayed in trace logs
Before 0.19, edge sessions (note: network sessions, not API sessions) would be sent to edge routers after they were created. When the edge router received a dial or bind request it would verify that the session was valid, then request the controller to create a fabric session.
This approach has two downsides.
- There is a race condition where the edge router may receive a dial/bind request before it has received the session from the controller. It thus has to wait awhile before declaring the session invalid.
- Sessions need to be managed across multiple edge routers, since we don't know where the client will connect. This adds a lot of control channel traffic.
Since the edge router makes a request to the controller anyway, we can pass the session token and fingerprints up to the controller and do the verification there. This allows us to minimize the amount of state the edge router needs to keep synchronized with the controller and removes the race condition.
When an edge controller loses connection to the controller, it needs to verify that its sessions are still valid, in case it missed any session deletion notifications. There are three new settings which control this behavior.
sessionValidateChunkSize
- how many sessions to validate in each request to the controller. Default value: 1000sessionValidateMinInterval
- minimum time to wait between chuenks of sessions. Default value:250ms
. Format: duration - examples:10s
,1m
,500ms
sessionValidateMaxInterval
- maximum time to wait between chuenks of sessions. Default value:1500ms
. Format: duration - examples:10s
,1m
,500ms
Intervals between sending session validation chunks is random, between min and max intervals. This is done to prevent the routers from flooding the controller after a controller restart.
Prior to 0.19, the Ziti controller would send a Route
message to the terminating router first, to
establish terminator endpoint connectivity. If the destination endpoint was unreachable, the entire
session setup would be abandoned. If the terminator responded successfully, the controller would
then proceed to work through the chain of routers sending Route
messages and creating the
appropriate forwarding table entries. This all happened sequentially.
In 0.19 route setup for session creation now happens in parallel. The controller sends Route
commands to all of the routers in the chain (including the terminating router), and waits for
responses and/or times out those responses. If all of the participating routers respond
affirmatively within the timeout period, the entire session creation succeeds. If any participating
router responds negatively, or the timeout period occurs, the session creation attempt fails,
updating configured termination weights. Session creation will retry up to a configured number of
attempts. Each attempt will perform a fresh path selection to ensure that failed terminators can be
excluded from subsequent attempts.
The terminationTimeoutSeconds
timeout parameter has been removed and will be ignored.
The routeTimeoutSeconds
controls the timeout for each route attempt.
#network:
#
# routeTimeoutSeconds controls the number of seconds the controller will wait for a route attempt to succeed.
#
#routeTimeoutSeconds: 10
You'll want to ensure that your participating routers' getSessionTimeout
in the Xgress options is
configured to a suitably large enough value to support the configured number of routing attempts, at
the configured routing attempt timeout. In the router configuration, the getSessionTimeout
value
is configured for your Xgress listeners like this:
listeners:
# basic ssh proxy
- binding: proxy
address: tcp:0.0.0.0:1122
service: ssh
options:
getSessionTimeout: 120s
The new parallel routing implementation also supports a configurable number of session creation
attempts. Prior to 0.19, the number of attempts was hard-coded at 3. In 0.19, the number of retries
is controlled by the createSessionRetries
parameter, which defaults to 3.
network:
#
# createSessionRetries controls the number of retries that will be attempted to create a circuit (and terminate it)
# for new sessions.
#
createSessionRetries: 5
Prior to 0.19 API Sessions were only capable of being synchronized with connecting/reconnecting edge routers in a single manner. In 0.19 and forward improvements allow for multiple strategies to be defined within the same code base. Future releases will be able to introduce configurable and negotiable strategies.
The default strategy from prior releases, now named 'instant', has been improved to fix issues that could arise during edge router reconnects where API Sessions would become invalid on the reconnecting edge router. In addition, the instant strategy now allows for invalid synchronization detection, resync requests, enhanced logging, and synchronization statuses for edge routers.
The GET /edge-routers
list and GET /edge-routers/<id>
detail responses now include
a syncStatus
field. This value is updated during the lifetime of the edge router's connection to the controller
and will provide insight on its status.
The possible syncStatus
values are as follows:
- "SYNC_NEW" - connection accepted but no strategy actions have been taken
- "SYNC_QUEUED" - connection handed to a strategy and waiting for processing
- "SYNC_HELLO_TIMEOUT" - sync failed due to a hello timeout, requeued for hello
- "SYNC_HELLO" - controller edge hello being sent
- "SYNC_HELLO_WAIT" - hello received from router and queued for processing
- "SYNC_RESYNC_WAIT" - router requested a resync and queued for processing
- "SYNC_IN_PROGRESS" - synchronization processing
- "SYNC_DONE" - synchronization completed, router is now in maintenance updates
- "SYNC_UNKNOWN" - state is unknown, edge router misbehaved, error state
- "SYNC_DISCONNECTED" - strategy was disconnected before finishing, error state