This Helm chart deploys a one-time data upload job or continuous replication service using Kubernetes. It supports three deployment types:

- `SNAPSHOT_ONLY`: A one-time `Job` for uploading a snapshot of data.
- `INCREMENT_ONLY`: A continuous data replication `StatefulSet`.
- `SNAPSHOT_AND_INCREMENT`: A `Job` for a one-time data snapshot upload, followed by a continuous replication `StatefulSet` after the job completes.
## Prerequisites

- Kubernetes 1.18+
- Helm 3.0+
- A Docker image with the required commands:
  - `trctl activate --file /etc/config/config.yaml` for the snapshot job.
  - `trctl replication --file /etc/config/config.yaml` for continuous replication.
## Installing the Chart

First, pull the chart:

```shell
helm pull oci://ghcr.io/doublecloud/transfer-helm/transfer
```

Then install it. For example:

```shell
helm install transfer ./transfer --set transferSpec.type=SNAPSHOT_ONLY
```

To uninstall the chart, run:

```shell
helm uninstall <release-name>
```
## Configuration

The chart is highly configurable. You can specify various parameters in the `values.yaml` file or through the `--set` option when running the `helm install` command.
| Parameter | Description | Default |
|---|---|---|
| `transferSpec.id` | Unique ID for the data transfer job. | `dtttest` |
| `transferSpec.type` | Type of deployment: `SNAPSHOT_ONLY`, `INCREMENT_ONLY`, `SNAPSHOT_AND_INCREMENT`. | `SNAPSHOT_ONLY` |
| `transferSpec.src.type` | Source type (e.g., `pg`). | `pg` |
| `transferSpec.src.params` | Source parameters. | `{}` |
| `transferSpec.dst.type` | Destination type (e.g., `ch`). | `ch` |
| `transferSpec.dst.params` | Destination parameters. | `{}` |
| `snapshot.worker_count` | Number of parallel instances for the snapshot job. | `1` |
| `replication.worker_count` | Number of replicas for the continuous replication `StatefulSet`. | `1` |
| `resources.requests.cpu` | CPU resource requests for the pods. | `100m` |
| `resources.requests.memory` | Memory resource requests for the pods. | `128Mi` |
| `resources.limits.cpu` | CPU resource limits for the pods. | `500m` |
| `resources.limits.memory` | Memory resource limits for the pods. | `256Mi` |
| `coordinator.type` | Type of external coordinator service: `s3` or `memory`. | `s3` |
| `coordinator.job_count` | Number of parallel instances of the workload. | `1` |
| `coordinator.process_count` | Number of threads run inside each job. | `4` |
| `coordinator.bucket` | Name of the S3 bucket used for coordination. | `place_your_bucket` |
| `transferSpec.regular_snapshot.incremental` | List of objects defining incremental snapshot settings. | `[]` |
| `transferSpec.regular_snapshot.enabled` | Enable or disable the regular snapshot mechanism. | `false` |
| `transferSpec.regular_snapshot.cron_expression` | Cron expression for the scheduled snapshot job. | `0 1 * * *` |
### Example `values.yaml`

```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"

coordinator:
  job_count: 1 # setting this above 1 shards the work; the coordinator must then be non-memory
  process_count: 4 # default is 4; how many threads run inside each job
  type: s3 # type of coordination, one of: s3 or memory
  bucket: place_your_bucket # bucket with write access, used to store state

transferSpec:
  id: mytransfer # unique ID of a transfer
  name: awesome transfer # human-friendly name for a transfer
  type: INCREMENT_ONLY # type of transfer, one of: INCREMENT_ONLY, SNAPSHOT_ONLY, SNAPSHOT_AND_INCREMENT
  src:
    type: source_type # for example: pg, s3, kafka ...
    params:
      ... # source type params; all params can be found in `model_source.go` in the provider folder
  dst:
    type: target_type # for example: s3, ch, kafka ...
    params:
      ... # target type params; all params can be found in `model_destination.go` in the provider folder
  regular_snapshot:
    enabled: true
    incremental:
      - namespace: public
        name: playing_with_neon
        cursor_field: id
```
## Source and Destination Configuration

The `transferSpec` section configures the data source (`src`) and destination (`dst`). These fields accept JSON-formatted strings in the `params` section for both source and destination configurations. The `type` defines the kind of source/destination system (e.g., PostgreSQL, ClickHouse).
- Source Example:

```yaml
src:
  type: pg # PostgreSQL source type
  params:
    Hosts:
      - YOUR_NEON_DB
    Database: YOUR_DB_NAME
    User: YOUR_DB_USER
    Password: YOUR_DB_PASSWORD
    Port: 5432
    BatchSize: 1024
    SlotByteLagLimit: 53687091200
    EnableTLS: true
    KeeperSchema: public
    DesiredTableSize: 1073741824
    SnapshotDegreeOfParallelism: 4
```
- Destination Example:

```yaml
dst:
  type: ch # ClickHouse destination type
  params:
    User: YOUR_CH_USER
    Password: YOUR_CH_PASSWORD
    ShardsList:
      - Hosts:
          - YOUR_CH_HOST
    Database: default
    HTTPPort: 8443
    SSLEnabled: true
    NativePort: 9440
    MigrationOptions:
      AddNewColumns: true
    InsertParams:
      MaterializedViewsIgnoreErrors: true
    RetryCount: 20
    UseSchemaInTableName: true
    Interval: 1000000000
    Cleanup: Drop
    BufferTriggingSize: 536870912
```
## Coordinator Configuration

The `coordinator` section configures an external coordination service, such as an S3 bucket, that can be used for coordinating data transfers or snapshots. This is useful when external storage or coordination mechanisms are required.

- Example:

```yaml
coordinator:
  type: s3
  bucket: my-coordination-bucket
```
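For local experiments without shared storage, the parameter table also lists `memory` as a coordinator type. A minimal sketch, assuming in-process coordination is acceptable for your setup:

```yaml
coordinator:
  type: memory # in-process coordination; state is not shared across pods
  job_count: 1 # sharding (job_count > 1) requires a non-memory coordinator such as s3
```

Keep `job_count` at `1` here, since sharded work needs coordination state that all workers can reach.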
## Regular Snapshots

The `regular_snapshot` feature allows for periodic snapshots with optional incremental replication. This is particularly useful when a regular snapshot schedule is needed alongside continuous data replication.

- Incremental Snapshot Example:

```yaml
transferSpec:
  regular_snapshot:
    enabled: true
    incremental:
      - namespace: public
        name: playing_with_neon
        cursor_field: id
```

This example enables the `regular_snapshot` feature and specifies that incremental replication should track changes using the `id` field from the `playing_with_neon` table within the `public` namespace.
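Per the parameter table, the snapshot schedule itself is controlled by `regular_snapshot.cron_expression` (default `0 1 * * *`, i.e. daily at 01:00). A sketch combining the schedule with the incremental settings:

```yaml
transferSpec:
  regular_snapshot:
    enabled: true
    cron_expression: "0 1 * * *" # standard five-field cron syntax: daily at 01:00
    incremental:
      - namespace: public
        name: playing_with_neon
        cursor_field: id # only rows with a higher id than the last run are snapshotted
```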
## Deployment Modes

### SNAPSHOT_ONLY

This mode deploys a one-time `Job` to upload a data snapshot. You can control the number of parallel job instances by setting `job.instanceCount`.

To deploy:

```shell
helm install transfer ./transfer --set transferSpec.type=SNAPSHOT_ONLY --set job.instanceCount=3
```

### INCREMENT_ONLY

This mode deploys a `StatefulSet` for continuous data replication. You can control the number of replicas with `statefulSet.replicaCount`.

To deploy:

```shell
helm install transfer ./transfer --set transferSpec.type=INCREMENT_ONLY --set statefulSet.replicaCount=2
```

### SNAPSHOT_AND_INCREMENT

This mode first runs a `Job` to take a data snapshot, followed by a `StatefulSet` for continuous replication. The `StatefulSet` starts only after the `Job` completes successfully.

To deploy:

```shell
helm install transfer ./transfer --set transferSpec.type=SNAPSHOT_AND_INCREMENT --set job.instanceCount=2 --set statefulSet.replicaCount=2
```
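The same deployment can be expressed in a values file instead of repeated `--set` flags. A sketch, assuming the `job.instanceCount` and `statefulSet.replicaCount` keys from the commands above:

```yaml
# my-values.yaml -- equivalent of the --set flags in the command above
transferSpec:
  type: SNAPSHOT_AND_INCREMENT
job:
  instanceCount: 2 # parallel instances for the snapshot Job
statefulSet:
  replicaCount: 2 # replicas for the replication StatefulSet
```

Pass the file at install time with `helm install transfer ./transfer -f my-values.yaml`.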
## Resource Configuration

You can adjust the CPU and memory allocation for the pods using the `resources` section of the `values.yaml` file. Ensure you specify both `requests` and `limits` for better resource management.

```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
```
## Overriding Values

You can override any configuration option in the `values.yaml` file by using the `--set` flag when installing the Helm chart. For example:

```shell
helm install transfer ./transfer --set transferSpec.id=my-custom-id --set transferSpec.type=INCREMENT_ONLY
```
## Viewing Logs

To view logs of a running `Job` or `StatefulSet` pod, you can use `kubectl`:

```shell
kubectl logs <pod-name>
```

For example, to get logs from the `Job`:

```shell
kubectl get pods -l job-name=transfer
kubectl logs <pod-name>
```

For the `StatefulSet`:

```shell
kubectl get pods -l statefulset.kubernetes.io/pod-name=data-upload-statefulset
kubectl logs <pod-name>
```
## Contributing

Feel free to open issues and submit PRs to improve this chart!