[Doc] Update zh and en doc about CloudCanal(BladePipe) #53157

Open · wants to merge 3 commits into base: main
Binary file modified docs/en/_assets/3.11-1.png
Binary file modified docs/en/_assets/3.11-2.png
Binary file modified docs/en/_assets/3.11-3.png
Binary file modified docs/en/_assets/3.11-4.png
Binary file modified docs/en/_assets/3.11-5.png
Binary file modified docs/en/_assets/3.11-6.png
Binary file modified docs/en/_assets/3.11-7.png
Binary file modified docs/en/_assets/3.11-8.png
102 changes: 43 additions & 59 deletions docs/en/integrations/loading_tools/CloudCanal.md
@@ -2,93 +2,77 @@
displayed_sidebar: docs
---

# Load data using CloudCanal
# BladePipe

## Introduction

CloudCanal Community Edition is a free data migration and synchronization platform published by [ClouGence Co., Ltd](https://www.cloudcanalx.com) that integrates schema migration, full data migration, verification, correction, and real-time incremental synchronization.
CloudCanal helps users build a modern data stack in a simple way.
![image.png](../../_assets/3.11-1.png)

## Download

[CloudCanal Download Link](https://www.cloudcanalx.com)
BladePipe is a **real-time end-to-end data replication tool** that moves data between **30+** databases, message queues, search engines, caches, real-time data warehouses, data lakes, and more, with **ultra-low latency**. It offers efficiency, stability, scalability, compatibility with diverse database engines, one-stop management, enhanced security, and complex data transformation. BladePipe helps break down data silos and increases the value of data.

[CloudCanal Quick Start](https://www.cloudcanalx.com/us/cc-doc/quick/quick_start)

## Function Description
![image.png](../../_assets/3.11-1.png)

- CloudCanal v2.2.5.0 or later is highly recommended for efficient data import into StarRocks.
- Control the ingestion frequency when using CloudCanal to import **incremental data** into StarRocks. The frequency at which CloudCanal writes data to StarRocks is controlled by the `realFlushPauseSec` parameter, which defaults to 10 seconds.
- The current Community Edition has a maximum memory configuration of 2 GB. If DataJobs encounter OOM exceptions or significant GC pauses, reduce the batch size to lower memory usage.
- For Full DataTask, you can adjust the `fullBatchSize` and `fullRingBufferSize` parameters.
- For Incremental DataTask, the `increBatchSize` and `increRingBufferSize` parameters can be adjusted accordingly.
- Supported Source endpoints and features:

| Source Endpoints \ Feature | Schema Migration | Full Data | Incremental | Verification |
| --- | --- | --- | --- | --- |
| Oracle | Yes | Yes | Yes | Yes |
| PostgreSQL | Yes | Yes | Yes | Yes |
| Greenplum | Yes | Yes | No | Yes |
| MySQL | Yes | Yes | Yes | Yes |
| Kafka | No | No | Yes | No |
| OceanBase | Yes | Yes | Yes | Yes |
| PolarDb for MySQL | Yes | Yes | Yes | Yes |
| Db2 | Yes | Yes | Yes | Yes |
## Functions

## Typical example
BladePipe provides a visual management interface that allows you to easily create DataJobs for **schema migration, data migration, synchronization, verification and correction**, and more. Finer-grained, customized behavior can be configured through parameters. BladePipe currently supports data movement from the following source DataSources to StarRocks:

CloudCanal allows users to seamlessly add DataSources and create DataJobs through a visual interface, which enables automated schema migration, full data migration, and real-time incremental synchronization. The following example demonstrates how to migrate and synchronize data from MySQL to StarRocks. The procedure is similar for data synchronization between other data sources and StarRocks.
| Source DataSource | Schema Migration | Data Migration | Data Sync | Verification & Correction |
| --- | --- | --- | --- | --- |
| MySQL/MariaDB/AuroraMySQL | Yes | Yes | Yes | Yes |
| Oracle | Yes | Yes | Yes | Yes |
| PostgreSQL/AuroraPostgreSQL | Yes | Yes | Yes | Yes |
| SQL Server | Yes | Yes | Yes | Yes |
| Kafka | No | No | Yes | No |
| AutoMQ | No | No | Yes | No |
| TiDB | Yes | Yes | Yes | Yes |
| Hana | Yes | Yes | Yes | Yes |
| PolarDB for MySQL | Yes | Yes | Yes | Yes |
| Db2 | Yes | Yes | Yes | Yes |
:::info
For more functions and parameter settings, please refer to [BladePipe Connections](https://doc.bladepipe.com/dataMigrationAndSync/connection/mysql2?target=StarRocks).
:::

### Prerequisites
## Installation

First, refer to the [CloudCanal Quick Start](https://www.cloudcanalx.com/us/cc-doc/quick/quick_start) to complete the installation and deployment of the CloudCanal Community Edition.
Follow the instructions in [Install Worker (Docker)](https://doc.bladepipe.com/productOP/docker/install_worker_docker) or [Install Worker (Binary)](https://doc.bladepipe.com/productOP/binary/install_worker_binary) to download and install a BladePipe Worker.

### Add DataSource
## Example
Using a MySQL instance as an example, this section describes how to move data from MySQL to StarRocks.

- Log in to the CloudCanal platform
- Go to **DataSource Management** -> **Add DataSource**
- Select **StarRocks** from the self-managed database options
### Add DataSources

1. Log in to the [BladePipe Cloud](https://cloud.bladepipe.com/). Click **DataSource** > **Add DataSource**.
2. Select StarRocks as the Type, and fill in the setup form.
- **Client Address**: The port that StarRocks exposes to MySQL clients. BladePipe uses it to query the metadata of databases and tables.
- **Account**: The username of the StarRocks database. The INSERT privilege is required to write data to StarRocks. If the user does not have the INSERT privilege, grant it by referring to [GRANT](../../sql-reference/sql-statements/account-management/GRANT.md) (see the example below the screenshot).
- **Http Address**: The HTTP port used to receive requests from BladePipe to write data to StarRocks.
![image.png](../../_assets/3.11-2.png)
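
If the account does not yet have the INSERT privilege, a statement like the following can grant it. This is a minimal sketch using the StarRocks 3.x privilege syntax; the user `bladepipe_user`, the database `demo_db`, and the host pattern are placeholders to adapt to your environment:

```sql
-- Grant write access on all tables in the target database to the replication account.
-- Run in StarRocks as a user allowed to grant privileges; adjust names to your setup.
GRANT INSERT ON ALL TABLES IN DATABASE demo_db TO USER 'bladepipe_user'@'%';
```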

> Tips:
>
> - Client Address: The address of the StarRocks server's MySQL client service port. CloudCanal primarily uses this address to query metadata information of the database tables.
>
> - HTTP Address: The HTTP address is mainly used to receive data import requests from CloudCanal.
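
If you are unsure which ports to enter for the Client Address and HTTP Address, one way to confirm them, assuming you can already reach StarRocks through any MySQL client with sufficient privileges, is to inspect the frontend (FE) ports. The exact column names and required privileges may vary across StarRocks versions:

```sql
-- QueryPort is the MySQL-protocol port (Client Address);
-- HttpPort is the HTTP port (HTTP Address).
SHOW FRONTENDS;

-- Alternatively, list the port-related FE configuration items (may require admin privileges).
ADMIN SHOW FRONTEND CONFIG LIKE '%_port';
```
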
3. Click **Test Connection**. After the connection succeeds, click **Add DataSource** to add the DataSource.
4. Add a MySQL DataSource by following the same steps.

### Create DataJob

Once the DataSources have been added successfully, you can follow these steps to create a data migration and synchronization DataJob.

- Go to **DataJob Management** -> **Create DataJob** in the CloudCanal
- Select the source and target databases for the DataJob
- Click Next Step
1. Click **DataJob** > [**Create DataJob**](https://doc.bladepipe.com/operation/job_manage/create_job/create_full_incre_task).

2. Select the source and target DataSources, and click **Test Connection** to ensure that the connections to both the source and target DataSources are successful.
![image.png](../../_assets/3.11-3.png)

- Choose **Incremental** and enable **Full Data**
- Select DDL Sync
- Click Next Step

3. Select **Incremental** for DataJob Type, together with the **Full Data** option.
![image.png](../../_assets/3.11-4.png)

- Select the source tables you want to subscribe to. Please note that the target StarRocks tables automatically created after Schema Migration are Primary Key tables, so source tables without a primary key are not currently supported.

- Click Next Step

4. Select the tables to be replicated. **Note that the target StarRocks tables automatically created after Schema Migration have primary keys, so source tables without primary keys are not supported currently.** A sample of such a Primary Key table is shown below the screenshot.
![image.png](../../_assets/3.11-5.png)
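
For reference, the tables that Schema Migration creates in StarRocks are Primary Key tables. The DDL below is a hypothetical illustration only; `demo_db.orders` and its columns are placeholders, and the actual DDL BladePipe generates is derived from your source tables:

```sql
-- Hypothetical Primary Key table mirroring a MySQL source table.
CREATE TABLE demo_db.orders (
    order_id    BIGINT NOT NULL,
    customer_id BIGINT,
    status      VARCHAR(32),
    updated_at  DATETIME
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id) BUCKETS 8;
```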

- Configure the column mapping
- Click Next Step

5. Select the columns to be replicated.
![image.png](../../_assets/3.11-6.png)

- Create DataJob

6. Confirm the DataJob creation.
![image.png](../../_assets/3.11-7.png)

- Check the status of the DataJob. After creation, the DataJob automatically goes through the Schema Migration, Full Data, and Incremental stages.
7. The DataJob runs automatically. BladePipe runs the following DataTasks in order:
- **Schema Migration**: The schemas of the source tables will be migrated to the target instance.
- **Full Data**: All existing data of the source tables will be fully migrated to the target instance.
- **Incremental**: Ongoing data changes will be continuously synchronized to the target instance with sub-minute latency. A quick way to spot-check the result is shown below the screenshot.

![image.png](../../_assets/3.11-8.png)
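
As a quick sanity check once the Full Data stage completes (and while Incremental is running), you can compare row counts between the source and the target. The query below is a sketch that assumes a table named `orders` in a database named `demo_db`; run the equivalent `SELECT COUNT(*)` on the MySQL source and compare the results:

```sql
-- Run on StarRocks; compare the result with the same count on the MySQL source.
SELECT COUNT(*) AS row_count FROM demo_db.orders;
```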

Binary file modified docs/zh/_assets/3.11-1.png
Binary file modified docs/zh/_assets/3.11-2.png
Binary file modified docs/zh/_assets/3.11-3.png
Binary file modified docs/zh/_assets/3.11-4.png
Binary file modified docs/zh/_assets/3.11-5.png
Binary file modified docs/zh/_assets/3.11-6.png
Binary file modified docs/zh/_assets/3.11-7.png
Binary file modified docs/zh/_assets/3.11-8.png
103 changes: 40 additions & 63 deletions docs/zh/integrations/loading_tools/CloudCanal.md
@@ -6,95 +6,72 @@ displayed_sidebar: docs

## Introduction

CloudCanal Community Edition is a free data migration and synchronization platform released by [ClouGence Co., Ltd.](https://www.clougence.com) that integrates schema migration, full data migration, verification, correction, and real-time incremental synchronization. It provides complete, productized capabilities that help enterprises break down data silos, connect their data, and make better use of it.
CloudCanal is a fully self-developed, visual, and automated data migration and synchronization tool. It moves data between 30+ popular relational databases, real-time data warehouses, message queues, caches, and search engines, and features real-time efficiency, precise connectivity, stability and scalability, one-stop management, hybrid deployment, and complex data transformation, helping enterprises break down data silos and make better use of their data.
![image.png](../../_assets/3.11-1.png)

## Download and Installation

[Download the latest version of CloudCanal](https://www.clougence.com)

[CloudCanal Quick Start](https://www.clougence.com/cc-doc/quick/quick_start)

## Function Description

- CloudCanal v2.2.5.0 or later is recommended for writing data to StarRocks.
- When using CloudCanal to load **incremental data** into StarRocks, it is advisable to control the load frequency. The default frequency for writing from CloudCanal to StarRocks can be adjusted with the `realFlushPauseSec` parameter, which defaults to 10 seconds.
- The current Community Edition has a maximum memory configuration of 2 GB. If a DataJob encounters OOM exceptions or severe GC pauses, reduce the following parameters to shrink the batch size and lower memory usage.
- Full Data parameters: `fullBatchSize` and `fullRingBufferSize`.
- Incremental parameters: `increBatchSize` and `increRingBufferSize`.
- Supported source endpoints and features:

| DataSource \ Feature | Schema Migration | Full Data Migration | Real-time Incremental Sync | Data Verification |
| --- | --- | --- | --- | --- |
| Oracle | Yes | Yes | Yes | Yes |
| PostgreSQL | Yes | Yes | Yes | Yes |
| Greenplum | Yes | Yes | No | Yes |
| MySQL | Yes | Yes | Yes | Yes |
| Kafka | No | No | Yes | No |
| OceanBase | Yes | Yes | Yes | Yes |
| PolarDb for MySQL | Yes | Yes | Yes | Yes |
| Db2 | Yes | Yes | Yes | Yes |
CloudCanal provides a visual interface that makes it easy to perform schema migration, full data migration, incremental synchronization, and verification and correction. More refined, customized synchronization behavior can be configured through parameters. Data can currently be moved to StarRocks from the following DataSources:

## Usage
| Source DataSource | Schema Migration | Full Data Migration | Incremental Sync | Verification & Correction |
| --- | --- | --- | --- | --- |
| MySQL/MariaDB/AuroraMySQL | Yes | Yes | Yes | Yes |
| Oracle | Yes | Yes | Yes | Yes |
| PostgreSQL/AuroraPostgreSQL | Yes | Yes | Yes | Yes |
| SQL Server | Yes | Yes | Yes | Yes |
| Kafka | No | No | Yes | No |
| AutoMQ | No | No | Yes | No |
| TiDB | Yes | Yes | Yes | Yes |
| Hana | Yes | Yes | Yes | Yes |
| PolarDB for MySQL | Yes | Yes | Yes | Yes |
| Db2 | Yes | Yes | Yes | Yes |
:::info
For more functions and parameter settings, see [CloudCanal Connections](https://www.clougence.com/cc-doc/dataMigrationAndSync/connection/mysql2?target=StarRocks).
:::

CloudCanal offers complete, productized capabilities: after users add DataSources and create a DataJob in the visual interface, schema migration, full data migration, and real-time incremental synchronization are completed automatically. The following example demonstrates how to migrate and synchronize data from MySQL to StarRocks. Synchronization from other sources to StarRocks can be performed in a similar way.
## Download and Installation

### Prerequisites
Refer to [Fresh Installation (Docker on Linux/macOS)](https://www.clougence.com/cc-doc/productOP/docker/install_linux_macos) and go to the [CloudCanal website](https://www.clougence.com/) to download and install the self-hosted edition.

First, follow the [CloudCanal Quick Start](https://www.clougence.com/cc-doc/quick/quick_start) to complete the installation and deployment of CloudCanal Community Edition.
## Example
The following takes MySQL as an example to demonstrate how to migrate and synchronize data from MySQL to StarRocks.

### Add DataSources

- Log in to the CloudCanal platform.
- Go to **DataSource Management** > **Add DataSource**.
- Select **StarRocks** under self-managed databases.

1. Log in to the CloudCanal platform and click **DataSource Management** > **Add DataSource**.
2. Add a StarRocks DataSource and fill in the required information.
- **Client Address**: The port that StarRocks exposes to MySQL clients. CloudCanal mainly uses it to query the metadata of databases and tables.
- **Account**: The username of the StarRocks cluster. Loading data requires the INSERT privilege on the target tables. If the user does not have the INSERT privilege, grant it by referring to [GRANT](../../sql-reference/sql-statements/account-management/GRANT.md).
- **Http Address**: Used to receive data load requests from CloudCanal.
![image.png](../../_assets/3.11-2.png)

> Tips:
>
> - Client Address: The port that StarRocks exposes to MySQL clients. CloudCanal mainly uses it to query the metadata of databases and tables.
>
> - Http Address: Mainly used to receive data load requests from CloudCanal.
>
> - Account: Loading data requires the INSERT privilege on the target tables. If the user does not have the INSERT privilege, grant it by referring to [GRANT](../../sql-reference/sql-statements/account-management/GRANT.md).

### DataJob Creation

After the DataSources are added, follow these steps to create a data migration and synchronization DataJob.
3. Click **Test Connection**. After the connection succeeds, click **Add DataSource**.
4. Add a MySQL DataSource by following the same steps.

- Go to **DataJob Management** > **Create DataJob**.
- Select the **source** and **target** databases.
- Click Next Step.
### Create a DataJob
1. Click **DataJob** > **Create DataJob**.
2. Select the source and target DataSources, and click **Test Connection** for each.

![image.png](../../_assets/3.11-3.png)

- Select **Incremental Sync** and enable **Full Data Initialization**.
- Select DDL Sync.
- Click Next Step.
3. Select **Data Sync** and check **Full Data Initialization**.

![image.png](../../_assets/3.11-4.png)

- Select the tables to subscribe to. **Tables automatically created by Schema Migration are Primary Key tables, so source tables without a primary key are not supported yet.**
- Click Next Step.
4. Select the tables to be synchronized. **Tables automatically created by Schema Migration are Primary Key tables, so source tables without a primary key are not supported yet.**

![image.png](../../_assets/3.11-5.png)

- Configure the column mapping.
- Click Next Step.
5. Select the columns to be synchronized.

![image.png](../../_assets/3.11-6.png)

- Create the DataJob.
6. Confirm the DataJob creation.

![image.png](../../_assets/3.11-7.png)

- Check the DataJob status. After the DataJob is created, it automatically goes through the Schema Migration, Full Data, and Incremental stages.

![image.png](../../_assets/3.11-8.png)

## References

For more information about synchronizing data to StarRocks with CloudCanal, see
7. The DataJob runs automatically. CloudCanal moves the DataJob through the following stages:
- **Schema Migration**: The schemas of the source tables are migrated to the target. Tables that already exist on the target with the same name are skipped.
- **Full Data Migration**: All existing data of the source tables is fully migrated to the target, with support for resuming from breakpoints.
- **Incremental Data Sync**: Incremental data changes are continuously synchronized to the target database in real time (second-level latency).

- [Migrate and synchronize data from PostgreSQL to StarRocks in 5 minutes with CloudCanal](https://www.askcug.com/topic/262)
![image.png](../../_assets/3.11-8.png)