Skip to content

Commit

Permalink
[FLINK-34270][docs] Update doc for the new Table source & sink interf…
Browse files Browse the repository at this point in the history
…aces introduced in FLIP-146/367

Close #24234
  • Loading branch information
X-czh authored and libenchao committed Feb 1, 2024
1 parent 3c8088b commit 98e1d07
Show file tree
Hide file tree
Showing 2 changed files with 28 additions and 6 deletions.
12 changes: 10 additions & 2 deletions docs/content.zh/docs/dev/table/sourcesSinks.md
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,11 @@ Flink 会对工厂类逐个进行检查,确保其“标识符”是全局唯

可以实现更多的功能接口来优化数据源,比如实现 `SupportsProjectionPushDown` 接口,这样在运行时在 source 端就处理数据。在 `org.apache.flink.table.connector.source.abilities` 包下可以找到各种功能接口,更多内容可查看 [source abilities table](#source-abilities)

实现 `ScanTableSource` 接口的类必须能够生产 Flink 内部数据结构,因此每条记录都会按照`org.apache.flink.table.data.RowData` 的方式进行处理。Flink 运行时提供了转换机制保证 source 端可以处理常见的数据结构,并且在最后进行转换。
返回的 **_scan runtime provider_** 提供了读取数据的运行时实现。有多种运行时实现的接口,其中 `SourceProvider` 是推荐使用的核心接口。

不管使用那种运行时实现的接口,source 端的运行时实现必须能够生产 Flink 内部数据结构,因此每条记录都会按照`org.apache.flink.table.data.RowData` 的方式进行处理。Flink 运行时提供了转换机制保证 source 端可以处理常见的数据结构,并且在最后进行转换。

为了支持指定并行度,动态表的工厂类需要支持在 `org.apache.flink.table.factories.FactoryUtil` 中定义的 `scan.parallelism` 可选参数,并将参数的值传递给一个实现了 `ParallelismProvider` 接口的运行时实现接口。

<a name="lookup-table-source"></a>

Expand Down Expand Up @@ -240,7 +244,11 @@ Flink 会对工厂类逐个进行检查,确保其“标识符”是全局唯

可以实现 `SupportsOverwrite` 等功能接口,在 sink 端处理数据。可以在 `org.apache.flink.table.connector.sink.abilities` 包下找到各种功能接口,更多内容可查看[sink abilities table](#sink-abilities)

实现 `DynamicTableSink` 接口的类必须能够处理 Flink 内部数据结构,因此每条记录都会按照 `org.apache.flink.table.data.RowData` 的方式进行处理。Flink 运行时提供了转换机制来保证在最开始进行数据类型转换,以便 sink 端可以处理常见的数据结构。
返回的 **_sink runtime provider_** 提供了写出数据的运行时实现。有多种运行时实现的接口,其中 `SinkV2Provider` 是推荐使用的核心接口。

不管使用那种运行时实现的接口,sink 端的运行时实现必须能够处理 Flink 内部数据结构,因此每条记录都会按照 `org.apache.flink.table.data.RowData` 的方式进行处理。Flink 运行时提供了转换机制来保证在最开始进行数据类型转换,以便 sink 端可以处理常见的数据结构。

为了支持指定并行度,动态表的工厂类需要支持在 `org.apache.flink.table.factories.FactoryUtil` 中定义的 `sink.parallelism` 可选参数,并将参数的值传递给一个实现了 `ParallelismProvider` 接口的运行时实现接口。

<a name="sink-abilities"></a>

Expand Down
22 changes: 18 additions & 4 deletions docs/content/docs/dev/table/sourcesSinks.md
Original file line number Diff line number Diff line change
Expand Up @@ -197,10 +197,17 @@ A table source can implement further ability interfaces such as `SupportsProject
mutate an instance during planning. All abilities can be found in the `org.apache.flink.table.connector.source.abilities`
package and are listed in the [source abilities table](#source-abilities).

The runtime implementation of a `ScanTableSource` must produce internal data structures. Thus, records
must be emitted as `org.apache.flink.table.data.RowData`. The framework provides runtime converters such
The returned _scan runtime provider_ provides the runtime implementation for reading the data. There are
different interfaces for runtime implementation, among which `SourceProvider` is the recommended core interface.

Independent of the provider interface, the source runtime implementation must produce internal data structures.
Thus, records must be emitted as `org.apache.flink.table.data.RowData`. The framework provides runtime converters such
that a source can still work on common data structures and perform a conversion at the end.

To support parallelism setting, the dynamic table factory should support the optional `scan.parallelism` option
defined in `org.apache.flink.table.factories.FactoryUtil` and pass its value to a provider that also implements
the `ParallelismProvider` interface.

#### Lookup Table Source

A `LookupTableSource` looks up rows of an external storage system by one or more keys during runtime.
Expand Down Expand Up @@ -296,10 +303,17 @@ A table sink can implement further ability interfaces such as `SupportsOverwrite
instance during planning. All abilities can be found in the `org.apache.flink.table.connector.sink.abilities`
package and are listed in the [sink abilities table](#sink-abilities).

The runtime implementation of a `DynamicTableSink` must consume internal data structures. Thus, records
must be accepted as `org.apache.flink.table.data.RowData`. The framework provides runtime converters such
The returned _sink runtime provider_ provides the runtime implementation for writing the data. There are
different interfaces for runtime implementation, among which `SinkV2Provider` is the recommended core interface.

Independent of the provider interface, the sink runtime implementation must consume internal data structures.
Thus, records must be accepted as `org.apache.flink.table.data.RowData`. The framework provides runtime converters such
that a sink can still work on common data structures and perform a conversion at the beginning.

To support parallelism setting, the dynamic table factory should support the optional `sink.parallelism` option
defined in `org.apache.flink.table.factories.FactoryUtil` and pass its value to a provider that also implements
the `ParallelismProvider` interface.

#### Sink Abilities

<table class="table table-bordered">
Expand Down

0 comments on commit 98e1d07

Please sign in to comment.