[FLINK-35600] Add timestamp for low and high watermark #3415

JNSimba · 2024-06-14T05:35:40Z

https://issues.apache.org/jira/browse/FLINK-35600

JNSimba · 2024-07-18T08:16:09Z

@leonardBang @ruanhang1993 PTAL.

github-actions · 2024-09-17T00:03:30Z

This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.

ruanhang1993 · 2024-09-19T01:50:10Z

@JNSimba Thanks for this PR. Please add some tests to cover the changes.

JNSimba · 2024-09-20T09:40:06Z

Thanks, an ITCase has been added. PTAL @ruanhang1993

ruanhang1993 · 2024-09-27T07:35:45Z

...ain/java/org/apache/flink/cdc/connectors/mysql/debezium/task/MySqlSnapshotSplitReadTask.java

@@ -187,6 +192,9 @@ protected SnapshotResult<MySqlOffsetContext> doExecute(
        } else {
            // Get the current binlog offset as HW
            highWatermark = DebeziumUtils.currentBinlogOffset(jdbcConnection);
+            long epochSecond = clock.currentTime().getEpochSecond();
+            highWatermark.getOffset().put(BinlogOffset.TIMESTAMP_KEY, String.valueOf(epochSecond));
+            highWatermark.getOffset().put(BinlogOffset.SERVER_ID_KEY, String.valueOf(epochSecond));


Why do we need to add the server-id here?

JNSimba · 2024-09-27 (edited)

In the configureFilter method of BinlogSplitReader, the highWatermark values of all chunks are compared to obtain the largest one.

When the table has multiple chunks:
chunk-1: timestamp=1727423957, binlogposition=1001
chunk-2: timestamp=1727423958, binlogposition=1002
chunk-3: timestamp=1727423959, binlogposition=1002
chunk-4: timestamp=1727423960, binlogposition=1002

At this point the server id is 0 for every chunk, while the binlog position may differ between chunks (because new data was written in the meantime). Under the current logic of BinlogOffset.compare, when the server ids are equal the filename and position are compared. For chunk-2, chunk-3, and chunk-4, however, everything except the timestamp is identical, so the computed highWatermark ends up being chunk-2, which leads to duplicate data.
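
The scenario above can be reproduced with a small standalone sketch. It does not use the real BinlogOffset class: the `Offset` type, both comparators, the `max` helper, and the binlog filename `mysql-bin.000001` are hypothetical stand-ins chosen for illustration, and comparing filenames as plain strings is a simplification. It only shows how a "first of equals wins" maximum picks chunk-2 when the timestamp is ignored, and chunk-4 once the timestamp is used as a tie-breaker.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class HighWatermarkSketch {

    /** Simplified stand-in for a binlog offset; NOT the real BinlogOffset. */
    static final class Offset {
        final String chunk;
        final long serverId;
        final String filename;
        final long position;
        final long timestampSec;

        Offset(String chunk, long serverId, String filename, long position, long timestampSec) {
            this.chunk = chunk;
            this.serverId = serverId;
            this.filename = filename;
            this.position = position;
            this.timestampSec = timestampSec;
        }
    }

    // Assumed behaviour when server ids are equal: compare filename, then position.
    static final Comparator<Offset> POSITION_ONLY =
            Comparator.comparing((Offset o) -> o.filename)
                    .thenComparingLong(o -> o.position);

    // Hypothetical tie-breaker: fall back to the timestamp when filename and position match.
    static final Comparator<Offset> WITH_TIMESTAMP =
            POSITION_ONLY.thenComparingLong(o -> o.timestampSec);

    /** Returns the largest offset; on ties the earliest element in the list is kept. */
    static Offset max(List<Offset> offsets, Comparator<Offset> cmp) {
        Offset max = offsets.get(0);
        for (Offset o : offsets) {
            if (cmp.compare(o, max) > 0) {
                max = o;
            }
        }
        return max;
    }

    /** The four chunk high watermarks from the example; the filename is made up. */
    static List<Offset> sample() {
        String file = "mysql-bin.000001"; // hypothetical filename, not from the PR
        return Arrays.asList(
                new Offset("chunk-1", 0, file, 1001, 1727423957L),
                new Offset("chunk-2", 0, file, 1002, 1727423958L),
                new Offset("chunk-3", 0, file, 1002, 1727423959L),
                new Offset("chunk-4", 0, file, 1002, 1727423960L));
    }

    public static void main(String[] args) {
        List<Offset> highWatermarks = sample();
        System.out.println(max(highWatermarks, POSITION_ONLY).chunk);  // chunk-2
        System.out.println(max(highWatermarks, WITH_TIMESTAMP).chunk); // chunk-4
    }
}
```

With the timestamp included in the ordering, the latest of the otherwise identical offsets is selected, which matches the intent of comparing timestamps instead of overloading the server id field.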

JNSimba · 2024-09-27T15:42:57Z

I changed the BinlogOffset.compare method to compare the timestamp, so there is no need to set the server id. PTAL, thanks @ruanhang1993

ruanhang1993 · 2024-10-10T09:02:56Z

@lvyanquan Do you have time to help review this PR again?

lvyanquan

LGTM.

add timestamp for low and high watermark

caa7fc9

github-actions bot added the mysql-cdc-connector label Jun 14, 2024

code style

b41ecfe

github-actions bot added Stale and removed Stale labels Sep 17, 2024

JNSimba added 2 commits September 19, 2024 10:31

Merge branch 'master' into data-repeat

6c34205

add itcase

f79a564

ruanhang1993 reviewed Sep 27, 2024

update

1a9f637

code style

4fc6818

ruanhang1993 approved these changes Oct 10, 2024

github-actions bot added approved reviewed labels Oct 10, 2024

lvyanquan approved these changes Oct 14, 2024

Merge branch 'master' into data-repeat

2028e19
