[HUDI-7610] Resolve issues for delete records #12122
base: master
Conversation
minor comments.
@@ -166,6 +167,12 @@ private List<HoodieRecord<T>> doMergedRead(Option<HoodieFileReader> baseFileRead
String key = record.getRecordKey();
if (deltaRecordMap.containsKey(key)) {
  deltaRecordKeys.remove(key);
  // When an internal operation exists, it means there is at least one delete in between.
  // Therefore, there is no need to merge with the base record.
Do you think we can check whether the value is equal to the "Delete" operation?
The record from the log record reader may not be a delete record, so its operation may not be the Delete operation.
HoodieRecord finalRecord = latestHoodieRecord.copy();

// Preserve the delete information.
if (prevRecord.isDelete(readerSchema, this.getPayloadProps())
I see that in BaseHMergedLogRecordScanner (around line 98), we account for both payload props and tableConfig.getPreCombineField().
Why are we not accounting for tableConfig here? Are we sure payload props alone are good enough?
Should we try to add some utility methods and use them across the board, e.g., to account for both payloadProps and tableConfig?
Because we could be using a 1.x Hudi binary to read a 0.10.x table as well, payload props may not be present.
By the way, can we check whether payload props are set when using this as a reader?
On the reader side, the only information we have is the table config, so I would expect us to read the table config and set these payload props from it.
If not, payload props might be empty even while reading a 1.x table.
Checked the code logic. For fg, the payload properties are derived from table properties. For non-fg, the payload props only contain one field, the precombine field. I have updated the code to add table properties initially.
&& existingOrderingVal.compareTo(deleteOrderingVal) > 0;
Comparable deleteOrderingVal = readerContext.getOrderingValue(
    Option.empty(), Collections.emptyMap(), readerSchema, orderingFieldName, orderingFieldType, orderingFieldDefault);
deleteOrderingVal = deleteRecord.getOrderingValue() == null ? deleteOrderingVal : deleteRecord.getOrderingValue();
For now, if we are going to take the stand that a delete just overrides any previous value, why can't we simply set the default ordering value (readerContext.castValue(0, orderingFieldType)) as the ordering value for delete records? That would make the code simpler and easier to read.
A default ordering value like 0 would be lower than newly inserted records with the same key, and then the delete records would be lost. Therefore, we have to use an extra flag to remember whether a delete without a valid ordering value exists.
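The loss described above can be sketched as a toy comparison; this is illustrative only (the method name deleteWins is hypothetical, not Hudi's merge code):

```java
public class DeleteOrderingSketch {
    // Event-time merging: the side with the higher ordering value wins
    // (ties go to the delete here, purely for illustration).
    static boolean deleteWins(int deleteOrderingVal, int insertOrderingVal) {
        return deleteOrderingVal >= insertOrderingVal;
    }

    public static void main(String[] args) {
        // A delete without an ordering value, defaulted to 0, loses to any
        // later insert with a positive ordering value, so the key reappears.
        System.out.println(deleteWins(0, 5)); // prints false: the delete is lost
    }
}
```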
boolean chooseExisting = !deleteOrderingVal.equals(0)
    && ReflectionUtils.isSameClass(existingOrderingVal, deleteOrderingVal)
    && existingOrderingVal.compareTo(deleteOrderingVal) > 0;
Comparable deleteOrderingVal = readerContext.getOrderingValue(
Can we define a boolean named "orderingValueMissing" so the code readability is better?
Yeah, we can do that.
I use the default value directly, and created a function to indicate whether we found a processing-time delete.
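A minimal sketch of such a helper predicate (the class, constant, and parameter names are assumptions for illustration, not the actual patch):

```java
public class ProcessingTimeDeleteFlag {
    // Assumed default used when a delete carries no ordering value.
    static final int DEFAULT_ORDERING_VALUE = 0;

    // A delete whose ordering value is still the default carries no
    // event-time information, so merging must fall back to processing time.
    static boolean isProcessingTimeDelete(boolean isDelete, int orderingValue) {
        return isDelete && orderingValue == DEFAULT_ORDERING_VALUE;
    }
}
```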
Some(projection.apply(data))

// Delete records are in-between; no merge is needed.
if (newRecord.getMetaDataInfo(HoodieReaderContext.INTERNAL_META_OPERATION).isPresent) {
Let's add an example and explain why we need this processing.
Will do.
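One way to illustrate why the merge with the base record must be skipped when a delete sits in between (a toy model, not the actual reader; here a null log entry stands in for a delete record):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteInBetweenSketch {
    // Apply log records (oldest first) on top of the base record. A null
    // entry is a delete; any record after a delete must NOT be merged with
    // the base record, otherwise deleted base fields would resurface.
    static Map<String, String> applyLog(Map<String, String> base,
                                        List<Map<String, String>> log) {
        Map<String, String> current = base;
        for (Map<String, String> r : log) {
            if (r == null) {
                current = null;                  // delete wipes all prior state
            } else if (current == null) {
                current = r;                     // fresh insert after a delete
            } else {
                Map<String, String> merged = new HashMap<>(current);
                merged.putAll(r);                // normal partial-update merge
                current = merged;
            }
        }
        return current;
    }

    // Demo scenario: base {rider=A, driver=X}; log = [DELETE, {rider=B}].
    static Map<String, String> demo() {
        Map<String, String> base = new HashMap<>();
        base.put("rider", "A");
        base.put("driver", "X");
        List<Map<String, String>> log = new ArrayList<>();
        log.add(null);                           // the in-between delete
        Map<String, String> update = new HashMap<>();
        update.put("rider", "B");
        log.add(update);
        return applyLog(base, log);
    }
}
```

The demo ends with only {rider=B}: the base record's driver field does not leak back in, which is exactly what skipping the merge guarantees.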
val actualMinusExpected = finalDf.except(expectedDf)

expectedMinusActual.show(false)
actualMinusExpected.show(false)
For MOR, can we trigger one final compaction and ensure the expected values stay intact?
yes, will do.
Done.
@@ -141,6 +141,7 @@ public void setShouldMergeUseRecordPosition(boolean shouldMergeUseRecordPosition
public static final String INTERNAL_META_OPERATION = "_3";
public static final String INTERNAL_META_INSTANT_TIME = "_4";
public static final String INTERNAL_META_SCHEMA = "_5";
public static final String PROCESSING_TIME_BASED_DELETE_FOUND = "_6";
Let's call this DELETE_FOUND_WITHOUT_ORDERING_VALUE so it's clearer.
mostly minor comments
 * 2. The current record's metadata contains the flag: PROCESSING_TIME_BASED_DELETE_FOUND.
 */
private <T> boolean hasProcessingTimeBasedDelete(HoodieRecord<T> record) throws IOException {
  return (record.isDelete(readerSchema, getPayloadProps())
Can we flip the comparison? Let's first check record.getMetadata().isPresent() && record.getMetaDataInfo(PROCESSING_TIME_BASED_DELETE_FOUND).isPresent(),
and then check for the ordering value from the record.
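A sketch of the flipped check (the boolean parameters stand in for the real metadata and record inspections, so the signature is an assumption, not the patch's code):

```java
public class FlippedDeleteCheck {
    // Check the cheap metadata flag first; only fall back to inspecting the
    // record (a potentially costlier isDelete + ordering-value lookup) when
    // the flag is absent.
    static boolean hasProcessingTimeBasedDelete(boolean flagPresent,
                                                boolean isDelete,
                                                boolean orderingValueMissing) {
        if (flagPresent) {
            return true; // PROCESSING_TIME_BASED_DELETE_FOUND was set upstream
        }
        return isDelete && orderingValueMissing;
    }
}
```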
    return null;
  }
} else {
  throw new UnsupportedOperationException(
Does this mean we can only support ordering values of certain data types? I am not asking to fix it in this patch; just trying to gauge what limitation we have around this.
// Here existing record represents newer record with the same key, which can be a delete or non-delete record.
// Therefore, we should use event time based merging if possible. So, the newer record is returned if
// 1. the delete is processing time based, or
// 2. the delete is event time based, and has higher value.
Minor: 2. the delete is event time based, and the existing record has the higher value.
mode(SaveMode.Append).
save(basePath)

val fourUpdateData = Seq((-9, "4", "rider-DDDD", "driver-DDDD", 20.00, 1))
Let's do a round of validation before compaction, and once more after compaction.
And let's validate that, in the case of a MOR table, compaction has not kicked in when we do the first validation.
Post compaction, let's validate that one compaction commit is seen.
option(RECORDKEY_FIELD.key(), "key").
option(PRECOMBINE_FIELD.key(), "ts").
option(TABLE_TYPE.key(), tableType).
option(OPERATION.key(), "delete").
Can you set HoodieCompactionConfig.INLINE_COMPACT.key() = false for these ingests, so that we know compaction does not kick in?
We added automatic compaction for Spark data source writes (after 5 commits).
So, let's ensure we do not trigger compaction unless we explicitly want to.
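The two test setups could be sketched as plain properties; the string keys mirror Hudi's hoodie.compact.inline and hoodie.compact.inline.max.delta.commits configs, while the helper method names are illustrative:

```java
import java.util.Properties;

public class CompactionTestConfig {
    // Ingest phase: keep inline compaction off so the first validation sees
    // an uncompacted MOR table.
    static Properties noInlineCompaction() {
        Properties p = new Properties();
        p.setProperty("hoodie.compact.inline", "false");
        return p;
    }

    // Final write: force compaction to be scheduled on the very next commit.
    static Properties forceInlineCompaction() {
        Properties p = new Properties();
        p.setProperty("hoodie.compact.inline", "true");
        p.setProperty("hoodie.compact.inline.max.delta.commits", "1");
        return p;
    }
}
```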
option(RECORDKEY_FIELD.key(), "key").
option(PRECOMBINE_FIELD.key(), "ts").
option(TABLE_TYPE.key(), tableType).
option(HoodieCompactionConfig.INLINE_COMPACT.key(),
You may need to set hoodie.compact.inline.max.delta.commits = 1 here, so we know for sure that compaction kicked in for the MOR table.
Just setting HoodieCompactionConfig.INLINE_COMPACT.key() = true may not make compaction kick in.
Change Logs
The main problem we face in the delete logic is that some DeleteRecords do not have a valid orderingVal field. This is not a problem for processing-time-based merging, but it breaks event-time-based merging. The fundamental solution is to preserve the orderingValue field for DeleteRecord, which may not be possible or easy in reality; we don't attempt that in this PR. Here we focus on making the delete logic reasonable and consistent across fg and non-fg readers and across Spark/Avro record types. This problem mainly affects MOR tables.
For a given record key RK, suppose we have a series of operations on it, like insert, update, delete, update, delete, update, etc. That is, we have a series of records, i.e., br1, lfr1, lfr2, lfr3, lfr4, etc.
(1) If all records have the orderingVal field, we can successfully merge based on event time, which is the happy path.
(2) If lfr3 is a delete record without an ordering value, we don't have enough information to merge it with other records based on event time.
A reasonable assumption here is: all records before this delete record (i.e., with a smaller commit time) can be considered processing time based, while records newer than the delete record can keep merging based on event time. In this way, we combine processing time and event time in a logical way, which is universal across Spark/Avro, COW/MOR, with FG or without FG.
To implement this, we create a metadata entry "PROCESSING_TIME_BASED_DELETE_FOUND" to indicate that a processing-time-based delete has been found; any further merging should be skipped.
(1) For the non-fg reader, we store the flag in the HoodieRecord.metadata field. During further merging, this flag is kept and used to skip merging with the base file record.
(2) For the fg reader, we store the flag in the metadata field of the record buffer. All further merging should be skipped.
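The combined merging rule described above can be sketched as a toy model (assumptions: records are scanned newest-first, and a null ordering value on a delete marks it as processing time based; none of these types exist in Hudi):

```java
import java.util.List;

public class CombinedMergeSketch {
    // Toy record: orderingVal == null on a delete means its ordering value is missing.
    record Rec(String id, Integer orderingVal, boolean isDelete) {}

    // Scan from the newest record to the oldest. Once a delete without an
    // ordering value is seen, everything older (including the base record)
    // is discarded; records newer than it still merge by event time.
    static Rec resolve(List<Rec> newestFirst) {
        Rec winner = null;
        for (Rec r : newestFirst) {
            if (r.isDelete() && r.orderingVal() == null) {
                break; // processing-time-based delete: skip all older records
            }
            if (winner == null
                || (r.orderingVal() != null && winner.orderingVal() != null
                    && r.orderingVal() > winner.orderingVal())) {
                winner = r; // event-time merge: higher ordering value wins
            }
        }
        return winner; // null means the key ends up deleted
    }
}
```

In the br1/lfr1..lfr4 example, if lfr3 is the ordering-value-less delete, resolve keeps only lfr4 (event-time winner among the records newer than the delete) and drops br1, lfr1, and lfr2.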
Impact
Makes the delete logic consistent across different record types and across fg and non-fg readers.
Risk level (write none, low, medium, or high below)
Medium.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instructions to make changes to the website.
Contributor's checklist