Correctly identify and report data conflicts when deleting rows and/or columns during a three-way merge. #6980

Merged
39 commits merged on Nov 21, 2023

Commits
8dafa27
Remove unused `sqlSchema` parameter.
nicktobey Oct 23, 2023
b265266
Track database in system tables.
nicktobey Oct 23, 2023
27cf02e
Change expected error in ambiguous select.
nicktobey Oct 23, 2023
dc995c5
Use database name for system tables and table functions.
nicktobey Oct 24, 2023
0faf90c
[ga-bump-dep] Bump dependency in Dolt by nicktobey
nicktobey Oct 24, 2023
134c635
Fix comment formatting.
nicktobey Oct 25, 2023
92b5c25
Refactor test to make it more readable.
nicktobey Nov 8, 2023
91a5042
Replace BATS test with engine test for joining tables from different …
nicktobey Oct 24, 2023
0dfc815
Add schema merge tests.
nicktobey Nov 8, 2023
cbc85d1
Add additional schema and mapping information to valueMerger
nicktobey Nov 8, 2023
0c838e6
During three-way merge, attempt to automatically resolve conflicts wh…
nicktobey Oct 25, 2023
66eeeb5
During three-way merge, process merges even in columns that were remo…
nicktobey Nov 8, 2023
94fdf2d
Don't remap the left branch to the new schema prior to merging.
nicktobey Nov 8, 2023
83b16bd
Correctly update secondary indexes when the schema has changed.
nicktobey Nov 8, 2023
0af121d
Correctly update primary indexes when the schema has changed.
nicktobey Nov 8, 2023
7fb001c
Fix validators when the schema has changed.
nicktobey Nov 8, 2023
926d610
Fix cell-wise merging when the schema has changed.
nicktobey Nov 8, 2023
84a28b0
Report null constraint violations when processing left diffs during t…
nicktobey Nov 8, 2023
51de4f6
Add `GetKeyColumnsDescriptor` with flag for specifying whether to con…
nicktobey Nov 7, 2023
919ddf8
Drop secondary indexes when the table schema changed.
nicktobey Nov 8, 2023
9edbe39
Merge branch 'main' into nicktobey/schemamerge
nicktobey Nov 9, 2023
c6776f5
Unskip previously skipped test cases.
nicktobey Nov 9, 2023
e16e46e
Merge branch 'main' into nicktobey/schemamerge
nicktobey Nov 13, 2023
02387aa
Since we now support merging a deleted row with a deleted column, upd…
nicktobey Nov 14, 2023
0aa2736
Update correctness issues in cell merging exposed by existing tests, …
nicktobey Nov 14, 2023
6070c5c
Make new test object for each data test in schema merge.
nicktobey Nov 14, 2023
e53978d
Remove outdated row merge test.
nicktobey Nov 14, 2023
1b902e8
Update schema merge test.
nicktobey Nov 14, 2023
efff37b
Update bats tests. These tests test handling data conflicts, but the …
nicktobey Nov 15, 2023
32fd66d
Update `processBaseColumn` to account for the fact that we no longer …
nicktobey Nov 15, 2023
445ba96
During three-way merging, avoid touching the primary index for every …
nicktobey Nov 16, 2023
95b80fd
Cleanup comments in merge_prolly_rows.go
nicktobey Nov 16, 2023
85525e5
Fix copy-paste error in `processBaseColumns`
nicktobey Nov 16, 2023
52ce8d2
Remove unused code and add comments.
nicktobey Nov 16, 2023
c278833
Refactor test harness to make clear why we're copying the test object.
nicktobey Nov 20, 2023
560dfc0
Rename `GetKeyColumnsDescriptor` to `GetKeyDescriptorWithNoConversion`
nicktobey Nov 20, 2023
89f89a0
Don't run check validator on resolved deleted rows: it's not needed a…
nicktobey Nov 21, 2023
8cf2982
Add comments explaining why we have to handle left-only diffs during …
nicktobey Nov 21, 2023
6527d9d
Update docstring for `mergeProllyTableData`
nicktobey Nov 21, 2023
416 changes: 258 additions & 158 deletions go/libraries/doltcore/merge/merge_prolly_rows.go

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions go/libraries/doltcore/merge/mutable_secondary_index.go
@@ -71,6 +71,19 @@ func GetMutableSecondaryIdxsWithPending(ctx *sql.Context, sch schema.Schema, tab
return nil, err
}
m := durable.ProllyMapFromIndex(idx)

// If the schema has changed, don't reuse the index.
// TODO: This isn't technically required, but correctly handling updating secondary indexes when only some
// of the table's rows have been updated is difficult to get right.
// Dropping the index is potentially slower but guaranteed to be correct.
if !m.KeyDesc().Equals(index.Schema().GetKeyDescriptorWithNoConversion()) {
continue
}

if !m.ValDesc().Equals(index.Schema().GetValueDescriptor()) {
continue
}

if schema.IsKeyless(sch) {
m = prolly.ConvertToSecondaryKeylessIndex(m)
}
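
For context, here is a minimal sketch of the gate added above: a stored secondary index is reused only when both its key and value tuple descriptors still match the (possibly merged) schema; otherwise it is skipped so it can be dropped and rebuilt, which is slower but always correct. The tupleDesc and secondaryIndex types below are hypothetical stand-ins for Dolt's val.TupleDesc and durable index handles, not the real API.

package main

import "fmt"

type tupleDesc struct{ encoding string }

func (d tupleDesc) Equals(o tupleDesc) bool { return d.encoding == o.encoding }

type secondaryIndex struct {
	name    string
	keyDesc tupleDesc // descriptor of the data currently stored in the index
	valDesc tupleDesc
}

// reusableIndexes keeps only the indexes whose stored layout still matches the
// schema after a merge; anything else is skipped so it can be dropped and
// rebuilt from scratch instead of being updated in place.
func reusableIndexes(indexes []secondaryIndex, schemaKey, schemaVal tupleDesc) []secondaryIndex {
	var keep []secondaryIndex
	for _, idx := range indexes {
		if !idx.keyDesc.Equals(schemaKey) || !idx.valDesc.Equals(schemaVal) {
			continue // schema changed under this index: rebuild rather than reuse
		}
		keep = append(keep, idx)
	}
	return keep
}

func main() {
	idxs := []secondaryIndex{
		{"idx_a", tupleDesc{"int,int"}, tupleDesc{"int"}},
		{"idx_b", tupleDesc{"int,text"}, tupleDesc{"int"}},
	}
	// After the merge the schema's key layout is "int,int": only idx_a survives.
	fmt.Println(len(reusableIndexes(idxs, tupleDesc{"int,int"}, tupleDesc{"int"}))) // 1
}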
10 changes: 0 additions & 10 deletions go/libraries/doltcore/merge/row_merge_test.go
@@ -173,16 +173,6 @@ var testCases = []testCase{
true,
false,
},
{
"dropping a column should be equivalent to setting a column to null",
build(1, 2, 0),
build(2, 1),
build(1, 1, 1),
3, 2, 3,
build(2, 2),
true,
false,
},
// TODO (dhruv): need to fix this test case for new storage format
//{
// "add rows but one holds a new column",
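
The removed test case above ("dropping a column should be equivalent to setting a column to null") reflects the behavior change in this PR: a column drop on one branch no longer silently resolves against an edit to that column on the other branch. The sketch below, using hypothetical cell and outcome types rather than Dolt's actual merge API, illustrates the rule the new tests encode: the drop resolves cleanly only if the other branch left the ancestor value untouched; otherwise a data conflict is reported.

package main

import "fmt"

// cell models a single column value on one branch; present == false means the
// branch dropped the column (or the row no longer carries it).
type cell struct {
	present bool
	value   int
}

type outcome int

const (
	resolveDrop outcome = iota // drop wins, no conflict
	conflict                   // drop vs. edit: report a data conflict
)

// mergeDroppedColumn applies the rule enforced by this PR: dropping a column
// is NOT equivalent to setting it to NULL. If the other branch changed the
// value relative to the ancestor, the drop and the edit conflict.
func mergeDroppedColumn(ancestor, other cell) outcome {
	if !other.present {
		return resolveDrop // both branches dropped it: trivially resolved
	}
	if ancestor.present && other.value == ancestor.value {
		return resolveDrop // other branch left the value untouched: drop wins
	}
	return conflict // other branch edited (or introduced) the value
}

func main() {
	// Other branch kept the ancestor value 3: the drop resolves cleanly.
	fmt.Println(mergeDroppedColumn(cell{true, 3}, cell{true, 3})) // 0 (resolveDrop)
	// Other branch changed 3 -> 4 while this branch dropped the column: conflict.
	fmt.Println(mergeDroppedColumn(cell{true, 3}, cell{true, 4})) // 1 (conflict)
}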
90 changes: 61 additions & 29 deletions go/libraries/doltcore/merge/schema_merge_test.go
@@ -268,24 +268,25 @@ var columnAddDropTests = []schemaMergeTest{
skip: true,
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to non-NULL, other drops NULL, plus data change",
ancestor: singleRow(1, 2, nil),
left: singleRow(1, 3),
right: singleRow(1, 2, 3),
dataConflict: true,
skip: true,
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to non-NULL, other drops non-NULL",
ancestor: singleRow(1, 2, 3),
left: singleRow(1, 2),
right: singleRow(1, 2, 4),
dataConflict: true,
skip: true,
},
{
name: "one side drops column, other deletes row",
ancestor: []sql.Row{row(1, 2, 3), row(4, 5, 6)},
left: []sql.Row{row(1, 2), row(4, 5)},
right: []sql.Row{row(1, 2, 3)},
merged: []sql.Row{row(1, 2)},
},
},
},
@@ -304,54 +305,39 @@
merged: singleRow(1, 3),
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to NULL, other drops non-NULL",
ancestor: singleRow(1, 2, 3),
left: singleRow(1, 3),
right: singleRow(1, nil, 3),
dataConflict: true,
skip: true,
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to NULL, other drops non-NULL, plus data change",
ancestor: singleRow(1, 2, 4),
left: singleRow(1, 3),
right: singleRow(1, nil, 4),
dataConflict: true,
skip: true,
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to non-NULL, other drops NULL, plus data change",
ancestor: singleRow(1, nil, 3),
left: singleRow(1, 3),
right: singleRow(1, 2, 3),
dataConflict: true,
skip: true,
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to non-NULL, other drops NULL, plus data change",
ancestor: singleRow(1, nil, 3),
left: singleRow(1, 4),
right: singleRow(1, 2, 3),
dataConflict: true,
skip: true,
},
{
// Skipped because the differ currently doesn't try to merge the dropped column.
// (https://github.com/dolthub/dolt/issues/6747)
name: "one side sets to non-NULL, other drops non-NULL",
ancestor: singleRow(1, 2, 3),
left: singleRow(1, 3),
right: singleRow(1, 4, 3),
dataConflict: true,
skip: true,
},
},
},
@@ -547,6 +533,29 @@
skipNewFmt: true,
skipOldFmt: true,
},
{
name: "right side drops and adds column of same type",
ancestor: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, a int)")),
left: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, a int)")),
right: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, b int)")),
merged: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, b int)")),
dataTests: []dataTest{
{
name: "left side modifies dropped column",
ancestor: singleRow(1, 1, 2),
left: singleRow(1, 1, 3),
right: singleRow(1, 2, 2),
dataConflict: true,
},
},
},
{
name: "right side drops and adds column of different type",
ancestor: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, a int)")),
left: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, a int)")),
right: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, b text)")),
merged: tbl(sch("CREATE TABLE t (id int PRIMARY KEY, c int, b text)")),
},
}

var columnDefaultTests = []schemaMergeTest{
@@ -735,6 +744,27 @@ var typeChangeTests = []schemaMergeTest{
right: singleRow(1, "hello world", 1, "hello world"),
dataConflict: true,
},
{
name: "delete and schema change on left",
ancestor: singleRow(1, "test", 1, "test"),
left: nil,
right: singleRow(1, "test", 1, "test"),
merged: nil,
},
{
name: "schema change on left, delete on right",
ancestor: singleRow(1, "test", 1, "test"),
left: singleRow(1, "test", 1, "test"),
right: nil,
merged: nil,
},
{
name: "schema and value change on left, delete on right",
ancestor: singleRow(1, "test", 1, "test"),
left: singleRow(1, "hello", 1, "hello"),
right: nil,
dataConflict: true,
},
},
},
}
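
The three data tests added above pin down how row deletions interact with schema changes, the other half of this PR's title: a deleted row merges cleanly against a branch that only changed the schema, but conflicts when that branch also changed the row's values. A rough sketch of that decision, using hypothetical types rather than Dolt's merge internals:

package main

import "fmt"

// rowState captures whether a branch still has the row and, if so, whether it
// modified any of the row's values relative to the common ancestor.
type rowState struct {
	deleted       bool
	valuesChanged bool
}

// mergeDeletedRow resolves a row that one branch deleted. A pure schema change
// (values untouched) on the surviving branch doesn't block the delete, but a
// concurrent value edit does, and must surface as a data conflict.
func mergeDeletedRow(other rowState) (deleted bool, conflict bool) {
	if other.deleted {
		return true, false // deleted on both branches
	}
	if other.valuesChanged {
		return false, true // delete vs. edit: data conflict
	}
	return true, false // schema-only change on the other branch: delete wins
}

func main() {
	fmt.Println(mergeDeletedRow(rowState{valuesChanged: false})) // true false
	fmt.Println(mergeDeletedRow(rowState{valuesChanged: true}))  // false true
}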
@@ -952,7 +982,7 @@ func testSchemaMergeHelper(t *testing.T, tests []schemaMergeTest, flipSides bool
require.NoError(t, err)
foundDataConflict = foundDataConflict || hasConflict
}
require.Equal(t, expectDataConflict, foundDataConflict)
require.True(t, foundDataConflict, "Expected data conflict, but didn't find one.")
} else {
for name, addr := range exp {
a, ok := act[name]
@@ -978,17 +1008,19 @@
runTest(t, test, false)
})
for _, data := range test.dataTests {
test.ancestor.rows = data.ancestor
test.left.rows = data.left
test.right.rows = data.right
test.merged.rows = data.merged
test.skipNewFmt = test.skipNewFmt || data.skip
test.skipFlipOnNewFormat = test.skipFlipOnNewFormat || data.skipFlip
// Copy the test so that the values from one data test don't affect subsequent data tests.
dataDest := test
dataDest.ancestor.rows = data.ancestor
dataDest.left.rows = data.left
dataDest.right.rows = data.right
dataDest.merged.rows = data.merged
dataDest.skipNewFmt = dataDest.skipNewFmt || data.skip
dataDest.skipFlipOnNewFormat = dataDest.skipFlipOnNewFormat || data.skipFlip
t.Run(data.name, func(t *testing.T) {
if data.skip {
t.Skip()
}
runTest(t, test, data.dataConflict)
runTest(t, dataDest, data.dataConflict)
})
}
})
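
The harness change above copies the test value (the `dataDest := test` line) before overriding its row fixtures, so one data test's rows and skip flags can't leak into the next. A small sketch of why that works, using reduced, hypothetical versions of the test structs: Go struct assignment copies by value, and each iteration replaces the copy's slice header rather than mutating the original's.

package main

import "fmt"

type tableFixture struct {
	ddl  string
	rows []int
}

type mergeCase struct {
	name     string
	ancestor tableFixture
	skip     bool
}

func main() {
	base := mergeCase{
		name:     "drop column",
		ancestor: tableFixture{ddl: "CREATE TABLE t (id int PRIMARY KEY, a int)"},
	}

	for _, rows := range [][]int{{1}, {2, 3}} {
		c := base // value copy: per-iteration overrides stay local to c
		c.ancestor.rows = rows
		c.skip = true
		_ = c // the subtest would run against c here
	}

	// The shared template is untouched, so every data test starts clean.
	fmt.Println(base.ancestor.rows == nil, base.skip) // true false
}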
7 changes: 7 additions & 0 deletions go/libraries/doltcore/schema/schema.go
@@ -77,8 +77,15 @@ type Schema interface {
GetMapDescriptors() (keyDesc, valueDesc val.TupleDesc)

// GetKeyDescriptor returns the key tuple descriptor for this schema.
// If a column has a type that can't appear in a key (such as "address" columns),
// that column will get converted to an equivalent type that can. (Example: text -> varchar)
GetKeyDescriptor() val.TupleDesc

// GetKeyDescriptorWithNoConversion returns a descriptor for the columns used in the key.
// Unlike `GetKeyDescriptor`, it doesn't attempt to convert columns if they can't appear in a key,
// and returns them as they are.
GetKeyDescriptorWithNoConversion() val.TupleDesc

// GetValueDescriptor returns the value tuple descriptor for this schema.
GetValueDescriptor() val.TupleDesc

15 changes: 12 additions & 3 deletions go/libraries/doltcore/schema/schema_impl.go
@@ -409,6 +409,15 @@ func (si *schemaImpl) GetMapDescriptors() (keyDesc, valueDesc val.TupleDesc) {

// GetKeyDescriptor implements the Schema interface.
func (si *schemaImpl) GetKeyDescriptor() val.TupleDesc {
return si.getKeyColumnsDescriptor(true)
}

// GetKeyDescriptorWithNoConversion implements the Schema interface.
func (si *schemaImpl) GetKeyDescriptorWithNoConversion() val.TupleDesc {
return si.getKeyColumnsDescriptor(false)
}

func (si *schemaImpl) getKeyColumnsDescriptor(convertAddressColumns bool) val.TupleDesc {
if IsKeyless(si) {
return val.KeylessTupleDesc
}
@@ -420,17 +429,17 @@ func (si *schemaImpl) GetKeyDescriptor() val.TupleDesc {
sqlType := col.TypeInfo.ToSqlType()
queryType := sqlType.Type()
var t val.Type
if queryType == query.Type_BLOB {
if convertAddressColumns && queryType == query.Type_BLOB {
[Review comment from a Contributor on the line above]: Not related to your changes... but... JSON is also an address type (as of very recently at least!). If it's possible to add that in cleanly here, that would be awesome. If you aren't comfortable rolling that into this PR, could you please open an issue to track adding it here so we don't forget to update this?

t = val.Type{
Enc: val.Encoding(EncodingFromSqlType(query.Type_VARBINARY)),
Nullable: columnMissingNotNullConstraint(col),
}
} else if queryType == query.Type_TEXT {
} else if convertAddressColumns && queryType == query.Type_TEXT {
t = val.Type{
Enc: val.Encoding(EncodingFromSqlType(query.Type_VARCHAR)),
Nullable: columnMissingNotNullConstraint(col),
}
} else if queryType == query.Type_GEOMETRY {
} else if convertAddressColumns && queryType == query.Type_GEOMETRY {
t = val.Type{
Enc: val.Encoding(serial.EncodingCell),
Nullable: columnMissingNotNullConstraint(col),
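
A condensed sketch of the flag the hunk above threads through getKeyColumnsDescriptor: when building a key descriptor for general use, address-encoded column types are swapped for key-safe equivalents (BLOB to VARBINARY, TEXT to VARCHAR, GEOMETRY to a cell encoding), while GetKeyDescriptorWithNoConversion passes convert=false to get the layout as stored. The queryType values below are simplified stand-ins for query.Type_* and the real encoding machinery, not Dolt's actual implementation.

package main

import "fmt"

type queryType string

const (
	typeBlob     queryType = "BLOB"
	typeText     queryType = "TEXT"
	typeGeometry queryType = "GEOMETRY"
	typeInt      queryType = "INT"
)

// keyEncoding picks the encoding used in a key tuple for a column of the given
// type. With convert=true, address types are replaced by inline-comparable
// equivalents; with convert=false, the type is reported as-is.
func keyEncoding(t queryType, convert bool) queryType {
	if !convert {
		return t
	}
	switch t {
	case typeBlob:
		return "VARBINARY"
	case typeText:
		return "VARCHAR"
	case typeGeometry:
		return "CELL"
	default:
		return t
	}
}

func main() {
	fmt.Println(keyEncoding(typeText, true))  // VARCHAR: safe to compare inline in a key
	fmt.Println(keyEncoding(typeText, false)) // TEXT: the layout actually stored on disk
}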