Releases: dolthub/dolt
1.41.3
Merged PRs
dolt
- 8102: Bug fix: binlog heartbeat `nextLogPosition` field
Heartbeat binlog events must have a correct `NextLogPosition` field that matches up with the previous events that have been sent in the stream. If not, the replica will shut down the binlog stream. #8087 fixed this issue, but didn't account for a heartbeat event sent after the initial Format Description event but before any user-initiated requests. To trigger this, run `start replica;` on the replica, don't run any commands on the primary, and let the first heartbeat go out after the binlog stream has been up for 30s.
- 8101: go.mod: Migrate from gopkg.in/square/go-jose.v2 to gopkg.in/go-jose/go-jose.v2. Bump version. Picks up fix for CVE-2024-28180.
- 8088: [no-release-info] Add additional tests for manipulating large JSON documents and fix corner case bugs in JSON_LOOKUP and JSON_INSERT
This PR adds additional tests for calling JSON_INSERT on large JSON documents. It also fixes three issues with IndexedJsonDocuments:
  - Some operations are not supported by the new optimized implementation for JSON_LOOKUP, such as wildcards on array paths (e.g. `$[*]`). Instead of returning an error, we detect the error and fall back on the original implementation.
  - Attempting to insert a value into a document could cause an infinite loop.
  - We would fail to read some keys from an IndexedJsonDocument's StaticMap if the document contained arrays.
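The position rule in 8102 can be illustrated with a small position tracker. This is a minimal sketch, not Dolt's implementation: `positionTracker`, its methods, and the 120-byte event size are hypothetical. It assumes positions start at 4 (after the 4-byte magic header that begins a MySQL binlog file), that every event sent advances the position by its size, and that heartbeats report the current position without advancing it. The point it shows is that the Format Description event must update the tracker too, so a heartbeat sent before any user events still carries a matching position.

```go
package main

import "fmt"

// positionTracker is a hypothetical helper (not part of Dolt) that records
// the binlog position a replica expects after each event in the stream.
type positionTracker struct {
	nextLogPosition uint32
}

// newPositionTracker starts after the 4-byte magic header that begins a
// MySQL binlog file (an assumption for this sketch).
func newPositionTracker() *positionTracker {
	return &positionTracker{nextLogPosition: 4}
}

// recordEvent advances the position by the size of an event being sent and
// returns the value to write into that event's nextLogPosition header field.
// Every event must go through here, including the Format Description event.
func (t *positionTracker) recordEvent(eventSize uint32) uint32 {
	t.nextLogPosition += eventSize
	return t.nextLogPosition
}

// heartbeatPosition returns the position a heartbeat should carry: the end of
// the last event sent. In this sketch, heartbeats do not advance the position.
func (t *positionTracker) heartbeatPosition() uint32 {
	return t.nextLogPosition
}

func main() {
	t := newPositionTracker()
	fde := t.recordEvent(120) // hypothetical Format Description event size
	hb := t.heartbeatPosition()
	fmt.Println(fde, hb) // 124 124
}
```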
go-mysql-server
- 2583: [stats] Disable histogram bucket merging for now because it mutated shared memory
Merging buckets in the current format is unsafe:
  - we collect statistics for an index where two buckets have overlapping values
  - we execute a join using the index with overlapping values, and use a merge algorithm to combine those buckets. The merged bucket is synthetic, but the statistics used for the join are also synthetic, so this all works as expected.
  - a future indexscan selects the compressed range from before, accessing one of the synthetic buckets created by the join
  - we error with `invalid bucket type: *stats.Bucket` at the end of the indexscan when adding the filtered histogram with a synthetic bucket back to the implementor-type statistic
Edited `mergeOverlappingBuckets` to not share memory, but I'm also not sure that merging buckets is a performance win in most cases, so it is disabled for now.
- 2580: Remove a duplicate column from information_schema
Just what it says on the tin. This duplicate column causes problems for DuckDB when attempting to connect to doltdb databases.
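The shared-memory hazard behind 2583 is a common Go aliasing pitfall: reslicing the input (for example `buckets[:1]`) and appending into it overwrites the caller's backing array. The sketch below contrasts that pattern with a merge that allocates a fresh slice. The `Bucket` type and both functions are hypothetical simplifications (integer bounds, one row count), not GMS's `stats.Bucket` or its `mergeOverlappingBuckets`, and they assume buckets are sorted by lower bound.

```go
package main

import "fmt"

// Bucket is a simplified, hypothetical histogram bucket with integer bounds.
type Bucket struct {
	Lower, Upper int
	Count        int
}

// mergeInPlace merges overlapping neighbors but reuses the caller's backing
// array, so the input slice is mutated: this is the unsafe pattern.
func mergeInPlace(buckets []Bucket) []Bucket {
	if len(buckets) == 0 {
		return buckets
	}
	out := buckets[:1] // shares memory with the input
	for _, b := range buckets[1:] {
		last := &out[len(out)-1]
		if b.Lower <= last.Upper {
			last.Count += b.Count
			if b.Upper > last.Upper {
				last.Upper = b.Upper
			}
			continue
		}
		out = append(out, b)
	}
	return out
}

// mergeCopy produces the same result in a newly allocated slice, leaving the
// input (which other statistics may still reference) untouched.
func mergeCopy(buckets []Bucket) []Bucket {
	out := make([]Bucket, 0, len(buckets))
	for _, b := range buckets {
		if n := len(out); n > 0 && b.Lower <= out[n-1].Upper {
			out[n-1].Count += b.Count
			if b.Upper > out[n-1].Upper {
				out[n-1].Upper = b.Upper
			}
			continue
		}
		out = append(out, b)
	}
	return out
}

func main() {
	shared := []Bucket{{0, 10, 5}, {5, 15, 3}, {20, 30, 2}}
	merged := mergeCopy(shared)
	fmt.Println(merged, shared[0]) // [{0 15 8} {20 30 2}] {0 10 5}
	merged = mergeInPlace(shared)
	fmt.Println(merged, shared[0]) // [{0 15 8} {20 30 2}] {0 15 8}
}
```

Both functions return the same merged buckets; only `mergeInPlace` silently changes data that another statistic might still be reading.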
Closed Issues
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 2.97 | 1.4 |
groupby_scan | 13.7 | 17.32 | 1.3 |
index_join | 1.34 | 5.28 | 3.9 |
index_join_scan | 1.27 | 2.57 | 2.0 |
index_scan | 34.33 | 54.83 | 1.6 |
oltp_point_select | 0.18 | 0.46 | 2.6 |
oltp_read_only | 3.49 | 7.7 | 2.2 |
select_random_points | 0.33 | 0.77 | 2.3 |
select_random_ranges | 0.39 | 0.9 | 2.3 |
table_scan | 34.95 | 56.84 | 1.6 |
types_table_scan | 75.82 | 144.97 | 1.9 |
reads_mean_multiplier | | | 2.1 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.13 | 6.09 | 0.7 |
oltp_insert | 3.82 | 3.02 | 0.8 |
oltp_read_write | 8.58 | 13.95 | 1.6 |
oltp_update_index | 3.89 | 3.07 | 0.8 |
oltp_update_non_index | 3.89 | 3.02 | 0.8 |
oltp_write_only | 5.37 | 6.43 | 1.2 |
types_delete_insert | 7.7 | 6.67 | 0.9 |
writes_mean_multiplier | | | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 98.43 | 32.02 | 3.1 |
tpcc_tps_multiplier | | | 3.1 |
Overall Mean Multiple | 2.07 |
---|---|
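The summary rows appear to be simple averages. Under that assumption (inferred from the 1.41.3 numbers, not from a documented formula), each latency test's multiple is Dolt/MySQL rounded to one decimal, `reads_mean_multiplier` and `writes_mean_multiplier` are the means of those multiples, the TPC-C multiple is MySQL TPS divided by Dolt TPS (higher throughput is better, so the ratio is inverted), and the overall multiple is the mean of the three rounded summary values. The Go sketch below reproduces the 1.41.3 figures this way; `meanMultiple` and `round` are illustrative helpers.

```go
package main

import (
	"fmt"
	"math"
)

// result holds one benchmark row: the MySQL and Dolt latency figures.
type result struct{ mysql, dolt float64 }

// round rounds x to the given number of decimal places.
func round(x float64, places int) float64 {
	p := math.Pow(10, float64(places))
	return math.Round(x*p) / p
}

// meanMultiple averages the per-test multiples after rounding each to one
// decimal, as displayed in the Multiple column. Latency tests use Dolt/MySQL.
func meanMultiple(rows []result) float64 {
	sum := 0.0
	for _, r := range rows {
		sum += round(r.dolt/r.mysql, 1)
	}
	return round(sum/float64(len(rows)), 1)
}

// 1.41.3 latency figures copied from the tables above.
var reads = []result{
	{2.07, 2.97}, {13.7, 17.32}, {1.34, 5.28}, {1.27, 2.57}, {34.33, 54.83},
	{0.18, 0.46}, {3.49, 7.7}, {0.33, 0.77}, {0.39, 0.9}, {34.95, 56.84}, {75.82, 144.97},
}

var writes = []result{
	{8.13, 6.09}, {3.82, 3.02}, {8.58, 13.95}, {3.89, 3.07}, {3.89, 3.02}, {5.37, 6.43}, {7.7, 6.67},
}

func main() {
	r, w := meanMultiple(reads), meanMultiple(writes)
	// TPC-C reports throughput (TPS), so the ratio is inverted: MySQL/Dolt.
	tpcc := round(98.43/32.02, 1)
	overall := round((r+w+tpcc)/3, 2)
	fmt.Println(r, w, tpcc, overall) // 2.1 1 3.1 2.07
}
```

Averaging the unrounded summary values instead would give about 2.05 overall, so the rounding step appears to matter for matching the published 2.07.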
1.41.2
Merged PRs
dolt
- 8093: go/libraries/doltcore/remotestorage: internal/reliable: Recv: Fix a race where a completed state machine run and a canceled parent context could return a nil response message with a nil error.
- 8090: dolt sql shell slash redux
- 8087: Bug fix: Use correct log position in Dolt to MySQL replication heartbeats
Ensure heartbeat events sent from a Dolt primary to a MySQL replica have the latest nextLogPosition populated; otherwise the MySQL replica will close the binlog event stream.
- 8086: define schema for dolt_schemas table
dolthub/doltgresql#454 depends on this PR.
- 8082: /docker/{docker-entrypoint.sh,serverDockerfile}: change image to pass all args to dolt sql-server command
This PR fixes #8079. Now, when running `dolthub/dolt-sql-server`, if the args `dolt sql-server` are passed to the image, it will error. This will also prevent accidentally starting two Dolt servers in the container.
- 8081: Fixed keyless secondary indexing for Doltgres
Companion PR: dolthub/doltgresql#452
This PR fixes two issues with creating secondary indexes for Doltgres types. The first deals with handlers: we were not adding a `nil` handler for the additional hash type, which would cause a panic because the counts were not equal (all non-Extended types should have a matching nil handler).
The second issue was due to the reuse of an `ExtendedTupleComparator`. When creating a new `ExtendedTupleComparator`, we pass in the previous `TupleTypeHandler` to handle all non-Extended types. If the previous `TupleTypeHandler` was an `ExtendedTupleComparator` and the new one was also an `ExtendedTupleComparator`, we could misinterpret data and return incorrect results, because the handler assumed a different type than the actual type. This has been changed so that `ExtendedTupleComparator` will always use the inner comparator of a previous `ExtendedTupleComparator`. For now this will always be the default comparator, but if we ever add another one, this should properly handle that change.
- 8080: Re-enable doltgres sysbench scripts
- 8078: Archive index rework to make loading faster
The initial implementation of archive indexes over-optimized for space. As a result, loading the index of an archive was 10x slower than loading the index of a noms table file. To address this:
  - Dropped the end-to-end compression of the index.
  - Dropped the use of varints for offset deltas and chunk refs.
  - Replaced byte span offsets with an end-offset approach, which requires no delta processing on load.
  - Used only slices of primitive types in the index memory. The read path is constant time with a little more complexity, but this allows us to read directly off disk into memory.
Testing indicates that on a 41 GB archive file, this returned load performance to match classic table files, and the size of the index increased by about 350 MB (total ~1 GB).
- 8077: /go/libraries/doltcore/remotestorage/chunk_fetcher.go: fix nil pointer
We observe that dolthubapi can crash with the following nil pointer error:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x29e14d1]
goroutine 399548427 [running]:
github.com/dolthub/dolt/go/libraries/doltcore/remotestorage.fetcherRPCDownloadLocsThread.func3()
external/com_github_dolthub_dolt_go/libraries/doltcore/remotestorage/chunk_fetcher.go:266 +0xf1
golang.org/x/sync/errgroup.(*Group).Go.func1()
external/org_golang_x_sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 399548420
external/org_golang_x_sync/errgroup/errgroup.go:75 +0x96
This PR aims to prevent this.
- 8076: Bump golang.org/x/image from 0.10.0 to 0.18.0 in /go
Bumps golang.org/x/image from 0.10.0 to 0.18.0.
Commits:
  - 3bbf4a6 tiff: Validate palette indices when parsing palette-color images
  - 6c5fa46 go.mod: update golang.org/x dependencies
  - 55c4ab6 go.mod: update golang.org/x dependencies
  - 0057a93 tiff: fix function name in comment
  - 9e190ae webp: disallow multiple VP8X chunks
  - 445ab0e go.mod: update golang.org/x dependencies
  - 240a51a font/sfnt: support early version 0 OS/2 tables
  - c20bbc3 draw: simplify some calls to fmt.Fprintf
  - 491771c draw: merge draw_go117.go into draw.go
  - 4aa0222 go.mod: update go directive to 1.18
  - Additional commits viewable in compare view
- 8073: Added schema to index creation
In Doltgres, whenever we would create an index, we would use the empty schema as the destination (the default value for the schema name). This meant that the updated table with an index was saved into the empty schema, which is incorrect since Doltgres always has a schema. This adds the schema to index creation, along with several other locations that it should be in.
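The end-offset idea from 8078 can be sketched as follows. Storing the cumulative end offset of each chunk, rather than lengths or deltas, means chunk `i` occupies `[ends[i-1], ends[i])`, so a lookup is two slice reads and nothing has to be prefix-summed when the index is loaded. The `spanIndex` type and its functions are hypothetical and do not reflect Dolt's actual archive format; the sketch assumes the first chunk starts at offset 0.

```go
package main

import "fmt"

// spanIndex is a hypothetical index storing only the end offset of each chunk.
type spanIndex struct {
	ends []uint64
}

// buildSpanIndex converts chunk lengths into cumulative end offsets. This runs
// once at write time; loading the index later is just reading the slice.
func buildSpanIndex(lengths []uint64) spanIndex {
	ends := make([]uint64, len(lengths))
	var off uint64
	for i, n := range lengths {
		off += n
		ends[i] = off
	}
	return spanIndex{ends: ends}
}

// span returns the [start, end) byte range of chunk i in constant time.
func (s spanIndex) span(i int) (start, end uint64) {
	if i > 0 {
		start = s.ends[i-1]
	}
	return start, s.ends[i]
}

func main() {
	idx := buildSpanIndex([]uint64{10, 20, 5})
	fmt.Println(idx.span(0)) // 0 10
	fmt.Println(idx.span(1)) // 10 30
	fmt.Println(idx.span(2)) // 30 35
}
```

A delta or varint encoding would be smaller on disk, but every lookup would first need the deltas summed, which is the load-time work this trade-off avoids.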
go-mysql-server
- 2583: [stats] Disable histogram bucket merging for now because it mutated shared memory
Merging buckets in the current format is unsafe:
  - we collect statistics for an index where two buckets have overlapping values
  - we execute a join using the index with overlapping values, and use a merge algorithm to combine those buckets. The merged bucket is synthetic, but the statistics used for the join are also synthetic, so this all works as expected.
  - a future indexscan selects the compressed range from before, accessing one of the synthetic buckets created by the join
  - we error with `invalid bucket type: *stats.Bucket` at the end of the indexscan when adding the filtered histogram with a synthetic bucket back to the implementor-type statistic
Edited `mergeOverlappingBuckets` to not share memory, but I'm also not sure that merging buckets is a performance win in most cases, so it is disabled for now.
- 2581: [stats] populate types for nil zeroing
- [2577](https://github.com/d...
1.41.1
Merged PRs
dolt
- 8062: add SchemaName to DatabaseSchema interface
Depends on dolthub/go-mysql-server#2569
- 8059: Add initial, no-op implementation for `ListBinaryLogs` API changes
Adds a simple no-op implementation for `DoltBinlogPrimaryController.ListBinaryLogs` to keep it in sync with API changes in GMS.
Depends on dolthub/go-mysql-server#2567
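A no-op implementation like the one described in 8059 satisfies the interface while reporting no logs. The sketch below uses a hypothetical, simplified `BinlogPrimaryController` interface and `BinaryLog` struct; the real GMS interface's method signatures (for example, whether it takes a `*sql.Context`) may differ. `showBinaryLogs` imitates an engine turning the controller's answer into one result row per log (name and size), as `SHOW BINARY LOGS` does.

```go
package main

import "fmt"

// BinaryLog is a hypothetical description of one binary log file.
type BinaryLog struct {
	Name string
	Size uint64
}

// BinlogPrimaryController is a simplified, hypothetical stand-in for the GMS
// interface; only the method relevant to SHOW BINARY LOGS is included.
type BinlogPrimaryController interface {
	ListBinaryLogs() ([]BinaryLog, error)
}

// noopController satisfies the interface but reports no logs, which keeps an
// implementation compiling while logs are not yet persisted to files.
type noopController struct{}

func (noopController) ListBinaryLogs() ([]BinaryLog, error) { return nil, nil }

// Compile-time check that noopController implements the interface.
var _ BinlogPrimaryController = noopController{}

// showBinaryLogs turns the controller's answer into result rows
// (log name, file size) to send back to the client.
func showBinaryLogs(c BinlogPrimaryController) ([][]any, error) {
	logs, err := c.ListBinaryLogs()
	if err != nil {
		return nil, err
	}
	rows := make([][]any, 0, len(logs))
	for _, l := range logs {
		rows = append(rows, []any{l.Name, l.Size})
	}
	return rows, nil
}

func main() {
	rows, err := showBinaryLogs(noopController{})
	fmt.Println(len(rows), err) // 0 <nil>
}
```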
go-mysql-server
- 2572: fix for `table_catalog` for `information_schema.tables`
- 2570: Added infoschema to privilege check
This fixes: #8052
In the analyzer, we make a check to determine if we're querying the information schema. The queries provided in the issue that do not work are regarded as subqueries, and these are explicitly ignored. This causes the privilege checker to look for the information schema tables by name, which is not the intended behavior.
This PR just adds an additional information schema check at a lower layer, which should remove the inconsistencies found from the queries provided in the issue.
- 2569: add SchemaName to DatabaseSchema interface
This method returns the schema name: the schema name for Doltgres and the database name for Dolt.
- 2567: Add support for `SHOW BINARY LOGS`
When the `SHOW BINARY LOGS` statement is executed, GMS will invoke the registered `BinlogPrimaryController` to ask it for the list of binary logs and send them back to the client.
- 2566: Renamed index functions and enums to be public
This renames the `indexScanOp` enum so that it's accessible from outside the package, and also replaces the type switch in `newLeaf` with a replaceable function that can be overridden from outside the package to support types that are not native to GMS.
- 2565: When a subroutine (like `CREATE PROCEDURE`) contains a subquery, correctly index into it.
Fixes #8028
We didn't have tests for constructs containing nested subroutines (like `CREATE PROCEDURE foo() CREATE PROCEDURE bar() SELECT 1;`).
This PR adds tests for those, but tests where we don't currently match MySQL are disabled. There are enough enabled tests to show that statements like this no longer cause a panic.
Making sure that we match MySQL for these statements should be done in a follow-up.
- 2564: validate all string types for unicode
MySQL throws errors on invalid utf8-encoded strings. A previous PR detected those, but only for `[]byte`-to-string conversions. Prepared statements receive string parameters as a `string` type, so this PR moves the check so that it applies to all conversions.
Additionally, it adds bindings to the `AssertErr` and `AssertErrWithCtx` methods.
related pr: dolthub/go-mysql-server#2562
dolt pr: #8060
fixes #8040
- 2563: small changes to stats bucket counting
Joins don't track output MCVs anymore; they aren't in a format where they'd be useful anyway. Also assume MCVs are sorted for faster matching.
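The check described in 2564 amounts to validating every incoming string, not only `[]byte` values, before conversion. A minimal sketch using Go's standard `unicode/utf8` package; `validateUTF8` and `convertString` are hypothetical helpers, and their error text only loosely imitates MySQL's `Incorrect string value` error.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validateUTF8 is a hypothetical helper that rejects strings containing
// invalid UTF-8, loosely imitating MySQL's "Incorrect string value" error.
func validateUTF8(s string) error {
	if utf8.ValidString(s) {
		return nil
	}
	return fmt.Errorf("incorrect string value: %q", s)
}

// convertString applies the same check whether the input arrived as a string
// (e.g. a prepared-statement parameter) or as a []byte.
func convertString(v any) (string, error) {
	var s string
	switch t := v.(type) {
	case string:
		s = t
	case []byte:
		s = string(t)
	default:
		return "", fmt.Errorf("unsupported type %T", v)
	}
	if err := validateUTF8(s); err != nil {
		return "", err
	}
	return s, nil
}

func main() {
	_, err := convertString("\xff\xfe") // invalid bytes arriving as a string
	fmt.Println(err != nil)             // true
}
```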
Closed Issues
1.41.0
This release includes a backwards-incompatible change to the table statistics type encoding. Old statistics will not load with the new client and will have to be manually updated with `ANALYZE`, deleted with `call dolt_stats_drop()`, or removed from the filesystem. Additionally, table statistics will load on startup by default for databases with fewer than 2 million rows. This is usually a one-time penalty of a few seconds.
Merged PRs
dolt
- 8056: Revert "Merge pull request #7940 from dolthub/macneale4/slash-cmds"
This reverts commit dd7071f, reversing changes made to 02f4503.
Reverting slash commands to fix: #8050 and #8022
- 8036: [statspro] Bootstrap database statistics once on startup
Load database statistics once on SQL engine startup. If auto refresh is enabled, bootstrap is not performed. This behavior is on by default and can be turned off with `dolt sql -q "set @@PERSIST.dolt_stats_bootstrap_enabled = 0;"` (calling the command above with non-empty tables will still bootstrap statistics once).
This includes a small change to the way we encode column types for stats. We previously split on a comma (`","`), but enums and other types can include commas, so we now use a line break (`"\n"`). Old versions of stats will fail to load with the newer version.
Closed Issues
1.40.3
Merged PRs
dolt
- 8041: Truncate MCVs
Sort and truncate MCVs. Only keep values whose frequency is more than twice the uniform frequency. This prevents us from manually summing non-outliers (which is expensive).
- 8025: [prolly] Float keyRange increment bug
Incrementing the `[n, n+1)` key range is a lot faster than a binary search with a tuple comparison callback. But it is subject to at least two edge cases where `(n+1)` is not a valid stop range: (1) `n+1 == n`, because of precision loss, and (2) `n+1 < n`, because of overflow.
I added a series of GMS tests here: dolthub/go-mysql-server#2554. I couldn't find a DECIMAL failure case; I think DECIMAL always encodes a valid 1's place, and is not subject to overflow AFAICT.
go-mysql-server
- 2563: small changes to stats bucket counting
Joins don't track output MCVs anymore; they aren't in a format where they'd be useful anyway. Also assume MCVs are sorted for faster matching.
- 2562: throw error on invalid utf8 encoding for strings
fixes #8040
- 2561: Fixes unexpected timezone conversion when passing TIMESTAMP to unix_timestamp()
see #2111
- 2560: fix `GetField` indexes for `UpdateJoin` with Update Trigger
This PR addresses an issue where we were incorrectly assigning `GetField` indexes to an `update join` query.
The fix involved:
  - adding a case for `triggerIters` to `rowUpdateAccumulator`
  - not picking `ResolvedTable` references under `SubqueryAliases` when there are multiple
  - correctly setting the scope node for update joins
fixes #7943
- 2559: Implement support for `DECLARE CONTINUE HANDLER`
Fixes #7971
Previously, we would always terminate a block when encountering an error, even if there was a matching handler. Additionally, there was no mechanism to resume after an error that happened inside a `LOOP` construct.
This correctly implements `DECLARE CONTINUE HANDLER` by making the following changes:
  - Checks for handlers while executing the `Block` node instead of the `BeginEnd` node.
  - For `DECLARE EXIT HANDLER`, the `Block` returns a special error value that propagates to the containing `BeginEnd` node in order to terminate just that node.
- 2556: Compute GetField indexes in procedure if-conditions.
Fixes #7994
It seems like we aren't running the `assignExecIndexes` analysis pass on if-conditions when invoking stored procedures, which can cause execution failures if the condition has a sub-expression that has a `GetField` node.
Fixing this revealed a related issue: when constructing the scope of the if-condition to determine the correct indexes, we were incorrectly including any columns from the condition's body in the scope. This was also causing incorrect index calculations for `GetFields` in the if-condition, and is also fixed here.
(This only affected conditionals in stored procedures, not conditionals in triggers, because the analysis has a separate execution path for each; `analyzeProcedureBodies` is not called for triggers.)
- 2555: [stats] simplify stats comparison and mcv logic
Lazier comparison logic. Skip promoting/converting types when the index types match.
Remove an expensive and seemingly unnecessary bucket compression step that was re-evaluating MCVs.
- 2552: support `VALUES` statement
fixes #8012
syntax: dolthub/vitess#354
vitess
- 355: New functions to create PreviousGtids events, and to update event checksum
- 354: support `VALUES` statement
This PR adds syntax support for the `VALUES` statement as an alias for `SELECT * FROM ...`.
We are still missing `SELECT (VALUES ...)` (support for values as a `select_expression`).
syntax for #8012
Closed Issues
- 8034: Users are able to create branches with a "-" at the start. If you try to delete the branch after that, it looks like dolt is understanding it as an option for dolt_branch
- 8040: Inserting BINARY into VARCHAR(26) should result in error Incorrect string value
- 7943: Update with subquery join clause causes field index error/panic for table with trigger
- 8042: mysql:latest - missing error: [HY000][1524] Plugin 'mysql_native_password' is not loaded
- 7971: NOT FOUND handlers in procedures cause ERROR 1105 (HY000): EOF
- 7994: "Unable to find field with index" error on INSERT in procedure following IF (SELECT ...)
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 2.91 | 1.4 |
groupby_scan | 13.22 | 17.32 | 1.3 |
index_join | 1.34 | 5.37 | 4.0 |
index_join_scan | 1.27 | 2.22 | 1.7 |
index_scan | 34.33 | 53.85 | 1.6 |
oltp_point_select | 0.18 | 0.52 | 2.9 |
oltp_read_only | 3.49 | 8.28 | 2.4 |
select_random_points | 0.33 | 0.81 | 2.5 |
select_random_ranges | 0.39 | 0.97 | 2.5 |
table_scan | 34.95 | 54.83 | 1.6 |
types_table_scan | 75.82 | 142.39 | 1.9 |
reads_mean_multiplier | | | 2.2 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.13 | 6.09 | 0.7 |
oltp_insert | 3.82 | 3.02 | 0.8 |
oltp_read_write | 8.58 | 15.0 | 1.7 |
oltp_update_index | 3.89 | 3.13 | 0.8 |
oltp_update_non_index | 3.89 | 3.07 | 0.8 |
oltp_write_only | 5.37 | 6.55 | 1.2 |
types_delete_insert | 7.7 | 6.79 | 0.9 |
writes_mean_multiplier | | | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 98.85 | 32.66 | 3.0 |
tpcc_tps_multiplier | | | 3.0 |
Overall Mean Multiple | 2.07 |
---|---|
1.40.2
Merged PRs
dolt
- 7963: Create pg_catalog by default for Doltgres
This makes it so that `pg_catalog` is created by default when Doltgres is using Dolt. In addition, it adds a new function to hook into schema functionality.
go-mysql-server
- 2552: support `VALUES` statement
fixes #8012
syntax: dolthub/vitess#354
- 2551: unwrap parenthesized table references
fixes #8009
vitess
- 354: support `VALUES` statement
This PR adds syntax support for the `VALUES` statement as an alias for `SELECT * FROM ...`.
We are still missing `SELECT (VALUES ...)` (support for values as a `select_expression`).
syntax for #8012
- 353: allow backticks in system and user variables
This PR allows the use of backticks in system and user variables.
We are more lenient than MySQL when it comes to backticks in set statements.
For example, we allow ``set @`abc`.`def` = 10``, while MySQL throws an error.
This is because we treat this as a qualified column identifier and automatically strip the backticks.
test bump dolthub/go-mysql-server#2548
fixes #8010
Closed Issues
- 8012: VALUES statement not supported
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 2.81 | 1.4 |
groupby_scan | 13.46 | 17.32 | 1.3 |
index_join | 1.37 | 5.37 | 3.9 |
index_join_scan | 1.27 | 10.84 | 8.5 |
index_scan | 34.95 | 53.85 | 1.5 |
oltp_point_select | 0.18 | 0.46 | 2.6 |
oltp_read_only | 3.49 | 7.56 | 2.2 |
select_random_points | 0.34 | 0.75 | 2.2 |
select_random_ranges | 0.39 | 0.9 | 2.3 |
table_scan | 34.95 | 54.83 | 1.6 |
types_table_scan | 75.82 | 137.35 | 1.8 |
reads_mean_multiplier | | | 2.7 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.09 | 0.8 |
oltp_insert | 3.82 | 3.02 | 0.8 |
oltp_read_write | 8.58 | 13.95 | 1.6 |
oltp_update_index | 3.89 | 3.07 | 0.8 |
oltp_update_non_index | 3.89 | 3.02 | 0.8 |
oltp_write_only | 5.37 | 6.32 | 1.2 |
types_delete_insert | 7.7 | 6.67 | 0.9 |
writes_mean_multiplier | | | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 99.66 | 26.16 | 3.8 |
tpcc_tps_multiplier | | | 3.8 |
Overall Mean Multiple | 2.50 |
---|---|
1.40.1
Previous releases 1.39.5 and 1.40.0 contained a bug when updating floats that would produce incorrect data. The change that caused this bug has been reverted in this release. Releases 1.39.5 and 1.40.0 have been deleted. If you are using those releases, we highly encourage you to upgrade to this release.
Note that only tables containing float types would be affected by the above bug, and then only if a value was updated. The affected releases were only in the wild for 48 hours, so we think the impact of this bug is small. If you are impacted by the bug, please come by our Discord and we will help further.
The bug was caught by our nightly fuzzer testing.
https://github.com/dolthub/fuzzer
Merged PRs
dolt
- 8024: Revert "[prolly] filteredIter optimization for exact prefix ranges (#7966)"
This reverts commit 6ae4251.
- 8018: Archive DDict cache and multi-file bug fixes
Two primary issues addressed in the `dolt admin archive` command:
  - Add caching to dictionaries. This improved performance significantly.
  - Fix multiple bugs related to having multiple table files. That was a gap in testing, so we added a bats test for the command.
- 8001: Feature: Support `restore` subcommand in `dolt_backup()`
The `dolt_backup()` stored procedure now supports the `restore` subcommand. Customers can use this support to create a new database from an existing backup, or to sync an existing database from a backup. Note that the `restore` subcommand currently requires root/superuser access to execute, since it can change database state (particularly when the `--force` argument is used).
Example usage to create a database named `db1` from a backup on disk:
`call dolt_backup('restore', 'file:///opt/local/dolt-backups/db1', 'db1');`
Related to #7993
Fixes #6074
- 7999: Generate `TEMPORARY TABLE` tags the same as normal `TABLE`s
This PR fixes this particular collision, and probably makes collisions with other temporary tables less likely, by using the deterministic random number generator used for generating tags for normal persisted tables.
fixes #7995
- 7990: support `auto_increment` on temporary tables
fixes #7972
- 7988: /.github/scripts/fuzzer/get-fuzzer-job-json.sh: add app label to fuzzer
- 7966: [prolly] filteredIter optimization for exact prefix ranges
Index range iteration uses a callback that is arbitrarily flexible but expensive. I changed index table access to only perform partial index scans for complete prefixes, and when the prefix fields have equality conditions, the generality of the index range callback is overkill. We just need to scan from the partial key `(field1, ..., fieldn, nil, ...)` to one higher than the partial key, `(field1, ..., fieldn+1, nil, ...)`.
This PR differentiates between the `RangeField.StrictKey` and `.Equal` attributes to distinguish a max-1-row restriction from an equality restriction.
Still need to do follow-up tracing, but this is in response to the TPC-C queries below. The string ones are much more common. Each of these uses a set of equality filters that only partially completes a secondary index prefix. All of them spend ~5ms of CPU time executing `Range.Matches`, which is mostly eliminated with this PR.
SELECT o_entry_d FROM orders1 WHERE o_w_id = 1 AND o_d_id = 5 AND o_c_id = 1891 ORDER BY o_id DESC;
SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 6 AND c_last='ABLECALLYABLE' ORDER BY c_first;
SELECT o_id, o_carrier_id, o_entry_d FROM orders1 WHERE o_w_id = 1 AND o_d_id = 9 AND o_c_id = 1709 ORDER BY o_id DESC
- 7914: Feature: Binlog replication
Initial support for Dolt to stream binlog events to a MySQL replica.
In this initial iteration, binlog events are streamed directly to connected replicas, instead of being written to a log file first. This enables customers to test out the initial binlog replication support, but it means that replicas will only receive the events that happen while they are connected, since they are not persisted in a log file yet. The next iteration will persist binlog events to a log file and will enable replicas to receive events that happened while they were not connected.
To enable binlog replication, you must persist the system variables below. Similar to Dolt's other replication formats, the Dolt server must come up with the replication system variables set in order for replication to be enabled. You cannot set these system variables on a running Dolt sql-server to turn on binlog replication; you must persist the values and then restart the sql-server.
SET @@PERSIST.log_bin=1;
SET @@PERSIST.enforce_gtid_consistency=ON;
SET @@PERSIST.gtid_mode=ON;
Related to #7512
- 7912: Add `IndexedJsonDocument`, a `JSONWrapper` implementation that stores JSON documents in a prolly tree with probabilistic hashing.
tl;dr: We store a JSON document in a prolly tree, where the leaf nodes of the tree are blob nodes that each contain a fragment of the document, and the intermediate nodes are address map nodes whose keys describe a JSONPath.
The new logic for reading and writing JSON documents is cleanly separated into the following files:
IndexedJsonDocument - The new `JSONWrapper` implementation. It holds the root hash of the prolly tree.
JsonChunker - A wrapper around a regular chunker. Used to write new JSON documents or apply edits to existing documents.
JsonCursor - A wrapper around a regular cursor, with added functionality allowing callers to seek to a specific location in the document.
JsonScanner - A custom JSON parser that tracks the current JSONPath.
JsonLocation - A custom representation of a JSON path suitable for use as a prolly tree key.
Each added file has additional documentation with more details about the individual components.
Throughout every iteration of this project, the core idea has always been to represent a JSON document as a mapping from JSONPath locations to the values stored at those locations, so that we could store that map in a prolly tree and get all the benefits that we currently get from storing tables in prolly trees: fast diffing and merging, fast point lookups and mutations, etc.
This goal has three major challenges:
  - For deeply nested JSON documents, simply listing every JSONPath requires asymptotically more space than the original document.
  - We need to do this in a way that doesn't compromise performance on simply reading JSON documents from a table, which I understand is the most common use pattern.
  - Ideally, users should not need to migrate their databases, update their clients in order to read newer databases, or choose between different configurations based on their use case.
This design achieves all three of these requirements:
  - While it requires additional storage, this additional storage cannot exceed the size of the original document, and is in practice much smaller.
  - It has indistinguishable performance for reading JSON documents from storage, while also allowing asymptotically faster diff and merge operations when the size of the changes is much smaller than the size of the document. (There is a cost: initial inserts of JSON documents are currently around 20% slower, but this is a one-time cost that does not impact subsequent reads and could potentially be optimized further.)
  - Documents written by the new `JSONChunker` are backwards compatible with current Dolt binaries and can be read back by existing versions of Dolt. (Although they will have different hashes than equivalent documents that those versions would write.)
go-mysql-server
- 2551: unwrap parenthesized table references
fixes #8009
- 2546: Add support for tracking the `Aborted_connects` status variable
Adds support for MySQL's `Aborted_connects` status variable.
Depends on: dolthub/vitess#351
- 2542: When casting json to a string, always call StringifyJSON.
This ensures we match MySQL.
We previously weren't calling StringifyJSON in `ConvertToString` because that same method was being used when printing JSON to the screen or a MySQL client, which favored speed over matching MySQL exactly. But for casts we must be precise.
By adding an extra case to `StringType.SQL`, we can distinguish between these cases and handle them properly.
- 2541: resolve default values for views
This was somewhat of a regression caused by dolthub/go-mysql-server#2465.
However, before that PR, views always had `NULL` as their default values, which did not match MySQL.
Now, we just resolve the default values in the schema, similar to `ResolvedTables`.
fixes #7997
- 2540: [planbuilder] More update join table name validation
- 2539: fix `UPDATE IGNO...
1.39.4
Merged PRs
dolt
- 7979: Allow pulling from a remote if the only changes are to ignored tables.
This loosens the restrictions for pulling from remotes: instead of requiring that there are no working changes, we allow working changes but only to ignored tables.
If there are conflicting changes to ignored tables (which is rare, since ignored tables shouldn't be pushed to remotes in the first place), this will still abort the pull later when it's computing the new root hash.
This is, IMO, better than what git does. Git will just overwrite the ignored files without warning.
- 7977: integration-tests/bats: Add some waits on the remotesrv_pid exit for bats tests which spawn a background remotesrv.
Attempts to fix some observed flakiness in bats tests.
- 7975: implemented dolt_hashof_db function
Implemented the `dolt_hashof_db()` function, which returns the root hash of a database.
- 7967: go/libraries/doltcore/remotesrv: grpc.go: Respect X-Forwarded-Proto when generating HTTP download links in the gRPC server.
Fixes #7961.
- 7965: Bug fix: Allow unresolved FKs to merge with resolved FKs
Dolt foreign keys can be in a "resolved" or "unresolved" state. A resolved FK has resolved the table and columns it references, and contains unique identifiers for the referenced columns. An unresolved FK only knows the table and column names that it references. Because of these two states, the way Dolt matches FKs differs depending on whether each key is resolved or unresolved.
Dolt has logic (ForeignKeyCollection.GetMatchingKey()
) to match a resolved FK with an unresolved FK, but this function didn't support matching an unresolved FK with a resolved FK. That code assumed that theForeignKeyCollection
would always be from an ancestor root value and therefore it wasn't valid for the ancestor to be resolved, while a more recent root value was unresolved. However, since then, we have used this logic in our root merging logic that breaks that assumption.
In a multi-session environment, one client can create a table with an unresolved FK, then a second session can load that table, resolve the FK, and commit the changes to disk. If the first session still contains references to the unresolved FK, then when it goes to commit, Dolt's merge logic wasn't able to match the unresolved FK in the session with the resolved FK that was written to disk, and the FK constraints were silently dropped from the new table version.
This PR adds a new parameter toForeignKeyCollection.GetMatchingKey()
to allow the caller to control whether a resolved FK should match with unresolved FKs or not. This meansForeignKeyCollection.GetMatchingKey()
doesn't have to assume its receiver instance is a ForeignKeyCollection from an ancestor root value, and instead the caller is responsible for specifying which behavior is needed.
Related to #7956 - 7955: [types] cache frequently read value store chunks (like working set roots)
- 7952: Made drop table work with search path
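To make the matching rule in 7965 concrete, here is a minimal Go sketch. The ForeignKey type, its fields, and matchingKey are hypothetical simplifications, not Dolt's actual ForeignKeyCollection API; the sketch assumes only what the PR describes: unresolved FKs know names, resolved FKs also carry column identifiers, and the caller decides whether a resolved key may match an unresolved one.

```go
package main

import (
	"fmt"
	"slices"
)

// ForeignKey is a hypothetical, simplified model of a Dolt foreign key.
// An unresolved FK only knows the referenced table and column names; a
// resolved FK also carries identifiers (tags) for the referenced columns.
type ForeignKey struct {
	Name          string
	RefTable      string
	RefColumns    []string
	RefColumnTags []uint64 // empty while the FK is unresolved
}

// IsResolved reports whether the FK has identifiers for its referenced columns.
func (fk ForeignKey) IsResolved() bool { return len(fk.RefColumnTags) > 0 }

// matchingKey searches coll for a key matching fk. The caller decides, via
// matchAcrossStates, whether a resolved key may match an unresolved one;
// such matches can only compare names, since one side has no tags.
func matchingKey(coll []ForeignKey, fk ForeignKey, matchAcrossStates bool) (ForeignKey, bool) {
	for _, c := range coll {
		if c.Name != fk.Name || c.RefTable != fk.RefTable {
			continue
		}
		switch {
		case c.IsResolved() && fk.IsResolved():
			// Both resolved: compare the stable column identifiers.
			if slices.Equal(c.RefColumnTags, fk.RefColumnTags) {
				return c, true
			}
		case !c.IsResolved() && !fk.IsResolved():
			// Both unresolved: only names are available.
			if slices.Equal(c.RefColumns, fk.RefColumns) {
				return c, true
			}
		default:
			// One resolved, one unresolved: allowed only if the caller opts in.
			if matchAcrossStates && slices.Equal(c.RefColumns, fk.RefColumns) {
				return c, true
			}
		}
	}
	return ForeignKey{}, false
}

func main() {
	// Session 1 still holds the unresolved FK; session 2 wrote the resolved one.
	inSession := ForeignKey{Name: "fk_parent", RefTable: "parent", RefColumns: []string{"id"}}
	onDisk := ForeignKey{Name: "fk_parent", RefTable: "parent", RefColumns: []string{"id"}, RefColumnTags: []uint64{1234}}

	_, strict := matchingKey([]ForeignKey{onDisk}, inSession, false)
	_, relaxed := matchingKey([]ForeignKey{onDisk}, inSession, true)
	fmt.Println(strict, relaxed) // false true
}
```

With the strict setting, the resolved FK on disk is not recognized as the session's unresolved FK, which is the situation in which a merge could silently drop the constraint.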
go-mysql-server
- 2537: Update generated index names to match MySQL
A customer pointed out that when we add indexes with generated names, we don't generate the same names as MySQL. Specifically:
  - When a FK is added with an explicit constraint name, that name should be used to name the automatically created index, if one is created.
  - Secondary indexes are named after the first column in the index in MySQL, not by joining all the columns together.
Customer issue: https://github.com//issues/7960
Dolt PR with test fixes: https://github.com//pull/7974
- 2536: Rename generated FK names when their table is renamed
Updates our rename table logic to match MySQL's behavior of updating auto-generated foreign key names to match the new table name.
Customer issue: #7959
Dolt companion PR: #7968
- 2535: Fix UPDATE JOIN matchedRows
fixes: #7957
The main question is how thorough we want to make the child iter check. Should all iterators implement a ChildIter interface?
- 2534: Implement row alias expressions (INSERT ... VALUES (...) AS new_tbl ON DUPLICATE x = new_tbl.x)
When inserting values, the user can specify names for both the source table and columns, which are used in ON DUPLICATE expressions. It looks like either of the options below:
INSERT INTO tbl VALUES (1, 2) AS tbl_new ON DUPLICATE KEY UPDATE b = tbl_new.b;
INSERT INTO tbl VALUES (1, 2) AS tbl_new(a_new, b_new) ON DUPLICATE KEY UPDATE b = b_new;
This replaces the previous (now-deprecated) syntax:
INSERT INTO tbl VALUES (1, 2) ON DUPLICATE KEY UPDATE b = VALUES(b);
Supporting both syntaxes together was non-trivial because it means there are now two different ways to refer to the same column. While we had an existing way to "redirect" one column name to another, this only worked for unqualified names (no table name), and it overrode the normal name resolution rules, which meant we would fail to detect cases that should be seen as ambiguous.
Previously, we implemented references to the inserted values by using a special table named "__new_ins". I kept that as the default, but use the row alias instead if one was provided. We then create a map from the destination table names to column aliases, and use that map to rewrite expressions that appear inside the VALUES() function.
- 2533: Table name validation folds strings
fixes: #7958
- 2532: Move json_function_tests.go and json tests that depend on it to their own package.
This ensures that non-test code in sql/expression/function/json doesn't depend on testify, which is a library that we only want to depend on for tests.
- 2530: Bug Fix: Index name case-insensitivity
MySQL index names are case-insensitive, but GMS' memory implementation wasn't handling them that way. This makes index names case-insensitive.
Related to #7945
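The MySQL naming rule from 2537 and the case-insensitive comparison from 2530 can be sketched together in Go. defaultIndexName is a hypothetical helper, not the GMS implementation: it follows MySQL's documented behavior of naming an unnamed index after its first column, with a _2, _3, ... suffix when that name is taken, and it does not cover the FK-constraint-name case.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultIndexName returns a name for an unnamed secondary index: the first
// indexed column's name, plus a _2, _3, ... suffix if needed to make it
// unique. Existing names are compared case-insensitively, because MySQL
// index names are case-insensitive.
func defaultIndexName(existing, columns []string) string {
	taken := make(map[string]bool, len(existing))
	for _, name := range existing {
		taken[strings.ToLower(name)] = true
	}
	base := columns[0]
	if !taken[strings.ToLower(base)] {
		return base
	}
	for i := 2; ; i++ {
		candidate := fmt.Sprintf("%s_%d", base, i)
		if !taken[strings.ToLower(candidate)] {
			return candidate
		}
	}
}

func main() {
	fmt.Println(defaultIndexName(nil, []string{"a", "b"}))             // a
	fmt.Println(defaultIndexName([]string{"A"}, []string{"a", "b"}))   // a_2
	fmt.Println(defaultIndexName([]string{"a", "A_2"}, []string{"a"})) // a_3
}
```

Lowercasing once into a set keeps each collision check a map lookup; a case-sensitive set here is exactly the kind of mismatch that lets "A" and "a" coexist as two distinct indexes.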
Closed Issues
- 7970: Add support for DOLT_HASHOF_DB()
- 7957: Dolt returns wrong number of affected rows for UPDATE ... JOIN with clientFoundRows=true
- 7958: UPDATE ... JOIN fails for tables containing capital letters
- 7945: Dolt panics when renaming index containing capital letters
- 7944: Dolt panics on subquery in IF statement in procedure
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.11 | 2.97 | 1.4 |
groupby_scan | 13.22 | 17.32 | 1.3 |
index_join | 1.34 | 5.47 | 4.1 |
index_join_scan | 1.27 | 2.26 | 1.8 |
index_scan | 34.33 | 54.83 | 1.6 |
oltp_point_select | 0.18 | 0.52 | 2.9 |
oltp_read_only | 3.55 | 8.28 | 2.3 |
select_random_points | 0.34 | 0.83 | 2.4 |
select_random_ranges | 0.39 | 0.97 | 2.5 |
table_scan | 34.33 | 54.83 | 1.6 |
types_table_scan | 77.19 | 139.85 | 1.8 |
reads_mean_multiplier | 2.2 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.13 | 6.21 | 0.8 |
oltp_insert | 3.82 | 3.02 | 0.8 |
oltp_read_write | 8.58 | 15.0 | 1.7 |
oltp_update_index | 3.89 | 3.13 | 0.8 |
oltp_update_non_index | 3.89 | 3.07 | 0.8 |
oltp_write_only | 5.47 | 6.55 | 1.2 |
types_delete_insert | 7.7 | 6.79 | 0.9 |
writes_mean_multiplier | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 98.38 | 26.01 | 4.0 |
tpcc_tps_multiplier | 3.8 |
Overall Mean Multiple | 2.33 |
---|
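The Multiple columns appear to be simple ratios. For the read and write tests, each looks like Dolt's figure divided by MySQL's (index_join: 5.47 / 1.34 ≈ 4.1), and each mean multiplier like the average of those ratios. The TPC-C tps multiplier looks like MySQL TPS divided by Dolt TPS (98.38 / 26.01 ≈ 3.8), and the overall figure like the mean of the three summary multipliers ((2.2 + 1.0 + 3.8) / 3 ≈ 2.33). The Go sketch that follows reproduces the 1.39.4 numbers under those assumptions; it is not the benchmarking job's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// round rounds x to the precision given by scale (10 = one decimal place).
func round(x, scale float64) float64 {
	return math.Round(x*scale) / scale
}

// multiple is Dolt's figure divided by MySQL's, to one decimal place
// (assumed derivation of the read/write "Multiple" column).
func multiple(mysql, dolt float64) float64 {
	return round(dolt/mysql, 10)
}

// mean averages per-test multiples, to one decimal place.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return round(sum/float64(len(xs)), 10)
}

func main() {
	// 1.39.4 write tests as {MySQL, Dolt} pairs, in table order.
	writes := [][2]float64{
		{8.13, 6.21}, {3.82, 3.02}, {8.58, 15.0}, {3.89, 3.13},
		{3.89, 3.07}, {5.47, 6.55}, {7.7, 6.79},
	}
	var ms []float64
	for _, w := range writes {
		ms = append(ms, multiple(w[0], w[1]))
	}
	fmt.Println(ms, mean(ms)) // [0.8 0.8 1.7 0.8 0.8 1.2 0.9] 1

	// TPC-C: MySQL TPS over Dolt TPS, since higher TPS is better.
	fmt.Println(round(98.38/26.01, 10)) // 3.8

	// Overall: mean of the read, write, and TPC-C multipliers.
	fmt.Println(round((2.2+1.0+3.8)/3, 100)) // 2.33
}
```

Under this reading, values above 1 mean Dolt is slower than MySQL for every row, which is why the TPC-C ratio is taken the other way round.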
1.39.3
Merged PRs
dolt
- 7947: Bug Fix: Index name case-insensitivity
MySQL index names are case-insensitive, but Dolt's index implementation wasn't handling them that way. This makes index names case-insensitive.
Customer issue: #7945
New enginetests added in GMS PR: dolthub/go-mysql-server#2530
- 7941: Bug fix for AllSchemas method for schemas
- 7940: dolt sql slash cmds
Add the ability to run some dolt commands directly from the dolt sql shell.
Fixes: #6874
- 7933: Update get-mysql-dolt-job-json.sh
The TPS comparison is inverted compared to the latency_p95 comparison.
- 7931: Update get-mysql-dolt-job-json.sh
go-mysql-server
- 2531: Bug Fix: Finalize subqueries in IfConditionals when applying stored procedures
When applying a stored procedure to a CALL statement, we weren't calling finalizeSubqueries() on any subqueries in IfConditional expressions, which caused the subquery to not have a NodeExecBuilder populated.
Customer issue: #7944
- 2529: Fix global decimal.MarshalJSONWithoutQuotes overwrite
decimal.MarshalJSONWithoutQuotes is a global variable, so setting it can cause problems for any other code that does not expect its value to change.
Using a custom encoder instead ensures that the marshalling behaviour is as expected without changing the global value, so this will not cause compatibility issues with other projects.
This code is covered both by existing tests and by an additional one in this PR (if the global variable is set but the custom encoder switch case is not added, the tests fail).
- 2528: Bug fix for unwrapping a privileged db
- 2524: Adding @@max_binlog_size system variable
https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_max_binlog_size
- 2523: Added additional analyzer hooks for integrators
- 2522: More INSERT short-circuits
Only run an "on update" code block when expressions are non-nil. Directly compare the sql mode default string, rather than lowercasing it every time.
- 2519: IndexedTableAccess gets indexing fast path
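The idea behind 2529, emitting decimals as bare JSON numbers without flipping a process-wide setting, can be illustrated with the standard library alone. This Go sketch is not the GMS encoder: encodeDecimalsUnquoted is a hypothetical helper, and it assumes each decimal has already been rendered to its exact string form. Wrapping that string in json.Number makes encoding/json write it unquoted and verbatim.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeDecimalsUnquoted is a hypothetical helper: it marshals decimal values,
// given as their exact string form, as bare JSON numbers. Wrapping each string
// in json.Number makes encoding/json emit it unquoted and verbatim, so no
// package-level flag has to be flipped for the whole process.
func encodeDecimalsUnquoted(fields map[string]string) ([]byte, error) {
	out := make(map[string]json.Number, len(fields))
	for k, s := range fields {
		out[k] = json.Number(s)
	}
	// json.Marshal rejects strings that are not valid JSON number literals.
	return json.Marshal(out)
}

func main() {
	b, err := encodeDecimalsUnquoted(map[string]string{"price": "1.50", "qty": "3"})
	fmt.Println(string(b), err) // {"price":1.50,"qty":3} <nil>

	_, err = encodeDecimalsUnquoted(map[string]string{"bad": "1.5.0"})
	fmt.Println(err != nil) // true
}
```

Because the choice is made per call, other code in the same process that relies on the library's global default keeps its behavior.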
vitess
- 350: Refactoring BinlogStream type into BinlogMetadata
The mysql.BinlogStream type from Vitess was a little awkward to use, and seems to have been mostly intended as test code. This gives it a more descriptive name and makes it a little easier to pass around struct copies without concurrency issues from a shared instance.
- 349: Fixed timestamp bindvar formatting to match MySQL string expectation
- 348: Allowing caching plugin to be specified in string quotes
The CREATE USER ... IDENTIFIED WITH syntax (MySQL ref) allows the caching plugin to be specified in string quotes, but our parser only supported identifier quotes.
This came up as part of binlog replication testing: MySQL was sending a CREATE USER statement from the primary to a Dolt replica, but Dolt wasn't able to parse the statement because of the string quotes around the caching plugin name.
Closed Issues
- 6874: Embed cli command in dolt sql
- 2289: First Unique Key in a keyless table should be represented as a primary key
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 2.97 | 1.4 |
groupby_scan | 13.22 | 17.01 | 1.3 |
index_join | 1.34 | 5.28 | 3.9 |
index_join_scan | 1.27 | 2.22 | 1.7 |
index_scan | 34.95 | 52.89 | 1.5 |
oltp_point_select | 0.18 | 0.5 | 2.8 |
oltp_read_only | 3.49 | 8.13 | 2.3 |
select_random_points | 0.34 | 0.81 | 2.4 |
select_random_ranges | 0.39 | 0.95 | 2.4 |
table_scan | 34.95 | 54.83 | 1.6 |
types_table_scan | 75.82 | 137.35 | 1.8 |
reads_mean_multiplier | 2.1 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.21 | 0.8 |
oltp_insert | 3.82 | 3.07 | 0.8 |
oltp_read_write | 8.58 | 14.73 | 1.7 |
oltp_update_index | 3.89 | 3.19 | 0.8 |
oltp_update_non_index | 3.89 | 3.13 | 0.8 |
oltp_write_only | 5.37 | 6.55 | 1.2 |
types_delete_insert | 7.7 | 6.79 | 0.9 |
writes_mean_multiplier | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 99.71 | 25.77 | 4.2 |
tpcc_tps_multiplier | 3.9 |
Overall Mean Multiple | 2.33 |
---|
1.39.2
Merged PRs
dolt
- 7930: Bump mysql2 from 3.9.7 to 3.9.8 in /integration-tests/mysql-client-tests/node
Bumps mysql2 from 3.9.7 to 3.9.8.
- 7929: dolt fetch default spec from empty repo should return silently
Git fetch returns without error when you fetch the default refspec, but returns an error when you fetch a specific ref. Dolt now matches this behavior.
Fixes: #7928
- 7925: apply filter-branch changes to working/staged changes
This PR adds support for an --apply-to-uncommitted option to dolt filter-branch, which applies the filter-branch changes to the working and staged roots.
fixes #7902
- 7923: [dsess] Cache checks lookup for TPC-C update
- 7922: [writer] skip more deserialization steps in getTableWriter
- 7900: prevent dolt filter branch when it would overwrite unchecked branch's working set
Other branches can have working sets, and dolt filter-branch would drop them. This PR prevents that from happening.
Adding tests for comments I missed here: #7895
- 7898: Added workflow for checking DoltgreSQL
This adds a new workflow that runs a subset of the tests in DoltgreSQL to check for any major integration errors. The workflow does not fail if errors are encountered. Instead, it creates a comment stating that failures were found. If no failures were found, then no comment is made.
- 7892: dolt admin archive
This hidden admin command converts the table files in oldgen into archive files, then updates the manifest so that you can run queries against the archive for performance testing. Currently we assume that dolt gc has been run immediately prior to using this command.
After the build is complete, we look up every chunk in the archive using the index of the originating table file. We then verify that each chunk's key checks out. If this verification fails, the command exits with status 1.
There are still a lot of rough edges:
  - There is currently no feedback as the build progresses. This is annoying because the build can take a fair amount of time.
  - The ChunkSource interface is single threaded, so getMany and hasMany are not going to perform well.
  - There are no checks to ensure that the server isn't running and that we hold the LOCK on oldgen.
  - There are no bats tests, and this is kind of a temporary thing. There are go tests on key bits.
- 7863: Use the search path to resolve table names in Doltgres
Doltgres enables the UseSearchPath global at startup, which triggers this behavior.
This is a shim to get a proof of concept of this behavior working faster. A better solution, coming next, involves making this behavior pluggable and putting this logic in the Doltgres package, not in Dolt.
Companion PRs:
dolthub/go-mysql-server#2498
dolthub/doltgresql#269
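The verification step described for 7892 amounts to re-hashing each chunk and comparing the result with the key it is stored under. In this Go sketch, the in-memory map stands in for the archive and its index, and the addressing scheme (the first 20 bytes of a SHA-512 digest) is an assumption about Dolt's chunk hashes rather than something stated in the PR.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// addr is a 20-byte content address. Assumption: like Dolt's chunk hashes,
// it is the first 20 bytes of the SHA-512 digest of the chunk's bytes.
type addr [20]byte

// addrOf computes the content address of a chunk.
func addrOf(data []byte) addr {
	sum := sha512.Sum512(data)
	var a addr
	copy(a[:], sum[:len(a)])
	return a
}

// verifyChunks re-hashes every chunk and returns an error for the first one
// whose content no longer matches the key it is stored under.
func verifyChunks(chunks map[addr][]byte) error {
	for key, data := range chunks {
		if got := addrOf(data); got != key {
			return fmt.Errorf("chunk %x: content hashes to %x", key, got)
		}
	}
	return nil
}

func main() {
	data := []byte("hello, archive")
	chunks := map[addr][]byte{addrOf(data): data}
	fmt.Println(verifyChunks(chunks)) // <nil>

	chunks[addrOf(data)] = []byte("corrupted")
	fmt.Println(verifyChunks(chunks) != nil) // true
}
```

A command built around this check would exit with status 1 when verifyChunks returns an error, matching the behavior the PR describes.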
go-mysql-server
- 2520: Default sql mode for common path
Bit strange & verbose, but has a noticeable effect for small queries.
perf here: #7915
- 2519: IndexedTableAccess gets indexing fast path
- 2518: Short circuit for update/delete
Simple updates and deletes skip most of analysis.
perf here: #7907
- 2517: Improve correctness and error messages for JSON functions.
MySQL doesn't coerce a JSON null document into SQL NULL here, and neither should we.
MySQL:
mysql> select JSON_INSERT("null", "$.a", 1);
+-------------------------------+
| JSON_INSERT("null", "$.a", 1) |
+-------------------------------+
| null                          |
+-------------------------------+
1 row in set (0.00 sec)
mysql> select JSON_INSERT("null", "$.a", 1) is null;
+---------------------------------------+
| JSON_INSERT("null", "$.a", 1) is null |
+---------------------------------------+
|                                     0 |
+---------------------------------------+
The only time we should be coercing a JSON-null document into SQL-null is for JSON_EXTRACT (for paths other than "$") and JSON_VALUE (for all paths). But these are already handled separately.
- 2515: Zachmu/schemas2 merge
- 2513: Added workflows for checking integrators
This adds new workflows that run a subset of tests in Dolt and DoltgreSQL to check for any major integration errors. The workflows do not fail if errors are encountered. Instead, they'll create a comment stating which projects had failures. If no failures were found, then no comment is made.
- 2498: New interfaces for resolving table names for databases with schemas
This is a proof of concept to get schema resolution working quickly, and I'm not super happy with the separation of concerns. A better solution would implement table name resolution in the Catalog directly, rather than in the integrator. That effort is significantly hindered by the Catalog being a concrete analyzer implementation with many analyzer-specific details that can't be easily substituted for another implementation. The longer term plan is to perform the extensive refactoring necessary to make the relevant parts of the Catalog swappable, rather than (effectively) having to swap only DatabaseProvider and friends.
Closed Issues
- 7902: filter-branch option to apply query to WORKING and STAGED roots
- 7928: CLI dolt fetch <remote> failed to use the default ref spec
- 7897: Pomelo Entity Framework connector is not able to commit changes
- 7909: [Question] How to init Dolt database programmatically?
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.11 | 2.97 | 1.4 |
groupby_scan | 13.22 | 17.32 | 1.3 |
index_join | 1.34 | 5.18 | 3.9 |
index_join_scan | 1.27 | 2.18 | 1.7 |
index_scan | 33.72 | 52.89 | 1.6 |
oltp_point_select | 0.17 | 0.5 | 2.9 |
oltp_read_only | 3.36 | 8.13 | 2.4 |
select_random_points | 0.32 | 0.8 | 2.5 |
select_random_ranges | 0.38 | 0.95 | 2.5 |
table_scan | 34.33 | 54.83 | 1.6 |
types_table_scan | 73.13 | 137.35 | 1.9 |
reads_mean_multiplier | 2.2 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.21 | 0.8 |
oltp_insert | 3.75 | 3.07 | 0.8 |
oltp_read_write | 8.43 | 15.0 | 1.8 |
oltp_update_index | 3.82 | 3.19 | 0.8 |
oltp_update_non_index | 3.82 | 3.13 | 0.8 |
oltp_write_only | 5.37 | 6.55 | 1.2 |
types_delete_insert | 7.7 | 6.91 | 0.9 |
writes_mean_multiplier | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 101.2 | 25.57 | 4.1 |
tpcc_tps_multiplier | 0.3 |
Overall Mean Multiple | 1.17 |
---|