-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Replace ULID raw record ID with UUID v7 #300
Conversation
perf improvement: uuid7str() now takes 4us vs previous 9us: ULID() 4us + str() 5us
WalkthroughWalkthroughThis update replaces the usage of ULID with UUID version 7 for generating unique IDs in the Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
Files selected for processing (2)
- airbyte/records.py (2 hunks)
- pyproject.toml (1 hunks)
Additional comments not posted (6)
airbyte/records.py (2)
75-75
: Verify the new import.Ensure that the
uuid_extensions
package is correctly installed anduuid7str
is the correct function for generating UUID v7 strings.
145-145
: Ensure correct integration of UUID v7.The unique ID generation has been switched to
uuid7str()
. Verify that this change is correctly integrated and that the new IDs meet the required specifications.pyproject.toml (4)
48-48
: Verify the addition of thetypes-pyyaml
package.Ensure that the
types-pyyaml
package is correctly added and compatible with the project.Verification successful
The
types-pyyaml
package has been correctly added to thepyproject.toml
file.
- The package is listed with the version
^6.0.12.12
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the correct addition of the `types-pyyaml` package. # Test: Check if `types-pyyaml` package is installed. pip show types-pyyamlLength of output: 94
Script:
#!/bin/bash # Description: Verify the correct addition of the `types-pyyaml` package in the pyproject.toml file. # Search for the `types-pyyaml` entry in the pyproject.toml file. grep 'types-pyyaml' pyproject.tomlLength of output: 62
48-48
: Verify the adjustment of thesqlalchemy-bigquery
package version constraint.Ensure that the new version constraint for the
sqlalchemy-bigquery
package is correctly applied and compatible with the project.Verification successful
The
sqlalchemy-bigquery
package version constraint has been correctly adjusted.The
sqlalchemy-bigquery
package is specified with the version "1.9.0" and a Python constraint "<3.13" in thepyproject.toml
file, which aligns with the intended adjustment.
pyproject.toml
:sqlalchemy-bigquery = { version = "1.9.0", python = "<3.13" }Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the adjustment of the `sqlalchemy-bigquery` package version constraint. # Test: Check if the `sqlalchemy-bigquery` package is installed with the correct version constraint. pip show sqlalchemy-bigqueryLength of output: 101
Script:
#!/bin/bash # Description: Verify the adjustment of the `sqlalchemy-bigquery` package version constraint by inspecting the pyproject.toml file. # Search for the sqlalchemy-bigquery package entry in the pyproject.toml file and extract its version constraint. rg 'sqlalchemy-bigquery' -A 1 pyproject.tomlLength of output: 218
48-48
: Verify the addition of theuuid7
package.Ensure that the
uuid7
package is correctly added and compatible with the project.Verification successful
The
uuid7
package has been correctly added and is being utilized in the project.
pyproject.toml
:uuid7 = "^0.1.0"
poetry.lock
: Contains references touuid7
package files.airbyte/records.py
: Contains code that usesuuid7
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the correct addition of the `uuid7` package. # Test: Check if `uuid7` package is installed. pip show uuid7Length of output: 87
Script:
#!/bin/bash # Description: Verify the correct addition of the `uuid7` package. # Check if `uuid7` is listed in the `pyproject.toml` file. rg 'uuid7' pyproject.toml # Check for any references to `uuid7` in the codebase. rg 'uuid7' .Length of output: 522
48-48
: Verify the removal of theulid
package.Ensure that the
ulid
package is fully removed and no longer used in the project.
Summary
Improved unique ID generation by switching from ULID to a custom UUID method, ensuring more robust and unique identifier creation.
Chores
Updated dependencies in pyproject.toml, including the addition of the uuid7 and types-pyyaml packages, while removing ulid.
In the previous flamegraph, the ULID creation was taking approx. 4us, plus up to 5us to convert to
str
type. TheUUID v7
implementation generates a UUID in one step in approx 2us. This cuts off approximately 7us out of ~26us processing time per record, or up to 30% of the per-record processing time.Now (2us of 19us total):
Previous (4+5us, combined 9us of 27us total):
Notes:
ulid
library, where we are/were actually usingpython-ulid
.New benchmark (41K records / sec using the
e2e
source):poetry run python ./examples/run_perf_test_reads.py -e=6 --source=e2e