-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Parse HogQL with C++ for a huge speedup #17659
Merged
Merged
Changes from 67 commits
Commits
Show all changes
78 commits
Select commit
Hold shift + click to select a range
46d04cf
Add partial C++ parser of HogQL
Twixes a8dc5d4
Support all the rules and add proper error handling
Twixes e53ec36
Use `AlignAfterOpenBracket: BlockIndent`
Twixes 3be9872
Reuse all the parser tests for the C++ backend
Twixes 7906414
Fix relationship between headers and implementations
Twixes b866ebf
Add more error handling and fix minor issues
Twixes b35c93d
Build both Python and C++ parsers in the package.json script
Twixes f2a62ea
Update ARRAY JOIN error assertion
Twixes 102d0a1
Improve timeit
Twixes 6a695af
Move the C extension to the top level
Twixes ef5101a
Refactor `vector_to_list_string`
Twixes ec8ea8b
Build the parser on Linux
Twixes 4af40cc
Build wheels for the parser
Twixes b5101ac
Simplify Linux build and fix macOS
Twixes ea137d5
Fix Homebrew paths on x86 and don't fail fast in CI
Twixes c261a81
Set MACOSX_DEPLOYMENT_TARGET for C++20
Twixes 841bddb
Set up QEMU for Linux ARM builds
Twixes 2a8d760
Publish the wheels on PyPI
Twixes c7dcefe
Avoiding Linux ARM emulation in CI for now
Twixes 30a6257
Build sdist too
Twixes 37ae412
Revert Dockerfile changes
Twixes 6a4eece
Fix PyPI publish
Twixes 5bf907b
Add README and optimize sdist build
Twixes 9c4f952
Use setup.py directly instead of build
Twixes 39f1b52
Use PyPI hogql-parser instead of local
Twixes d4cd499
Also revert production-unit.Dockerfile
Twixes feccbd2
Fix sdist upload and add Linux ARM back
Twixes c142308
No Linux ARM build in the end
Twixes db1b7d9
Fix artifact uploading
Twixes 8c363d6
Do try building Linux ARM
Twixes e1453dd
Use `npm` in `grammar:build`
Twixes e194c89
Fix formatting of hogql_parser
Twixes e54387b
Build everything on macOS
Twixes c30afea
Revert "Build everything on macOS"
Twixes ca1fa3c
Use hogql-parser=0.1.1
Twixes 7e10a55
Fix dylib in macOS wheel
Twixes 713bfec
Bump hogql-parser version
Twixes bfed4f6
Fix missing module error
Twixes 1214e21
Delete timeit.py
Twixes 58df63b
Make error handling robust
Twixes 62e7497
Format the C++
Twixes cc7f5c6
Use `hogql-parser==0.1.1`
Twixes 66a980d
Fix reserved keyword error assertions
Twixes 0e53256
Use HEAD hogql_paresr in CI
Twixes cff8a67
Fix `apt` usage
Twixes 507b78b
Add some sudo in CI
Twixes bfe4603
Ensure package will be releasable before build
Twixes 0d089c4
Bump version to 0.1.3
Twixes 81d3c4b
Cover C++ `unquote_string` with tests
Twixes f3dfb89
Use BuildJet ARM runners for ARM builds
Twixes e8f94ed
Add some instructions
Twixes d0b6647
Add HogQL version check to backend CI
Twixes 729a897
Update requirements.txt
Twixes 826b9d4
Use `setuptools` instead of the deprecated `distutils`
Twixes 1dfd70c
Fix working dir in backend CI
Twixes 99c5bdf
Align ANTLR versions
Twixes 2da9925
Add test for "mismatched input"
Twixes 4922e64
Add types and bump version
Twixes 3051b6e
Comment instead of failing version check
Twixes c50bf47
Automate hogql-release version bump
Twixes 39edff4
Fix checkout token
Twixes 954e1ac
Merge branch 'master' into turbo-parser
Twixes d3c8f34
Don't build hogql-parser if there were no changes
Twixes 2b78d5d
Update query snapshots
github-actions[bot] 2f28676
Update query snapshots
github-actions[bot] e0854eb
Update query snapshots
github-actions[bot] 527ec7e
Update query snapshots
github-actions[bot] 19617f5
Improve documentation
Twixes 7a89bf2
Use new hogql-parser version
github-actions[bot] 4b5e8a4
Fix error start and end initialization
Twixes 4096807
Note `antlr4-cpp-runtime`
Twixes 5e5ba40
Also remove NUL chars in C++
Twixes debb5ed
Check ANTLR4 runtime archive checksum for security
Twixes 640c159
Note more decrefs to add
Twixes eca0d64
Add vector size checks
Twixes 6015bc4
Use new hogql-parser version
github-actions[bot] 4d4d6c0
Don't support the `start` arg in C++ `parse_expr`
Twixes 29980df
Use new hogql-parser version
github-actions[bot] File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,4 +33,4 @@ | |
!plugin-server/.prettierrc | ||
!share/GeoLite2-City.mmdb | ||
!hogvm/python | ||
!unit.json | ||
!unit.json |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
name: Release hogql-parser | ||
|
||
on: | ||
push: | ||
branches: | ||
- master | ||
paths: | ||
- hogql_parser/** | ||
- .github/workflows/build-hogql-parser.yml | ||
pull_request: | ||
paths: | ||
- hogql_parser/** | ||
- .github/workflows/build-hogql-parser.yml | ||
|
||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} | ||
cancel-in-progress: true | ||
|
||
jobs: | ||
check-version: | ||
name: Check version legitimacy | ||
runs-on: ubuntu-22.04 | ||
outputs: | ||
parser_any_changed: ${{ steps.changed-files-yaml.outputs.parser_any_changed }} | ||
steps: | ||
- uses: actions/checkout@v4 | ||
with: | ||
fetch-depth: 0 | ||
|
||
- name: Check if hogql_parser/ has changed | ||
id: changed-files-yaml | ||
uses: tj-actions/changed-files@v39 | ||
with: | ||
since_last_remote_commit: true | ||
files_yaml: | | ||
parser: | ||
- hogql_parser/** | ||
|
||
- name: Notify about release needed | ||
if: steps.changed-files-yaml.outputs.parser_any_changed == 'true' | ||
shell: bash | ||
run: | | ||
published=$(curl -fSsl https://pypi.org/pypi/hogql-parser/json | jq -r '.info.version') | ||
local=$(python hogql_parser/setup.py --version) | ||
# TODO: Only comment if no comment alraedy exists for $local | ||
if [[ "$published" == "$local" ]]; then | ||
MESSAGE_BODY="It looks like the code of `hogql-parser` has changed since last push, but its version stayed the same at $local. 👀\nMake sure to resolve this in `hogql_parser/setup.py` before merging!" | ||
curl -s -u posthog-bot:${{ secrets.POSTHOG_BOT_GITHUB_TOKEN || secrets.GITHUB_TOKEN }} -X POST -d "{ \"body\": \"$MESSAGE_BODY\" }" "https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments" | ||
fi | ||
|
||
build-wheels: | ||
name: Build wheels on ${{ matrix.os }} | ||
needs: check-version | ||
runs-on: ${{ matrix.os }} | ||
timeout-minutes: 30 | ||
if: ${{ needs.check-version.outputs.parser_any_changed == 'true' }} | ||
strategy: | ||
matrix: | ||
# As of October 2023, GitHub doesn't have ARM Actions runners… and ARM emulation is insanely slow | ||
# (20x longer) on the Linux runners (while being reasonable on the macOS runners). Hence, we use | ||
# BuildJet as a provider of ARM runners - this solution saves a lot of time and consequently some money. | ||
os: [ubuntu-22.04, buildjet-2vcpu-ubuntu-2204-arm, macos-12] | ||
|
||
steps: | ||
- uses: actions/checkout@v4 | ||
|
||
- if: ${{ !endsWith(matrix.os, '-arm') }} | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: '3.11' | ||
|
||
- if: ${{ endsWith(matrix.os, '-arm') }} | ||
uses: deadsnakes/[email protected] # Unfortunately actions/setup-python@v4 just doesn't work on ARM! This does | ||
with: | ||
python-version: '3.11' | ||
|
||
- name: Build sdist | ||
if: matrix.os == 'ubuntu-22.04' # Only build the sdist once | ||
run: cd hogql_parser && python setup.py sdist | ||
|
||
- name: Install cibuildwheel | ||
run: python -m pip install cibuildwheel==2.16.* | ||
|
||
- name: Build wheels | ||
run: cd hogql_parser && python -m cibuildwheel --output-dir dist | ||
env: | ||
MACOSX_DEPLOYMENT_TARGET: '12' # A modern target allows us to use C++20 | ||
|
||
- uses: actions/upload-artifact@v3 | ||
with: | ||
path: | | ||
hogql_parser/dist/*.whl | ||
hogql_parser/dist/*.tar.gz | ||
if-no-files-found: error | ||
|
||
publish: | ||
name: Publish on PyPI | ||
needs: build-wheels | ||
environment: pypi-hogql-parser | ||
permissions: | ||
id-token: write | ||
runs-on: ubuntu-22.04 | ||
steps: | ||
- name: Fetch wheels | ||
uses: actions/download-artifact@v3 | ||
with: | ||
name: artifact | ||
path: dist/ | ||
|
||
- name: Publish package to PyPI | ||
uses: pypa/gh-action-pypi-publish@release/v1 | ||
|
||
- uses: actions/checkout@v4 | ||
with: | ||
token: ${{ secrets.POSTHOG_BOT_GITHUB_TOKEN }} | ||
ref: ${{ github.event.pull_request.head.ref }} | ||
|
||
- name: Update hogql-parser in requirements | ||
shell: bash | ||
run: | | ||
local=$(python hogql_parser/setup.py --version) | ||
sed -i "s/hogql-parser==.*/hogql-parser==${local}/g" requirements.in | ||
sed -i "s/hogql-parser==.*/hogql-parser==${local}/g" requirements.txt | ||
|
||
- uses: EndBug/add-and-commit@v9 | ||
with: | ||
add: '["requirements.in", "requirements.txt"]' | ||
message: 'Use new hogql-parser version' | ||
default_author: github_actions | ||
github_token: ${{ secrets.POSTHOG_BOT_GITHUB_TOKEN }} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
BasedOnStyle: Chromium | ||
ColumnLimit: 120 | ||
AlignAfterOpenBracket: BlockIndent |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Build | ||
build/ | ||
*.egg-info | ||
*.so | ||
dist/ |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth chekcing the
md5sum
here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, added that!