-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Parse HogQL with C++ for a huge speedup #17659
Changes from 76 commits
46d04cf
a8dc5d4
e53ec36
3be9872
7906414
b866ebf
b35c93d
f2a62ea
102d0a1
6a695af
ef5101a
ec8ea8b
4af40cc
b5101ac
ea137d5
c261a81
841bddb
2a8d760
c7dcefe
30a6257
37ae412
6a4eece
5bf907b
9c4f952
39f1b52
d4cd499
feccbd2
c142308
db1b7d9
8c363d6
e1453dd
e194c89
e54387b
c30afea
ca1fa3c
7e10a55
713bfec
bfed4f6
1214e21
58df63b
62e7497
cc7f5c6
66a980d
0e53256
cff8a67
507b78b
bfe4603
0d089c4
81d3c4b
f3dfb89
e8f94ed
d0b6647
729a897
826b9d4
1dfd70c
99c5bdf
2da9925
4922e64
3051b6e
c50bf47
39edff4
954e1ac
d3c8f34
2b78d5d
2f28676
e0854eb
527ec7e
19617f5
7a89bf2
4b5e8a4
4096807
5e5ba40
debb5ed
640c159
eca0d64
6015bc4
4d4d6c0
29980df
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,4 +33,4 @@ | |
!plugin-server/.prettierrc | ||
!share/GeoLite2-City.mmdb | ||
!hogvm/python | ||
!unit.json | ||
!unit.json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
name: Release hogql-parser | ||
|
||
on: | ||
push: | ||
branches: | ||
- master | ||
paths: | ||
- hogql_parser/** | ||
- .github/workflows/build-hogql-parser.yml | ||
pull_request: | ||
paths: | ||
- hogql_parser/** | ||
- .github/workflows/build-hogql-parser.yml | ||
|
||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} | ||
cancel-in-progress: true | ||
|
||
jobs: | ||
check-version: | ||
name: Check version legitimacy | ||
runs-on: ubuntu-22.04 | ||
outputs: | ||
parser_any_changed: ${{ steps.changed-files-yaml.outputs.parser_any_changed }} | ||
steps: | ||
- uses: actions/checkout@v4 | ||
with: | ||
fetch-depth: 0 # Fetching all for comparison since last push (not just last commit) | ||
|
||
- name: Check if hogql_parser/ has changed | ||
id: changed-files-yaml | ||
uses: tj-actions/changed-files@v39 | ||
with: | ||
since_last_remote_commit: true | ||
files_yaml: | | ||
parser: | ||
- hogql_parser/** | ||
|
||
- name: Notify about release needed | ||
if: steps.changed-files-yaml.outputs.parser_any_changed == 'true' | ||
shell: bash | ||
run: | | ||
published=$(curl -fSsl https://pypi.org/pypi/hogql-parser/json | jq -r '.info.version') | ||
local=$(python hogql_parser/setup.py --version) | ||
# TODO: Only comment if no comment alraedy exists for $local | ||
if [[ "$published" == "$local" ]]; then | ||
MESSAGE_BODY="It looks like the code of `hogql-parser` has changed since last push, but its version stayed the same at $local. 👀\nMake sure to resolve this in `hogql_parser/setup.py` before merging!" | ||
curl -s -u posthog-bot:${{ secrets.POSTHOG_BOT_GITHUB_TOKEN || secrets.GITHUB_TOKEN }} -X POST -d "{ \"body\": \"$MESSAGE_BODY\" }" "https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments" | ||
fi | ||
|
||
build-wheels: | ||
name: Build wheels on ${{ matrix.os }} | ||
needs: check-version | ||
runs-on: ${{ matrix.os }} | ||
timeout-minutes: 30 | ||
if: ${{ needs.check-version.outputs.parser_any_changed == 'true' }} | ||
strategy: | ||
matrix: | ||
# As of October 2023, GitHub doesn't have ARM Actions runners… and ARM emulation is insanely slow | ||
# (20x longer) on the Linux runners (while being reasonable on the macOS runners). Hence, we use | ||
# BuildJet as a provider of ARM runners - this solution saves a lot of time and consequently some money. | ||
os: [ubuntu-22.04, buildjet-2vcpu-ubuntu-2204-arm, macos-12] | ||
|
||
steps: | ||
- uses: actions/checkout@v4 | ||
|
||
- if: ${{ !endsWith(matrix.os, '-arm') }} | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: '3.11' | ||
|
||
- if: ${{ endsWith(matrix.os, '-arm') }} | ||
uses: deadsnakes/[email protected] # Unfortunately actions/setup-python@v4 just doesn't work on ARM! This does | ||
with: | ||
python-version: '3.11' | ||
|
||
- name: Build sdist | ||
if: matrix.os == 'ubuntu-22.04' # Only build the sdist once | ||
run: cd hogql_parser && python setup.py sdist | ||
|
||
- name: Install cibuildwheel | ||
run: python -m pip install cibuildwheel==2.16.* | ||
|
||
- name: Build wheels | ||
run: cd hogql_parser && python -m cibuildwheel --output-dir dist | ||
env: | ||
MACOSX_DEPLOYMENT_TARGET: '12' # A modern target allows us to use C++20 | ||
|
||
- uses: actions/upload-artifact@v3 | ||
with: | ||
path: | | ||
hogql_parser/dist/*.whl | ||
hogql_parser/dist/*.tar.gz | ||
if-no-files-found: error | ||
|
||
publish: | ||
name: Publish on PyPI | ||
needs: build-wheels | ||
environment: pypi-hogql-parser | ||
permissions: | ||
id-token: write | ||
runs-on: ubuntu-22.04 | ||
steps: | ||
- name: Fetch wheels | ||
uses: actions/download-artifact@v3 | ||
with: | ||
name: artifact | ||
path: dist/ | ||
|
||
- name: Publish package to PyPI | ||
uses: pypa/gh-action-pypi-publish@release/v1 | ||
|
||
- uses: actions/checkout@v4 | ||
with: | ||
token: ${{ secrets.POSTHOG_BOT_GITHUB_TOKEN }} | ||
ref: ${{ github.event.pull_request.head.ref }} | ||
|
||
- name: Update hogql-parser in requirements | ||
shell: bash | ||
run: | | ||
local=$(python hogql_parser/setup.py --version) | ||
sed -i "s/hogql-parser==.*/hogql-parser==${local}/g" requirements.in | ||
sed -i "s/hogql-parser==.*/hogql-parser==${local}/g" requirements.txt | ||
|
||
- uses: EndBug/add-and-commit@v9 | ||
with: | ||
add: '["requirements.in", "requirements.txt"]' | ||
message: 'Use new hogql-parser version' | ||
default_author: github_actions | ||
github_token: ${{ secrets.POSTHOG_BOT_GITHUB_TOKEN }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -96,6 +96,31 @@ | |
"console": "integratedTerminal", | ||
"python": "${workspaceFolder}/env/bin/python", | ||
"cwd": "${workspaceFolder}" | ||
}, | ||
{ | ||
"name": "Pytest: Current File", | ||
"type": "python", | ||
"request": "launch", | ||
"module": "pytest", | ||
"args": ["${file}", "-vvv"], | ||
"console": "integratedTerminal", | ||
"justMyCode": true | ||
}, | ||
{ | ||
"name": "(lldb) Attach", | ||
"type": "cppdbg", | ||
"request": "attach", | ||
"program": "/Users/twixes/.pyenv/versions/3.10.10/envs/posthog-3.10/bin/python", | ||
"MIMode": "lldb" | ||
}, | ||
{ | ||
"name": "Python C++ Debugger: Current File", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Awesome for debugging a Python C++ extension, can set breakpoints in C++ code and they work perfectly when this debug config is ran in e.g. |
||
"type": "pythoncpp", | ||
"request": "launch", | ||
"pythonConfig": "custom", | ||
"pythonLaunchName": "Pytest: Current File", | ||
"cppConfig": "custom", | ||
"cppAttachName": "(lldb) Attach" | ||
} | ||
], | ||
"compounds": [ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
BasedOnStyle: Chromium | ||
ColumnLimit: 120 | ||
AlignAfterOpenBracket: BlockIndent |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Build | ||
build/ | ||
*.egg-info | ||
*.so | ||
dist/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# Developing `hogql-parser` | ||
|
||
## Mandatory reading | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is great I applaud you for writing this 👏 |
||
|
||
If you're new to Python C/C++ extensions, there are some things you must have in your mind. | ||
|
||
### [Objects, Types and Reference Counts in CPython](https://docs.python.org/3/c-api/intro.html#objects-types-and-reference-counts) | ||
|
||
Key takeaways: | ||
|
||
1. `Py_INCREF()` and `Py_DECREF()` need to be used accurately, or there'll be memory leaks (or, less likely, segfaults). | ||
1. `Py_None`, `Py_True`, and `Py_False` are singletons, but they still need to be incref'd/decref'd - the best way to do create a new reference to them is wrapping them in `Py_NewRef()`. | ||
1. Pretty much only `PyList_SET_ITEM()` _steals_ references (i.e. assumes ownership of objects passed into it), if you pass an object into any other function and no longer need it after that - remember to `Py_DECREF` it! | ||
|
||
### [Building Values in CPython](https://docs.python.org/3/c-api/arg.html#building-values) | ||
|
||
Key takeaways: | ||
|
||
1. Use `Py_BuildValue()` for building tuples, dicts, and lists of static size. Use type-specific functions (e.g. `PyUnicode_FromString()` or `PyList_New()`) otherwise. | ||
1. `str`-building with `s` involves `strlen`, while `s#` doesn't - it's better to use the latter with C++ strings. | ||
1. `object`-passing with `O` increments the object's refcount, while doing it with `N` doesn't - we should use `N` pretty much exclusively, because the parse tree converter is about creating new objects (not borrowing). | ||
|
||
## Conventions | ||
|
||
1. Use `snake_case`. ANTLR is `camelCase`-heavy because of its Java heritage, but both the C++ stdlib and CPython are snaky. | ||
2. Use the `auto` type for ANTLR and ANTLR-derived types, since they can be pretty verbose. Otherwise specify the type explictly. | ||
3. Stay out of Python land as long as possible. E.g. avoid using `PyObject*`s` for bools or strings. | ||
Do use Python for parsing numbers though - that way we don't need to consider integer overflow. | ||
4. If any child rule results in an AST node, so must the parent rule - once in Python land, always in Python land. | ||
E.g. it doesn't make sense to create a `vector<PyObject*>`, that should just be a `PyObject*` of Python type `list`. | ||
|
||
## How to develop locally on macOS | ||
|
||
1. Install libraries: | ||
|
||
```bash | ||
brew install boost antlr4-cpp-runtime | ||
``` | ||
|
||
1. Install `hogql_parser` by building from local sources: | ||
|
||
```bash | ||
pip install ./hogql_parser | ||
``` | ||
|
||
1. If you now run tests, the locally-built version of `hogql_parser` will be used: | ||
|
||
```bash | ||
pytest posthog/hogql/ | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth chekcing the
md5sum
here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, added that!