feat(hog): C++ parser #23100
It looks like the code of |
The build is still in progress (I couldn't release a new hogql_parser because launchpad.net is down, and the ARM deadsnakes Python repo with it), but everything else could use some 👀
Looks good! Only some suggestions around code clarity (and a couple missing DECREFs)
PyObject* declarations = PyList_New(0);
if (!declarations) {
    throw PyInternalError();
}
auto declaration_ctxs = ctx->declaration();
for (auto declaration_ctx : declaration_ctxs) {
    if (!declaration_ctx->statement() || !declaration_ctx->statement()->emptyStmt()) {
        PyObject* statement;
        try {
            statement = visitAsPyObject(declaration_ctx);
        } catch (...) {
            Py_DECREF(declarations);
            throw;
        }
        int append_code = PyList_Append(declarations, statement);
        Py_DECREF(statement);
        if (append_code == -1) {
            Py_DECREF(declarations);
            throw PyInternalError();
        }
    }
}
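The loop above has to release `declarations` by hand on every failure path, which is exactly where missing DECREFs tend to creep in. A common C++ alternative is an RAII owner that drops its reference when it goes out of scope, including during exception unwinding. The sketch below shows that idea without Python headers: `FakeObject`, `fake_incref`, `fake_decref`, `fake_append`, `OwnedRef`, `visit` and `build_list` are all hypothetical stand-ins for `PyObject*`, `Py_INCREF`, `Py_DECREF`, `PyList_Append` and the visitor, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CPython object: a reference count plus,
// when the object plays the role of a list, the items it holds.
struct FakeObject {
    int refcnt = 1;
    std::vector<FakeObject*> items;
};

// Stand-in for Py_INCREF.
void fake_incref(FakeObject* o) { ++o->refcnt; }

// Stand-in for Py_DECREF: when a "list" drops to zero, it releases its items.
void fake_decref(FakeObject* o) {
    if (--o->refcnt == 0) {
        for (FakeObject* item : o->items) fake_decref(item);
        o->items.clear();
    }
}

// Stand-in for PyList_Append: the list takes its own reference to the item.
void fake_append(FakeObject* list, FakeObject* item) {
    fake_incref(item);
    list->items.push_back(item);
}

// Owns exactly one strong reference and drops it when it goes out of scope.
class OwnedRef {
public:
    explicit OwnedRef(FakeObject* o) : obj_(o) {}
    OwnedRef(const OwnedRef&) = delete;
    OwnedRef& operator=(const OwnedRef&) = delete;
    ~OwnedRef() {
        if (obj_) fake_decref(obj_);
    }
    FakeObject* get() const { return obj_; }
    // Give the reference away, e.g. when returning a new reference to the caller.
    FakeObject* release() { return std::exchange(obj_, nullptr); }

private:
    FakeObject* obj_;
};

// Stand-in for visitAsPyObject: returns a new reference, or throws.
FakeObject* visit(FakeObject* child, bool fail) {
    if (fail) throw std::runtime_error("visit failed");
    fake_incref(child);
    return child;
}

// Same shape as the declarations loop, with no hand-written cleanup:
// if visit() throws, the guards release whatever is currently owned.
FakeObject* build_list(FakeObject* new_list, const std::vector<FakeObject*>& children,
                       int fail_index) {
    OwnedRef list(new_list);  // we own the list's initial reference
    for (std::size_t i = 0; i < children.size(); ++i) {
        OwnedRef item(visit(children[i], static_cast<int>(i) == fail_index));
        fake_append(list.get(), item.get());
    }  // each item's own reference is dropped here, like Py_DECREF(statement)
    return list.release();
}
```

With real CPython objects, the `-1` return of `PyList_Append` would still need a check that throws; the guard would then release the list during unwinding instead of an explicit `Py_DECREF(declarations)`.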
visitPyListOfObjects() would be simpler, though it wouldn't skip the empty statements – not sure how important that is.
Suggested change (replacing the whole loop above):
PyObject* declarations = visitPyListOfObjects(ctx->declaration());
I definitely want to skip the empty statements in the parser and return a relatively clean tree... 🤔
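One way to keep the brevity of `visitPyListOfObjects()` while still dropping empty statements is a helper that takes a filter predicate. The sketch below is plain C++ with hypothetical names (`visitFiltered`, `DeclarationCtx`, `empty_statement`); in the parser, the predicate would mirror `!declaration_ctx->statement() || !declaration_ctx->statement()->emptyStmt()`, and turning the results into a Python list would still need the reference handling from the loop above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an ANTLR declaration context: only the parts
// the filter and the visitor need.
struct DeclarationCtx {
    std::string text;
    bool empty_statement;  // e.g. a bare ";"
};

// Visit every child for which `keep` returns true, preserving order.
// Generic over the child type and the visitor's result type.
template <typename Child, typename Keep, typename Visit>
auto visitFiltered(const std::vector<Child>& children, Keep keep, Visit visit) {
    using Result = std::invoke_result_t<Visit&, const Child&>;
    std::vector<Result> out;
    out.reserve(children.size());
    for (const Child& child : children) {
        if (keep(child)) {
            out.push_back(visit(child));
        }
    }
    return out;
}
```

Usage: with declarations `{"let a := 1", false}`, `{";", true}` and `{"return a", false}`, a predicate of `!d.empty_statement` and a visitor returning `d.text`, the result holds only the two non-empty statements, in their original order.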
hogql_parser/parser.cpp (outdated)
vector<string> identifiers;
auto identifier_ctxs = ctx->identifier();
identifiers.reserve(identifier_ctxs.size());
for (auto identifier_ctx : identifier_ctxs) {
    identifiers.push_back(visitAsString(identifier_ctx));
}
return X_PyList_FromStrings(identifiers);
I think this would do the same:
Suggested change (replacing the block above):
visitPyListOfObjects(ctx->identifier());
I had to revert this. Not sure why, but it failed with "Parsing failed due to bad type casting" (even after adding the return I forgot the first time). The above works, so I'll 🙈
Oh, of course: the identifier rule visitor returns a string. So there need to be two steps (plus an error guard):
Suggested change (replacing the block above):
PyObject* ret = X_PyList_FromStrings(visitAsVectorOfStrings(ctx->identifier()));
if (!ret) {
    throw PyInternalError();
}
return ret;
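The "bad type casting" failure is consistent with how ANTLR's C++ visitors work, assuming the recent runtime in which visitor methods return `std::any`: the identifier rule boxes a string, so unboxing its result as an object pointer, as a generic object-list helper would, throws `std::bad_any_cast` (which the parser presumably reports as that message). The sketch below reproduces this with a hypothetical `FakePyObject` placeholder, a hypothetical `visitIdentifier`, and a simplified `collectIdentifiers` for step one (not the PR's `visitAsVectorOfStrings` signature). Step two, `X_PyList_FromStrings` plus the NULL guard, is left out so it compiles without Python headers.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <vector>

struct FakePyObject {};  // hypothetical placeholder for PyObject

// Hypothetical stand-in for the identifier rule visitor: yields a string boxed in std::any.
std::any visitIdentifier(const std::string& name) { return name; }

// Step one of the fix: unbox each child as the type its rule actually produces.
std::vector<std::string> collectIdentifiers(const std::vector<std::string>& names) {
    std::vector<std::string> out;
    out.reserve(names.size());
    for (const std::string& name : names) {
        out.push_back(std::any_cast<std::string>(visitIdentifier(name)));
    }
    return out;
}

// What a generic "list of objects" helper would attempt: unbox as a pointer.
// Returns true when that cast is rejected with std::bad_any_cast.
bool castAsObjectFails(const std::string& name) {
    try {
        FakePyObject* obj = std::any_cast<FakePyObject*>(visitIdentifier(name));
        (void)obj;
        return false;
    } catch (const std::bad_any_cast&) {
        return true;
    }
}
```

Usage: `collectIdentifiers({"event", "timestamp"})` yields both names as plain strings, while `castAsObjectFails("event")` returns true because the boxed value is a `std::string`, not a pointer.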
Though actually, this goes against convention No. 3:
Stay out of Python land as long as possible. E.g. avoid using PyObject*s for bools or strings.
So in this spirit, this visitor should only return the vector of strings, as is conventional. Then the rule that actually builds an AST node using IdentifierList can call X_PyList_FromStrings() itself, so that this method doesn't have to be needlessly concerned with Python internals.
Looks good!
One optional note on a point where my initial suggestion was wrong
Problem
The Python parser is slow
Changes
{return} work
How did you test this code?
Moved the Python-only parser tests into the universal parser tests file.
TODO:
hogql_parser
and run CI
@posthog/hogvm