Stack overflow with LEAD and LAG functions #12731
Comments
Version 42.0.0 exhibits the same issue. |
take |
I can reproduce it on Windows (but not on Linux); it overflows in this loop. I will try to write a unit test in |
I found that the stack overflow does not happen if the stack size is set to 16 MiB (the default is 1 MiB on Windows), so you can reproduce this issue on Linux by making the stack size smaller. I don't think this is a bug, but it would be better if we documented it somewhere (document Also of note, an optimized build significantly reduces memory usage, so 1 MiB may be enough. |
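One way to experiment with the stack size on any platform is to run the work on a thread whose stack size is set explicitly. The following is a minimal sketch using only the standard library; `recurse` and `run_with_stack` are hypothetical helpers written for illustration, not DataFusion code.

```rust
use std::thread;

/// Recurses `depth` times, keeping a 1 KiB buffer alive in every frame
/// so that each call consumes a noticeable amount of stack.
#[inline(never)]
fn recurse(depth: usize) -> usize {
    let mut pad = [0u8; 1024];
    // black_box keeps the buffer from being optimized away.
    std::hint::black_box(&mut pad);
    if depth == 0 {
        0
    } else {
        1 + recurse(depth - 1)
    }
}

/// Runs `f` on a fresh thread whose stack is `stack_bytes` large
/// and returns its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```

Calling `run_with_stack(16 * 1024 * 1024, || recurse(1_000))` should complete normally. Shrinking the first argument far enough makes the same call overflow, and a stack overflow in Rust aborts the process rather than unwinding, so `join` cannot turn it into an error.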
Any stack overflow is a security risk https://book.hacktricks.xyz/binary-exploitation/stack-overflow |
Sorry for the missing context. I think such a stack overflow won't happen if one of these requirements is met:
And on the DataFusion side, I don't think there is a reasonable way to reduce stack usage except using iteration instead of recursion on I will try rewriting the parser with iteration instead of recursion. |
Is it possible to detect this condition programmatically and raise an exception? A controlled exception is always better than running over into someone else's region of memory. |
Imagine a service that implements an application to analyze structured log events, where the log events are stored in Parquet files. The application is a web server that accepts ad hoc SQL queries and runs them to extract useful patterns from log events. In such a scenario, a malicious user can craft a complex SQL query to cause a stack overflow and mount a remote code execution attack. No matter how big the stack size is, it will always be possible to craft a more complex query that causes the problem. |
I personally think that is not something to take into consideration under an optimized build: since queries with thousands of tokens might be needed, the input should be bounds-checked in that case (as physical memory might not be enough). But I totally agree that a stack overflow under an unoptimized build is unacceptable (it hurts the developer experience), so I will work on that. For now, you can develop the application with a larger stack size and release it with optimizations; it should be safe if the input is bounds-checked against some reasonably large number. |
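To illustrate the general idea of replacing recursion with iteration, here is a toy sketch (unrelated to the actual parser code) that checks bracket nesting with an explicit `Vec` on the heap. A recursive version would use one call frame per nesting level; this one uses a single frame regardless of depth. The function name `is_balanced` is made up for this example.

```rust
/// Returns true if every '(' and '[' in `input` is closed in the right order.
/// Open brackets are tracked in a heap-allocated Vec instead of the call stack,
/// so nesting depth is limited by available memory rather than stack size.
fn is_balanced(input: &str) -> bool {
    let mut open: Vec<char> = Vec::new();
    for c in input.chars() {
        match c {
            '(' | '[' => open.push(c),
            ')' => {
                if open.pop() != Some('(') {
                    return false;
                }
            }
            ']' => {
                if open.pop() != Some('[') {
                    return false;
                }
            }
            // Any other character is ignored in this toy grammar.
            _ => {}
        }
    }
    open.is_empty()
}
```

With this shape, an input nested a million levels deep is handled without growing the call stack. Whether a full SQL parser can be restructured this way is a much larger question.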
I think Rust consumes quite a bit more stack space for debug builds than release builds (there are several past tickets in this repo related to this)
Indeed -- it would be great if someone could put a stack limit in to try and avoid overflows (make a real error) if the query is too deeply nested |
This particular query doesn't look like it has deep nesting though -- can someone post the stack trace here? Maybe we can improve the behavior for whatever function is happening |
Here is one stack trace. I am trying to understand which stack frames take up the stack memory by manually reading In addition, if I recall correctly, it takes about 100 frames to parse it (that's where the stack overflow happens). |
Here is another one that actually overflows the stack. |
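A nesting limit of this kind can be sketched with a depth counter that turns excessive recursion into an ordinary error. The toy recursive-descent parser below accepts a single digit wrapped in any number of parentheses; all names (`Parser`, `ParseError`, `parse`) are hypothetical and this is not the DataFusion or sqlparser implementation.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    RecursionLimitExceeded,
    Unexpected,
}

struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
    remaining_depth: usize,
}

impl<'a> Parser<'a> {
    /// Entry point for every nested expression: spends one unit of depth
    /// budget on the way in and returns it on the way out.
    fn parse_expr(&mut self) -> Result<u32, ParseError> {
        if self.remaining_depth == 0 {
            return Err(ParseError::RecursionLimitExceeded);
        }
        self.remaining_depth -= 1;
        let result = self.parse_inner();
        self.remaining_depth += 1;
        result
    }

    /// Grammar: expr := '(' expr ')' | digit
    fn parse_inner(&mut self) -> Result<u32, ParseError> {
        match self.input.get(self.pos).copied() {
            Some(b'(') => {
                self.pos += 1;
                let value = self.parse_expr()?;
                if self.input.get(self.pos) == Some(&b')') {
                    self.pos += 1;
                    Ok(value)
                } else {
                    Err(ParseError::Unexpected)
                }
            }
            Some(c) if c.is_ascii_digit() => {
                self.pos += 1;
                Ok(u32::from(c - b'0'))
            }
            _ => Err(ParseError::Unexpected),
        }
    }
}

/// Parses `input`, failing with an error instead of overflowing the stack
/// when nesting exceeds `limit`.
fn parse(input: &str, limit: usize) -> Result<u32, ParseError> {
    let mut parser = Parser { input: input.as_bytes(), pos: 0, remaining_depth: limit };
    let value = parser.parse_expr()?;
    if parser.pos == input.len() {
        Ok(value)
    } else {
        Err(ParseError::Unexpected)
    }
}
```

For example, `parse("((7))", 10)` succeeds, while the same input with a limit of 2 returns `RecursionLimitExceeded`. A counter like this bounds nesting depth, not bytes of stack, so the limit still has to be chosen with the largest expected frame size in mind.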
I think it's better to open another issue. I would like this issue to focus on reducing stack usage. We could use the asm! macro to get the stack usage and return an error once it passes some threshold:
use std::arch::asm;

// x86_64 only: reads the frame pointer (rbp) and stack pointer (rsp) registers.
fn get_stack_usage() -> usize {
    let rbp: usize;
    let rsp: usize;
    unsafe {
        asm!("mov {0}, rbp", out(reg) rbp);
        asm!("mov {0}, rsp", out(reg) rsp);
    }
    rbp - rsp
} |
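As far as I can tell, `rbp - rsp` covers at most the current frame, and only when frame pointers are kept. A portable approximation, sketched below under that assumption, is to record the address of a local variable near the start of the thread and compare it with the address of a local in the current frame. All function names here are hypothetical, and the result is only an estimate.

```rust
use std::cell::Cell;

thread_local! {
    // Address recorded by `mark_stack_base`; 0 means "not recorded yet".
    static STACK_BASE: Cell<usize> = Cell::new(0);
}

/// Returns the address of a local variable, which lies close to the
/// current stack pointer.
#[inline(never)]
fn approx_stack_pointer() -> usize {
    let marker = 0u8;
    std::hint::black_box(&marker) as *const u8 as usize
}

/// Call once near the top of the thread (e.g. before parsing starts).
fn mark_stack_base() {
    STACK_BASE.with(|base| base.set(approx_stack_pointer()));
}

/// Approximate number of stack bytes used since `mark_stack_base`.
fn approx_stack_used() -> usize {
    STACK_BASE.with(|base| base.get()).abs_diff(approx_stack_pointer())
}

/// Returns an error instead of letting the stack run out.
/// If the base was never marked, the estimate is meaningless (and huge).
fn check_stack(limit_bytes: usize) -> Result<(), String> {
    let used = approx_stack_used();
    if used > limit_bytes {
        Err(format!("stack usage {used} exceeds limit {limit_bytes}"))
    } else {
        Ok(())
    }
}

/// Demo helper: reports the estimate from `depth` frames further down.
#[inline(never)]
fn used_at_depth(depth: usize) -> usize {
    let mut pad = [0u8; 512];
    std::hint::black_box(&mut pad);
    let used = if depth == 0 { approx_stack_used() } else { used_at_depth(depth - 1) };
    std::hint::black_box(used)
}
```

A recursive function could call `check_stack` at its entry and propagate the error, which gives the "return Error at some point" behaviour without inline assembly.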
My original thought for reducing stack usage was to move more things to the heap, but I just found another possible solution using stacker. I will try stacker tomorrow, then submit a PR to see if it's appropriate. |
Interesting library |
Another option is to make the parser tail-recursive and annotate the function using https://docs.rs/tailcall/latest/tailcall/ |
The sqlparser library already supports limiting recursion: https://docs.rs/sqlparser/latest/sqlparser/parser/struct.Parser.html#method.with_recursion_limit which is pretty good protection against stack overflows. I think stacker is a somewhat low-level solution, and we would have to ensure any use doesn't cause performance or safety issues. |
I agree |
Given this is a stack overflow in sqlparser, perhaps we can move the ticket there and try to solve it in the sqlparser library? |
Let's use apache/datafusion-sqlparser-rs#1465 to track this issue |
BTW, you can change the stack size in .cargo/config.toml:
# 64 bit MSVC
[target.x86_64-pc-windows-msvc]
rustflags = [
"-C", "link-arg=/STACK:8000000"
]
# 64 bit Mingw
[target.x86_64-pc-windows-gnu]
rustflags = [
"-C", "link-arg=-Wl,--stack,8000000"
] |
Describe the bug
The following query causes a stack overflow in Rust
To Reproduce
Expected behavior
The query should detect gaps in the seq number.
Additional context