Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect stack overflow and reduce stack usage on debug build #1465

Closed
Eason0729 opened this issue Oct 10, 2024 · 11 comments
Closed

Detect stack overflow and reduce stack usage on debug build #1465

Eason0729 opened this issue Oct 10, 2024 · 11 comments

Comments

@Eason0729
Copy link
Contributor

related to apache/datafusion#12731

The issue is that unoptimized(debug) build consume more stack memory, which known to cause stack overflow if stack size is less than 1MiB.

Expected change:

  1. Return error on stack overflow
  2. Reduce stack usage (or grow stack
@Eason0729
Copy link
Contributor Author

I would like to take stacker approach, that is:

  1. implement stack guard at some point
  2. if red zone is reached, try grow stack
  3. if unable to grow stack, return an error

If those words aren't enough to explain the approach I would like to take, I can submit a draft PR to make it clear with code.

@alamb
Copy link
Contributor

alamb commented Oct 10, 2024

@Eason0729
Copy link
Contributor Author

#1468

@Eason0729
Copy link
Contributor Author

Eason0729 commented Oct 17, 2024

I spent some time manually run reg read sp fp, and write down rbp-rsp (in debug build.

Detail

export RUSTFLAGS="-C force-frame-pointers=yes"

parse_query 12592
parse_boxed_query_body 12592
parse_select 17024
parse_optional_alias 256
parse_comma_separated 1056
parse_select 17024
parse_table_and_joins 10016
parse_derived_table_factor 2384

And here is some large struct(in byte):

Expr 296
Statement 3528
let parser = Parser::new(&GenericDialect {});
let sql = r#"
    SELECT seq
    FROM (
        SELECT seq,
        LEAD(seq) OVER (ORDER BY seq) AS next_seq
        FROM parquet_table
    ) AS subquery
    WHERE next_seq - seq > 1;
"#;
let stmt = parser.try_with_sql(sql).unwrap().parse_statement().unwrap();
print!("{}", stmt.to_string());

In this example, sqlparser consume about 140KB stack at peak, so I think there is little to optimize in sqlparser.
I would try, but we still need to reduce stack usage on datafusion.

@Eason0729
Copy link
Contributor Author

I am busy doing some school works recently (would probably stay busy until the end of this semester), so if there is anyone interested in this issue, feel free to work on it.

@Eason0729
Copy link
Contributor Author

I will try to reproduce it with apache/datafusion#13177 next week, chances are that it's related.

@alamb
Copy link
Contributor

alamb commented Nov 13, 2024

It seems like we decided to go the stacker route in DataFusion with the rationale descrsibed here: apache/datafusion#13177 (comment)

Perhaps it is worth revisiting the use of stacker in this library as well

@blaginin
Copy link

Oops, sorry, @Eason0729, I was searching for issues to link my pr to and just came across this ticket. Happy to work on this together, if you would like to review my pr once it's ready

@Eason0729
Copy link
Contributor Author

I will try to reproduce it with apache/datafusion#13177 next week, chances are that it's related.

I try to reproduce the issue on with apache/datafusion#13177, it seem to work fine, and the stack usage seem more reasonable.

I will investigate what make binary on Windows consume significantly more stack later.

@Eason0729
Copy link
Contributor Author

Oops, sorry, @Eason0729, I was searching for issues to link my pr to and just came across this ticket. Happy to work on this together, if you would like to review my pr once it's ready

I am not maintainer, but I am glad to review it, so maintainer can work on other things. Notify me when it's ready.

By the way, I would be the first PR I ever review.

@Eason0729
Copy link
Contributor Author

Eason0729 commented Nov 17, 2024

I will close this issue because

  1. I was unable to figure out what make binary on Windows consume significantly more stack.
  2. There is a on going PR(Add #[recursive] #1522) to mitigate this issue

Feel free to reopen it to try figuring out what make binary on Windows consume significantly more stack.

Thanks for participation in discussion!

@Eason0729 Eason0729 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants