SNOW-1258064 Remove/Relax the limitations in order to generate bigger files #730
Conversation
@@ -673,15 +671,6 @@ static <T> AbstractRowBuffer<T> createRowBuffer(
    }
  }

  private void checkBatchSizeEnforcedMaximum(float batchSizeInBytes) {
Should this still be here but with higher limits?
+1
- This check was added previously to avoid an OOM issue in the scanner. Given that the Parquet scanner is much better now and the Iceberg doc mentions that it works well for files up to 1 GB compressed, I don't think this is necessary anymore.
- We still have a warning on this, see checkBatchSizeRecommendedMaximum right below this function (both checks are sketched below).
- We have valid use cases where the batch size could be big, e.g. the Kafka Connector running with a big flush interval so that everything is inserted as one batch.
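For context, here is a minimal sketch of the two checks under discussion. The class name, threshold constants, exception type, and messages are illustrative assumptions, not the SDK's actual code:

```java
import java.util.logging.Logger;

/** Sketch only: thresholds and messages below are assumed for illustration. */
class BatchSizeChecks {
  private static final Logger logger = Logger.getLogger(BatchSizeChecks.class.getName());

  // Hypothetical limits; the SDK's real constants may differ.
  private static final float ENFORCED_MAX_BATCH_SIZE_IN_BYTES = 256f * 1024 * 1024;
  private static final float RECOMMENDED_MAX_BATCH_SIZE_IN_BYTES = 16f * 1024 * 1024;

  /** The hard limit this PR removed: oversized batches were rejected outright. */
  void checkBatchSizeEnforcedMaximum(float batchSizeInBytes) {
    if (batchSizeInBytes > ENFORCED_MAX_BATCH_SIZE_IN_BYTES) {
      throw new IllegalArgumentException(
          "Batch of " + batchSizeInBytes + " bytes exceeds the enforced maximum of "
              + ENFORCED_MAX_BATCH_SIZE_IN_BYTES + " bytes");
    }
  }

  /** The soft limit that remains: log a warning but accept the batch. */
  void checkBatchSizeRecommendedMaximum(float batchSizeInBytes) {
    if (batchSizeInBytes > RECOMMENDED_MAX_BATCH_SIZE_IN_BYTES) {
      logger.warning(
          "Batch of " + batchSizeInBytes + " bytes exceeds the recommended maximum of "
              + RECOMMENDED_MAX_BATCH_SIZE_IN_BYTES + " bytes; consider splitting it");
    }
  }
}
```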
Sorry, I somehow missed this PR until it got merged. I have concerns about this change because there was another reason these limits were introduced: to prevent long GC pauses. We observed JVM freezes for customers who were passing hundreds of thousands of rows into insertRows(); see more details here.
I would prefer to keep some limit there; customers should not pass hundreds of megabytes of data in a single batch. (For a caller-side workaround, see the chunking sketch below.)
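One caller-side mitigation, sketched under the assumption that the channel API matches the SDK's public insertRows(Iterable, String) signature: split a large batch into smaller chunks so no single call holds hundreds of megabytes of row data at once. The chunk size here is an assumption, not an SDK recommendation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import net.snowflake.ingest.streaming.InsertValidationResponse;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;

/** Sketch: chunk a huge row batch before handing it to insertRows(). */
class ChunkedInsert {
  // 10,000 rows per call is an assumed chunk size, not an SDK recommendation.
  private static final int CHUNK_SIZE = 10_000;

  static void insertInChunks(
      SnowflakeStreamingIngestChannel channel, List<Map<String, Object>> rows) {
    for (int start = 0; start < rows.size(); start += CHUNK_SIZE) {
      int end = Math.min(start + CHUNK_SIZE, rows.size());
      List<Map<String, Object>> chunk = new ArrayList<>(rows.subList(start, end));
      // Use the index of the chunk's last row as the offset token so a restart can resume.
      InsertValidationResponse response = channel.insertRows(chunk, String.valueOf(end - 1));
      if (response.hasErrors()) {
        throw response.getInsertErrors().get(0).getException();
      }
    }
  }
}
```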
@sfc-gh-alhuang Do you want to run some local tests to see whether things work with a batch of a few GBs and a one-minute flush interval (see the setup sketch below)? If not, we can add the exception back with a much higher limit, WDYT?
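A hedged sketch of such a local test setup, assuming the flush interval can be raised via a MAX_CLIENT_LAG override in the client properties; the parameter name and accepted value format are assumptions to verify against ParameterProvider:

```java
import java.util.Properties;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

/** Sketch: build a client with a one-minute flush lag for a large-batch local test. */
class BigBatchTestSetup {
  static SnowflakeStreamingIngestClient buildClient(Properties connectionProps) {
    // Assumed: a "MAX_CLIENT_LAG" property taking a duration string is honored here.
    connectionProps.put("MAX_CLIENT_LAG", "1 minute");
    return SnowflakeStreamingIngestClientFactory.builder("big_batch_test_client")
        .setProperties(connectionProps)
        .build();
  }
}
```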
Got a GCLocker timeout error message after building 600+ MB of blobs. I'll add the assertion back with a higher limit.
Thanks!
Keen to see this released, any idea how long until this is cut?
Relax the file size limitation to deal with a longer client flush lag, which might produce larger files.
JIRA