Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1821246 Add parameter for blob upload timeout #916

Closed
wants to merge 3 commits into from

Conversation

sfc-gh-alhuang
Copy link
Contributor

The original blob upload timeout is hard coded as 5 seconds. This might not be feasible for Iceberg table streaming as the 30 seconds client lag in Iceberg streaming might create larger file which requires larger blob upload timeout. Make the BLOB_UPLOAD_TIMEOUT constant to configurable parameter and keep the default timeout of FDN table streaming as 5 seconds while have a 20 seconds default for Iceberg table streaming.

RegisterService<StubChunkData> rs = new RegisterService<>(null, true);
SnowflakeStreamingIngestClientInternal<StubChunkData> client =
Mockito.mock(SnowflakeStreamingIngestClientInternal.class);
ParameterProvider parameterProvider = Mockito.mock(ParameterProvider.class);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why do we need to mock the parameter provider, can we not do away with this unnecessary mocking (not for this PR) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this as RegisterServiceTest is reading blob upload timeout from parameter provider.

@sfc-gh-tzhang
Copy link
Contributor

Want to make sure we understand the upload logic correctly, the total wait time is NOT 5 seconds, the total time is BLOB_UPLOAD_TIMEOUT_IN_SEC * BLOB_UPLOAD_MAX_RETRY_COUNT which is 120s, and currently the BLOB_UPLOAD_MAX_RETRY_COUNT is already configurable. Ideally we do not want to exposing more and more configurable parameters even though they're not documented. And please make sure you have read the comment in the upload function

        // In order to guarantee fairness between the time spent on waiting blob uploading VS blob
        // registering and make sure the delay on server side commit is relatively small:
        // 1. If no exception and total time is less than or equal to
        // BLOB_UPLOAD_TIMEOUT_IN_SEC * 2, we will wait for all blobs to be uploaded and then
        // register them
        // 2. If no exception and total time is bigger than BLOB_UPLOAD_TIMEOUT_IN_SEC * 2, we
        // will break the waiting and register the processed blob first
        // 3. If hitting non-timeout exception, we will skip the current blob and continue on
        // processing next blob
        // 4. If hitting timeout exception, we will break the waiting if we haven't reached
        // BLOB_UPLOAD_TIMEOUT_IN_SEC * BLOB_UPLOAD_MAX_RETRY_COUNT and register the
        // processed blob first. Otherwise, we will skip the current blob and continue on processing
        // next blob if time out has been reached.

Copy link
Contributor

@sfc-gh-tzhang sfc-gh-tzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my other comment, thanks

@sfc-gh-alhuang
Copy link
Contributor Author

Closed this PR as the customer can configure max retry count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants