Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iceberg Ingestion in CloudStorage mode - Carve out IStorage and ExternalVolume from InternalStage #828

Merged
merged 4 commits into from
Sep 15, 2024

Conversation

sfc-gh-hmadan
Copy link
Collaborator

This is the first in a sequence of PRs to get the client SDK to work with presigned urls for iceberg tables.

At a high level, it sets things up to have two stacks of storage objects - InternalStageManager and InternalStage (for FDN tables), and ExternalVolumeManager and ExternalVolume (for iceberg tables). Technically the latter can be used for any table that wants to use the presigned-url style of uploading files, but right now we're focused on iceberg.

There is a new interface, IStorage, that both ExternalVolume and InternalStage will implement.

Other changes:

  1. Introduce a BlobPath class to carry around the blob path / path with token. We want to be careful to not write the presigned url out to logs etc, or put it into the file metadata, thus we need this structure instead of a simple string
  2. minor changes in FlushService ot generate filename only when blobData is empty, this removes the need for decrementBlobSequencer.
  3. cleanup the IStorageManager interface to only have whats necessary
  4. introduce a TableRef structure instead of passing around fullyQualifiedTableName as a concatenated string and keying off that string in maps.
  5. introduce a hidden method (not public) on the ClientFactory that allows setting isIceberg to true when doing local development.

@sfc-gh-hmadan sfc-gh-hmadan requested review from sfc-gh-tzhang and a team as code owners September 12, 2024 21:35
@sfc-gh-hmadan sfc-gh-hmadan changed the title carve out IStorage and ExternalVolume Iceberg Ingestion in CloudStorage mode - Carve out IStorage and ExternalVolume from InternalStage Sep 12, 2024
Copy link
Contributor

@sfc-gh-alhuang sfc-gh-alhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@@ -415,11 +411,10 @@ void distributeFlushTasks(Set<String> tablesToFlush) {
>= this.owningClient.getParameterProvider().getMaxChunksInBlob()) {
// Create a new blob if the current one already contains max allowed number of chunks
logger.logInfo(
"Max allowed number of chunks in the current blob reached. chunkCount={}"
+ " maxChunkCount={} currentBlobPath={}",
"Max allowed number of chunks in the current blob reached. chunkCount={} "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional space after chunk chunkCount={} should be removed?

@sfc-gh-hmadan sfc-gh-hmadan merged commit c063036 into master Sep 15, 2024
47 checks passed
@sfc-gh-hmadan sfc-gh-hmadan deleted the hmadan-iceberg-sep12 branch September 15, 2024 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants