
SNOW-1497358 Support multiple storage for Iceberg mode #783

Merged
merged 12 commits into iceberg-support from alhuang-iceberg-multiple-stages on Jul 11, 2024

Conversation

sfc-gh-alhuang
Contributor

This PR includes several refactorings of FlushService and StreamingIngestInternalStage to support file upload in both Iceberg and non-Iceberg modes:

  • Create ConfigureCallHandler to wrap the different configure calls: per client (non-Iceberg mode) and per channel (Iceberg mode).
  • Create StorageManager to manage storage and blob path generation.

Tests for Iceberg mode will be added once the refactor review is done.

@sfc-gh-alhuang sfc-gh-alhuang changed the base branch from master to iceberg-support on June 24, 2024 23:08
@sfc-gh-alhuang sfc-gh-alhuang changed the title from "SNOW-1497358 Support multiple storages for Iceberg mode" to "SNOW-1497358 Support multiple storage for Iceberg mode" on Jun 24, 2024
@sfc-gh-alhuang sfc-gh-alhuang force-pushed the alhuang-iceberg-multiple-stages branch from d98dabd to 082bf07 on June 24, 2024 23:16
@sfc-gh-alhuang sfc-gh-alhuang force-pushed the alhuang-iceberg-multiple-stages branch from 082bf07 to 44cc28e on June 24, 2024 23:45
*
* @param openChannelResponse response from open channel
*/
void addStorage(OpenChannelResponse openChannelResponse);
Collaborator

let's not make the StorageManager interface directly interact with OpenChannelResponse; pass in the storage info object instead. It leaves the door open in case we want to use this one interface for non-iceberg tables too, over time.

Contributor Author

Changed the signature to addStorage(dbName, schemaName, tableName, channelContext)
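For illustration, a minimal sketch of the decoupled interface (a sketch only, reusing the SDK's ChannelFlushContext type; the real interface may differ):

// Hypothetical sketch: StorageManager no longer sees OpenChannelResponse.
interface StorageManager<T> {
  // The caller extracts db/schema/table from the open-channel response and
  // passes them in, which keeps this interface usable for non-Iceberg tables.
  void addStorage(
      String dbName, String schemaName, String tableName, ChannelFlushContext channelContext);
}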


// Object mapper for creating payload, ignore null fields
private static final ObjectMapper mapper =
new ObjectMapper().setSerializationInclusion(JsonInclude.Include.NON_NULL);
Collaborator

nit: can we reuse object mappers across the process, via some JsonUtils class or something?

Contributor Author

It's possible to create JsonUtils with a pool of object mappers to balance between concurrent mapping and memory. Could we add it to the backlog and address this later?
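For what it's worth, Jackson's ObjectMapper is thread-safe once configured, so a single shared instance (rather than a pool) is usually enough. A minimal sketch of such a JsonUtils, as a hypothetical class:

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.ObjectMapper;

final class JsonUtils {
  // One pre-configured mapper shared across the process; safe for concurrent
  // readValue/writeValueAsString calls once configuration is done.
  static final ObjectMapper MAPPER =
      new ObjectMapper().setSerializationInclusion(JsonInclude.Include.NON_NULL);

  private JsonUtils() {}
}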

@Override
public StreamingIngestStorage getStorage(List<List<ChannelData<T>>> blobData) {
// Only one chunk per blob in Iceberg mode.
ChannelFlushContext channelContext = blobData.get(0).get(0).getChannelContext();
Collaborator

let's pass the channelContext in via the method signature instead of assuming blobData's structure is constructed a certain way by the caller.

Comment on lines 71 to 72
String.format(
"%s.%s.%s",
Collaborator

  1. Do we have other places that do this concatenation? Let's unify all these fully-qualified-name-construction pieces of code into one util method.
  2. How is each component's encoding taken care of? There's some single-quote wrapping/unwrapping business that probably needs to be accounted for.
  3. String.format behaves differently for different "cultures" (String.format for English typically behaves differently than String.format for Arabic). Please investigate how String.format infers the culture - is it some context on the thread? On the process? You might have to explicitly pass in an "invariant" culture if such a thing exists.

Contributor Author

Created a unified method for fully-qualified-name construction in StreamingIngestUtils. We don't deal with the cultures issue explicitly right now; as long as the naming pattern follows the Snowflake identifier rules, we should be fine.
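On the culture question: in Java, String.format(String, Object...) formats with Locale.getDefault(Locale.Category.FORMAT), and the invariant option is to pass Locale.ROOT explicitly. A sketch of what the unified util could look like (illustrative; the actual method in StreamingIngestUtils may differ):

import java.util.Locale;

// Locale.ROOT pins formatting to locale-neutral behavior regardless of the
// JVM's default locale. The %s conversion on plain strings is not localized
// anyway, so this mainly matters if numeric components ever get formatted.
static String getFullyQualifiedTableName(String db, String schema, String table) {
  return String.format(Locale.ROOT, "%s.%s.%s", db, schema, table);
}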

this.owningClient.getName(),
DEFAULT_MAX_UPLOAD_RETRIES));
} catch (SnowflakeSQLException | IOException err) {
throw new SFException(err, ErrorCode.UNABLE_TO_CONNECT_TO_STAGE);
Collaborator

  1. What happens when this exception is thrown? Does the customer's code get an opportunity to handle this exception, or does it go unhandled and cause a process crash?
  2. Let's add retry handling to all API calls made via the new SnowflakeServiceClient I proposed in another thread. We need retry with backoff and jitter.
  3. Can we log to whatever log provider is used by the SDK before throwing?

Contributor Author

Added a log for this; the retry logic is in executeWithRetries. If the error is thrown, the process stops immediately.
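For reference, a minimal sketch of the backoff-plus-jitter retry loop being asked for (illustrative only; the SDK's executeWithRetries may be implemented differently):

import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Retries up to maxAttempts, doubling a backoff cap after each failure and
// sleeping a random ("full jitter") duration below that cap.
static <T> T executeWithRetries(Callable<T> call, int maxAttempts) throws Exception {
  long backoffCapMs = 100;
  for (int attempt = 1; ; attempt++) {
    try {
      return call.call();
    } catch (Exception e) {
      if (attempt >= maxAttempts) {
        throw e; // out of retries; surface the original failure
      }
      Thread.sleep(ThreadLocalRandom.current().nextLong(backoffCapMs));
      backoffCapMs = Math.min(backoffCapMs * 2, 10_000);
    }
  }
}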


if (stage == null) {
throw new SFException(
ErrorCode.INTERNAL_ERROR,
Collaborator

same question as another thread: does customer code get a chance to intercept this exception? Does it cause a process crash (or does it silently get ignored)?

Contributor Author (@sfc-gh-alhuang, Jun 28, 2024)

This causes a process crash. I don't think this is supposed to happen, given that it seems impossible for a client to ingest to a channel without opening it via StreamingIngestClient.

fullyQualifiedTableName,
new StreamingIngestStorage(
isTestMode,
configureCallHandler,
Collaborator

I like the thought process here, but here's the problem - you're forcing the CLIENT_CONFIGURE and CHANNEL_CONFIGURE APIs to behave almost exactly the same.

  • The goal is for StreamingIngestStorage to have a clean way to call the configure API without knowing whether we're in Iceberg mode or not, and without knowing which API was called to get that info.
  • The StorageManager abstraction is serving pretty much this purpose already.
  • I suggest passing the StorageManager interface object into StreamingIngestStorage, and adding a configure() method on that interface that StreamingIngestStorage can call. Let this method return a strongly typed object, and each concrete impl of that interface will call whatever it wants on the service side to get the latest tokens / figs ids / etc. (rough sketch after this list)
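A rough sketch of that suggestion (illustrative shape, not actual SDK code):

// Hypothetical: StreamingIngestStorage depends only on this interface and
// never learns whether client_configure or channel_configure backs it.
interface StorageManager<T, TLocation> {
  // Each concrete impl calls its own service API and returns the refreshed,
  // strongly typed location info (tokens, stage/volume metadata, etc.).
  FileLocationInfo configure(StreamingIngestStorage<T, TLocation> storage);
}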

Contributor Author

Added an owningManager reference to StreamingIngestStorage for this.

.setSchema(openChannelResponse.getSchemaName())
.setTable(openChannelResponse.getTableName())
.build();
this.externalVolumeMap.put(
Collaborator

what if there were two concurrent addStorage calls for the same table? Suggest taking a lock after the if (!containsKey) check on line 76 so that we only ever construct one StreamingIngestStorage object.

Contributor Author

Changed this to putIfAbsent with ConcurrentHashMap to address concurrent access.
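For reference, a sketch of the pattern (illustrative; createStorage is a hypothetical factory). computeIfAbsent goes one step further than putIfAbsent by also guaranteeing the value is constructed at most once per key:

import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, StreamingIngestStorage> externalVolumeMap = new ConcurrentHashMap<>();

// putIfAbsent keeps the map consistent but may construct a losing
// StreamingIngestStorage that is immediately discarded; computeIfAbsent
// runs the factory at most once per key, even under concurrent calls.
StreamingIngestStorage getOrCreateStorage(String fullyQualifiedTableName) {
  return externalVolumeMap.computeIfAbsent(fullyQualifiedTableName, name -> createStorage(name));
}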

Comment on lines 121 to 124
if (this.externalVolumeMap.isEmpty()) {
return null;
}
return this.externalVolumeMap.values().iterator().next().getClientPrefix();
Collaborator

let's chat about this, I'm not clear on what's being done here. A couple of issues I see:

  1. The return type should be Optional if there's a chance we'll return null.
  2. Why is the first external volume "special"?
  3. map.values doesn't typically guarantee what order it iterates in (for some map implementations, if not all), so there's no guarantee that every call to map.values.iterator.next will give you the same object.
  4. It appears that clientPrefix is a static value that isn't meant to change, so we should pass it in via the ctor if possible?

Contributor Author

Moved the clientPrefix logic to the first configure call.
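A sketch of that resolution (illustrative; onFirstConfigure is a hypothetical hook): capture the prefix once, on the first configure call, and expose it as Optional instead of peeking into an unordered map:

import java.util.Optional;

private volatile String clientPrefix; // written once, on the first configure call

void onFirstConfigure(String prefixFromResponse) { // hypothetical hook
  if (this.clientPrefix == null) {
    this.clientPrefix = prefixFromResponse;
  }
}

Optional<String> getClientPrefix() {
  return Optional.ofNullable(this.clientPrefix);
}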

@@ -444,11 +408,8 @@ && shouldStopProcessing(
}

// Kick off a build job
Collaborator

The counter.decrement is gone, which means blob names can have gaps now?

@@ -116,26 +111,43 @@ state to record unknown age.
}
}

/**
* Default constructor for external volume
Collaborator

Hmm, this looks odd - having different constructors for different flows when we're already trying to abstract away the differences with the StorageManager interface.
I'm guessing what's happening here is that in Iceberg mode we first get the fileLocation from open_channel but subsequently call channel_configure for token renewal, whereas in non-Iceberg mode we get the file location and renewal tokens from the same CLIENT_CONFIGURE API call, thus you're doing it this way? If so, I have a suggestion:

  1. As suggested in another thread (ExternalVolumeManager's line 93), the StreamingIngestStorage class should just take in an object that knows how to renew tokens, and not impose that client_configure / channel_configure have the same call handler.
  2. Always take in a FileLocationInfo object for both Iceberg and non-Iceberg modes, so that the "bootstrap" codepath is also identical.

This means that for non-Iceberg, you'll have to make an explicit call to client_configure at the call site of the StreamingIngestStorage ctor to get the FileLocationInfo for the internal stage, which IMO is okay.
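A sketch of the suggested shape (field and parameter names are assumptions):

// Hypothetical: one ctor for both modes. The bootstrap location always
// arrives as a FileLocationInfo (from client_configure in non-Iceberg mode,
// from open_channel in Iceberg mode); token renewal goes through the manager.
class StreamingIngestStorage<T, TLocation> {
  private final StorageManager<T, TLocation> owningManager; // knows how to renew tokens
  private FileLocationInfo fileLocationInfo;

  StreamingIngestStorage(
      StorageManager<T, TLocation> owningManager, FileLocationInfo initialLocation) {
    this.owningManager = owningManager;
    this.fileLocationInfo = initialLocation;
  }
}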

Contributor Author

Refactored to the same ctor.

Contributor Author (@sfc-gh-alhuang, Jun 28, 2024)

Should we move these API request and response classes to a separate package/directory?

@@ -560,26 +518,36 @@ BlobMetadata buildAndUpload(String blobPath, List<List<ChannelData<T>>> blobData

blob.blobStats.setBuildDurationMs(buildContext);

return upload(blobPath, blob.blobBytes, blob.chunksMetadataList, blob.blobStats);
return upload(
this.storageManager.getStorage(blobData.get(0).get(0).getChannelContext()),
Collaborator

let's chat about this f2f. Looks ripe for a null ref or a logical bug where the wrong channel context gets passed in.
Ideally you want to pass the channelContext into this method as an argument and not bake in this get(0).get(0) assumption.

Contributor Author

Moved the channel-context retrieval logic to the call site.
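Roughly, the call site now looks something like this (illustrative, not the exact diff):

// The call site owns the single-chunk-per-blob assumption for Iceberg mode,
// extracts the context once, and passes the resolved storage down explicitly.
ChannelFlushContext channelContext = blobData.get(0).get(0).getChannelContext();
StreamingIngestStorage storage = this.storageManager.getStorage(channelContext);
return upload(storage, blobPath, blob.blobBytes, blob.chunksMetadataList, blob.blobStats);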

sfc-gh-alhuang and others added 3 commits July 1, 2024 17:32
… configure response (#787)

cleanup configurerequest / configureresponse and separate out channelConfigureResponse class
@sfc-gh-alhuang sfc-gh-alhuang marked this pull request as ready for review on July 3, 2024 20:54
@sfc-gh-alhuang sfc-gh-alhuang requested review from sfc-gh-tzhang and a team as code owners on July 3, 2024 20:54
Contributor

@sfc-gh-tzhang left a comment

Left some comments, PTAL!

ChannelFlushContext channelFlushContext) {
// Only one chunk per blob in Iceberg mode.
StreamingIngestStorage<T, ExternalVolumeLocation> stage =
this.externalVolumeMap.get(channelFlushContext.getFullyQualifiedTableName());
Contributor

Does channelFlushContext.getFullyQualifiedTableName take care of name resolution? For example, A.B.C and a.b.c are the same table.

Contributor Author

getFullyQualifiedTableName uses the information directly from the server response. As long as the table name from the server is consistent, it should be fine.

Contributor

As long as the table name from the server is consistent, it should be fine.

Have you verified this? Basically, do an experiment with A.B.C and a.b.c; the fully qualified name should be A.B.C in both cases.

/**
* The SnowflakeServiceClient class is responsible for making API requests to the Snowflake service.
*/
class SnowflakeServiceClient {
Contributor

It looks to me like there's a lot of duplicated logic across these functions; is there any room for consolidation?
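One possible consolidation, sketched (hypothetical helper, not actual SDK code): a single generic method that serializes the request, posts it, and deserializes the typed response, so each API wrapper shrinks to roughly one call:

// MAPPER is a shared Jackson ObjectMapper; post() is a hypothetical HTTP
// helper that sends the payload to the given endpoint and returns the body.
private <Req, Resp> Resp executeApiRequest(String endpoint, Req request, Class<Resp> respClass)
    throws java.io.IOException {
  String payload = MAPPER.writeValueAsString(request);
  String responseBody = post(endpoint, payload);
  return MAPPER.readValue(responseBody, respClass);
}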

Contributor

Any reason this is marked as resolved?

throw new SFException(ErrorCode.OPEN_CHANNEL_FAILURE, response.getMessage());
}
OpenChannelRequestInternal openChannelRequest =
new OpenChannelRequestInternal(
Contributor

looks like you missed isOffsetTokenProvided?

Contributor Author

request.getOffsetToken() is okay to be null, since @JsonInclude(JsonInclude.Include.NON_NULL) on OpenChannelRequest will make Jackson ignore it automatically.

Contributor

not sure if I understand, do you mean that the isOffsetTokenProvided field is redundant? I'm assuming it was added for a reason.

Contributor Author

It was used to avoid putting a null value in the payload string. Now that we are switching to JSON serialization, @JsonInclude(JsonInclude.Include.NON_NULL) ignores the null fields, so I removed isOffsetTokenProvided from OpenChannelRequest.
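A self-contained illustration of the Jackson behavior relied on here (OpenChannelPayload is a hypothetical stand-in for OpenChannelRequestInternal):

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.ObjectMapper;

@JsonInclude(JsonInclude.Include.NON_NULL)
class OpenChannelPayload {
  public String channel = "my_channel";
  public String offsetToken; // null, so Jackson omits the key entirely
}

// new ObjectMapper().writeValueAsString(new OpenChannelPayload())
// yields {"channel":"my_channel"} - no offsetToken key, no explicit
// isOffsetTokenProvided flag needed.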

@sfc-gh-alhuang sfc-gh-alhuang force-pushed the iceberg-support branch 2 times, most recently from 1678a6d to a9aa682 on July 10, 2024 21:01
Collaborator

@sfc-gh-hmadan left a comment

lgtm!

@@ -102,8 +94,8 @@ List<List<ChannelData<T>>> getData() {
// Reference to the channel cache
private final ChannelCache<T> channelCache;

// Reference to the Streaming Ingest stage
private final StreamingIngestStage targetStage;
// Reference to the Stream Ingest storage manager
Contributor

Stream Ingest?

@@ -101,4 +101,11 @@ void invalidateChannelIfSequencersMatch(
int getSize() {
return cache.size();
}

/** Get the number of channels for a given table */
int getSizePerTable(String fullyQualifiedTableName) {
Contributor

Suggested change
int getSizePerTable(String fullyQualifiedTableName) {
int getChannelCountForTable(String fullyQualifiedTableName) {

Comment on lines 107 to 108
ConcurrentHashMap<String, SnowflakeStreamingIngestChannelInternal<T>> channelsMapPerTable =
cache.get(fullyQualifiedTableName);
Contributor

Is this thread safe? It's possible that the count changes between these two calls, right?
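For reference, a null-safe version of the read (illustrative). The count is inherently a point-in-time snapshot; ConcurrentHashMap guarantees the reads won't see corrupt state, but the value can be stale the moment it's computed:

// Guard the absent-table case and accept that the count is a snapshot.
int getChannelCountForTable(String fullyQualifiedTableName) {
  ConcurrentHashMap<String, SnowflakeStreamingIngestChannelInternal<T>> channels =
      cache.get(fullyQualifiedTableName);
  return channels == null ? 0 : channels.size();
}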

@@ -93,25 +93,28 @@ private abstract static class TestContext<T> implements AutoCloseable {
ChannelCache<T> channelCache;
final Map<String, SnowflakeStreamingIngestChannelInternal<T>> channels = new HashMap<>();
FlushService<T> flushService;
StreamingIngestStage stage;
StorageManager<T, InternalStageLocation> storageManager;
Contributor

Do we have tests for both the storage types?

Contributor

@sfc-gh-tzhang left a comment

Discussed offline with @sfc-gh-alhuang, thanks!

@sfc-gh-alhuang sfc-gh-alhuang merged commit eac448a into iceberg-support Jul 11, 2024
13 checks passed
@sfc-gh-alhuang sfc-gh-alhuang deleted the alhuang-iceberg-multiple-stages branch July 11, 2024 20:01