SNOW-1675591 Fill in ExternalVolume and ExternalVolumeManager to do presigned url retrieval + blobname population #837

Merged: 3 commits merged into master from hmadan-iceberg-sep19 on Sep 25, 2024

Conversation

sfc-gh-hmadan (Collaborator)

  1. Add a GeneratePresignedUrls API call (new contracts + snowflakeClient API)
  2. Enhance ExternalVolume.java to do presigned URL file uploads to S3 (GCP and Azure testing is pending, so that code is commented out); a minimal upload sketch follows this list
  3. Fix JSON field names where applicable
  4. Add @JsonIgnoreProperties(ignoreUnknown = true) for forward compatibility with service-side evolution to some contracts that were missing this tag (a contract sketch also follows this list)

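A minimal sketch of the presigned-URL upload path from item 2, assuming the URL already grants PUT access to the target blob; the class and method names are illustrative, not the actual ExternalVolume code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class PresignedUrlUploader {
  // Uploads a serialized blob with a plain HTTP PUT against the presigned URL,
  // so no cloud-provider SDK credentials are needed on the client.
  static void upload(String presignedUrl, byte[] blob) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(presignedUrl).openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("PUT");
    conn.setFixedLengthStreamingMode(blob.length);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(blob);
    }
    int status = conn.getResponseCode();
    if (status < 200 || status >= 300) {
      throw new IOException("Presigned URL upload failed with HTTP " + status);
    }
    conn.disconnect();
  }
}
```

And a sketch of the contract hardening from items 1, 3, and 4: Jackson's @JsonIgnoreProperties(ignoreUnknown = true) lets the client skip fields a newer service adds, while JsonInclude.Include.NON_NULL keeps unset fields out of the serialized payload. The class and field names here are hypothetical, not the PR's actual contracts:

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.List;

// Unknown fields added later by the service are skipped instead of failing
// deserialization, which is the forward-compatibility point in item 4.
@JsonIgnoreProperties(ignoreUnknown = true)
// Serialize only non-null fields so older service versions never see fields
// the client did not populate.
@JsonInclude(JsonInclude.Include.NON_NULL)
class GeneratePresignedUrlsResponse {
  @JsonProperty("status_code")
  Long statusCode;

  @JsonProperty("message")
  String message;

  @JsonProperty("presigned_url_infos")
  List<PresignedUrlInfo> presignedUrlInfos;

  @JsonIgnoreProperties(ignoreUnknown = true)
  static class PresignedUrlInfo {
    @JsonProperty("file_name")
    String fileName;

    @JsonProperty("url")
    String url;
  }
}
```
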
fix unit tests

change non_default to non_null
@sfc-gh-psaha (Contributor) left a comment

Looks mostly good. Just some minor comments.

On this excerpt from the diff:

```java
}
}
if (generate) {
// TODO: do this generation on a background thread to allow the current thread to make
```
sfc-gh-psaha (Contributor)

I feel like the background thread approach would also be simpler: you won't need all the Semaphore business. The background thread could just keep refreshing in a loop whenever it sees the remaining URLs drop below a low watermark (a sketch of this follows below).

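A minimal sketch of the background-refresh approach suggested here, not the code in this PR; the queue type, poll interval, watermark, and batch size are all assumptions:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PresignedUrlRefresher implements Runnable {
  private final BlockingQueue<String> cachedUrls = new LinkedBlockingQueue<>();
  private final int lowWatermark = 5; // refill when fewer URLs than this remain
  private final int batchSize = 20;   // hypothetical refill size
  private volatile boolean running = true;

  @Override
  public void run() {
    while (running) {
      try {
        if (cachedUrls.size() < lowWatermark) {
          // One blocking RPC per refill; callers never pay this latency directly.
          cachedUrls.addAll(generatePresignedUrls(batchSize));
        }
        Thread.sleep(100); // simple poll interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  // Callers just take from the cache and block if the refresher has fallen behind.
  String takeUrl() throws InterruptedException {
    return cachedUrls.take();
  }

  private List<String> generatePresignedUrls(int count) {
    // Placeholder for the GeneratePresignedUrls RPC added in this PR.
    return Collections.emptyList();
  }

  void stop() {
    running = false;
  }
}
```
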
sfc-gh-hmadan (Collaborator, Author)

Yes, a couple of reasons I kept this as simple as possible:

  1. There can be many external volumes (one per table). The right fix there would be a small thread pool that multiplexes over all the active external volumes, and potentially even does a batched retrieval instead of one RPC per external volume. This is doable, it's just a matter of work ordering.

  2. Rate matching: some external volumes will need 1 URL every minute, others will need 10 URLs every second. A proper fix needs the low watermark to be adaptive instead of the hardcoded 5 it is right now.

  3. Multipart upload support: for multipart upload, each part requires its own presigned URL and needs to be associated with an upload-id. In that case we can't cache URLs in advance, because we won't know (in advance) how many parts any given file will have, and we don't have a batch of upload-ids to quickly choose from either.

Of these three, I ended up with just a roundabout fix for (2) by allowing concurrent URL retrieval: as soon as there's high pressure on URL retrieval we end up caching a lot of URLs, which reduces the pressure (sketched below).
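
A minimal sketch of the roundabout fix described above; the names, refill size, and concurrency limit are assumptions rather than the actual ExternalVolume code:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

class OnDemandUrlCache {
  private static final int LOW_WATERMARK = 5;          // the hardcoded watermark mentioned above
  private static final int MAX_CONCURRENT_FETCHES = 4; // hypothetical bound

  private final ConcurrentLinkedQueue<String> cachedUrls = new ConcurrentLinkedQueue<>();
  private final Semaphore fetchPermits = new Semaphore(MAX_CONCURRENT_FETCHES);

  String getPresignedUrl() {
    // Refill whenever the cache runs low; under high pressure multiple callers
    // hold permits and fetch concurrently, so the cached supply catches up.
    if (cachedUrls.size() < LOW_WATERMARK && fetchPermits.tryAcquire()) {
      try {
        cachedUrls.addAll(generatePresignedUrls(20));
      } finally {
        fetchPermits.release();
      }
    }
    String url = cachedUrls.poll();
    if (url == null) {
      // Cache still empty: fall back to a direct single-URL fetch.
      List<String> fresh = generatePresignedUrls(1);
      url = fresh.isEmpty() ? null : fresh.get(0);
    }
    return url;
  }

  private List<String> generatePresignedUrls(int count) {
    // Placeholder for the GeneratePresignedUrls RPC added in this PR.
    return Collections.emptyList();
  }
}
```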

- Fix the test failures due to new JSON fields that the already-deployed Snowflake service does not expect to see (it is not forward compatible). Minor cleanup along the way.
@sfc-gh-hmadan sfc-gh-hmadan merged commit 7b3881b into master Sep 25, 2024
45 checks passed
@sfc-gh-hmadan sfc-gh-hmadan deleted the hmadan-iceberg-sep19 branch September 25, 2024 20:48