My input tiles are 16-bit, shape (2000, 1600, 420), 2.563 GB each. The job runs all the way up to the end, where it exports the solution TIFFs to the Google Cloud Storage bucket, and then:
22/05/24 06:23:05 INFO TaskSchedulerImpl: Removed TaskSet 55.0, whose tasks have all completed, from pool
22/05/24 06:23:05 INFO DAGScheduler: ResultStage 55 (foreach at FlatfieldCorrectionSolver.java:199) finished in 12601.346 s
22/05/24 06:23:05 INFO DAGScheduler: Job 34 is finished. Cancelling potential speculative or zombie tasks for this job
22/05/24 06:23:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 55: Stage finished
22/05/24 06:23:05 INFO DAGScheduler: Job 34 finished: foreach at FlatfieldCorrectionSolver.java:199, took 12601.395705 s
22/05/24 06:23:05 INFO TorrentBroadcast: Destroying Broadcast(62) (from destroy at FlatfieldCorrectionSolver.java:394)
Stack is larger than 4GB. Most TIFF readers will only open the first image. Use this information to open as raw:
name=Untitled, dir=, width=2000, height=1600, nImages=420, offset=4233, gap=0, type=float, byteOrder=big, format=0, url=, whiteIsZero=f, lutSize=0, comp=1, ranges=null, samples=1
22/05/24 08:14:32 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/05/24 08:14:33 INFO MemoryStore: MemoryStore cleared
22/05/24 08:14:33 INFO BlockManager: BlockManager stopped
22/05/24 08:14:33 INFO BlockManagerMaster: BlockManagerMaster stopped
22/05/24 08:14:33 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/05/24 08:14:33 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.OutOfMemoryError: Required array size too large
at java.nio.file.Files.readAllBytes(Files.java:3156)
at org.janelia.dataaccess.googlecloud.GoogleCloudDataProvider.saveImage(GoogleCloudDataProvider.java:246)
at org.janelia.flatfield.FlatfieldCorrection.saveSolutionComponent(FlatfieldCorrection.java:487)
at org.janelia.flatfield.FlatfieldCorrection.run(FlatfieldCorrection.java:394)
at org.janelia.flatfield.FlatfieldCorrection.run(FlatfieldCorrection.java:195)
at org.janelia.flatfield.FlatfieldCorrection.main(FlatfieldCorrection.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/05/24 08:14:33 INFO ShutdownHookManager: Shutdown hook called
22/05/24 08:14:33 INFO ShutdownHookManager: Deleting directory /home/jupyter/spark/local/spark-86ed0dcc-292c-48d0-b42a-6ec045286b6a
22/05/24 08:14:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-40ada513-d5d3-46fe-90bf-83df8ec8b5f7
We run into that pesky 2GB limit on the size of a single byte[]:

stitching-spark/src/main/java/org/janelia/dataaccess/googlecloud/GoogleCloudDataProvider.java, line 222 in e118564
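For context, the failing pattern boils down to the sketch below (the local path is hypothetical; the actual call site is the saveImage line in the stack trace above). Files.readAllBytes must materialize the whole file as one byte[], and Java arrays are indexed by int, so anything over Integer.MAX_VALUE bytes (~2.1 GB) fails before the upload even begins:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadAllBytesLimit {
    public static void main(String[] args) throws IOException {
        // Hypothetical local path standing in for the exported solution TIFF.
        Path localTiff = Paths.get("/tmp/solution.tif");

        // A Java array can hold at most Integer.MAX_VALUE (2^31 - 1) bytes,
        // so for a ~2.563 GB file this throws
        // "java.lang.OutOfMemoryError: Required array size too large".
        byte[] bytes = Files.readAllBytes(localTiff);
        System.out.println("read " + bytes.length + " bytes");
    }
}
```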
It would be useful to document this limitation so that others might avoid having to re-run the entire pipeline with smaller chunks.
Alternatively, there are methods to create blobs from Paths in newer versions of the google-cloud-storage Java API that could come in handy here. See https://github.com/googleapis/java-storage/blob/854d7a3edcab88c410ccf7947dbec36bd5ba4585/google-cloud-storage/src/main/java/com/google/cloud/storage/StorageImpl.java#L209-L213
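Roughly, that would allow something like the following (a sketch assuming a newer google-cloud-storage that has Storage.createFrom(BlobInfo, Path, ...); the bucket and object names are placeholders):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.nio.file.Paths;

public class CreateFromPathSketch {
    public static void main(String[] args) throws IOException {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Placeholder bucket/object names.
        BlobInfo blobInfo = BlobInfo
                .newBuilder(BlobId.of("my-bucket", "flatfield/solution.tif"))
                .setContentType("image/tiff")
                .build();

        // createFrom streams the file in fixed-size chunks through a
        // resumable upload, so no whole-file byte[] is ever allocated.
        storage.createFrom(blobInfo, Paths.get("/tmp/solution.tif"));
    }
}
```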
This repo pulls in google-cloud-storage 1.106.0 through n5-google-cloud 3.2.1, which does not come with those methods. Maybe a good time to update those dependencies?
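As a stopgap before any dependency bump, I believe Storage.writer(BlobInfo, ...) is already present in 1.106.0; it returns a WriteChannel that performs a chunked resumable upload, so saveImage could stream the file instead of calling Files.readAllBytes. A minimal sketch (paths and names hypothetical):

```java
import com.google.cloud.WriteChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StreamingUploadSketch {
    public static void main(String[] args) throws IOException {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        BlobInfo blobInfo = BlobInfo
                .newBuilder(BlobId.of("my-bucket", "flatfield/solution.tif")) // placeholders
                .build();

        // The WriteChannel buffers and uploads in chunks, so the 2GB
        // byte[] ceiling never comes into play.
        try (WriteChannel writer = storage.writer(blobInfo)) {
            Files.copy(Paths.get("/tmp/solution.tif"), Channels.newOutputStream(writer));
        }
    }
}
```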