My dataset is composed of 1200 tiles of shape [2000,1600,105], unsigned 16-bit ints. Each tile is ~640MB, and the total dataset is 768GB.
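As a quick sanity check of those sizes (a sketch with hypothetical variable names; note the quoted 768GB is 1200 × 640, while in strict binary units a tile is ~641 MiB and the total is ~751 GiB):

```python
import math

# Per-tile size: 2000 x 1600 x 105 voxels of uint16 (2 bytes each).
tile_shape = (2000, 1600, 105)
bytes_per_voxel = 2

voxels_per_tile = math.prod(tile_shape)
bytes_per_tile = voxels_per_tile * bytes_per_voxel
mib_per_tile = bytes_per_tile / 2**20          # ~641 MiB, matching the ~640MB figure

# Full dataset: 1200 tiles.
n_tiles = 1200
total_gib = n_tiles * bytes_per_tile / 2**30   # ~751 GiB (~768,000 MiB)

print(f"{mib_per_tile:.0f} MiB/tile, {total_gib:.0f} GiB total")
```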
I tried several Dataproc cluster configurations, but the job would always run out of memory before finishing.
Here is the log from one such failed run:
Working interval is at [0, 0, 0] of size [2000, 1600, 105]
Working with stack of size 1120
Output directory: gs://xxxx/spark-stitching-test/tile_config-flatfield/fullsize/solution
Running flatfield correction script in 3D mode
Histogram intensity range: min=0.0, max=596.0
Background intensity value: 2.0
Binning the input stack and saving as N5 blocks...
22/05/31 09:36:11 ERROR org.apache.spark.scheduler.AsyncEventQueue: Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
22/05/31 09:36:11 WARN org.apache.spark.scheduler.AsyncEventQueue: Dropped 1 events from eventLog since the application started.
Collected reference histogram of size 258 (first and last bins are tail bins):
[0.0, 437.7783683452381, 81.77367467857142, 46.8757134047619, 47.477869, 24.27659158333333, 20.217159428571428, 24.67996761904762, 13.484565666666667, 11.611294119047619, 14.610392464285715, 8.293473321428571, 7.379666666666667, 9.752576595238095, 5.83438805952381, 5.138283369047619, 6.672545178571428, 3.938613880952381, 3.604872380952381, 4.9050658333333335, 2.9963505, 2.8183993095238096, 3.935544, 2.454618130952381, 2.342203142857143, 3.3578584166666667, 2.1719595833333334, 2.1521068928571427, 3.1912209285714286, 2.0808182619047617, 2.057839535714286, 3.0716244761904763, 2.0486201904761905, 2.0560531785714287, 3.1066629285714287, 2.091556119047619, 2.1118542023809526, 3.2133971071428573, 2.176664880952381, 2.2079175714285713, 3.376959154761905, 2.293583488095238, 2.3201689880952383, 3.5095552261904763, 2.3501815, 2.354911095238095, 3.5369314166666665, 2.358748130952381, 2.3579966904761904, 3.5322006071428573, 2.350168214285714, 2.3450787976190477, 3.5074072857142857, 2.330146857142857, 2.3218056666666667, 3.4597665714285712, 2.2850581785714286, 2.2618299404761903, 3.343573261904762, 2.1940888095238096, 2.165262333333333, 3.1936283452380954, 2.092879726190476, 2.0641206666666667, 2.0355645, 3.0004284285714284, 1.9654259047619047, 1.9381174285714287, 2.8543376904761906, 1.866112130952381, 1.835421869047619, 2.6926035119047618, 1.7538462976190476, 1.7213971547619047, 2.52253375, 1.6434050476190476, 1.6141225714285714, 2.367681869047619, 1.544810738095238, 1.5184377976190475, 2.2299729642857145, 1.4552315119047619, 1.4303141071428571, 2.099422119047619, 1.3690689404761904, 1.3446135714285714, 1.971852369047619, 1.285029369047619, 1.2631100238095239, 1.854925011904762, 1.2108501428571428, 1.1915200595238096, 1.752440119047619, 1.1456031309523809, 1.127990380952381, 1.658482880952381, 1.0839164285714287, 1.0666084642857143, 1.567604988095238, 1.0240427142857143, 1.0078844166666667, 1.4823280357142856, 0.969960880952381, 0.9559713928571428, 1.4111897142857144, 
0.9262692261904761, 0.91561625, 1.3544762023809525, 0.890556880952381, 0.8807474166666667, 1.3031114047619048, 0.8570969404761904, 0.8480894404761905, 1.2557332261904761, 0.8268718095238096, 0.8195214285714286, 1.2149739404761906, 0.8017395833333333, 0.7954338333333333, 1.1819282976190477, 0.7809344761904762, 0.7757366904761904, 1.154225892857143, 0.7637096428571428, 0.7590340238095238, 1.1298775833333334, 0.7474632142857143, 0.7429755833333334, 0.7380995952380952, 1.0983775119047618, 0.7262675357142857, 0.7218658214285715, 1.074690619047619, 0.7112021190476191, 0.7072245, 1.0541304523809525, 0.6980834047619048, 0.6944824642857143, 1.0356507261904762, 0.6859677023809524, 0.6826705119047619, 1.0172985833333332, 0.6741086428571429, 0.6706799523809523, 0.9999159642857143, 0.6627480238095238, 0.6596391547619047, 0.9839666190476191, 0.6525576785714285, 0.6499785, 0.9700708690476191, 0.6436101428571429, 0.6411394047619048, 0.9570107976190476, 0.6348474285714286, 0.6324135357142857, 0.9440853333333333, 0.6262217380952381, 0.6238707380952381, 0.9315843095238096, 0.6182572619047619, 0.6162029523809524, 0.9202849523809524, 0.6108643452380952, 0.6087792261904762, 0.909237630952381, 0.6035397023809523, 0.6010273333333334, 0.8971875238095238, 0.5952812619047619, 0.5925989523809524, 0.8841085595238095, 0.5863143928571428, 0.5834745238095238, 0.8707290238095238, 0.5772933214285715, 0.5747278333333333, 0.8573223571428571, 0.5682029047619047, 0.5656684404761905, 0.84311375, 0.5587734285714285, 0.5557833214285715, 0.8280252023809523, 0.5484250476190476, 0.5450879761904762, 0.81207725, 0.5374864880952381, 0.5344990833333333, 0.7961647142857143, 0.5270869523809524, 0.5241008214285714, 0.5211760833333333, 0.7760217142857143, 0.5136853928571429, 0.5109437380952381, 0.7608697261904762, 0.5039475833333333, 0.5011205595238095, 0.7468406666666667, 0.49464491666666666, 0.49217063095238095, 0.7338129523809523, 0.4861535, 0.48381580952380954, 0.7214703452380953, 0.47814615476190475, 
0.476042619047619, 0.7102251071428571, 0.47076435714285714, 0.4687870238095238, 0.6995748452380952, 0.4640189880952381, 0.46221634523809524, 0.6898104880952382, 0.45763016666666667, 0.4558778095238095, 0.6806325357142857, 0.4518012619047619, 0.4501145357142857, 0.6721814761904762, 0.4463145, 0.44457219047619045, 0.6640260595238096, 0.440697380952381, 0.43915423809523807, 0.6558015952380952, 0.43531659523809524, 0.4337336785714286, 0.6476901428571429, 0.4297576904761905, 0.4282166785714286, 0.6391359642857143, 0.4241249047619048, 0.42246446428571427, 0.6305123809523809, 0.4181372857142857, 0.41641934523809526, 0.6212703928571428, 0.4119727976190476, 0.4100477261904762, 0.6116464404761904, 0.40528240476190475, 0.40338982142857144, 0.6014080714285714, 0.3984663333333333, 0.3963685476190476, 0.5907914642857143, 0.39122344047619045, 0.38909815476190474, 0.5797881190476191, 0.3837952261904762, 0.38173467857142857, 0.5684315595238095, 0.37630688095238096, 0.37413361904761905, 0.5570520952380953, 56.52632755952381]
Solving for scale 6: size=[31, 25, 2], model=AffineModel, regularizer=IdentityModel
Solving for scale 5: size=[63, 50, 3], model=AffineModel, regularizer=AffineModel
Solving for scale 4: size=[125, 100, 7], model=AffineModel, regularizer=AffineModel
Solving for scale 3: size=[250, 200, 13], model=AffineModel, regularizer=AffineModel
Solving for scale 2: size=[500, 400, 26], model=FixedScalingAffineModel, regularizer=AffineModel
Solving for scale 1: size=[1000, 800, 53], model=FixedScalingAffineModel, regularizer=AffineModel
Solving for scale 0: size=[2000, 1600, 105], model=FixedScalingAffineModel, regularizer=AffineModel
22/05/31 09:58:34 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@16073fa8{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at net.imglib2.img.basictypeaccess.array.AbstractDoubleArray.<init>(AbstractDoubleArray.java:50)
at net.imglib2.img.basictypeaccess.array.DoubleArray.<init>(DoubleArray.java:47)
at net.imglib2.img.basictypeaccess.array.DoubleArray.createArray(DoubleArray.java:58)
at net.imglib2.img.basictypeaccess.array.DoubleArray.createArray(DoubleArray.java:43)
at net.imglib2.img.array.ArrayImgFactory.create(ArrayImgFactory.java:91)
at net.imglib2.img.array.ArrayImgFactory.create(ArrayImgFactory.java:68)
at net.imglib2.img.array.ArrayImgs.doubles(ArrayImgs.java:558)
at org.janelia.flatfield.FlatfieldCorrectionSolver.unpivotSolution(FlatfieldCorrectionSolver.java:414)
at org.janelia.flatfield.FlatfieldCorrection.run(FlatfieldCorrection.java:391)
at org.janelia.flatfield.FlatfieldCorrection.run(FlatfieldCorrection.java:195)
at org.janelia.flatfield.FlatfieldCorrection.main(FlatfieldCorrection.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The configuration that finally worked:
1 m1-megamem-96 node (96 vCPUs, 1.4TB RAM) acting as both master and worker
500GB SCSI standard persistent disk
1 local 375GB NVMe disk
I did not change any YARN/Spark cluster or job properties.
The job took 25.1hrs to run.
From the executors page, peak JVM on-heap memory reached ~60GB per executor (full disclosure, I don't have a great idea of what these metrics mean).
With 8 cores per executor, that works out to roughly 8GB per core.
At ~8GB per core across all 96 cores, that gives 8 * 96 = 768GB of required memory, which is the size of my full dataset.
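The back-of-envelope estimate above, as a sketch (variable names are mine, values taken from the executors page):

```python
# Rough per-core memory estimate from the observed executor peak.
peak_onheap_gb_per_executor = 60   # observed peak JVM on-heap memory
cores_per_executor = 8

gb_per_core = peak_onheap_gb_per_executor / cores_per_executor  # 7.5, i.e. ~8GB/core

# Scale up to the whole node: one m1-megamem-96 has 96 vCPUs.
total_cores = 96
estimated_total_gb = round(gb_per_core) * total_cores           # 8 * 96 = 768

print(gb_per_core, estimated_total_gb)
```

Which is why the estimated requirement lands exactly on the dataset size.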
Is this expected in the general case? Does it depend on the number of cores used?
Thank you,
Cameron
P.S. Is this step mandatory, or can I skip straight to stitching after converting the input tiles to N5?