Save / Load indexed spatial & partitioned Rdd #1213

vbmacher · 2024-01-22T15:01:31Z

Expected behavior

Maybe this is already possible, but I haven't found it anywhere. I'm relatively new to Sedona and geoprocessing.
I'd like to be able to save a spatial RDD that has already been analyzed, spatially partitioned and, optionally, indexed, and then load it back. In our use case, many jobs work with the same spatial dataset, and re-creating the partitioning and rebuilding the index in every job is time-consuming.
Not sure if it's possible though.

For example:

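import org.apache.sedona.core.enums.{GridType, IndexType, JoinBuildSide}
import org.apache.sedona.core.spatialOperator.{JoinQuery, SpatialPredicate}
import org.apache.sedona.sql.utils.Adapter

// save once:
val spatialRdd = Adapter.toSpatialRdd(df, ...)
spatialRdd.analyze()
// Cap the partition count at half the record count; otherwise Sedona throws
// IllegalArgumentException: [Sedona] Number of partitions cannot be larger than half of total records num
spatialRdd.spatialPartitioning(GridType.KDBTREE, math.min(Integer.MAX_VALUE, df.count() / 2).toInt)
spatialRdd.buildIndex(IndexType.RTREE, true) // true = build the index on the spatially partitioned RDD
SomeSedonaUtility.saveSpatialRdd(spatialRdd, path) // <-- hypothetical (the requested feature): save with partitioning and index

// load & use multiple times:
val rdd = SomeSedonaUtility.loadSpatialRdd(path) // <-- hypothetical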

// and usage:
val otherRdd = Adapter.toSpatialRdd(otherDs, ...)
otherRdd.spatialPartitioning(rdd.getPartitioner)

val useIndex = true
val spatialPredicate = SpatialPredicate.COVERS
val params = new JoinQuery.JoinParams(useIndex, spatialPredicate, IndexType.RTREE, JoinBuildSide.LEFT)

val joined = JoinQuery.spatialJoin(rdd, otherRdd, params)
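The partition count passed to `spatialPartitioning` above is capped at half the record count to avoid the `IllegalArgumentException`, but for a DataFrame with fewer than two rows the same expression evaluates to 0. A minimal, Spark-free sketch of a helper that makes both bounds explicit; `PartitionCountSketch.numPartitions` is a hypothetical name (not part of Sedona), and the lower bound of 1 is an assumption rather than something stated in this thread:

```scala
object PartitionCountSketch {
  /** Number of spatial partitions to request for `recordCount` records.
    * Upper bound: half the record count (from the Sedona error message) and Int.MaxValue.
    * Lower bound: 1 (assumed). Returns None when no valid value exists.
    */
  def numPartitions(recordCount: Long, desired: Int = Int.MaxValue): Option[Int] = {
    // Long arithmetic first, so a huge count cannot overflow before the cap is applied
    val upper = math.min(recordCount / 2, Int.MaxValue.toLong)
    if (upper < 1) None
    else Some(math.max(1L, math.min(desired.toLong, upper)).toInt)
  }

  def main(args: Array[String]): Unit = {
    println(numPartitions(10L)) // Some(5)
    println(numPartitions(1L))  // None
  }
}
```

Returning `Option` makes the caller decide what to do with very small inputs (for example, skip spatial partitioning) instead of passing 0 to Sedona.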

Actual behavior

As far as I know, the partitioning and the index have to be rebuilt at runtime in every job.

Steps to reproduce the problem

The feature is missing, so it's not possible to reproduce it.

Settings

Sedona version = 1.5.1

Apache Spark version = 3.5

API type = Scala

Scala version = 2.12

JRE version = 1.8

Environment = EMR


jiayuasu · 2024-01-25T18:09:19Z

@vbmacher Unfortunately, a spatially partitioned RDD cannot be saved and loaded back, because doing so leads to wrong results. See the explanation here: https://sedona.apache.org/1.5.1/tutorial/rdd/#save-an-spatialrdd-spatialpartitioned-wo-indexed
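This thread does not spell out the mechanism, so the following is only a toy model in plain Scala, not Sedona code. It assumes a partition-wise join that compares partition i of one side only with partition i of the other side (the assumption is that a join over two spatially partitioned RDDs works this way), and a reload that returns the partitions in a different order without the partitioner. `GridPartitioner`, `partition` and `zipJoin` are hypothetical names for illustration:

```scala
object PartitionAlignmentSketch {
  // Toy 1-D "grid partitioner": cell i covers [i * width, (i + 1) * width).
  // Assumes non-negative coordinates; values past the last cell go into it.
  final case class GridPartitioner(width: Int, numCells: Int) {
    def cellOf(x: Int): Int = math.min(x / width, numCells - 1)
  }

  // Partition a dataset: partition i holds exactly the values that fall in cell i.
  def partition(data: Seq[Int], p: GridPartitioner): Vector[Vector[Int]] =
    Vector.tabulate(p.numCells)(i => data.filter(x => p.cellOf(x) == i).toVector)

  // Partition-wise equi-join: partition i on the left is compared only with
  // partition i on the right; nothing is shuffled between partitions.
  def zipJoin(left: Vector[Vector[Int]], right: Vector[Vector[Int]]): Seq[(Int, Int)] =
    left.zip(right).flatMap { case (l, r) =>
      for (a <- l; b <- r if a == b) yield (a, b)
    }

  def main(args: Array[String]): Unit = {
    val p = GridPartitioner(width = 10, numCells = 3)
    val leftParts = partition(Seq(1, 5, 12, 18, 25), p)
    val rightParts = partition(Seq(5, 12, 25, 29), p)

    // Both sides share one partitioner, so partition i means cell i on both sides.
    println(s"co-partitioned join: ${zipJoin(leftParts, rightParts)}")

    // Simulated reload: same partitions, different order, partitioner lost.
    val reloadedLeft = leftParts.reverse
    println(s"after reload:        ${zipJoin(reloadedLeft, rightParts)}")
  }
}
```

With aligned partitions all three matches (5, 12 and 25) are found; once partition i no longer corresponds to cell i, only the pair that happens to stay in the middle partition survives. A real reload can also change the number of partitions, which breaks the alignment in the same way.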

vbmacher · 2024-01-31T10:27:45Z

Thanks @jiayuasu. I also read there that it is possible to save an indexed RDD (https://sedona.apache.org/1.5.1/tutorial/rdd/#save-an-spatialrdd-indexed), but as far as I know, building an index requires spatial partitioning. So if I save the indexed RDD and then load it back, the partitioning won't be set up; will the index still work?

Also, if possible, I'd like to know more details about this:

"We are working on some solutions. Stay tuned!"

Is this something we can expect in the next release? Thanks!
