Expected behavior
Maybe this is already possible, but I haven't found it anywhere. I'm relatively new to Sedona and geoprocessing.
I'd like to be able to save and then load a spatial RDD that has already been analyzed and partitioned, possibly together with its index. In our use case, many jobs use the same spatial dataset, and it is time-consuming to redo the partitioning and rebuild the index every time.
Not sure if it's possible though.
For example:
// save once:
val spatialRdd = Adapter.toSpatialRdd(df, ...)
spatialRdd.analyze()
// Cap the partition count at half the record count; otherwise Sedona throws
// IllegalArgumentException: [Sedona] Number of partitions cannot be larger than half of total records num
spatialRdd.spatialPartitioning(GridType.KDBTREE, math.min(Integer.MAX_VALUE, df.count() / 2).toInt)
spatialRdd.buildIndex(IndexType.RTREE, true)
SomeSedonaUtility.saveSpatialRdd(spatialRdd, path) // <-- save with index and partitioned
// load & use multiple times:
val rdd = SomeSedonaUtility.loadSpatialRdd(path)
// and usage:
val otherRdd = Adapter.toSpatialRdd(otherDs, ...)
otherRdd.spatialPartitioning(rdd.getPartitioner)
val useIndex = true
val spatialPredicate = SpatialPredicate.COVERS
val params = new JoinQuery.JoinParams(useIndex, spatialPredicate, IndexType.RTREE, JoinBuildSide.LEFT)
val joined = JoinQuery.spatialJoin(rdd, otherRdd, params)
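The partition count in the example is capped because of the exception quoted in the comment. As a minimal sketch of that cap in plain Scala (the `PartitionCount` object and its `capped` method are hypothetical names, not part of Sedona), assuming the only constraint is the one named in the error message:

```scala
object PartitionCount {
  // Returns the largest partition count that is at most `desired` and at most
  // half of `recordCount`, or None when no positive count satisfies both.
  def capped(recordCount: Long, desired: Int): Option[Int] = {
    val upper = recordCount / 2 // integer division, so 3 records allow 1 partition
    if (upper < 1 || desired < 1) None
    else Some(math.min(desired.toLong, upper).toInt)
  }
}
```

Returning an `Option` makes the tiny-input case explicit instead of passing 0 partitions to `spatialPartitioning`.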
Actual behavior
As far as I know, the partitioning and the index must be rebuilt at runtime in every job.
Steps to reproduce the problem
The feature is missing, so it's not possible to reproduce it.
Settings
Sedona version = 1.5.1
Apache Spark version = 3.5
API type = Scala
Scala version = 2.12
JRE version = 1.8
Environment = EMR
Thanks @jiayuasu. I read there that it is also possible to save an indexed RDD (https://sedona.apache.org/1.5.1/tutorial/rdd/#save-an-spatialrdd-indexed), but to my knowledge building an index requires spatial partitioning. So when I save the indexed RDD and then reload it, the partitioning won't be set up, but will the index still work?
Also, I'd like to know more details about this one, if possible:
"We are working on some solutions. Stay tuned!"
Is this something we can expect in the next release? Thanks!
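For reference, the save and reload calls from the linked tutorial section can be wrapped roughly as follows. This is only a sketch: `IndexedRddStore`, `save` and `load` are hypothetical names, and it assumes the index is built on the raw RDD (`indexedRawRDD`), which is the case the tutorial shows, not on the spatially partitioned one.

```scala
import org.apache.sedona.core.enums.IndexType
import org.apache.sedona.core.spatialRDD.SpatialRDD
import org.apache.spark.SparkContext
import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.index.SpatialIndex

// Hypothetical wrapper around the calls shown in the Sedona RDD tutorial.
object IndexedRddStore {
  // Build an R-tree per partition of the raw RDD and write the trees as an object file.
  def save(rdd: SpatialRDD[Geometry], path: String): Unit = {
    rdd.buildIndex(IndexType.RTREE, false) // false = index rawSpatialRDD, not spatialPartitionedRDD
    rdd.indexedRawRDD.saveAsObjectFile(path)
  }

  // Read the saved trees back into a fresh SpatialRDD; only indexedRawRDD is populated.
  def load(sc: SparkContext, path: String): SpatialRDD[Geometry] = {
    val restored = new SpatialRDD[Geometry]
    restored.indexedRawRDD = sc.objectFile[SpatialIndex](path).toJavaRDD()
    restored
  }
}
```

An object file stores only the RDD's elements, so I would expect the partitioner not to come back on reload. That is my reading of how Spark object files work, not something confirmed in this thread.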