For testing and developing functions, it would be nice if local and Spark arrays had identical interfaces. This might be easiest to achieve by faking the SparkContext and RDD, and using BoltArraySpark for everything.
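A minimal sketch of what faking the Spark primitives could look like. All names here (`FakeRDD`, `FakeSparkContext`) are illustrative, not part of bolt or pyspark; the idea is just to mimic the small slice of the RDD API that a distributed ndarray implementation touches, backed by an in-memory list:

```python
class FakeRDD:
    """Mimics a subset of pyspark's RDD API, backed by a plain list
    of (key, value) records held in local memory."""

    def __init__(self, records):
        self.records = list(records)

    def map(self, func):
        # lazy-ish: build a new FakeRDD from the mapped records
        return FakeRDD(func(r) for r in self.records)

    def filter(self, func):
        return FakeRDD(r for r in self.records if func(r))

    def collect(self):
        return list(self.records)


class FakeSparkContext:
    """Stands in for pyspark.SparkContext when pyspark is unavailable."""

    def parallelize(self, data, numSlices=None):
        # numSlices is accepted for signature compatibility but ignored
        return FakeRDD(data)
```

With something like this, BoltArraySpark could be constructed against a `FakeSparkContext` in tests, so operations exercise the exact same code path with or without a cluster.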
We thought about trying something similar a while back. Recently, though, we have been leaning more toward removing the local version of the BoltArray entirely and making the project completely about a distributed ndarray. The thought was that this might be best in terms of modularity: a higher-level project could then wrap the NumPy array, the Bolt array, and maybe even the Dask array.
Would be interested to hear your thoughts on this.
@jwittenbach That seems like it might be a good direction to go in. I started on a localspark implementation that simply fakes the Spark backend, so that commands like chunk and unchunk work the same with or without pyspark (https://github.com/bolt-project/bolt/pull/101/files)
Another approach would be to define a generic ParallelBackend interface for BoltArrays that could be implemented on top of Spark, Flink, multiprocessing, etc., so that all operations are expressed the same way regardless of backend.
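To make the idea concrete, here is a hypothetical sketch of such an interface, with a trivial serial implementation for testing. The class and method names (`ParallelBackend`, `parallelize`, `map`, `collect`) are assumptions for illustration, not an existing bolt API:

```python
from abc import ABC, abstractmethod


class ParallelBackend(ABC):
    """Hypothetical minimal contract a BoltArray would program against.
    A SparkBackend would delegate to an RDD, a MultiprocessingBackend
    to a process pool, and so on."""

    @abstractmethod
    def parallelize(self, records):
        """Distribute an iterable of records, returning a backend handle."""

    @abstractmethod
    def map(self, handle, func):
        """Apply func to every record, returning a new handle."""

    @abstractmethod
    def collect(self, handle):
        """Materialize a handle back into a local list."""


class SerialBackend(ParallelBackend):
    """Reference implementation that runs everything in-process."""

    def parallelize(self, records):
        return list(records)

    def map(self, handle, func):
        return [func(r) for r in handle]

    def collect(self, handle):
        return list(handle)
```

Array operations written against `ParallelBackend` would then be testable with `SerialBackend` and deployable unchanged on a cluster-backed implementation.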