You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.
importcom.twitter.summingbird.memory.Memoryvalsource=Memory.toSource(Seq(1,2,3))
valmapped= source.map { x => x ->1 }
valm=newMemoryvalstore1= scala.collection.mutable.Map.empty[Int, Int]
valstore2= scala.collection.mutable.Map.empty[Int, Int]
valtail= mapped.sumByKey(store1).also(mapped.sumByKey(store2))
m.run(m.plan(tail))
You would expect, for example, that store1 and store2 have the same contents. Instead, store1 has the expected contents while store2 is empty.
In Memory#toStream, we are careful to only compare producers on reference equality (Identity, etc.). But prior to that we call the DagOptimizer, which doesn't do the same.
Then we break the Producer equality (by giving it a different dependent Producer, or a different Store), the second sumByKey isn't optimized away, and the execution gives the expected result.
The text was updated successfully, but these errors were encountered:
I guess the store equality is screwing us here. We need each store to be
unique, right? Thus can be fixed by making a reference equality wrapper
type for stores.
On Tue, Aug 9, 2016 at 07:28 Joe Nievelt [email protected] wrote:
This may have impact beyond Memory.
If you setup a job like:
import com.twitter.summingbird.memory.Memory
val source = Memory.toSource(Seq(1,2,3))val mapped = source.map { x => x -> 1 }val m = new Memoryval store1 = scala.collection.mutable.Map.empty[Int, Int]val store2 = scala.collection.mutable.Map.empty[Int, Int]val tail = mapped.sumByKey(store1).also(mapped.sumByKey(store2))
m.run(m.plan(tail))
You would expect, for example, that store1 and store2 have the same
contents. Instead, store1 has the expected contents while store2 is empty.
In Memory#toStream, we are careful to only compare producers on reference
equality (Identity, etc.). But prior to that we call the DagOptimizer,
which doesn't do the same.
If we make either of the following changes:
val tail = mapped.sumByKey(store1).also(mapped.map(identity).sumByKey(store2))val store2 = scala.collection.mutable.Map(100 -> 100)
Then we break the Producer equality (by giving it a different dependent
Producer, or a different Store), the second sumByKey isn't optimized
away, and the execution gives the expected result.
This may have impact beyond
Memory
.If you setup a job like:
You would expect, for example, that
store1
andstore2
have the same contents. Instead,store1
has the expected contents whilestore2
is empty.In
Memory#toStream
, we are careful to only compare producers on reference equality (Identity
, etc.). But prior to that we call theDagOptimizer
, which doesn't do the same.If we make either of the following changes:
Then we break the
Producer
equality (by giving it a different dependentProducer
, or a differentStore
), the secondsumByKey
isn't optimized away, and the execution gives the expected result.The text was updated successfully, but these errors were encountered: