-
Notifications
You must be signed in to change notification settings - Fork 247
reduce
Reduce() is another important function for MapReduce.
A Reduce() step is divided into 2 steps:
- Reduce locally for each dataset shard.
- Merge results from step 1, and reduced to a single value.
Here are the actual Reduce() source code:
func (d *Dataset) Reduce(f interface{}) (ret *Dataset) {
return d.LocalReduce(f).MergeReduce(f)
}
Reduce(), LocalReduce(), MergeReduce() accept the same kind of function as parameter:
func(Value, Value)Value // Value is any user defined type
When this reducer function runs, it takes 2 values of "Value" type, and returns 1 value of the same type.
The initial value would be the zero value for type "Value".
Here are the actual ReduceByKey() source code:
func (d *Dataset) ReduceByKey(f interface{}) *Dataset {
return d.LocalSort(nil).LocalReduceByKey(f).MergeSorted(nil).LocalReduceByKey(f)
}
It will perform these steps:
- LocalSort() sorts the (Key,Value) pairs by Key on each local dataset shard
- LocalReduceByKey() locally reduce Values from the same Key, into (Key, Value) pairs. Now the Key within each local dataset shard will be unique.
- MergeSorted() merges each locally sorted reduced results
- LocalReduceByKey() runs local reduce again, but this time there is only one dataset shard after the merge.
ReduceByKey(), LocalReduceByKey() accept still the same kind of function as parameter:
func(Value, Value)Value // Value is any user defined type
You may notice ReduceByKey() is using nil, which is a default lessThanFunc to compare keys. It will work when keys are integer, string, or float, but not work for user defined key type. ReduceByUserDefinedKey() is to customize lessThanFunc for user defined keys
func (d *Dataset) ReduceByUserDefinedKey(lessThanFunc interface{}, reducer interface{}) *Dataset {
return d.LocalSort(lessThanFunc).LocalReduceByKey(reducer).MergeSorted(lessThanFunc).LocalReduceByKey(reducer)
}
Here are the function types:
lessThanFunc: func(Key, Key) bool // Key is any user defined type for key
reducer: func(Value, Value) Value // Value is any user defined type for value