-
Notifications
You must be signed in to change notification settings - Fork 247
reduce
Chris Lu edited this page Nov 11, 2015
·
3 revisions
Reduce() is another important function for MapReduce.
A Reduce() step is divided into 2 steps:
- Reduce locally for each dataset shard.
- Merge results from step 1, and reduced to a single value.
Here are the actual Reduce() source code:
func (d *Dataset) Reduce(f interface{}) (ret *Dataset) {
return d.LocalReduce(f).MergeReduce(f)
}
Reduce(), LocalReduce(), MergeReduce() accept the same kind of function as parameter:
func(Value, Value)Value // Value is any user defined type
When this reducer function runs, it takes 2 values of "Value" type, and returns 1 value of the same type.
The initial value would be the zero value for type "Value".
Here are the actual ReduceByKey() source code:
func (d *Dataset) ReduceByKey(f interface{}) *Dataset {
return d.LocalSort(nil).LocalReduceByKey(f).MergeSorted(nil).LocalReduceByKey(f)
}
It will perform these steps:
- LocalSort() sorts the (Key,Value) pairs by Key on each local dataset shard
- LocalReduceByKey() locally reduce Values from the same Key, into (Key, Value) pairs. Now the Key within each local dataset shard will be unique.
- MergeSorted() merges each locally sorted reduced results
- LocalReduceByKey() runs local reduce again, but this time there is only one dataset shard after the merge.
ReduceByKey(), LocalReduceByKey() accept still the same kind of function as parameter:
func(Value, Value)Value // Value is any user defined type