Skip to content
Chris Lu edited this page Nov 11, 2015 · 3 revisions

Reduce() is another important function for MapReduce.

Reduce(), LocalReduce(), MergeReduce()

A Reduce() step is divided into 2 steps:

  1. Reduce locally for each dataset shard.
  2. Merge results from step 1, and reduced to a single value.

Here are the actual Reduce() source code:

func (d *Dataset) Reduce(f interface{}) (ret *Dataset) {
	return d.LocalReduce(f).MergeReduce(f)
}

Reduce(), LocalReduce(), MergeReduce() accept the same kind of function as parameter:

  func(Value, Value)Value  // Value is any user defined type

When this reducer function runs, it takes 2 values of "Value" type, and returns 1 value of the same type.

The initial value would be the zero value for type "Value".

ReduceByKey(), LocalReduceByKey()

Here are the actual ReduceByKey() source code:

func (d *Dataset) ReduceByKey(f interface{}) *Dataset {
	return d.LocalSort(nil).LocalReduceByKey(f).MergeSorted(nil).LocalReduceByKey(f)
}

It will perform these steps:

  1. LocalSort() sorts the (Key,Value) pairs by Key on each local dataset shard
  2. LocalReduceByKey() locally reduce Values from the same Key, into (Key, Value) pairs. Now the Key within each local dataset shard will be unique.
  3. MergeSorted() merges each locally sorted reduced results
  4. LocalReduceByKey() runs local reduce again, but this time there is only one dataset shard after the merge.

ReduceByKey(), LocalReduceByKey() accept still the same kind of function as parameter:

  func(Value, Value)Value  // Value is any user defined type
Clone this wiki locally