Skip to content
Chris Lu edited this page Nov 16, 2015 · 4 revisions

Sorting is used in many functions, e.g., Join(), ReduceByKey().

Sort(), LocalSort(), MergeSorted()

A Sort() step is divided into 2 steps:

  1. Sort locally for each dataset shard.
  2. Merge results from step 1.

Here are the actual Sort() source code:

func (d *Dataset) Sort(f interface{}) (ret *Dataset) {
	return d.LocalSort(f).MergeSorted(f)
}

Sort(), LocalSort(), MergeSorted() accept the same lessThanFunction as parameter:

  func(a Key, b Key)bool  // Key is any user defined type

When this sorter function runs, it compares 2 keys of "Key" type, and returns true if a is less than b.

The previous dataset should output tuple (Key, Value), or just type Key.

If the Key is kind of integer, string or float, the function parameter can be just nil. By default Glow has provided the lessThanFunction for these types of keys.

Example

	flow.New().TextFile(
		"/etc/hosts", 7,
	).Partition(
		2,
	).Map(func(line string) string {
		return line
	}).Sort(func(a string, b string) bool {
		if strings.Compare(a, b) < 0 {
			return true
		}
		return false
	}).Map(func(line string) {
		println(line)
	}).Run()

Glow PartitionAndSort

Clone this wiki locally