sort
Chris Lu edited this page Nov 23, 2015 · 4 revisions
Sorting is used in many functions, e.g., Join(), ReduceByKey().
A Sort() operation is divided into two steps:
- Sort locally for each dataset shard.
- Merge results from step 1.
Here is the actual Sort() source code:
func (d *Dataset) Sort(f interface{}) (ret *Dataset) {
	return d.LocalSort(f).MergeSorted(f)
}
Sort(), LocalSort(), and MergeSorted() all accept the same lessThanFunction as their parameter:
func(a Key, b Key) bool // Key is any user-defined type
When this function runs, it compares two keys of type Key and returns true if a is less than b.
The previous dataset should output either a (Key, Value) tuple or just a value of type Key.
If the Key is an integer, string, or float, the function parameter can simply be nil: Glow provides a default lessThanFunction for these key types.
flow.New().TextFile(
	"/etc/hosts", 7,
).Partition(
	2,
).Map(func(line string) string {
	return line
}).Sort(func(a string, b string) bool {
	// Same result as strings.Compare(a, b) < 0.
	// Since the key is a string, nil could be passed here instead.
	return a < b
}).Map(func(line string) {
	println(line)
}).Run()