data source
Chris Lu edited this page Apr 27, 2016
A key feature of Glow is that all data can be strongly typed, including the input data. Every field of a data type must be a simple serializable type; pointers and channels are not supported.
There are two ways to feed data into Glow:
- Pull from a known location
- Push through a Go channel
Pulling is useful when you already know where to fetch the data. For example, you may know an HDFS folder that contains many files.
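import "github.com/colinmarc/hdfs" // also needs "log" and "os" from the standard library
...
flow.New().Source(func(outFiles chan os.FileInfo) {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	file, err := client.Open("/_test/fulldir3") // the HDFS directory to list
	if err != nil {
		log.Fatal(err)
	}
	res, err := file.Readdir(0) // 0 returns all entries
	if err != nil {
		log.Fatal(err)
	}
	for _, entry := range res {
		outFiles <- entry // emit each file entry into the flow
	}
})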
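Alternatively, when the data is produced elsewhere in your program, push it through a Go channel that you create and close yourself.
import "github.com/colinmarc/hdfs" // also needs "log" and "os" from the standard library
...
outFiles := make(chan os.FileInfo) // a nil channel would block forever
go func() {
	defer close(outFiles) // tell the flow that no more data is coming
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	file, err := client.Open("/_test/fulldir3")
	if err != nil {
		log.Fatal(err)
	}
	res, err := file.Readdir(0)
	if err != nil {
		log.Fatal(err)
	}
	for _, entry := range res {
		outFiles <- entry
	}
}()
flow.New().Channel(outFiles)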
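The difference between the two feeding styles can be sketched with plain Go channels, without Glow or HDFS. The helpers `pull` and `collect` below are hypothetical stand-ins written for this illustration; in particular, closing the channel once the producer function returns is an assumption about how a pull-style source might behave, not a statement about Glow's internals.

```go
package main

import "fmt"

// pull runs a caller-supplied producer in its own goroutine and closes the
// channel once the producer returns. Hypothetical stand-in for a pull source.
func pull(produce func(out chan string)) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		produce(out)
	}()
	return out
}

// collect drains ch until it is closed and returns everything it received.
func collect(ch chan string) []string {
	var got []string
	for s := range ch {
		got = append(got, s)
	}
	return got
}

func main() {
	// Pull style: the framework owns the channel; you only supply the producer.
	pulled := pull(func(out chan string) {
		out <- "/foo/bar_1"
		out <- "/foo/bar_2"
	})

	// Push style: you own the channel, so you must create it with make
	// and close it yourself when the data is exhausted.
	pushed := make(chan string)
	go func() {
		defer close(pushed)
		pushed <- "/foo/bar_3"
	}()

	fmt.Println(collect(pulled), collect(pushed))
}
```

In the push style, forgetting either `make` or `close` leaves the consumer blocked forever, which is the main thing to watch for when handing your own channel to a flow.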
Slice() is a convenience method that uses a channel underneath.
// process each file in its own mapper process
flow.New().Slice(
	[]string{"/foo/bar_1", "/foo/bar_2", "/foo/bar_3"},
).Partition(3).Map(...)
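As a rough picture of what "using a channel underneath" can mean, the sketch below turns a slice into a channel. The helper `sliceToChannel` is a hypothetical name and uses generics (Go 1.18 or later), which Glow itself predates; it shows the general idea only and is not Glow's implementation of Slice().

```go
package main

import "fmt"

// sliceToChannel sends every element of items on a new channel, in order,
// and closes the channel afterwards. Hypothetical helper for illustration.
func sliceToChannel[T any](items []T) <-chan T {
	out := make(chan T)
	go func() {
		defer close(out)
		for _, item := range items {
			out <- item
		}
	}()
	return out
}

func main() {
	paths := []string{"/foo/bar_1", "/foo/bar_2", "/foo/bar_3"}
	for path := range sliceToChannel(paths) {
		fmt.Println(path)
	}
}
```

Seen this way, a slice is just a channel input whose contents are known up front, so the caller does not have to manage a goroutine or remember to close anything.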