// Copyright 2017 Pilosa Corp.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its
// contributors may be used to endorse or promote products derived
// from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
// CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
// INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
// DAMAGE.

// Package pdk is the Pilosa Development Kit! It contains various helper
// functions and documentation to assist in using Pilosa.
//
// Of principal importance in the PDK is the ingest pipeline. Interfaces and
// basic implementations of each stage listed below are included in the PDK, and
// a number of more sophisticated implementations which may rely on other
// software are in sub-packages of the PDK.
//
// 1. Source
//
// A pdk.Source is at the beginning of every indexing journey. We know
// you, and we know your data is everywhere - S3 buckets, local files, Kafka
// topics, hard-coded in tests, SQL databases, document DBs, triple stores.
// Different Sources know how to interact with the various systems holding
// your data, and get it out, one piece at a time, all wrapped up behind one
// convenient interface. To write a new Source, simply implement the Source
// interface, returning whatever comes naturally from the underlying client
// library or API with which you are interacting. It is not the job of the
// Source to manipulate or massage the data in any way - that job falls to
// the Parser, which is the next stage of the ingestion journey. The reason
// for this separation is twofold: first, you may get the same type of data
// from many different sources, so it may be convenient to couple one Parser
// to several different Sources; second, you may require different
// concurrency or scaling properties from fetching the data vs. parsing it.
// For example, if you are interacting with an HTTP endpoint at significant
// latency, you may want many routines issuing concurrent calls in order to
// achieve the desired throughput, but parsing is relatively lightweight, and
// a single routine is sufficient to process the load.
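//
// As a rough sketch, a Source reading newline-delimited records from a local
// file might look something like the following (this assumes a Source
// interface of the form Record() (interface{}, error), and that io.EOF
// signals the end of the data; check the Source definition in this package
// for the authoritative signature):
//
//    // LineSource emits one line of a file per call to Record. It uses
//    // only the standard library (bufio, io).
//    type LineSource struct {
//        scanner *bufio.Scanner
//    }
//
//    // Record returns the next line as a string, io.EOF when the input
//    // is exhausted, or any underlying read error.
//    func (s *LineSource) Record() (interface{}, error) {
//        if s.scanner.Scan() {
//            return s.scanner.Text(), nil
//        }
//        if err := s.scanner.Err(); err != nil {
//            return nil, err
//        }
//        return nil, io.EOF
//    }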
//
// 2. Parser
//
// The Parser does the heavy lifting of turning some arbitrary type of data
// into something slightly more structured, recognizable, and type-safe.
// There are many choices to be made when indexing data in Pilosa, involving
// tradeoffs like speed vs. precision or storage size: when to use bucketing
// vs. range encoding, when time quantum support is needed and at what
// granularity, and so on. These decisions are not the job of the Parser. The
// Parser should only get the data into a regular format so that the Mapper
// can make those tradeoffs without having to worry excessively about
// decoding the data. The Parser must convert incoming data into an
// RDF-triple-like representation using a handful of supported basic values
// detailed in entity.go. Determining how to collapse (for example) arbitrary
// JSON data into this format is not a trivial task; indeed, there may be
// multiple ways to go about it, so multiple Parsers may exist which operate
// on the same type of Source data.
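//
// For illustration, a Parser for JSON records might look roughly like the
// sketch below. The entity type here is only a schematic stand-in for
// pdk.Entity (whose real fields and supported value types are detailed in
// entity.go); the point is the shape of Parse, which turns raw bytes into
// the regular representation the downstream stages expect.
//
//    // entity is a schematic stand-in for pdk.Entity - see entity.go
//    // for the real definition and its supported value types.
//    type entity struct {
//        Subject string
//        Objects map[string]interface{}
//    }
//
//    // JSONParser parses a raw JSON object (as []byte) into an entity.
//    type JSONParser struct{}
//
//    func (p *JSONParser) Parse(data interface{}) (*entity, error) {
//        b, ok := data.([]byte)
//        if !ok {
//            return nil, fmt.Errorf("expected []byte, got %T", data)
//        }
//        m := make(map[string]interface{})
//        if err := json.Unmarshal(b, &m); err != nil {
//            return nil, err
//        }
//        return &entity{Objects: m}, nil
//    }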
//
// 2.5. Transformer
//
// One may optionally provide a number of Transformers, which perform
// in-place operations on the Entity before it is passed to the Mapper.
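//
// A Transformer that normalizes string values might look like the sketch
// below (the method name Transform is assumed here, and the schematic entity
// from the Parser sketch stands in for pdk.Entity):
//
//    // lowerCaser lower-cases every string value on an entity in place
//    // before it reaches the Mapper.
//    type lowerCaser struct{}
//
//    func (lowerCaser) Transform(e *entity) error {
//        for k, v := range e.Objects {
//            if s, ok := v.(string); ok {
//                e.Objects[k] = strings.ToLower(s)
//            }
//        }
//        return nil
//    }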
//
// 3. Mapper
//
// The Mapper's job is to take instances of pdk.Entity and create
// pdk.PilosaRecord objects. Because the pdk.Entity is fairly well-defined,
// it is possible to do this generically, and it may not be necessary to use
// a bespoke Mapper in many cases. However, as mentioned in the Parser
// description, there are performance and capability tradeoffs based on how
// one decides to map data into Pilosa. (TODO expand with more examples as
// mappers are implemented, also reference generic mapper and its config
// options)
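//
// As a rough sketch of the idea (pilosaRecord and the translate function
// below are schematic stand-ins, not the real pdk.PilosaRecord or any
// particular PDK API):
//
//    // pilosaRecord is a schematic stand-in for pdk.PilosaRecord: a
//    // column plus a row to set in each field.
//    type pilosaRecord struct {
//        Column uint64
//        Rows   map[string]uint64 // field name -> row ID
//    }
//
//    // translatingMapper maps string values to row IDs via a supplied
//    // translation function (e.g. a hash or a lookup in a key store).
//    type translatingMapper struct {
//        translate func(field, value string) uint64
//    }
//
//    func (m *translatingMapper) Map(e *entity) (pilosaRecord, error) {
//        rec := pilosaRecord{
//            Column: m.translate("subject", e.Subject),
//            Rows:   make(map[string]uint64),
//        }
//        for field, v := range e.Objects {
//            if s, ok := v.(string); ok {
//                rec.Rows[field] = m.translate(field, s)
//            }
//        }
//        return rec, nil
//    }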
//
// 4. Indexer
//
// The Indexer is responsible for getting data into Pilosa. The primary
// consideration is a latency/throughput tradeoff governed by the selected
// batch size: larger batches generally improve throughput at the cost of
// latency and memory.
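//
// That tradeoff can be sketched with a simple buffering scheme
// (batchingIndexer and its flush function are schematic; the real Indexer
// implementations in this package wrap the Pilosa client):
//
//    // batchingIndexer buffers records and writes them to Pilosa
//    // batchSize at a time. A larger batchSize generally improves
//    // throughput at the cost of latency and memory.
//    type batchingIndexer struct {
//        batchSize int
//        buf       []pilosaRecord
//        flush     func([]pilosaRecord) error // e.g. an import call on a Pilosa client
//    }
//
//    func (ix *batchingIndexer) Add(r pilosaRecord) error {
//        ix.buf = append(ix.buf, r)
//        if len(ix.buf) < ix.batchSize {
//            return nil
//        }
//        err := ix.flush(ix.buf)
//        ix.buf = ix.buf[:0]
//        return err
//    }
//
//    // Close flushes any remaining buffered records.
//    func (ix *batchingIndexer) Close() error {
//        if len(ix.buf) == 0 {
//            return nil
//        }
//        err := ix.flush(ix.buf)
//        ix.buf = nil
//        return err
//    }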
package pdk