Shark is a fast, modular, general open-source C++ machine
learning library.

Shark is licensed under the GNU Lesser General Public License; please
see the files COPYING and COPYING.LESSER, or visit
http://www.gnu.org/licenses .

Any application of the SHARK code toward military research and use is
expressly against the wishes of the SHARK development team.

INSTALLATION / DOCUMENTATION
----------------------------
The entry point to the Shark library documentation is
doc/index.html. For installation instructions, click on
"Getting started" on that page. The short version of the installation
guide: run "ccmake ." in the main directory to select your build
options, then run "make" in the same directory. Assuming Boost and
CMake are installed, that should be all. See the documentation for
detailed instructions.

BUILDING THE DOCUMENTATION: To build the documentation yourself (e.g.,
because you need to read the installation instructions locally and
have no internet access), see doc/README.txt.

FILE STRUCTURE
--------------
README.txt          This file (residing in the root directory of
                    the Shark library).
CMakeLists.txt      Definitions for the CMake build system.
include/            This directory and its sub-directories hold
                    all include files of the library. Note that
                    some functionality is implemented in
                    lower-level Impl/ folders and inline .inl files.
lib/                The Shark library is placed in this directory.
                    In the source code distribution this directory
                    is initially empty, and the library is placed
                    into the directory as the result of
                    compilation. Binary distributions already
                    contain the library, pre-built in release mode.
doc/                All documentation files are found in this
                    sub-directory. In packaged versions of Shark
                    the html documentation is pre-built; the
                    repository provides the corresponding sources.
                    The documentation contains technical reference
                    documents for all classes and functions as well
                    as a collection of introductory and advanced
                    tutorials.
doc/index.html      Entry point to the Shark documentation.
examples/           The examples directory contains example
                    use-cases of the most important algorithms
                    implemented in Shark. Besides exemplifying
                    powerful learning algorithms, these programs
                    are intended as starting points for
                    experimentation with the library. The
                    executables corresponding to the C++ example
                    programs are found in examples/bin/.
Test/               Shark comes with a large collection of unit
                    tests, all of which reside inside the Test
                    directory.
bin/                The binaries of the Shark unit tests are placed
                    here. Once the CMake build system is set up
                    (with the "ccmake" command or equivalent) the
                    whole test suite can be executed with the
                    command "make test", issued in the Shark root
                    directory.
src/                Source files of the Shark library. Note that
                    from Shark version 3 onwards large parts of the
                    library are templated and therefore header-only.
contrib/            The contrib directory contains (non-standard)
                    tools by third parties. Typically, there is no
                    need for users of Shark to deal with these
                    tools directly.
gpl-3.0.txt         GNU General Public License, version 3.

Note:
Depending on the type of Shark distribution (binary or source
package, or current repository snapshot), not all of these files
and directories are present.

PACKAGE STRUCTURE
-----------------
>> Note for users of Shark 2: <<
The internal structure of the Shark library has changed in the
transition to version 3. The old infrastructure packages Array, Rng,
and FileUtil, as well as parts of LinAlg, have been replaced with
more modern solutions provided by Boost. The machine learning
related components EALib, MOO-EALib, Mixture, ReClaM, and TimeSeries
have been unified and organized into completely new interfaces.
Therefore there is no one-to-one correspondence between files or
even concepts in version 3 and in older versions of Shark. In fact,
the lion's share of the library has been rewritten from scratch,
and this is also reflected in a completely new structure. In
particular, many of the rather independent sub-modules (such as
Mixture and MOO-EALib) have been unified. They now share the same
top-level interfaces and thus form a coherent learning architecture.

The organization of the include/ directory reflects the structure of
the Shark library. It consists of the following modules:

GENERAL INFRASTRUCTURE:
LinAlg              Data structures and algorithms for typical
                    linear algebra computations. For (dense and
                    sparse) vector and matrix classes Shark relies
                    on Boost uBLAS. Many higher level algorithms
                    (such as singular value decomposition) are
                    still implemented by the library itself.
Statistics          This component is new in Shark 3. It wraps the
                    capabilities of Boost accumulators, and it
                    provides tools that appear regularly in machine
                    learning, such as the Mann-Whitney U-test (also
                    known as the Wilcoxon rank-sum test).
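
As an illustration of the Mann-Whitney U-test mentioned above, the
following is a minimal, self-contained sketch of how the U statistic
can be computed from two samples. It uses only the C++ standard
library and does not use or reproduce the interface of Shark's
Statistics module; the function name mannWhitneyU is a hypothetical
helper chosen for this example. Only the test statistic is computed,
not a p-value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the Shark API.
// Returns the Mann-Whitney U statistic of sample `a` against sample `b`:
// U = (rank sum of a in the pooled sample) - n_a * (n_a + 1) / 2.
// Tied values receive the average of the ranks they span (mid-ranks).
double mannWhitneyU(const std::vector<double>& a,
                    const std::vector<double>& b) {
    // Pool both samples, remembering which sample each value came from
    // (0 = sample a, 1 = sample b).
    std::vector<std::pair<double, int>> pooled;
    pooled.reserve(a.size() + b.size());
    for (double v : a) pooled.emplace_back(v, 0);
    for (double v : b) pooled.emplace_back(v, 1);
    std::sort(pooled.begin(), pooled.end());

    // Sum the 1-based ranks of all values that belong to sample a.
    double rankSumA = 0.0;
    std::size_t i = 0;
    while (i < pooled.size()) {
        // Find the end of the current group of tied values.
        std::size_t j = i;
        while (j < pooled.size() && pooled[j].first == pooled[i].first) ++j;
        // Positions i..j-1 hold ranks i+1..j; their average is (i+1+j)/2.
        const double midRank = 0.5 * static_cast<double>(i + 1 + j);
        for (std::size_t k = i; k < j; ++k) {
            if (pooled[k].second == 0) rankSumA += midRank;
        }
        i = j;
    }

    const double nA = static_cast<double>(a.size());
    return rankSumA - nA * (nA + 1.0) / 2.0;
}
```

With this definition, U counts the pairs (x from a, y from b) with
x > y, with each tie counting one half. For fully separated samples
it is therefore either 0 or the product of the two sample sizes.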

LEARNING INFRASTRUCTURE:
Core                The core module is the central place for all
                    top-level interfaces. In addition it holds a
                    few infrastructure classes, such as exceptions.
Data                The data module hosts data containers that have
                    been specifically designed for the needs of
                    machine learning code. Also, data can be
                    imported and exported from and to different
                    standard machine learning data file formats.

MACHINE LEARNING:
Models              Models are adaptive systems, the architectures
                    on top of which (machine) learning happens.
                    Shark features a rich set of models, from simple
                    linear maps to (feed-forward and recurrent)
                    neural networks, support vector machines, and
                    different types of trees. Models can also be
                    concatenated with data format converters and
                    other models.
ObjectiveFunctions  This module collects different types of cost,
                    fitness, or objective functions for learning.
                    The range includes data-dependent error
                    functions based on simple loss functions,
                    cross-validation, area under the ROC curve, and
                    different objectives used for model selection.
Algorithms          All actual learning algorithms reside in this
                    module. There are two main groups of learning
                    algorithms, namely iterative optimizers and
                    more specialized model trainers. General
                    optimizers are organized into direct search
                    and gradient-based optimization. Specialized
                    algorithms for linear programming (a part of
                    GLPK, the GNU linear programming kit) and
                    quadratic programming for training of non-linear
                    support vector machines are included. Shark
                    also ships with algorithms for efficient
                    nearest neighbor search.
Fuzzy               The fuzzy module provides classes for the
                    representation of linguistic terms, variables,
                    operators and rules, as well as fuzzy logic
                    inference engines and controllers.
Unsupervised        This module contains the Shark implementation
                    of restricted Boltzmann machines (RBMs),
                    a recent experimental feature of Shark.
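
To make the division of labor between Models, ObjectiveFunctions,
and Algorithms more concrete, here is a small self-contained sketch
in plain C++. It is only an illustration of the idea under simplifying
assumptions (one-dimensional inputs, a fixed learning rate). It does
not use Shark, and the names LinearModel, SquaredError, and
gradientDescent are hypothetical stand-ins chosen for this example,
not Shark's actual classes or signatures.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// All names below are hypothetical; they mirror the module split
// described above, not Shark's real interfaces.

// "Model": a one-dimensional linear map y = w * x + b.
struct LinearModel {
    double w = 0.0;
    double b = 0.0;
    double eval(double x) const { return w * x + b; }
};

// "Objective function": a data-dependent error function built from a
// simple loss, here the mean squared error over a fixed data set.
struct SquaredError {
    std::vector<double> inputs;
    std::vector<double> labels;

    double eval(const LinearModel& m) const {
        double sum = 0.0;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            const double r = m.eval(inputs[i]) - labels[i];
            sum += r * r;
        }
        return sum / static_cast<double>(inputs.size());
    }

    // Gradient of the mean squared error with respect to (w, b).
    void gradient(const LinearModel& m, double& dw, double& db) const {
        dw = 0.0;
        db = 0.0;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            const double r = m.eval(inputs[i]) - labels[i];
            dw += 2.0 * r * inputs[i];
            db += 2.0 * r;
        }
        const double n = static_cast<double>(inputs.size());
        dw /= n;
        db /= n;
    }
};

// "Algorithm": plain gradient descent as an example of an iterative,
// gradient-based optimizer. It only talks to the objective function.
LinearModel gradientDescent(const SquaredError& f,
                            double learningRate, int steps) {
    LinearModel m;
    for (int t = 0; t < steps; ++t) {
        double dw = 0.0;
        double db = 0.0;
        f.gradient(m, dw, db);
        m.w -= learningRate * dw;
        m.b -= learningRate * db;
    }
    return m;
}
```

A caller would fill a SquaredError with inputs and labels and pass it
to gradientDescent together with a learning rate and a step count. The
point of the separation is that the optimizer can be exchanged (for
example for a direct search method) without touching the model or the
error function.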