%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Description: A Python 2.7 implementation of gcForest, proposed in [1]. %
%A demo implementation of the gcForest library, together with demo client scripts that demonstrate how to use the code. %
%The implementation is flexible enough for you to modify the model or fit your own datasets. %
% %
%Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks. %
% In IJCAI-2017. (https://arxiv.org/abs/1702.08835v2 ) %
% %
%Requirements: This package was developed with Python 2.7. Please make sure all the dependencies %
%specified in requirements.txt are installed. %
% %
%ATTN: This package is free for academic usage. %
% You can run it at your own risk. %
% For other purposes, please contact Prof. Zhi-Hua Zhou([email protected]) %
% %
%ATTN2: This package was developed by Mr. Ji Feng ([email protected]). %
% The readme file and demo roughly explain how to use the code. %
% For any problem concerning the code, please feel free to contact Mr. Feng. %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Package Official Website: http://lamda.nju.edu.cn/code_gcForest.ashx
This package is provided "AS IS" and free for academic usage.
You can run it at your own risk. For other purposes, please contact Prof. Zhi-Hua Zhou ([email protected]).
Before running the demo, make sure all the dependencies are installed.
For instance, run the following command to install them:
```pip install -r requirements.txt```
===================================
Outline for README
====================================
* Package Overview
* Notes on Demo Scripts
* Notes on Config Files
* Examples and Demos
* For Your Own Datasets
==================================
Package Overview
==================================
* lib/gcforest
- code implementing gcForest
* tools/train_fg.py
- the demo script used for training fine-grained layers
* tools/train_cascade.py
- the demo script used for training cascade layers
* models/
- folder for saving models that can be used by tools/train_fg.py and tools/train_cascade.py
- the gcForest structure is saved in JSON format
* logs
- the folder logs/gcforest is used to save the log files produced by the demo scripts
============================
Notes on Demo Scripts
============================
Below is a brief description of the arguments needed by the demo scripts.
%%%%%%%%%%%%%%%%%%%%
tools/train_fg.py
%%%%%%%%%%%%%%%%%%%%
* --model: str
- The config file path for fine-grained models (in JSON format)
* --save_outputs: bool
- If True, the output predictions produced by the fine-grained model
will be saved in model_cache_dir, which is specified in the model config.
These outputs are used when training the cascade layers.
- The default value is False
%%%%%%%%%%%%%%%%%%%%%%
tools/train_cascade.py
%%%%%%%%%%%%%%%%%%%%%%
* --model: str
- The model config filepath for cascade training (in json format)
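As an illustration of how the flags above fit together, here is a minimal argparse sketch.
It is not the code of tools/train_fg.py: build_parser is a hypothetical name, and --log_dir
is included only because it appears in the example commands later in this README. Its
default value here is an assumption.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real demo scripts may define their flags differently.
    parser = argparse.ArgumentParser(description="Sketch of the demo script arguments")
    parser.add_argument("--model", type=str, required=True,
                        help="path to the model config file (JSON)")
    # A bare flag: absent means False, present means True.
    parser.add_argument("--save_outputs", action="store_true", default=False,
                        help="save the fine-grained predictions for cascade training")
    # Seen in the example commands; the default below is an assumption.
    parser.add_argument("--log_dir", type=str, default=None,
                        help="folder for log files")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--model", "models/mnist/gcforest/fg-tree500-depth100-3folds.json",
         "--save_outputs"])
    print("model={} save_outputs={}".format(args.model, args.save_outputs))
```

Passing --save_outputs without a value switches it on, which matches the way the flag is
used in the example commands.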
======================
Notes on Config Files
======================
Below is a brief introduction on how to use the model specification files, namely
* the model specification for the fine-grained scanning structure.
* the model specification for cascade forests.
All the model specifications (JSON files) are saved in models/.
For instance, all the model specification files needed for MNIST are stored in models/mnist/gcforest
* ca is short for cascade structure specifications
* fg is short for fine-grained structure specifications
You can define your own structure by writing similar JSON files.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FineGrained model's config (dataset)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* dataset.train, dataset.test: [dict]
- corresponds to the particular datasets defined in lib/datasets
- type [str]: see lib/datasets/__init__.py for a reference
- You can use your own dataset by writing similar wrappers.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FineGrained model's config (train)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* train.keep_model_in_mem: [bool] default=0
- if 0, the forest will be freed from RAM
* train.data_cache : [dict]
- corresponds to the DataCache in lib/dataset/data_cache.py
* train.data_cache.cache_dir (str)
- make sure to change "/mnt/raid/fengji/gcforest/cifar10/fg-tree500-depth100-3folds/datas" to your own path
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FineGrained model's config (net)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* net.outputs: [list]
- List of the data names output by this model
* net.layers: [List of Layers]
- Layer's Config, see lib/gcforest/layers for a reference
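To make the key layout above concrete, the following sketch assembles a skeleton that uses
only the key names documented in this section. Every value is a placeholder, and the real
config files under models/ contain further fields (for example the layer definitions), so
treat this as an outline and not as a working specification.

```python
import json

# Keys mirror the names documented above. All values are placeholders
# chosen for illustration, not values taken from the shipped configs.
fg_config = {
    "dataset": {
        "train": {"type": "<dataset type>"},
        "test": {"type": "<dataset type>"},
    },
    "train": {
        "keep_model_in_mem": 0,
        "data_cache": {"cache_dir": "/path/to/your/cache"},
    },
    "net": {
        "outputs": [],  # names of the data produced by this model
        "layers": [],   # layer configs, see lib/gcforest/layers
    },
}

if __name__ == "__main__":
    print(json.dumps(fg_config, indent=4, sort_keys=True))
```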
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Cascade model's config (dataset)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Similar to the FineGrained model's config (dataset)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Cascade model's config (cascade)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
see lib/gcforest/cascade/cascade_classifier.py __init__ for a reference
=============================
Examples and Demos
=============================
Before running the scripts, make sure to change
* train.data_cache.cache_dir in the fine-grained model config (e.g. models/xxx/fg-xxxx.json)
* train.cascade.dataset.{train,test}.data_path in the fine-grained cascade model config (e.g. models/xxx/fg-xxxx-ca.json)
* train.cascade.cascade.data_save_dir in the cascade model configs (e.g. models/xxx/ca-xxxx.json and models/xxx/fg-xxxx-ca.json)
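If you prefer to script these path changes instead of editing the files by hand, a small
helper along the following lines may be enough. It is a sketch under two assumptions: the
config file is plain JSON that json.load can parse, and the dotted names above map directly
to nested keys. The names set_dotted and update_config_file are hypothetical and not part
of the package.

```python
import json


def set_dotted(config, dotted_key, value):
    """Set config['a']['b']['c'] = value for the dotted key 'a.b.c'."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Create intermediate dicts when they are missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


def update_config_file(path, dotted_key, value):
    """Load a JSON config, change one nested value, and write it back."""
    with open(path) as f:
        config = json.load(f)
    set_dotted(config, dotted_key, value)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)


if __name__ == "__main__":
    demo = {"train": {"data_cache": {"cache_dir": "/old/path"}}}
    set_dotted(demo, "train.data_cache.cache_dir", "/path/to/your/cache")
    print(json.dumps(demo, indent=4))
```

Note that rewriting a file this way may reorder its keys and drops any non-JSON content,
so keep a copy of the original config.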
To train a gcForest (with fine-grained scanning), you need to run two scripts:
* Fine-grained scanning: 'tools/train_fg.py'
* Cascade training: 'tools/train_cascade.py'
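The two-step order can also be captured in a short helper that builds both command lines.
This is only a sketch: build_commands is a hypothetical name, the flags are the ones shown
in this README, and the fg/gc log sub-folders follow the example commands for the individual
datasets.

```python
def build_commands(fg_config, ca_config, log_root=None):
    """Return the two command lines, in the order they must be run."""
    # Step 1: fine-grained scanning; --save_outputs keeps the predictions
    # that the cascade step reads afterwards.
    fg_cmd = ["python", "tools/train_fg.py", "--model", fg_config, "--save_outputs"]
    # Step 2: cascade training on the saved fine-grained outputs.
    ca_cmd = ["python", "tools/train_cascade.py", "--model", ca_config]
    if log_root is not None:
        fg_cmd += ["--log_dir", log_root + "/fg"]
        ca_cmd += ["--log_dir", log_root + "/gc"]
    return [fg_cmd, ca_cmd]


if __name__ == "__main__":
    commands = build_commands(
        "models/mnist/gcforest/fg-tree500-depth100-3folds.json",
        "models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json",
        log_root="logs/gcforest/mnist")
    for cmd in commands:
        print(" ".join(cmd))
```

Each list can be passed to subprocess.check_call from the repository root, so the second
step only starts if the first one succeeds.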
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[UCI Letter](http://archive.ics.uci.edu/ml/datasets/Letter+Recognition)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Get the data: download it by running the following commands:
```Shell
cd dataset/uci_letter
sh get_data.sh
```
* Since fine-grained scanning is not needed for this dataset, we only train a cascade forest, as follows:
- `python tools/train_cascade.py --model models/uci_letter/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_letter/ca`
* Adult and YEAST can be trained with a similar procedure.
%%%%%%%%%%%%%%%%%%%%%
MNIST
%%%%%%%%%%%%%%%%%%%%%
* Get the data: The data will be downloaded automatically via 'lib/datasets/mnist.py'; you do not need to do it yourself
* First, train the fine-grained forest:
- Run `python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs`
- This means:
1. Train a fine-grained model for the MNIST dataset
2. Use the structure defined in models/mnist/gcforest/fg-tree500-depth100-3folds.json
3. Save the log files in logs/gcforest/mnist/fg
4. Save the fine-grained scanning predictions in train.data_cache.cache_dir
* Then, train the cascade forest (Note: make sure you run train_fg.py first)
- Run `python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json`
- This means:
1. Train a cascade structure on the fine-grained scanning results.
2. The cascade model specification is defined in 'models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json'
* You could also train a cascade forest without fine-grained scanning (but the accuracy will be much lower):
- Run `python tools/train_cascade.py --model models/mnist/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/mnist/ca`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[UCI sEMG](http://archive.ics.uci.edu/ml/datasets/sEMG+for+Basic+Hand+movements)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Get Data
```Shell
cd dataset/uci_semg
sh get_data.sh
```
* First, train the fine-grained forest:
- `python tools/train_fg.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/uci_semg/fg`
* Then, train the cascade forest (Note: make sure you run train_fg.py first)
- `python tools/train_cascade.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/uci_semg/gc`
* You could also train a cascade forest without fine-grained scanning (but the accuracy will be much lower):
- `python tools/train_cascade.py --model models/uci_semg/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_semg/ca`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[GTZAN](http://marsyasweb.appspot.com/download/data_sets/)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Requirements (you need to install the following package):
librosa
* Get the data by running the following commands:
```Shell
cd dataset/gtzan
sh get_data.sh
cd ../..
python tools/audio/cache_feature.py --dataset gtzan --feature mfcc --split genre.trainval
```
* First, train the fine-grained forest:
- `python tools/train_fg.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/gtzan/fg`
* Then, train the cascade forest (Note: make sure you run train_fg.py first)
- `python tools/train_cascade.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/gtzan/gc`
* You could also train a cascade forest without fine-grained scanning (but the accuracy will be much lower):
- `python tools/train_cascade.py --model models/gtzan/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/gtzan/ca --save_outputs`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
IMDB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Cascade Forest:
- `python tools/train_cascade.py --model models/imdb/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/imdb/ca`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
CIFAR10
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* First, train the fine-grained forest:
- `python tools/train_fg.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds.json --save_outputs`
* Then, train the cascade forest (Note: make sure you run train_fg.py first)
- `python tools/train_cascade.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds-ca.json`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For Your Own Datasets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Data Format:
0. Please refer to lib/datasets/mnist.py as an example
1. The dataset should have the attributes X and y to represent the data and the labels
2. y should be a 1-d array
3. For fine-grained scanning, X should be a 4-d array (N x channel x H x W).
(e.g. cifar10 should be Nx3x32x32, mnist should be Nx1x28x28, uci_semg should be Nx1x3000x1)
* Model Specifications:
1. Save the JSON file in models/$dataset_name (recommended)
2. For a detailed description, see the section 'Notes on Config Files'
* If you only need to train a cascade forest, run tools/train_cascade.py.
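Before plugging in a new dataset, it can help to check the shapes against the data format
rules above. The helper below is a sketch and not part of the package: check_shapes is a
hypothetical name, it works on plain shape tuples so it needs no extra dependencies, and
the check that X and y agree on N is an assumption added here for convenience.

```python
def check_shapes(x_shape, y_shape, fine_grained=True):
    """Return a list of problems; an empty list means the shapes look fine."""
    problems = []
    if len(y_shape) != 1:
        problems.append("y should be 1-d, got %d-d" % len(y_shape))
    if fine_grained and len(x_shape) != 4:
        problems.append(
            "X should be 4-d (N x channel x H x W), got %d-d" % len(x_shape))
    # Assumption: every sample in X has exactly one label in y.
    if len(x_shape) > 0 and len(y_shape) > 0 and x_shape[0] != y_shape[0]:
        problems.append(
            "X and y disagree on N: %d vs %d" % (x_shape[0], y_shape[0]))
    return problems


if __name__ == "__main__":
    # MNIST-style shapes from the list above pass the check.
    print(check_shapes((60000, 1, 28, 28), (60000,)))
    # Flattened images are rejected when fine-grained scanning is requested.
    print(check_shapes((60000, 784), (60000,)))
```

With NumPy arrays you would call check_shapes(dataset.X.shape, dataset.y.shape), and pass
fine_grained=False when only a cascade forest is trained.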
Happy Hacking.
Reference:
[1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks. In IJCAI-2017.
(https://arxiv.org/abs/1702.08835v2 )