-
Notifications
You must be signed in to change notification settings - Fork 3
/
run_algorithm
executable file
·331 lines (253 loc) · 9.92 KB
/
run_algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
#!/usr/bin/env python
r"""Run Exareme2 algorithms from the command line
USAGE
./run_algorithm [-h] -a ALGORITHM -y Y [Y ...] [-x X [X ...]]
-d DATASETS [DATASETS ...] -m DATA_MODEL [-p PARAMETER PARAMETER] [-n]
echo <json> | ./run_algorithm [-h] -a ALGORITHM [-y Y [Y ...]] [-x X [X ...]]
[-d DATASETS [DATASETS ...]] [-m DATA_MODEL] [-p PARAMETER PARAMETER] [-n]
./run_algorithm --doc
DESCRIPTION
The first form is used when you want to pass all algorithm args from the
command line. In this form some args are required and some are optional,
according to the algorithm specifications.
In the second form, <json> is a valid json string. Using this form you can
pipe a json string with the algorithm args to the script, to avoid writing
them on the command line. In this form, only the ALGORITHM arg is required
and all other args are optional. If any other arg is passed from the
command line it overrides the corresponding field in <json>.
The third form prints the documentation.
ARGUMENTS
-a, --algorithm
The algorithm name. This is required in both forms.
-y
One or more space-separated fields to include in the Y variable.
-x
One or more space-separated fields to include in the X variable.
-d, --datasets
One or more space-separated dataset names.
-m, --data-model
The data model name in full form, NAME:VERSION.
-p, --parameter
A key value pair for an algorithm parameter. We can supply multiple
parameters by repeating the '-p <key> <value>' sequences. Parameter
values can be of types bool, int, float and str.
-n, --dry-run
Print request object without making actual request.
-h, --help
Print short help.
--doc
Print documentation. Cannot be combined with any other arg.
USING PIPED DATA
The second form can be used with any program that outputs a valid json
object, not only echo. For example you can keep the appropriate object in a
file and use
cat file.json | ./run_algorithm ...
The json format has to be identical to the one defined in the algorithm
specs file.
One useful use case is to run a particular test case from the expected json
file, e.g. when trying to debug a particular test that failed. This can be
done using the jq program (https://stedolan.github.io/jq/) which can search
through a json file and filter some substructure.
For example, you can run test case number 3 found in expected.json like this
jq '.test_cases[3].input' expected.json | ./run_algorithm ...
Since the algorithms always return their results in json format you can
also filter the output for a particular field. For example, running the
first test case of the PCA algorithm, you want to see only the resulting
eigenvalues
jq '.test_cases[0].input' expected.json | ./run_algorithm -a pca \
| jq '.eigenvalues'
When running a particular test case, if you want to change the value of
some argument you can supply it after ./run_algorithm. The supplied value
will override the value found in the json data structure. For example, if
you want the same variables and parameters, but a different dataset
jq '.test_cases[0].input' expected.json | ./run_algorithm -a pca -d ppmi0
ALGORITHM EXAMPLES
Examples of valid algorithm inputs
PEARSON CORRELATION
./run_algorithma -a pearson_correlation -y leftamygdala lefthippocampus \
-d ppmi0 -m dementia:0.1 -p alpha 0.95
PCA
./run_algorithm -a pca -y leftamygdala lefthippocampus -d ppmi0 \
-m dementia:0.1
ONE WAY ANOVA
./run_algorithm -a anova_oneway -x ppmicategory -y lefthippocampus \
-d ppmi1 -m dementia:0.1
TWO WAY ANOVA
./run_algorithm -a anova -x ppmicategory gender -y lefthippocampus \
-d ppmi1 -m dementia:0.1 -p sstype 2
LINEAR REGRESSION
./run_algorithm -a linear_regression -y righthippocampus \
-x lefthippocampus leftamygdala -d ppmi0 -m dementia:0.1
LINEAR REGRESSION CROSS-VALIDATION
./run_algorithm -a linear_regression_cv -y righthippocampus \
-x lefthippocampus leftamygdala -d ppmi0 -m dementia:0.1 -p n_splits 4
LOGISTIC REGRESSION
./run_algorithm -a logistic_regression -y ppmicategory \
-x lefthippocampus leftamygdala -d ppmi0 ppmi1 -m dementia:0.1 \
-p positive_class PD
LOGISTIC REGRESSION CROSS-VALIDATION
./run_algorithm -a logistic_regression_cv -y gender \
-x lefthippocampus righthippocampus -d ppmi0 ppmi1 -m dementia:0.1 \
-p n_splits 4 -p positive_class F
T-TEST ONE-SAMPLE
./run_algorithm -a ttest_onesample -y lefthippocampus -d ppmi0 -m dementia:0.1 \
-p alt_hypothesis two-sided -p alpha 0.95 -p mu 0
T-TEST PAIRED
./run_algorithm -a ttest_paired -y lefthippocampus -x righthippocampus \
-d ppmi0 -m dementia:0.1 -p alt_hypothesis two-sided -p alpha 0.95
"""
import json
import pydoc
import sys
from argparse import ArgumentParser
import requests
ALGORITHMS_URL = "http://127.0.0.1:5000/algorithms/"
def parse_argv(argv):
parser = ArgumentParser(
prog=f"{sys.argv[0]}",
epilog=f"To see full documentation run {sys.argv[0]} --doc.",
)
required_when_tty = sys.stdin.isatty()
parser.add_argument(
"-a",
"--algorithm",
required=True,
help="Algorithm name",
)
parser.add_argument(
"-y",
required=required_when_tty,
nargs="+",
help="Y variables, multiple values",
)
parser.add_argument(
"-x",
required=False,
nargs="+",
help="X variables, multiple values",
)
parser.add_argument(
"-d",
"--datasets",
required=required_when_tty,
nargs="+",
help="Datasets, multiple values",
)
parser.add_argument(
"-m",
"--data-model",
required=required_when_tty,
help="Data model, full name NAME:VERSION",
)
parser.add_argument(
"-p",
"--parameter",
required=False,
nargs=2,
action="append",
dest="parameters",
help="Parameter as key value pair. Can repeat '-p key value' multiple times.",
)
parser.add_argument(
"-n",
"--dry-run",
required=False,
action="store_true",
help="Print request object without making actual request.",
)
parser.add_argument(
"-f",
"--flag",
required=False,
nargs=2,
action="append",
dest="flags",
help="Flags as key value pair. Can repeat '-f key value' multiple times.",
)
return parser.parse_args(argv)
def guess_type_and_cast(value: str):
"""Casts values passed in the command line by guessing their type.
Types supported: bool, int, float, valid json as string and any other string.
"""
try:
return int(value)
except ValueError:
pass
try:
return float(value)
except ValueError:
pass
if value == "True":
return True
if value == "False":
return False
try:
return json.loads(value)
except json.JSONDecodeError:
pass
return value
def get_commandline_request_data(cl_args):
"""Formats an algorithm request data object from command line args."""
parameters = {
key: guess_type_and_cast(value) for key, value in cl_args.parameters or []
}
flags = {key: guess_type_and_cast(value) for key, value in cl_args.flags or []}
return dict(
inputdata=dict(
data_model=cl_args.data_model,
datasets=cl_args.datasets,
filters=None,
y=cl_args.y,
x=cl_args.x,
),
parameters=parameters,
flags=flags,
)
def get_piped_request_data():
"""Gets an algorithm request data object from the stdin. If nothing is piped,
return empty object."""
if not sys.stdin.isatty():
piped_input = json.loads(sys.stdin.read())
piped_input["inputdata"]["filters"] = None # fix for empty string in test case
sys.stdin = open("/dev/tty") # get back stdin in case we need it for debugger
return piped_input
return {"inputdata": {}, "parameters": {}, "flags": {}}
def merge_request_data(data1, data2):
"""Merge request data1 and request data2 prioritizing data1."""
# merge inpudata
keys = set(data1["inputdata"].keys()) | set(data2["inputdata"].keys())
inputdata = {
key: data1["inputdata"].get(key, None) or data2["inputdata"].get(key, None)
for key in keys
}
# merge parameters
parameters = data1.get("parameters", None) or data2.get("parameters", None)
# merge flags
flags = data1.get("flags", None) or data2.get("flags", None)
merged = {"inputdata": inputdata, "parameters": parameters, "flags": flags}
return merged
def do_post_request(algorithm, request_data):
url = ALGORITHMS_URL + algorithm
headers = {"Content-type": "application/json", "Accept": "text/plain"}
response = requests.post(url, data=json.dumps(request_data), headers=headers)
return response
def print_response(response):
try:
print(json.dumps(json.loads(response.text), indent=4))
except json.decoder.JSONDecodeError:
print(f"Something went wrong:\n\n{response.text}")
def main(argv):
if len(argv) == 1 and argv[0] == "--doc":
pydoc.pager(__doc__)
return
cl_args = parse_argv(argv)
cl_data = get_commandline_request_data(cl_args)
piped_data = get_piped_request_data()
request_data = merge_request_data(piped_data, cl_data)
if cl_args.dry_run:
print(json.dumps(request_data, indent=4))
else:
response = do_post_request(cl_args.algorithm, request_data)
print_response(response)
if __name__ == "__main__":
main(sys.argv[1:])