Fault Injection
===============
Author: Roman Penyaev <[email protected]>
Introduction
------------
Fault injection is a framework that helps developers and testers
reproduce subtle bugs and thoroughly test error paths in the code.
First, the developer places special fault points at the desired
locations in the code. Those fault points can then be configured in
different ways to inject various faults.

Fault injection currently supports three kinds of faults: delays,
errors and panics.

Implementation Details
----------------------
Fault injection is based on jump labels, so a large number of fault
points spread all over the code should not impact performance: the CPU
executes a NOP at every fault point that is disabled.

Fault points are statically compiled in and thus do not require memory
allocation. A debugfs API is also provided for flexible configuration
from userspace.

Generally speaking, a fault point is a code branch which is executed
only if the fault point has been enabled. The developer places this
branch and decides, according to the code logic, what should be executed
when a fault is injected. Let's take a real code example:

    mem = kmalloc(len, GFP_KERNEL);
    if (unlikely(!mem)) {
            pr_err("%s: allocation failed\n", __func__);
            return -ENOMEM;
    }

Here it would be useful to inject an error instead of calling kmalloc,
in order to test the error path. With the fault injection framework this
can be done as follows:

    + #define kmalloc(...) \
    +         (INJECT_FAULT() ? NULL : kmalloc(__VA_ARGS__))
    +
      mem = kmalloc(len, GFP_KERNEL);
      if (unlikely(!mem)) {
              pr_err("%s: allocation failed\n", __func__);
              return -ENOMEM;
      }

If this fault point is configured to inject an error, kmalloc is not
executed and the macro evaluates to NULL instead.

In the example above the compiler optimizes INJECT_FAULT so that, by
default, only a NOP is executed at the fault point. When the fault point
is configured to inject an error, the NOP instruction is replaced with a
JMP instruction and the slow path of the code is executed.

Internally, a fault point is represented as a static stateless entry
compiled into __jump_table, so at any point in time all fault points can
be listed, configured, enabled or disabled.

As mentioned above, a fault point is stateless: it does not know what
kind of fault it should inject, and that is decided at configuration
time. This approach gives a lot of freedom to manipulate fault points
and to configure them according to testing scenarios.

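The wrapper pattern from the kmalloc example can be tried out in
userspace. The following is a minimal sketch, not the framework's
implementation: fault_enabled, INJECT_FAULT_MODEL, checked_malloc and
alloc_and_release are hypothetical names, and a plain boolean stands in
for the jump label that the kernel patches between NOP and JMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state of a single fault point. */
bool fault_enabled;

/* Hypothetical stand-in for INJECT_FAULT(): non-zero means "inject". */
#define INJECT_FAULT_MODEL() (fault_enabled)

/* Wrap the allocator the same way the kmalloc example does. */
#define checked_malloc(size) \
	(INJECT_FAULT_MODEL() ? NULL : malloc(size))

/* Returns 0 on success, -1 if the (possibly injected) allocation failed. */
int alloc_and_release(size_t len)
{
	void *mem;

	/* malloc(0) may legitimately return NULL, so require a real size. */
	assert(len > 0);

	mem = checked_malloc(len);
	if (!mem)
		return -1;	/* the error path under test */

	free(mem);
	return 0;
}
```

With fault_enabled left at false the real allocator runs; setting it to
true drives the caller into its error path without touching the caller's
code.
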
Injecting faults in the code or macros API
------------------------------------------
First, fault injection must be registered by the module in which it is
going to be used:

    static struct fault_inject inj;
    fault_inject_register(&inj, THIS_MODULE);

The fault injection framework provides three kinds of macros for fault
injection:

 o INJECT_FAULT -
     The simplest one; returns an error if a fault has been injected.

     Code example:

     + err = INJECT_FAULT(&inj, "MEM");
     + if (unlikely(err))
     +         return err;

 o INJECT_FAULT_INT -
     Accepts a function call as a parameter, which is executed if no
     fault was injected. If a fault is injected, the function is not
     executed and the error is returned as an integer.

     Code example:

     - err = do_useful_stuff(a, b, c);
     + err = INJECT_FAULT_INT(&inj, "DEV", do_useful_stuff(a, b, c));
       if (unlikely(err))
               return err;

 o INJECT_FAULT_PTR -
     Almost the same as INJECT_FAULT_INT, but the error is returned
     encapsulated inside a pointer instead of as an integer.

     Code example:

     - ptr = do_useful_stuff(a, b, c);
     + ptr = INJECT_FAULT_PTR(&inj, "DEV", do_useful_stuff(a, b, c));
       if (unlikely(IS_ERR(ptr)))
               return PTR_ERR(ptr);

The second parameter of each INJECT_FAULT macro is a short fault class
name, which makes fault points easy to parse and filter. The examples
above create two fault classes, "MEM" and "DEV".

As mentioned in the introduction, fault points can inject three kinds of
faults: delays, errors or panics. Which faults are raised is decided at
the configuration stage.

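The difference between the integer and the pointer variants can be
modelled in userspace. This is only a sketch under stated assumptions:
pending_error, the MODEL_* macros and the model_* helpers are
hypothetical names that imitate the described behaviour (and the kernel's
ERR_PTR/IS_ERR/PTR_ERR convention); the real macros also take the
fault_inject object and a class name, which are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the configured fault: 0 means "do not inject". */
int pending_error;

/* Userspace imitation of the kernel's error-pointer convention. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
	/* Only negative errno values fit into the error-pointer range. */
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Evaluate expr only when no fault is pending; otherwise yield the error. */
#define MODEL_INJECT_FAULT_INT(expr) \
	(pending_error ? pending_error : (expr))
#define MODEL_INJECT_FAULT_PTR(expr) \
	(pending_error ? model_err_ptr(pending_error) : (void *)(expr))

/* Two toy callees that count how often they really run. */
int calls;
int value = 42;

int do_useful_stuff(void)
{
	calls++;
	return 0;
}

void *lookup(void)
{
	calls++;
	return &value;
}
```

In both variants the wrapped call is skipped entirely when a fault is
injected, which is why the calls counter does not move in that case.
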
Configuration from userspace or debugfs API
-------------------------------------------
Each module has its own fault injection configuration, which is based on
debugfs and is located at /sys/kernel/debug/fault_inject/{MODULE_NAME}/.

List of debugfs entries and their meaning:

 o list_fault_points   [read only file]

     Reading from this file outputs the whole list of compiled-in fault
     points, e.g. the output can be:

     fault group class address            function+off/size file:line
     ----- ----- ----  0xffffffffa068b0b4 init+0x0b4/0x5c6  main.c:2140
     ----- ----- ----  0xffffffffa068b13a init+0x13a/0x5c6  main.c:2150
     DE--- 1     ----  0xffffffffa061234b exit+0x12a/0x33   main.c:342
     -E--- 2     ----  0xffffffffa042313c foo+0x124a/0x14a  main.c:11

     'fault' column    - types of faults configured for this fault
                         point, where D - delay, E - error, P - panic.
     'group' column    - the group to which the fault point belongs;
                         '-----' means the fault point does not belong
                         to any group.
     'class' column    - class of the fault.
     'address' column  - address in the code where the fault point is
                         placed.
     'function' column - name of the function in which the fault point
                         is placed, followed by the offset of the fault
                         point and the size of the function.
     'file' column     - source file and line where the fault point is
                         placed. Keep in mind that, because of function
                         inlining, several different functions can have
                         fault points with the same file name and line.

     This file gives you complete information about all fault points of
     the module and their configuration, such as the group each one
     belongs to and the kinds of faults currently enabled.

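A test tool that wants to consume list_fault_points programmatically
could split each row into its six columns. The sketch below assumes the
whitespace-separated layout shown in the sample output; struct
fault_point_row, its field sizes and both function names are
hypothetical and not part of the framework.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of list_fault_points; the field sizes are assumptions. */
struct fault_point_row {
	char fault[8];			/* e.g. "DE---" */
	char group[8];			/* group number or "-----" */
	char class_name[16];		/* class name or "----" */
	unsigned long long address;	/* code address of the fault point */
	char function[64];		/* function+off/size */
	char location[64];		/* file:line */
};

/*
 * Returns 1 if the line is a data row, 0 otherwise. The literal "0x" in
 * the format makes the header line fail to match, so it is skipped.
 */
int parse_fault_point_row(const char *line, struct fault_point_row *row)
{
	assert(line && row);
	return sscanf(line, "%7s %7s %15s 0x%llx %63s %63s",
		      row->fault, row->group, row->class_name,
		      &row->address, row->function, row->location) == 6;
}

/* Returns 1 if the given fault type ('D', 'E' or 'P') is configured. */
int row_has_fault(const struct fault_point_row *row, char type)
{
	assert(row);
	return strchr(row->fault, type) != NULL;
}
```
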
 o create_group   [write only file]

     Writing a number to this file creates a fault group whose name is
     that number. A fault group is an abstraction which unites fault
     points that should be configured identically: configuration is
     performed per fault group, not per fault point. E.g. you can use
     this command:

         # echo 0 > ./create_group

     If the group already exists, an error is returned. On success a
     directory named after the group is created, i.e. 0/ in the current
     example.

     The number of groups is limited to 256.

 o next_group   [read only file]

     Reading from this file returns the next free group number, e.g.

         # cat ./next_group
         1

     This is useful in testing scripts, in a command such as:

         # cat ./next_group > ./create_group

     The user is responsible for handling concurrent access.

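The same read-then-create sequence can be issued from a C test program.
This is a sketch only: create_next_group is a hypothetical helper, the
directory is passed in by the caller, and nothing here protects against
another process creating the same group between the read and the write.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read the next free group number from <dir>/next_group and write it to
 * <dir>/create_group. Returns the group number, or -1 on failure.
 */
int create_next_group(const char *dir)
{
	char path[512], buf[32];
	ssize_t len;
	int fd, group;

	assert(dir);

	snprintf(path, sizeof(path), "%s/next_group", dir);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len <= 0)
		return -1;
	buf[len] = '\0';
	if (sscanf(buf, "%d", &group) != 1)
		return -1;

	snprintf(path, sizeof(path), "%s/create_group", dir);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	len = snprintf(buf, sizeof(buf), "%d\n", group);
	/* One write() call, so the whole line arrives at once. */
	if (write(fd, buf, (size_t)len) != len) {
		close(fd);
		return -1;
	}
	close(fd);
	return group;
}
```

A failed write most likely means that the group was created concurrently;
the caller can simply retry.
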
 o delete_group   [write only file]

     Writing a group number to this file removes the specified group. If
     the group does not exist, an error is returned.

 o {group_number}/   [directory]

     Fault group directory which holds the configuration files for this
     group.

 o {group_number}/list_fault_points   [read only file]

     Reading from this file outputs the list of compiled-in fault points
     included in the current group. By default the list is empty and
     contains only the header:

         # cat ./list_fault_points
         fault group class address            function+off/size file:line

 o {group_number}/add_fault_points   [write only file]

     Writing a line containing a code address to this file adds the
     fault point to this group. Basic requirements are:

       o the line should have '\n' at the end
       o the total length of the line should not exceed 127 chars,
         i.e. 126 + \n
       o the line should contain an address in hex starting with 0x

     For example:

         # echo 0xffffffffa068b0b4 > ./add_fault_points

     or, a more convenient way to add all fault points (please note
     that we read from the 'list_fault_points' located in the parent
     directory):

         # cat ../list_fault_points > ./add_fault_points

     or even with filtering:

         # cat ../list_fault_points | grep main.c > ./add_fault_points

     If the fault point already belongs to some group, an error is
     returned. If the address is parsed but no fault point exists at
     it, an error is returned.

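The three line requirements can be checked before a line is written to
the file. The sketch below is a userspace approximation, not the kernel
parser: parse_fault_point_line and FAULT_LINE_MAX are hypothetical names,
and taking the first 0x-prefixed number on the line as the address is an
assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define FAULT_LINE_MAX 127	/* 126 characters plus the trailing '\n' */

/*
 * Check a line against the three requirements and extract the first
 * 0x-prefixed hex address. Returns 1 and stores the address on success,
 * 0 if the line does not qualify.
 */
int parse_fault_point_line(const char *line, unsigned long long *address)
{
	const char *hex;
	size_t len;

	assert(line && address);

	len = strlen(line);
	if (len == 0 || len > FAULT_LINE_MAX || line[len - 1] != '\n')
		return 0;

	/* At least one hex digit must follow the "0x" prefix. */
	hex = strstr(line, "0x");
	if (!hex || !isxdigit((unsigned char)hex[2]))
		return 0;

	*address = strtoull(hex + 2, NULL, 16);
	return 1;
}
```

Because only the address is extracted, a complete row copied from
list_fault_points qualifies just as well as a bare address.
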
 o {group_number}/del_fault_points   [write only file]

     Writing a line containing a code address to this file removes the
     fault point from this group. Basic requirements and restrictions
     are the same as those listed in the description of the
     add_fault_points file.

     For example:

         # cat ./list_fault_points | grep main.c | sponge ./del_fault_points

     BEWARE: use 'sponge' to soak up all the input before writing to
             del_fault_points, otherwise you will get an error. Why?
             Because you are reading the fault points from the list
             and at the same time removing them from that same list.
             'sponge' will help you!

 o {group_number}/delay/   [directory]

     Directory with the delay configuration for this group of fault
     points.

       o delay_us   [read write file, default value 0]

           Delay, in microseconds, to introduce when a fault is
           injected. Accepts an integer value.

 o {group_number}/error/   [directory]

     Directory with the error configuration for this group of fault
     points.

       o errors   [read write file, default value is empty string]

           Writing a comma-separated list of errors to this file makes
           fault injection return the specified errors. The errors are
           used sequentially, in round-robin order.

           Reading from the file returns the configured list of errors,
           each prefixed with a minus sign.

           For example, writing:

               # echo -ENOSPC,-EAGAIN > ./errors

           or without minuses:

               # echo ENOSPC,EAGAIN > ./errors

           and reading them back:

               # cat ./errors
               -EAGAIN,-ENOSPC

           If parsing fails, an error is returned.

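Round-robin selection over the configured errors can be pictured as a
cursor that wraps around the list. This is a sketch of the idea only:
struct error_list and next_error are hypothetical names, and the order
in which the framework starts walking the list is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a group's configured error list. */
struct error_list {
	const int *errors;	/* e.g. { -ENOSPC, -EAGAIN } */
	size_t count;		/* number of entries in errors */
	size_t next;		/* index of the error to return next */
};

/* Return the next error and advance the cursor, wrapping at the end. */
int next_error(struct error_list *list)
{
	int err;

	assert(list && list->errors && list->count > 0);

	err = list->errors[list->next];
	list->next = (list->next + 1) % list->count;
	return err;
}
```

With two configured errors, successive injections alternate between
them, so a retry loop in the code under test sees both failure modes.
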
 o {group_number}/panic/   [directory]

     Directory with the panic configuration for this group of fault
     points. If enabled, the kernel will panic when execution reaches
     a fault point.

 o Common files for the delay/, error/ and panic/ configurations:

       o enable   [read write file, default value 0]

           Writing 1 or 0 enables or disables the configuration,
           respectively. Reading from this file tells you whether the
           configuration is enabled.

       o hits   [read only file, default value 0]

           Statistics value which tells how many times this group of
           fault points was hit. Keep in mind that a hit does not
           necessarily mean a fault injection, because other
           parameters such as probability, interval or times also
           apply.

       o injected   [read only file, default value 0]

           Statistics value which tells how many times a fault was
           actually injected for this group of fault points. For a
           group configured to delay, this is the number of times
           executed fault points were delayed; for a group configured
           to introduce an error, it is exactly the number of times an
           error was returned.

       o times   [read write file, default value -1]

           Specifies how many times failures may happen at most.
           A value of -1 means "no limit".
           Zero value is not accepted.

       o probability   [read write file, default value 100]

           Likelihood of failure injection, in percent.
           Accepted values are in the range [1, 100].

           For example, if probability is set to 50, the ratio of hits
           to injected should be close to 2.

       o interval   [read write file, default value 1]

           Specifies the interval between failures.
           Zero value is not accepted.

       o task_filter   [read write file, default value 0]

           The default value is 0, which means filtering by task is
           disabled. Any positive value limits failures to only the
           processes indicated by /proc/<pid>/make-it-fail==1.

           NOTE: This option is available only if CONFIG_FAULT_INJECTION
                 is enabled in the kernel configuration.
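
How enable, times, interval and probability might combine into the hits
and injected counters can be illustrated with a small model. Everything
in it is an assumption made for illustration: struct fault_config and
should_inject are hypothetical names, the order of the checks is a
guess, and the random draw is passed in by the caller so that the
sketch stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one configuration (delay, error or panic). */
struct fault_config {
	bool enable;
	long times;		/* -1 means "no limit", 0 is not accepted */
	unsigned probability;	/* percent, in the range [1, 100] */
	unsigned long interval;	/* at least 1 */
	unsigned long hits;	/* statistics: fault point was reached */
	unsigned long injected;	/* statistics: fault was really injected */
};

/*
 * Decide whether to inject on this hit. 'roll' is a caller-supplied
 * random number in the range [0, 99].
 */
bool should_inject(struct fault_config *cfg, unsigned roll)
{
	assert(cfg && cfg->times != 0 && cfg->interval >= 1);
	assert(cfg->probability >= 1 && cfg->probability <= 100);
	assert(roll < 100);

	if (!cfg->enable)
		return false;

	cfg->hits++;
	if (cfg->times > 0 && cfg->injected >= (unsigned long)cfg->times)
		return false;	/* the 'times' limit has been reached */
	if (cfg->hits % cfg->interval != 0)
		return false;	/* not on an interval boundary */
	if (roll >= cfg->probability)
		return false;	/* the probability draw was lost */

	cfg->injected++;
	return true;
}
```

Under this model a probability of 50 with uniformly distributed rolls
yields one injection per two hits, which matches the ratio of 2
mentioned for the probability file.
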