/*man-start*********************************************************************
========================================================================
APPENDIX 7 - REGULAR EXPRESSIONS IN THE
========================================================================
This appendix describes how regular expressions are used in THE. THE uses
regular expressions in two places: in targets for commands such as
<LOCATE> and <ALL>, and in the patterns specified in THE Language
Definition files used for syntax highlighting.
THE uses the GNU Regular Expression Library to implement regular expressions.
This library has several different regular expression syntaxes that can be used
when specifying targets.
Note that pattern specifications used for syntax highlighting always use
the EMACS regular expression syntax.
The following table lists the features of each of the regular expression
syntaxes that can be set via the <SET REGEXP> command. Each feature in the
table is explained later.
This appendix is not intended to explain everything about regular expressions.
If you want to find out more about GNU Regular Expressions, then view the on-line
documentation at <http://hessling-editor.sf.net/doc/regex/>.
+------------------------+----------------------------+
| Syntax                 | Features                   |
+------------------------+----------------------------+
| EMACS                  | None set                   |
+------------------------+----------------------------+
| AWK                    | BACKSLASH_ESCAPE_IN_LISTS  |
|                        | DOT_NOT_NULL               |
|                        | NO_BACKSLASH_PARENS        |
|                        | NO_BACKSLASH_REFS          |
|                        | NO_BACKSLASH_VBAR          |
|                        | NO_EMPTY_RANGES            |
|                        | UNMATCHED_RIGHT_PAREN_ORD  |
+------------------------+----------------------------+
| POSIX_AWK              | CHAR_CLASSES               |
|                        | DOT_NEWLINE                |
|                        | DOT_NOT_NULL               |
|                        | INTERVALS                  |
|                        | NO_EMPTY_RANGES            |
|                        | CONTEXT_INDEP_ANCHORS      |
|                        | CONTEXT_INDEP_OPS          |
|                        | NO_BACKSLASH_BRACES        |
|                        | NO_BACKSLASH_PARENS        |
|                        | NO_BACKSLASH_VBAR          |
|                        | UNMATCHED_RIGHT_PAREN_ORD  |
|                        | BACKSLASH_ESCAPE_IN_LISTS  |
+------------------------+----------------------------+
| GREP                   | BACKSLASH_PLUS_QM          |
|                        | CHAR_CLASSES               |
|                        | HAT_LISTS_NOT_NEWLINE      |
|                        | INTERVALS                  |
|                        | NEWLINE_ALT                |
+------------------------+----------------------------+
| EGREP                  | CHAR_CLASSES               |
|                        | HAT_LISTS_NOT_NEWLINE      |
|                        | NEWLINE_ALT                |
|                        | CONTEXT_INDEP_ANCHORS      |
|                        | CONTEXT_INDEP_OPS          |
|                        | NO_BACKSLASH_PARENS        |
|                        | NO_BACKSLASH_VBAR          |
+------------------------+----------------------------+
| POSIX_EGREP            | CHAR_CLASSES               |
|                        | HAT_LISTS_NOT_NEWLINE      |
|                        | NEWLINE_ALT                |
|                        | CONTEXT_INDEP_ANCHORS      |
|                        | CONTEXT_INDEP_OPS          |
|                        | NO_BACKSLASH_PARENS        |
|                        | NO_BACKSLASH_VBAR          |
|                        | NO_BACKSLASH_BRACES        |
|                        | INTERVALS                  |
+------------------------+----------------------------+
| SED                    | CHAR_CLASSES               |
|                        | DOT_NEWLINE                |
|                        | DOT_NOT_NULL               |
|                        | INTERVALS                  |
|                        | NO_EMPTY_RANGES            |
|                        | BACKSLASH_PLUS_QM          |
+------------------------+----------------------------+
| POSIX_BASIC            | CHAR_CLASSES               |
|                        | DOT_NEWLINE                |
|                        | DOT_NOT_NULL               |
|                        | INTERVALS                  |
|                        | NO_EMPTY_RANGES            |
|                        | BACKSLASH_PLUS_QM          |
+------------------------+----------------------------+
| POSIX_MINIMAL_BASIC    | CHAR_CLASSES               |
|                        | DOT_NEWLINE                |
|                        | DOT_NOT_NULL               |
|                        | INTERVALS                  |
|                        | NO_EMPTY_RANGES            |
|                        | LIMITED_OPS                |
+------------------------+----------------------------+
| POSIX_EXTENDED         | CHAR_CLASSES               |
|                        | DOT_NEWLINE                |
|                        | DOT_NOT_NULL               |
|                        | INTERVALS                  |
|                        | NO_EMPTY_RANGES            |
|                        | CONTEXT_INDEP_ANCHORS      |
|                        | CONTEXT_INDEP_OPS          |
|                        | NO_BACKSLASH_BRACES        |
|                        | NO_BACKSLASH_PARENS        |
|                        | NO_BACKSLASH_VBAR          |
|                        | UNMATCHED_RIGHT_PAREN_ORD  |
+------------------------+----------------------------+
| POSIX_MINIMAL_EXTENDED | CHAR_CLASSES               |
|                        | DOT_NEWLINE                |
|                        | DOT_NOT_NULL               |
|                        | INTERVALS                  |
|                        | NO_EMPTY_RANGES            |
|                        | CONTEXT_INDEP_ANCHORS      |
|                        | CONTEXT_INVALID_OPS        |
|                        | NO_BACKSLASH_BRACES        |
|                        | NO_BACKSLASH_PARENS        |
|                        | NO_BACKSLASH_REFS          |
|                        | NO_BACKSLASH_VBAR          |
|                        | UNMATCHED_RIGHT_PAREN_ORD  |
+------------------------+----------------------------+
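The table can also be read as a set of features per syntax, which makes it
easy to see how two syntaxes differ. The following Python sketch transcribes
a few rows of the table; SYNTAX_FEATURES and extra_features are illustrative
names only and are not part of THE or the GNU library.

```python
# A few rows of the feature table, transcribed as sets.
# SYNTAX_FEATURES and extra_features are hypothetical names for illustration.
SYNTAX_FEATURES = {
    "EMACS": frozenset(),
    "EGREP": frozenset("""
        CHAR_CLASSES HAT_LISTS_NOT_NEWLINE NEWLINE_ALT
        CONTEXT_INDEP_ANCHORS CONTEXT_INDEP_OPS
        NO_BACKSLASH_PARENS NO_BACKSLASH_VBAR
    """.split()),
    "POSIX_EGREP": frozenset("""
        CHAR_CLASSES HAT_LISTS_NOT_NEWLINE NEWLINE_ALT
        CONTEXT_INDEP_ANCHORS CONTEXT_INDEP_OPS
        NO_BACKSLASH_PARENS NO_BACKSLASH_VBAR
        NO_BACKSLASH_BRACES INTERVALS
    """.split()),
    "SED": frozenset("""
        CHAR_CLASSES DOT_NEWLINE DOT_NOT_NULL INTERVALS
        NO_EMPTY_RANGES BACKSLASH_PLUS_QM
    """.split()),
    "POSIX_BASIC": frozenset("""
        CHAR_CLASSES DOT_NEWLINE DOT_NOT_NULL INTERVALS
        NO_EMPTY_RANGES BACKSLASH_PLUS_QM
    """.split()),
}


def extra_features(syntax, baseline):
    """Return the features set by `syntax` that `baseline` does not set."""
    return sorted(SYNTAX_FEATURES[syntax] - SYNTAX_FEATURES[baseline])


print(extra_features("POSIX_EGREP", "EGREP"))
print(SYNTAX_FEATURES["SED"] == SYNTAX_FEATURES["POSIX_BASIC"])
```

According to the table, POSIX_EGREP adds only INTERVALS and
NO_BACKSLASH_BRACES to EGREP, and SED and POSIX_BASIC set the same features.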
------------
BACKSLASH_ESCAPE_IN_LISTS
------------
If this feature is not set, then \ inside a bracket expression is
literal.
If set, then such a \ quotes the following character.
------------
BACKSLASH_PLUS_QM
------------
If this feature is not set, then + and ? are operators, and \+ and \? are
literals.
If set, then \+ and \? are operators and + and ? are literals.
------------
CHAR_CLASSES
------------
If this feature is set, then character classes are supported.
They are:
[:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
[:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
If not set, then character classes are not supported.
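Character classes are written inside a bracket expression, for example
[[:digit:]]. As a rough guide to what each class covers, the Python sketch
below lists ASCII approximations of the classes; it assumes the "C" locale,
and POSIX_CLASSES and in_class are illustrative names, not part of THE.

```python
import string

# ASCII ("C" locale) approximations of the character classes listed above.
# Other locales may classify characters differently.
POSIX_CLASSES = {
    "alpha": set(string.ascii_letters),
    "upper": set(string.ascii_uppercase),
    "lower": set(string.ascii_lowercase),
    "digit": set(string.digits),
    "alnum": set(string.ascii_letters + string.digits),
    "xdigit": set(string.hexdigits),
    "space": set(" \t\n\r\f\v"),
    "punct": set(string.punctuation),
    "print": {chr(c) for c in range(0x20, 0x7F)},  # space through ~
    "graph": {chr(c) for c in range(0x21, 0x7F)},  # print without space
    "cntrl": {chr(c) for c in range(0x00, 0x20)} | {"\x7f"},
}


def in_class(ch, name):
    """Return True if the single character ch belongs to [:name:]."""
    return ch in POSIX_CLASSES[name]


print(in_class("f", "xdigit"), in_class("g", "xdigit"))
print(in_class(" ", "print"), in_class(" ", "graph"))
```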
------------
CONTEXT_INDEP_ANCHORS
------------
If this feature is set, then ^ and $ are always anchors (outside bracket
expressions, of course).
If this feature is not set, then it depends:
^ is an anchor if it is at the beginning of a regular
expression or after an open-group or an alternation operator;
$ is an anchor if it is at the end of a regular expression, or
before a close-group or an alternation operator.
This feature could be (re)combined with CONTEXT_INDEP_OPS, because
POSIX draft 11.2 says that * etc. in leading positions is undefined.
------------
CONTEXT_INDEP_OPS
------------
If this feature is set, then special characters are always special regardless
of where they are in the pattern.
If this feature is not set, then special characters are special only in some
contexts; otherwise they are ordinary. Specifically, *, +, ? and intervals
are special only when they do not immediately follow the beginning of the
pattern, an open-group operator, or an alternation operator.
------------
CONTEXT_INVALID_OPS
------------
If this feature is set, then *, +, ?, and { cannot be first in a regular
expression or immediately after an alternation or open-group operator.
------------
DOT_NEWLINE
------------
If this feature is set, then . matches newline. If not set, then it does not.
------------
DOT_NOT_NULL
------------
If this feature is set, then . does not match NUL. If not set, then it does.
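The combined effect of DOT_NEWLINE and DOT_NOT_NULL can be pictured as a
character class that excludes zero, one, or two characters. The Python sketch
below builds such a class for Python's own re module; it only models the two
rules as described here, and dot_equivalent is a hypothetical helper, not a
function of THE or the GNU library.

```python
import re


def dot_equivalent(features):
    """Build a Python character class that behaves like . under `features`."""
    excluded = ""
    if "DOT_NEWLINE" not in features:
        excluded += r"\n"      # . does not match newline
    if "DOT_NOT_NULL" in features:
        excluded += r"\x00"    # . does not match NUL
    # With nothing excluded, [\s\S] matches any single character.
    return "[^" + excluded + "]" if excluded else r"[\s\S]"


# Both features set (as in the POSIX rows of the table) versus none (EMACS).
posix_dot = re.compile(dot_equivalent({"DOT_NEWLINE", "DOT_NOT_NULL"}))
emacs_dot = re.compile(dot_equivalent(set()))

print(bool(posix_dot.fullmatch("\n")), bool(posix_dot.fullmatch("\x00")))
print(bool(emacs_dot.fullmatch("\n")), bool(emacs_dot.fullmatch("\x00")))
```

With both features set, the dot matches a newline but not NUL; with neither
set, it matches NUL but not a newline.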
------------
HAT_LISTS_NOT_NEWLINE
------------
If this feature is set, nonmatching lists [^...] do not match newline.
If not set, they do.
------------
INTERVALS
------------
If this feature is set, either \{...\} or {...} defines an interval, depending
on NO_BACKSLASH_BRACES.
If not set, \{, \}, {, and } are literals.
------------
LIMITED_OPS
------------
If this feature is set, +, ? and | are not recognized as operators.
If not set, they are.
------------
NEWLINE_ALT
------------
If this feature is set, newline is an alternation operator.
If not set, newline is literal.
------------
NO_BACKSLASH_BRACES
------------
If this feature is set, then {...} defines an interval, and \{ and \} are literals.
If not set, then \{...\} defines an interval, and { and } are literals.
In both cases INTERVALS must also be set for intervals to be recognized.
------------
NO_BACKSLASH_PARENS
------------
If this feature is set, (...) defines a group, and \( and \) are literals.
If not set, \(...\) defines a group, and ( and ) are literals.
------------
NO_BACKSLASH_REFS
------------
If this feature is set, then \<digit> matches <digit>. If not set, then \<digit>
is a back-reference.
------------
NO_BACKSLASH_VBAR
------------
If this feature is set, then | is an alternation operator, and \| is literal.
If not set, then \| is an alternation operator, and | is literal.
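BACKSLASH_PLUS_QM, INTERVALS, LIMITED_OPS, NO_BACKSLASH_BRACES,
NO_BACKSLASH_PARENS and NO_BACKSLASH_VBAR all decide whether the bare or the
backslashed form of a character is the operator. The Python sketch below
models those rules by rewriting a pattern into the notation of Python's re
module, where the bare form is always the operator. It is an illustration
under simplifying assumptions (bracket expressions and the CONTEXT_* features
are ignored), and is_operator and to_python_re are hypothetical names.

```python
import re


def is_operator(ch, escaped, features):
    """Decide whether ch is an operator; escaped means it followed a backslash."""
    if ch in "+?|" and "LIMITED_OPS" in features:
        return False
    if ch in "{}" and "INTERVALS" not in features:
        return False
    if ch in "()":
        bare_is_operator = "NO_BACKSLASH_PARENS" in features
    elif ch == "|":
        bare_is_operator = "NO_BACKSLASH_VBAR" in features
    elif ch in "{}":
        bare_is_operator = "NO_BACKSLASH_BRACES" in features
    else:  # "+" or "?"
        bare_is_operator = "BACKSLASH_PLUS_QM" not in features
    # The operator is whichever form (bare or backslashed) the features select.
    return escaped != bare_is_operator


def to_python_re(pattern, features):
    """Rewrite `pattern` so operators are bare and literals are backslashed."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        escaped = ch == "\\" and i + 1 < len(pattern)
        if escaped:
            i += 1
            ch = pattern[i]
        if ch in "()|{}+?":
            out.append(ch if is_operator(ch, escaped, features) else "\\" + ch)
        else:
            out.append("\\" + ch if escaped else ch)
        i += 1
    return "".join(out)


EMACS = frozenset()
POSIX_EGREP = frozenset({"INTERVALS", "NO_BACKSLASH_BRACES",
                         "NO_BACKSLASH_PARENS", "NO_BACKSLASH_VBAR"})

print(to_python_re(r"\(ab\|cd\)+", EMACS))        # (ab|cd)+
print(to_python_re(r"(ab|cd){2}", POSIX_EGREP))   # (ab|cd){2}
print(to_python_re(r"a{2}", EMACS))               # a\{2\}
```

The last line shows that with no features set (the EMACS row of the table)
braces are literal, because INTERVALS is not set.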
------------
NO_EMPTY_RANGES
------------
If this feature is set, then a range whose ending point collates lower than
its starting point, as in [z-a], is invalid.
If not set, then such a range is ignored; that is, it matches nothing.
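The two behaviours for a reversed range such as [z-a] can be modelled in a
few lines. The Python sketch below uses code point order as a stand-in for
collating order, which is an assumption that holds only for simple ASCII
ranges; expand_range is a hypothetical helper.

```python
def expand_range(start, end, features):
    """List the characters in the range start-end of a bracket expression."""
    if ord(end) < ord(start):
        # Reversed range: an error if NO_EMPTY_RANGES is set, else ignored.
        if "NO_EMPTY_RANGES" in features:
            raise ValueError("invalid range " + start + "-" + end)
        return []
    return [chr(c) for c in range(ord(start), ord(end) + 1)]


print(expand_range("a", "c", set()))   # ['a', 'b', 'c']
print(expand_range("z", "a", set()))   # []
try:
    expand_range("z", "a", {"NO_EMPTY_RANGES"})
except ValueError as err:
    print(err)                         # invalid range z-a
```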
------------
UNMATCHED_RIGHT_PAREN_ORD
------------
If this feature is set, then an unmatched ) is ordinary.
If not set, then an unmatched ) is invalid.
**man-end**********************************************************************/