Tcl core 8.5 with PCRE (and DFA) regular expressions engine #5

Open: wants to merge 29 commits into base: core-8-5-branch
99af25f
pcre: applied andreas patch "268_tk85.patch_final.diff" [f40de304f799…
sebres Nov 14, 2017
ebe1515
fixed compiling without HAVE_PCRE (no matches variable in TSD in such…
sebres Nov 14, 2017
ec10c97
[win] Introduces nmake command line parameter ADDLINKOPTS, to specify…
sebres Nov 14, 2017
bdd7096
recognize explicit type of regexp engine used (avoid usage of default…
sebres Nov 14, 2017
ab53f1e
**interim commit** bug fixing: not ready
sebres Nov 14, 2017
11a7093
bugs fixed, code review, more backwards compatibility is constituted …
sebres Nov 15, 2017
d751b42
**interim commit** almost ready
sebres Nov 15, 2017
b102ffb
Bugs fixed, code review, backwards compatibility etc.
sebres Nov 16, 2017
0c61084
reactivate faster replacement of simple words (not really regexp) for…
sebres Nov 16, 2017
496d5c3
1st shot trying to implement DFA mode of PCRE
sebres Nov 16, 2017
f890912
DFA type usable, more improvements, like common storage for matches (…
sebres Nov 17, 2017
0a91b0d
DFA workspace vector (with reallocation) implemented with shared TSD …
sebres Nov 17, 2017
bfc8830
add pcre support for windows build (automake e. g. msys/mingw), also …
sebres Jun 29, 2021
acb4fcc
resolve several warnings
sebres Jun 29, 2021
4e9a8ae
merge remote branch 'fossil/core-8-5-branch' into core_8_5_pcre
sebres Jun 29, 2021
b72b844
tests/regexp.test: increase test coverage: added multi-byte utf capab…
sebres Jun 29, 2021
61189c7
fixes pcre behavior on multi-byte utf-8 sequences (indices, start off…
sebres Jun 30, 2021
2898883
match byte-array as string (safe against shimmer, compatible to class…
sebres Jun 30, 2021
7ef01ba
code review; new flags for Tcl_RegExpExecObj and TclRegexp*, in TclRe…
sebres Jun 30, 2021
ed93f27
fixes regression (support of \uXXXX escape sequences), compile pcre w…
sebres Jun 30, 2021
1eb72fc
merge remote branch 'fossil/core-8-5-branch' into core_8_5_pcre
sebres Jun 30, 2021
51d743e
optimizes byte 2 char offset (rewritten as function, enlarged mapping…
sebres Jul 6, 2021
be70193
try to improve pcre regexp (re)allocate storage (offsets/matches) at …
sebres Oct 12, 2021
be2dc67
don't capture groups if not needed (no variables or -inline arguments…
sebres Oct 13, 2021
e516db6
regexp2.test - allow to include certain tests from regexp.test (filte…
sebres Oct 13, 2021
9daff73
use JIT-compile if possible; strange is that some REs become faster w…
sebres Oct 12, 2021
6ee1727
small fixes (if compiled without pcre) and code review
sebres Oct 18, 2021
659cde3
tests/interp.test: fixed test "bad type" for interp regexp
sebres May 31, 2022
ac04fd7
regexp: amend to pcre-support: forgotten shift flags in emitting of I…
sebres Sep 13, 2024
46 changes: 31 additions & 15 deletions generic/tcl.h
@@ -545,27 +545,43 @@ typedef void (Tcl_ThreadCreateProc) _ANSI_ARGS_((ClientData clientData));
* Flag values passed to Tcl_GetRegExpFromObj.
*/

-#define TCL_REG_BASIC 000000 /* BREs (convenience). */
-#define TCL_REG_EXTENDED 000001 /* EREs. */
-#define TCL_REG_ADVF 000002 /* Advanced features in EREs. */
-#define TCL_REG_ADVANCED 000003 /* AREs (which are also EREs). */
-#define TCL_REG_QUOTE 000004 /* No special characters, none. */
-#define TCL_REG_NOCASE 000010 /* Ignore case. */
-#define TCL_REG_NOSUB 000020 /* Don't care about subexpressions. */
-#define TCL_REG_EXPANDED 000040 /* Expanded format, white space &
+#define TCL_REG_BASIC 0x00000000 /* BREs (convenience). */
+#define TCL_REG_EXTENDED 0x00000001 /* EREs. */
+#define TCL_REG_ADVF 0x00000002 /* Advanced features in EREs. */
+#define TCL_REG_ADVANCED 0x00000003 /* AREs (which are also EREs). */
+#define TCL_REG_QUOTE 0x00000004 /* No special characters, none. */
+#define TCL_REG_NOCASE 0x00000008 /* Ignore case. */
+#define TCL_REG_NOSUB 0x00000010 /* Don't care about subexpressions. */
+#define TCL_REG_EXPANDED 0x00000020 /* Expanded format, white space &
* comments. */
-#define TCL_REG_NLSTOP 000100 /* \n doesn't match . or [^ ] */
-#define TCL_REG_NLANCH 000200 /* ^ matches after \n, $ before. */
-#define TCL_REG_NEWLINE 000300 /* Newlines are line terminators. */
-#define TCL_REG_CANMATCH 001000 /* Report details on partial/limited
+#define TCL_REG_NLSTOP 0x00000040 /* \n doesn't match . or [^ ] */
+#define TCL_REG_NLANCH 0x00000080 /* ^ matches after \n, $ before. */
+#define TCL_REG_NEWLINE 0x000000C0 /* Newlines are line terminators. */
+#define TCL_REG_CANMATCH 0x00000200 /* Report details on partial/limited
* matches. */
+#define TCL_REG_EXPLTYPE 0x10000000 /* Explicit type: the engine was given as
+ * a parameter, so the default interp engine is not used. */
+#define TCL_REG_PCRE 0x20000000 /* Make sure it doesn't conflict with
+ * existing TCL_REG_* or PCRE_* bits */
+#define TCL_REG_PCDFA 0x40000000 /* DFA variant of PCRE engine */
+
+/* The following two macros pass TCL_REG_PCRE, TCL_REG_PCDFA and TCL_REG_EXPLTYPE
+ * to INST_REGEXP in its one-byte operand, in place of the low 3 bits, which are
+ * currently never compiled (e.g. TCL_REG_ADVANCED is always set in the compiled variant). */
+#define TCL_REG_COMPILE_SHIFT(v) ((v&~0x70000000)|((v>>28)&0x07))
+#define TCL_REG_COMPILE_UNSHIFT(v) ((v&~0x07)|((v&0x07)<<28)|TCL_REG_ADVANCED)

/*
- * Flags values passed to Tcl_RegExpExecObj.
+ * Flags values passed to Tcl_RegExpExecObj and TclRegexp*.
*/

-#define TCL_REG_NOTBOL 0001 /* Beginning of string does not match ^. */
-#define TCL_REG_NOTEOL 0002 /* End of string does not match $. */
+#define TCL_REG_NOTBOL 0x00000001 /* Beginning of string does not match ^. */
+#define TCL_REG_NOTEOL 0x00000002 /* End of string does not match $. */
+#define TCL_REG_RETALL 0x00000010 /* Return all occurrences (repeat as long as matches). */
+#define TCL_REG_RETIDX 0x00000020 /* Return indices of matches (instead of strings). */
+#define TCL_REG_DOINLINE 0x00000040 /* Return matches as a list (instead of placing in variables). */
+#define TCL_REG_BYTEOFFS 0x01000000 /* Consider offsets in bytes instead of in chars (PCRE only) */


/*
* Structures filled in by Tcl_RegExpInfo. Note that all offset values are
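
To make the flag arithmetic in this hunk easier to check, here is a minimal sketch. It copies the flag values and the two macros from the hunk, then wraps them in helper functions. The `sketch_*` helpers are hypothetical and are not part of the patch. As the macros are written, `TCL_REG_COMPILE_SHIFT` does not clear the low three bits of its input, so the round trip appears to rely on those bits being zero before packing. The sketch states that assumption as an assertion. It also checks that the octal-to-hex rewrite of the existing flags leaves their values unchanged.

```c
#include <assert.h>

/* Flag values copied from the tcl.h hunk above. */
#define TCL_REG_ADVANCED 0x00000003
#define TCL_REG_NOCASE   0x00000008
#define TCL_REG_EXPLTYPE 0x10000000
#define TCL_REG_PCRE     0x20000000
#define TCL_REG_PCDFA    0x40000000

/* Macros copied verbatim from the hunk above. */
#define TCL_REG_COMPILE_SHIFT(v) ((v&~0x70000000)|((v>>28)&0x07))
#define TCL_REG_COMPILE_UNSHIFT(v) ((v&~0x07)|((v&0x07)<<28)|TCL_REG_ADVANCED)

/*
 * Hypothetical helper (not in the patch): move the three engine bits
 * (28..30) into bits 0..2 so the flags fit a one-byte operand.
 * Assumption: the low three bits of cflags are clear, because the
 * macro ORs the engine bits into them without masking first.
 */
static int sketch_pack(int cflags)
{
    assert((cflags & 0x07) == 0);
    return TCL_REG_COMPILE_SHIFT(cflags);
}

/*
 * Hypothetical helper (not in the patch): move bits 0..2 back to
 * bits 28..30 and set TCL_REG_ADVANCED, as the UNSHIFT macro does.
 */
static int sketch_unpack(int packed)
{
    return TCL_REG_COMPILE_UNSHIFT(packed);
}

/* Returns 1 if each old octal constant equals its new hex spelling. */
static int sketch_octal_matches_hex(void)
{
    return 000010 == 0x08 && 000020 == 0x10 && 000040 == 0x20
        && 000100 == 0x40 && 000200 == 0x80 && 000300 == 0xC0
        && 001000 == 0x200;
}
```

Under these assumptions, `sketch_pack(TCL_REG_NOCASE | TCL_REG_PCRE | TCL_REG_EXPLTYPE)` gives `0x0B`, and `sketch_unpack(0x0B)` gives the same three flags with `TCL_REG_ADVANCED` added.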
7 changes: 7 additions & 0 deletions generic/tclBasic.c
@@ -504,6 +504,13 @@ Tcl_CreateInterp(void)
iPtr->evalFlags = 0;
iPtr->scriptFile = NULL;
iPtr->flags = 0;
+#ifdef HAVE_PCRE
+#ifdef USE_DEFAULT_PCRE
+if (getenv("TCL_REGEXP_CLASSIC") == NULL) { iPtr->flags |= INTERP_PCRE; }
+#else
+if (getenv("TCL_REGEXP_PCRE") != NULL) { iPtr->flags |= INTERP_PCRE; }
+#endif
+#endif
iPtr->tracePtr = NULL;
iPtr->tracesForbiddingInline = 0;
iPtr->activeCmdTracePtr = NULL;
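
The engine selection added to `Tcl_CreateInterp` can be summarized with a small sketch. In the patch the choice between the two branches is made at compile time through `USE_DEFAULT_PCRE`. The sketch below turns that into a runtime parameter so that both branches can be exercised. `SKETCH_INTERP_PCRE` and `sketch_default_flags` are hypothetical names, and the flag value is a placeholder, because the real `INTERP_PCRE` value is not shown in this diff.

```c
#include <stdlib.h>

/* Placeholder value; the real INTERP_PCRE is defined elsewhere in the patch. */
#define SKETCH_INTERP_PCRE 0x1

/*
 * Hypothetical stand-in for the logic added to Tcl_CreateInterp.
 * pcre_is_default mirrors the compile-time USE_DEFAULT_PCRE switch:
 *  - nonzero: PCRE is on unless TCL_REGEXP_CLASSIC is set (opt-out);
 *  - zero:    PCRE is off unless TCL_REGEXP_PCRE is set (opt-in).
 * Only the presence of the variable matters, not its value.
 */
static int sketch_default_flags(int pcre_is_default)
{
    int flags = 0;

    if (pcre_is_default) {
        if (getenv("TCL_REGEXP_CLASSIC") == NULL) {
            flags |= SKETCH_INTERP_PCRE;
        }
    } else {
        if (getenv("TCL_REGEXP_PCRE") != NULL) {
            flags |= SKETCH_INTERP_PCRE;
        }
    }
    return flags;
}
```

Because the check is `getenv(...) == NULL` or `!= NULL`, setting a variable to an empty string still counts as set in this sketch, just as it does in the two lines added by the patch.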