[for reference] all work done which is not in original repo #21

GerHobbelt · 2017-01-31T14:16:57Z

Features / Fixes / Changes

[to be edited]
The engine adds a stop-gap check at lexer runtime to prevent user-programmer mistakes from causing obscure, hard-to-diagnose internal lexer crashes (Not able to use INITIAL and an exclusive state #19). See SHA-1: 57dd804 (Check whether a sane condition has been pushed before: this makes the lexer robust against user-programmer bugs such as #19).
fixes:
- startConditions error #9 + fix undefined start conditions error : https://github.com/zaach/jison-le... #10 (undefined start conditions)
- Simplify simple return statements. #11 (faster exec of simple return statements via lookup table)
- Pass command line options to underlying lexer. #15 (CLI to lexer options: both the jison-lex CLI and the jison CLI now pass the (filtered) options to the lexer)
- Lexer should pass character position to parseError #16 (no input location info passed to error: now lexer parseError receives a hash info object with location info and other context at the time of error, just like we already did in the parser)
- Lexer should pass character position to parseError #16 + Changed lex() from recursive to iterative, preventing Maximum Call Stack errors. #17 (recursive lex() call === stack overflow crash)
- Not able to use INITIAL and an exclusive state #19
- Parse error when using arrow function in rules #23 (lexer rules' action code can now be fed to the API as JavaScript arrow functions instead of classic JavaScript functions (`function () { return 'TOKEN'; }`) or action code strings (`'return "TOKEN";'`))
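The recursive-to-iterative change listed for #17 can be illustrated with a toy lexer. This is a minimal sketch under assumptions: `makeToyLexer`, its `next()` contract (returning `undefined` when a match yields no token) and the token values are hypothetical and are not taken from the jison-lex source.

```javascript
// Hypothetical toy lexer (not the jison-lex implementation).
// next() returns a token, or undefined when the matched input
// produced no token (here: a skipped blank).
function makeToyLexer(input) {
  let pos = 0;
  return {
    EOF: 1,
    next() {
      if (pos >= input.length) return this.EOF;
      const ch = input[pos++];
      return ch === ' ' ? undefined : ch;
    },
    // Recursive variant: every skipped match adds a stack frame,
    // so a long run of skipped input can exhaust the call stack.
    lexRecursive() {
      const r = this.next();
      return r !== undefined ? r : this.lexRecursive();
    },
    // Iterative variant: stack depth stays constant no matter how
    // many matches are skipped before a token is produced.
    lex() {
      let r;
      do {
        r = this.next();
      } while (r === undefined);
      return r;
    }
  };
}

const longRunOfBlanks = ' '.repeat(200000) + 'x';
const lexer = makeToyLexer(longRunOfBlanks);
const firstToken = lexer.lex();   // skips all blanks in a loop
const secondToken = lexer.lex();  // input exhausted
```

The loop form trades nothing in behaviour: it returns the same token the recursive form would, but its stack use does not grow with the amount of skipped input.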

…ut(null)`. Also ensure that the lexer instance is fully initialized at construction time, i.e. `setInput()` is now *always* invoked as part of the constructor `function RegExpLexer(dict, input, tokens)` call: this results in a more predictable, more reliable run-time code flow.

… added a test to ensure that any <<EOF>> lex rule only matches end-of-input *once*. After that first match, any subsequent call to the `lex()` API must produce a plain EOF token (integer 1) without executing *any* <<EOF>> rule action code whatsoever. Note that the new test also checks that the lex compiler did indeed recognize the <<EOF>> token correctly and didn't mistake it for another match-this-literal-input-string rule!
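The match-once behaviour described for <<EOF>> rules can be sketched with a toy lexer. Everything here is an assumption made for illustration: `makeEofOnceLexer`, the `onEof` callback standing in for the <<EOF>> rule action, and the `'EOF_SEEN'` token name are hypothetical. Only the EOF token value 1 comes from the text above.

```javascript
// Hypothetical toy lexer (not the jison-lex implementation).
function makeEofOnceLexer(input, onEof) {
  let pos = 0;
  let eofRuleDone = false;  // set once the <<EOF>> action has run
  return {
    EOF: 1,
    lex() {
      if (pos < input.length) return input[pos++];
      if (!eofRuleDone) {
        eofRuleDone = true;
        const token = onEof();  // run the <<EOF>> action exactly once
        if (token !== undefined) return token;
      }
      return this.EOF;          // every later call: plain EOF token
    }
  };
}

let eofActionRuns = 0;
const eofLexer = makeEofOnceLexer('ab', function () {
  eofActionRuns++;
  return 'EOF_SEEN';
});

// Call lex() more often than there is input to observe the tail behaviour.
const seen = [];
for (let i = 0; i < 5; i++) {
  seen.push(eofLexer.lex());
}
```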

…se it! No need to only do so once the *grammar parser* has observed this fact and has set up the `parseError` handler accordingly, i.e. has set up a custom `yy.parser.parseError` handler.
2. always make sure the *lexer* instance is the `this` for the `parseError` function being invoked: this should be so not just for the default `parseError` API provided by the lexer instance but also for the `yy.parseError` and `yy.parser.parseError` methods.

Note: precedence of these functions from high to low (the first one we encounter is the one being invoked for any lexer error):
- `yy.parser.parseError`
- `yy.parseError`
- `lexer.parseError`
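The precedence order stated in this commit message can be sketched as a small lookup. This is an illustration under assumptions, not the jison-lex code: `pickParseError`, `reportLexError`, `handler` and the reduced error hash are hypothetical names and shapes. The sketch follows this commit message in binding the lexer as `this` for every handler.

```javascript
// Return the first handler found, highest precedence first.
function pickParseError(lexer) {
  const yy = lexer.yy || {};
  if (yy.parser && typeof yy.parser.parseError === 'function') {
    return yy.parser.parseError;
  }
  if (typeof yy.parseError === 'function') {
    return yy.parseError;
  }
  return lexer.parseError;
}

// Invoke the chosen handler with the lexer instance as `this`.
function reportLexError(lexer, message) {
  const hash = { lexer: lexer, yy: lexer.yy };  // hypothetical, reduced error hash
  return pickParseError(lexer).call(lexer, message, hash);
}

// Test helper: a handler that reports its own name and its `this`.
function handler(name) {
  return function (message, hash) {
    return { name: name, calledOnLexer: this === hash.lexer };
  };
}

const viaParser = reportLexError({
  yy: {
    parser: { parseError: handler('yy.parser.parseError') },
    parseError: handler('yy.parseError')
  },
  parseError: handler('lexer.parseError')
}, 'unrecognized char');

const viaYy = reportLexError({
  yy: { parseError: handler('yy.parseError') },
  parseError: handler('lexer.parseError')
}, 'unrecognized char');

const viaLexer = reportLexError({
  yy: {},
  parseError: handler('lexer.parseError')
}, 'unrecognized char');
```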

…ror` construction as done in jison itself, including the use of the `printFunctionSourceCode` helper function and the choice when to stringify the array of source code lines which are produced by the construction *and* the internal system tests using `assert()`: these serve the same purpose as unit tests, but are much easier, in this case, to set up, write, and maintain.

… that the `this` will be the *parser* reference, i.e. `yy.parser`, rather than the current lexer reference. (This is fine because lexer errors are recognizable by their own error hash object layout: they *do* have a `lexer` member, but *do not* have a `parser` member!)
2. added the `yy` member to the lexer error hash object passed to `parseError` to help advanced usage of this interface. This is behaviour identical to the behaviour already exhibited by the jison *parser*.
3. removed the superfluous `lexer.yy` initial setup as that one is now more adequately handled by the `setInput()` API / member function.
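A shared `parseError` handler could use the hash layout described above to tell lexer errors from parser errors. This is a minimal sketch: `isLexerError` and both sample hashes are hypothetical. Only the rule itself (a `lexer` member present, a `parser` member absent) comes from the commit message.

```javascript
// A lexer error hash has a `lexer` member but no `parser` member.
function isLexerError(hash) {
  return hash != null && 'lexer' in hash && !('parser' in hash);
}

// Hypothetical, reduced hashes for illustration only.
const sampleLexerHash = { text: '?', lexer: {}, yy: {} };
const sampleParserHash = { text: '?', lexer: {}, parser: {}, yy: {} };

const lexerHashIsLexerError = isLexerError(sampleLexerHash);
const parserHashIsLexerError = isLexerError(sampleParserHash);
```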

…ge by testing whether enabling or disabling this option helps make the generated lexer pass the compile/exec test: when it does, the userland code must have failed to properly load the XRegExp module. (This is particularly relevant in ES6 generator mode as we currently don't have a jison which supports the `%code imports %{...%}` feature yet.)

…r which contains a chunk of code which should be serialized as-is. The problem is compounded by the fact that when we tweak the code to actually serialize the content, the content has already been UNDESIRABLY REWRITTEN to define a function/class `XRegExp$$1` instead of `XRegExp`, so we have to refrain from using straight JS code there and instead rewrite it in string format without the serialization trick. A pity, alas.

GerHobbelt added 30 commits November 10, 2016 20:50

typo fix in comments

144c2f3

refactoring: cache \\p{NAME} bitarray expansions; this also serves as a prelude to enabling `bitarray2set` to produce shorthand set representations *including* `\\p{NAME}` macros when `%options xregexp` has been turned on.

9917bef

refactoring: enable bitarray2set() to produce a regex 'escape', such as `\d` for digits, instead of the raw regex set, such as `[0-9]`: 'shorthand notation' for special sets.

fda6af3

refactoring: next step towards recognizing the sets for various escapes

357bef4

refactoring: next step: in_inv_set in macro lookup table was used by no-one (except one unit test, which has been removed)

002d0fc

refactoring / optimization: now the bitarray operations recognize the `\S`, `\s`, `\D`, `\d`, `\W` and `\w` regex escapes when the entire bitarray (set) is spanned by one of them, i.e. is the equivalent of one of these escapes.

c4c2cb0
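Recognizing that a whole set equals one of these escapes can be sketched by comparing bitarrays. The sketch below is not the jison-lex routine: `bitarrayFor`, `sameBits`, `escapeForSet` and the restriction to code points 0..255 are assumptions made to keep the example short.

```javascript
const RANGE = 256;  // assumption: only code points 0..255 are considered

// Build a bitarray with one entry per code point the regex matches.
function bitarrayFor(re) {
  const bits = new Uint8Array(RANGE);
  for (let cp = 0; cp < RANGE; cp++) {
    if (re.test(String.fromCharCode(cp))) bits[cp] = 1;
  }
  return bits;
}

// Precompute the bitarray for each shorthand escape.
const ESCAPES = ['\\d', '\\D', '\\s', '\\S', '\\w', '\\W'].map(function (esc) {
  return { esc: esc, bits: bitarrayFor(new RegExp('^' + esc + '$')) };
});

function sameBits(a, b) {
  for (let cp = 0; cp < RANGE; cp++) {
    if (a[cp] !== b[cp]) return false;
  }
  return true;
}

// Return the escape equivalent to the whole set, or null if none matches.
function escapeForSet(bits) {
  for (const entry of ESCAPES) {
    if (sameBits(bits, entry.bits)) return entry.esc;
  }
  return null;
}

const digits = escapeForSet(bitarrayFor(/^[0-9]$/));
const word = escapeForSet(bitarrayFor(/^[A-Za-z0-9_]$/));
const none = escapeForSet(bitarrayFor(/^[0-8]$/));
```

Here `[0-9]` and `[A-Za-z0-9_]` map to `\d` and `\w`, while `[0-8]` has no equivalent escape and yields `null`, so the raw set would be kept.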

refactoring / optimization: step towards recognizing \pNAME pcodes and `\s`, etc. regex escapes in the regex sets to help produce minimum size regex expressions in the lexer.

80635ab

refactoring / optimizing: fixing bugs and moving towards allowing regex set bitarray conversion routines to recognize PART of the bitset as a \pNAME or \W regex escape. Adjusted tests to match the current state of affairs.

124b022

refactoring / regex minification: next bit of work to recognize pcodes/escapes in sets.

f80a61d

bugfix for faulty processing of lexer macros which OR-merge regex sets where one or more sets are NEGATED, e.g. `[0-9]|[^a-z]` -- unit tests have been added / updated to check for this now.

938cfab
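One way to make an OR-merge of sets safe when some are negated is to expand each set, negation included, into a bitarray first and only then combine them. The sketch below illustrates that idea for `[0-9]|[^a-z]`. It does not claim to show what the original bug was or how jison-lex fixes it: `bitsFromRange`, `orMerge`, `has` and the ASCII-only range are assumptions.

```javascript
const SIZE = 128;  // assumption: ASCII only, for brevity

// Expand a single range such as 0-9 into a bitarray; when `negated`
// is true the result is the complement, as for [^a-z].
function bitsFromRange(lo, hi, negated) {
  const bits = new Uint8Array(SIZE);
  const from = lo.charCodeAt(0);
  const to = hi.charCodeAt(0);
  for (let cp = 0; cp < SIZE; cp++) {
    const inside = cp >= from && cp <= to;
    bits[cp] = inside !== negated ? 1 : 0;
  }
  return bits;
}

// OR-merge two already expanded sets.
function orMerge(a, b) {
  const out = new Uint8Array(SIZE);
  for (let cp = 0; cp < SIZE; cp++) {
    out[cp] = a[cp] | b[cp];
  }
  return out;
}

function has(bits, ch) {
  return bits[ch.charCodeAt(0)] === 1;
}

// [0-9]|[^a-z]: everything except the lowercase letters.
const merged = orMerge(
  bitsFromRange('0', '9', false),
  bitsFromRange('a', 'z', true)
);
```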

refactoring / regex set minification: next step done: now the pcode/escape detection code works okay for non-inverted sets.

a09bc28

refactoring: removed dead code

432a6f7

added regex set minification code for inverted sets; adjusted the unit tests accordingly.

a268dd7

removed debug code

54ef8ae

watch for the %options xregexp setting when we go and set things up for regex set minification: when that option is NOT set, we must not recognize \pNAME regex sets as matching/contained-in our current regex sets.

3af0015

As some pcode/escapes still happen to deliver a LARGER regex string in the end, we also check against the plain, unadulterated regex set expressions.

cdc13d1

regenerated library + bumped build revision

5e3cd17

bump revision & regenerate library files

0f83b04

bump revision & regenerated library files

feb6c10

bump build revision & rebuild

6e0f5e4

fix typo in comments

3faafc1

more comment typo fixes

1e17d3a

separate assignments for better code readability.

b3d4bd8

better code: removed superfluous clear() call in next() EOF path: `clear()` has already been invoked just above near the start of this `next()` call.

e2e9b5c

test if lexer continues correctly after having encountered an unrecognized char (assuming the (custom) parseError doesn't throw an exception), i.e. we are *expected* to be able to continue after running into a lexer error.

f82406c

GerHobbelt added 30 commits October 12, 2017 23:18

migrate cli.js to ES6: as rollup has trouble loading package.json we're better off using the already existing patch tool for cli.js as well, rather than fixing the stuff in/for rollup. (Granted, it's a tad hacky, but it works)

1a6c128

migrating regexp-set-management.js to ES6

776de81

migrating regexp-lexer.js to ES6

22c60f2

add the config files and make commands to compile ES6 generator code to ES5, etc. in the DIST directory + adjust the package module definitions accordingly.

0ee83e2

include the generated DIST library files.

85aa998

fix pointer to binary/CLI file

2bfd7f0

fix: node/cli hashbang prelude patcher build utility: no crashing no more.

fa0366c

rebuilt library files

6251b00

remove superfluous variable declarations which were uncovered through rollup tree shaking.

89b9124

sync changes with jison monorepo: corrections and augmentations for the ES6 code generator migration

96c7298

sync with jison monorepo changes: updated version and npm-ignore development utility scripts

ea98e67

synchronized with monorepo JISON

0fc0154

updated NPM packages

ad09f62

updated NPM packages and regenerated library files

fba8c83

added deprecation/secondary-source notice to README

7ddc6f3

sync README

7b208ce

prevent npm publish from succeeding (that would be VERY undesirable as this is the secondary source repo!)

3e2d58b

all rollup.config.js files should define the same rollup process where 'external dependencies' ARE NOT included in the rollup but kept external. ==> jison-helpers-lib + lex-parser MUST NOT be included in the dist/ rollup library files for jison-lex!

06090c0

removed dangerous make targets & rebuilt library files

32e6a30

make everything

3251d59

bumped build revision

e7975e9

sync

0e807ea

sync + added/updated badges for all jison modules in their related README's

24f37da

bumped build revision + sync

fd08b55

obsoleted. point at the jison monorepo.

25d92a2

sync incl. README fix

5ca5e0f

reference the correct npm package to show the active version

cf85210

update mention how to install and how to get at the API

495347c
