[for reference] all work done which is not in original repo #21

Open
wants to merge 426 commits into
base: master
GerHobbelt (Contributor) commented Jan 31, 2017

Co-exists with zaach/jison#338.

Features / Fixes / Changes

…as a prelude to enabling `bitarray2set` to produce shorthand set representations *including* `\\p{NAME}` macros when `%options xregexp` has been turned on.
…h as `\d` for digits, instead of the raw regex set, such as `[0-9]`: 'shorthand notation' for special sets.
…y no-one (except one unit test, which has been removed)
… `\S`, `\s`, `\D`, `\d`, `\W` and `\w` regex escapes when the entire bitarray (set) is spanned by one of them, i.e. is the equivalent of one of these escapes.
…and `\s`, etc. regex escapes in the regex sets to help produce minimum size regex expressions in the lexer.
…ex set bitarray conversion routines to recognize PART of the bitset as a \pNAME or \W regex escape. Adjusted tests to match the current state of affairs.
…s where one or more sets are NEGATED, e.g. `[0-9]|[^a-z]` -- unit tests have been added / updated to check for this now.
…scape detection code works okay for non-inverted sets.
… for regex set minification: when that option is NOT set, we must not recognize \pNAME regex sets as matching/contained-in our current regex sets.
…n the end, we also check against the plain, unadulterated regex set expressions.
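The idea behind the set-minification commits above can be illustrated with a minimal sketch. This is not the jison-lex implementation: `toBitarray`, `SHORTHANDS`, `sameBits` and `minifySet` are hypothetical names, the range is limited to ASCII for brevity, and only whole-set equivalence is checked (no partial matches, no `\p{NAME}` handling).

```javascript
// Illustrative sketch only: these names are hypothetical, not jison-lex internals.
const LIMIT = 128; // assumption: ASCII only, to keep the example small

// Build a boolean "bitarray" marking which code points a regex atom matches.
function toBitarray(atomSource) {
  const re = new RegExp('^(?:' + atomSource + ')$');
  const bits = [];
  for (let cp = 0; cp < LIMIT; cp++) {
    bits.push(re.test(String.fromCharCode(cp)));
  }
  return bits;
}

// Precompute the bitarray of every shorthand escape we want to recognize.
const SHORTHANDS = ['\\d', '\\D', '\\w', '\\W', '\\s', '\\S'].map(function (esc) {
  return { esc: esc, bits: toBitarray(esc) };
});

function sameBits(a, b) {
  return a.every(function (bit, i) { return bit === b[i]; });
}

// Return the shorthand escape when the whole set equals one; else the set itself.
function minifySet(setSource) {
  const bits = toBitarray(setSource);
  const hit = SHORTHANDS.find(function (s) { return sameBits(s.bits, bits); });
  return hit ? hit.esc : setSource;
}
```

Under these assumptions, `minifySet('[0-9]')` yields the two-character string `\d`, while a set such as `[a-f]` is returned unchanged.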
…ut(null)`. Also ensure that the lexer instance is fully initialized at construction time, i.e. `setInput()` is now *always* invoked as part of the constructor `function RegExpLexer(dict, input, tokens)` call: this results in more predictable ~ more reliable run-time code flow.
… `clear()` has already been invoked just above near the start of this `next()` call.
… added a test to ensure that any <<EOF>> lex rule only matches end-of-input *once*. (After that first match, any subsequent call to the `lex()` API must produce a plain EOF token (integer 1) without executing *any* <<EOF>> rule action code whatsoever.)

Note that the new test also checks that the lex compiler did indeed recognize the <<EOF>> token correctly and didn't mistake it for another match-this-literal-input-string rule!
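The match-once behaviour described above can be modelled with a toy lexer. This is a sketch under stated assumptions, not jison-lex code: `makeToyLexer` and `eofAction` are hypothetical names, and the toy emits one token per input character.

```javascript
// Toy model only: `makeToyLexer` and `eofAction` are hypothetical names.
const EOF = 1; // the plain EOF token (integer 1) mentioned above

function makeToyLexer(input, eofAction) {
  let pos = 0;
  let eofRuleMatched = false; // remembers that the <<EOF>> rule already fired
  return {
    lex: function () {
      if (pos < input.length) {
        return input.charAt(pos++); // one token per character
      }
      if (!eofRuleMatched) {
        // First time at end-of-input: run the <<EOF>> rule action exactly once.
        eofRuleMatched = true;
        const token = eofAction();
        return token !== undefined ? token : EOF;
      }
      // Every later call: plain EOF, without running any action code.
      return EOF;
    }
  };
}
```

Calling `lex()` repeatedly past the end of the input runs `eofAction` only on the first such call; all later calls return `1`.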
…nized char (assuming the (custom) parseError doesn't throw an exception, i.e. we are *expected* to be able to continue after running into a lexer error).
…se it! No need to only do so once the *grammar parser* has observed this fact and has set up the `parseError` handler accordingly, i.e. has set up a custom `yy.parser.parseError` handler.

2. always make sure the *lexer* instance is the `this` for the `parseError` function being invoked: this should be so not just for the default `parseError` API provided by the lexer instance but also for the `yy.parseError` and `yy.parser.parseError` methods.

Note: precedence of these functions from high to low (the first one we encounter is the one being invoked for any lexer error):

- `yy.parser.parseError`
- `yy.parseError`
- `lexer.parseError`
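The precedence list above, combined with item 2 of this commit message, can be sketched as follows. `resolveParseError` and `reportLexerError` are hypothetical helper names introduced for illustration; the actual lookup code in jison-lex may differ, and a later commit message in this list describes a different `this` binding for `yy.parser.parseError`.

```javascript
// Hypothetical helpers; not part of the jison-lex API.
function resolveParseError(lexer) {
  const yy = lexer.yy || {};
  // Highest precedence first; the first handler found is the one invoked.
  if (yy.parser && typeof yy.parser.parseError === 'function') {
    return yy.parser.parseError;
  }
  if (typeof yy.parseError === 'function') {
    return yy.parseError;
  }
  return lexer.parseError;
}

function reportLexerError(lexer, message, hash) {
  const handler = resolveParseError(lexer);
  // Per item 2 above: the lexer instance is `this` for whichever handler runs.
  return handler.call(lexer, message, hash);
}
```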
…ror` construction as done in jison itself, including the use of the `printFunctionSourceCode` helper function and the choice when to stringify the array of source code lines which are produced by the construction *and* the internal system tests using `assert()`: these serve the same purpose as unit tests, but are much easier, in this case, to set up, write, and maintain.
… that the `this` will be the *parser* reference, i.e. `yy.parser`, rather than the current lexer reference. (This is fine because lexer errors are recognizable by their own error hash object layout: they *do* have a `lexer` member, but *do not* have a `parser` member!)

2. added the `yy` member to the lexer error hash object passed to `parseError` to help advanced usage of this interface. This matches the behaviour already exhibited by the jison *parser*.

3. remove the superfluous lexer.yy initial setup as that one is now more adequately handled by the `setInput()` API / member function.
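A shared `parseError` handler could tell lexer errors from parser errors using the hash layout described above. The following is a minimal sketch: `isLexerError` is a hypothetical helper, and treating a `null`/`undefined` member as "absent" is an assumption.

```javascript
// Hypothetical helper: lexer error hashes carry a `lexer` member but no `parser` member.
function isLexerError(hash) {
  return hash != null && hash.lexer != null && hash.parser == null;
}

// Example handler usable for both lexer and parser errors.
function parseError(message, hash) {
  const origin = isLexerError(hash) ? 'lexer' : 'parser';
  throw new Error('[' + origin + '] ' + message);
}
```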
…re better off using the already existing patch tool for cli.js as well, rather than fixing the stuff in/for rollup. (Granted, it's a tad hacky, but it works)
…ge by testing whether enabling or disabling this option helps make the generated lexer pass the compile/exec test: when it does, the userland code must have failed to properly load the XRegExp module. (This is particularly relevant in ES6 generator mode as we currently don't have a jison which supports the `%code imports %{...%}` feature yet.)
…r which contains a chunk of code which should be serialized as-is. The problem is compounded by the fact that when we tweak the code to actually serialize the content, the content has already been UNDESIRABLY REWRITTEN to define a function/class `XRegExp$$1` instead of `XRegExp`, so we have to refrain from using straight JS code there and instead rewrite it in string format without the serialization trick. A pity, alas.
…to ES5, etc. in the DIST directory + adjust the package module definitions accordingly.
…e 'external dependencies' ARE NOT included in the rollup but kept external. ==> jison-helpers-lib + lex-parser MUST NOT be included in the dist/ rollup library files for jison-lex!
3 participants