Unicode support for keywords #133

stuchl4n3k · 2019-09-25T10:03:42Z

Since /u is now supported, is there a convenient way to define a rule from an array of keywords with Unicode enabled? Something like:

const keywords = ['foo', 'bar'];
moo.compile({
   KEY: {
      match: keywords, 
      type: moo.keywords({KEY: keywords}), 
      unicode: true,
   },
});

As I understand it, in the Unicode scenario moo.keywords only works if "match" is a pattern with the /u flag.


nathan · 2019-09-25T19:03:06Z

moo.keywords only works properly when you apply it to a matcher that matches anything that could be a word, not just the keywords themselves. For example, this lexer doesn't work the way you seem to expect it to:

const moo = require('moo')

const KW = ['ban', 'this']
const lexer = moo.compile({
  kw: {match: KW, type: moo.keywords({kw: KW})},
  w: /[A-Za-z_][\w]*/,
  ws: / +/,
})
lexer.reset('banana ban')
lexer.next() // {type: 'kw', value: 'ban'}
lexer.next() // {type: 'w', value: 'ana'}
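Why does `banana` get split? A rough mental model (an assumption about the general technique, not moo's actual source): the rules behave like ordered alternatives tried at the current position, and the first alternative that matches wins even if a later one would match more text. The hypothetical `tokenize` helper below is a plain-JavaScript sketch of that idea using a single sticky RegExp; it is not part of moo's API.

```javascript
// Hypothetical mini-lexer (not moo): each rule becomes one alternative in a
// single sticky RegExp, and the FIRST alternative that matches wins.
function tokenize(input, rules) {
  // rules: array of [type, source]; sources must not contain capture groups
  const re = new RegExp(rules.map(([, src]) => `(${src})`).join('|'), 'y');
  const tokens = [];
  re.lastIndex = 0;
  while (re.lastIndex < input.length) {
    const start = re.lastIndex;
    const m = re.exec(input);
    if (!m || m[0] === '') throw new Error(`no token at offset ${start}`);
    // The first defined capture group tells us which rule matched
    const i = m.slice(1).findIndex((g) => g !== undefined);
    tokens.push({ type: rules[i][0], value: m[0] });
  }
  return tokens;
}

// Keywords as their own rule, listed ahead of the general word rule
const tokens = tokenize('banana ban', [
  ['kw', 'ban|this'],
  ['w', '[A-Za-z_]\\w*'],
  ['ws', ' +'],
]);
console.log(JSON.stringify(tokens.map((t) => [t.type, t.value])));
// [["kw","ban"],["w","ana"],["ws"," "],["kw","ban"]]
```

The keyword alternative matches the prefix `ban` before the word rule ever gets a chance to consume all of `banana`.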

The normal use case for moo.keywords looks like this:

const moo = require('moo')

const KW = ['ban', 'this']
const lexer = moo.compile({
  w: {match: /[A-Za-z_][\w]*/, type: moo.keywords({kw: KW})},
  ws: / +/,
})
lexer.reset('banana ban')
lexer.next() // {type: 'w', value: 'banana'}
lexer.next() // {type: 'ws', value: ' '}
lexer.next() // {type: 'kw', value: 'ban'}
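The key difference is the order of decisions: match the whole word first, then decide whether that word is a keyword. The sketch below illustrates that idea without moo; `keywordsSketch` and `lexWords` are hypothetical names, and `keywordsSketch` is only an assumed approximation of what a keyword-to-type lookup like moo.keywords does, not its implementation.

```javascript
// Hypothetical helper mirroring the idea behind moo.keywords (not its source):
// build a keyword -> type map once, then return a lookup function.
function keywordsSketch(map) {
  const reverse = new Map();
  for (const [type, list] of Object.entries(map)) {
    for (const kw of [].concat(list)) reverse.set(kw, type);
  }
  return (text) => reverse.get(text); // undefined for non-keywords
}

// Match a whole word first, THEN look it up.
function lexWords(input, typeOf) {
  const re = /([A-Za-z_]\w*)|( +)/y;
  const tokens = [];
  while (re.lastIndex < input.length) {
    const start = re.lastIndex;
    const m = re.exec(input);
    if (!m) throw new Error(`no token at offset ${start}`);
    if (m[1] !== undefined) {
      tokens.push({ type: typeOf(m[1]) || 'w', value: m[1] });
    } else {
      tokens.push({ type: 'ws', value: m[2] });
    }
  }
  return tokens;
}

const typeOf = keywordsSketch({ kw: ['ban', 'this'] });
const tokens = lexWords('banana ban', typeOf);
console.log(JSON.stringify(tokens.map((t) => [t.type, t.value])));
// [["w","banana"],["ws"," "],["kw","ban"]]
```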

It actually works fine with Unicode as-is:

const moo = require('moo')

const KW = ['η', 'ο', 'το', 'οι', 'τα']
const lexer = moo.compile({
  w: {match: /\p{XIDS}\p{XIDC}*/u, type: moo.keywords({kw: KW})},
  ws: {match: /\p{White_Space}+/u, lineBreaks: true},
})
lexer.reset('η ηθική')
lexer.next() // {type: 'kw', value: 'η'}
lexer.next() // {type: 'ws', value: ' '}
lexer.next() // {type: 'w', value: 'ηθική'}
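The same "whole word first, then look it up" approach carries over to Unicode unchanged, because the keyword check is just a string comparison on the matched text. Here is a moo-free sketch of it, assuming a runtime with Unicode property escapes (for example Node 10 or later); it spells out the long property names `XID_Start`, `XID_Continue` and `White_Space`, and `lexUnicode` is a hypothetical name.

```javascript
// Whole-word matching with Unicode property escapes, then a keyword lookup.
const KW = new Set(['η', 'ο', 'το', 'οι', 'τα']);
const re = /(\p{XID_Start}\p{XID_Continue}*)|(\p{White_Space}+)/uy;

function lexUnicode(input) {
  re.lastIndex = 0; // the regex is shared, so reset the sticky position
  const tokens = [];
  while (re.lastIndex < input.length) {
    const start = re.lastIndex;
    const m = re.exec(input);
    if (!m) throw new Error(`no token at offset ${start}`);
    if (m[1] !== undefined) {
      tokens.push({ type: KW.has(m[1]) ? 'kw' : 'w', value: m[1] });
    } else {
      tokens.push({ type: 'ws', value: m[2] });
    }
  }
  return tokens;
}

console.log(JSON.stringify(lexUnicode('η ηθική').map((t) => [t.type, t.value])));
// [["kw","η"],["ws"," "],["w","ηθική"]]
```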

We also already allow string literal and array matches to be combined with /u regular expressions, so I'm not sure what you're asking for here.
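This thread doesn't show how string or array matches get merged with /u regexps internally, so the following is only a hypothetical sketch of one way to do it: escape each literal, sort longest-first so a longer keyword is tried before its prefix, and compile everything into one alternation with the `u` flag. `escapeLiteral` and `unionUnicode` are illustrative names, not moo APIs. One subtlety worth noting: under the `u` flag only syntax characters and `/` may be backslash-escaped, so a naive escaper that also escapes `-` would produce an invalid pattern.

```javascript
// Hypothetical helpers (not moo's code): merge literal strings and /u regexps
// into one Unicode-aware alternation.
function escapeLiteral(s) {
  // Escape only syntax characters and '/'; under the u flag, escaping
  // anything else (e.g. '-') outside a character class is a SyntaxError.
  return s.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&');
}

function unionUnicode(literals, regexps) {
  // Longest literals first, so 'οι' is tried before its prefix 'ο'.
  const lits = [...literals]
    .sort((a, b) => b.length - a.length)
    .map(escapeLiteral);
  const parts = lits.concat(regexps.map((r) => r.source));
  return new RegExp(parts.map((p) => `(?:${p})`).join('|'), 'u');
}

const re = unionUnicode(['ο', 'οι', 'a.b'], [/\p{Nd}+/u]);
console.log(re.exec('οι')[0]); // οι
console.log(re.test('axb'));   // false, because '.' was escaped
console.log(re.exec('42')[0]); // 42
```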

(Some of these changes haven't been published to npm yet [@tjvr]; maybe that's where the confusion is coming from?)

stuchl4n3k · 2019-09-25T20:10:48Z

Thanks, nathan. After seeing the first two examples it's much clearer.

Regarding combining an array match with /u: I haven't found that in the docs or in the tests.

nathan · 2019-09-26T01:31:28Z

> I haven't found that in the doc nor in the tests.

We should probably have a test for that. The /u tests are a bit sparse at the moment.

agorischek · 2019-09-26T16:38:15Z

When’s the next npm publish planned?

tjvr · 2019-09-29T18:31:32Z

I've published 0.5.1. 👍
