Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use Oniguruma-To-ES in the JS engine (#828) #832

Merged
merged 10 commits into from
Nov 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,8 @@ jobs:
node: [lts/*]
os: [ubuntu-latest, windows-latest, macos-latest]
include:
- node: 20.x
os: ubuntu-latest
- node: 18.x
os: ubuntu-latest
fail-fast: false
Expand Down
28 changes: 21 additions & 7 deletions docs/guide/regex-engines.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ outline: deep

# RegExp Engines

TextMate grammars is based on regular expressions to match tokens. Usually, we use [Oniguruma](https://github.com/kkos/oniguruma) (a regular expression engine written in C) to parse the grammar. To make it work in JavaScript, we compile Oniguruma to WebAssembly to run in the browser or Node.js.
TextMate grammars are based on regular expressions to match tokens. Usually, we use [Oniguruma](https://github.com/kkos/oniguruma) (a regular expression engine written in C) to parse the grammar. To make it work in JavaScript, we compile Oniguruma to WebAssembly to run in the browser or Node.js.

Since v1.15, we expose the ability to for users to switch the RegExp engine and provide custom implementations.

Expand All @@ -20,7 +20,7 @@ const shiki = await createShiki({
})
```

Shiki come with two built-in engines:
Shiki comes with two built-in engines:

## Oniguruma Engine

Expand All @@ -43,7 +43,7 @@ const shiki = await createShiki({
This feature is experimental and may change without following semver.
:::

This experimental engine uses JavaScript's native RegExp. As TextMate grammars' regular expressions are in Oniguruma flavor that might contains syntaxes that are not supported by JavaScript's RegExp, we use [`oniguruma-to-js`](https://github.com/antfu/oniguruma-to-js) to lowering the syntaxes and try to make them compatible with JavaScript's RegExp.
This engine uses JavaScript's native RegExp. As regular expressions used by TextMate grammars are written for Oniguruma, they might contain syntax that is not supported by JavaScript's RegExp, or expect different behavior for the same syntax. So we use [Oniguruma-To-ES](https://github.com/slevithan/oniguruma-to-es) to transpile Oniguruma patterns to native JavaScript RegExp.

```ts {2,4,9}
import { createHighlighter } from 'shiki'
Expand All @@ -60,17 +60,31 @@ const shiki = await createHighlighter({
const html = shiki.codeToHtml('const a = 1', { lang: 'javascript', theme: 'nord' })
```

Please check the [compatibility table](/references/engine-js-compat) to check the support status of the languages you are using.
Please check the [compatibility table](/references/engine-js-compat) for the support status of the languages you are using.

If mismatches are acceptable and you want it to get results whatever it can, you can enable the `forgiving` option to suppress any errors happened during the conversion:
Unlike the Oniguruma engine, the JavaScript engine is strict by default. It will throw an error if it encounters a pattern that it cannot convert. If mismatches are acceptable and you want best-effort results whenever possible, you can enable the `forgiving` option to suppress any errors that happened during the conversion:

```ts
const jsEngine = createJavaScriptRegexEngine({ forgiving: true })
// ...use the engine
```

::: info
If you runs Shiki on Node.js (or at build time), we still recommend using the Oniguruma engine for the best result, as most of the time bundle size or WebAssembly support is not a concern.
If you run Shiki on Node.js (or at build time) and bundle size or WebAssembly support is not a concern, we still recommend using the Oniguruma engine for the best result.

The JavaScript engine is more suitable for running in the browser in some cases that you want to control the bundle size.
The JavaScript engine is best when running in the browser and in cases when you want to control the bundle size.
:::

### JavaScript Runtime Target

For the most accurate result, [Oniguruma-To-ES](https://github.com/slevithan/oniguruma-to-es) requires the [RegExp `v` flag support](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets), which is available in Node.js v20+ and ES2024 ([Browser compatibility](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets#browser_compatibility)).

For older environments, it can simulate the behavior but `u` flag but might yield less accurate results.

By default, it automatically detects the runtime target and uses the appropriate behavior. You can override this behavior by setting the `target` option:

```ts
const jsEngine = createJavaScriptRegexEngine({
target: 'ES2018', // or 'ES2024', default is 'auto'
})
```
Loading
Loading