render to json (#4)
* wip

* first pass

* Update types

* Fix image span rendering

* Add ci script

* Update ci script

* Update ci script

* Update ci script

* Updat ci script

* Updat ci script

* handle nesting structure

* Fix wikilink rendering

* Clean up

* Update types

* Update testings

* Upate readme

* Update readme

* Typo

* Update node type enum

* Fix code block parsing with `has_code_highlighter` option

* Fix unwasm export

* Clean up

* Fix typo
ije authored Feb 19, 2024
1 parent c837fee commit 2303989
Showing 10 changed files with 1,088 additions and 159 deletions.
40 changes: 40 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,40 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Testing
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Setup Zig
        uses: goto-bus-stop/setup-zig@v2
        with:
          version: 0.11.0

      - name: Setup Binaryen
        uses: Aandreba/[email protected]
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          version: 116

      - name: Setup Deno
        uses: denoland/setup-deno@main
        with:
          deno-version: v1.x

      - name: Run `zig build`
        run: zig build

      - name: Run `deno test`
        run: deno test -A
82 changes: 63 additions & 19 deletions README.md
@@ -4,35 +4,38 @@ A **Markdown** renderer written in Zig & C, compiled to **WebAssymbly** for all
JS runtimes.

 - **Compliance**: powered by [md4c](https://github.com/mity/md4c) that is fully
-  compliant to CommonMark 0.31, and supports partial GFM like task lists,
-  tables, etc.
-- **Fast**: written in Zig & C, compiled to WebAssembly (it's about 2.5x faster than
-  markdown-it, see [benchmark](#benchmark)).
+  compliant to CommonMark 0.31, and partially supports GFM like task list,
+  table, etc.
+- **Fast**: written in Zig & C, compiled to WebAssembly (it's about 2.5x faster
+  than markdown-it, see [benchmark](#benchmark)).
 - **Small**: `~25KB` gzipped.
 - **Simple**: zero dependencies, easy to use.
 - **Streaming**: supports web streaming API for large markdown files.
 - **Universal**: works in any JavaScript runtime (Node.js, Deno, Bun, Browsers,
-  Cloudflare Workers, etc.).
+  Cloudflare Workers, etc).

## Usage

```js
 // npm i md4w (Node.js, Bun, Cloudflare Workers, etc.)
-import { init, mdToHtml, mdToReadableHtml } from "md4w";
+import { init, mdToHtml, mdToJSON, mdToReadableHtml } from "md4w";
 // or use the CDN url (Deno, Browsers)
-import { init, mdToHtml, mdToReadableHtml } from "https://esm.sh/md4w";
+import { init, mdToHtml, mdToReadableHtml, mdToJSON } from "https://esm.sh/md4w";

 // waiting for md4w.wasm...
 await init();

 // markdown -> HTML
-const html = mdToHtml("# Hello, World!");
+const html = mdToHtml("Stay _foolish_, stay **hungry**!");

 // markdown -> HTML (ReadableStream)
-const readable = mdToReadableHtml("# Hello, World!");
+const readable = mdToReadableHtml("Stay _foolish_, stay **hungry**!");
 const response = new Response(readable, {
   headers: { "Content-Type": "text/html" },
 });
+
+// markdown -> JSON
+const tree = mdToJSON("Stay _foolish_, stay **hungry**!");
```

## Parse Flags
@@ -49,13 +52,13 @@ By default, md4w uses the following parse flags:
You can use the `parseFlags` option to change the renderer behavior:

```ts
-mdToHtml("# Hello, World!", {
-  parseFlags: {
-    DEFAULT: true,
-    NO_HTML: true,
-    LATEX_MATH_SPANS: true,
+mdToHtml("Stay _foolish_, stay **hungry**!", {
+  parseFlags: [
+    "DEFAULT",
+    "NO_HTML",
+    "LATEX_MATH_SPANS",
     // ... other parse flags
-  },
+  ],
});
```
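
In this hunk `parseFlags` moves from an object of booleans to an array of flag names. For callers that still hold the old object shape, a conversion could look like the sketch below. `flagsObjectToArray` is a hypothetical helper for illustration and is not part of md4w.

```js
// Hypothetical helper (not part of md4w): turn the old object form,
// e.g. { DEFAULT: true, NO_HTML: false }, into the new array form, ["DEFAULT"].
function flagsObjectToArray(flags) {
  return Object.entries(flags)
    // keep only flags that were explicitly switched on
    .filter(([, enabled]) => enabled === true)
    .map(([name]) => name);
}

const legacyFlags = { DEFAULT: true, NO_HTML: true, LATEX_MATH_SPANS: false };
const parseFlags = flagsObjectToArray(legacyFlags);
console.log(parseFlags); // the enabled flag names, in insertion order
```

The resulting array can then be passed as the `parseFlags` option in the new format.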

@@ -122,14 +125,13 @@ setCodeHighlighter((code, lang) => {

## Web Streaming API

-md4w supports web streaming API for large markdown files, this also is useful for a
-http server to stream the response.
+md4w supports web streaming API for large markdown files, this also is useful
+for a http server to stream the outputed html.

```js
import { mdToReadableHtml } from "md4w";

-const largeMarkdown = `# Hello, World!\n`.repeat(1_000_000);
-const readable = mdToReadableHtml(largeMarkdown);
+const readable = mdToReadableHtml(readFile("large.md"));

// write to file
const file = await Deno.open("/foo/bar.html", { write: true, create: true });
@@ -160,6 +162,48 @@ mdToReadableHtml(largeMarkdown, {
The streaming API currently only uses the buffer for html output, you still need
to load the whole markdown data into memory.
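
`mdToReadableHtml` is declared as returning `ReadableStream<Uint8Array>`, so a consumer that wants text has to decode the chunks itself. Below is a minimal sketch, assuming a chunk boundary can fall in the middle of a multi-byte UTF-8 character. `decodeChunks` is a hypothetical helper (not part of md4w), and the chunks are built by hand rather than read from a real stream.

```js
// Hypothetical helper, not part of md4w: decode a sequence of UTF-8 chunks.
// `stream: true` makes TextDecoder hold back an incomplete multi-byte
// sequence until the next chunk arrives instead of emitting U+FFFD.
function decodeChunks(chunks) {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  // Flush whatever is still buffered inside the decoder.
  text += decoder.decode();
  return text;
}

// Hand-made chunks standing in for stream output: "é" is two bytes in
// UTF-8 (0xC3 0xA9), and the split below lands between them.
const bytes = new TextEncoder().encode("<p>café</p>");
const splitAt = bytes.indexOf(0xa9); // index of the second byte of "é"
const html = decodeChunks([bytes.subarray(0, splitAt), bytes.subarray(splitAt)]);
console.log(html);
```

Decoding each chunk independently, without `stream: true`, would corrupt any character that straddles two chunks.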

## Rendering to JSON

md4w also provides a `mdToJSON` function to render the markdown to JSON.

```js
const traverse = (node) => {
  if (typeof node === "string") {
    // text node
    console.log(node);
    return;
  }
  // element type
  console.log(node.type);
  // element attributes (may be undefined)
  console.log(node.props);
  // element children (may be undefined)
  node.children?.forEach(traverse);
};

const tree = mdToJSON("Stay _foolish_, stay **hungry**!");
traverse(tree);
```
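
The same traversal can return a value instead of logging. Here is a minimal sketch that collects the plain text of a tree. `toPlainText` is a hypothetical helper, and `sampleTree` is a hand-written object that only follows the `MDTree`/`MDNode` shape declared in `js/md4w.d.ts`; it is not actual `mdToJSON` output.

```js
// Hypothetical helper, not part of md4w: concatenate all text nodes.
function toPlainText(node) {
  if (typeof node === "string") return node;
  // `children` may be undefined (for example on HR or IMG nodes).
  return (node.children ?? []).map(toPlainText).join("");
}

// Hand-written tree in the declared shape; the type numbers follow the
// NodeType enum added in this commit (P = 9, EM = 30, STRONG = 31).
const sampleTree = {
  children: [
    {
      type: 9,
      children: [
        "Stay ",
        { type: 30, children: ["foolish"] },
        ", stay ",
        { type: 31, children: ["hungry"] },
        "!",
      ],
    },
  ],
};

console.log(toPlainText(sampleTree)); // "Stay foolish, stay hungry!"
```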

### Node Type

The node type is a number that represents the type of the node. You can import
the `NodeType` enum to get the human-readable node type.

```ts
import { NodeType } from "md4w";

console.log(NodeType.P); // 9
console.log(NodeType.IMG); // 33

if (node.type === NodeType.IMG) {
  console.log("This is an image node, `src` is", node.props.src);
}
```
> All available node types are defined in the
> [`NodeType` enum](./js/md4w.d.ts#L76).
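
Building on the `NodeType` check above, a tree walk can gather every image `src`. This is a sketch under assumptions: `collectImageSources` is a hypothetical helper, the local `IMG` constant copies the value `NodeType.IMG = 33` from the enum so the snippet runs without md4w, and the tree is hand-written rather than produced by `mdToJSON`.

```js
// Copied from the NodeType enum in js/md4w.d.ts (NodeType.IMG = 33),
// so that this sketch does not need to import md4w.
const IMG = 33;

// Hypothetical helper, not part of md4w: collect `props.src` of all images.
function collectImageSources(node, out = []) {
  if (typeof node === "string") return out; // text nodes carry no props
  if (node.type === IMG && node.props?.src) out.push(node.props.src);
  for (const child of node.children ?? []) collectImageSources(child, out);
  return out;
}

// Hand-written tree: a paragraph (9) with an image, and a quote (1)
// wrapping a paragraph with another image.
const doc = {
  children: [
    { type: 9, children: ["Logo: ", { type: IMG, props: { src: "/logo.png", alt: "logo" } }] },
    { type: 1, children: [{ type: 9, children: [{ type: IMG, props: { src: "/a.jpg", alt: "" } }] }] },
  ],
};

console.log(collectImageSources(doc)); // two entries: "/logo.png" and "/a.jpg"
```

With md4w installed, the constant would normally be replaced by the imported `NodeType.IMG`.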

## Development

The renderer is written in [Zig](https://ziglang.org/), ensure you have it
3 changes: 2 additions & 1 deletion js/index.js
@@ -30,7 +30,8 @@ if (!fs.readFile) {
 export async function init() {
   const wasmURL = new URL("md4w.wasm", import.meta.url);
   const wasmBytes = await fs.readFile(wasmURL);
-  initWasm(await WebAssembly.compile(wasmBytes));
+  const wasmModule = await WebAssembly.compile(wasmBytes);
+  initWasm(wasmModule);
 }

export * from "./md4w.js";
108 changes: 108 additions & 0 deletions js/md4w.d.ts
@@ -70,6 +70,114 @@ export function mdToReadableHtml(
options?: Options,
): ReadableStream<Uint8Array>;

/**
 * NodeType is a type of the markdown node.
 */
export enum NodeType {
  QUOTE = 1,
  UL = 2,
  OL = 3,
  LI = 4,
  HR = 5,
  CODE_BLOCK = 7,
  HTML = 8,
  P = 9,
  TABLE = 10,
  THEAD = 11,
  TBODY = 12,
  TR = 13,
  TH = 14,
  TD = 15,
  H1 = 21,
  H2 = 22,
  H3 = 23,
  H4 = 24,
  H5 = 25,
  H6 = 26,
  EM = 30,
  STRONG = 31,
  A = 32,
  IMG = 33,
  CODE_SPAN = 34,
  DEL = 35,
  LATEXMATH = 36,
  LATEXMATH_DISPLAY = 37,
  WIKILINK = 38,
  U = 39,
}

/**
 * NodeProps is a type of the node properties.
 */
export type NodeProps<T> = Record<string, undefined> & T;

/**
 * MDNode is a node in the markdown tree.
 */
export type MDNode = {
  readonly type: Exclude<
    number,
    | NodeType.CODE_BLOCK
    | NodeType.OL
    | NodeType.LI
    | NodeType.TH
    | NodeType.TD
    | NodeType.HR
    | NodeType.A
    | NodeType.IMG
    | NodeType.WIKILINK
  >;
  readonly props?: Record<string, undefined>;
  readonly children?: readonly (string | MDNode)[];
} | {
  readonly type: NodeType.CODE_BLOCK;
  readonly props?: NodeProps<{ lang: string }>;
  readonly children: readonly string[];
} | {
  readonly type: NodeType.OL;
  readonly props?: NodeProps<{ start: number }>;
  readonly children: readonly (string | MDNode)[];
} | {
  readonly type: NodeType.LI;
  readonly props?: NodeProps<{ isTask: boolean; done: boolean }>;
  readonly children: readonly (string | MDNode)[];
} | {
  readonly type: NodeType.TH | NodeType.TD;
  readonly props: NodeProps<{ align: "left" | "center" | "right" | "" }>;
  readonly children: readonly (string | MDNode)[];
} | {
  readonly type: NodeType.HR;
  readonly props: undefined;
  readonly children: undefined;
} | {
  readonly type: NodeType.A;
  readonly props: NodeProps<{ href: string; title?: string }>;
  readonly children: readonly (string | MDNode)[];
} | {
  readonly type: NodeType.IMG;
  readonly props: NodeProps<{ src: string; alt: string; title?: string }>;
  readonly children: undefined;
} | {
  readonly type: NodeType.WIKILINK;
  readonly props: NodeProps<{ target: string }>;
  readonly children: readonly (string | MDNode)[];
};

/**
 * MDTree is a parsed markdown tree.
 */
export interface MDTree {
  readonly children: MDNode[];
}

/**
 * Converts markdown to json.
 * @param {string | Uint8Array} input markdown input
 * @param {Options} options parse options
 * @returns {MDTree} json output
 */
export function mdToJSON(input: string | Uint8Array, options?: Options): MDTree;

/**
* Code highlighter interface.
*/