🚀🚀🚀 Transformers.js V3 🚀🚀🚀 #545

Merged: 498 commits, Oct 18, 2024
Changes from 47 commits
0dba266
Early dereferencing for performance boosts
xenova Jul 2, 2024
5e4e20f
cleanup
xenova Jul 2, 2024
dd6af93
Move quantization logic to `quantize.py`
xenova Jul 3, 2024
04af3d5
update deps
xenova Jul 3, 2024
9128651
Fix q4 quantization
xenova Jul 3, 2024
83cbb21
save q4 quantization
xenova Jul 4, 2024
eb61344
Add decode ASR test
xenova Jul 4, 2024
cec2400
Do not process last chunk unnecessarily
xenova Jul 4, 2024
c835b54
fp16 disable_shape_infer if model is too large
xenova Jul 4, 2024
45cd8d4
Use `check_and_save_model` for saving fp16 model
xenova Jul 4, 2024
88f3e44
Reorder functions
xenova Jul 4, 2024
23440f0
formatting
xenova Jul 4, 2024
b411e9f
Remove debug log
xenova Jul 4, 2024
04a334a
Fix q8 quantization for models > 2GB
xenova Jul 4, 2024
cd1ea69
correct attribute
xenova Jul 4, 2024
a167f6e
Fix `TextGenerationPipeline`
xenova Jul 4, 2024
ea73289
Fix pauses in whisper word-level timestamps
xenova Jul 4, 2024
344af32
Formatting
xenova Jul 4, 2024
c305c38
Sort added tokens by length to avoid early partial matches
xenova Jul 5, 2024
d6f6fd4
Add new tokenizer test
xenova Jul 8, 2024
1557b8d
Only finish with newline if running in Node.js
xenova Jul 8, 2024
9ac7ceb
Consider token timestamps when selecting longest common sequence
xenova Jul 9, 2024
79ed46e
Create whisper word-level timestamps demo
xenova Jul 10, 2024
8da6886
cleanup
xenova Jul 10, 2024
d709bd0
Fallback to WASM if WebGPU not supported
xenova Jul 10, 2024
9ef3a6d
Reload model for each quantization mode
xenova Jul 12, 2024
9787b75
Update conversion script requirements
xenova Jul 12, 2024
974f086
Separate IO and Quantization args
xenova Jul 12, 2024
d042868
Use `const` where possible
xenova Jul 16, 2024
1b4d242
Add `InterruptableStoppingCriteria`
xenova Jul 16, 2024
31101c8
`@xenova/transformers` -> `@huggingface/transformers`
xenova Jul 17, 2024
e84322b
Override semver version
xenova Jul 17, 2024
bd94334
Add support for pyannote models
xenova Jul 17, 2024
3dbc633
Update README.md
xenova Jul 17, 2024
858e55d
Add listed support for pyannote
xenova Jul 17, 2024
8bf0349
Add pyannote example code
xenova Jul 17, 2024
c52618c
Support specifying `min_num_frames`
xenova Jul 17, 2024
96f19b0
Support simultaneous instantiation of multiple inference sessions
xenova Jul 20, 2024
4ad43e2
Support broadcasting encoder outputs over decoder inputs
xenova Jul 22, 2024
c6aeb4b
Fix test
xenova Jul 22, 2024
6d3ea4b
fix bundler config for latest ORT
fs-eire Jul 25, 2024
38a3bf6
Only check fp16 support for webgpu device
xenova Jul 29, 2024
9df84c4
Remove default chat templates
xenova Aug 7, 2024
fc3d860
Add support for gemma2
xenova Aug 7, 2024
939920d
Add gemma2 generation test
xenova Aug 7, 2024
5bb93a0
Update gemma2 config mapping
xenova Aug 7, 2024
72ec168
Prioritize high-performance adapter when possible
xenova Aug 7, 2024
9068a53
Set defaults for `tools` and `documents` in `apply_chat_template`
xenova Aug 7, 2024
824538b
bump `@huggingface/jinja` -> 0.3.0
xenova Aug 7, 2024
836c0af
Add `apply_chat_template` default parameters unit test
xenova Aug 7, 2024
487d8b2
Merge branch 'v3' into @huggingface/transformers
xenova Aug 7, 2024
1f6e0e1
Add prettier
xenova Aug 7, 2024
55494d1
prettier format config files
xenova Aug 7, 2024
5a68461
remove incorrect comment
xenova Aug 7, 2024
437cb34
Merge branch 'pr/864' into @huggingface/transformers
xenova Aug 7, 2024
5a6c926
Update onnxruntime-web version
xenova Aug 7, 2024
b19251b
Update webpack.config.js
xenova Aug 7, 2024
820c1e2
Fix copy path
xenova Aug 7, 2024
b0dab91
Run `npm ci`
xenova Aug 7, 2024
86b9b62
Fix bundling
xenova Aug 7, 2024
222b94e
Do not set `preferredOutputLocation` if we are proxying
xenova Aug 7, 2024
b326cc9
Merge branch 'v3' into @huggingface/transformers
xenova Aug 7, 2024
ca67092
Update `@webgpu/types`
xenova Aug 7, 2024
42076fd
Update SAM example
xenova Aug 7, 2024
48d3142
Use `??=` operator where possible
xenova Aug 7, 2024
3b1a4fd
Fix commonjs usage
xenova Aug 8, 2024
9a73b5e
Mark `onnxruntime-node` and `sharp` as externals
xenova Aug 8, 2024
9951aa5
Move `externals` into config
xenova Aug 8, 2024
c04d37e
Downgrade to onnxruntime 1.18.0
xenova Aug 8, 2024
d32fe2b
Finalize module/commonjs build
xenova Aug 8, 2024
1530d50
Separate web and node builds
xenova Aug 8, 2024
b4df0e2
[version] Update to 3.0.0-alpha.1
xenova Aug 8, 2024
ab59c51
Default to CDN-hosted .wasm files
xenova Aug 8, 2024
866b219
[version] Update to 3.0.0-alpha.2
xenova Aug 8, 2024
4a3398d
bump versions
xenova Aug 8, 2024
8891a14
[version] Update to 3.0.0-alpha.3
xenova Aug 8, 2024
a315933
Merge branch 'improve-conversion-script' into v3
xenova Aug 8, 2024
12569b8
Consolidate conversion and quantization script
xenova Aug 9, 2024
83f5718
Downgrade `onnxconverter-common`
xenova Aug 9, 2024
6fa5fa6
Link to types in exports
xenova Aug 9, 2024
2f1b210
Update list of supported tasks
xenova Aug 10, 2024
27bc55d
Fixed unit tests
xenova Aug 10, 2024
23d1150
Update imports
xenova Aug 10, 2024
f9070dc
Bump versions to `3.0.0-alpha.4`
xenova Aug 10, 2024
c3494e1
[version] Update to 3.0.0-alpha.4
xenova Aug 10, 2024
973fb0d
Fix "Default condition should be last one"
xenova Aug 12, 2024
7376ecf
Bump versions
xenova Aug 12, 2024
0a04bc0
[version] Update to 3.0.0-alpha.5
xenova Aug 12, 2024
e4603cd
Update next.js client-side demo
xenova Aug 12, 2024
ff1853c
Initial WebNN Support
ibelem Aug 14, 2024
15574bc
Mark fs, path and url as external packages for node build
xenova Aug 15, 2024
7282862
Move content type map outside of `FileResponse` object
xenova Aug 15, 2024
22f7ced
Add GPU support for Node.js
xenova Aug 15, 2024
1e319a4
Bump versions
xenova Aug 15, 2024
d278891
[version] Update to 3.0.0-alpha.6
xenova Aug 15, 2024
3fefa17
Fix conflicts
ibelem Aug 16, 2024
fa6cc70
bump dependency versions
xenova Aug 16, 2024
7fa5326
Add support for device auto-detection
xenova Aug 16, 2024
4ec77c1
Fix default device selection
xenova Aug 16, 2024
5799e30
Merge branch 'pr/ibelem/890-1' into v3
xenova Aug 16, 2024
5b2cac2
Improve WebNN selection
xenova Aug 17, 2024
ad23c50
Skip token callback if `skip_prompt` is set
xenova Aug 17, 2024
5b84b62
Bump versions
xenova Aug 19, 2024
bcf6a86
[version] Update to 3.0.0-alpha.7
xenova Aug 19, 2024
b97ed0d
bump versions
xenova Aug 21, 2024
c5b7083
[version] Update to 3.0.0-alpha.8
xenova Aug 21, 2024
cbeefde
bump versions
xenova Aug 23, 2024
59600f2
[version] Update to 3.0.0-alpha.9
xenova Aug 23, 2024
b2e025a
Add support for Sapiens
xenova Aug 27, 2024
8661d95
Update default ONNX env
xenova Aug 27, 2024
57db34d
Fix types
xenova Aug 27, 2024
1b7f978
Topologically sort fp16 nodes
xenova Aug 27, 2024
45d1526
Add marian unit test
xenova Aug 27, 2024
b903757
Re-order imports
xenova Aug 27, 2024
633976f
Fix `NoBadWordsLogitsProcessor`
xenova Aug 27, 2024
24d8787
Update package.json
xenova Aug 27, 2024
9412ec4
[jest] Disable coverage
xenova Aug 27, 2024
08e7388
Bump versions
xenova Aug 27, 2024
d5a8f87
[version] Update to 3.0.0-alpha.10
xenova Aug 27, 2024
7843ad0
Improve node/web interoperability
xenova Aug 28, 2024
bf093ae
Fix scripts/requirements.txt
xenova Aug 28, 2024
9a5ee42
Bump versions
xenova Aug 28, 2024
535cdfe
[version] Update to 3.0.0-alpha.11
xenova Aug 28, 2024
4e1acf0
Add support for JAIS models (#906)
xenova Aug 28, 2024
488548d
Add JAIS to README
xenova Aug 28, 2024
13aed41
Fix node/web interop (again)
xenova Aug 28, 2024
7655f81
Bump versions
xenova Aug 28, 2024
1c7e226
[version] Update to 3.0.0-alpha.12
xenova Aug 28, 2024
ab6b28b
Set `SapiensForNormalEstimation` to encoder-only
xenova Aug 28, 2024
66c05d5
Implement `sub` tensor operation
xenova Aug 28, 2024
31e8b2a
Bump versions
xenova Aug 28, 2024
bf3f7d5
[version] Update to 3.0.0-alpha.13
xenova Aug 28, 2024
c025356
Improve typing for `wrap` helper function
xenova Aug 28, 2024
7ebdaf2
Update `preferredOutputLocation` type
xenova Aug 28, 2024
3b8ddcb
Make `wrap` type more generic
xenova Aug 28, 2024
a385c6e
Re-use `segmentation_data`
xenova Aug 28, 2024
537e958
Fix `min` type
xenova Aug 28, 2024
bcb28b3
Add support for Hiera models
xenova Aug 29, 2024
d21c87c
Fix reused loop variable (closes #910)
xenova Aug 30, 2024
1d281f6
Add logits processor test file
xenova Aug 30, 2024
ba0427f
Fix test imports
xenova Aug 30, 2024
3bc3e86
Bump versions
xenova Aug 30, 2024
0518960
[version] Update to 3.0.0-alpha.14
xenova Aug 30, 2024
552cdea
Add another `bad_words` logits processor test (closes #913)
xenova Aug 30, 2024
3422a8b
Add support for GroupViT
xenova Aug 30, 2024
3599902
Add zero-shot-image-classification unit test
xenova Aug 30, 2024
5892ee8
Add maskformer model definitions
xenova Aug 30, 2024
c4dac77
Support universal image segmentation in `image-segmentation` pipeline
xenova Aug 30, 2024
f0c47be
Add support for PVT models
xenova Aug 30, 2024
d80d3a4
Add `post_process_instance_segmentation` function template
xenova Aug 30, 2024
844099d
Add `library_name` option to convert.py
xenova Sep 2, 2024
ba5d725
Wrap onnxslim with try block
xenova Sep 2, 2024
b3691c8
Use const where possible
xenova Sep 2, 2024
dcf117f
Use const where possible (again)
xenova Sep 2, 2024
9af026c
Create `MaskFormerFeatureExtractor`
xenova Sep 2, 2024
0f8200c
Add support for MaskFormer
xenova Sep 2, 2024
e278c5e
Improve tool-use chat template detection
xenova Sep 2, 2024
83fa58f
Add object detection pipeline unit test
xenova Sep 2, 2024
86d6da4
Add support for ViTMSN and VitMAE
xenova Sep 2, 2024
93b25fb
Bump ORT versions
xenova Sep 7, 2024
2f680ee
Create `get_chat_template` helper function
xenova Sep 7, 2024
2f9b2ed
Fix CI
xenova Sep 9, 2024
deec350
Run prettier on `tests/**`
xenova Sep 9, 2024
48fa226
move certain tests to utils subfolder
xenova Sep 9, 2024
a10828f
Bump onnxruntime-web version
xenova Sep 9, 2024
ba58ea2
Bump `onnxruntime==1.19.2` in scripts/requirements.txt
xenova Sep 9, 2024
4f17e95
Merge branch 'main' into v3
xenova Sep 9, 2024
c40a151
Merge branch 'main' into v3
xenova Sep 9, 2024
30315b2
Sort `this.added_tokens` before creating regex (`.toSorted` is not av…
xenova Sep 9, 2024
d7df575
Rather make a copy of `this.added_tokens`
xenova Sep 9, 2024
a519379
Fix `.tokenize` with `fuse_unk=true`
xenova Sep 9, 2024
89ddccf
Add blenderbot tokenizer tests
xenova Sep 9, 2024
36ad144
Add t5 tokenizer tests
xenova Sep 9, 2024
4765dd6
Add falcon tokenizer tests
xenova Sep 10, 2024
fd8b9a2
Run prettier
xenova Sep 10, 2024
710816e
Add ESM tokenizer tests
xenova Sep 10, 2024
0d3cd30
Run unit tests in parallel
xenova Sep 10, 2024
cc258c2
Fix `fuse_unk` for tokenizers with `byte_fallback=true` but no byte f…
xenova Sep 10, 2024
4798755
Add llama tokenizer unit tests
xenova Sep 10, 2024
c6c3ae1
Update emoji test string names
xenova Sep 10, 2024
79a7409
Move whisper-specific unit tests to subfolder
xenova Sep 10, 2024
1a38804
Code formatting
xenova Sep 10, 2024
dabe6ae
Bump versions
xenova Sep 10, 2024
54f1f21
[version] Update to 3.0.0-alpha.15
xenova Sep 10, 2024
a912d79
Add emoji tokenizer test cases for LlamaTokenizer
xenova Sep 12, 2024
969d10e
Attempt to fix encoder-decoder memory leak
xenova Sep 17, 2024
072cbbc
Remove unused code
xenova Sep 17, 2024
14b4bd4
Fix BertNormalizer (strip `Mn` unicode characters)
xenova Sep 17, 2024
6797771
Handle ZERO WIDTH JOINER (U+200D) characters
xenova Sep 17, 2024
f148afd
Add more spm normalization characters
xenova Sep 17, 2024
ca4b5b9
Add emoji unit tests for bert/t5
xenova Sep 17, 2024
113c81e
[WebNN] Add support for specifying `free_dimension_overrides` in config
xenova Sep 18, 2024
9005acc
Log warning if webnn is selected by `free_dimension_overrides` is not…
xenova Sep 18, 2024
682c7d0
Fix unigram for multi-byte tokens
xenova Sep 18, 2024
4a31e54
Add gemma tokenizer tests
xenova Sep 22, 2024
7a16065
Allow user to specify device and dtype in config.json
xenova Sep 23, 2024
4c1d21b
Update dependency versions
xenova Sep 23, 2024
3c6a95a
Bump versions
xenova Sep 23, 2024
ac391d2
[version] Update to 3.0.0-alpha.16
xenova Sep 23, 2024
d30d3b7
Add CLIP tokenizer unit tests
xenova Sep 23, 2024
e089ef4
Add more tokenizer tests
xenova Sep 23, 2024
2c9e271
Bump onnxruntime-web version
xenova Sep 27, 2024
ee1e32a
Bump versions
xenova Sep 27, 2024
f41e995
[version] Update to 3.0.0-alpha.17
xenova Sep 27, 2024
9a42cf3
Add support for new `tokenizers>=0.2.0` BPE serialization format
xenova Sep 27, 2024
f534b35
Bump onnxruntime-web version
xenova Sep 29, 2024
0c8b1af
Bump versions
xenova Sep 29, 2024
2ca4178
[version] Update to 3.0.0-alpha.18
xenova Sep 30, 2024
a82e7ef
Keep encoder outputs on GPU
xenova Sep 30, 2024
c37a38c
Update whisper-webgpu demo dependencies
xenova Sep 30, 2024
e1c4fc6
Bump versions
xenova Sep 30, 2024
fe51609
[version] Update to 3.0.0-alpha.19
xenova Sep 30, 2024
b518866
Support to load ONNX APIs based on JS runtime (#947)
kallebysantos Sep 30, 2024
95c8cc5
Allow specification of `use_external_data_format` in custom config
xenova Oct 3, 2024
03eb77b
Update deberta unit tests
xenova Oct 3, 2024
c61a76b
Update roberta tokenizer tests
xenova Oct 3, 2024
32d8df4
Support inferring unigram tokenizer type
xenova Oct 4, 2024
6505abb
Reuse tokenizer tests for original t5-small
xenova Oct 4, 2024
9619218
Remove redundant null coalesce
xenova Oct 4, 2024
52c4ce7
Enable unit test coverage reports
xenova Oct 7, 2024
12edaf0
Use `PROBLEMATIC_REGEX_MAP` for bloom tokenizer
xenova Oct 7, 2024
5e7e82b
Improve tokenizer unit tests
xenova Oct 7, 2024
795a61a
Update tokenizer unit tests
xenova Oct 8, 2024
77ebe0d
Remove unused code
xenova Oct 8, 2024
56eda3b
Add m2m_100 tokenizer unit tests
xenova Oct 8, 2024
2040ad5
Add m2m translation pipeline unit test
xenova Oct 8, 2024
8718c17
Add support for Depth Pro models
xenova Oct 9, 2024
a32efa3
Add whisper turbo alignment heads
xenova Oct 9, 2024
8b0d330
Remove in-library list of supported models
xenova Oct 9, 2024
cf3f5c3
Bump versions
xenova Oct 9, 2024
86fe175
[version] Update to 3.0.0-alpha.20
xenova Oct 9, 2024
1c78278
Add function to map tensor data array.
BritishWerewolf Oct 9, 2024
a5e0210
Merge branch 'main' into v3
xenova Oct 9, 2024
9f8fac0
Optimise loop to reduce calls to `this`
BritishWerewolf Oct 9, 2024
1c43e3f
Merge branch 'pr/966' into v3
xenova Oct 10, 2024
7a0f77c
Add back tensor map test
xenova Oct 10, 2024
da03a0a
Add support for granite models
xenova Oct 12, 2024
37effa3
Allow multiple optional configs to be passed (+ reduce code duplication)
xenova Oct 12, 2024
f21b36e
Bump dependencies
xenova Oct 14, 2024
d26a663
Bump versions
xenova Oct 14, 2024
c337c3b
[version] Update to 3.0.0-alpha.21
xenova Oct 14, 2024
92d0dc6
Add support for per-dtype `kv_cache_dtype`
xenova Oct 17, 2024
ea03bf5
Add text streamer unit test
xenova Oct 17, 2024
27a033f
Bump ORT web version
xenova Oct 17, 2024
19277ea
Bump versions
xenova Oct 17, 2024
90a7490
[version] Update to 3.0.0-alpha.22
xenova Oct 17, 2024
38773ea
Update repo name to `@huggingface/transformers.js`
xenova Oct 18, 2024
832b5b7
Update tested node versions
xenova Oct 18, 2024
b871c08
Bump versions
xenova Oct 18, 2024
7a58d6e
[version] Update to 3.0.0
xenova Oct 18, 2024
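Several commits in this list (c305c38, 30315b2, d7df575) deal with sorting added tokens by length before matching. The underlying issue is general: a regex alternation tries its alternatives from left to right, so a short token that is a prefix of a longer one can match first. The following is a minimal sketch of that idea, not the library's actual code. `escapeRegExp` and `buildAddedTokensRegex` are hypothetical names, and the token strings are made up for illustration.

```javascript
// Hypothetical helper: escape characters that are special inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build one alternation regex from a list of added tokens.
// A copy of the list is sorted longest-first, so longer tokens are tried
// before any shorter token that is a prefix of them.
function buildAddedTokensRegex(addedTokens) {
  const sorted = addedTokens.slice().sort((a, b) => b.length - a.length);
  return new RegExp(sorted.map(escapeRegExp).join('|'), 'g');
}

// Made-up tokens: '<s>' is a prefix of '<s>extra'.
const tokens = ['<s>', '<s>extra', '</s>'];
const text = 'a<s>extra b</s>';

// Without sorting, '<s>' wins and '<s>extra' is never matched as a whole.
const unsortedRegex = new RegExp(tokens.map(escapeRegExp).join('|'), 'g');
const unsortedMatches = text.match(unsortedRegex); // ['<s>', '</s>']

// With longest-first sorting, the full token is matched.
const sortedMatches = text.match(buildAddedTokensRegex(tokens)); // ['<s>extra', '</s>']

console.log(unsortedMatches, sortedMatches);
```

Sorting a copy (`slice()`) rather than the original array keeps the caller's token order intact, which is the concern the "Rather make a copy" commit title points at.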
7 changes: 6 additions & 1 deletion .github/workflows/tests.yml
@@ -7,12 +7,17 @@ on:
   pull_request:
     branches:
       - main
-
+    types:
+      - opened
+      - reopened
+      - synchronize
+      - ready_for_review
 env:
   TESTING_REMOTELY: true

 jobs:
   build:
+    if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest

     strategy:
5 changes: 2 additions & 3 deletions README.md
@@ -101,7 +101,7 @@ npm i @xenova/transformers
Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
```html
<script type="module">
-import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.16.0';
+import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@3.0.0-alpha.0';
</script>
```

@@ -134,8 +134,7 @@ Check out the Transformers.js [template](https://huggingface.co/new-space?templa



-By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/[email protected]/dist/), which should work out-of-the-box. You can customize this as follows:
-
+By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/[email protected]/dist/), which should work out-of-the-box. You can customize this as follows:

### Settings

2 changes: 1 addition & 1 deletion docs/snippets/2_installation.snippet
@@ -7,6 +7,6 @@ npm i @xenova/transformers
Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
```html
<script type="module">
-import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.16.0';
+import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@3.0.0-alpha.0';
</script>
```
3 changes: 1 addition & 2 deletions docs/snippets/4_custom-usage.snippet
@@ -1,7 +1,6 @@


-By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/[email protected]/dist/), which should work out-of-the-box. You can customize this as follows:
-
+By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/[email protected]/dist/), which should work out-of-the-box. You can customize this as follows:

### Settings

24 changes: 24 additions & 0 deletions examples/webgpu-embedding-benchmark/.gitignore
@@ -0,0 +1,24 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
46 changes: 46 additions & 0 deletions examples/webgpu-embedding-benchmark/index.html
@@ -0,0 +1,46 @@
<!DOCTYPE html>
<html lang="en">

<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Transformers.js | WebGPU Benchmark</title>
</head>

<body>
<h1>
<a href="http://github.com/xenova/transformers.js" target="_blank">🤗 Transformers.js</a> WebGPU Benchmark
</h1>
<p>
This benchmark measures the execution time of <a
href="https://huggingface.co/Xenova/all-MiniLM-L6-v2" target="_blank">Xenova/all-MiniLM-L6-v2</a> (bert-based embedding model)
using the WASM and WebGPU execution providers across different batch sizes.
</p>
<div id="chart-container">
<canvas id="chart"></canvas>
</div>
<div>
<button id="start" disabled>Start Benchmark</button>
<button id="stop" disabled>Stop Benchmark</button>
</div>
<label id="status"></label>
<details open>
<summary>Options</summary>
<div>
<label>Batch sizes</label>
<input id="batch-sizes" value="1, 2, 4, 8, 16, 32" />
</div>
<div>
<label>Sequence length</label>
<input id="sequence-length" type="number" min="1" max="512" value="512" />
</div>
<div>
<input id="x-scale" type="checkbox" />
<label>Log scale (x)</label>
<input id="y-scale" type="checkbox" />
<label>Log scale (y)</label>
</div>
</details>
<script type="module" src="/main.js"></script>
</body>
</html>
255 changes: 255 additions & 0 deletions examples/webgpu-embedding-benchmark/main.js
@@ -0,0 +1,255 @@
import './style.css';
import { env, AutoModel, ones } from '@xenova/transformers';
import Chart from 'chart.js/auto';

// Throw an error if WebGPU is not supported
if (!navigator.gpu) {
const err = 'WebGPU is not supported by this browser.';
alert(err)
throw Error(err);
}

// Load the ONNX Runtime WASM binaries from the CDN and run the WASM backend single-threaded
env.backends.onnx.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/[email protected]/dist/';
env.backends.onnx.wasm.numThreads = 1;

// Reference the elements that we will need
const ctx = document.getElementById('chart');
const batchSizes = document.getElementById('batch-sizes');
const xscale = document.getElementById('x-scale');
const yscale = document.getElementById('y-scale');
const sequenceLength = document.getElementById('sequence-length');
const status = document.getElementById('status');
const start = document.getElementById('start');
const stop = document.getElementById('stop');

// Benchmark settings
const NUM_WARMUP_STEPS = 3;
const QUANTIZED = false;
const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';

// Chart configuration
const config = {
type: 'line',
data: {
labels: [],
datasets: [{
label: 'WASM',
data: [],
borderColor: 'red',
backgroundColor: 'rgba(255, 0, 0, 0.5)',
}, {
label: 'WebGPU',
data: [],
borderColor: 'blue',
backgroundColor: 'rgba(0, 0, 255, 0.5)',
}]
},
options: {
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: {
position: 'top',
},
},
scales: {
x: {
title: {
display: true,
text: 'Batch size',
},
min: 1,
},
y: {
title: {
display: true,
text: 'Time (ms)',
},
}
}
},
};

const toggleScale = (chart, axis, enabled) => {
chart.options.scales[axis].type = enabled ? 'logarithmic' : 'linear';
chart.update();
}

xscale.addEventListener('change', () => toggleScale(chart, 'x', xscale.checked));
yscale.addEventListener('change', () => toggleScale(chart, 'y', yscale.checked));

const chart = new Chart(ctx, config);

status.textContent = 'Loading model...';

let model_CPU;
try {
model_CPU = await AutoModel.from_pretrained(MODEL_ID, {
quantized: QUANTIZED,
device: 'wasm' // WASM (CPU) baseline; the WebGPU model is loaded separately
});
} catch (err) {
status.textContent = err.message;
alert(err.message)
throw err;
}

let model_GPU;
try {
model_GPU = await AutoModel.from_pretrained(MODEL_ID, {
quantized: QUANTIZED,
session_options: {
executionProviders: ['webgpu']
}
});
} catch (err) {
status.textContent = err.message;
alert(err.message)
throw err;
}

let adapterInfo;
try {
// Shouldn't fail since the WebGPU model has loaded successfully
const adapter = await navigator.gpu.requestAdapter();
adapterInfo = await adapter.requestAdapterInfo();
} catch (err) {
adapterInfo = {};
}

status.textContent = 'Ready';

let interrupted = false;
start.addEventListener('click', async () => {
start.disabled = true;
stop.disabled = false;
interrupted = false;

// Reset
chart.data.labels = [];
for (let i = 0; i < chart.data.datasets.length; ++i) {
chart.data.datasets[i].data = [];
}
chart.update();

const seqLength = parseInt(sequenceLength.value);

status.textContent = 'Warming up...';

const generateDummyInputs = (batch_size) => {

const inputs = ones([batch_size, seqLength]);

const model_inputs = {
input_ids: inputs,
attention_mask: inputs,
}
return model_inputs;
}

// Warm up: This is important for the WebGPU execution provider, which compiles the shaders on first load
for (let i = 0; i < NUM_WARMUP_STEPS; ++i) {
const model_inputs = generateDummyInputs(1);
await model_CPU(model_inputs);
await model_GPU(model_inputs);
}

status.textContent = 'Running benchmark...';

const batch_sizes = batchSizes.value.split(',').map(x => parseInt(x)).filter(x => x);

for (const batch_size of batch_sizes) {
if (interrupted) break;

const model_inputs = generateDummyInputs(batch_size);

let wasmTime;
{ // Run WASM
const start = performance.now();
await model_CPU(model_inputs);
const end = performance.now();
wasmTime = end - start;
}

let webGPUTime;
{ // Run WebGPU
const start = performance.now();
await model_GPU(model_inputs);
const end = performance.now();
webGPUTime = end - start;
}
chart.data.labels.push(batch_size);
chart.data.datasets[0].data.push(wasmTime);
chart.data.datasets[1].data.push(webGPUTime);
chart.update();
}

// Calculate max speedup:
if (chart.data.labels.length === 0) return;

const table = generateResultsTable(chart.data, seqLength);

const speedup = chart.data.datasets[0].data.at(-1) / chart.data.datasets[1].data.at(-1);
const roundedSpeedup = speedup.toFixed(2);
const params = new URLSearchParams({
title: `⚡ WebGPU Benchmark Results (${roundedSpeedup}x speedup)`,
description: table.outerHTML,
});

const paramsStr = params.toString();
status.innerHTML = `⚡ Done! WebGPU is <strong>${roundedSpeedup}x</strong> faster! <a href="https://huggingface.co/spaces/Xenova/webgpu-embedding-benchmark/discussions/new?${paramsStr}" target="_blank">Share results</a>`;
start.disabled = false;
});
start.disabled = false;

stop.addEventListener('click', () => {
status.textContent = 'Stopping...';
interrupted = true;
stop.disabled = true;
});

function generateResultsTable(data, sequence_length) {
const datasets = data.datasets.map(d => d.data);
const batch_sizes = data.labels;

const container = document.createElement('div');

const table = document.createElement('table');
const thead = table.createTHead();
const tbody = table.createTBody();

// Add header row
const headerRow = thead.insertRow();
headerRow.insertCell().textContent = 'Batch Size';
headerRow.insertCell().textContent = `WASM (ms)`;
headerRow.insertCell().textContent = `WebGPU (ms)`;

// Add data rows
batch_sizes.forEach((batchSize, rowIndex) => {
const row = tbody.insertRow();
row.insertCell().textContent = batchSize;
datasets.forEach(dataset => {
row.insertCell().textContent = dataset[rowIndex].toFixed(2);
});
});

container.appendChild(table);

const createBulletPoint = (text) => {
const li = document.createElement('li');
li.textContent = text;
return li;
}

// Add other information
const info = document.createElement('ul');
info.appendChild(createBulletPoint(`Model: ${MODEL_ID}`));
info.appendChild(createBulletPoint(`Quantized: ${QUANTIZED}`));
info.appendChild(createBulletPoint(`Sequence length: ${sequence_length}`));
info.appendChild(createBulletPoint(`Browser: ${navigator.userAgent}`));
info.appendChild(createBulletPoint(`GPU: vendor=${adapterInfo.vendor}, architecture=${adapterInfo.architecture}, device=${adapterInfo.device}, description=${adapterInfo.description}`));
container.appendChild(info);

return container;
}
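The benchmark in `main.js` follows a common pattern: discard a few warm-up runs (the WebGPU provider compiles shaders on first use), time one run per batch size for each backend, and report the ratio for the last batch size as the speedup. Below is a minimal sketch of that pattern, pulled out so it can run outside the browser. It is not part of the PR: `timeOnce`, `runBenchmark`, `speedup`, and the `sleep`-based stand-in backends are all hypothetical, and real code would call the loaded models instead.

```javascript
// Hypothetical helper: time a single call of an async function, in milliseconds.
async function timeOnce(fn) {
  const start = performance.now();
  await fn();
  return performance.now() - start;
}

// Hypothetical helper: warm up each backend, then time one run per batch size.
// `backends` maps a label to an async function that takes a batch size.
async function runBenchmark(backends, batchSizes, numWarmupSteps = 3) {
  // Warm-up results are discarded; they absorb one-off start-up costs.
  for (let i = 0; i < numWarmupSteps; ++i) {
    for (const run of Object.values(backends)) {
      await run(1);
    }
  }

  const results = [];
  for (const batchSize of batchSizes) {
    const row = { batchSize };
    for (const [label, run] of Object.entries(backends)) {
      row[label] = await timeOnce(() => run(batchSize));
    }
    results.push(row);
  }
  return results;
}

// Ratio of baseline time to candidate time for the last measured batch size.
function speedup(results, baseline, candidate) {
  const last = results.at(-1);
  return last[baseline] / last[candidate];
}

// Stand-in workloads (assumption: timers instead of real model inference).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const fakeBackends = {
  wasm: (batchSize) => sleep(4 * batchSize),
  webgpu: (batchSize) => sleep(1 * batchSize),
};

const benchmarkPromise = runBenchmark(fakeBackends, [1, 2, 4]);
benchmarkPromise.then((results) => {
  console.table(results);
  console.log(`speedup: ${speedup(results, 'wasm', 'webgpu').toFixed(2)}x`);
});
```

A single timed run per batch size, as in the benchmark itself, keeps the total runtime short but is sensitive to noise; repeating each measurement and taking a median would be one way to stabilise the numbers, at the cost of a longer run.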