Fixed many accessibility issues. #20990

Merged 2 commits on Jun 13, 2024
@@ -57,14 +57,29 @@
}
}
};
const toggleScroll = () => {
if (scrollerRef) {
const currentState = window.getComputedStyle(scrollerRef).animationPlayState;
scrollerRef.style.animationPlayState = currentState === 'running' ? 'paused' : 'running';
}
};
const handleKeyDown = (event: { key: string; preventDefault: () => void; }) => {
if (event.key === 'Enter' || event.key === ' ') {
event.preventDefault(); // Prevent default spacebar scrolling behavior
toggleScroll();
}
};
</script>

<div bind:this={containerRef} class={cn('scroller relative z-2 overflow-hidden ', className)}>
<button class="hover:bg-primary focus:bg-primary menu-item py-2 sr-only focus:not-sr-only" on:keydown={handleKeyDown} on:click={toggleScroll}>Toggle scrolling</button>
<ul
bind:this={scrollerRef}
class={cn(
' flex w-max min-w-full shrink-0 flex-nowrap gap-4 py-4',
start && 'animate-scroll ',
start && 'animate-scroll',
pauseOnHover && 'hover:[animation-play-state:paused]'
)}
>
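The new button gives keyboard users a way to pause and resume the scroller's CSS animation. A minimal, self-contained sketch of the same pattern, assuming a Tailwind-style `sr-only` utility and an `animate-scroll` class driving the motion (component structure and class names are illustrative, not part of this diff):

<script lang="ts">
	// Element whose CSS animation is toggled; bound below.
	let scrollerRef: HTMLElement | undefined;

	// Flip the animation between running and paused.
	const toggleScroll = () => {
		if (!scrollerRef) return;
		const current = window.getComputedStyle(scrollerRef).animationPlayState;
		scrollerRef.style.animationPlayState = current === 'running' ? 'paused' : 'running';
	};

	// Enter and Space both activate the control; preventDefault stops Space from
	// scrolling the page and suppresses the synthetic click, so state is toggled once.
	const handleKeyDown = (event: KeyboardEvent) => {
		if (event.key === 'Enter' || event.key === ' ') {
			event.preventDefault();
			toggleScroll();
		}
	};
</script>

<button class="sr-only focus:not-sr-only" on:click={toggleScroll} on:keydown={handleKeyDown}>
	Toggle scrolling
</button>

<ul bind:this={scrollerRef} class="animate-scroll">
	<!-- scrolling items -->
</ul>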
2 changes: 1 addition & 1 deletion src/routes/+layout.svelte
@@ -46,7 +46,7 @@
<Header />
{/if}
{#key data.pathname}
<div in:fade={{ duration: 300, delay: 400 }} out:fade={{ duration: 300 }}>
<div id="main-content" in:fade={{ duration: 300, delay: 400 }} out:fade={{ duration: 300 }}>
<slot />
</div>
{/key}
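The `id="main-content"` anchor gives assistive-technology users a target to jump past the header. A minimal sketch of a skip link that could point at it, assuming the site's `sr-only` / `focus:not-sr-only` utilities; the link itself is not shown in this diff:

<!-- First focusable element on the page, e.g. at the top of the layout. -->
<a href="#main-content" class="sr-only focus:not-sr-only">Skip to main content</a>

<Header />

<div id="main-content">
	<slot />
</div>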
38 changes: 19 additions & 19 deletions src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -45,11 +45,11 @@
<div class="container mx-auto px-4 md:px-8 lg:px-48 pt-8">
<h1 class="text-5xl pb-2">Accelerating LLaMA-2 Inference with ONNX Runtime</h1>
<p class="text-neutral">
By: <a href="https://www.linkedin.com/in/kunal-v-16315b94" class="text-blue-500"
By: <a href="https://www.linkedin.com/in/kunal-v-16315b94" class="text-blue-700"
>Kunal Vaishnavi</a
>
and
<a href="https://www.linkedin.com/in/parinitaparinita/" class="text-blue-500">Parinita Rahi</a>
<a href="https://www.linkedin.com/in/parinitaparinita/" class="text-blue-700">Parinita Rahi</a>
</p>
<p class="text-neutral">
14TH NOVEMBER, 2023 <span class="italic text-stone-500">(Updated 22nd November)</span>
@@ -70,13 +70,13 @@
quantization updates, and cross-platform usage scenarios.
</p>

<h2 class="text-blue-500 text-3xl mb-4">Background: Llama2 and Microsoft</h2>
<h2 class="text-blue-700 text-3xl mb-4">Background: Llama2 and Microsoft</h2>

<p class="mb-4">
Llama2 is a state-of-the-art open source LLM from Meta ranging in scale from 7B to 70B
parameters (7B, 13B, 70B). Microsoft and Meta <a
href="https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-meta-expand-their-ai-partnership-with-llama-2-on-azure-and-windows/"
class="text-blue-500">announced</a
class="text-blue-700">announced</a
> their AI on Azure and Windows collaboration in July 2023. As part of the announcement, Llama2
was added to the Azure AI model catalog, which serves as a hub of foundation models that empower
developers and machine learning (ML) professionals to easily discover, evaluate, customize, and
@@ -89,7 +89,7 @@
your costs.
</p>

<h2 class="text-blue-500 text-3xl mb-4">
<h2 class="text-blue-700 text-3xl mb-4">
Faster Inferencing with New ONNX Runtime Optimizations
</h2>

@@ -115,7 +115,7 @@
</div>
<div class="mt-2 mb-4 text-center">Figure 1: E2E Throughput Comparisons</div>

<h2 class="text-blue-500 text-3xl mb-4">Latency and Throughput</h2>
<h2 class="text-blue-700 text-3xl mb-4">Latency and Throughput</h2>

<p class="mb-4">
The graphs below show latency comparisons between the ONNX Runtime and PyTorch variants of the
@@ -152,11 +152,11 @@
<p class="mb-4">
More details on these metrics can be found <a
href="https://github.com/microsoft/onnxruntime-inference-examples/blob/main/python/models/llama/README.md"
class="text-blue-500">here</a
class="text-blue-700">here</a
>.
</p>

<h2 class="text-blue-500 text-3xl mb-4">ONNX Runtime with Multi-GPU Inference</h2>
<h2 class="text-blue-700 text-3xl mb-4">ONNX Runtime with Multi-GPU Inference</h2>

<p class="mb-4">
ONNX Runtime supports multi-GPU inference to enable serving large models. Even in FP16
@@ -165,7 +165,7 @@
</p>

<p class="mb-4">
ONNX Runtime applied <a href="https://arxiv.org/pdf/1909.08053.pdf" class="text-blue-500"
ONNX Runtime applied <a href="https://arxiv.org/pdf/1909.08053.pdf" class="text-blue-700"
>Megatron-LM</a
>
Tensor Parallelism on the 70B model to split the original model weight onto different GPUs. Megatron
@@ -176,7 +176,7 @@
You can find additional example scripts
<a
href="https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/llama/"
class="text-blue-500">here</a
class="text-blue-700">here</a
>.
</p>

@@ -185,7 +185,7 @@
<figcaption class="mt-2 mb-4 text-center">Figure 4: 70B Llama2 Model Throughput</figcaption>
</figure>

<h2 class="text-blue-500 text-3xl mb-4">ONNX Runtime Optimizations</h2>
<h2 class="text-blue-700 text-3xl mb-4">ONNX Runtime Optimizations</h2>
<figure class="px-10 pt-4">
<img src={figure5} alt="LLaMA-2 Optimization Diagram" />
<figcaption class="mt-2 mb-4 text-center">Figure 5: LLaMA-2 Optimization Diagram</figcaption>
@@ -252,24 +252,24 @@
calculate the rotary embeddings more efficiently with less memory usage. The rotary embedding
compute kernels also support interleaved and non-interleaved formats to support both the <a
href="https://github.com/microsoft/Llama-2-Onnx"
class="text-blue-500">Microsoft version of LLaMA-2</a
class="text-blue-700">Microsoft version of LLaMA-2</a
>
and the Hugging Face version of LLaMA-2 respectively while sharing the same calculations.
</p>

<p class="mb-4">
The optimizations work for the <a
href="https://huggingface.co/meta-llama"
class="text-blue-500">Hugging Face versions</a
class="text-blue-700">Hugging Face versions</a
>
(models ending with <i>-hf</i>) and the Microsoft versions. You can download the optimized HF
versions from
<a href="https://github.com/microsoft/Llama-2-Onnx/tree/main-CUDA_CPU" class="text-blue-500"
<a href="https://github.com/microsoft/Llama-2-Onnx/tree/main-CUDA_CPU" class="text-blue-700"
>Microsoft's LLaMA-2 ONNX repository</a
>. Stay tuned for newer Microsoft versions coming soon!
</p>

<h2 class="text-blue-500 text-3xl mb-4">Optimize your own model using Olive</h2>
<h2 class="text-blue-700 text-3xl mb-4">Optimize your own model using Olive</h2>

<p class="mb-4">
Olive is a hardware-aware model optimization tool that incorporates advanced techniques such
@@ -281,25 +281,25 @@
<p class="mb-4">
Here is an example of <a
href="https://github.com/microsoft/Olive/tree/main/examples/llama2"
class="text-blue-500">Llama2 optimization with Olive</a
class="text-blue-700">Llama2 optimization with Olive</a
>, which harnesses ONNX Runtime optimizations highlighted in this blog. Distinct optimization
flows cater to various requirements. For instance, you have the flexibility to choose
different data types for quantization in CPU and GPU inference, based on your accuracy
tolerance. Additionally, you can fine-tune your own Llama2 model with Olive-QLoRa on client
GPUs and perform inference with ONNX Runtime optimizations.
</p>

<h2 class="text-blue-500 text-3xl mb-4">Usage Example</h2>
<h2 class="text-blue-700 text-3xl mb-4">Usage Example</h2>

<p class="mb-4">
Here is a <a
href="https://github.com/microsoft/onnxruntime-inference-examples/blob/main/python/models/llama/LLaMA-2%20E2E%20Notebook.ipynb"
class="text-blue-500">sample notebook</a
class="text-blue-700">sample notebook</a
> that shows you an end-to-end example of how you can use the above ONNX Runtime optimizations
in your application.
</p>

<h2 class="text-blue-500 text-3xl mb-4">Conclusion</h2>
<h2 class="text-blue-700 text-3xl mb-4">Conclusion</h2>

<p class="mb-4">
The advancements discussed in this blog provide faster Llama2 inferencing with ONNX Runtime,
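The blog's usage example points at a Python notebook; for a web stack like this site's, a rough TypeScript sketch of loading and running an exported ONNX model with ONNX Runtime Web could look like the following. The model path, input name, and tensor shape are placeholders, not the actual Llama2 graph:

// Rough sketch: run an exported ONNX model with ONNX Runtime Web.
// Real Llama2 graphs take tokenized int64 input_ids plus past key/value caches;
// this dummy float32 input only illustrates the call shape.
import * as ort from 'onnxruntime-web';

async function runOnce(): Promise<void> {
	// Load the exported graph.
	const session = await ort.InferenceSession.create('./model.onnx');

	// Build a dummy input tensor (placeholder name and shape).
	const input = new ort.Tensor('float32', Float32Array.from([1, 2, 3, 4]), [1, 4]);

	// Outputs come back keyed by output name.
	const results = await session.run({ input });
	console.log(Object.keys(results));
}

runOnce().catch(console.error);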
2 changes: 1 addition & 1 deletion src/routes/blogs/blog-post-featured.svelte
@@ -33,7 +33,7 @@
<h2 class="card-title">{title}</h2>
<p>{description}</p>
<img class="rounded" src={image} alt={imgalt} />
<div class="text-right text-blue-500">
<div class="text-right text-blue-700">
{date}
</div>
</div>
2 changes: 1 addition & 1 deletion src/routes/blogs/blog-post.svelte
@@ -30,7 +30,7 @@
<div class="card-body">
<h2 class="card-title">{title}</h2>
<p>{description}</p>
<p class="text-blue-500 text-right">
<p class="text-blue-700 text-right">
{date}
</p>
</div>
2 changes: 1 addition & 1 deletion src/routes/blogs/post.svelte
@@ -82,7 +82,7 @@
<p class="inline">By:</p>
{/if}
{#each authors as author, i}
<a href={authorsLink[i]} class="text-blue-500">{author}</a>{i + 1 === authors.length
<a href={authorsLink[i]} class="text-blue-700">{author}</a>{i + 1 === authors.length
? ''
: ', '}
{/each}
30 changes: 15 additions & 15 deletions src/routes/blogs/pytorch-on-the-edge/+page.svelte
@@ -179,9 +179,9 @@ fun run(audioTensor: OnnxTensor): Result {
<div class="container mx-auto px-4 md:px-8 lg:px-48 pt-8">
<h1 class="text-5xl pb-2">Run PyTorch models on the edge</h1>
<p class="text-neutral">
By: <a href="https://www.linkedin.com/in/natkershaw/" class="text-blue-500">Natalie Kershaw</a>
By: <a href="https://www.linkedin.com/in/natkershaw/" class="text-blue-700">Natalie Kershaw</a>
and
<a href="https://www.linkedin.com/in/prasanthpulavarthi/" class="text-blue-500"
<a href="https://www.linkedin.com/in/prasanthpulavarthi/" class="text-blue-700"
>Prasanth Pulavarthi</a
>
</p>
@@ -217,12 +217,12 @@ fun run(audioTensor: OnnxTensor): Result {
anywhere that is outside of the cloud, ranging from large, well-resourced personal computers
to small footprint devices such as mobile phones. This has been a challenging task to
accomplish in the past, but new advances in model optimization and software like
<a href="https://onnxruntime.ai/pytorch" class="text-blue-500">ONNX Runtime</a>
<a href="https://onnxruntime.ai/pytorch" class="text-blue-700">ONNX Runtime</a>
make it more feasible - even for new generative AI and large language models like Stable Diffusion,
Whisper, and Llama2.
</p>

<h2 class="text-blue-500 text-3xl mb-4">Considerations for PyTorch models on the edge</h2>
<h2 class="text-blue-700 text-3xl mb-4">Considerations for PyTorch models on the edge</h2>

<p class="mb-4">
There are several factors to keep in mind when thinking about running a PyTorch model on the
@@ -292,7 +292,7 @@ fun run(audioTensor: OnnxTensor): Result {
</li>
</ul>

<h2 class="text-blue-500 text-3xl mb-4">Tools for PyTorch models on the edge</h2>
<h2 class="text-blue-700 text-3xl mb-4">Tools for PyTorch models on the edge</h2>

<p class="mb-4">
We mentioned ONNX Runtime several times above. ONNX Runtime is a compact, standards-based
@@ -305,7 +305,7 @@ fun run(audioTensor: OnnxTensor): Result {
format that doesn't require the PyTorch framework and its gigabytes of dependencies. PyTorch
has thought about this and includes an API that enables exactly this - <a
href="https://pytorch.org/docs/stable/onnx.html"
class="text-blue-500">torch.onnx</a
class="text-blue-700">torch.onnx</a
>. <a href="https://onnx.ai/">ONNX</a> is an open standard that defines the operators that make
up models. The PyTorch ONNX APIs take the Pythonic PyTorch code and turn it into a functional
graph that captures the operators that are needed to run the model without Python. As with everything
@@ -318,7 +318,7 @@ fun run(audioTensor: OnnxTensor): Result {
The popular Hugging Face library also has APIs that build on top of this torch.onnx
functionality to export models to the ONNX format. Over <a
href="https://huggingface.co/blog/ort-accelerating-hf-models"
class="text-blue-500">130,000 models</a
class="text-blue-700">130,000 models</a
> are supported making it very likely that the model you care about is one of them.
</p>

@@ -328,7 +328,7 @@ fun run(audioTensor: OnnxTensor): Result {
and web browsers) via various languages (from C# to JavaScript to Swift).
</p>

<h2 class="text-blue-500 text-3xl mb-4">Examples of PyTorch models on the edge</h2>
<h2 class="text-blue-700 text-3xl mb-4">Examples of PyTorch models on the edge</h2>

<h3 class=" text-2xl mb-2">Stable Diffusion on Windows</h3>

@@ -345,7 +345,7 @@ fun run(audioTensor: OnnxTensor): Result {
<p class="mb-4">
You don't have to export the fifth model, ClipTokenizer, as it is available in <a
href="https://onnxruntime.ai/docs/extensions"
class="text-blue-500">ONNX Runtime extensions</a
class="text-blue-700">ONNX Runtime extensions</a
>, a library for pre and post processing PyTorch models.
</p>

@@ -366,15 +366,15 @@ fun run(audioTensor: OnnxTensor): Result {
<p class="mb-4">
You can build the application and run it on Windows with the detailed steps shown in this <a
href="https://onnxruntime.ai/docs/tutorials/csharp/stable-diffusion-csharp.html"
class="text-blue-500">tutorial</a
class="text-blue-700">tutorial</a
>.
</p>

<h3 class=" text-2xl mb-2">Text generation in the browser</h3>

<p class="mb-4">
Running a PyTorch model locally in the browser is not only possible but super simple with
the <a href="https://huggingface.co/docs/transformers.js/index" class="text-blue-500"
the <a href="https://huggingface.co/docs/transformers.js/index" class="text-blue-700"
>transformers.js</a
> library. Transformers.js uses ONNX Runtime Web as its backend. Many models are already converted
to ONNX and served by the transformers.js CDN, making inference in the browser a matter of writing
@@ -407,7 +407,7 @@ fun run(audioTensor: OnnxTensor): Result {
All components of the Whisper Tiny model (audio decoder, encoder, decoder, and text sequence
generation) can be composed and exported to a single ONNX model using the <a
href="https://github.com/microsoft/Olive/tree/main/examples/whisper"
class="text-blue-500">Olive framework</a
class="text-blue-700">Olive framework</a
>. To run this model as part of a mobile application, you can use ONNX Runtime Mobile, which
supports Android, iOS, react-native, and MAUI/Xamarin.
</p>
@@ -420,7 +420,7 @@ fun run(audioTensor: OnnxTensor): Result {
<p class="mb-4">
The relevant snippet of an example <a
href="https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/speech_recognition"
class="text-blue-500">Android mobile app</a
class="text-blue-700">Android mobile app</a
> that performs speech transcription on short samples of audio is shown below:
</p>
<Highlight language={kotlin} code={mobilecode} />
@@ -476,11 +476,11 @@ fun run(audioTensor: OnnxTensor): Result {
<p class="mb-4">
You can read the full <a
href="https://onnxruntime.ai/docs/tutorials/on-device-training/ios-app.html"
class="text-blue-500">Speaker Verification tutorial</a
class="text-blue-700">Speaker Verification tutorial</a
>, and
<a
href="https://github.com/microsoft/onnxruntime-training-examples/tree/master/on_device_training/mobile/ios"
class="text-blue-500">build and run the application from source</a
class="text-blue-700">build and run the application from source</a
>.
</p>

Expand Down