Merge main #19918
Closed
- Use the function `GetVecUint32FromVecInt64` in helper.h to replace `transform`. - Change some `int32_t` to `uint32_t`. - Remove a useless `temp`.
### Description Building on g++ 13.2.0 results in -Wstringop-overread errors on Linux. This commit addresses the flatbuffers build issue with the following changes: 1. Remove the Werror flag in the flatbuffers patch. 2. Add a compilation option to suppress the 'stringop-overflow' error in the flatbuffers code within the XNNPACK provider. ### Motivation and Context google/flatbuffers#8119 #19239 Signed-off-by: Phoebe Chen <[email protected]>
### Description Adds type/shape inferencing support for MSFT domain QuantizeLinear and DequantizeLinear operators to symbolic_shape_infer.py ### Motivation and Context Need a way to infer the types and shapes of Q/DQ ops in models that use the MSFT domain versions (e.g., int16 quantization).
Disable Neural Speed to prevent the operation following MatMulNBits from significantly slowing down.
### Description <!-- Describe your changes. --> Increment num_resolves_ inside the graph resolve finalization function so the subgraphs have the same value. This prevents incorrect output regarding removing unused initializers. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #19141
### Description <!-- Describe your changes. --> Refactor the VAIEP to use MSFT's standalone API ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Vitis ONNX RT VAI should switch to using the standalone API for ONNX EPs in order to decouple the EP from onnxruntime.dll and the providers.dll. This will help to simplify customer deployment of applications and use cases that need to share their onnxruntime.dll with other applications. --------- Co-authored-by: Zhenze Wang <[email protected]> Co-authored-by: zz002 <[email protected]>
### Description <!-- Describe your changes. --> Register resize-18 and -19, which will be lit up automatically when dml feature level bumps up to 6300. It's worth noting that DML has a different implementation for antialias than does ORT CPU. DML does iterative downsampling whenever the scale factor is less than 0.5. This is equivalent to performing resize with a variable-sized input window (also equivalent to mip mapping). ORT takes a different approach, using the same convolution approach as PIL. The two implementations approach each other in certain cases (with iota-generated data) but they usually aren't perfectly equivalent. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Linnea May <[email protected]>
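The iterative-downsampling behavior described above can be sketched in a few lines of Python. This is a hypothetical illustration (not DML's actual code) of how a scale factor below 0.5 decomposes into repeated 2x reductions, as in mip mapping, followed by one final resize step:

```python
def downsample_steps(scale):
    """Split a scale factor < 0.5 into repeated 0.5x passes plus a final pass >= 0.5."""
    steps = []
    while scale < 0.5:
        steps.append(0.5)
        scale *= 2.0  # the remaining scale after each halving pass
    steps.append(scale)
    return steps
```

For example, a 0.2x resize becomes three passes (0.5, 0.5, 0.8) whose product recovers the overall scale; a 0.7x resize needs only a single pass.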
### Description Multi-partition support for the context binary cache feature. 1. In QNN EP, create the list of EPContext nodes if ep_context_enable is enabled, so that it can dump the model with multiple partitions. 2. Extend the context loading code to support multiple EPContext nodes. ### Motivation and Context Only a single partition was supported before this change. After this change there is no longer a graph partition limitation for the context cache feature.
…19276) ### Description When USE_ORTMODULE_TRITON is set to 1 but the triton library is not available, the triton functionality is silently turned off. This adds a warning.
### Description Update the dependency of `oneDNN` to v3.0.1, which fixes a minor bug hindering gcc 13. ### Motivation and Context Referring to [oneDNN-1548](oneapi-src/oneDNN#1548). - When building with `--use_dnnl` using gcc 13.x, it will fail due to this upstream issue. - This is fixed in `v3.0.1` [tag](https://github.com/oneapi-src/oneDNN/tree/v3.0.1) by [this commit](oneapi-src/oneDNN@1d7971c).
…time (#19325) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
### Description The test creates millions of threads. This change is to avoid that by using an existing thread pool. ### Motivation and Context
### Description <!-- Describe your changes. --> Resolving compilation errors when using USE_VITISAI ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> There will be compilation errors when USE_VITISAI is enabled. This is in addition to #19058 Co-authored-by: Zhenze Wang <[email protected]>
### Description Currently, ORT will fail a build when the flag DEBUG_GENERATION is set to 1 (used to debug BeamSearch and GreedySearch) in [console_dumper.h](https://github.com/microsoft/onnxruntime/blob/3b63d85c253c50099c70ba0db6c141b842bc7cda/onnxruntime/contrib_ops/cpu/utils/console_dumper.h#L12) with the following error: `onnxruntime/onnxruntime/contrib_ops/cpu/transformers/logits_processor.h:270:15: error: ‘DumpScores’ was not declared in this scope` This is because it is defined in `logits_processor.cc`, and a debugging artifact was passed in an earlier PR where this function is called from `logits_processor.h` before it is defined [[link](https://github.com/microsoft/onnxruntime/blob/3a2ab1963a195fe8df59e6220d3f191e5dfe80ee/onnxruntime/contrib_ops/cpu/transformers/logits_processor.h#L270)]. Builds with the flag have been broken since that PR was merged. This PR moves DumpScores() definition from `logits_processor.cc` to `logits_processor.h` so that all debug statements can be used correctly in `logits_processor.cc` and `logits_processor.h` and build succeeds with this debug flag. --------- Co-authored-by: Peter McAughan <[email protected]>
### Description - When converting ONNX split sizes to QNN split indices, do not include the split at index 0. QNN 2.19 assumes index 0 is implicit and throws a validation error if provided. - Fix bug when using an ONNX Split operator with a `num_outputs` attribute that does not evenly divide into `shape[axis]`. The ONNX spec states that the last chunk should be smaller, but QNN EP made the last chunk larger. - Fix bug when using an ONNX Split operator with a `split` input. QNN EP was incorrectly passing the split sizes as split indices without conversion. ### Motivation and Context QNN SDK 2.19 updated validation criteria for Split operators. QNN EP was previously passing a split index that should have been implicit. Also discovered bugs when using the `num_outputs` attribute and the `split` input.
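The size-to-index conversion and the `num_outputs` chunking described above can be sketched as follows (hypothetical helper names; this is an illustration, not QNN EP's actual code):

```python
def split_sizes_to_qnn_indices(split_sizes):
    """Convert ONNX split sizes to QNN split indices, omitting the implicit index 0."""
    indices, offset = [], 0
    for size in split_sizes[:-1]:  # the last size produces no new index
        offset += size
        indices.append(offset)
    return indices

def split_sizes_from_num_outputs(dim, num_outputs):
    """Per the ONNX spec: chunks of ceil(dim / num_outputs), with the last chunk smaller."""
    chunk = -(-dim // num_outputs)  # ceiling division
    sizes, remaining = [], dim
    for _ in range(num_outputs):
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes
```

For `shape[axis] == 10` and `num_outputs == 3` this yields chunks `[4, 4, 2]` (last chunk smaller), and ONNX split sizes `[2, 3, 5]` become QNN indices `[2, 5]` with index 0 omitted.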
Use `MaskingSpecialization::MaskOutUpperTriangle` to support causal mask in ck implementation.
### Description This PR fixes the following error when WebGPU profiling is enabled: ``` TypeError: Cannot read properties of undefined (reading 'push') ```
### Description Support external data in npm tests. This allows the test runner to detect whether external data is available in the test folder and, if it is, load it as external data automatically. This feature does not parse every model to figure out whether the model has external data. The following comment in the code explains how we determine whether to parse the model file: ```js // for performance consideration, we do not parse every model. when we think it's likely to have external // data, we will parse it. We think it's "likely" when one of the following conditions is met: // 1. any file in the same folder has the similar file name as the model file // (e.g., model file is "model_abc.onnx", and there is a file "model_abc.pb" or "model_abc.onnx.data") // 2. the file size is larger than 1GB ```
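The two heuristic conditions quoted above can be sketched in Python (function and constant names are hypothetical; the real implementation lives in the js test runner):

```python
import os

ONE_GB = 1 << 30

def likely_has_external_data(model_path):
    """Heuristic: only parse a model for external data when it is likely present."""
    folder = os.path.dirname(model_path) or "."
    model_name = os.path.basename(model_path)
    stem = os.path.splitext(model_name)[0]
    # Condition 1: a sibling file shares the model's base name
    # (e.g. "model_abc.onnx" next to "model_abc.onnx.data").
    for name in os.listdir(folder):
        if name != model_name and name.startswith(stem):
            return True
    # Condition 2: the model file itself is larger than 1 GB.
    return os.path.getsize(model_path) > ONE_GB
```

This keeps the common case cheap: a directory listing and a file-size check, with full model parsing only when one of the two conditions fires.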
### Description This PR 1) adds LeakyRelu activation for fusedConv; 2) makes `vec4<f16>` values work with `float32` uniform attributes. For example, `clamp(value, vec4<f16>(uniforms.clip_min), vec4<f16>(uniforms.clip_max))` will throw compilation errors since `uniforms.clip_min` and `uniforms.clip_max` are `f32`, not `f16`. So we need to change it to `clamp(value, vec4<f16>(f16(uniforms.clip_min)), vec4<f16>(f16(uniforms.clip_max)))`. The above problem was introduced when we made activation attributes uniforms instead of constants. BTW, after adding LeakyRelu, the `realesrgan-t256` model can pass.
Fix for #19376 - Use absolute imports instead of relative imports for now. - Fix some typos
### Description <!-- Describe your changes. --> Setup usage of coremltools via dependencies instead of copying files. Pull in some changes from #19347 in preparation for supporting ML Program and enabling building the ML Model on all platforms to make development and testing of CoreML EP code easier. - Update to coremltools 7.1 - Add patch for changes required for cross platform build of ML Program related code - Generate coreml proto files on all platforms - mainly to test these changes work everywhere, as the proto files will be used on all platforms when #19347 is checked in - Rename the onnxruntime_coreml_proto target to coreml_proto as it contains purely coreml protobuf code with no ORT related changes ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve setup.
Pin the pytest version to 7.4.4; higher versions cause the error `from onnxruntime.capi import onnxruntime_validation ModuleNotFoundError: No module named 'onnxruntime.capi'`
### Description 1. Make parity_check use a local model to avoid requiring an HF token. 2. `del` on the model didn't work because it tried to delete an object defined outside the function scope, which caused out-of-memory on A10. 3. In fact, 16 GB of GPU memory (one T4) is enough, but the conversion process is always killed on T4 while it works on A10 (24 GB). Standard_NC4as_T4_v3 has 28 GB of CPU memory; Standard_NV36ads_A10_v5 has 440 GB. It looks like the model conversion needs a very large amount of memory. ### Motivation and Context Last time, I came across some issues in convert_to_onnx.py, so I used the onnx model from https://github.com/microsoft/Llama-2-Onnx for testing. Now that these issues are fixed, I use the onnx model generated by this repo and the CI can cover the model conversion.
Bumps [gradle/gradle-build-action](https://github.com/gradle/gradle-build-action) from 2 to 3. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/gradle/gradle-build-action/releases">gradle/gradle-build-action's releases</a>.</em></p> <blockquote> <h2>v3.0.0-rc.1</h2> <p>First release candidate of <code>gradle/[email protected]</code>. This release candidate will the first release available under the <code>v3</code> version tag.</p> <blockquote> <p>[!IMPORTANT] As of <code>v3</code> this action has been superceded by <code>gradle/actions/setup-gradle</code>. Any workflow that uses <code>gradle/gradle-build-action@v3</code> will transparently delegate to <code>gradle/actions/setup-gradle@v3</code>.</p> <p>Users are encouraged to update their workflows, replacing:</p> <pre><code>uses: gradle/gradle-build-action@v3 </code></pre> <p>with</p> <pre><code>uses: gradle/actions/setup-gradle@v3 </code></pre> <p>See the <a href="https://github.com/gradle/actions/tree/main/setup-gradle">setup-gradle documentation</a> for up-to-date documentation for <code>gradle/actons/setup-gradle</code>.</p> </blockquote> <h2>Changes from <code>gradle-build-action@v2</code></h2> <p>This release brings some useful and much requested features, including:</p> <ul> <li>save and restore the Gradle configuration-cache data</li> <li>add the Job summary content as a PR comment</li> <li>easily publish Build Scans® to the free <a href="https://scans.gradle.com">Gradle Build Scan service</a></li> <li>compatibility with Node 20</li> </ul> <p>The only major breaking change from <code>[email protected]</code> is the update to require a Node 20 runtime environment. 
Aside from that change, this release should generally serve as a drop-in replacement for <code>gradle-build-action@v2</code>.</p> <h3>Changelog</h3> <ul> <li>[NEW] - Run with NodeJs 20.x (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/946">gradle/gradle-build-action#946</a>)</li> <li>[NEW] - Support for save & restore of configuration-cache data (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/966">gradle/gradle-build-action#966</a>)</li> <li>[NEW] - Support for automatic adding PR comment with Job Summary content (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1020">gradle/gradle-build-action#1020</a>)</li> <li>[NEW] - Make it easy to publish a Build Scan® to <a href="https://scans.gradle.com">https://scans.gradle.com</a> (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1044">gradle/gradle-build-action#1044</a>)</li> <li>[NEW] - Added <code>dependency-graph-continue-on-failure</code> input, which can be set to <code>false</code> to force the Job to fail when dependency graph submission fails (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1036">gradle/gradle-build-action#1036</a>). 
Failure modes include: <ul> <li>Fail build step if version of Gradle being executed is not supported for dependency-graph generation (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1034">gradle/gradle-build-action#1034</a>)</li> <li>Fail job if permissions are insufficient to submit dependency graph via Dependency Submission API (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/997">gradle/gradle-build-action#997</a>)</li> </ul> </li> <li>[NEW] - Add <code>dependency-graph: clear</code> option to clear any dependency-graph previously submitted by the job</li> <li>[FIX] Allow cache entries to be reused by jobs with the same ID in different workflows (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1017">gradle/gradle-build-action#1017</a>) <ul> <li>Workflow name remains part of the cache key, but cache entries generated by the same job id in a different workflow may be restored</li> </ul> </li> <li>[FIX] Register pre-installed JDKs in Maven toolchains.xml file (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1024">gradle/gradle-build-action#1024</a>) <ul> <li>This allows pre-installed JDKs to be auto-detected by Gradle Toolchain support on Windows</li> </ul> </li> <li>[FIX] - Update the Gradle Enterprise injection configuration for product rename to Develocity (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/995">gradle/gradle-build-action#995</a>)</li> <li>[FIX] - Avoid submitting an empty dependency graph when state is loaded from configuration-cache</li> <li>[DEPRECATION] - Deprecation of the arguments parameter (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/996">gradle/gradle-build-action#996</a>)</li> <li>[BREAKING CHANGE] - Remove the <code>gradle-executable</code> input parameter. Use a separate workflow Step to execute a Gradle from a custom location.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/gradle/gradle-build-action/commit/4a8703fa348fe99fdb9d2ac233732f7dccea5437"><code>4a8703f</code></a> Delegate to '[email protected]'</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/4a39eedb8c843f5dbd9abebfd404ae6e947328dc"><code>4a39eed</code></a> Mention setup-gradle in README</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/272883a7ba6334b53c9c43b570853bc46021955b"><code>272883a</code></a> Remove all action sources: these have been migrated to 'gradle/actions'</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/2a8bfcf2313611da65fd8cb2d81f50d99cb74ca0"><code>2a8bfcf</code></a> Delegate action implementation to gradle/actions/setup-gradle</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/e1ada08a9a43fad9770411d5dd099f25ece2569d"><code>e1ada08</code></a> Bump the github-actions group with 1 update (<a href="https://redirect.github.com/gradle/gradle-build-action/issues/1047">#1047</a>)</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/a8e3e5e2b4235aa45b6683dd85088aa7e737de34"><code>a8e3e5e</code></a> Apply dependency version updates</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/2be01ca1c632ae5a688f391acd726cf89c392794"><code>2be01ca</code></a> Build outputs</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/a00827eebb1e3036a35b5705ca9fc36a0f0ff33d"><code>a00827e</code></a> Bump the npm-dependencies group with 7 updates</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/ad80850e980287e8a0b25382843366a43d8694dd"><code>ad80850</code></a> Bump the github-actions group with 2 updates</li> <li><a href="https://github.com/gradle/gradle-build-action/commit/bd6d0a74d4407cffbe7946377ff9dd004fae9570"><code>bd6d0a7</code></a> Configure explicit java version for config-cache test</li> <li>Additional commits 
viewable in <a href="https://github.com/gradle/gradle-build-action/compare/v2...v3">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=gradle/gradle-build-action&package-manager=github_actions&previous-version=2&new-version=3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description <!-- Describe your changes. --> This PR adds an ONNX conversion script for dynamo-exported phi2, an optimization script, and an inference example script. A readme file is added as documentation. https://github.com/microsoft/onnxruntime/tree/wangye/phi2_doc/onnxruntime/python/tools/transformers/models/phi2#readme ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Edward Chen <[email protected]>
### Description <!-- Describe your changes. --> Add ATen fallback support for bicubic interpolation algorithm. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Required for facebook/dinov2 model architecture as part of ONNX Runtime integration with AML Vision models.
### Description Fix output shape inference for packed GQA.
This pull request includes modifications to the `c-api-cpu.yml` Azure Pipelines configuration file. The changes mainly revolve around the Node.js packaging stage and the handling of Node.js artifacts. The most significant changes include renaming the Node.js packaging stage, adding a new dependency to the stage, changing artifact names, adding a new script to list Node.js artifacts, and updating the source folder for copying NuGet binaries. Changes in Node.js packaging: * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L503-R508): Renamed the Node.js packaging stage from `Nodejs_Packaging_CPU` to `Nodejs_Packaging` and added `Windows_CI_GPU_DML_Dev` as a new dependency to the stage. Changes in handling of Node.js artifacts: * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L568-R569): Changed the artifact name from `drop-onnxruntime-nodejs-win-x64` to `drop-onnxruntime-nodejs-win-x64-dml` in the task to download pipeline artifacts for Windows x64. * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59R595-R598): Added a new script to list Node.js artifacts from the directory `$(Build.BinariesDirectory)/nodejs-artifacts/win32/x64/`. * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L635-R640): Updated the source folder from `$(Build.BinariesDirectory)\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib` to `$(Build.BinariesDirectory)\nodejs-artifacts\win32\x64` in the task to copy NuGet binaries to the directory `$(Build.SourcesDirectory)\js\node\bin\napi-v3\win32\x64`. --------- Co-authored-by: Yulong Wang <[email protected]>
### Description Enable float32 model with FP16 precision for QNN HTP backend
### Description <!-- Describe your changes. --> ### Motivation and Context --extra-index-url is not allowed by injected Secure Supply Chain Step in packaging pipelines. ``` > Starting Multifeed Python Security Analysis: ##[warning]tools/ci_build/github/azure-pipelines/bigmodels-ci-pipeline.yml - Found "extra-index-url". (https://aka.ms/cfs/pypi) ``` And those 2 packages can be installed from PyPI as well now. Co-authored-by: Yi Zhang <[email protected]>
### Description Check that the onnx node tests and model tests ran. ### Motivation and Context onnx node test data and model data are mounted in one dir, and onnxruntime_test_all searches the dir and loads the data. If the dir does not exist or there's some change in onnxruntime_test_all, those tests may not be executed. For example, all onnx node test data is 32M; it's hard for us to be aware of the regression. So I added a simple check to ensure those tests are executed. --------- Co-authored-by: Yi Zhang <[email protected]>
### Motivation and Context Routing updates
### Description Use `vec<2>` or `vec<4>` operands in MatMulNBits ### Motivation and Context Improve performance
### Description ### Motivation and Context Linux GPU test on A10 isn't very stable
### Description The `npm test` flags are difficult to memorize because they differ from the `ort.env` flags. This change aligns those flags with the ort JS API, e.g. `--wasm-enable-proxy` becomes `--wasm.proxy`. Old flags are marked as deprecated, except `-x` (as a shortcut for `--wasm.numThreads`).
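As a rough sketch of the flag aliasing described above (only the two mappings mentioned in the summary are taken from it; the helper itself is hypothetical, not the actual test runner code):

```python
# Deprecated flags map to the new ort.env-aligned names; `-x` stays as a
# shortcut for `--wasm.numThreads` rather than being deprecated.
FLAG_ALIASES = {
    "--wasm-enable-proxy": "--wasm.proxy",
    "-x": "--wasm.numThreads",
}

def normalize_flags(argv):
    """Rewrite deprecated/shortcut flags to their canonical names, leaving others as-is."""
    return [FLAG_ALIASES.get(arg, arg) for arg in argv]
```

Keeping old flags as aliases rather than removing them lets existing scripts keep working while new invocations use the names that match the JS API.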
### Description GQA Rotary Dimension 1 incorrectly assumed to be based on head size. ### Motivation and Context This change should enable us to run phi-2 with GQA and Rotary Embedding fused.
### Description This PR updates the replacement of MultiHeadAttention (MHA) with GroupQueryAttention (GQA). It is related to the changes in [this PR](#18906). ### Motivation and Context The updated replacement of MHA with GQA includes the following fusion changes. - Apply sliding window within GQA - Fuse the rotary embeddings within GQA - Fuse the 3 MatMuls into 1 packed MatMul if possible - Fuse the 3 Adds into 1 packed Add if possible
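The packed-MatMul fusion listed above can be illustrated numerically. This is a minimal numpy sketch with toy shapes (not the actual fusion code) showing that three separate projections equal one MatMul over concatenated weights followed by a split:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
x = rng.standard_normal((2, hidden))  # toy activations: batch of 2 tokens
wq, wk, wv = (rng.standard_normal((hidden, hidden)) for _ in range(3))

# Three separate MatMuls...
q, k, v = x @ wq, x @ wk, x @ wv

# ...equal one packed MatMul over the concatenated weights, then a split.
packed = np.concatenate([wq, wk, wv], axis=1)
q2, k2, v2 = np.split(x @ packed, 3, axis=1)

assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

The packed form replaces three kernel launches with one larger, better-utilized one; the same reasoning applies to fusing the three bias Adds into one packed Add.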
Set a fixed seed for DynamicQuantizeMatMul tests to avoid pipeline failures from marginal mismatches.
Support LRN NHWC in the CUDA EP. ### Motivation and Context Add support for all NHWC ops to avoid NHWC/NCHW layout transformations
### Description <!-- Describe your changes. --> add new API KernelContext_GetScratchBuffer to get scratch buffer from kernel context ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> add new API KernelContext_GetScratchBuffer to get scratch buffer from kernel context which will be used in ORT extension project for GroupQueryAttention custom op
…19903) ### Description Copies the `QNN_HOME/lib/hexagon-v73/unsigned/libqnnhtpv73.cat` file from QNN SDK to the unittest build directory. This is necessary in order to be able to load the `libQnnHtpV73Skel.so` file on Windows for modern versions of QNN SDK. ### Motivation and Context A [digitally-signed catalog file](https://learn.microsoft.com/en-us/windows-hardware/drivers/install/catalog-files) (.cat) can be used as a digital signature for an arbitrary collection of files.
Building onnxruntime ROCm EP with --enable_nccl --use_mpi fails due to inclusion of MOE source files but MOE is not supported. The error observed is `error: contrib_ops/rocm/moe/ft_moe/moe_kernel.h: No such file or directory` The fix is to exclude collective/sharded_moe.* files when nccl is requested.
### Description <!-- Describe your changes. --> Update to .net8. Didn't want to build with the latest VS2022 using net6 (which was EOL last year). ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
### Description Refine logging for the execution plan printout. Printing only the NodeIndex is not enough for us to debug the execution order. Keep the original behaviour for ORT_MINIMAL_BUILD builds in case of any CPU memory concerns. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
### Description <!-- Describe your changes. --> Add 2 C API for ORT extension: - KernelInfo_GetAllocator - OrtCustomOp::GetMayInplace ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Add 2 C API for ORT extension project, which will leverage these 2 APIs for GroupQueryAttention custom op.
### Description Add a patch for Windows ARM64EC ### Motivation and Context Will need more changes in onnxruntime/core/common/cpuid_arch_definition.h and onnxruntime/core/common/cpuid_info.cc