Cherry-picks for 1.16.1 release (#17741)
Cherry-pick the following PRs to the release branch:

Fix: Fail to skip disabledmodel in winml (#17728) 
Move dotnet build and test into docker in Linux CPU CI (#17417) 
Run Nuget_Test_Linux_GPU in container (#17452) 
Run Final_Jar_Testing_Linux_GPU in docker (#17533) 
TreeEnsemble speed up (#17449) 
Remove onnxruntime extensions from list of gitmodules (#17615) 
Include onnxruntime_float16.h in the package. (#17637) 
Fix static quantization for QDQ and Percentile distribution (#17649) 
[TensorRT EP] Back out the PerThreadContext (#17690) 
Update nodejs to 18.x (#17657) 
Update linux-wasm-ci.yml: remove the ln command (#17735)
snnn authored Oct 2, 2023
1 parent e7a0495 commit 264a740
Showing 79 changed files with 14,781 additions and 635 deletions.
3 changes: 0 additions & 3 deletions .gitmodules
@@ -8,6 +8,3 @@
 	path = cmake/external/emsdk
 	url = https://github.com/emscripten-core/emsdk.git
 	branch = 3.1.44
-[submodule "cmake/external/onnxruntime-extensions"]
-	path = cmake/external/onnxruntime-extensions
-	url = https://github.com/microsoft/onnxruntime-extensions.git
2 changes: 1 addition & 1 deletion VERSION_NUMBER
@@ -1 +1 @@
-1.16.0
+1.16.1
5 changes: 5 additions & 0 deletions docs/python/README.rst
@@ -8,6 +8,11 @@ For more information on ONNX Runtime, please see `aka.ms/onnxruntime <https://ak
 Changes
 -------
 
+1.16.1
+^^^^^^
+
+Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.16.1
+
 1.16.0
 ^^^^^^
 
2 changes: 1 addition & 1 deletion js/common/lib/version.ts
@@ -4,4 +4,4 @@
 // This file is generated by /js/scripts/update-version.ts
 // Do not modify file content manually.
 
-export const version = '1.16.0';
+export const version = '1.16.1';
4 changes: 2 additions & 2 deletions js/common/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion js/common/package.json
@@ -2,7 +2,7 @@
   "license": "MIT",
   "type": "module",
   "name": "onnxruntime-common",
-  "version": "1.16.0",
+  "version": "1.16.1",
   "repository": {
     "url": "https://github.com/Microsoft/onnxruntime.git",
     "type": "git"
2 changes: 1 addition & 1 deletion js/node/lib/version.ts
@@ -4,4 +4,4 @@
 // This file is generated by /js/scripts/update-version.ts
 // Do not modify file content manually.
 
-export const version = '1.16.0';
+export const version = '1.16.1';
6 changes: 3 additions & 3 deletions js/node/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion js/node/package.json
@@ -13,7 +13,7 @@
       3
     ]
   },
-  "version": "1.16.0",
+  "version": "1.16.1",
   "dependencies": {
     "onnxruntime-common": "file:../common"
   },
4 changes: 3 additions & 1 deletion js/react_native/lib/backend.ts
@@ -66,12 +66,14 @@ class OnnxruntimeSessionHandler implements SessionHandler {
       let results: Binding.ModelLoadInfoType;
       // load a model
       if (typeof this.#pathOrBuffer === 'string') {
+        // load model from model path
         results = await this.#inferenceSession.loadModel(normalizePath(this.#pathOrBuffer), options);
       } else {
+        // load model from buffer
         if (!this.#inferenceSession.loadModelFromBlob) {
           throw new Error('Native module method "loadModelFromBlob" is not defined');
         }
-        const modelBlob = jsiHelper.storeArrayBuffer(this.#pathOrBuffer);
+        const modelBlob = jsiHelper.storeArrayBuffer(this.#pathOrBuffer.buffer);
         results = await this.#inferenceSession.loadModelFromBlob(modelBlob, options);
       }
       // resolve promise if onnxruntime session is successfully created
2 changes: 1 addition & 1 deletion js/react_native/lib/version.ts
@@ -4,4 +4,4 @@
 // This file is generated by /js/scripts/update-version.ts
 // Do not modify file content manually.
 
-export const version = '1.16.0';
+export const version = '1.16.1';
2 changes: 1 addition & 1 deletion js/react_native/package.json
@@ -36,7 +36,7 @@
     "registry": "https://registry.npmjs.org/"
   },
   "source": "lib/index",
-  "version": "1.16.0",
+  "version": "1.16.1",
   "main": "dist/commonjs/index",
   "homepage": "https://github.com/microsoft/onnxruntime/blob/main/js/react_native/README.md",
   "files": [
2 changes: 1 addition & 1 deletion js/react_native/yarn.lock
@@ -5188,7 +5188,7 @@ onetime@^5.1.0, onetime@^5.1.2:
     mimic-fn "^2.1.0"
 
 "onnxruntime-common@file:../common":
-  version "1.16.0"
+  version "1.16.1"
 
 open@^6.2.0:
   version "6.4.0"
2 changes: 1 addition & 1 deletion js/web/lib/version.ts
@@ -4,4 +4,4 @@
 // This file is generated by /js/scripts/update-version.ts
 // Do not modify file content manually.
 
-export const version = '1.16.0';
+export const version = '1.16.1';
6 changes: 3 additions & 3 deletions js/web/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion js/web/package.json
@@ -8,7 +8,7 @@
     "type": "git"
   },
   "author": "fs-eire",
-  "version": "1.16.0",
+  "version": "1.16.1",
   "jsdelivr": "dist/ort.min.js",
   "dependencies": {
     "flatbuffers": "^1.12.0",
2 changes: 1 addition & 1 deletion onnxruntime/__init__.py
@@ -7,7 +7,7 @@
 For more information on ONNX Runtime, please see `aka.ms/onnxruntime <https://aka.ms/onnxruntime/>`_
 or the `Github project <https://github.com/microsoft/onnxruntime/>`_.
 """
-__version__ = "1.16.0"
+__version__ = "1.16.1"
 __author__ = "Microsoft"
 
 # we need to do device version validation (for example to check Cuda version for an onnxruntime-training package).
42 changes: 22 additions & 20 deletions onnxruntime/contrib_ops/cuda/bert/attention.cc
@@ -140,27 +140,29 @@ Status Attention<T>::ComputeInternal(OpKernelContext* context) const {
 #endif
 
   if (!use_flash_attention) {
-    if (is_unidirectional_ && enable_fused_causal_attention_) { // GPT
-      // GPT fused kernels requires left side padding. mask can be:
-      // none (no padding), 1D sequence lengths or 2d mask.
-      // Fused kernels don't support different sequence lengths of q and kv, so only apply to the first token
-      // where past state is empty.
-      bool is_mask_2d_key_padding = parameters.mask_type == AttentionMaskType::MASK_2D_KEY_PADDING;
-      bool use_causal_fused_runner = (nullptr == mask_index || is_mask_1d_seq_len || is_mask_2d_key_padding) &&
-                                     nullptr == relative_position_bias &&
-                                     parameters.past_sequence_length == 0 &&
-                                     parameters.hidden_size == parameters.v_hidden_size &&
-                                     FusedMHARunnerFP16v2::is_supported(sm, parameters.head_size, sequence_length,
-                                                                        enable_trt_flash_attention_, true);
-      if (use_causal_fused_runner) {
-        // Here we assume that num_heads, head_size and is_unidirectional does not change for an Attention node.
-        if (nullptr == fused_fp16_runner_.get()) {
-          fused_fp16_runner_ = FusedMHARunnerFP16v2::Create(num_heads_, parameters.head_size, sm, is_unidirectional_,
-                                                            enable_trt_flash_attention_, parameters.scale);
-        }
-
-        // Here we assume all causal kernels can be loaded into shared memory. TODO: add a function to check.
-        fused_runner = fused_fp16_runner_.get();
-      }
+    if (is_unidirectional_) { // GPT
+      if (enable_fused_causal_attention_) {
+        // GPT fused kernels requires left side padding. mask can be:
+        // none (no padding), 1D sequence lengths or 2d mask.
+        // Fused kernels don't support different sequence lengths of q and kv, so only apply to the first token
+        // where past state is empty.
+        bool is_mask_2d_key_padding = parameters.mask_type == AttentionMaskType::MASK_2D_KEY_PADDING;
+        bool use_causal_fused_runner = (nullptr == mask_index || is_mask_1d_seq_len || is_mask_2d_key_padding) &&
+                                       nullptr == relative_position_bias &&
+                                       parameters.past_sequence_length == 0 &&
+                                       parameters.hidden_size == parameters.v_hidden_size &&
+                                       FusedMHARunnerFP16v2::is_supported(sm, parameters.head_size, sequence_length,
+                                                                          enable_trt_flash_attention_, true);
+        if (use_causal_fused_runner) {
+          // Here we assume that num_heads, head_size and is_unidirectional does not change for an Attention node.
+          if (nullptr == fused_fp16_runner_.get()) {
+            fused_fp16_runner_ = FusedMHARunnerFP16v2::Create(num_heads_, parameters.head_size, sm, is_unidirectional_,
+                                                              enable_trt_flash_attention_, parameters.scale);
+          }
+
+          // Here we assume all causal kernels can be loaded into shared memory. TODO: add a function to check.
+          fused_runner = fused_fp16_runner_.get();
+        }
+      }
     } else { // BERT
       bool use_fused_runner = !disable_fused_self_attention_ &&
49 changes: 28 additions & 21 deletions onnxruntime/core/providers/cpu/ml/tree_ensemble_aggregator.h
@@ -64,31 +64,38 @@ enum MissingTrack : uint8_t {
   kFalse = 0
 };
 
+template <typename T>
+struct TreeNodeElement;
+
+template <typename T>
+union PtrOrWeight {
+  TreeNodeElement<T>* ptr;
+  struct WeightData {
+    int32_t weight;
+    int32_t n_weights;
+  } weight_data;
+};
+
 template <typename T>
 struct TreeNodeElement {
   int feature_id;
 
   // Stores the node threshold or the weights if the tree has one target.
   T value_or_unique_weight;
 
-  // onnx specification says hitrates is used to store information about the node,
+  // The onnx specification says hitrates is used to store information about the node,
   // but this information is not used for inference.
   // T hitrates;
 
-  // True node, false node are obtained by computing `this + truenode_inc_or_first_weight`,
-  // `this + falsenode_inc_or_n_weights` if the node is not a leaf.
-  // In case of a leaf, these attributes are used to indicate the position of the weight
-  // in array `TreeEnsembleCommon::weights_`. If the number of targets or classes is one,
-  // the weight is also stored in `value_or_unique_weight`.
-  // This implementation assumes a tree has less than 2^31 nodes,
-  // and the total number of leave in the set of trees is below 2^31.
-  // A node cannot point to itself.
-  int32_t truenode_inc_or_first_weight;
-  // In case of a leaf, the following attribute indicates the number of weights
-  // in array `TreeEnsembleCommon::weights_`. If not a leaf, it indicates
-  // `this + falsenode_inc_or_n_weights` is the false node.
-  // A node cannot point to itself.
-  int32_t falsenode_inc_or_n_weights;
+  // PtrOrWeight acts as a tagged union, with the "tag" being whether the node is a leaf or not (see `is_not_leaf`).
+
+  // If it is not a leaf, it is a pointer to the true child node when traversing the decision tree. The false branch is
+  // always 1 position away from the TreeNodeElement in practice in `TreeEnsembleCommon::nodes_` so it is not stored.
+
+  // If it is a leaf, it contains `weight` and `n_weights` attributes which are used to indicate the position of the
+  // weight in array `TreeEnsembleCommon::weights_`. If the number of targets or classes is one, the weight is also
+  // stored in `value_or_unique_weight`.
+  PtrOrWeight<T> truenode_or_weight;
   uint8_t flags;
 
   inline NODE_MODE mode() const { return NODE_MODE(flags & 0xF); }
@@ -189,8 +196,8 @@ class TreeAggregatorSum : public TreeAggregator<InputType, ThresholdType, Output
   void ProcessTreeNodePrediction(InlinedVector<ScoreValue<ThresholdType>>& predictions,
                                  const TreeNodeElement<ThresholdType>& root,
                                  gsl::span<const SparseValue<ThresholdType>> weights) const {
-    auto it = weights.begin() + root.truenode_inc_or_first_weight;
-    for (int32_t i = 0; i < root.falsenode_inc_or_n_weights; ++i, ++it) {
+    auto it = weights.begin() + root.truenode_or_weight.weight_data.weight;
+    for (int32_t i = 0; i < root.truenode_or_weight.weight_data.n_weights; ++i, ++it) {
       ORT_ENFORCE(it->i < (int64_t)predictions.size());
       predictions[onnxruntime::narrow<size_t>(it->i)].score += it->value;
       predictions[onnxruntime::narrow<size_t>(it->i)].has_score = 1;
@@ -292,8 +299,8 @@ class TreeAggregatorMin : public TreeAggregator<InputType, ThresholdType, Output
   void ProcessTreeNodePrediction(InlinedVector<ScoreValue<ThresholdType>>& predictions,
                                  const TreeNodeElement<ThresholdType>& root,
                                  gsl::span<const SparseValue<ThresholdType>> weights) const {
-    auto it = weights.begin() + root.truenode_inc_or_first_weight;
-    for (int32_t i = 0; i < root.falsenode_inc_or_n_weights; ++i, ++it) {
+    auto it = weights.begin() + root.truenode_or_weight.weight_data.weight;
+    for (int32_t i = 0; i < root.truenode_or_weight.weight_data.n_weights; ++i, ++it) {
       predictions[onnxruntime::narrow<size_t>(it->i)].score =
           (!predictions[onnxruntime::narrow<size_t>(it->i)].has_score || it->value < predictions[onnxruntime::narrow<size_t>(it->i)].score)
               ? it->value
@@ -349,8 +356,8 @@ class TreeAggregatorMax : public TreeAggregator<InputType, ThresholdType, Output
   void ProcessTreeNodePrediction(InlinedVector<ScoreValue<ThresholdType>>& predictions,
                                  const TreeNodeElement<ThresholdType>& root,
                                  gsl::span<const SparseValue<ThresholdType>> weights) const {
-    auto it = weights.begin() + root.truenode_inc_or_first_weight;
-    for (int32_t i = 0; i < root.falsenode_inc_or_n_weights; ++i, ++it) {
+    auto it = weights.begin() + root.truenode_or_weight.weight_data.weight;
+    for (int32_t i = 0; i < root.truenode_or_weight.weight_data.n_weights; ++i, ++it) {
      predictions[onnxruntime::narrow<size_t>(it->i)].score =
           (!predictions[onnxruntime::narrow<size_t>(it->i)].has_score || it->value > predictions[onnxruntime::narrow<size_t>(it->i)].score)
               ? it->value
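The switch from two int32 offsets to PtrOrWeight is the heart of the TreeEnsemble speed up (#17449): an interior node now stores a direct pointer to its true child, and a leaf reuses the same storage for its weight range. Below is a minimal, self-contained C++ sketch of the traversal pattern this enables. Only `PtrOrWeight` and the leaf/weight convention are taken from the diff above; `Node`, `SumLeafWeights`, and `main` are simplified stand-ins (a single shared feature value instead of a per-node `feature_id` lookup, a `bool` instead of the real `flags`/`is_not_leaf()` check), not ONNX Runtime code.

#include <cstdint>
#include <iostream>
#include <vector>

template <typename T>
struct Node;

// Mirrors PtrOrWeight from the diff: a single union slot holds either the true
// child (interior node) or the first-weight index and weight count (leaf).
template <typename T>
union PtrOrWeight {
  Node<T>* ptr;
  struct WeightData {
    int32_t weight;
    int32_t n_weights;
  } weight_data;
};

template <typename T>
struct Node {
  bool is_leaf;  // stand-in for the real flags/is_not_leaf() check
  T threshold;   // value_or_unique_weight in the real TreeNodeElement
  PtrOrWeight<T> truenode_or_weight;
};

// Follow the stored pointer on the true branch; the false child is the next
// element in the flat node array, so it needs no stored pointer at all.
template <typename T>
T SumLeafWeights(const Node<T>* node, T feature, const std::vector<T>& weights) {
  while (!node->is_leaf) {
    node = (feature <= node->threshold) ? node->truenode_or_weight.ptr : node + 1;
  }
  // At a leaf, the union holds [first index, count] into the weights array.
  T sum = 0;
  auto it = weights.begin() + node->truenode_or_weight.weight_data.weight;
  for (int32_t i = 0; i < node->truenode_or_weight.weight_data.n_weights; ++i, ++it) {
    sum += *it;
  }
  return sum;
}

int main() {
  // Flat layout [root, false-leaf, true-leaf]: the root's false child is root + 1.
  std::vector<float> weights = {0.25f, 0.75f, -1.0f};
  std::vector<Node<float>> nodes(3);
  nodes[0].is_leaf = false;
  nodes[0].threshold = 0.5f;
  nodes[0].truenode_or_weight.ptr = &nodes[2];
  nodes[1].is_leaf = true;
  nodes[1].truenode_or_weight.weight_data = {0, 2};  // weights[0] and weights[1]
  nodes[2].is_leaf = true;
  nodes[2].truenode_or_weight.weight_data = {2, 1};  // weights[2]
  std::cout << SumLeafWeights(&nodes[0], 0.3f, weights) << "\n";  // -1 (true branch)
  std::cout << SumLeafWeights(&nodes[0], 0.9f, weights) << "\n";  // 1 (false branch)
  return 0;
}

Replacing the old `this + truenode_inc_or_first_weight` offset arithmetic with a stored pointer, and letting the false branch be the adjacent array element, removes an addition and a field read from the hot traversal loop, which is presumably where much of the measured speed-up comes from.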
