forked from tensorflow/tensorflow
[pull] master from tensorflow:master #238
Open
pull
wants to merge
1,124
commits into
GesuBackups:master
from
tensorflow:master
+106,575
−67,163
Conversation
Imported from GitHub PR openxla/xla#19426. After this change to the test inputs (openxla/xla@b10653f), the "too many blocks" exception is no longer triggered (the shape is not big enough). Due to the low importance of the test, it was decided to disable it. Copybara import of the project: -- ee36ca0f0c0dddb8db724327af78f0b59de3903b by Milica Makevic <[email protected]>: Disable gpu_too_many_blocks_test for rocm. Merging this change closes #19426. PiperOrigin-RevId: 697974812
PiperOrigin-RevId: 697988497
Change existing PyTorch composites to unify the upsample-bilinear composites from JAX and PyTorch. PiperOrigin-RevId: 698012187
Updates LLVM usage to match [b03a747fc0fc](llvm/llvm-project@b03a747fc0fc) PiperOrigin-RevId: 698047676
For Linux platforms, the build rule generates the `auditwheel show` log in the output regardless of the compliance-check flag. PiperOrigin-RevId: 698048983
PiperOrigin-RevId: 698049836
…e ALL broadcast-like inputs on TFLite ops that support implicit broadcasting PiperOrigin-RevId: 698054216
* Change the default QNN graph config to use the HTP FP16 precision backend config; this is required to correctly compile FP32 ops. * Create a 1-element 1D tensor out of a scalar value, since QNN ops always use a ranked tensor type as input. PiperOrigin-RevId: 698081261
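The scalar-wrapping step can be sketched as follows. This is a minimal illustration in which tensors are modeled as (shape, flat_data) pairs and `to_ranked_tensor` is a hypothetical helper, not the actual QNN API:

```python
def to_ranked_tensor(value):
    """Wrap a scalar into a 1-element rank-1 tensor.

    Tensors are modeled here as (shape, flat_data) pairs; this is a
    hypothetical helper, not the actual QNN API.
    """
    if isinstance(value, (int, float)):
        return ([1], [value])          # scalar -> ranked 1-element tensor
    shape, data = value
    return (shape, data)               # already ranked: pass through

print(to_ranked_tensor(3.5))  # ([1], [3.5])
```

Always emitting a ranked shape (here `[1]`) means downstream code never has to special-case rank-0 inputs.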
… the .td definition. PiperOrigin-RevId: 698082807
* Add FC op legalization and test data. * Add Select/Select_v2 op legalization. * Misc cleanups. PiperOrigin-RevId: 698094953
PiperOrigin-RevId: 698111562
…rializing any modules. Also pulled the deserialization a little further up the stack and only do it if the input doesn't already have a full module op. PiperOrigin-RevId: 698116466
PiperOrigin-RevId: 698121993
PiperOrigin-RevId: 698132925
PiperOrigin-RevId: 698133024
… Todo (resolved) PiperOrigin-RevId: 698133747
This fixes an issue with gsutil, which expects Python 3.5-3.11:

```
Error: gsutil requires Python version 2.7 or 3.5-3.11, but a different version is installed.
```

PiperOrigin-RevId: 698134102
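The version range in that error message can be sketched as a small predicate; this only illustrates the documented range, it is not gsutil's actual check:

```python
import sys

def gsutil_supported(version_info=sys.version_info):
    """True iff the interpreter is Python 2.7 or 3.5-3.11, the range
    described in the gsutil error message above."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) == (2, 7):
        return True
    return major == 3 and 5 <= minor <= 11

print(gsutil_supported((3, 12, 0)))  # False
```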
PiperOrigin-RevId: 698137647
PiperOrigin-RevId: 698150097
PiperOrigin-RevId: 698163185
…. This Extend() call would also lead to a memory-assignment issue, since it wasn't accompanied by the necessary chunk commit requests. We also add a VerifyAllocations() function that uses a BufferIntervalTree to check for overlapping allocations before scheduling the asynchronous copies. This is an extra check for the correctness of MsaAlgorithm allocations, and is only applied if options_.verify is enabled in the MSA options (it is disabled by default). PiperOrigin-RevId: 698164396
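The overlap check can be sketched as follows. This is a minimal sort-and-sweep illustration over hypothetical (start, end, name) intervals; the real MSA verification uses a BufferIntervalTree over allocated chunks, which this does not reproduce:

```python
def find_overlaps(allocations):
    """Return pairs of allocations whose [start, end) intervals overlap.

    `allocations` is an iterable of hypothetical (start, end, name)
    tuples; a sort-and-sweep stands in for the BufferIntervalTree query.
    """
    overlaps = []
    active = []  # (end, name) of allocations still live at the sweep point
    for start, end, name in sorted(allocations):
        active = [(e, n) for e, n in active if e > start]  # expire finished
        for _, other in active:
            overlaps.append((other, name))                 # still live: overlap
        active.append((end, name))
    return overlaps

allocs = [(0, 10, "a"), (5, 15, "b"), (20, 30, "c")]
print(find_overlaps(allocs))  # [('a', 'b')]
```

A verifier like this would flag any nonempty result before the asynchronous copies are scheduled.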
PiperOrigin-RevId: 698164750
PiperOrigin-RevId: 698164921
This change adds the legalization pass from IFRT to VIFRT. Legalization uses a templated OpConversion class, which is refined via the `IFRT` <-> `VIFRT` and `mlir::Func::*` <-> `VIFRT` op mappings defined in `map_ifrt_to_vifrt.h`. The change also versions `mlir::func::FuncOp`, `mlir::func::ReturnOp`, and `mlir::func::CallOp`, which provides two advantages: 1) we can use the templated OpConversion class rather than implementing a separate converter for each op, and 2) we can restrict the surface of possible breaking changes to just builtin types and attributes. Moreover, the change versions `mlir::FunctionType` and `mlir::TypeAttr` so that the generic op converter can be used, and so that the types allowed in functions are restricted to just builtin and IFRT types. PiperOrigin-RevId: 698168526
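The idea of one generic converter driven by a mapping table, rather than a hand-written converter per op, can be sketched as follows. The mapping entries and names here are hypothetical stand-ins, not the real contents of `map_ifrt_to_vifrt.h`:

```python
# Hypothetical op mapping, standing in for the IFRT <-> VIFRT and
# mlir::func::* <-> VIFRT tables defined in map_ifrt_to_vifrt.h.
IFRT_TO_VIFRT = {
    "func.func": "vifrt.FuncV1",
    "func.return": "vifrt.ReturnV1",
    "func.call": "vifrt.CallV1",
}

def legalize(op_name, mapping=IFRT_TO_VIFRT):
    """One table-driven converter covering every mapped op, instead of a
    separate converter class per op."""
    if op_name not in mapping:
        raise ValueError(f"no VIFRT equivalent for {op_name}")
    return mapping[op_name]

print(legalize("func.return"))  # vifrt.ReturnV1
```

Because all conversions flow through one function, adding support for a new op only means adding a table entry.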
Also fixed invalid C++ header usage. PiperOrigin-RevId: 698170878
…td definition. PiperOrigin-RevId: 698171237
PiperOrigin-RevId: 698174417
The next step is to migrate NcclComm and NcclOwnedComm to std::unique_ptr<Communicator> and proper virtual inheritance. PiperOrigin-RevId: 699233544
…filer_test This was originally proposed in openxla/xla#16102, but I still ran into an issue where it failed by a slight margin:

```
Expected: (profiler.MeasureClockCyclesPerOp(HloOpcode::kDivide, F64) .value() .clock_cycles()) > (300), actual: 296 vs 300
```

That said, I ran 1000 tests and did not encounter this issue. Reducing the threshold to 280, since the bound seems very tight and a flaky test is no good either way. PiperOrigin-RevId: 699233864
… for XLA:CPU PiperOrigin-RevId: 699234540
PiperOrigin-RevId: 699235057
Also fixes a few missing includes. Uses C++ includes instead of C ones. PiperOrigin-RevId: 699237969
PiperOrigin-RevId: 699238045
Define APIs for compiling LLVM modules to functions required by the XLA:CPU runtime: kernels, comparators, etc. Implementation largely exists as SimpleOrcJit in service/cpu, but it's tightly coupled with "legacy" XLA. PiperOrigin-RevId: 699239722
PiperOrigin-RevId: 699242142
PiperOrigin-RevId: 699247019
* De-dupe logic in test common and model_buffer. * Factor the flatbuffer model wrapper out of the class in test common and move it to flatbuffer_tools. * Add some extra helpers for flatbuffers in flatbuffer_tools, and add a test. * Hide all usage of `std::filesystem` in one .cc file; technically `<filesystem>` is an unapproved header. * Update model_load to use the flatbuffer tools. * Pull some of the member functions of "model unpacker" out into non-member functions. PiperOrigin-RevId: 699249089
…utions Performance is comparable to the synchronous version. Detailed results ('old' is the synchronous execution, 'new' is async execution; both use the same custom algorithm for transposed conv):

name                                                      old cpu/op   new cpu/op   delta
BM_Conv1DStrided/process_time                             29.4ms ± 6%  29.7ms ± 5%  ~       (p=0.841 n=5+5)
BM_Conv1DTransposedStrided/process_time                   29.6ms ± 2%  30.7ms ± 2%  +3.52%  (p=0.008 n=5+5)
BM_Conv1DTransposedStridedNonDefaultLayout/process_time   28.5ms ± 3%  28.3ms ± 1%  ~       (p=0.222 n=5+5)

name                                                      old time/op  new time/op  delta
BM_Conv1DStrided/process_time                             2.68ms ± 7%  2.72ms ± 5%  ~       (p=0.548 n=5+5)
BM_Conv1DTransposedStrided/process_time                   7.91ms ± 3%  7.98ms ± 5%  ~       (p=0.548 n=5+5)
BM_Conv1DTransposedStridedNonDefaultLayout/process_time   7.00ms ± 2%  7.32ms ± 4%  +4.58%  (p=0.016 n=5+5)

PiperOrigin-RevId: 699250549
Updates LLVM usage to match [556ea5265a25](llvm/llvm-project@556ea5265a25) PiperOrigin-RevId: 699251575
NCCL implementation details will have private visibility, and for all external users (Thunks etc.) we'll export them via a public header that uses the xla/core/collectives APIs. PiperOrigin-RevId: 699256314
…ator, encoded as follows:

```
_TENSOR_V1_<name>: {
  TENSOR_SHAPE: Vector<i64>,
  TENSOR_TYPE: tflite::TensorType (casted to i64),
  TENSOR_DATA: Vector<f32> or Vector<i64>
}
```

PiperOrigin-RevId: 699272982
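That encoding can be sketched as follows. `encode_tensor` is a hypothetical helper, and the integer passed for the tensor type is a stand-in for a `tflite::TensorType` enum value cast to i64; only the key names follow the message:

```python
def encode_tensor(name, shape, tensor_type, data):
    """Pack one tensor into the flat-map layout described above.

    Hypothetical helper: key names follow the commit message, and
    `tensor_type` is an integer stand-in for tflite::TensorType as i64.
    """
    return {
        f"_TENSOR_V1_{name}": {
            "TENSOR_SHAPE": [int(d) for d in shape],  # Vector<i64>
            "TENSOR_TYPE": int(tensor_type),          # enum cast to i64
            "TENSOR_DATA": list(data),                # Vector<f32> or Vector<i64>
        }
    }

entry = encode_tensor("weights", (2, 2), 1, [0.1, 0.2, 0.3, 0.4])
print(sorted(entry["_TENSOR_V1_weights"]))
# ['TENSOR_DATA', 'TENSOR_SHAPE', 'TENSOR_TYPE']
```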
The pass runs over a VIFRT module, and tries to convert it to a given target version. PiperOrigin-RevId: 699279298
PiperOrigin-RevId: 699279921
PiperOrigin-RevId: 699286343
PiperOrigin-RevId: 699309235
…annot be null PiperOrigin-RevId: 699310290
…ying the naming. PiperOrigin-RevId: 699317601
StreamExecutorGpuClient topology description as well. PiperOrigin-RevId: 699320139
nullptr is handled here. PiperOrigin-RevId: 699323007
Also: * Add some helper functions for checking that a litert op matches a tfl op, which can also be re-used in other contexts. * Add some quantization-related helper functions to flatbuffer_tools. * Update dump for quantization. * Move things around a bit and add quantization stuff to the model_util support checks. PiperOrigin-RevId: 699333588
PiperOrigin-RevId: 699337598
StreamExecutorGpuTopologyDescription rather than parsing it for every compile. PiperOrigin-RevId: 699344815
… with array output and multiple users. It may trigger a compilation error, such as in the added test target. PiperOrigin-RevId: 699357851
PiperOrigin-RevId: 699361885
PiperOrigin-RevId: 699397857
PiperOrigin-RevId: 699409569
PiperOrigin-RevId: 699467519
See Commits and Changes for more details.
Created by pull[bot]