[pull] master from tensorflow:master #238
base: master
Commits on Nov 19, 2024
-
Replace MockGpuExecutor with MockStreamExecutor in the only use.
PiperOrigin-RevId: 698049836
Commit 9aba913
-
Refactor and repurpose the existing `fold_broadcast_to_pass` to handle ALL broadcast-like inputs on TFLite ops that support implicit broadcasting
PiperOrigin-RevId: 698054216
Commit 9b76733
-
Improve QC compiler plugin configurations.
* Change the default QNN graph config to use the HTP FP16 precision backend config; this is required to correctly compile FP32 ops.
* Create a 1-element 1D tensor out of a scalar value, since QNN ops always use a ranked tensor type as input.
PiperOrigin-RevId: 698081261
Commit 8a87e42
-
Migrate LegalizeTensorListPass to new TFL::Pass mechanism and remove the .td definition.
PiperOrigin-RevId: 698082807
Commit f0b88da
-
Add two simple legalizations and cleanup.
* Add FC Op legalization and test data.
* Add Select/Select_v2 Op legalization.
* Misc cleanups.
PiperOrigin-RevId: 698094953
Commit 9147aa3
-
PiperOrigin-RevId: 698111562
Commit 8916a40
-
Add a moduleop to the MlirToHloArgs and enable compilation without serializing any modules.
Also pulled the deserialization a little further up the stack; it is now done only if the input doesn't already have a full module op.
PiperOrigin-RevId: 698116466
Commit aa88ff7
-
Move stable hlo compile test to XLA:CPU public API
PiperOrigin-RevId: 698121993
Commit 7bd28c8
-
Add test cases for QC compiler plugin.
PiperOrigin-RevId: 698132925
Commit ac16d7f
-
Remove unneeded xla:statusor dependency.
PiperOrigin-RevId: 698133024
Commit 0fd96fa
-
[Upkeep][XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
PiperOrigin-RevId: 698133747
Commit 509b91c
-
Update the default Python version to 3.11
This fixes an issue with gsutil, which expects Python 3.5-3.11:
```
Error: gsutil requires Python version 2.7 or 3.5-3.11, but a different version is installed.
```
PiperOrigin-RevId: 698134102
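The version constraint quoted in the gsutil error can be checked up front. The following is a minimal sketch, not part of the commit: the helper name `gsutil_accepts` is hypothetical, and the accepted range is taken only from the error text above.

```python
import sys


def gsutil_accepts(version):
    """Return True if (major, minor) is in the range quoted by the gsutil error.

    The range (2.7, or 3.5 through 3.11) comes from the error message only;
    it is an assumption that this matches every gsutil release.
    """
    major_minor = tuple(version[:2])
    return major_minor == (2, 7) or (3, 5) <= major_minor <= (3, 11)


if __name__ == "__main__":
    # Report whether the running interpreter would pass the check.
    print(gsutil_accepts(sys.version_info))
```

Running the script under the interpreter that a CI image uses by default shows whether that image would hit the error.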
Commit 622ea93
-
Fold FillOp into TFL Ops that support implicit broadcasting.
PiperOrigin-RevId: 698137647
Commit d040ef2
-
Add a few more MLIR-based test models
PiperOrigin-RevId: 698150097
Commit 09a9566
-
Add backend kwargs to xla tests.
PiperOrigin-RevId: 698163185
Commit feeb338
-
[XLA:MSA] Remove unnecessary Extend() call in memory space assignment.
This Extend() call would also lead to a memory assignment issue since it wasn't accompanied by the necessary chunk commit requests.
We also add a VerifyAllocations() function that uses a BufferIntervalTree to check for overlapping Allocations before scheduling the asynchronous copies. This is an extra check for the correctness of MsaAlgorithm allocations, and is only applied if options_.verify is enabled in MSA options. options_.verify is disabled by default.
PiperOrigin-RevId: 698164396
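The kind of overlap check described here can be illustrated as follows. This is a sketch under stated assumptions, not the XLA implementation: the `Allocation` fields are hypothetical, and a naive pairwise scan stands in for the BufferIntervalTree that the commit actually uses.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Allocation:
    # Hypothetical fields: a live range in schedule time and a chunk in memory.
    name: str
    start_time: int
    end_time: int  # inclusive
    offset: int
    size: int


def overlaps(a, b):
    """Two allocations conflict when both their live ranges and chunks intersect."""
    time_overlap = a.start_time <= b.end_time and b.start_time <= a.end_time
    chunk_overlap = a.offset < b.offset + b.size and b.offset < a.offset + a.size
    return time_overlap and chunk_overlap


def find_conflicts(allocations):
    """Naive O(n^2) scan; an interval tree avoids comparing every pair."""
    return [(a.name, b.name) for a, b in combinations(allocations, 2) if overlaps(a, b)]
```

An empty result from `find_conflicts` corresponds to a passing verification; any returned pair names two allocations that occupy the same bytes at the same time.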
Commit 37fa2bb
-
Add quantized OP in test data.
PiperOrigin-RevId: 698164750
Commit 41d42d8
-
Move some tests to public XLA:CPU API
PiperOrigin-RevId: 698164921
Commit 9df12f0
-
[IFRT] Legalize IFRT dialect into VIFRT dialect.
This change adds the legalization pass from IFRT to VIFRT. Legalization uses a templated OpConversion class, which is refined via the `IFRT` <-> `VIFRT` and `mlir::Func::*` <-> `VIFRT` op mappings defined in `map_ifrt_to_vifrt.h`.
The change also versions `mlir::func::FuncOp`, `mlir::func::ReturnOp` and `mlir::func::CallOp` because this provides the following advantages: 1) we can use the templated OpConversion class rather than implementing a separate converter for each op, and 2) we can restrict the surface of possible breaking changes to just builtin types and attributes.
Moreover, the change versions `mlir::FunctionType` and `mlir::TypeAttr` in order to be able to use the generic Op converter, and to restrict types allowed in functions (just builtin and IFRT types).
PiperOrigin-RevId: 698168526
Commit 4f86f56
Commits on Nov 20, 2024
-
Add a test to check C header compiler compatibility
Also fixed invalid C++ header usage.
PiperOrigin-RevId: 698170878
Commit af5962f
-
Migrate WhileOutlinePass to new TFL::Pass mechanism and remove the .td definition.
PiperOrigin-RevId: 698171237
Commit ac8fea5
-
Remove unneeded use of gpu_types.h in topk_kernel_test.cc.
PiperOrigin-RevId: 698174417
Commit dcf6c7a
-
PiperOrigin-RevId: 698189797
Commit 8ceaba2
-
Remove dead ShapeContainsToken in HLO verifier
PiperOrigin-RevId: 698196106
Commit caa1197
-
legalization_op_config: Delete unused IsOpLegalizedWithMlir.
PiperOrigin-RevId: 698201598
Commit 6ec3612
-
Move StableHLO test to public XLA:CPU PJRT plugin
PiperOrigin-RevId: 698212499
Commit 43227b2
-
PiperOrigin-RevId: 698218778
Commit 79dee4c
-
Add a new 'priority_merge' mixed priority batching policy.
PiperOrigin-RevId: 698221629
Commit c3ca149
-
Stop using gpu_types.h where it's not needed.
PiperOrigin-RevId: 698228306
Commit 5158891
-
[Upkeep][XLA-Code-Health] Resolve the following technical debt issue: Todo(resolved)
PiperOrigin-RevId: 698230798
Commit fcbd379
-
Cleanup. Refactor GetGatherScatterBatchParallelDims. No behavior change.
PiperOrigin-RevId: 698230884
Commit f06fbf0
-
Update ops-related pbtxt files.
PiperOrigin-RevId: 698237370
Commit d83bca1
-
Migrate UnfoldLargeSplatConstantPass to new TFL::Pass mechanism and remove the .td definition.
PiperOrigin-RevId: 698241447
Commit a559db0
-
Move compiler plugin unique ptr alias to cc api. Also use string view for bytes returned from the plugin in tests to avoid a copy.
PiperOrigin-RevId: 698251740
Commit 42fed64
-
[XLA:SPMD] Add HLO annotation to disable collective matmul in SPMD.
PiperOrigin-RevId: 698271808
Commit 3efd256
-
Update GraphDef version to 2052.
PiperOrigin-RevId: 698294876
Commit 0d82241
-
compat: Update forward compatibility horizon to 2024-11-20
PiperOrigin-RevId: 698294898
Commit e1a2b83
-
Add a pattern matcher for ragged dot HLO.
PiperOrigin-RevId: 698297679
Commit 1de2011
-
Remove custom PTX compilation pipeline from RedzoneAllocator
We have support for lowering PTX in the runtime, so we can just use `MultiKernelLoaderSpec` and we get compilation and caching for free.
PiperOrigin-RevId: 698297929
Commit 126b347
-
Account for optional channel ID in send/recv error message
PiperOrigin-RevId: 698302393
Commit 9bd61d5
-
Remove custom compilation call from DynamicSharedMemoryTest
This is not needed since the runtime can compile PTX for us. Actually I'm surprised that this even worked, because the original code compiled PTX into CUBIN and then forced the CUBIN into the PTX argument in the kernel creation helper. But this is now all fixed.
PiperOrigin-RevId: 698303129
Commit b6a2f70
-
Make g_trace_filter_bitmap atomic to avoid race across threads.
PiperOrigin-RevId: 698304950
Commit 23bda2a
-
Remove unused GpuAsmOpts parameter from Cholesky and TriangularSolveThunks
Also the usual drive-by cleanups:
- Remove unused includes
- Add explicit includes for things we depended on transitively
- Clean up dependencies of the build targets
PiperOrigin-RevId: 698317755
Commit fa7483b
-
delete hlo-legalize-to-memref-unranked.mlir
The func-bufferize pass was removed by llvm/llvm-project@e394fec
PiperOrigin-RevId: 698322658
Commit 0156f11
-
[XLA:GPU] Combine pipelined instructions as much as possible by default.
We turn on previously implemented heuristics by default.
PiperOrigin-RevId: 698324486
Commit d256bac
-
Move SparseDotMetaEncodingAttr inside xla
PiperOrigin-RevId: 698329980
Commit 43b2f90
-
PR #19363: Loop Counter Increment in Collective Pipeliner
Imported from GitHub PR openxla/xla#19363
Sets the loop iteration counter increment in the backward transformation of the collective pipeliner pass to account for cases with non-zero initial value of the loop iteration counter. See #16953 and #18568.
Copybara import of the project:
-- 06137aa0618d372e2d4badbf16920bead9922cfb by Philipp Hack: Modifies the loop counter increment set in the backward transformation of the collective pipeliner.
-- 6da45bcb26643d8994bf608f05230fa748286b02 by Philipp Hack: Modifies the loop counter increment set in the backward transformation of the collective pipeliner.
Merging this change closes #19363
PiperOrigin-RevId: 698342374
Commit 09adcb3
-
Multiple subgraphs may share the same delegate.
Dequantized static data is cached. However, when there are multiple subgraphs, the data is overwritten by each subgraph.
PiperOrigin-RevId: 698342673
Commit 4bc54e6
-
PR #19393: [GPU] Horizontal loop fusion: pass bitcasts when looking for fusion candidates.
Imported from GitHub PR openxla/xla#19393
Copybara import of the project:
-- ec107a12fbee6826f1f668218b7c7a40f5886420 by Ilia Sergachev: [GPU] Horizontal loop fusion: pass bitcasts when looking for fusion candidates.
-- 71241097ce67412246ec18efca5165619601eace by Ilia Sergachev: simplify cuDNN norm test
Merging this change closes #19393
PiperOrigin-RevId: 698357838
Commit c76ae32
-
[TritonGPU] Add DotLike trait to SparseDotOp
PiperOrigin-RevId: 698360931
Commit c36f39c
-
Use OpTrait::DotLike to identify dot-like operations
PiperOrigin-RevId: 698372450
Commit 3d74da5
-
PR #19484: [ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test
Imported from GitHub PR openxla/xla#19484
Copybara import of the project:
-- 0d307384bff386d5182f89ae5a5422f8ca1a1290 by Dragan Mladjenovic: [ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test
Merging this change closes #19484
PiperOrigin-RevId: 698374588
Commit 499a186
-
[Triton] Restrict block_m to be > 16 in the GEMM autotuner to resolve CUDA_ERROR_ILLEGAL_ADDRESS in (micro)benchmarks with FP8 Triton kernels during exhaustive autotuning.
PiperOrigin-RevId: 698387396
Commit 9ee8948
-
[XLA:ALGEBRAIC_SIMPLIFIER] Turn constant all-gather into broadcast
PiperOrigin-RevId: 698388778
Commit 48a554c
-
Remove unused and add used headers in hlo_runner_main and create_client
PiperOrigin-RevId: 698388866
Commit ed9291f
-
Merge sparsity_layout.patch into sparse_dot.patch
PiperOrigin-RevId: 698389323
Commit ad81c08
-
Prevent dequantizing/requantizing `f16` to `f32` and back.
What this change does:
1. Identifies all `kTfLiteBuiltinDequantize` nodes converting `kTfLiteFloat16` to `kTfLiteFloat32` and plugging into a `kTfLiteBuiltinFullyConnected`, `kTfLiteBuiltinConv2d`, or `kTfLiteBuiltinDepthwiseConv2d` node.
2. Re-maps XNNPACK tensors pointing to the `kTfLiteFloat32` output to point to the original `kTfLiteFloat16` input.
The `kTfLiteFloat16` weights/filters and biases are handled by XNNPACK directly.
PiperOrigin-RevId: 698395748
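The two steps can be illustrated on a toy graph. This is a sketch only, not the delegate's code: the dict-based node representation, the op-name strings, and both function names are hypothetical, and requiring every consumer to be a supported op is an assumption the commit message does not state.

```python
F16, F32 = "f16", "f32"
# Stand-ins for the three builtin ops named in the commit message.
SUPPORTED_CONSUMERS = {"FULLY_CONNECTED", "CONV_2D", "DEPTHWISE_CONV_2D"}


def find_skippable_dequantize(nodes, tensor_types):
    """Step 1: map each f32 dequantize output to its f16 input when it can be skipped."""
    remap = {}
    for node in nodes:
        if node["op"] != "DEQUANTIZE":
            continue
        src, dst = node["inputs"][0], node["outputs"][0]
        if tensor_types[src] != F16 or tensor_types[dst] != F32:
            continue
        users = [n for n in nodes if dst in n["inputs"]]
        # Assumption: skip only if all users are supported consumer ops.
        if users and all(u["op"] in SUPPORTED_CONSUMERS for u in users):
            remap[dst] = src
    return remap


def apply_remap(nodes, remap):
    """Step 2: point consumers of the f32 output at the original f16 tensor."""
    return [
        n if n["op"] == "DEQUANTIZE"
        else {**n, "inputs": [remap.get(t, t) for t in n["inputs"]]}
        for n in nodes
    ]
```

In this toy form the dequantize node is left in the list untouched; only the consumers' input ids change, mirroring the "re-map tensors" wording above.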
Commit 4fc4983
-
Cleanup. Simplify the gather/scatter related functions in hlo_sharding_util by using `PropagateShardingAlongDimsAndReplicateOthers`.
This is a no-op change.
PiperOrigin-RevId: 698403022
Commit 58b5cf4
-
Update docs to make use of new API for adding a TfLiteRegistrationExternal to a MutableOpResolver.
PiperOrigin-RevId: 698407108
Commit 86b52e2
-
PiperOrigin-RevId: 698410181
Commit f635a02
-
[XLA:GPU] Adjust GetNumWarps heuristic in Tiled Cost Model.
We need to adjust the heuristic because our emitter previously had an issue that prevented Triton from doing proper layout optimizations. It was fixed in openxla/xla@7280b9a. We used to need a higher number of warps (up to 32) to cover the lack of layout optimization, but now that can cause performance regressions, because Triton likes to insert shmem usage and barrier syncs.
PiperOrigin-RevId: 698416298
Commit 6bdf799
-
Do not allow overlap between explicit and implicit batching dims in gather/scatter instructions.
Implicit batching dims are also known as index parallel dims. Update `GetGatherScatterBatchParallelDims` accordingly. The sharding propagation and spmd partitioner will process explicit and implicit batching dims separately.
PiperOrigin-RevId: 698421986
Commit 870f8ff
-
[xla:cpu] Add initial implementation of NanoRt backends for XLA:CPU
Minimal XLA:CPU runtime implementation optimized for low latency inference.
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_NanoRtAddScalars       84.8 ns         84.8 ns      8277118
BM_NanoRtFibonacci        81.1 ns         81.1 ns      8468298
BM_PjRtAddScalars         1517 ns         1517 ns       460076
BM_PjRtFibonacci          1523 ns         1523 ns       460415
PiperOrigin-RevId: 698426607
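To put the benchmark figures in relative terms, the ratio of PjRt to NanoRt time can be computed directly. This is a small arithmetic sketch; the times are copied from the commit message, while the dictionary and function names are hypothetical.

```python
# Mean times in nanoseconds, copied from the benchmark figures in the commit message.
TIMES_NS = {
    "BM_NanoRtAddScalars": 84.8,
    "BM_NanoRtFibonacci": 81.1,
    "BM_PjRtAddScalars": 1517.0,
    "BM_PjRtFibonacci": 1523.0,
}


def latency_ratio(workload):
    """PjRt time divided by NanoRt time for one workload."""
    return TIMES_NS[f"BM_PjRt{workload}"] / TIMES_NS[f"BM_NanoRt{workload}"]


for workload in ("AddScalars", "Fibonacci"):
    print(f"{workload}: PjRt takes {latency_ratio(workload):.1f}x as long as NanoRt")
```

These ratios describe only the two microbenchmarks listed and say nothing about larger models.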
Commit bbaf53b
-
Remove unneeded xla:status and xla::statusor dependencies.
PiperOrigin-RevId: 698427377
Commit 311693a
-
[tsl] CountDownAsyncValueRef: enforce memory ordering around fetch_sub
name                    old cpu/op  new cpu/op  delta
BM_CountDownSuccess/4   97.6ns ± 2%  97.9ns ± 1%  ~  (p=0.841 n=5+5)
BM_CountDownSuccess/8   123ns ± 2%   122ns ± 1%   ~  (p=0.548 n=5+5)
BM_CountDownSuccess/16  171ns ± 1%   172ns ± 2%   ~  (p=0.548 n=5+5)
BM_CountDownSuccess/32  270ns ± 1%   271ns ± 1%   ~  (p=0.310 n=5+5)
BM_CountDownError/4     215ns ± 1%   212ns ± 3%   ~  (p=0.310 n=5+5)
BM_CountDownError/8     309ns ± 2%   307ns ± 1%   ~  (p=0.421 n=5+5)
BM_CountDownError/16    500ns ± 1%   496ns ± 2%   ~  (p=0.421 n=5+5)
BM_CountDownError/32    888ns ± 1%   885ns ± 2%   ~  (p=0.548 n=5+5)
PiperOrigin-RevId: 698431683
Commit 6602db5
-
[Code-Health] Resolve the following technical debt issue: Todo(resolved) in CUDA BUILD file.
PiperOrigin-RevId: 698444212
Commit 81b5c3a
-
[XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
PiperOrigin-RevId: 698444339
Commit 90b9cff
-
[Code-Health] Resolve the following technical debt issue:
Todo(resolved)
PiperOrigin-RevId: 698445171
Commit fe84b76
-
[Code-Health] Resolve the following technical debt issue:
Todo(resolved)
PiperOrigin-RevId: 698445177
Commit 3e19c4b
-
Refactor exhaustive_test_main into a separate library target
PiperOrigin-RevId: 698448973
Commit ab446e5
-
PiperOrigin-RevId: 698452075
Commit 5743cb9
-
[XLA:CPU] Add benchmarks for 2D strided convolutions
Currently the transposed convolution is orders of magnitude slower than the regular one. Ideally performance should be similar.
Detailed results:
----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
BM_Conv2DStrided/process_time              3737222 ns     41608631 ns           16
BM_Conv2DTransposedStrided/process_time  590079914 ns   1.0847e+10 ns            1
PiperOrigin-RevId: 698453016
Commit c895d87
-
Fix comments in `convolution_test_1d.cc`
The correct output dimension when dumped to HLO text is `bf0`, where `f` means the output feature dimension. There is no dimension called `o`.
PiperOrigin-RevId: 698453240
Commit f98c25e
-
[tsl:concurrency] Keep AsyncValueRef a part of CountDownAsyncValueRef State
By keeping AsyncValueRef as a part of the State we avoid one extra reference counting operation when copying CountDownAsyncValue (and we expect to copy it `cnt` times).
name                    old cpu/op  new cpu/op  delta
BM_CountDownSuccess/8   95.8ns ± 4%  81.7ns ± 1%  -14.64%  (p=0.000 n=40+35)
BM_CountDownSuccess/16  142ns ± 1%   127ns ± 1%   -10.05%  (p=0.000 n=37+38)
BM_CountDownSuccess/32  229ns ± 2%   216ns ± 1%   -5.56%  (p=0.000 n=40+38)
BM_CountDownError/4     165ns ± 1%   152ns ± 2%   -7.65%  (p=0.000 n=39+40)
BM_CountDownError/8     238ns ± 2%   225ns ± 1%   -5.65%  (p=0.000 n=40+38)
BM_CountDownError/16    388ns ± 2%   369ns ± 2%   -4.77%  (p=0.000 n=40+36)
BM_CountDownError/32    684ns ± 1%   666ns ± 1%   -2.50%  (p=0.000 n=38+38)
PiperOrigin-RevId: 698454410
Commit 4446af5
-
Forgot to reset the map of skipped `f16`->`f32` dequantizations between calls to `Delegate::PrepareOpsToDelegate`.
PiperOrigin-RevId: 698455074
Commit 555a259
-
PiperOrigin-RevId: 698462670
Commit cf6e234
-
[xla:cpu] Replace Thunk::ExecuteEvent with tsl::CountDownAsyncValueRef
PiperOrigin-RevId: 698466696
Commit 39fa07c
-
PR #19237: [GPU] Fix passing of key-value store handle from client to compiler.
Imported from GitHub PR openxla/xla#19237
Copybara import of the project:
-- 177f911fd4c6af86c25aba2e38ea09767477be03 by Ilia Sergachev: [GPU] Fix passing of key-value store handle from client to compiler.
-- ec2b96ccdf8cd81abdc25f3cff2bdf65df455219 by Ilia Sergachev: use allowed_devices instead of CUDA_VISIBLE_DEVICES
-- 77ba9fd7b172052269fafd1a1970d58d1d803a59 by Ilia Sergachev: skip the added test on pre-Ampere GPUs
Merging this change closes #19237
PiperOrigin-RevId: 698469112
Commit 031f776
-
[XLA:GPU] Use HloPredicateIsOp in collective_select_folder
PiperOrigin-RevId: 698473752
Commit 40526ad
-
Remove unused gpu_types.h include from nccl_collective_thunk.cc
PiperOrigin-RevId: 698474274
Commit 20222d8
-
[XLA:GPU] Use ShuffleOp to reverse the order of elements in a vector.
No functional change is intended, but it generates less IR.
PiperOrigin-RevId: 698477060
Commit 464bbc2
-
[XLA:TPU:MSA] Refactor some utility functions from algorithm and buffer_interval_comparator into msa/utils.
PiperOrigin-RevId: 698481492
Commit b740329
-
Add backend_kwargs to XLA tests config.
PiperOrigin-RevId: 698485833
Commit 48e48f9
-
IFRT proxy optimization: Make more IFRT operations asynchronous.
As of this CL, all array operations (except `IsDeleted()`) are asynchronous.
This CL also makes the following drive-by changes:
1. Version management is refactored to use an enum and a header file within /common.
2. All error responses from the server (except connection terminations, which follow the previous behavior) are now printed out as a WARNING.
PiperOrigin-RevId: 698491308
Commit 928eb81
-
Fix TSAN for new mixed priority unit tests.
PiperOrigin-RevId: 698494677
Commit 9d25d03
-
Move next pluggable device to public XLA:CPU API
PiperOrigin-RevId: 698502341
Commit 0adfb4b
-
Add types to c api for quantization.
Also add a few more comments and reorganize some things.
PiperOrigin-RevId: 698507311
Commit 67f3f89
-
Migrate SplitMergedOperandsPass to new TFL::Pass mechanism and remove the .td definition.
PiperOrigin-RevId: 698510501
Commit a1fca06
-
[IFRT] Fix signature of CreateIfrtVerifyDonationPass
PiperOrigin-RevId: 698519430
Commit 1b65e9a
-
Add AssertEq wrapper and switch assert funcs to use generalized function pointers. Check for correct union types in tensor cc api
PiperOrigin-RevId: 698529069
Commit 98ec0ed
-
[NFC] hlo_op_profiler_test: Internal testing change.
PiperOrigin-RevId: 698529351
Commit be03c16
-
Optimize explicit broadcasting-like patterns for TFL_Select*Ops in TFLite.
This CL optimizes explicit broadcasting-like patterns in TFLite, because TFLite Ops support implicit broadcasting. It also moves the existing fusions on broadcast-to+select to the dedicated pass.
The patterns are:
- Fuse splat const into select op.
- Fuse fill-op into select op.
PiperOrigin-RevId: 698530501
Commit 6c8ca3b
-
Add subgraph name to model runtime info proto.
PiperOrigin-RevId: 698531058
Commit ba53e1b
-
[IFRT] Add pass to legalize VIFRT into IFRT.
PiperOrigin-RevId: 698535976
Commit 32ab0e9
-
Remove unused functions from ir_emission_utils.cc
PiperOrigin-RevId: 698542878
Commit 579b594
Commits on Nov 21, 2024
-
Commit 031dc7c
[XLA:MSA] Fixes a bug in GetInefficientAllocationSites(allocation_values).
The function was previously assuming allocation_values can never be empty. PiperOrigin-RevId: 698548828
Commit afd233b
Add step to constant_value and add support for multiplication while recursively calculating the range of an expression.
PiperOrigin-RevId: 698551804
Commit 605e172
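The multiplication case in this commit message can be illustrated with plain interval arithmetic. The following is a minimal Python sketch, not the XLA implementation: `Range`, `range_of`, and the tuple-based expression encoding are hypothetical names chosen for illustration, and the `step` field mentioned in the commit is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed integer interval [lo, hi]; a hypothetical stand-in, not XLA's type."""
    lo: int
    hi: int

    def __add__(self, other: "Range") -> "Range":
        return Range(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Range") -> "Range":
        # With mixed signs any corner product can be the extreme, so check all four.
        corners = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Range(min(corners), max(corners))


def range_of(expr, env):
    """Recursively bound an expression.

    expr is an int constant, a variable name looked up in env,
    or a tuple (op, lhs, rhs) with op being "+" or "*".
    """
    if isinstance(expr, int):
        return Range(expr, expr)
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    left, right = range_of(lhs, env), range_of(rhs, env)
    if op == "+":
        return left + right
    if op == "*":
        return left * right
    raise ValueError(f"unsupported op: {op}")
```

With `i` in [0, 7], `range_of(("+", ("*", "i", 4), 1), {"i": Range(0, 7)})` gives `Range(1, 29)`.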
[XLA:GPU] Remove RewriteReductionsPass
It is unused. PiperOrigin-RevId: 698554807
Commit a3fda2b
Implement getting per-tensor quantization in the c and cc api
PiperOrigin-RevId: 698555795
Commit 23a1f6b
Expose SignatureRunner via interpreter.h
PiperOrigin-RevId: 698557779
Commit aff0983
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter.cc
PiperOrigin-RevId: 698567794
Commit 0e5af71
Add helper methods to add inputs/outputs to internal tensor def.
PiperOrigin-RevId: 698572323
Commit 613a4ac
[IfOp] Call `std::vector::reserve()` on the `args` vector before copying input tensors to it.
PiperOrigin-RevId: 698572634
Commit 26eafbb
* Add support for overriding cross program prefetch behavior.
* Add support for filtering buffer intervals based on the uses of the buffer.
* Add tests for overriding cross program prefetch behavior.
* Add tests for expanding filtering criteria.
PiperOrigin-RevId: 698574108
Commit c5b2880
Move `tsl/platform/{cloud,default,windows}` to `xla/tsl/platform`
PiperOrigin-RevId: 698575496
Commit e435325
Temporarily changes the supported OP check function in QC Compiler plugin.
PiperOrigin-RevId: 698585431
Commit 0b2559d
Add utility function to compute min num of bytes for tensor with strides
PiperOrigin-RevId: 698592285
Commit 9a46acc
[xla:cpu] Resolve arguments/results/temp mapping from buffer assignment
PiperOrigin-RevId: 698610190
Commit 6fb4e33
[xla:cpu] Resolve constant buffers
PiperOrigin-RevId: 698625663
Commit 39fb1ff
[Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
PiperOrigin-RevId: 6986421
Commit 87766ec
Add helper function to add new tensors to internal subgraph.
PiperOrigin-RevId: 698644067
Commit 688cf8a
[xla:cpu] Use CountDownAsyncValueRef in HostKernel state
PiperOrigin-RevId: 698648940
Commit 399d155
Copy constant buffer data for partitioned tensor; Copy option for partitioned Op.
PiperOrigin-RevId: 698655747
Commit ea216b1
[XLA:TPU:MSA] Remove redundant checks for cross_program_prefetches in memory_space_assignment tests.
PiperOrigin-RevId: 698657506
Commit 96cb336
[XLA:MSA] Allow more flexible filtering when picking instruction to schedule after/before for prefetch time override.
PiperOrigin-RevId: 698666645
Commit b130342
PiperOrigin-RevId: 698671850
Commit b8b7854
[XLA:GPU] Use DeviceDescription instead of GetDriverVersion in NVPTXCompiler
NVPTXCompiler was calling `cuda::GetDriverVersion` to determine whether the CUDA driver is new enough to consider it for PTX JIT compilation. This change makes it use the driver version available in the `DeviceDescription` type. PiperOrigin-RevId: 698672918
Commit c6a4cc9
Use fast version of log if type is F16 or BF16.
There seems to be no dedicated libdevice call for Log with F16 or BF16 type. Currently we upcast to F32 and use __nv_logf. However it seems likely that __nv_fast_logf is good enough for F16 and BF16 type, so use it as it is considerably faster. PiperOrigin-RevId: 698673580
Commit 81e2cc6
Commit 9264ca7
[XLA:GPU] Delete file that is not referenced in BUILD file anymore.
Also delete the other things which were only referenced from that file. PiperOrigin-RevId: 698706755
Commit dec1404
Remove :cuda_runtime and :rocm_runtime targets
- The remaining `GetRuntimeVersion` and `GetFuncBySymbol` functions get moved into the executors, the only place where they are needed.
- For CUDA, this also creates an overload of `cuda::ToStatus` which can convert a CUDA runtime error (`cudaError_t`) into an `absl::Status`.
- I also had to adjust the `RocmKernel` and `CudaKernel` tests which were using `GetFuncBySymbol` directly. Now they rely on `LoadKernel` from the executors.
PiperOrigin-RevId: 698720699
Commit e563ba5
Remove CUDA 12.1 workaround from reduction logic
There was a check in place that works around a performance bug in ptxas from CUDA 12.1. This check has various problems:
1. It's untested, and the way it's implemented it can't easily be tested.
2. The version check doesn't work with library compilation, which we are transitioning towards, as it checks the version of a local ptxas binary.
3. It's unclear whether the workaround is still needed with the new MLIR emitters.
So I'm removing it here since it blocks me from doing more refactoring around PTX compilation. PiperOrigin-RevId: 698720761
Commit 8e7ac01
[XLA:GPU][Emitters] Canonicalize unrolled IR.
If the IR is not canonicalized after unrolling, then the passes that follow unrolling in the pipeline don't converge sometimes. PiperOrigin-RevId: 698723354
Commit b9f49aa
PR #19528: [XLA:GPU] use separate command buffer cmd flag for conditional and loop
Imported from GitHub PR openxla/xla#19528
Observed in a saxml workload that sharing the same command buffer cmd type (CONDITIONALS) for WHILE and CONDITIONAL commands needlessly kills lowering opportunities. In many cases a CONDITIONAL instruction could be lowered into a command buffer while a WHILE could not. This PR uses separate command buffer cmd type flags for CONDITIONAL and WHILE instructions when the user specifies the types to lower.
Copybara import of the project:
-- 4d62fb512995e2fc6e9077a1b3251a6754c866ca by Shawn Wang: use separate command buffer cmd flag for conditional and loop
Merging this change closes #19528
PiperOrigin-RevId: 698729891
Commit 733d71d
Commit d797898
PR #19552: [GPU][NFC] Cleanup horizontal loop fusion.
Imported from GitHub PR openxla/xla#19552
- avoid unnecessary work
- bump log level at which complete computations are printed
- add log statements
Copybara import of the project:
-- e273aea41dd15efbc5d79c363810cf634e73203e by Ilia Sergachev: [GPU][NFC] Cleanup horizontal loop fusion. - avoid unnecessary work - bump log level at which complete computations are printed - add log statements
Merging this change closes #19552
PiperOrigin-RevId: 698731719
Commit 02e74d9
Commit 3995a20
[XLA:GPU] Enable Triton normalization fusions by default.
With the feature enabled, XLA GPU will automatically match all kinds of normalization diamond patterns in the graph (Softmax, RmsNorm, etc.) and generate efficient kernels with Triton. In the compilation pipeline the following steps happen:
1. The `SoftmaxRewriterTriton` pass matches minimal normalization diamonds and creates new fusions with `kCustom` kind. The fusions also have a backend config attached with `__triton` kind and tiling information in `BlockLevelFusionConfig`.
2. `PriorityFusion` uses the Cost Model to potentially fuse more instructions into the matched fusions.
3. Fusions are emitted with the generic Triton fusion emitter. The Cost Model chooses tile sizes for each Triton fusion.
Currently `SoftmaxRewriterTriton` only matches normalization patterns that reduce the minormost dimension. PiperOrigin-RevId: 698735843
Commit aeef8f4
[XLA:GPU] Remove KernelFusionEmitterBase.
This class is no longer used. PiperOrigin-RevId: 698736858
Commit 16206d3
Integrate LLVM at llvm/llvm-project@33fcd6acc755
Updates LLVM usage to match [33fcd6acc755](llvm/llvm-project@33fcd6acc755) PiperOrigin-RevId: 698742870
Commit 4eb39af
Since `CudaDriverVersion()` is now only used in one place, let's inline the function and remove the target. PiperOrigin-RevId: 698747446
Commit 8f4674a
Remove patch that is not needed anymore.
This has been upstreamed to LLVM, and we have updated to a revision containing this. PiperOrigin-RevId: 698748177
Commit f6033a3
PR #18407: Fix xla-mlir failures on Windows
Imported from GitHub PR openxla/xla#18407
This PR aims to enable the XLA/mlir/tool test cases on the Windows platform. The //xla/mlir/tools/mlir_bisect/... tests were failing on Windows with the errors shown below.
Error 1. Error with llvm::seq:
no matching function for call to 'seq' for (auto i : llvm::seq(0ul, sizeof...(T))) {
Solution: change to llvm::seq(0, sizeof...(T)). By explicitly specifying the type (unsigned long) in llvm::seq, the compiler now clearly understands the type of the sequence.
Error 2. Missing dlfcn.h:
Location: xla/mlir/tools/mlir_interpreter/dialects/func.cc
fatal error: 'dlfcn.h' file not found
Solution: include 'windows.h' for the Windows platform.
Error 3. Use of undeclared identifiers sym and RTLD_DEFAULT:
Location: xla/mlir/tools/mlir_interpreter/dialects/func.cc
use of undeclared identifier 'sym' sym = dlsym(RTLD_DEFAULT, callee.getSymName().str().c_str()); ^ use of undeclared identifier 'RTLD_DEFAULT'
Solution: On Windows, the approach to obtaining a symbol's address differs from Unix-based systems. The GetModuleHandle function retrieves a handle to the specified module (DLL) that is loaded in the address space of the calling process. This handle is necessary to access the module's symbols. The GetProcAddress function locates the address of an exported function or variable by name.
Copybara import of the project:
-- 1a428996c7991df8e093393e7989fbcf251dc0f4 by Raunak: fix xla-mlir failures on windows
-- 15009666c4ee861218bb798c6fe0d2493fa8e060 by Raunak: resolve comments
-- 2483001d510582179d74b94571f9fd6beb943aaa by Raunak: Keep the original file
-- 4c7fe5e4debed0ff39eb87f64f60f99ce6ee0a74 by Raunak: fix the formatting issue
-- 270898a2b0bca97a7de30435ce6a53b5980ca73e by mraunak: Update symbol_finder_windows.cc
-- 6b63a306ee4ef69f9849822418426d5f705e73ff by mraunak: Update symbol_finder_linux.cc
-- f0996fcc1c67e43bbb7b7829adddf0c7d8f5c738 by mraunak: Update symbol_finder.h
-- 0c0c9bba3dac548e663aab5e2e3af6fb96c77fde by Raunak: Fix the build file
-- 6d7f269262dcc8a85579c62db51e38dc534d6564 by Raunak: Resolve the comments
-- ef598af149e9ad96dc1fa27be763a7ffd219011c by Raunak: Resolve the comments
-- 7131b8d24ad353044b622614c36c342d90101d37 by Raunak: added :find_symbol to dependency
-- 64a6e9e45d6deef4201c3cd8da64e99b9d40ca78 by mraunak: Update BUILD
-- d47a8b27c89e5df01ea94a237080fd2ac3ad8e85 by mraunak: Fix clang format
-- 1a24df16d3de5f007065b69a67965158e821ffe3 by Raunak: resolve the comments
-- 12f69fc2d188f8bc368bc5e29b53a80d15b6dbac by Raunak: adding namespace and header style consistent
-- ec9b5051471a36f7881ed21215f60ec893f18e7d by Raunak: Fix the build file
Merging this change closes #18407
PiperOrigin-RevId: 698754912
Commit b02f2fb
Refactor PjRt environment initialization to have clearer data flow
Split the initialization into several methods to have a better distinction between their responsibilities. PiperOrigin-RevId: 698757702
Commit 99e5ce3
[XLA:GPU] Copy final bufferize patterns that were removed in upstream MLIR.
An upstream MLIR PR [0] removed the `finalizing-bufferize` pass. We are using only two patterns from the pass. As suggested by the note in the PR description, we can copy those patterns. [0] llvm/llvm-project@cbc7802 PiperOrigin-RevId: 698761132
Commit 3cce16f
The build target doesn't exist anymore but there is still a header file which gets deleted in this change. PiperOrigin-RevId: 698778403
Commit cae2ee4
[xla:cpu] Optimize buffer allocations construction from se::DeviceMemoryBase
name                 old cpu/op   new cpu/op   delta
BM_NanoRtAddScalars  82.2ns ± 2%  63.1ns ± 2%  -23.17%  (p=0.000 n=37+40)
BM_NanoRtFibonacci   86.7ns ± 2%  68.4ns ± 2%  -21.09%  (p=0.000 n=37+35)
BM_PjRtAddScalars    1.78µs ± 2%  1.79µs ± 2%     ~     (p=0.280 n=39+38)
BM_PjRtFibonacci     1.79µs ± 3%  1.79µs ± 3%     ~     (p=0.355 n=38+38)
PiperOrigin-RevId: 698783540
Commit 7aac3a5
Specify a much shorter output path for Bazel on Windows.
To avoid running into the 259 character path length limitation. PiperOrigin-RevId: 698786300
Commit dc24ff7
PR #19578: [doc] Fix a link to a page in the table of contents.
Imported from GitHub PR openxla/xla#19578 Copybara import of the project: -- 849d78bf539cc69387ecb3f9710b6188cee5a494 by Ilia Sergachev: [doc] Fix a link to a page in the table of contents. Merging this change closes #19578 PiperOrigin-RevId: 698788574
Commit afa0ccd
[XLA:GPU] Change `ConstraintExpression` to use operator||/&& which return a new instance.
This CL changes the `ConstraintExpression` class by making it a value type and using C++ operators for logical operations. This hopefully makes the code more concise and easier to read. PiperOrigin-RevId: 698791293
Commit 6e19bfe
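The value-type design described in this commit message can be sketched in Python, where `|` and `&` stand in for C++ `operator||` and `operator&&` (Python cannot overload `or`/`and`). This is only an analogy under stated assumptions: the `Expr` class, its `atom` helper, and the OR-of-AND-clauses representation are hypothetical and are not taken from the XLA `ConstraintExpression` source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Immutable formula kept as an OR of AND-clauses (hypothetical, for illustration)."""
    clauses: frozenset  # frozenset of frozensets of constraint names

    @staticmethod
    def atom(name: str) -> "Expr":
        return Expr(frozenset({frozenset({name})}))

    def __or__(self, other: "Expr") -> "Expr":
        # Union of the clause sets; neither operand is modified.
        return Expr(self.clauses | other.clauses)

    def __and__(self, other: "Expr") -> "Expr":
        # Distribute AND over OR: pair every clause of self with every clause of other.
        return Expr(frozenset(a | b for a in self.clauses for b in other.clauses))


a, b, c = Expr.atom("a"), Expr.atom("b"), Expr.atom("c")
combined = (a | b) & c  # a, b and c themselves are left unchanged
```

Because every operator returns a fresh object, sub-expressions can be reused freely without one combination silently altering another.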
[xla:cpu] Add a test for nanort executable with temp storage
PiperOrigin-RevId: 698800849
Commit 8e81dc7
Remove static_casts in implementations of SetNodeExecutionEnabled.
PiperOrigin-RevId: 698809905
Commit 3098300
Merge pull request #80492 from tensorflow:gaikwadrahul8-patch-3
PiperOrigin-RevId: 698812072
Commit dbe8068
Merge pull request #80490 from tensorflow:gaikwadrahul8-patch-2
PiperOrigin-RevId: 698813006
Commit 4907ba9
Derived lines only from the stream with most device events for GPU device traceviewer
PiperOrigin-RevId: 698820533
Commit 811877e
[tsl:concurrency] Fix asan error in CountDownAsyncValueRef
PiperOrigin-RevId: 698821973
Commit 0aa5275
Remove unused GpuAsmOpts parameter from RedzoneAllocator
PiperOrigin-RevId: 698822218
Commit 86eeec3
Set implicitTrunc on APInt creation
With llvm/llvm-project@3494ee9, upstream has stricter checks for ints. PiperOrigin-RevId: 698823182
Commit 8ab5ae1
Move upstreamable part of sparse_dot to be a public patch
PiperOrigin-RevId: 698823837
Commit 4844ee2
[Control flow] Add a lighter implementation of `cond_v2()` that is optimized for latency.
This change introduces `cond_v2.fast_cond_v2()`, which is a tool for writing latency-optimized conditionals using the functional `IfOp` implementation. PiperOrigin-RevId: 698835221
Commit c934e12
Clarify the dimensions in gather/scatter dimensions. The following dimensions do NOT overlap. These dims are processed separately in spmd partitioner.
1. Explicit batching dims exist in all tensors (operand, indices, output).
2. Index pass-through dims exist in indices and output.
3. Operand pass-through dims exist in operand and output.
We replace `GatherOutputShardingFromIndexIndexPassthroughDimensions` with `GatherOutputShardingFromIndex(bool consider_explict_batch_dims=true)`. The added test failed before this change since it processed explicit batch dims as index pass-through dims. This change fixes the issue. PiperOrigin-RevId: 698840297
Commit 6823a7d
Add backend kwargs to xla tests.
PiperOrigin-RevId: 698843953
Commit 81b91c2
Commit 12eb5bd
Make the implementation of GetXlaPjrtTpuClient more similar to how Jax uses PJRT.
PiperOrigin-RevId: 698867902
Commit 65e4fb1
Cleanup. Merge `GatherScatterParallelDims` into `GatherScatterDims`.
No behavior change. PiperOrigin-RevId: 698870832
Commit 00c79a2
[XLA:GPU] Dump the failing HLO fusion to a file when Triton numerics verification fails.
The fusion is extracted into a separate module, so it's easier to reproduce the issue. If the fusion is too long, stdout log will be cropped. PiperOrigin-RevId: 698872626
Commit 6db0eae
PiperOrigin-RevId: 698873637
Commit c8815dd
Revert: [XLA:GPU] Enable Triton normalization fusions by default.
Internal test is broken. Reverts aeef8f4 PiperOrigin-RevId: 698893785
Commit 79d66e7
[xla:codegen] Add a testonly KernelEmitter for testing XLA:CPU kernels
Prototyping test only KernelEmitter API that can be used for writing XLA:CPU kernel tests. PiperOrigin-RevId: 698895333
Commit 82889b7
Refactor `GetGatherScatterBatchParallelDims`. No behavior change.
Before this change, `GetGatherScatterBatchParallelDims` only returned the implicit batching dims in operand and indices. We still needed to call `GetGatherParallelOutputDims` to return the corresponding dims in the output. With this change, `GetGatherScatterBatchParallelDims` returns the implicit batch dims in 3 tensors (operand, indices, and output). PiperOrigin-RevId: 698895717
Commit 27b5058
[XLA:CollectivePipeliner-Sinking] Stop pipelining iterations if a large sunk collective is encountered.
PiperOrigin-RevId: 698924172
Commit efcfca3
Move `jax` visibility inside `internal_visibility` call
PiperOrigin-RevId: 698927051
Commit cac5b0a
Remove the C++ memory checker. Python checker remains.
PiperOrigin-RevId: 698929552
Commit 1ccaf9c
Eliminate static_casts in GpuCommandBuffer.
PiperOrigin-RevId: 698932952
Commit f5ebe9a
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter_test.cc
PiperOrigin-RevId: 698933032
Commit 860cd53
Switch flatbuffer_conversions to use ABSL_LOG instead of LOG
PiperOrigin-RevId: 698939145
Commit 3efa2dd
Commits on Nov 22, 2024
-
PR #16901: [XLA:GPU] Fix default device mesh for auto sharding
Imported from GitHub PR openxla/xla#16901 When the user does not specify the number of GPUs for auto sharding, XLA defaults to using all available GPUs. The current implementation uses the number of cores (SMs) on the GPU as the default shard count. For example, on an A100, the sharding algorithm will try to shard into 108 devices, which can be confusing for users. This patch changes the shard count to the number of cards, which has been tested to work correctly on an 8-card A100 machine. Copybara import of the project: -- 232a62ae2599e6fe76e2e235ea18452195bce799 by Tianyi Liu: [XLA:GPU] Fix default device mesh for auto sharding Merging this change closes #16901 PiperOrigin-RevId: 698956243
Commit 35fe8c2
Stop using AsGpuStreamValue in gpu_cudamallocasync_allocator_test.
PiperOrigin-RevId: 698958036
Commit f6c6a4e
Separate authoritative vs Q-DQ DRR patterns.
Some patterns added to the quantize_patterns.td were making decisions about quantizing some weights that are not annotated by Q-DQ nodes. This PR separates these two categories for cases where we want strict adherence to Q-DQ annotations (e.g. QAT). PiperOrigin-RevId: 698960224
Commit 3c7c5ae
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_transpose_fusion.cc
PiperOrigin-RevId: 698970177
Commit 03844d1
Add batch tests to RemapArrays, and with different shapes.
PiperOrigin-RevId: 698973331
Commit e8348ad
[Cleanup] Use HloPredicateIs(Not)Op in alias_passthrough_params.cc
PiperOrigin-RevId: 698987161
Commit 866c565
[IFRT] Implement BytecodeDialectInterface for VIFRT.
PiperOrigin-RevId: 698995873
Commit 8b6acd0
[Cleanup] Use HloPredicateIs(Not)Op in all_gather_dynamic_slice_simplifier.cc
PiperOrigin-RevId: 698997197
Commit eaa1401
Create a proto for holding logical topology metadata about a job.
PiperOrigin-RevId: 699000269
Commit 16986fc
PiperOrigin-RevId: 699004930
Commit de89503
Set implicitTrunc on APInt creation
With llvm/llvm-project@3494ee9, upstream has stricter checks for ints. Setting `APInt(.., /*isSigned=*/ !isUnsigned, ..)` seems to break EvalCompareOpPattern, likely due to signed i1 not allowing 1. This change just keeps the status quo without making too many changes. PiperOrigin-RevId: 699031101
Commit c23c6f5
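The remark about signed `i1` follows from two's-complement ranges: a signed 1-bit integer can hold only -1 and 0, so the value 1 fits only when the bit is treated as unsigned. The sketch below models this in Python. `fits` and `truncate` are hypothetical helpers written for illustration, not LLVM's `APInt` API; they only mimic the difference between a strict range check and implicit truncation.

```python
def fits(value: int, bits: int, signed: bool) -> bool:
    """Return True if value is representable in a bits-wide integer."""
    if signed:
        return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return 0 <= value < (1 << bits)


def truncate(value: int, bits: int, signed: bool) -> int:
    """Keep the low bits of value and reinterpret them (two's complement if signed)."""
    raw = value & ((1 << bits) - 1)
    if signed and raw >= (1 << (bits - 1)):
        raw -= 1 << bits
    return raw


# A strict check rejects 1 as a signed 1-bit value but accepts it as unsigned.
one_fits_unsigned = fits(1, 1, signed=False)
one_fits_signed = fits(1, 1, signed=True)
# With truncation the same bit pattern is accepted and reads back as -1 when signed.
one_truncated_signed = truncate(1, 1, signed=True)
```

Here `one_fits_unsigned` is `True`, `one_fits_signed` is `False`, and `one_truncated_signed` is `-1`.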
Move `tsl/platform/profile_utils` to `xla/tsl/platform/profile_utils`
PiperOrigin-RevId: 699035755
Commit 1d6bd16
Remove obsolete PjRtClient::AsyncSendPlaceholder API.
PiperOrigin-RevId: 699044311
Commit a40d63f
[Cleanup] Use HloPredicateIs(Not)Op in all_gather_optimizer.cc
PiperOrigin-RevId: 699044350
Commit c4b75a0
[Cleanup] Use HloPredicateIs(Not)Op in all_reduce_blueconnect.cc
PiperOrigin-RevId: 699048665
Commit b84678f
[Cleanup] Use HloPredicateIs(Not)Op in async_wrapper.cc
PiperOrigin-RevId: 699052605
Commit acc6557
[Cleanup] Use HloPredicateIs(Not)Op in async_wrapper_test.cc
PiperOrigin-RevId: 699056083
Commit fef8b55
Move ptxas/nvlink compilation into separate compilation unit
This moves all the PTX compilation functions that spawn subprocesses (notably ptxas, nvlink, and fatbin) into a separate file. The goal is to make this optional and eventually disable it by default. Since we can compile through libraries like libnvjitlink, the rather brittle approach of calling external binaries is no longer needed. This also adds tests for all the helper functions. Tests for the actual compilation will follow separately. PiperOrigin-RevId: 699058086
Commit 4eb2052
-
[Cleanup] Use HloPredicateIs(Not)Op in collective_permute_cycle_decomposer.cc
PiperOrigin-RevId: 699059539
Commit 9d580d3
-
[Cleanup] Use HloPredicateIs(Not)Op in collective_permute_valid_iteration_annotator.cc
PiperOrigin-RevId: 699062814
Commit 410e20e
-
[Cleanup] Use HloPredicateIs(Not)Op in collective_select_folder.cc
PiperOrigin-RevId: 699067483
Commit c082b0e
-
Move LinkGpuAsm into separate file
This adds a new target `:driver_compilation` and moves `LinkGpuAsm` into a new file `driver_compilation.cc`. I'm also bringing back the `StreamExecutor` argument so that `ActivateContext` can be called; I had mistakenly removed it in a previous CL. The active context is indeed needed. The goal is to separate out all the different PTX compilation and linking methods and make them independently testable and optional. PiperOrigin-RevId: 699071278
Commit beccd2c
-
[Cleanup] Use HloPredicateIs(Not)Op in collective_send_recv_combiner.cc
PiperOrigin-RevId: 699071557
Commit ce26a52
-
Add a simple test for the symbol_finder
Also renames the target for consistency. PiperOrigin-RevId: 699076843
Commit 66d2fd0
-
[Cleanup] Use HloPredicateIs(Not)Op in command_buffer_scheduling.cc
PiperOrigin-RevId: 699079990
Commit 4ac933c
-
Change parameter type in LinkUsingNvlink
`LinkUsingNvlink` and `LinkGpuAsmUsingDriver` used to take a list of `CubinOrPTXImage` structs as inputs, but the functions don't even support compiling PTX, so that is very misleading. I change the parameter type to a list of byte arrays (`std::vector<uint8_t>`), which is what we use everywhere else for representing compiled modules (CUBINs). PiperOrigin-RevId: 699082261
Commit f523817
-
PR #19656: Fix implicit index handling in ScatterDeterminismExpander
Imported from GitHub PR openxla/xla#19656
This PR fixes a bug related to handling missing (implied) indices and adds the corresponding tests.
1. When `scatter_dims_to_operand_dims` size is not equal to the operand rank, the `out_of_bound_tensor` has incorrect dimensions, resulting in mismatched shapes of the select op. This is fixed at line 718.
2. When the update is not scalar, the indices are recalculated - this requires updating the `out_of_bound_tensor` (lines 757-761).
3. After expanding the indices, the `has_scalar_indices` flag has to be updated (line 777).
Also added a few cosmetic changes:
1. Removed `is_one_dimensional` branch in `ExpandIndices`, as this never happens (probably an artefact from prior implementation).
2. Broadcast the boundary constants instead of generating a (possibly big) literal.
Copybara import of the project:
-- 2e38efc0c9efc2f708058bd2ae526f13d2ed8354 by Sergey Kozub <[email protected]>: Fix implicit index handling in ScatterDeterminismExpander
Merging this change closes #19656
PiperOrigin-RevId: 699083584
Commit 7a6f538
-
[XLA] Propagate original_value when instructions are replaced in X64Rewriter
This copies over the original_value attribute when a value is replaced during this pass. PiperOrigin-RevId: 699087576
Commit 4e36ed8
-
PR #19679: [XLA:CPU][oneDNN] Relocate Addend Shape Validation to the Contraction Rewriter
Imported from GitHub PR openxla/xla#19679
This PR moves the addend shape check to the rewriter so that the code to append oneDNN post-ops can be shared between matmul and convolution kernels.
Copybara import of the project:
-- c6497851473b2ec5b5041de459e4aaa3c8c2cb93 by Akhil Goel <[email protected]>: Move addend check
Merging this change closes #19679
PiperOrigin-RevId: 699095534
Commit c390ae7
-
PR #19346: Bumped rules_python version to 0.39.0
Imported from GitHub PR openxla/xla#19346 cc @hawkinsp Copybara import of the project: -- 292e7ebb7ee57e5af5977c08f0aaf28fc1f852e2 by vfdev-5 <[email protected]>: Bumped rules_python version to 0.39.0 Merging this change closes #19346 PiperOrigin-RevId: 699100796
Commit 6332b90
-
Re-enable deterministic scatter expander pass by default.
The issues which we have hit previously seem to be fixed now. PiperOrigin-RevId: 699120716
Commit 29090a9
-
Clean up disabling reduce_hlo_test on TPU
PiperOrigin-RevId: 699125504
Commit a66b63b
-
PR #19577: Cleanup handling of 2 fields of ExecutableBuildOptions.
Imported from GitHub PR openxla/xla#19577
Copybara import of the project:
-- b4180bb5e59c92b374eb16fc59d6f03d7f37db4a by Ilia Sergachev <[email protected]>: Cleanup handling of 3 fields of ExecutableBuildOptions.
-- 21206eb838fa04dabaddec0aa8cdf73789ce8206 by Ilia Sergachev <[email protected]>: add a test
-- a6571ef2ac7ec6a94056b1588a3260ecc7d9db17 by Ilia Sergachev <[email protected]>: cleanup
-- 44d479f3d3d6320d37d35934cf81596e50e10c51 by Ilia Sergachev <[email protected]>: add missing newline
-- c3c550f491b2fc03dacdf1101042a3fbadd51e7c by Ilia Sergachev <[email protected]>: add missing include
-- 5acf13c0423b7aef87f81b86ac95a0a1471927f1 by Ilia Sergachev <[email protected]>: ignore key_value_store
Merging this change closes #19577
PiperOrigin-RevId: 699125923
Commit 745b834
-
[xla:cpu] NFC: Remove ExecuteState alias from Thunk
PiperOrigin-RevId: 699128079
Commit 1a1dad7
-
Add cuda::CompilationProvider interface and first implementation for subprocess compilation
This adds a new interface `CompilationProvider` which offers `PTX` to `CUBIN` compilation. It also adds the first implementation of this interface, the `SubprocessCompilationProvider`, which uses ptxas and nvlink to do the compilation. Some additional changes were also needed:
- New type `CompilationOptions` which collects and documents all compilation options in one place.
- Some additional overloads in `:subprocess_compilation` where needed so that the `SubprocessCompilationProvider` can control the exact file path to ptxas and nvlink.
- A fairly comprehensive test suite for the compilation provider is also added.
PiperOrigin-RevId: 699134414
Commit 888a584
-
[xla:cpu] Add a benchmark for creating zero-copy PjRt buffer
------------------------------------------------------------------
Benchmark                        Time          CPU      Iterations
------------------------------------------------------------------
BM_CreateZeroCopyBuffer        234 ns       234 ns         3075841
PiperOrigin-RevId: 699137060
Commit b260df9
-
[XLA:GPU] Fusion tests don't seem to require A100, so replace tag.
tf_cuda_tests_tags() seems to work as well. Add hermetic_cuda_data_dir parameter as well, so that e.g. ptxas can be found. Also use linkopts = ["-Wl,-rpath,$$ORIGIN/../lit_lib"] so that the dynamic libraries are found, which are symlinked from the lit_lib directory. PiperOrigin-RevId: 699146874
Commit 90dc9c6
-
#sdy Refactor `xla-sdy-mhlo-round-trip-shard-map-export` from a `ConversionPattern` to a walk.
PiperOrigin-RevId: 699148617
Commit 3952aca
-
[XLA:GPU] Fix a bug in dot_algorithm_rewriter.
The low_f32 should be rounded to bf16 instead of truncated. PiperOrigin-RevId: 699154452
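To make the difference between the two conversions concrete, here is a minimal sketch that works on the raw bit pattern of an f32 value. It is not taken from `dot_algorithm_rewriter`; the helper names are hypothetical, rounding is assumed to mean round-to-nearest-even, and NaN/Inf handling is deliberately omitted.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of `x` stored as an IEEE-754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Float32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def bf16_truncate(x: float) -> float:
    """Drop the low 16 bits (always rounds toward zero)."""
    return bits_f32(f32_bits(x) & 0xFFFF0000)


def bf16_round(x: float) -> float:
    """Round to nearest bf16, ties to even (NaN/Inf not handled)."""
    b = f32_bits(x)
    # Adding 0x7FFF plus the lowest kept bit implements ties-to-even.
    rounding_bias = 0x7FFF + ((b >> 16) & 1)
    return bits_f32((b + rounding_bias) & 0xFFFF0000)


x = 1.0 + 0.75 * 2.0 ** -7   # three quarters of the way to the next bf16 value
print(bf16_truncate(x), bf16_round(x))
```

For this input truncation falls back to 1.0, while rounding moves up to the nearer bf16 value 1.0078125, so the truncation error is three times larger.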
Commit d70ee79
-
PiperOrigin-RevId: 699159107
Commit 631bee3
-
Integrate LLVM at llvm/llvm-project@a12e79a85fc1
Updates LLVM usage to match [a12e79a85fc1](llvm/llvm-project@a12e79a85fc1) PiperOrigin-RevId: 699163893
Commit 6aa3801
-
Commit 68fddf7
-
[XLA] Go back to using a glob for including dialects in the `mlir_interpreter`.
This is more in line with how the dialects were meant to be added according to the readme file in the parent directory. PiperOrigin-RevId: 699169422
Commit 7a77dcc
-
Prevent dequantizing/requantizing `f16` to `f32` and back (2nd try).
What this change does:
1. Identifies all `kTfLiteBuiltinDequantize` nodes converting `kTfLiteFloat16` to `kTfLiteFloat32` and plugging into a `kTfLiteBuiltinFullyConnected`, `kTfLiteBuiltinConv2d`, or `kTfLiteBuiltinDepthwiseConv2d` node.
2. Re-maps XNNPACK tensors pointing to the `kTfLiteFloat32` output to point to the original `kTfLiteFloat16` input. The `kTfLiteFloat16` weights/filters and biases are handled by XNNPACK directly.
PiperOrigin-RevId: 699184221
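The two steps described in this commit can be sketched on a toy graph. This is not the XNNPACK delegate code: the `(op, inputs, outputs)` node tuples, the `build_remap` helper, and the requirement that every consumer of the dequantized tensor be one of the three listed ops are all assumptions made for illustration.

```python
F16, F32 = "kTfLiteFloat16", "kTfLiteFloat32"
FOLDABLE_CONSUMERS = {
    "kTfLiteBuiltinFullyConnected",
    "kTfLiteBuiltinConv2d",
    "kTfLiteBuiltinDepthwiseConv2d",
}


def build_remap(nodes, tensor_types):
    """Map each f32 dequantize output to its original f16 input.

    nodes: list of (op_name, input_tensor_ids, output_tensor_ids).
    tensor_types: dict from tensor id to type name.
    """
    remap = {}
    for op, inputs, outputs in nodes:
        if op != "kTfLiteBuiltinDequantize":
            continue
        src, dst = inputs[0], outputs[0]
        if tensor_types[src] != F16 or tensor_types[dst] != F32:
            continue
        # Assumption: only remap when all consumers can take f16 directly.
        consumers = [c_op for c_op, c_in, _ in nodes if dst in c_in]
        if consumers and all(c in FOLDABLE_CONSUMERS for c in consumers):
            remap[dst] = src
    return remap


nodes = [
    ("kTfLiteBuiltinDequantize", [0], [1]),
    ("kTfLiteBuiltinFullyConnected", [2, 1], [3]),
    ("kTfLiteBuiltinDequantize", [4], [5]),
    ("kTfLiteBuiltinAdd", [3, 5], [6]),
]
types = {0: F16, 1: F32, 2: F32, 3: F32, 4: F16, 5: F32, 6: F32}
print(build_remap(nodes, types))
```

In this toy graph only the dequantize feeding the fully connected node is remapped; the one feeding the add node keeps its f32 output.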
Commit 857e530
-
Merge pull request #80484 from tensorflow:fixtypos07
PiperOrigin-RevId: 699195297
Commit 62970a9
-
Fix two issues in `PartitionScatterIndexPassthroughDimensions`.
We infer the update sharding from update to obtain `passthrough_sharding`. This `passthrough_sharding` should be merged with the existing update sharding, such that we may keep the original sharding axes in update. The added all-reduces are along the sharding axes along index pass-through dimensions. They should not be along the sharding axes along explicit batch dims or index vector dim. PiperOrigin-RevId: 699206933
Commit 820d85b
-
Stop using xla/statusor.h in favor of absl/status/statusor.h directly.
PiperOrigin-RevId: 699215454
Commit 4c7e533
-
PR #19660: [ROCm] switch rocm build to clang
Imported from GitHub PR openxla/xla#19660
This PR switches the default rocm build to clang as the gcc config is broken at the moment.
Copybara import of the project:
-- ea48f7c480d110eab3f133ed6ea8989da0e1e724 by Alexandros Theodoridis <[email protected]>: [ROCm] switch rocm build to clang
-- 2743fabafd6a358c05e858781064e7fa2e389c78 by Alexandros Theodoridis <[email protected]>: Remove explicit clang path from the bazelrc rocm config
-- 202dea0a80602cafdbee6067d8f20dc3055c6bbb by Alexandros Theodoridis <[email protected]>: Address review comments
Merging this change closes #19660
PiperOrigin-RevId: 699222609
Commit 28df421
-
[xla:collectives] Initial xla/core/collectives component commit
Next step is to migrate NcclComm and NcclOwnedComm to std::unique_ptr<Communicator> and proper virtual inheritance. PiperOrigin-RevId: 699233544
Commit 224d07c
-
Further lower threshold for F64 in //xla/service/gpu/model:hlo_op_profiler_test
This was originally proposed in openxla/xla#16102, but I still ran into an issue where it failed by a slight margin:
```
Expected: (profiler.MeasureClockCyclesPerOp(HloOpcode::kDivide, F64) .value() .clock_cycles()) > (300), actual: 296 vs 300
```
That said, I ran 1000 tests and did not encounter this issue. Reducing the threshold to 280 since the bound seems very close and a flaky test is no good either way.
PiperOrigin-RevId: 699233864
Commit 7c823c8
-
[xla:cpu] Add a KernelRunner API to codegen testlib and sketch a test for XLA:CPU
PiperOrigin-RevId: 699234540
Commit 7c35801
-
Lower the max bytes threshold used by the proto splitter
PiperOrigin-RevId: 699235057
Commit cba80fe
-
Fix test subgraph creation for StableHLO composite nodes.
Also fixes a few missing includes. Uses C++ includes instead of C ones. PiperOrigin-RevId: 699237969
Commit 8310815
-
[tflite-gpu] Add REDUCE_ALL && REDUCE_ANY to gpu_compatibility
PiperOrigin-RevId: 699238045
Commit bde84b7
-
[xla:cpu] Add JitCompiler and FunctionLibrary APIs for XLA:CPU codegen
Define APIs for compiling LLVM modules to functions required by the XLA:CPU runtime: kernels, comparators, etc. Implementation largely exists as SimpleOrcJit in service/cpu, but it's tightly coupled with "legacy" XLA. PiperOrigin-RevId: 699239722
Commit 5638e32
-
Commit 2f87d0f
-
[XLA:GPU] remove channel ID checks in hlo_instructions.cc
PiperOrigin-RevId: 699247019
Commit 72dd3b2
-
* De-dupe logic in test common and model_buffer.
* Factor out the flatbuffer model wrapper from the class in test common and move to flatbuffer_tools.
* Add some extra helpers for flatbuffers in flatbuffer_tools, and add test.
* Hide all the usage of `std::filesystem` stuff in one cc. Technically `<filesystem>` is an unapproved header.
* Update model_load to use the flatbuffer tools.
* Pull some of the member functions of "model unpacker" out into non-member functions.
PiperOrigin-RevId: 699249089
Commit ec81a53
-
[XLA:CPU] Support asynchronous execution for custom transposed convolutions
Performance is comparable to the synchronous version. Detailed results (where 'old' is the synchronous execution, 'new' is async execution; both use the same, custom algorithm for transposed conv):
name                                                     old cpu/op   new cpu/op   delta
BM_Conv1DStrided/process_time                            29.4ms ± 6%  29.7ms ± 5%  ~ (p=0.841 n=5+5)
BM_Conv1DTransposedStrided/process_time                  29.6ms ± 2%  30.7ms ± 2%  +3.52% (p=0.008 n=5+5)
BM_Conv1DTransposedStridedNonDefaultLayout/process_time  28.5ms ± 3%  28.3ms ± 1%  ~ (p=0.222 n=5+5)
name                                                     old time/op  new time/op  delta
BM_Conv1DStrided/process_time                            2.68ms ± 7%  2.72ms ± 5%  ~ (p=0.548 n=5+5)
BM_Conv1DTransposedStrided/process_time                  7.91ms ± 3%  7.98ms ± 5%  ~ (p=0.548 n=5+5)
BM_Conv1DTransposedStridedNonDefaultLayout/process_time  7.00ms ± 2%  7.32ms ± 4%  +4.58% (p=0.016 n=5+5)
PiperOrigin-RevId: 699250549
Commit ef30054
-
Integrate LLVM at llvm/llvm-project@556ea5265a25
Updates LLVM usage to match [556ea5265a25](llvm/llvm-project@556ea5265a25) PiperOrigin-RevId: 699251575
Commit f888384
-
[xla:collectives] Add backends/gpu/collectives:nccl_communicator
NCCL implementation detail will have private visibility, and for all external users (Thunks etc.) we'll export it via public header that uses xla/core/collectives APIs. PiperOrigin-RevId: 699256314
Commit eaf0194
-
Add support for TensorV1Attr in flatbuffer_export and flatbuffer_operator, encoded as follows
```
_TENSOR_V1_<name>: {
  TENSOR_SHAPE: Vector<i64>,
  TENSOR_TYPE: tflite::TensorType (cast to i64),
  TENSOR_DATA: Vector<f32> or Vector<i64>
}
```
PiperOrigin-RevId: 699272982
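As a rough illustration of the layout above, the following sketch builds the same nested structure as a plain Python dictionary. It does not use the flatbuffer or flexbuffer APIs; `encode_tensor_v1` is a hypothetical helper, and the integer passed as `tensor_type` stands in for whatever `tflite::TensorType` value the caller already has.

```python
def encode_tensor_v1(name, shape, tensor_type, data):
    """Build a dict mirroring the _TENSOR_V1_<name> layout."""
    data = list(data)
    all_ints = all(isinstance(v, int) for v in data)
    all_floats = all(isinstance(v, float) for v in data)
    # The layout allows either a vector of floats or a vector of integers.
    if not (all_ints or all_floats):
        raise TypeError("TENSOR_DATA must be all floats or all integers")
    return {
        f"_TENSOR_V1_{name}": {
            "TENSOR_SHAPE": [int(d) for d in shape],
            "TENSOR_TYPE": int(tensor_type),  # enum value stored as an integer
            "TENSOR_DATA": data,
        }
    }


# `0` is only a placeholder enum value for this example.
encoded = encode_tensor_v1("bias", shape=[2], tensor_type=0, data=[0.5, 1.5])
print(encoded)
```

Keeping the attribute name inside the key mirrors the `_TENSOR_V1_<name>` prefix shown in the commit message, so several tensor attributes can sit side by side in one map.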
Commit 807e6fd
-
[IFRT] Add VIFRT pass for converting between VIFRT versions.
The pass runs over a VIFRT module, and tries to convert it to a given target version. PiperOrigin-RevId: 699279298
Commit 65f6a91
-
[xla:collectives] Use NcclCommunicator in NcclApi implementation
PiperOrigin-RevId: 699279921
Commit 9e800e8
-
[xla:collectives] Remove unused CommDestroy
PiperOrigin-RevId: 699286343
Commit d1ebdd9
-
[Code-Health] Resolve the following technical debt issue: Todo(resolved)
PiperOrigin-RevId: 699309235
Commit d479650
Commits on Nov 23, 2024
-
Use absl::Nonnull to indicate that sharding in xla::ifrt::ArraySpec cannot be null
PiperOrigin-RevId: 699310290
Commit 45abed3
-
Add all the quantized models to test_models constants and try to unify the naming.
PiperOrigin-RevId: 699317601
Commit b9f2ce2
-
Update target_config to be a text proto and populate it on the
StreamExecutorGpuClient topology description as well. PiperOrigin-RevId: 699320139
Commit c9d9178
-
Remove absl::Nonnull from AbslStringify
nullptr is handled here. PiperOrigin-RevId: 699323007
Commit 00a86aa
-
Add support for quantization in litert model load
Also:
* Add some helper functions for checking that a litert op matches a tfl op, which can also be re-used in other contexts.
* Add some quantization related helper functions to flatbuffer_tools
* Update dump for quantization
* Move things around a bit and add quantization stuff to model_util support checks
PiperOrigin-RevId: 699333588
Commit ebc16ff
-
[xla:collectives] NFC: Remove communicator aliases from NcclApi
PiperOrigin-RevId: 699337598
Commit 9f1c1aa
-
Add target_config as an optional field of
StreamExecutorGpuTopologyDescription rather than parsing it for every compile. PiperOrigin-RevId: 699344815
Commit 2358eff
-
`MoveUserInstructionsIn` cannot handle conditional operations with array output and multiple users. It may trigger a compilation error, as in the added test target.
PiperOrigin-RevId: 699357851
Commit f06a547
-
PiperOrigin-RevId: 699361885
Commit 6a6dd7c
-
Commit 57c775e
-
Commit 3c02a74
-
Move BatchedGatherScatterNormalizer from pre-SPMD to post-SPMD.
PiperOrigin-RevId: 699397857
Commit c3fd63e
-
Pull the zip functions into a public header
PiperOrigin-RevId: 699409569
Commit d13a02a
-
Commit 5ca90b0
-
Merge pull request #79777 from tensorflow:gaikwadrahul8-patch-1
PiperOrigin-RevId: 699496299
Commit e88d3b3
-
Merge pull request #80574 from tensorflow:gaikwadrahul8-patch-2
PiperOrigin-RevId: 699497695
Commit c5f0512
-
Reverts c5f0512 PiperOrigin-RevId: 699499360
Commit f377b15