[pull] master from tensorflow:master #238

Open
wants to merge 1,131 commits into
base: master
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Nov 19, 2024

  1. Replace MockGpuExecutor with MockStreamExecutor in the only use.

    PiperOrigin-RevId: 698049836
    klucke authored and tensorflower-gardener committed Nov 19, 2024
    9aba913
  2. Refactor and repurpose the existing fold_broadcast_to_pass to handle ALL broadcast-like inputs on TFLite ops that support implicit broadcasting
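As a rough illustration of when a broadcast-like input can be folded into an op that broadcasts implicitly, the sketch below compares the result shape with and without the explicit broadcast. The function names are hypothetical and the condition is an assumption made for illustration; it is not taken from the pass itself.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Return the NumPy-style broadcast of two shapes, or None if incompatible."""
    out = []
    # Walk both shapes from the trailing dimension, padding the shorter with 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            return None
    return tuple(reversed(out))


def can_fold_broadcast(input_shape, target_shape, other_shape):
    """True if op(broadcast_to(x, target), y) has the same shape as op(x, y).

    Assumption: the op broadcasts its operands implicitly, so an equal
    result shape means the explicit broadcast is redundant.
    """
    with_broadcast = broadcast_shapes(target_shape, other_shape)
    without_broadcast = broadcast_shapes(input_shape, other_shape)
    return with_broadcast is not None and with_broadcast == without_broadcast
```

For example, broadcasting a `(1, 3)` input to `(4, 3)` before combining it with a `(4, 3)` operand is redundant, while doing so before combining it with a `(1, 3)` operand is not, because the explicit broadcast changes the result shape.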
    
    PiperOrigin-RevId: 698054216
    vamsimanchala authored and tensorflower-gardener committed Nov 19, 2024
    9b76733
  3. Improve QC compiler plugin configurations.

    * Change the default QNN graph config to use the HTP FP16 precision backend config; this is required to compile FP32 ops correctly.
    * Create a 1-element 1D tensor out of a scalar value, because QNN ops always take a ranked tensor type as input.
    
    PiperOrigin-RevId: 698081261
    tensorflower-gardener committed Nov 19, 2024
    8a87e42
  4. Migrate LegalizeTensorListPass to new TFL::Pass mechanism and remove the .td definition.
    
    PiperOrigin-RevId: 698082807
    vamsimanchala authored and tensorflower-gardener committed Nov 19, 2024
    f0b88da
  5. Add two simple legalizations and cleanup.

     * Add FC Op legalization and test data.
     * Add Select/Select_v2 Op legalization.
     * Misc cleanups.
    
    PiperOrigin-RevId: 698094953
    tensorflower-gardener committed Nov 19, 2024
    9147aa3
  6. Internal change only

    PiperOrigin-RevId: 698111562
    SiqiaoWu1993 authored and tensorflower-gardener committed Nov 19, 2024
    8916a40
  7. Add a moduleop to the MlirToHloArgs and enable compilation without serializing any modules.
    
    Also pulled the deserialization a little further up the stack and only do it if the input doesn't already have a full module op.
    
    PiperOrigin-RevId: 698116466
    tensorflower-gardener committed Nov 19, 2024
    aa88ff7
  8. Move stable hlo compile test to XLA:CPU public API

    PiperOrigin-RevId: 698121993
    changm authored and tensorflower-gardener committed Nov 19, 2024
    7bd28c8
  9. Add test cases for QC compiler plugin.

    PiperOrigin-RevId: 698132925
    tensorflower-gardener committed Nov 19, 2024
    ac16d7f
  10. Remove unneeded xla:statusor dependency.

    PiperOrigin-RevId: 698133024
    klucke authored and tensorflower-gardener committed Nov 19, 2024
    0fd96fa
  11. [Upkeep][XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
    
    PiperOrigin-RevId: 698133747
    Varcho authored and tensorflower-gardener committed Nov 19, 2024
    509b91c
  12. Update the default Python version to 3.11

    This fixes an issue with gsutil, which expects Python 2.7 or 3.5-3.11:
    ```
    Error: gsutil requires Python version 2.7 or 3.5-3.11, but a different version is installed.
    ```
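The version range in the error message can be expressed as a small check. This is only a sketch of the constraint as quoted above; `gsutil_supports` is a hypothetical helper, not gsutil's own code.

```python
def gsutil_supports(version):
    """Check a (major, minor) tuple against the range quoted in the error.

    The range (2.7 or 3.5-3.11) is copied from the error message; the
    function itself is illustrative only.
    """
    major, minor = version[0], version[1]
    if major == 2:
        return minor == 7
    return major == 3 and 5 <= minor <= 11
```

Passing `sys.version_info` to the helper reports whether the running interpreter falls inside that range.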
    PiperOrigin-RevId: 698134102
    nitins17 authored and tensorflower-gardener committed Nov 19, 2024
    622ea93
  13. Fold FillOp into TFL Ops that support implicit broadcasting.

    PiperOrigin-RevId: 698137647
    vamsimanchala authored and tensorflower-gardener committed Nov 19, 2024
    d040ef2
  14. Add a few more MLIR-based test models

    PiperOrigin-RevId: 698150097
    LukeBoyer authored and tensorflower-gardener committed Nov 19, 2024
    09a9566
  15. Add backend kwargs to xla tests.

    PiperOrigin-RevId: 698163185
    tensorflower-gardener committed Nov 19, 2024
    feeb338
  16. [XLA:MSA] Remove unnecessary Extend() call in memory space assignment.

    This Extend() call would also lead to a memory assignment issue since it wasn't accompanied by the necessary chunk commit requests. We also add a VerifyAllocations() function that uses a BufferIntervalTree to check for overlapping Allocations before scheduling the asynchronous copies. This is an extra check for the correctness of MsaAlgorithm allocations, and is only applied if options_.verify is enabled in MSA options. options_.verify is disabled by default.
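To make the overlap check concrete, here is a minimal sketch of the idea: two allocations conflict when they overlap both in time and in memory offset. It uses a simple sweep over allocations sorted by start time rather than an interval tree, and the `Allocation` fields and `find_conflict` name are assumptions for illustration, not XLA's data structures.

```python
from typing import NamedTuple


class Allocation(NamedTuple):
    name: str
    start_time: int  # inclusive
    end_time: int    # inclusive
    offset: int      # first byte of the chunk
    size: int        # chunk size in bytes


def _overlaps(a, b):
    """True if a and b are live at the same time and share at least one byte."""
    time_overlap = a.start_time <= b.end_time and b.start_time <= a.end_time
    space_overlap = a.offset < b.offset + b.size and b.offset < a.offset + a.size
    return time_overlap and space_overlap


def find_conflict(allocations):
    """Return the names of the first conflicting pair, or None if there is none."""
    active = []
    for alloc in sorted(allocations, key=lambda a: a.start_time):
        # Drop allocations that ended before this one starts.
        active = [a for a in active if a.end_time >= alloc.start_time]
        for other in active:
            if _overlaps(other, alloc):
                return (other.name, alloc.name)
        active.append(alloc)
    return None
```

An interval tree answers the same question with better asymptotic cost when many allocations are live at once; the sweep above is only meant to show what counts as a conflict.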
    
    PiperOrigin-RevId: 698164396
    mehrdadkhani authored and tensorflower-gardener committed Nov 19, 2024
    37fa2bb
  17. Add quantized OP in test data.

    PiperOrigin-RevId: 698164750
    tensorflower-gardener committed Nov 19, 2024
    41d42d8
  18. Move some tests to public XLA:CPU API

    PiperOrigin-RevId: 698164921
    changm authored and tensorflower-gardener committed Nov 19, 2024
    9df12f0
  19. [IFRT] Legalize IFRT dialect into VIFRT dialect.

    This change adds the legalization pass from IFRT to VIFRT. Legalization uses a templated OpConversion class, which is refined via the `IFRT` <-> `VIFRT` and `mlir::Func::*` <-> `VIFRT` op mappings defined in `map_ifrt_to_vifrt.h`. The change also versions `mlir::func::FuncOp`, `mlir::func::ReturnOp` and `mlir::func::CallOp`, which has two advantages: 1) we can use the templated OpConversion class rather than implementing a separate converter for each op, and 2) we can restrict the surface of possible breaking changes to just builtin types and attributes. Moreover, the change versions `mlir::FunctionType` and `mlir::TypeAttr` so that the generic Op converter can be used and so that the types allowed in functions are restricted to builtin and IFRT types.
    
    PiperOrigin-RevId: 698168526
    ICGog authored and tensorflower-gardener committed Nov 19, 2024
    4f86f56

Commits on Nov 20, 2024

  1. Add a test to check C header compiler compatibility

    Also fixed invalid C++ header usage.
    
    PiperOrigin-RevId: 698170878
    terryheo authored and tensorflower-gardener committed Nov 20, 2024
    af5962f
  2. Migrate WhileOutlinePass to new TFL::Pass mechanism and remove the .td definition.
    
    PiperOrigin-RevId: 698171237
    vamsimanchala authored and tensorflower-gardener committed Nov 20, 2024
    ac8fea5
  3. Remove unneeded use of gpu_types.h in topk_kernel_test.cc.

    PiperOrigin-RevId: 698174417
    klucke authored and tensorflower-gardener committed Nov 20, 2024
    dcf6c7a
  4. Reverts feeb338

    PiperOrigin-RevId: 698189797
    pizzud authored and tensorflower-gardener committed Nov 20, 2024
    8ceaba2
  5. Remove dead ShapeContainsToken in HLO verifier

    PiperOrigin-RevId: 698196106
    frgossen authored and tensorflower-gardener committed Nov 20, 2024
    caa1197
  6. legalization_op_config: Delete unused IsOpLegalizedWithMlir.

    PiperOrigin-RevId: 698201598
    pizzud authored and tensorflower-gardener committed Nov 20, 2024
    6ec3612
  7. Move StableHLO test to public XLA:CPU PJRT plugin

    PiperOrigin-RevId: 698212499
    changm authored and tensorflower-gardener committed Nov 20, 2024
    43227b2
  8. Internal change only

    PiperOrigin-RevId: 698218778
    SiqiaoWu1993 authored and tensorflower-gardener committed Nov 20, 2024
    79dee4c
  9. Add a new 'priority_merge' mixed priority batching policy.

    PiperOrigin-RevId: 698221629
    tensorflower-gardener committed Nov 20, 2024
    c3ca149
  10. Stop using gpu_types.h where it's not needed.

    PiperOrigin-RevId: 698228306
    klucke authored and tensorflower-gardener committed Nov 20, 2024
    5158891
  11. [Upkeep][XLA-Code-Health] Resolve the following technical debt issue: Todo(resolved)
    
    PiperOrigin-RevId: 698230798
    Varcho authored and tensorflower-gardener committed Nov 20, 2024
    fcbd379
  12. Cleanup. Refactor GetGatherScatterBatchParallelDims. No behavior change.

    PiperOrigin-RevId: 698230884
    ZixuanJiang authored and tensorflower-gardener committed Nov 20, 2024
    f06fbf0
  13. Update ops-related pbtxt files.

    PiperOrigin-RevId: 698237370
    tensorflower-gardener committed Nov 20, 2024
    d83bca1
  14. Migrate UnfoldLargeSplatConstantPass to new TFL::Pass mechanism and remove the .td definition.
    
    PiperOrigin-RevId: 698241447
    vamsimanchala authored and tensorflower-gardener committed Nov 20, 2024
    a559db0
  15. Move compiler plugin unique ptr alias to cc api. Also use string view for bytes returned from plugin in tests to avoid a copy
    
    PiperOrigin-RevId: 698251740
    LukeBoyer authored and tensorflower-gardener committed Nov 20, 2024
    42fed64
  16. [XLA:SPMD] Add HLO annotation to disable collective matmul in SPMD.

    PiperOrigin-RevId: 698271808
    seherellis authored and tensorflower-gardener committed Nov 20, 2024
    3efd256
  17. Update GraphDef version to 2052.

    PiperOrigin-RevId: 698294876
    tensorflower-gardener committed Nov 20, 2024
    0d82241
  18. compat: Update forward compatibility horizon to 2024-11-20

    PiperOrigin-RevId: 698294898
    tensorflower-gardener committed Nov 20, 2024
    e1a2b83
  19. Add a pattern matcher for ragged dot HLO.

    PiperOrigin-RevId: 698297679
    pravnar authored and tensorflower-gardener committed Nov 20, 2024
    1de2011
  20. Remove custom PTX compilation pipeline from RedzoneAllocator

    We have support for lowering PTX in the runtime, so we can just
    use `MultiKernelLoaderSpec` and we get compilation and caching for free.
    
    PiperOrigin-RevId: 698297929
    beckerhe authored and tensorflower-gardener committed Nov 20, 2024
    126b347
  21. Account for optional channel ID in send/recv error message

    PiperOrigin-RevId: 698302393
    frgossen authored and tensorflower-gardener committed Nov 20, 2024
    9bd61d5
  22. Remove custom compilation call from DynamicSharedMemoryTest

    This is not needed since the runtime can compile PTX for us. Actually I'm surprised
    that this even worked because this original code compiled PTX into CUBIN and then
    forced the CUBIN into the PTX argument in the kernel creation helper. But this is
    now all fixed.
    
    PiperOrigin-RevId: 698303129
    beckerhe authored and tensorflower-gardener committed Nov 20, 2024
    b6a2f70
  23. Make g_trace_filter_bitmap atomic to avoid race across threads.

    PiperOrigin-RevId: 698304950
    tensorflower-gardener committed Nov 20, 2024
    23bda2a
  24. Remove unused GpuAsmOpts parameter from Cholesky and TriangularSolveThunks
    
    Also the usual drive-by cleanups:
    - Remove unused includes
    - Add explicit includes for things we depended on transitively
    - Clean up dependencies of the build targets
    
    PiperOrigin-RevId: 698317755
    beckerhe authored and tensorflower-gardener committed Nov 20, 2024
    fa7483b
  25. delete hlo-legalize-to-memref-unranked.mlir

    func-bufferize pass is removed by llvm/llvm-project@e394fec
    
    PiperOrigin-RevId: 698322658
    metaflow authored and tensorflower-gardener committed Nov 20, 2024
    0156f11
  26. [XLA:GPU] Combine pipelined instructions as much as possible by default.

    We turn on previously implemented heuristics by default.
    
    PiperOrigin-RevId: 698324486
    golechwierowicz authored and tensorflower-gardener committed Nov 20, 2024
    d256bac
  27. Move SparseDotMetaEncodingAttr inside xla

    PiperOrigin-RevId: 698329980
    tensorflower-gardener committed Nov 20, 2024
    43b2f90
  28. PR #19363: Loop Counter Increment in Collective Pipeliner

    Imported from GitHub PR openxla/xla#19363
    
    Sets the loop iteration counter increment in the backward transformation of the collective pipeliner pass to account for cases with non-zero initial value of the loop iteration counter. See #16953 and #18568.
    Copybara import of the project:
    
    --
    06137aa0618d372e2d4badbf16920bead9922cfb by Philipp Hack <[email protected]>:
    
    Modifies the loop counter increment set in the backward transformation of the collective pipeliner.
    
    --
    6da45bcb26643d8994bf608f05230fa748286b02 by Philipp Hack <[email protected]>:
    
    Modifies the loop counter increment set in the backward transformation of the collective pipeliner.
    
    Merging this change closes #19363
    
    PiperOrigin-RevId: 698342374
    philipphack authored and tensorflower-gardener committed Nov 20, 2024
    09adcb3
  29. Multiple subgraphs may share the same delegate.

    Dequantized static data is cached. However, when there are multiple subgraphs, the data is overwritten by each subgraph.
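The failure mode described here can be shown with a toy cache. In the sketch below, keying cached data by tensor index alone lets a second subgraph overwrite the first one's entry, while keying by `(subgraph, tensor)` keeps them apart. `DequantCache` and its keying scheme are hypothetical and only illustrate the class of bug; they do not describe the actual fix.

```python
class DequantCache:
    """Toy cache of dequantized static tensors shared by one delegate."""

    def __init__(self, key_by_subgraph):
        self._key_by_subgraph = key_by_subgraph
        self._data = {}

    def _key(self, subgraph, tensor):
        # Without the subgraph in the key, equal tensor indices collide.
        return (subgraph, tensor) if self._key_by_subgraph else tensor

    def put(self, subgraph, tensor, values):
        self._data[self._key(subgraph, tensor)] = values

    def get(self, subgraph, tensor):
        return self._data[self._key(subgraph, tensor)]


# Two subgraphs both cache data for their own tensor index 3.
colliding = DequantCache(key_by_subgraph=False)
colliding.put(0, 3, [1.0])
colliding.put(1, 3, [2.0])   # overwrites subgraph 0's entry

separate = DequantCache(key_by_subgraph=True)
separate.put(0, 3, [1.0])
separate.put(1, 3, [2.0])    # stored under a distinct key
```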
    
    PiperOrigin-RevId: 698342673
    alankelly authored and tensorflower-gardener committed Nov 20, 2024
    4bc54e6
  30. PR #19393: [GPU] Horizontal loop fusion: pass bitcasts when looking for fusion candidates.
    
    Imported from GitHub PR openxla/xla#19393
    
    Copybara import of the project:
    
    --
    ec107a12fbee6826f1f668218b7c7a40f5886420 by Ilia Sergachev <[email protected]>:
    
    [GPU] Horizontal loop fusion: pass bitcasts when looking for fusion candidates.
    
    --
    71241097ce67412246ec18efca5165619601eace by Ilia Sergachev <[email protected]>:
    
    simplify cuDNN norm test
    
    Merging this change closes #19393
    
    PiperOrigin-RevId: 698357838
    sergachev authored and tensorflower-gardener committed Nov 20, 2024
    c76ae32
  31. [TritonGPU] Add DotLike trait to SparseDotOp

    PiperOrigin-RevId: 698360931
    tensorflower-gardener committed Nov 20, 2024
    c36f39c
  32. Use OpTrait::DotLike to identify dot-like operations

    PiperOrigin-RevId: 698372450
    tensorflower-gardener committed Nov 20, 2024
    3d74da5
  33. PR #19484: [ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test

    Imported from GitHub PR openxla/xla#19484
    
    Copybara import of the project:
    
    --
    0d307384bff386d5182f89ae5a5422f8ca1a1290 by Dragan Mladjenovic <[email protected]>:
    
    [ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test
    
    Merging this change closes #19484
    
    PiperOrigin-RevId: 698374588
    draganmladjenovic authored and tensorflower-gardener committed Nov 20, 2024
    499a186
  34. [Triton] Restrict block_m to be > 16 in the GEMM autotuner to resolve CUDA_ERROR_ILLEGAL_ADDRESS in (micro)benchmarks with FP8 Triton kernels during exhaustive autotuning.
    
    PiperOrigin-RevId: 698387396
    Moerafaat authored and tensorflower-gardener committed Nov 20, 2024
    9ee8948
  35. [XLA:ALGEBRAIC_SIMPLIFIER] Turn constant all-gather into broadcast

    PiperOrigin-RevId: 698388778
    blakehechtman authored and tensorflower-gardener committed Nov 20, 2024
    48a554c
  36. ed9291f
  37. Merge sparsity_layout.patch into sparse_dot.patch

    PiperOrigin-RevId: 698389323
    tensorflower-gardener committed Nov 20, 2024
    ad81c08
  38. Prevent dequantizing/requantizing f16 to f32 and back.

    This change:
    
     1. Identifies all `kTfLiteBuiltinDequantize` nodes converting `kTfLiteFloat16` to `kTfLiteFloat32` and plugging into a `kTfLiteBuiltinFullyConnected`, `kTfLiteBuiltinConv2d`, or `kTfLiteBuiltinDepthwiseConv2d` node.
     2. Re-maps XNNPACK tensors pointing to the `kTfLiteFloat32` output to point to the original `kTfLiteFloat16` input.
    
    The `kTfLiteFloat16` weights/filters and biases are handled by XNNPACK directly.
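The two steps can be sketched on a toy graph representation. Everything below (the dict-based node format, the string op names, and `remap_f16_dequantize`) is a hypothetical stand-in for the delegate's real data structures. The sketch also assumes a dequantize is skipped only when every consumer of its output is one of the three listed ops, which the description above does not spell out.

```python
FLOAT16, FLOAT32 = "f16", "f32"
F16_CONSUMERS = {"FULLY_CONNECTED", "CONV_2D", "DEPTHWISE_CONV_2D"}


def remap_f16_dequantize(nodes, tensor_types):
    """Return {f32 output tensor: f16 input tensor} for skippable dequantizes.

    nodes: list of dicts with "op", "inputs" and "outputs" (tensor ids).
    tensor_types: dict mapping tensor id to FLOAT16 or FLOAT32.
    """
    # Step 0: index which ops consume each tensor.
    consumers = {}
    for node in nodes:
        for tensor in node["inputs"]:
            consumers.setdefault(tensor, []).append(node["op"])

    remap = {}
    for node in nodes:
        # Step 1: find f16 -> f32 dequantize nodes feeding FC/conv ops.
        if node["op"] != "DEQUANTIZE":
            continue
        src, dst = node["inputs"][0], node["outputs"][0]
        if tensor_types[src] != FLOAT16 or tensor_types[dst] != FLOAT32:
            continue
        users = consumers.get(dst, [])
        if users and all(op in F16_CONSUMERS for op in users):
            # Step 2: point users of the f32 output at the f16 input.
            remap[dst] = src
    return remap
```

A consumer would then look up each of its input tensors in the returned map and read the original f16 tensor instead of the dequantized copy.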
    
    PiperOrigin-RevId: 698395748
    tensorflower-gardener committed Nov 20, 2024
    4fc4983
  39. Cleanup. Simplify the gather/scatter related functions in hlo_sharding_util by using `PropagateShardingAlongDimsAndReplicateOthers`. This is a no-op change.
    
    PiperOrigin-RevId: 698403022
    ZixuanJiang authored and tensorflower-gardener committed Nov 20, 2024
    58b5cf4
  40. Update docs to make use of new API for adding a TfLiteRegistrationExternal to a MutableOpResolver.
    
    PiperOrigin-RevId: 698407108
    fergushenderson authored and tensorflower-gardener committed Nov 20, 2024
    86b52e2
  41. Rename a Google Tensor field

    PiperOrigin-RevId: 698410181
    tensorflower-gardener committed Nov 20, 2024
    f635a02
  42. [XLA:GPU] Adjust GetNumWarps heuristic in Tiled Cost Model.

    We need to adjust the heuristic because our emitter previously had an issue that prevented Triton from doing proper layout optimizations. It was fixed in openxla/xla@7280b9a.
    
    We previously needed a higher number of warps (up to 32) to compensate for the missing layout optimization, but that can now cause performance regressions, because Triton tends to insert shared-memory usage and barrier syncs.
    
    PiperOrigin-RevId: 698416298
    olegshyshkov authored and tensorflower-gardener committed Nov 20, 2024
    6bdf799
  43. Do not allow overlap between explicit and implicit batching dims in gather/scatter instructions. Implicit batching dims are also known as index parallel dims.
    
    Update `GetGatherScatterBatchParallelDims` accordingly. The sharding propagation and spmd partitioner will process explicit and implicit batching dims separately.
    
    PiperOrigin-RevId: 698421986
    ZixuanJiang authored and tensorflower-gardener committed Nov 20, 2024
    870f8ff
  44. [xla:cpu] Add initial implementation of NanoRt backends for XLA:CPU

    Minimal XLA:CPU runtime implementation optimized for low latency inference.
    
    --------------------------------------------------------------
    Benchmark                    Time             CPU   Iterations
    --------------------------------------------------------------
    BM_NanoRtAddScalars       84.8 ns         84.8 ns      8277118
    BM_NanoRtFibonacci        81.1 ns         81.1 ns      8468298
    BM_PjRtAddScalars         1517 ns         1517 ns       460076
    BM_PjRtFibonacci          1523 ns         1523 ns       460415
    
    PiperOrigin-RevId: 698426607
    ezhulenev authored and tensorflower-gardener committed Nov 20, 2024
    bbaf53b
  45. Remove unneeded xla:status and xla::statusor dependencies.

    PiperOrigin-RevId: 698427377
    klucke authored and tensorflower-gardener committed Nov 20, 2024
    311693a
  46. [tsl] CountDownAsyncValueRef: enforce memory ordering around fetch_sub

    name                     old cpu/op   new cpu/op   delta
    BM_CountDownSuccess/4    97.6ns ± 2%  97.9ns ± 1%    ~     (p=0.841 n=5+5)
    BM_CountDownSuccess/8     123ns ± 2%   122ns ± 1%    ~     (p=0.548 n=5+5)
    BM_CountDownSuccess/16    171ns ± 1%   172ns ± 2%    ~     (p=0.548 n=5+5)
    BM_CountDownSuccess/32    270ns ± 1%   271ns ± 1%    ~     (p=0.310 n=5+5)
    BM_CountDownError/4       215ns ± 1%   212ns ± 3%    ~     (p=0.310 n=5+5)
    BM_CountDownError/8       309ns ± 2%   307ns ± 1%    ~     (p=0.421 n=5+5)
    BM_CountDownError/16      500ns ± 1%   496ns ± 2%    ~     (p=0.421 n=5+5)
    BM_CountDownError/32      888ns ± 1%   885ns ± 2%    ~     (p=0.548 n=5+5)
    
    PiperOrigin-RevId: 698431683
    cota authored and tensorflower-gardener committed Nov 20, 2024
    6602db5
  47. [Code-Health] Resolve the following technical debt issue: Todo(resolved) in CUDA BUILD file.
    
    PiperOrigin-RevId: 698444212
    Varcho authored and tensorflower-gardener committed Nov 20, 2024
    81b5c3a
  48. [XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
    
    PiperOrigin-RevId: 698444339
    Varcho authored and tensorflower-gardener committed Nov 20, 2024
    90b9cff
  49. [Code-Health] Resolve the following technical debt issue:

    	Todo(resolved)
    
    PiperOrigin-RevId: 698445171
    Varcho authored and tensorflower-gardener committed Nov 20, 2024
    fe84b76
  50. [Code-Health] Resolve the following technical debt issue:

    	Todo(resolved)
    
    PiperOrigin-RevId: 698445177
    Varcho authored and tensorflower-gardener committed Nov 20, 2024
    3e19c4b
  51. Refactor exhaustive_test_main into a separate library target

    PiperOrigin-RevId: 698448973
    IllogicalMoose authored and tensorflower-gardener committed Nov 20, 2024
    ab446e5
  52. Internal CI/CD change

    PiperOrigin-RevId: 698452075
    changm authored and tensorflower-gardener committed Nov 20, 2024
    5743cb9
  53. [XLA:CPU] Add benchmarks for 2D strided convolutions

    Currently the transposed convolution is orders of magnitude slower than the regular one. Ideally performance should be similar. Detailed results:
    
    ----------------------------------------------------------------------------------
    Benchmark                                        Time             CPU   Iterations
    ----------------------------------------------------------------------------------
    BM_Conv2DStrided/process_time              3737222 ns     41608631 ns           16
    BM_Conv2DTransposedStrided/process_time  590079914 ns   1.0847e+10 ns            1
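For a sense of scale, the slowdown can be computed directly from the two rows above. The snippet only does arithmetic on the reported numbers; the variable names are illustrative.

```python
# Times copied from the benchmark table above, in nanoseconds.
conv_wall_ns, conv_cpu_ns = 3_737_222, 41_608_631
transposed_wall_ns, transposed_cpu_ns = 590_079_914, 1.0847e10

# Ratio of transposed to regular strided convolution.
wall_ratio = transposed_wall_ns / conv_wall_ns
cpu_ratio = transposed_cpu_ns / conv_cpu_ns
print(f"wall time: {wall_ratio:.0f}x slower, CPU time: {cpu_ratio:.0f}x slower")
```

On these numbers the transposed convolution is roughly 158x slower in wall time and roughly 261x slower in CPU time, which is about two orders of magnitude.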
    
    PiperOrigin-RevId: 698453016
    Adam-Banas authored and tensorflower-gardener committed Nov 20, 2024
    c895d87
  54. Fix comments in convolution_test_1d.cc

    The correct output dimension when dumped to HLO text is `bf0`, where `f` means the output feature dimension. There is no dimension called `o`.
    
    PiperOrigin-RevId: 698453240
    Adam-Banas authored and tensorflower-gardener committed Nov 20, 2024
    f98c25e
  55. [tsl:concurrency] Keep AsyncValueRef a part of CountDownAsyncValueRef State
    
    By keeping AsyncValueRef as a part of the State we avoid one extra reference counting operation when copying CountDownAsyncValue (and we expect to copy it `cnt` times).
    
    name                     old cpu/op   new cpu/op   delta
    BM_CountDownSuccess/8    95.8ns ± 4%  81.7ns ± 1%  -14.64%  (p=0.000 n=40+35)
    BM_CountDownSuccess/16    142ns ± 1%   127ns ± 1%  -10.05%  (p=0.000 n=37+38)
    BM_CountDownSuccess/32    229ns ± 2%   216ns ± 1%   -5.56%  (p=0.000 n=40+38)
    BM_CountDownError/4       165ns ± 1%   152ns ± 2%   -7.65%  (p=0.000 n=39+40)
    BM_CountDownError/8       238ns ± 2%   225ns ± 1%   -5.65%  (p=0.000 n=40+38)
    BM_CountDownError/16      388ns ± 2%   369ns ± 2%   -4.77%  (p=0.000 n=40+36)
    BM_CountDownError/32      684ns ± 1%   666ns ± 1%   -2.50%  (p=0.000 n=38+38)
    
    PiperOrigin-RevId: 698454410
    ezhulenev authored and tensorflower-gardener committed Nov 20, 2024
    4446af5
  56. Forgot to reset the map of skipped f16->f32 dequantizations between calls to `Delegate::PrepareOpsToDelegate`.
    
    PiperOrigin-RevId: 698455074
    tensorflower-gardener committed Nov 20, 2024
    555a259
  57. Reverts f62e0d0

    PiperOrigin-RevId: 698462670
    BlaziusMaximus authored and tensorflower-gardener committed Nov 20, 2024
    cf6e234
  58. [xla:cpu] Replace Thunk::ExecuteEvent with tsl::CountDownAsyncValueRef

    PiperOrigin-RevId: 698466696
    ezhulenev authored and tensorflower-gardener committed Nov 20, 2024
    39fa07c
  59. PR #19237: [GPU] Fix passing of key-value store handle from client to compiler.
    
    Imported from GitHub PR openxla/xla#19237
    
    Copybara import of the project:
    
    --
    177f911fd4c6af86c25aba2e38ea09767477be03 by Ilia Sergachev:
    
    [GPU] Fix passing of key-value store handle from client to compiler.
    
    --
    ec2b96ccdf8cd81abdc25f3cff2bdf65df455219 by Ilia Sergachev:
    
    use allowed_devices instead of CUDA_VISIBLE_DEVICES
    
    --
    77ba9fd7b172052269fafd1a1970d58d1d803a59 by Ilia Sergachev:
    
    skip the added test on pre-Ampere GPUs
    
    Merging this change closes #19237
    
    PiperOrigin-RevId: 698469112
    sergachev authored and tensorflower-gardener committed Nov 20, 2024
    031f776
  60. [XLA:GPU] Use HloPredicateIsOp in collective_select_folder

    PiperOrigin-RevId: 698473752
    frgossen authored and tensorflower-gardener committed Nov 20, 2024
    40526ad
  61. Remove unused gpu_types.h include from nccl_collective_thunk.cc

    PiperOrigin-RevId: 698474274
    beckerhe authored and tensorflower-gardener committed Nov 20, 2024
    20222d8
  62. [XLA:GPU] Use ShuffleOp to reverse the order of elements in a vector.

    No functional change is intended but it generates less IR.
    
    PiperOrigin-RevId: 698477060
    majnemer authored and tensorflower-gardener committed Nov 20, 2024
    464bbc2
  63. [XLA:TPU:MSA] Refactor some utility functions from algorithm and buffer_interval_comparator into msa/utils.
    
    PiperOrigin-RevId: 698481492
    subhankarshah authored and tensorflower-gardener committed Nov 20, 2024
    b740329
  64. Add backend_kwargs to XLA tests config.

    PiperOrigin-RevId: 698485833
    tensorflower-gardener committed Nov 20, 2024
    48e48f9
  65. IFRT proxy optimization: Make more IFRT operations asynchronous.

    As of this CL, all array operations (except `IsDeleted()`) are asynchronous.
    
    This CL also makes the following drive-by changes:
    
    1. Version management is getting refactored to use an enum and a header file
    within /common.
    
    2. All error responses from the server (except connection
    terminations, which follow the previous behavior) are now printed out as a
    WARNING.
    
    PiperOrigin-RevId: 698491308
    tensorflower-gardener committed Nov 20, 2024
    928eb81
  66. Fix TSAN for new mixed priority unit tests.

    PiperOrigin-RevId: 698494677
    tensorflower-gardener committed Nov 20, 2024
    9d25d03
  67. Move next pluggable device to public XLA:CPU API

    PiperOrigin-RevId: 698502341
    changm authored and tensorflower-gardener committed Nov 20, 2024
    0adfb4b
  68. Add types to c api for quantization.

    Also add a bit more comments and re-organize some things.
    
    PiperOrigin-RevId: 698507311
    LukeBoyer authored and tensorflower-gardener committed Nov 20, 2024
    67f3f89
  69. Migrate SplitMergedOperandsPass to new TFL::Pass mechanism and remove the .td definition.
    
    PiperOrigin-RevId: 698510501
    vamsimanchala authored and tensorflower-gardener committed Nov 20, 2024
    a1fca06
  70. [IFRT] Fix signature of CreateIfrtVerifyDonationPass

    PiperOrigin-RevId: 698519430
    ICGog authored and tensorflower-gardener committed Nov 20, 2024
    1b65e9a
  71. Add AssertEq wrapper and switch assert funcs to use generalized function pointers. Check for correct union types in tensor cc api.
    
    PiperOrigin-RevId: 698529069
    LukeBoyer authored and tensorflower-gardener committed Nov 20, 2024
    98ec0ed
  72. [NFC] hlo_op_profiler_test: Internal testing change.

    PiperOrigin-RevId: 698529351
    pizzud authored and tensorflower-gardener committed Nov 20, 2024
    be03c16
  73. Optimize explicit broadcasting-like patterns for TFL_Select*Ops in TFLite.
    
    This CL optimizes explicit broadcasting-like patterns in TFLite, because TFLite Ops support implicit broadcasting.
    
    Also, this CL is moving the existing fusions on broadcast-to+select to the dedicated pass.
    
    The patterns are:
    - Fuse splat const into select op.
    - Fuse fill-op into select op.
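
Why such fusions are safe can be illustrated with a small standalone sketch. It is not TFLite code: `Select` is a hypothetical helper, and a one-element vector stands in for a splat constant or fill result. Because the op broadcasts the single value implicitly, the full-size broadcast tensor never has to be materialized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise select in which either branch may be a one-element vector
// that is broadcast implicitly against the condition.
std::vector<float> Select(const std::vector<bool>& cond,
                          const std::vector<float>& on_true,
                          const std::vector<float>& on_false) {
  assert(on_true.size() == cond.size() || on_true.size() == 1);
  assert(on_false.size() == cond.size() || on_false.size() == 1);
  std::vector<float> out(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i) {
    // A size-1 operand always reads element 0 instead of element i.
    float t = on_true.size() == 1 ? on_true[0] : on_true[i];
    float f = on_false.size() == 1 ? on_false[0] : on_false[i];
    out[i] = cond[i] ? t : f;
  }
  return out;
}
```

Calling `Select(cond, values, {0.0f})` gives the same result as first expanding `0.0f` to the shape of `cond`, which is the property the fusion relies on.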
    
    PiperOrigin-RevId: 698530501
    vamsimanchala authored and tensorflower-gardener committed Nov 20, 2024
    6c8ca3b
  74. Add subgraph name to model runtime info proto.

    PiperOrigin-RevId: 698531058
    tensorflower-gardener committed Nov 20, 2024
    ba53e1b
  75. [IFRT] Add pass to legalize VIFRT into IFRT.

    PiperOrigin-RevId: 698535976
    ICGog authored and tensorflower-gardener committed Nov 20, 2024
    32ab0e9
  76. Remove unused functions from ir_emission_utils.cc

    PiperOrigin-RevId: 698542878
    majnemer authored and tensorflower-gardener committed Nov 20, 2024
    579b594

Commits on Nov 21, 2024

  1. Reverts 4fc4983

    PiperOrigin-RevId: 698548746
    tensorflower-gardener committed Nov 21, 2024
    031dc7c
  2. [XLA:MSA] Fixes a bug in GetInefficientAllocationSites(allocation_values). The function was previously assuming allocation_values can never be empty.
    
    PiperOrigin-RevId: 698548828
    mehrdadkhani authored and tensorflower-gardener committed Nov 21, 2024
    afd233b
  3. Add step to constant_value and add support for multiplication while recursively calculating the range of an expression.
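
A minimal sketch of what range computation with a step and multiplication support could look like. `Range`, `MultiplyByConstant`, and `Multiply` are hypothetical names chosen for illustration, not the XLA classes touched by this commit, and overflow handling is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical closed integer range {min, min + step, ..., max}.
// A constant is a range with min == max and step == 0.
struct Range {
  int64_t min;
  int64_t max;
  int64_t step;
  bool IsConstant() const { return min == max; }
};

// Scales a stepped range by a constant. A negative multiplier flips the
// bounds; the step scales by the absolute value of the multiplier.
Range MultiplyByConstant(const Range& r, int64_t c) {
  assert(r.min <= r.max);
  int64_t a = r.min * c;
  int64_t b = r.max * c;
  return Range{std::min(a, b), std::max(a, b), r.step * (c < 0 ? -c : c)};
}

// General product: the bounds are the extremes of the four corner products.
// The step is conservatively reset to 1 unless one side is a constant.
Range Multiply(const Range& lhs, const Range& rhs) {
  if (rhs.IsConstant()) return MultiplyByConstant(lhs, rhs.min);
  if (lhs.IsConstant()) return MultiplyByConstant(rhs, lhs.min);
  int64_t p[] = {lhs.min * rhs.min, lhs.min * rhs.max,
                 lhs.max * rhs.min, lhs.max * rhs.max};
  return Range{*std::min_element(p, p + 4), *std::max_element(p, p + 4), 1};
}
```

For example, a loop index in `{0, 4, 8, 12}` multiplied by the constant `3` yields the range `{0, ..., 36}` with step `12`.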
    
    PiperOrigin-RevId: 698551804
    fhoushmand authored and tensorflower-gardener committed Nov 21, 2024
    605e172
  4. [XLA:GPU] Remove RewriteReductionsPass

    It is unused.
    
    PiperOrigin-RevId: 698554807
    majnemer authored and tensorflower-gardener committed Nov 21, 2024
    a3fda2b
  5. Implement getting per-tensor quantization in the c and cc api

    PiperOrigin-RevId: 698555795
    LukeBoyer authored and tensorflower-gardener committed Nov 21, 2024
    23a1f6b
  6. Expose SignatureRunner via interpreter.h

    PiperOrigin-RevId: 698557779
    tensorflower-gardener committed Nov 21, 2024
    aff0983
  7. [Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter.cc

    PiperOrigin-RevId: 698567794
    frgossen authored and tensorflower-gardener committed Nov 21, 2024
    0e5af71
  8. Add helper methods to add inputs/outputs to internal tensor def.

    PiperOrigin-RevId: 698572323
    LukeBoyer authored and tensorflower-gardener committed Nov 21, 2024
    613a4ac
  9. [IfOp] Call std::vector::reserve() on the args vector before copying input tensors to it.
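
The pattern named in the title can be sketched as follows. `FakeTensor` and `CopyArgs` are hypothetical stand-ins, and skipping the first input (treated here as the condition) is an assumption made for illustration. Reserving the final size up front lets the vector allocate once instead of reallocating as it grows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor; only the copy cost matters here.
struct FakeTensor {
  std::string data;
};

// Copies inputs[first..] into a freshly built args vector.
std::vector<FakeTensor> CopyArgs(const std::vector<FakeTensor>& inputs,
                                 std::size_t first) {
  assert(first <= inputs.size());
  std::vector<FakeTensor> args;
  // One allocation for the known final size, done before any element is added.
  args.reserve(inputs.size() - first);
  for (std::size_t i = first; i < inputs.size(); ++i) {
    args.push_back(inputs[i]);
  }
  return args;
}
```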
    
    PiperOrigin-RevId: 698572634
    mrry authored and tensorflower-gardener committed Nov 21, 2024
    26eafbb
  10. [XLA:TPU:MSA]

    * Add support for overriding cross program prefetch behavior.
    * Add support for filtering buffer intervals based on the uses of the buffer.
    * Add tests for overriding cross program prefetch behavior
    * Add tests for expanding filtering criteria.
    
    PiperOrigin-RevId: 698574108
    subhankarshah authored and tensorflower-gardener committed Nov 21, 2024
    c5b2880
  11. Move tsl/platform/{cloud,default,windows} to xla/tsl/platform

    PiperOrigin-RevId: 698575496
    ddunl authored and tensorflower-gardener committed Nov 21, 2024
    e435325
  12. Temporarily change the supported OP check function in QC Compiler plugin.
    
    PiperOrigin-RevId: 698585431
    tensorflower-gardener committed Nov 21, 2024
    0b2559d
  13. 9a46acc
  14. [xla:cpu] Resolve arguments/results/temp mapping from buffer assignment

    PiperOrigin-RevId: 698610190
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    6fb4e33
  15. [xla:cpu] Resolve constant buffers

    PiperOrigin-RevId: 698625663
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    39fb1ff
  16. 87766ec
  17. Add helper function to add new tensors to internal subgraph.

    PiperOrigin-RevId: 698644067
    LukeBoyer authored and tensorflower-gardener committed Nov 21, 2024
    688cf8a
  18. [xla:cpu] Use CountDownAsyncValueRef in HostKernel state

    PiperOrigin-RevId: 698648940
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    399d155
  19. Copy constant buffer data for partitioned tensor; Copy option for partitioned Op.
    
    PiperOrigin-RevId: 698655747
    tensorflower-gardener committed Nov 21, 2024
    ea216b1
  20. [XLA:TPU:MSA] Remove redundant checks for cross_program_prefetches in memory_space_assignment tests.
    
    PiperOrigin-RevId: 698657506
    subhankarshah authored and tensorflower-gardener committed Nov 21, 2024
    96cb336
  21. [XLA:MSA] Allow more flexible filtering when picking instruction to schedule after/before for prefetch time override.
    
    PiperOrigin-RevId: 698666645
    subhankarshah authored and tensorflower-gardener committed Nov 21, 2024
    b130342
  22. Add Dispatch API for MediaTek

    PiperOrigin-RevId: 698671850
    tensorflower-gardener committed Nov 21, 2024
    b8b7854
  23. [XLA:GPU] Use DeviceDescription instead of GetDriverVersion in NVPTXCompiler
    
    NVPTXCompiler was calling `cuda::GetDriverVersion` to determine whether the CUDA driver is new enough to consider it for PTX JIT compilation.
    
    This change makes it use the driver version available in the `DeviceDescription` type.
    
    PiperOrigin-RevId: 698672918
    beckerhe authored and tensorflower-gardener committed Nov 21, 2024
    c6a4cc9
  24. Use fast version of log if type is F16 or BF16.

    There seems to be no dedicated libdevice call for Log with F16 or BF16 type.
    Currently we upcast to F32 and use __nv_logf. However it seems likely that
    __nv_fast_logf is good enough for F16 and BF16 type, so use it as it is
    considerably faster.
    
    PiperOrigin-RevId: 698673580
    akuegel authored and tensorflower-gardener committed Nov 21, 2024
    81e2cc6
  25. 9264ca7
  26. [XLA:GPU] Delete file that is not referenced in BUILD file anymore.

    Also delete the other things which were only referenced from that file.
    
    PiperOrigin-RevId: 698706755
    akuegel authored and tensorflower-gardener committed Nov 21, 2024
    dec1404
  27. Remove :cuda_runtime and :rocm_runtime targets

    - The remaining `GetRuntimeVersion` and `GetFuncBySymbol` functions get moved into the executors - the only place where they are needed.
    - For CUDA, this also creates an overload of `cuda::ToStatus` which can convert a CUDA runtime error (`cudaError_t`) into an `absl::Status`.
    - I also had to adjust the `RocmKernel` and `CudaKernel` tests which were using `GetFuncBySymbol` directly. Now they rely on `LoadKernel` from the executors.
    
    PiperOrigin-RevId: 698720699
    beckerhe authored and tensorflower-gardener committed Nov 21, 2024
    e563ba5
  28. Remove CUDA 12.1 workaround from reduction logic

    There was a check in place that works around a performance bug in ptxas from CUDA 12.1. This check has various problems:
    
    1. It's untested, and the way it's implemented it can't easily be tested.
    2. The version check doesn't work with library compilation, which we are transitioning towards, because it checks the version of a local ptxas binary.
    3. It's unclear whether the workaround is still needed with the new MLIR emitters.
    
    So I'm removing it here since it blocks me from doing more refactoring around PTX compilation.
    
    PiperOrigin-RevId: 698720761
    beckerhe authored and tensorflower-gardener committed Nov 21, 2024
    8e7ac01
  29. [XLA:GPU][Emitters] Canonicalize unrolled IR.

    If the IR is not canonicalized after unrolling, then the passes that follow
    unrolling in the pipeline don't converge sometimes.
    
    PiperOrigin-RevId: 698723354
    pifon2a authored and tensorflower-gardener committed Nov 21, 2024
    b9f49aa
  30. PR #19528: [XLA:GPU] use separate command buffer cmd flag for conditional and loop
    
    Imported from GitHub PR openxla/xla#19528
    
    In a saxml workload, we observed that sharing one command buffer cmd type (CONDITIONALS) between the WHILE and CONDITIONAL commands unnecessarily limits lowering opportunities.
    
    In many cases a CONDITIONAL instruction can be lowered into a command buffer while a WHILE instruction cannot.
    
    This PR uses separate command buffer cmd type flags for CONDITIONAL and WHILE instructions when the user specifies the types to lower.
    
    Copybara import of the project:
    
    --
    4d62fb512995e2fc6e9077a1b3251a6754c866ca by Shawn Wang:
    
    use separate command buffer cmd flag for conditional and loop
    
    Merging this change closes #19528
    
    PiperOrigin-RevId: 698729891
    shawnwang18 authored and tensorflower-gardener committed Nov 21, 2024
    733d71d
  31. d797898
  32. PR #19552: [GPU][NFC] Cleanup horizontal loop fusion.

    Imported from GitHub PR openxla/xla#19552
    
    - avoid unnecessary work
    - bump log level at which complete computations are printed
    - add log statements
    Copybara import of the project:
    
    --
    e273aea41dd15efbc5d79c363810cf634e73203e by Ilia Sergachev:
    
    [GPU][NFC] Cleanup horizontal loop fusion.
    
    - avoid unnecessary work
    - bump log level at which complete computations are printed
    - add log statements
    
    Merging this change closes #19552
    
    PiperOrigin-RevId: 698731719
    sergachev authored and tensorflower-gardener committed Nov 21, 2024
    02e74d9
  33. 3995a20
  34. [XLA:GPU] Enable Triton normalization fusions by default.

    With the feature enabled, XLA GPU will automatically match all kinds of normalization diamond patterns in the graph (Softmax, RmsNorm, etc.) and generate efficient kernels with Triton.
    
    In the compilation pipeline the following steps happen:
    
    1. The `SoftmaxRewriterTriton` pass matches minimal normalization diamonds and creates new fusions with `kCustom` kind. The fusions also have a backend config attached with `__triton` kind and tiling information in `BlockLevelFusionConfig`.
    2. `PriorityFusion` uses the Cost Model to potentially fuse more instructions into the matched fusions.
    3. Fusions are emitted with the generic Triton fusion emitter.
    
    The Cost Model chooses tile sizes for each Triton fusion.
    
    Currently `SoftmaxRewriterTriton` only matches normalization patterns that reduce the minormost dimension.
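
To make the "diamond" shape concrete, here is a plain C++ reference for softmax over the minor-most dimension. It is an illustration only and says nothing about how XLA or Triton generate the kernel; `SoftmaxMinorDim` and the row-major `[rows, cols]` layout are assumptions of the sketch. Each row is read once by a reduction and again by the element-wise op that consumes the broadcast reduction result, and that fork and join is the diamond.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise softmax over the last dimension of a row-major [rows, cols] array.
std::vector<float> SoftmaxMinorDim(const std::vector<float>& x,
                                   std::size_t rows, std::size_t cols) {
  assert(cols > 0 && x.size() == rows * cols);
  std::vector<float> y(x.size());
  for (std::size_t r = 0; r < rows; ++r) {
    const float* in = x.data() + r * cols;
    float* out = y.data() + r * cols;
    // Diamond 1: reduce-max over the row, then subtract the broadcast max.
    float max = *std::max_element(in, in + cols);
    // Diamond 2: reduce-sum of the exponentials, then divide by the
    // broadcast sum.
    float sum = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
      out[c] = std::exp(in[c] - max);
      sum += out[c];
    }
    for (std::size_t c = 0; c < cols; ++c) out[c] /= sum;
  }
  return y;
}
```

Both reductions run along `cols`, the minor-most dimension, which matches the restriction stated above.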
    
    PiperOrigin-RevId: 698735843
    olegshyshkov authored and tensorflower-gardener committed Nov 21, 2024
    aeef8f4
  35. [XLA:GPU] Remove KernelFusionEmitterBase.

    This class is no longer used.
    
    PiperOrigin-RevId: 698736858
    pifon2a authored and tensorflower-gardener committed Nov 21, 2024
    16206d3
  36. Integrate LLVM at llvm/llvm-project@33fcd6acc755

    Updates LLVM usage to match
    [33fcd6acc755](llvm/llvm-project@33fcd6acc755)
    
    PiperOrigin-RevId: 698742870
    metaflow authored and tensorflower-gardener committed Nov 21, 2024
    4eb39af
  37. Remove :cuda_driver_version

    Since `CudaDriverVersion()` is now only used in one place, let's inline the function and remove the target.
    
    PiperOrigin-RevId: 698747446
    beckerhe authored and tensorflower-gardener committed Nov 21, 2024
    8f4674a
  38. Remove patch that is not needed anymore.

    This has been upstreamed to LLVM, and we have updated to a revision containing
    this.
    
    PiperOrigin-RevId: 698748177
    akuegel authored and tensorflower-gardener committed Nov 21, 2024
    f6033a3
  39. PR #18407: Fix xla-mlir failures on Windows

    Imported from GitHub PR openxla/xla#18407
    
    This PR aims to enable the XLA/mlir/tool test cases on the Windows Platform.
    
    Errors:
    The //xla/mlir/tools/mlir_bisect/... tests were failing on the Windows platform with the errors shown below.
    
    Error 1. Error with llvm::seq:
    no matching function for call to 'seq'
    for (auto i : llvm::seq(0ul, sizeof...(T))) {
    Solution: change to llvm::seq(0, sizeof...(T))
    By explicitly specifying the type (unsigned long) in llvm::seq, the compiler now clearly understands the type of the sequence.
    
    Error 2. Missing dlfcn.h:
    Location: xla/mlir/tools/mlir_interpreter/dialects/func.cc
    fatal error: 'dlfcn.h' file not found
    Solution: include 'windows.h' on the Windows platform.
    
    Error 3. Use of undeclared identifiers sym and RTLD_DEFAULT:
    Location: xla/mlir/tools/mlir_interpreter/dialects/func.cc
    use of undeclared identifier 'sym'
    sym = dlsym(RTLD_DEFAULT, callee.getSymName().str().c_str());
    ^
    use of undeclared identifier 'RTLD_DEFAULT'
    Solution:
    On Windows, the approach to obtaining a symbol's address differs from Unix-based systems.
    The GetModuleHandle function retrieves a handle to the specified module (DLL) that is loaded in the address space of the calling process. This handle is necessary to access the module's symbols.
    The GetProcAddress function locates the address of an exported function or variable by name.
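
The first error can be reproduced without LLVM. In the sketch below, `Seq` is a hypothetical stand-in for a function template that, like the call shown above, deduces one type from both bounds. The exact spelling of the fix is not recoverable from the text above; the sketch names the type with an explicit template argument, which is one way to "explicitly specify the type".

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in: both bounds must deduce to the same T.
template <typename T>
std::vector<T> Seq(T begin, T end) {
  static_assert(std::is_integral<T>::value, "Seq expects an integer type");
  assert(begin <= end);
  std::vector<T> out;
  for (T i = begin; i < end; ++i) out.push_back(i);
  return out;
}

template <typename... Ts>
std::size_t CountTypes() {
  // Seq(0ul, sizeof...(Ts)) deduces T from `unsigned long` and from
  // `std::size_t`. On 64-bit Windows std::size_t is `unsigned long long`,
  // so the two deductions conflict and no matching function is found.
  // Naming T explicitly converts both bounds to one type on every platform.
  std::size_t n = 0;
  for (std::size_t i : Seq<std::size_t>(0, sizeof...(Ts))) {
    (void)i;
    ++n;
  }
  return n;
}
```

On Linux and macOS with 64-bit targets, `unsigned long` and `std::size_t` are the same type, which is why the original call compiled there.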
    Copybara import of the project:
    
    --
    1a428996c7991df8e093393e7989fbcf251dc0f4 by Raunak:
    
    fix xla-mlir failures on windows
    
    --
    15009666c4ee861218bb798c6fe0d2493fa8e060 by Raunak:
    
    resolve comments
    
    --
    2483001d510582179d74b94571f9fd6beb943aaa by Raunak:
    
    Keep the original file
    
    --
    4c7fe5e4debed0ff39eb87f64f60f99ce6ee0a74 by Raunak:
    
    fix the formatting issue
    
    --
    270898a2b0bca97a7de30435ce6a53b5980ca73e by mraunak:
    
    Update symbol_finder_windows.cc
    --
    6b63a306ee4ef69f9849822418426d5f705e73ff by mraunak:
    
    Update symbol_finder_linux.cc
    --
    f0996fcc1c67e43bbb7b7829adddf0c7d8f5c738 by mraunak:
    
    Update symbol_finder.h
    --
    0c0c9bba3dac548e663aab5e2e3af6fb96c77fde by Raunak:
    
    Fix the build file
    
    --
    6d7f269262dcc8a85579c62db51e38dc534d6564 by Raunak:
    
    Resolve the comments
    
    --
    ef598af149e9ad96dc1fa27be763a7ffd219011c by Raunak:
    
    Resolve the comments
    
    --
    7131b8d24ad353044b622614c36c342d90101d37 by Raunak:
    
    added :find_symbol to dependency
    
    --
    64a6e9e45d6deef4201c3cd8da64e99b9d40ca78 by mraunak:
    
    Update BUILD
    --
    d47a8b27c89e5df01ea94a237080fd2ac3ad8e85 by mraunak:
    
    Fix clang format
    --
    1a24df16d3de5f007065b69a67965158e821ffe3 by Raunak:
    
    resolve the comments
    
    --
    12f69fc2d188f8bc368bc5e29b53a80d15b6dbac by Raunak:
    
    adding namespace and header style consistent
    
    --
    ec9b5051471a36f7881ed21215f60ec893f18e7d by Raunak:
    
    Fix the build file
    
    Merging this change closes #18407
    
    PiperOrigin-RevId: 698754912
    mraunak authored and tensorflower-gardener committed Nov 21, 2024
    b02f2fb
  40. Refactor PjRt environment initialization to have clearer data flow

    Split the initialization into several methods to have a better distinction between their responsibilities.
    
    PiperOrigin-RevId: 698757702
    nputikhin authored and tensorflower-gardener committed Nov 21, 2024
    99e5ce3
  41. [XLA:GPU] Copy final bufferize patterns that were removed in upstream MLIR.
    
    An upstream MLIR PR [0] removed the `finalizing-bufferize` pass. We are using only two patterns from the pass. As suggested by the note in the PR description, we can copy those patterns.
    
    [0] llvm/llvm-project@cbc7802
    
    PiperOrigin-RevId: 698761132
    olegshyshkov authored and tensorflower-gardener committed Nov 21, 2024
    3cce16f
  42. Remove remnants of GpuDriver

    The build target doesn't exist anymore but there is still a header file which gets deleted in this change.
    
    PiperOrigin-RevId: 698778403
    beckerhe authored and tensorflower-gardener committed Nov 21, 2024
    cae2ee4
  43. [xla:cpu] Optimize buffer allocations construction from se::DeviceMemoryBase
    
    name                 old cpu/op   new cpu/op   delta
    BM_NanoRtAddScalars  82.2ns ± 2%  63.1ns ± 2%  -23.17%  (p=0.000 n=37+40)
    BM_NanoRtFibonacci   86.7ns ± 2%  68.4ns ± 2%  -21.09%  (p=0.000 n=37+35)
    BM_PjRtAddScalars    1.78µs ± 2%  1.79µs ± 2%     ~     (p=0.280 n=39+38)
    BM_PjRtFibonacci     1.79µs ± 3%  1.79µs ± 3%     ~     (p=0.355 n=38+38)
    
    PiperOrigin-RevId: 698783540
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    7aac3a5
  44. Specify a much shorter output path for Bazel on Windows.

    To avoid running into the 259 character path length limitation.
    
    PiperOrigin-RevId: 698786300
    belitskiy authored and tensorflower-gardener committed Nov 21, 2024
    dc24ff7
  45. PR #19578: [doc] Fix a link to a page in the table of contents.

    Imported from GitHub PR openxla/xla#19578
    
    Copybara import of the project:
    
    --
    849d78bf539cc69387ecb3f9710b6188cee5a494 by Ilia Sergachev:
    
    [doc] Fix a link to a page in the table of contents.
    
    Merging this change closes #19578
    
    PiperOrigin-RevId: 698788574
    sergachev authored and tensorflower-gardener committed Nov 21, 2024
    afa0ccd
  46. [XLA:GPU] Change ConstraintExpression to use operator||/&& which return a new instance.
    
    This CL changes the `ConstraintExpression` class by making it a value type and using C++ operators for logical operations. This hopefully makes the code more concise and easier to read.
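
A minimal sketch of the value-type style described above. `Expr` is a hypothetical class that keeps a boolean expression in disjunctive normal form (an OR of AND-groups of named constraints); it is not XLA's `ConstraintExpression`. Both operators leave their operands untouched and return a new instance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Expr {
 public:
  explicit Expr(std::string constraint) {
    assert(!constraint.empty());
    terms_.push_back(std::vector<std::string>{std::move(constraint)});
  }

  // OR concatenates the AND-groups of both operands.
  friend Expr operator||(const Expr& a, const Expr& b) {
    Expr out = a;
    out.terms_.insert(out.terms_.end(), b.terms_.begin(), b.terms_.end());
    return out;
  }

  // AND distributes over OR: every group of `a` joins every group of `b`.
  friend Expr operator&&(const Expr& a, const Expr& b) {
    Expr out;
    for (const auto& x : a.terms_) {
      for (const auto& y : b.terms_) {
        std::vector<std::string> group = x;
        group.insert(group.end(), y.begin(), y.end());
        out.terms_.push_back(std::move(group));
      }
    }
    return out;
  }

  std::string ToString() const {
    std::string s;
    for (std::size_t i = 0; i < terms_.size(); ++i) {
      if (i > 0) s += " || ";
      for (std::size_t j = 0; j < terms_[i].size(); ++j) {
        if (j > 0) s += " && ";
        s += terms_[i][j];
      }
    }
    return s;
  }

 private:
  Expr() = default;  // empty expression, used only while building a result
  std::vector<std::vector<std::string>> terms_;
};
```

With this shape, `(a || b) && c` reads like ordinary boolean code. One trade-off worth knowing: overloaded `operator&&` and `operator||` always evaluate both operands, so the built-in short-circuit behavior is lost.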
    
    PiperOrigin-RevId: 698791293
    chsigg authored and tensorflower-gardener committed Nov 21, 2024
    6e19bfe
  47. [xla:cpu] Add a test for nanort executable with temp storage

    PiperOrigin-RevId: 698800849
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    8e81dc7
  48. Remove static_casts in implementations of SetNodeExecutionEnabled.

    PiperOrigin-RevId: 698809905
    klucke authored and tensorflower-gardener committed Nov 21, 2024
    3098300
  49. Merge pull request #80492 from tensorflow:gaikwadrahul8-patch-3

    PiperOrigin-RevId: 698812072
    tensorflower-gardener committed Nov 21, 2024
    dbe8068
  50. Merge pull request #80490 from tensorflow:gaikwadrahul8-patch-2

    PiperOrigin-RevId: 698813006
    tensorflower-gardener committed Nov 21, 2024
    4907ba9
  51. Derived lines only from the stream with most device events for GPU device traceviewer
    
    PiperOrigin-RevId: 698820533
    tensorflower-gardener committed Nov 21, 2024
    811877e
  52. [tsl:concurrency] Fix asan error in CountDownAsyncValueRef

    PiperOrigin-RevId: 698821973
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    0aa5275
  53. Remove unused GpuAsmOpts parameter from RedzoneAllocator

    PiperOrigin-RevId: 698822218
    beckerhe authored and tensorflower-gardener committed Nov 21, 2024
    86eeec3
  54. Set implicitTrunc on APInt creation

    With llvm/llvm-project@3494ee9, upstream has stricter checks for ints.
    
    PiperOrigin-RevId: 698823182
    tensorflower-gardener committed Nov 21, 2024
    8ab5ae1
  55. Move upstreamable part of sparse_dot to be a public patch

    PiperOrigin-RevId: 698823837
    tensorflower-gardener committed Nov 21, 2024
    4844ee2
  56. [Control flow] Add a lighter implementation of cond_v2() that is optimized for latency.
    
    This change introduces `cond_v2.fast_cond_v2()`, which is a tool for writing latency-optimized conditionals using the functional `IfOp` implementation.
    
    PiperOrigin-RevId: 698835221
    mrry authored and tensorflower-gardener committed Nov 21, 2024
    c934e12
  57. Clarify the dimensions in gather/scatter dimensions. The following dimensions do NOT overlap. These dims are processed separately in spmd partitioner.
    
    1. Explicit batching dims exist in all tensors (operand, indices, output).
    2. Index pass-through dims exist in indices and output.
    3. Operand pass-through dims exist in operand and output.
    
    We replace `GatherOutputShardingFromIndexIndexPassthroughDimensions` with `GatherOutputShardingFromIndex(bool consider_explict_batch_dims=true)`.
    
    The added test failed before this change because explicit batch dims were processed as index pass-through dims. This change fixes the issue.
    
    PiperOrigin-RevId: 698840297
    ZixuanJiang authored and tensorflower-gardener committed Nov 21, 2024
    6823a7d
  58. Add backend kwargs to xla tests.

    PiperOrigin-RevId: 698843953
    tensorflower-gardener committed Nov 21, 2024
    81b91c2
  59. Reverts 2dee618

    PiperOrigin-RevId: 698863847
    tensorflower-gardener committed Nov 21, 2024
    12eb5bd
  60. Make the implementation of GetXlaPjrtTpuClient more similar to how Jax uses PJRT.
    
    PiperOrigin-RevId: 698867902
    matthiaskramm authored and tensorflower-gardener committed Nov 21, 2024
    65e4fb1
  61. Cleanup. Merge GatherScatterParallelDims into GatherScatterDims.

    No behavior change.
    
    PiperOrigin-RevId: 698870832
    ZixuanJiang authored and tensorflower-gardener committed Nov 21, 2024
    00c79a2
  62. [XLA:GPU] Dump the failing HLO fusion to a file when Triton numerics verification fails.
    
    The fusion is extracted into a separate module, so it's easier to reproduce the issue.
    
    If the fusion is too long, stdout log will be cropped.
    
    PiperOrigin-RevId: 698872626
    olegshyshkov authored and tensorflower-gardener committed Nov 21, 2024
    6db0eae
  63. Automated Code Change

    PiperOrigin-RevId: 698873637
    Martin Huschenbett authored and tensorflower-gardener committed Nov 21, 2024
    c8815dd
  64. Revert: [XLA:GPU] Enable Triton normalization fusions by default.

    Internal test is broken.
    
    Reverts aeef8f4
    
    PiperOrigin-RevId: 698893785
    olegshyshkov authored and tensorflower-gardener committed Nov 21, 2024
    79d66e7
  65. [xla:codegen] Add a testonly KernelEmitter for testing XLA:CPU kernels

    Prototype of a test-only KernelEmitter API that can be used for writing XLA:CPU kernel tests.
    
    PiperOrigin-RevId: 698895333
    ezhulenev authored and tensorflower-gardener committed Nov 21, 2024
    82889b7
  66. Refactor GetGatherScatterBatchParallelDims. No behavior change.

    Before this change, `GetGatherScatterBatchParallelDims` only returns the implicit batching dims in operand and indices. We still need to call `GetGatherParallelOutputDims` to return the corresponding dims in the output.
    
    With this change, `GetGatherScatterBatchParallelDims` returns the implicit batch dims in 3 tensors (operand, indices, and output).
    
    PiperOrigin-RevId: 698895717
    ZixuanJiang authored and tensorflower-gardener committed Nov 21, 2024
    27b5058
  67. [XLA:CollectivePipeliner-Sinking] Stop pipelining iterations if a large sunk collective is encountered.
    
    PiperOrigin-RevId: 698924172
    seherellis authored and tensorflower-gardener committed Nov 21, 2024
    efcfca3
  68. Move jax visibility inside internal_visibility call

    PiperOrigin-RevId: 698927051
    ddunl authored and tensorflower-gardener committed Nov 21, 2024
    cac5b0a
  69. Remove the C++ memory checker. Python checker remains.

    PiperOrigin-RevId: 698929552
    patnotz authored and tensorflower-gardener committed Nov 21, 2024
    1ccaf9c
  70. Eliminate static_casts in GpuCommandBuffer.

    PiperOrigin-RevId: 698932952
    klucke authored and tensorflower-gardener committed Nov 21, 2024
    f5ebe9a
  71. [Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter_test.cc

    PiperOrigin-RevId: 698933032
    frgossen authored and tensorflower-gardener committed Nov 21, 2024
    860cd53
  72. Switch flatbuffer_conversions to use ABSL_LOG instead of LOG

    PiperOrigin-RevId: 698939145
    tensorflower-gardener committed Nov 21, 2024
    3efa2dd

Commits on Nov 22, 2024

  1. PR #16901: [XLA:GPU] Fix default device mesh for auto sharding

    Imported from GitHub PR openxla/xla#16901
    
    When the user does not specify the number of GPUs for auto sharding, XLA defaults to using all available GPUs.
    
    The current implementation uses the number of cores (SMs) on the GPU as the default shard count. For example, on an A100, the sharding algorithm will try to shard into 108 devices, which can be confusing for users.
    
    This patch changes the shard count to the number of cards, which has been tested to work correctly on an 8-card A100 machine.
    Copybara import of the project:
    
    --
    232a62ae2599e6fe76e2e235ea18452195bce799 by Tianyi Liu <[email protected]>:
    
    [XLA:GPU] Fix default device mesh for auto sharding
    
    Merging this change closes #16901
    
    PiperOrigin-RevId: 698956243
    i-Pear authored and tensorflower-gardener committed Nov 22, 2024
    35fe8c2
  2. Stop using AsGpuStreamValue in gpu_cudamallocasync_allocator_test.

    PiperOrigin-RevId: 698958036
    klucke authored and tensorflower-gardener committed Nov 22, 2024
    f6c6a4e
  3. Separate authoritative vs Q-DQ DRR patterns.

    Some patterns added to quantize_patterns.td were making decisions about quantizing some weights that are not annotated by Q-DQ nodes. This PR separates these two categories for cases where we want strict adherence to Q-DQ annotations (e.g. QAT).
    
    PiperOrigin-RevId: 698960224
    majiddadashi authored and tensorflower-gardener committed Nov 22, 2024
    3c7c5ae
  4. [Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_transpose_fusion.cc
    
    PiperOrigin-RevId: 698970177
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    03844d1
  5. Add batch tests to RemapArrays, and with different shapes.

    PiperOrigin-RevId: 698973331
    tensorflower-gardener committed Nov 22, 2024
    e8348ad
  6. [Cleanup] Use HloPredicateIs(Not)Op in alias_passthrough_params.cc

    PiperOrigin-RevId: 698987161
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    866c565
  7. [IFRT] Implement BytecodeDialectInterface for VIFRT.

    PiperOrigin-RevId: 698995873
    ICGog authored and tensorflower-gardener committed Nov 22, 2024
    8b6acd0
  8. [Cleanup] Use HloPredicateIs(Not)Op in all_gather_dynamic_slice_simplifier.cc
    
    PiperOrigin-RevId: 698997197
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    eaa1401
  9. Create a proto for holding logical topology metadata about a job.

    PiperOrigin-RevId: 699000269
    bmass02 authored and tensorflower-gardener committed Nov 22, 2024
    16986fc
  10. Automated Code Change

    PiperOrigin-RevId: 699004930
    mkruskal-google authored and tensorflower-gardener committed Nov 22, 2024
    de89503
  11. Set implicitTrunc on APInt creation

    With llvm/llvm-project@3494ee9, upstream has stricter checks for ints.
    
    Setting `APInt(.., /*isSigned=*/ !isUnsigned, ..)` seems to break EvalCompareOpPattern, likely due to signed i1 not allowing 1. This change just keeps the status quo without making too many changes.
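
    The remark about signed i1 can be checked with plain integer arithmetic. The sketch below is not LLVM code: `fits_in_bits` and `truncate` are hypothetical helpers that mirror the kind of range check a strict constructor would apply, assuming two's-complement semantics.

```python
def fits_in_bits(value: int, bits: int, signed: bool) -> bool:
    """Return True if `value` is representable in `bits` bits without truncation."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo <= value <= hi


def truncate(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, which is what implicit truncation permits."""
    return value & ((1 << bits) - 1)


# A signed 1-bit integer can only hold -1 and 0, so the value 1 is out of range.
print(fits_in_bits(1, 1, signed=True))    # False
print(fits_in_bits(1, 1, signed=False))   # True
print(fits_in_bits(-1, 1, signed=True))   # True
```

    Under this reading, a boolean `true` stored as the integer 1 passes the unsigned check but fails the signed one, which is consistent with the breakage described above.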
    
    PiperOrigin-RevId: 699031101
    tensorflower-gardener committed Nov 22, 2024
    c23c6f5
  12. Move tsl/platform/profile_utils to xla/tsl/platform/profile_utils

    PiperOrigin-RevId: 699035755
    ddunl authored and tensorflower-gardener committed Nov 22, 2024
    1d6bd16
  13. Remove obsolete PjRtClient::AsyncSendPlaceholder API.

    PiperOrigin-RevId: 699044311
    tensorflower-gardener committed Nov 22, 2024
    a40d63f
  14. [Cleanup] Use HloPredicateIs(Not)Op in all_gather_optimizer.cc

    PiperOrigin-RevId: 699044350
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    c4b75a0
  15. [Cleanup] Use HloPredicateIs(Not)Op in all_reduce_blueconnect.cc

    PiperOrigin-RevId: 699048665
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    b84678f
  16. [Cleanup] Use HloPredicateIs(Not)Op in async_wrapper.cc

    PiperOrigin-RevId: 699052605
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    acc6557
  17. [Cleanup] Use HloPredicateIs(Not)Op in async_wrapper_test.cc

    PiperOrigin-RevId: 699056083
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    fef8b55
  18. Move ptxas/nvlink compilation into separate compilation unit

    This moves all the PTX compilation functions that spawn subprocesses - notably ptxas, nvlink, and fatbin into a separate file.
    
    The goal is to make this optional and eventually disable it by default. Since we can compile through libraries like libnvjitlink, the rather brittle approach of calling external binaries is not needed anymore.
    
    This also adds tests for all the helper functions. Tests for the actual compilation will follow separately.
    
    PiperOrigin-RevId: 699058086
    beckerhe authored and tensorflower-gardener committed Nov 22, 2024
    4eb2052
  19. [Cleanup] Use HloPredicateIs(Not)Op in collective_permute_cycle_decomposer.cc
    
    PiperOrigin-RevId: 699059539
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    9d580d3
  20. [Cleanup] Use HloPredicateIs(Not)Op in collective_permute_valid_iteration_annotator.cc
    
    PiperOrigin-RevId: 699062814
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    410e20e
  21. [Cleanup] Use HloPredicateIs(Not)Op in collective_select_folder.cc

    PiperOrigin-RevId: 699067483
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    c082b0e
  22. Move LinkGpuAsm into separate file

    This adds a new target `:driver_compilation` and moves `LinkGpuAsm` into a new file `driver_compilation.cc`.
    
    I'm also bringing back the `StreamExecutor` argument for being able to call `ActivateContext`, which I had removed mistakenly in a previous CL. The active context is indeed needed.
    
    The goal is to separate out all the different PTX compilation and linking methods, make them independently testable and optional.
    
    PiperOrigin-RevId: 699071278
    beckerhe authored and tensorflower-gardener committed Nov 22, 2024
    beccd2c
  23. [Cleanup] Use HloPredicateIs(Not)Op in collective_send_recv_combiner.cc

    PiperOrigin-RevId: 699071557
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    ce26a52
  24. Add a simple test for the symbol_finder

    Also renames the target for consistency.
    
    PiperOrigin-RevId: 699076843
    beckerhe authored and tensorflower-gardener committed Nov 22, 2024
    66d2fd0
  25. [Cleanup] Use HloPredicateIs(Not)Op in command_buffer_scheduling.cc

    PiperOrigin-RevId: 699079990
    frgossen authored and tensorflower-gardener committed Nov 22, 2024
    4ac933c
  26. Change parameter type in LinkUsingNvlink

    `LinkUsingNvlink` and `LinkGpuAsmUsingDriver` used to take a list of `CubinOrPTXImage` structs as inputs, but the functions don't even support compiling PTX, so it's very misleading.
    
    I change the parameter type to a list of byte arrays (`std::vector<uint8_t>`), which is what we use everywhere else for representing compiled modules (CUBINs).
    
    PiperOrigin-RevId: 699082261
    beckerhe authored and tensorflower-gardener committed Nov 22, 2024
    f523817
  27. PR #19656: Fix implicit index handling in ScatterDeterminismExpander

    Imported from GitHub PR openxla/xla#19656
    
    This PR fixes a bug related to handling missing (implied) indices and adds the corresponding tests.
    
    1. When `scatter_dims_to_operand_dims` size is not equal to the operand rank, the `out_of_bound_tensor` has incorrect dimensions, resulting in mismatched shapes of the select op. This is fixed at line 718.
    2. When the update is not scalar, the indices are recalculated - this requires updating the `out_of_bound_tensor` (lines 757-761).
    3. After expanding the indices, the `has_scalar_indices` flag has to be updated (line 777).
    
    Also added a few cosmetic changes:
    
    1. Removed `is_one_dimensional` branch in `ExpandIndices`, as this never happens (probably an artefact from prior implementation).
    2. Broadcast the boundary constants instead of generating a (possibly big) literal.
    Copybara import of the project:
    
    --
    2e38efc0c9efc2f708058bd2ae526f13d2ed8354 by Sergey Kozub <[email protected]>:
    
    Fix implicit index handling in ScatterDeterminismExpander
    
    Merging this change closes #19656
    
    PiperOrigin-RevId: 699083584
    sergey-kozub authored and tensorflower-gardener committed Nov 22, 2024
    7a6f538
  28. [XLA] Propagate original_value when instructions are replaced in X64Rewriter
    
    This copies over the original_value attribute when a value is replaced during this pass.
    
    PiperOrigin-RevId: 699087576
    jcai19 authored and tensorflower-gardener committed Nov 22, 2024
    4e36ed8
  29. PR #19679: [XLA:CPU][oneDNN] Relocate Addend Shape Validation to the Contraction Rewriter
    
    Imported from GitHub PR openxla/xla#19679
    
    This PR moves the addend shape check to the rewriter so that the code to append oneDNN post-ops can be shared between matmul and convolution kernels.
    Copybara import of the project:
    
    --
    c6497851473b2ec5b5041de459e4aaa3c8c2cb93 by Akhil Goel <[email protected]>:
    
    Move addend check
    
    Merging this change closes #19679
    
    PiperOrigin-RevId: 699095534
    akhilgoe authored and tensorflower-gardener committed Nov 22, 2024
    c390ae7
  30. PR #19346: Bumped rules_python version to 0.39.0

    Imported from GitHub PR openxla/xla#19346
    
    cc @hawkinsp
    Copybara import of the project:
    
    --
    292e7ebb7ee57e5af5977c08f0aaf28fc1f852e2 by vfdev-5 <[email protected]>:
    
    Bumped rules_python version to 0.39.0
    
    Merging this change closes #19346
    
    PiperOrigin-RevId: 699100796
    vfdev-5 authored and tensorflower-gardener committed Nov 22, 2024
    6332b90
  31. Re-enable deterministic scatter expander pass by default.

    The issues which we have hit previously seem to be fixed now.
    
    PiperOrigin-RevId: 699120716
    akuegel authored and tensorflower-gardener committed Nov 22, 2024
    29090a9
  32. Clean up disabling reduce_hlo_test on TPU

    PiperOrigin-RevId: 699125504
    tensorflower-gardener committed Nov 22, 2024
    a66b63b
  33. PR #19577: Cleanup handling of 2 fields of ExecutableBuildOptions.

    Imported from GitHub PR openxla/xla#19577
    
    Copybara import of the project:
    
    --
    b4180bb5e59c92b374eb16fc59d6f03d7f37db4a by Ilia Sergachev <[email protected]>:
    
    Cleanup handling of 3 fields of ExecutableBuildOptions.
    
    --
    21206eb838fa04dabaddec0aa8cdf73789ce8206 by Ilia Sergachev <[email protected]>:
    
    add a test
    
    --
    a6571ef2ac7ec6a94056b1588a3260ecc7d9db17 by Ilia Sergachev <[email protected]>:
    
    cleanup
    
    --
    44d479f3d3d6320d37d35934cf81596e50e10c51 by Ilia Sergachev <[email protected]>:
    
    add missing newline
    
    --
    c3c550f491b2fc03dacdf1101042a3fbadd51e7c by Ilia Sergachev <[email protected]>:
    
    add missing include
    
    --
    5acf13c0423b7aef87f81b86ac95a0a1471927f1 by Ilia Sergachev <[email protected]>:
    
    ignore key_value_store
    
    Merging this change closes #19577
    
    PiperOrigin-RevId: 699125923
    sergachev authored and tensorflower-gardener committed Nov 22, 2024
    745b834
  34. [xla:cpu] NFC: Remove ExecuteState alias from Thunk

    PiperOrigin-RevId: 699128079
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    1a1dad7
  35. Add cuda::CompilationProvider interface and first implementation for subprocess compilation
    
    This adds a new interface `CompilationProvider` which offers `PTX` to `CUBIN` compilation. It also adds the first implementation of this interface, the `SubprocessCompilationProvider`, which uses ptxas and nvlink to do the compilation.
    
    Some additional changes were also needed:
    
    - New type `CompilationOptions` which collects and documents all compilation options in one place.
    - Some additional overloads in `:subprocess_compilation` were needed so that the `SubprocessCompilationProvider` can control the exact file path to ptxas and nvlink.
    - A fairly comprehensive test suite for the compilation provider is also added.
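
    The shape of such a provider can be sketched in a few lines. The real interface is C++ and its signatures are not shown in this commit, so everything below is an assumption made for illustration: the field names in `CompilationOptions`, the `compile` signature, and the `build_ptxas_command` helper are hypothetical. Only the `ptxas` flags used (`-arch`, `-O`, `-g`, `-o`) are taken from the tool itself.

```python
import subprocess
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CompilationOptions:
    # Hypothetical fields; the real struct's contents are not shown in the commit.
    optimization_level: int = 3
    generate_debug_info: bool = False


class CompilationProvider(ABC):
    """Hypothetical interface: turns PTX text into a CUBIN image."""

    @abstractmethod
    def compile(self, sm_arch: str, ptx: str, options: CompilationOptions) -> bytes:
        ...


def build_ptxas_command(ptxas_path, sm_arch, ptx_file, cubin_file, options):
    """Assemble the ptxas argument list without running anything."""
    cmd = [str(ptxas_path), f"-arch={sm_arch}", f"-O{options.optimization_level}"]
    if options.generate_debug_info:
        cmd.append("-g")
    return cmd + [str(ptx_file), "-o", str(cubin_file)]


class SubprocessCompilationProvider(CompilationProvider):
    """Compiles by spawning ptxas at an explicitly chosen path."""

    def __init__(self, ptxas_path: str):
        self.ptxas_path = ptxas_path

    def compile(self, sm_arch: str, ptx: str, options: CompilationOptions) -> bytes:
        with tempfile.TemporaryDirectory() as tmp:
            ptx_file = Path(tmp) / "module.ptx"
            cubin_file = Path(tmp) / "module.cubin"
            ptx_file.write_text(ptx)
            cmd = build_ptxas_command(
                self.ptxas_path, sm_arch, ptx_file, cubin_file, options
            )
            # Raises CalledProcessError if ptxas reports a failure.
            subprocess.run(cmd, check=True, capture_output=True)
            return cubin_file.read_bytes()
```

    Keeping the command construction in a separate pure function is one way to make the helper logic testable without a CUDA toolchain installed.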
    
    PiperOrigin-RevId: 699134414
    beckerhe authored and tensorflower-gardener committed Nov 22, 2024
    888a584
  36. [xla:cpu] Add a benchmark for creating zero-copy PjRt buffer

    ------------------------------------------------------------------
    Benchmark                        Time             CPU   Iterations
    ------------------------------------------------------------------
    BM_CreateZeroCopyBuffer        234 ns          234 ns      3075841
    
    PiperOrigin-RevId: 699137060
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    b260df9
  37. [XLA:GPU] Fusion tests don't seem to require A100, so replace tag.

    tf_cuda_tests_tags() seems to work as well. Add hermetic_cuda_data_dir
    parameter as well, so that e.g. ptxas can be found. Also use
    linkopts = ["-Wl,-rpath,$$ORIGIN/../lit_lib"] so that the dynamic
    libraries are found, which are symlinked from the lit_lib directory.
    PiperOrigin-RevId: 699146874
    akuegel authored and tensorflower-gardener committed Nov 22, 2024
    90dc9c6
  38. #sdy Refactor xla-sdy-mhlo-round-trip-shard-map-export from a `ConversionPattern` to a walk.
    
    PiperOrigin-RevId: 699148617
    bartchr808 authored and tensorflower-gardener committed Nov 22, 2024
    3952aca
  39. [XLA:GPU] Fix a bug in dot_algorithm_rewriter.

    The low_f32 should be rounded to bf16 instead of truncated.
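
    To see why the distinction matters, the sketch below converts an f32 value to bf16 precision both ways using only bit manipulation. It is an illustration, not the rewriter's code: the helper names are hypothetical, and NaN and overflow handling are omitted.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of `x` as an IEEE-754 binary32 value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Inverse of f32_bits."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def to_bf16_truncate(x: float) -> float:
    """Drop the low 16 mantissa bits (rounds toward zero)."""
    return bits_f32(f32_bits(x) & 0xFFFF0000)


def to_bf16_round(x: float) -> float:
    """Round to nearest, ties to even, then drop the low 16 bits."""
    bits = f32_bits(x)
    bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_f32((bits + bias) & 0xFFFF0000)


x = 1.005859375  # 1 + 2**-8 + 2**-9, exactly representable in f32
print(to_bf16_truncate(x))  # 1.0
print(to_bf16_round(x))     # 1.0078125
```

    For this input, truncation is off by about 0.0059 while rounding is off by about 0.0020, so the rounded value is the closer bf16 neighbor.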
    
    PiperOrigin-RevId: 699154452
    loislo authored and tensorflower-gardener committed Nov 22, 2024
    d70ee79
  40. Sync OSS interpreter.h.

    PiperOrigin-RevId: 699159107
    qukhan authored and tensorflower-gardener committed Nov 22, 2024
    631bee3
  41. Integrate LLVM at llvm/llvm-project@a12e79a85fc1

    Updates LLVM usage to match
    [a12e79a85fc1](llvm/llvm-project@a12e79a85fc1)
    
    PiperOrigin-RevId: 699163893
    metaflow authored and tensorflower-gardener committed Nov 22, 2024
    6aa3801
  42. 68fddf7
  43. [XLA] Go back to using a glob for including dialects in the `mlir_interpreter`.
    
    This is more in line with how the dialects were meant to be added according to the readme file in the parent directory.
    
    PiperOrigin-RevId: 699169422
    dimitar-asenov authored and tensorflower-gardener committed Nov 22, 2024
    7a77dcc
  44. Prevent dequantizing/requantizing f16 to f32 and back (2nd try).

    This change:
    
     1. Identifies all `kTfLiteBuiltinDequantize` nodes converting `kTfLiteFloat16` to `kTfLiteFloat32` and plugging into a `kTfLiteBuiltinFullyConnected`, `kTfLiteBuiltinConv2d`, or `kTfLiteBuiltinDepthwiseConv2d` node.
     2. Re-maps XNNPACK tensors pointing to the `kTfLiteFloat32` output to point to the original `kTfLiteFloat16` input.
    
    The `kTfLiteFloat16` weights/filters and biases are handled by XNNPACK directly.
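
    The two steps above can be sketched on a toy graph. This is not the XNNPACK delegate's code: the `Node` class, the string op names, and the tensor-type dictionary are hypothetical stand-ins, and requiring that every consumer be one of the three listed ops is an assumption of this sketch.

```python
from dataclasses import dataclass

# Stand-ins for kTfLiteBuiltinFullyConnected, kTfLiteBuiltinConv2d and
# kTfLiteBuiltinDepthwiseConv2d.
WEIGHT_CONSUMERS = {"FULLY_CONNECTED", "CONV_2D", "DEPTHWISE_CONV_2D"}


@dataclass
class Node:
    op: str
    inputs: list
    outputs: list


def remap_f16_dequantize(nodes, tensor_types):
    """Return {f32_output_tensor: f16_input_tensor} for skippable DEQUANTIZE nodes."""
    # Record which ops read each tensor.
    consumers = {}
    for node in nodes:
        for tensor in node.inputs:
            consumers.setdefault(tensor, []).append(node.op)

    remap = {}
    for node in nodes:
        if node.op != "DEQUANTIZE":
            continue
        src, dst = node.inputs[0], node.outputs[0]
        # Step 1: only float16 -> float32 conversions qualify.
        if tensor_types[src] != "float16" or tensor_types[dst] != "float32":
            continue
        users = consumers.get(dst, [])
        if users and all(op in WEIGHT_CONSUMERS for op in users):
            # Step 2: point readers of the f32 tensor at the f16 source.
            remap[dst] = src
    return remap
```

    A delegate would then look up each consumer's input through the returned mapping before defining its tensors, so the float32 copy is never materialized.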
    
    PiperOrigin-RevId: 699184221
    tensorflower-gardener committed Nov 22, 2024
    857e530
  45. Merge pull request #80484 from tensorflow:fixtypos07

    PiperOrigin-RevId: 699195297
    tensorflower-gardener committed Nov 22, 2024
    62970a9
  46. Fix two issues in PartitionScatterIndexPassthroughDimensions.

    We infer the update sharding from update to obtain `passthrough_sharding`. This `passthrough_sharding` should be merged with the existing update sharding, such that we may keep the original sharding axes in update.
    
    The added all-reduce is along the sharding axes of the index pass-through dimensions. It should not be along the sharding axes of the explicit batch dims or the index vector dim.
    
    PiperOrigin-RevId: 699206933
    ZixuanJiang authored and tensorflower-gardener committed Nov 22, 2024
    820d85b
  47. Stop using xla/statusor.h in favor of absl/status/statusor.h directly.

    PiperOrigin-RevId: 699215454
    klucke authored and tensorflower-gardener committed Nov 22, 2024
    4c7e533
  48. PR #19660: [ROCm] switch rocm build to clang

    Imported from GitHub PR openxla/xla#19660
    
    This PR switches the default rocm build to clang as the gcc config is broken at the moment.
    
    Copybara import of the project:
    
    --
    ea48f7c480d110eab3f133ed6ea8989da0e1e724 by Alexandros Theodoridis <[email protected]>:
    
    [ROCm] switch rocm build to clang
    
    --
    2743fabafd6a358c05e858781064e7fa2e389c78 by Alexandros Theodoridis <[email protected]>:
    
    Remove explicit clang path from the bazelrc rocm config
    
    --
    202dea0a80602cafdbee6067d8f20dc3055c6bbb by Alexandros Theodoridis <[email protected]>:
    
    Address review comments
    
    Merging this change closes #19660
    
    PiperOrigin-RevId: 699222609
    alekstheod authored and tensorflower-gardener committed Nov 22, 2024
    28df421
  49. [xla:collectives] Initial xla/core/collectives component commit

    Next step is to migrate NcclComm and NcclOwnedComm to std::unique_ptr<Communicator> and proper virtual inheritance.
    
    PiperOrigin-RevId: 699233544
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    224d07c
  50. Further lower threshold for F64 in //xla/service/gpu/model:hlo_op_profiler_test
    
    This was originally proposed in openxla/xla#16102, but I still ran into an issue where it failed by a slight margin:
    
    ```
    Expected: (profiler.MeasureClockCyclesPerOp(HloOpcode::kDivide, F64) .value() .clock_cycles()) > (300), actual: 296 vs 300
    ```
    
    That said, I ran 1000 tests and did not encounter this issue. Reducing the threshold to 280 since the bound seems very close and a flaky test is no good either way.
    
    PiperOrigin-RevId: 699233864
    ghpvnist authored and tensorflower-gardener committed Nov 22, 2024
    7c823c8
  51. [xla:cpu] Add a KernelRunner API to codegen testlib and sketch a test for XLA:CPU
    
    PiperOrigin-RevId: 699234540
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    7c35801
  52. Lower the max bytes threshold used by the proto splitter

    PiperOrigin-RevId: 699235057
    eunjaekim-0 authored and tensorflower-gardener committed Nov 22, 2024
    cba80fe
  53. Fix test subgraph creation for StableHLO composite nodes.

    Also fixes a few missing includes. Uses C++ includes instead of C ones.
    
    PiperOrigin-RevId: 699237969
    qukhan authored and tensorflower-gardener committed Nov 22, 2024
    8310815
  54. [tflite-gpu] Add REDUCE_ALL && REDUCE_ANY to gpu_compatibility

    PiperOrigin-RevId: 699238045
    grantjensen authored and tensorflower-gardener committed Nov 22, 2024
    bde84b7
  55. [xla:cpu] Add JitCompiler and FunctionLibrary APIs for XLA:CPU codegen

    Define APIs for compiling LLVM modules to functions required by the XLA:CPU runtime: kernels, comparators, etc. Implementation largely exists as SimpleOrcJit in service/cpu, but it's tightly coupled with "legacy" XLA.
    
    PiperOrigin-RevId: 699239722
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    5638e32
  56. Update XNNPACK version

    PiperOrigin-RevId: 699242142
    tensorflower-gardener committed Nov 22, 2024
    2f87d0f
  57. [XLA:GPU] remove channel ID checks in hlo_instructions.cc

    PiperOrigin-RevId: 699247019
    tensorflower-gardener committed Nov 22, 2024
    72dd3b2
  58. (cleanup)

    * De-dupe logic in test common and model_buffer.
    * Factor out the flatbuffer model wrapper from the class in test common and move to flatbuffer_tools.
    * Add some extra helpers for flatbuffers in flatbuffer_tools, and add test.
    * Hide all the usage of `std::filesystem` stuff in one cc. Technically `<filesystem>` is an unapproved header.
    * Update model_load to use the flatbuffer tools.
    * Pull some of the member functions of "model unpacker" out into non-member functions.
    
    PiperOrigin-RevId: 699249089
    LukeBoyer authored and tensorflower-gardener committed Nov 22, 2024
    ec81a53
  59. [XLA:CPU] Support asynchronous execution for custom transposed convol…

    …utions
    
    Performance is comparable to the synchronous version. Detailed results (where 'old' is the synchronous execution, 'new' is async execution; both use the same, custom algorithm for transposed conv):
    
    name                                                     old cpu/op   new cpu/op   delta
    BM_Conv1DStrided/process_time                            29.4ms ± 6%  29.7ms ± 5%    ~     (p=0.841 n=5+5)
    BM_Conv1DTransposedStrided/process_time                  29.6ms ± 2%  30.7ms ± 2%  +3.52%  (p=0.008 n=5+5)
    BM_Conv1DTransposedStridedNonDefaultLayout/process_time  28.5ms ± 3%  28.3ms ± 1%    ~     (p=0.222 n=5+5)
    
    name                                                     old time/op  new time/op  delta
    BM_Conv1DStrided/process_time                            2.68ms ± 7%  2.72ms ± 5%    ~     (p=0.548 n=5+5)
    BM_Conv1DTransposedStrided/process_time                  7.91ms ± 3%  7.98ms ± 5%    ~     (p=0.548 n=5+5)
    BM_Conv1DTransposedStridedNonDefaultLayout/process_time  7.00ms ± 2%  7.32ms ± 4%  +4.58%  (p=0.016 n=5+5)
    
    PiperOrigin-RevId: 699250549
    Adam-Banas authored and tensorflower-gardener committed Nov 22, 2024
    ef30054
  60. Integrate LLVM at llvm/llvm-project@556ea5265a25

    Updates LLVM usage to match
    [556ea5265a25](llvm/llvm-project@556ea5265a25)
    
    PiperOrigin-RevId: 699251575
    metaflow authored and tensorflower-gardener committed Nov 22, 2024
    f888384
  61. [xla:collectives] Add backends/gpu/collectives:nccl_communicator

    The NCCL implementation details will have private visibility; for all external users (Thunks etc.) we'll export them via a public header that uses the xla/core/collectives APIs.
    
    PiperOrigin-RevId: 699256314
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    eaf0194
  62. Add support for TensorV1Attr in flatbuffer_export and flatbuffer_oper…

    …ator, encoded as follows
    
    ```
    _TENSOR_V1_<name>: {
      TENSOR_SHAPE: Vector<i64>,
      TENSOR_TYPE: tflite::TensorType (casted to i64),
      TENSOR_DATA: Vector<f32> or Vector<i64>
    }
    ```
    
    PiperOrigin-RevId: 699272982
    sirakiin authored and tensorflower-gardener committed Nov 22, 2024
    807e6fd
  63. [IFRT] Add VIFRT pass for converting between VIFRT versions.

    The pass runs over a VIFRT module, and tries to convert it to a given target version.
    
    PiperOrigin-RevId: 699279298
    ICGog authored and tensorflower-gardener committed Nov 22, 2024
    65f6a91
  64. [xla:collectives] Use NcclCommunicator in NcclApi implementation

    PiperOrigin-RevId: 699279921
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    9e800e8
  65. [xla:collectives] Remove unused CommDestroy

    PiperOrigin-RevId: 699286343
    ezhulenev authored and tensorflower-gardener committed Nov 22, 2024
    d1ebdd9
  66. [Code-Health] Resolve the following technical debt issue: Todo(resolved)

    PiperOrigin-RevId: 699309235
    Varcho authored and tensorflower-gardener committed Nov 22, 2024
    d479650

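The `_TENSOR_V1_<name>` layout described in the TensorV1Attr commit above (807e6fd) can be illustrated with a minimal Python sketch. This is not the exporter's code: `encode_tensor_v1`, the `TensorType` stand-in enum, and the element-count check are all assumptions made for illustration, and only the key names and value kinds come from the commit message.

```python
from enum import IntEnum
from math import prod


class TensorType(IntEnum):
    """Hypothetical stand-in for tflite::TensorType (only two members shown)."""
    FLOAT32 = 0
    INT64 = 4


def encode_tensor_v1(name, shape, dtype, data):
    """Build the nested map sketched in the commit message.

    TENSOR_SHAPE is a list of ints, TENSOR_TYPE is the enum cast to an int,
    and TENSOR_DATA is a list of floats or a list of ints.
    """
    if dtype is TensorType.FLOAT32:
        payload = [float(x) for x in data]
    elif dtype is TensorType.INT64:
        payload = [int(x) for x in data]
    else:
        raise ValueError(f"unsupported tensor type: {dtype!r}")

    dims = [int(d) for d in shape]
    # Assumption: the payload length should match the shape's element count.
    if len(payload) != prod(dims):
        raise ValueError("data length does not match shape")

    return {
        f"_TENSOR_V1_{name}": {
            "TENSOR_SHAPE": dims,
            "TENSOR_TYPE": int(dtype),
            "TENSOR_DATA": payload,
        }
    }


encoded = encode_tensor_v1("bias", [2, 2], TensorType.FLOAT32, [1, 2, 3, 4])
print(encoded)
```

Casting the enum to a plain integer mirrors the "casted to i64" note in the commit message; how the real exporter serializes the map is not stated there and is not modeled here.
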
Commits on Nov 23, 2024

  1. Use absl::Nonnull to indicate that sharding in xla::ifrt::ArraySpec c…

    …annot be null
    
    PiperOrigin-RevId: 699310290
    krishnaharidasan authored and tensorflower-gardener committed Nov 23, 2024
    45abed3
  2. Add all the quantized models to test_models constants and try to unif…

    …ying the naming.
    
    PiperOrigin-RevId: 699317601
    LukeBoyer authored and tensorflower-gardener committed Nov 23, 2024
    b9f2ce2
  3. Update target_config to be a text proto and populate it on the

    StreamExecutorGpuClient topology description as well.
    
    PiperOrigin-RevId: 699320139
    pschuh authored and tensorflower-gardener committed Nov 23, 2024
    c9d9178
  4. Remove absl::Nonnull from AbslStringify

    nullptr is handled here.
    
    PiperOrigin-RevId: 699323007
    krishnaharidasan authored and tensorflower-gardener committed Nov 23, 2024
    00a86aa
  5. Add support for quantization in litert model load

    Also:
    * Add some helper functions for checking that a litert op matches a tfl op, which can also be re-used in other contexts.
    * Add some quantization related helper functions to flatbuffer_tools
    * Update dump for quantization
    * Move things around a bit and add quantization stuff to model_util support checks
    
    PiperOrigin-RevId: 699333588
    LukeBoyer authored and tensorflower-gardener committed Nov 23, 2024
    ebc16ff
  6. [xla:collectives] NFC: Remove communicator aliases from NcclApi

    PiperOrigin-RevId: 699337598
    ezhulenev authored and tensorflower-gardener committed Nov 23, 2024
    9f1c1aa
  7. Add target_config as an optional field of

    StreamExecutorGpuTopologyDescription rather than parsing it for every compile.
    
    PiperOrigin-RevId: 699344815
    pschuh authored and tensorflower-gardener committed Nov 23, 2024
    2358eff
  8. The MoveUserInstructionsIn cannot handle the conditional operations…

    … with array output and multiple users. It may trigger a compilation error, as in the added test target.
    
    PiperOrigin-RevId: 699357851
    ZixuanJiang authored and tensorflower-gardener committed Nov 23, 2024
    f06a547
  9. internal visibility change

    PiperOrigin-RevId: 699361885
    deqiangc authored and tensorflower-gardener committed Nov 23, 2024
    6a6dd7c
  10. 57c775e
  11. 3c02a74
  12. Move BatchedGatherScatterNormalizer from pre-SPMD to post-SPMD.

    PiperOrigin-RevId: 699397857
    ZixuanJiang authored and tensorflower-gardener committed Nov 23, 2024
    c3fd63e
  13. Pull the zip functions into a public header

    PiperOrigin-RevId: 699409569
    LukeBoyer authored and tensorflower-gardener committed Nov 23, 2024
    d13a02a
  14. Automated Code Change

    PiperOrigin-RevId: 699467519
    tensorflower-gardener committed Nov 23, 2024
    5ca90b0
  15. Merge pull request #79777 from tensorflow:gaikwadrahul8-patch-1

    PiperOrigin-RevId: 699496299
    tensorflower-gardener committed Nov 23, 2024
    e88d3b3
  16. Merge pull request #80574 from tensorflow:gaikwadrahul8-patch-2

    PiperOrigin-RevId: 699497695
    tensorflower-gardener committed Nov 23, 2024
    c5f0512
  17. Rollback of PR #80574

    Reverts c5f0512
    
    PiperOrigin-RevId: 699499360
    tensorflower-gardener committed Nov 23, 2024
    f377b15