
[Computation Hash] User Computation hash is agnostic to debug metadata #8538

Open
rpsilva-aws opened this issue Jan 7, 2025 · 4 comments

@rpsilva-aws (Contributor) commented Jan 7, 2025

🚀 Feature

Similar to #8537:

Currently, the user computation derives its hash from the serialized protobuf bytes of its computation operand [1]:

hash_ = torch::lazy::MHash(name, computation_.proto().SerializeAsString());

[1] xla/torch_xla/csrc/runtime/computation_client.h

In this case, different IR debug entries end up generating different unique hashes, and each new hash forces a recompilation due to a Torch JIT cache miss.
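To illustrate the problem, here is a minimal Python sketch (not the actual C++ implementation): `computation_hash` is a hypothetical stand-in for `torch::lazy::MHash` over the serialized proto bytes, and JSON stands in for protobuf serialization. Any byte-level hash shows the same sensitivity to debug-only metadata.

```python
import hashlib
import json

# Hypothetical stand-in for torch::lazy::MHash(name, proto_bytes).
def computation_hash(name: str, serialized_proto: bytes) -> str:
    return hashlib.sha256(name.encode() + serialized_proto).hexdigest()

# Two logically identical computations whose serialized forms differ only
# in a debug-only metadata entry (source_line).
proto_a = {"opcode": "parameter", "metadata": {"source_line": 1559}}
proto_b = {"opcode": "parameter", "metadata": {"source_line": 1560}}

hash_a = computation_hash("SiluForwardImpl", json.dumps(proto_a).encode())
hash_b = computation_hash("SiluForwardImpl", json.dumps(proto_b).encode())

# The hashes differ, so the JIT cache misses and the computation recompiles.
assert hash_a != hash_b
```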

In this feature request, we discuss and consider a design decision to make the cache agnostic to metadata that should not influence the execution of the computations.

Motivation

The motivation is to avoid recompiling whenever only the metadata in the proto changes; unnecessary recompilations hurt performance and can cause OOMs (some backend engines treat each distinct HLO produced by TorchXLA as a unique executable binary).

Pitch

We could consider introducing a new flag that changes this behavior, made opt-in (or opt-out): say, XLA_CACHE_IGNORE_METADATA, which removes certain entries from the metadata field of the HLO proto module before generating the hash key for caching, while retaining them otherwise:

  • op_name
  • source_line
  • source_file

See all entries in https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L385. This would apply anywhere the protobuf used for caching (including the user computation) is generated, and should be extensible in case other entries fit the profile as well. By design, at a minimum, TorchXLA's "debug" metadata entries should not interfere with the resulting computation, unlike explicit metadata such as frontend_attributes, rhs_contracting_dims, etc., e.g.:
computations {
  name: "SiluForwardImpl.290"
  instructions {
    name: "p0.291"
    opcode: "parameter"
    shape {
      element_type: BF16
      dimensions: 4
      dimensions: 4096
      dimensions: 11008
      layout {
        minor_to_major: 2
        minor_to_major: 1
        minor_to_major: 0
        tail_padding_alignment_in_elements: 1
      }
      is_dynamic_dimension: false
      is_dynamic_dimension: false
      is_dynamic_dimension: false
    }
    metadata {                                                                       <<<<<<<<<<<<<<<<<<<<<<<<<<<<< here
      op_type: "some_op"
      op_name: "xla__some_op"
      source_file: ".../lib/python3.10/site-packages/torch_xla/some_file.py"
      source_line: 1559
    }
    id: 291
    frontend_attributes {
    }
  }
}
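The proposed flag could prune the debug-only entries from each instruction's metadata before serializing for the hash. A minimal Python sketch of that idea (field names taken from the proto above; the dict walk stands in for real protobuf traversal, and `cache_key` is a hypothetical helper, not existing TorchXLA API):

```python
import hashlib
import json

# Debug-only metadata entries to ignore when hashing; in the proposal,
# this behavior would be gated by XLA_CACHE_IGNORE_METADATA.
DEBUG_METADATA_FIELDS = {"op_name", "source_file", "source_line"}

def strip_debug_metadata(node):
    """Recursively drop debug-only entries from every `metadata` dict."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "metadata" and isinstance(value, dict):
                value = {k: v for k, v in value.items()
                         if k not in DEBUG_METADATA_FIELDS}
            out[key] = strip_debug_metadata(value)
        return out
    if isinstance(node, list):
        return [strip_debug_metadata(v) for v in node]
    return node

def cache_key(name, proto):
    # Canonicalize after stripping so only semantic content is hashed.
    canonical = json.dumps(strip_debug_metadata(proto), sort_keys=True)
    return hashlib.sha256(name.encode() + canonical.encode()).hexdigest()

# The same computation traced from two different source lines.
proto_a = {"opcode": "parameter",
           "metadata": {"op_type": "some_op", "source_line": 1559}}
proto_b = {"opcode": "parameter",
           "metadata": {"op_type": "some_op", "source_line": 9999}}

# With the debug entries stripped, the cache keys match: no recompilation.
assert cache_key("SiluForwardImpl", proto_a) == cache_key("SiluForwardImpl", proto_b)
```

Retained entries (here, op_type) still participate in the key, so only the debug-specific fields become invisible to the cache.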
@tengyifei (Collaborator)

> We could consider introducing a new flag that changes that behavior
Is there any reason someone would want the old behavior? Sounds like we should just fix torch_xla::runtime::Computation to not do torch::lazy::MHash(name, computation_.proto().SerializeAsString()); and instead ignore the metadata fields.

@rpsilva-aws (Contributor, Author) commented Jan 7, 2025

Only if someone somehow relies on the metadata for a functional property. I am not aware of any such usage, especially with the attributes currently in it; however, there are several: https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L385. Some are quite ambiguous: https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L445

The earlier we make such a decision, the better, but the flag would allow us to revert this behavior for any customers who do rely on these fields. We can then push for XLA to add proper entries outside of the metadata if they are needed for any functionality. It's a bit unclear at the moment.

We could also remove only a subset of the attributes (as some backend compilation engines do), e.g. op_name, source_file, and source_line, but that is less maintainable.

@tengyifei (Collaborator)

Let me double-check with some XLA folks.

@tengyifei (Collaborator) commented Jan 7, 2025

I think the question I wanted answered is: do any of these fields affect the semantics of the resulting compiled executable in any way?
