
[Computation Hash] User Computation hash is agnostic to debug metadata #8538

Open
rpsilva-aws opened this issue Jan 7, 2025 · 4 comments

@rpsilva-aws (Contributor) commented Jan 7, 2025

🚀 Feature

Similar to #8537:

Currently, the user computation derives its hash from the serialized protobuf bytes of its computation operand [1]:

hash_ = torch::lazy::MHash(name, computation_.proto().SerializeAsString());

[1] xla/torch_xla/csrc/runtime/computation_client.h

In this case, different IR debug entries end up generating different unique hashes, and each new hash forces a recompilation due to a Torch JIT cache miss.
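To illustrate the problem, here is a minimal Python sketch (not the actual C++ implementation): `computation_hash` is a hypothetical stand-in for `torch::lazy::MHash` over the serialized proto bytes, and JSON stands in for protobuf serialization. Any byte-level hash shows the same sensitivity to debug-only metadata.

```python
import hashlib
import json

# Hypothetical stand-in for torch::lazy::MHash(name, proto_bytes).
def computation_hash(name: str, serialized_proto: bytes) -> str:
    return hashlib.sha256(name.encode() + serialized_proto).hexdigest()

# Two logically identical computations whose serialized forms differ only
# in a debug-only metadata entry (source_line).
proto_a = {"opcode": "parameter", "metadata": {"source_line": 1559}}
proto_b = {"opcode": "parameter", "metadata": {"source_line": 1560}}

hash_a = computation_hash("SiluForwardImpl", json.dumps(proto_a).encode())
hash_b = computation_hash("SiluForwardImpl", json.dumps(proto_b).encode())

# The hashes differ, so the JIT cache misses and the computation recompiles.
assert hash_a != hash_b
```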

In this feature request, we discuss and consider a design decision to make the cache agnostic to metadata that should not influence the execution of the computations.

Motivation

The motivation is to avoid recompiling whenever only the metadata in the proto changes; unnecessary recompilations hurt performance and can cause OOMs (some backend engines treat each distinct HLO produced by TorchXLA as a unique executable binary).

Pitch

We could consider introducing a new flag that changes this behavior, made opt-in (or opt-out): say, XLA_CACHE_IGNORE_METADATA, which removes certain entries from the metadata field of the HLO proto module before generating the hash key for caching, while retaining them otherwise:

  • op_name
  • source_line
  • source_file

See all entries in https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L385. This would apply anywhere the protobuf used for caching (including the user computation) is generated, and should be extensible in case other entries fit the profile as well. By design, at a minimum, TorchXLA's "debug" metadata entries should not interfere with the resulting computation, unlike explicit metadata such as frontend_attributes, rhs_contracting_dims, etc., e.g.:
computations {
  name: "SiluForwardImpl.290"
  instructions {
    name: "p0.291"
    opcode: "parameter"
    shape {
      element_type: BF16
      dimensions: 4
      dimensions: 4096
      dimensions: 11008
      layout {
        minor_to_major: 2
        minor_to_major: 1
        minor_to_major: 0
        tail_padding_alignment_in_elements: 1
      }
      is_dynamic_dimension: false
      is_dynamic_dimension: false
      is_dynamic_dimension: false
    }
    metadata {                                                                       <<<<<<<<<<<<<<<<<<<<<<<<<<<<< here
      op_type: "some_op"
      op_name: "xla__some_op"
      source_file: ".../lib/python3.10/site-packages/torch_xla/some_file.py"
      source_line: 1559
    }
    id: 291
    frontend_attributes {
    }
  }
}
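The proposed flag could prune the debug-only entries from each instruction's metadata before serializing for the hash. A minimal Python sketch of that idea (field names taken from the proto above; the dict walk stands in for real protobuf traversal, and `cache_key` is a hypothetical helper, not existing TorchXLA API):

```python
import hashlib
import json

# Debug-only metadata entries to ignore when hashing; in the proposal,
# this behavior would be gated by XLA_CACHE_IGNORE_METADATA.
DEBUG_METADATA_FIELDS = {"op_name", "source_file", "source_line"}

def strip_debug_metadata(node):
    """Recursively drop debug-only entries from every `metadata` dict."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "metadata" and isinstance(value, dict):
                value = {k: v for k, v in value.items()
                         if k not in DEBUG_METADATA_FIELDS}
            out[key] = strip_debug_metadata(value)
        return out
    if isinstance(node, list):
        return [strip_debug_metadata(v) for v in node]
    return node

def cache_key(name, proto):
    # Canonicalize after stripping so only semantic content is hashed.
    canonical = json.dumps(strip_debug_metadata(proto), sort_keys=True)
    return hashlib.sha256(name.encode() + canonical.encode()).hexdigest()

# The same computation traced from two different source lines.
proto_a = {"opcode": "parameter",
           "metadata": {"op_type": "some_op", "source_line": 1559}}
proto_b = {"opcode": "parameter",
           "metadata": {"op_type": "some_op", "source_line": 9999}}

# With the debug entries stripped, the cache keys match: no recompilation.
assert cache_key("SiluForwardImpl", proto_a) == cache_key("SiluForwardImpl", proto_b)
```

Retained entries (here, op_type) still participate in the key, so only the debug-specific fields become invisible to the cache.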
@tengyifei (Collaborator)

> We could consider introducing a new flag that changes that behavior
Is there any reason someone would want the old behavior? Sounds like we should just fix torch_xla::runtime::Computation to not do torch::lazy::MHash(name, computation_.proto().SerializeAsString()); and instead ignore the metadata fields.

@rpsilva-aws (Contributor, Author) commented Jan 7, 2025

Only if someone somehow relies on the metadata for a functional property. I am not aware of any such usage, especially with the attributes currently in it; however, there are several: https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L385. Some are quite ambiguous: https://github.com/openxla/xla/blob/a49d854a355374453af37a266823450db6714d9a/xla/xla_data.proto#L445

The earlier we make such a decision, the better, but the flag would allow us to revert this behavior for any customers who do rely on these fields. We can then push for XLA to add proper entries outside of the metadata if they are needed for any functionality. It's a bit unclear at the moment.

We could also remove only a subset of the attributes (as some backend compilation engines do), e.g. op_name, source_file, and source_line, but that is less maintainable.

@tengyifei (Collaborator)

Let me double-check with some XLA folks.

@tengyifei (Collaborator) commented Jan 7, 2025

I think the question I wanted answered is: do any of these fields affect the semantics of the resulting compiled executable in any way?
