
Same Model Hash Code Issue from different models #21672

Open
geekadalovelace opened this issue Aug 8, 2024 · 3 comments
Labels: core runtime (issues related to core runtime)

Comments


geekadalovelace commented Aug 8, 2024

Describe the issue

If two models have the same architecture and identical input/output tensor names for each node, they generate the same model hash code. Even when the model structure is identical, different weights and shapes should result in different models and should therefore produce different hash codes.

The hash code depends only on the names:

for (const auto* node_arg : main_graph.GetInputsIncludingInitializers()) {
  hash_str(node_arg->Name());  // only the name feeds the hash; weights and shapes are never read
}
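To make the failure mode concrete, here is a standalone sketch (illustrative only: FNV-1a stands in for ORT's internal hash_str, and the input names are made up) showing that two checkpoints of the same architecture fingerprint identically when only node-arg names are hashed:

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Fold a string into a running FNV-1a hash; stand-in for ORT's hash_str.
uint64_t HashStr(uint64_t h, const std::string& s) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV-1a prime
  }
  return h;
}

int main() {
  // Two "models" with identical node-arg names but (hypothetically) different weights/shapes.
  std::vector<std::string> model_a_inputs = {"input", "conv1.weight", "conv1.bias"};
  std::vector<std::string> model_b_inputs = {"input", "conv1.weight", "conv1.bias"};

  uint64_t hash_a = 14695981039346656037ULL;  // FNV-1a offset basis
  uint64_t hash_b = 14695981039346656037ULL;
  for (const auto& name : model_a_inputs) hash_a = HashStr(hash_a, name);
  for (const auto& name : model_b_inputs) hash_b = HashStr(hash_b, name);

  // Prints the same value twice: names alone cannot distinguish the models.
  std::cout << hash_a << "\n" << hash_b << "\n";
}

Anything that varies between the two checkpoints (initializer bytes, dims) never enters the hash, so the collision is guaranteed by construction.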

To reproduce

Two models with the same architecture but different weights generate the same model hash code.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

v1.10

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

sophies927 added the "core runtime" label Aug 15, 2024
skottmckay (Contributor) commented:

Is this a real-world issue or theoretical? This would only occur if the same instance of the EP was loaded in multiple sessions and it was a compiling EP. Hashing the weights would add a huge cost.

geekadalovelace (Author) commented Aug 16, 2024

> Is this a real-world issue or theoretical? This would only occur if the same instance of the EP was loaded in multiple sessions and it was a compiling EP. Hashing the weights would add a huge cost.

This is a real-world issue. I have models with the same architecture but trained with different channel sizes, and there may also be models with the same architecture whose weights were trained for different objectives. Compiling models is time-consuming, so I cache the compilation results and use the hash code as the cache key.

I modified the code to hash the weights and observed that the time to generate the hash code grows proportionally with the model size. I need a smarter solution to this problem.
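One direction for a cheaper fingerprint (a sketch under assumptions, not ORT API: Initializer is a hypothetical stand-in for an ONNX initializer, and the head/tail byte sampling is an illustrative policy) is to hash each initializer's name, shape, and a bounded sample of its raw bytes, keeping cost independent of tensor size:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an ONNX initializer (name, shape, raw bytes).
struct Initializer {
  std::string name;
  std::vector<int64_t> dims;
  const uint8_t* data;
  size_t size;
};

// Fold an arbitrary byte range into a running FNV-1a hash.
uint64_t Fnv1a(uint64_t h, const void* p, size_t n) {
  auto* bytes = static_cast<const uint8_t*>(p);
  for (size_t i = 0; i < n; ++i) { h ^= bytes[i]; h *= 1099511628211ULL; }
  return h;
}

// Fingerprint an initializer from its name, dims, and a bounded byte sample,
// so hashing cost does not grow with tensor size.
uint64_t FingerprintInitializer(uint64_t h, const Initializer& init, size_t sample = 64) {
  h = Fnv1a(h, init.name.data(), init.name.size());
  h = Fnv1a(h, init.dims.data(), init.dims.size() * sizeof(int64_t));
  size_t head = std::min(sample, init.size);
  h = Fnv1a(h, init.data, head);                           // first bytes
  if (init.size > sample) {
    h = Fnv1a(h, init.data + init.size - sample, sample);  // last bytes
  }
  return h;
}

The trade-off is that two tensors differing only in unsampled bytes still collide; including the byte count and spreading the samples across the buffer reduces, but does not eliminate, that risk.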

skottmckay (Contributor) commented:

The intended usage of ModelMetadefIdGenerator was to create a deterministic yet unique hash that can be used in the name of the node containing the compiled model, to make it easier to debug issues. It wasn't intended as a cache-key hash.

Where is the caching code? I assume ORT isn't handling that so it's not clear why the ModelMetadefIdGenerator hash needs to be used as the cache key.
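If the cache lives outside ORT, one option (a sketch; ModelFileKey is a hypothetical helper, and FNV-1a is used for brevity where a cryptographic digest such as SHA-256 would be safer against collisions) is to key it on the bytes of the serialized model file, which include the weights:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hash the serialized .onnx file once at cache-population time. Weights are
// part of the file, so two checkpoints of the same architecture get distinct keys.
uint64_t ModelFileKey(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
  for (char c : bytes) {
    h ^= static_cast<uint8_t>(c);
    h *= 1099511628211ULL;  // FNV-1a prime
  }
  return h;
}

int main(int argc, char** argv) {
  if (argc > 1) std::cout << std::hex << ModelFileKey(argv[1]) << "\n";
}

Since the key is computed once per model file rather than per session, the cost is amortized across cache lookups, and the ModelMetadefIdGenerator hash can stay cheap for its original debugging purpose.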
