
Same Model Hash Code Issue from different models #21672

Open
geekadalovelace opened this issue Aug 8, 2024 · 3 comments
Labels: core runtime (issues related to core runtime)

Comments


geekadalovelace commented Aug 8, 2024

Describe the issue

If two models have the same architecture and identical input/output tensor names for each node, they generate the same model hash code. Even when the model structure is identical, different weights and shapes should result in different models and should therefore produce different hash codes.

The hash code depends only on the names:

for (const auto* node_arg : main_graph.GetInputsIncludingInitializers()) {
  hash_str(node_arg->Name());  // only the name feeds the hash; weights and shapes are never read
}
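To make the failure mode concrete, here is a standalone sketch (illustrative only: FNV-1a stands in for ORT's internal hash_str, and the input names are made up) showing that two checkpoints of the same architecture fingerprint identically when only node-arg names are hashed:

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Fold a string into a running FNV-1a hash; stand-in for ORT's hash_str.
uint64_t HashStr(uint64_t h, const std::string& s) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV-1a prime
  }
  return h;
}

int main() {
  // Two "models" with identical node-arg names but (hypothetically) different weights/shapes.
  std::vector<std::string> model_a_inputs = {"input", "conv1.weight", "conv1.bias"};
  std::vector<std::string> model_b_inputs = {"input", "conv1.weight", "conv1.bias"};

  uint64_t hash_a = 14695981039346656037ULL;  // FNV-1a offset basis
  uint64_t hash_b = 14695981039346656037ULL;
  for (const auto& name : model_a_inputs) hash_a = HashStr(hash_a, name);
  for (const auto& name : model_b_inputs) hash_b = HashStr(hash_b, name);

  // Prints the same value twice: names alone cannot distinguish the models.
  std::cout << hash_a << "\n" << hash_b << "\n";
}

Anything that varies between the two checkpoints (initializer bytes, dims) never enters the hash, so the collision is guaranteed by construction.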

To reproduce

Two models with the same architecture but different weights generate the same model hash code.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

v1.10

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

sophies927 added the "core runtime" label Aug 15, 2024
skottmckay (Contributor) commented:

Is this a real-world issue or theoretical? This would only occur if the same instance of the EP was loaded in multiple sessions and it was a compiling EP. Hashing the weights would add a huge cost.

geekadalovelace (Author) commented Aug 16, 2024

> Is this a real-world issue or theoretical? This would only occur if the same instance of the EP was loaded in multiple sessions and it was a compiling EP. Hashing the weights would add a huge cost.

This is a real-world issue. I have models with the same architecture but trained with different channel sizes, and there may also be models with the same architecture whose weights were trained for different objectives. Compiling models is time-consuming, so I cache the compilation results and use the hash code as the cache key.

I modified the code to hash the weights and observed that the time to generate the hash code grows proportionally with the model size. I need a smarter solution to this problem.
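One direction for a cheaper fingerprint (a sketch under assumptions, not ORT API: Initializer is a hypothetical stand-in for an ONNX initializer, and the head/tail byte sampling is an illustrative policy) is to hash each initializer's name, shape, and a bounded sample of its raw bytes, keeping cost independent of tensor size:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an ONNX initializer (name, shape, raw bytes).
struct Initializer {
  std::string name;
  std::vector<int64_t> dims;
  const uint8_t* data;
  size_t size;
};

// Fold an arbitrary byte range into a running FNV-1a hash.
uint64_t Fnv1a(uint64_t h, const void* p, size_t n) {
  auto* bytes = static_cast<const uint8_t*>(p);
  for (size_t i = 0; i < n; ++i) { h ^= bytes[i]; h *= 1099511628211ULL; }
  return h;
}

// Fingerprint an initializer from its name, dims, and a bounded byte sample,
// so hashing cost does not grow with tensor size.
uint64_t FingerprintInitializer(uint64_t h, const Initializer& init, size_t sample = 64) {
  h = Fnv1a(h, init.name.data(), init.name.size());
  h = Fnv1a(h, init.dims.data(), init.dims.size() * sizeof(int64_t));
  size_t head = std::min(sample, init.size);
  h = Fnv1a(h, init.data, head);                           // first bytes
  if (init.size > sample) {
    h = Fnv1a(h, init.data + init.size - sample, sample);  // last bytes
  }
  return h;
}

The trade-off is that two tensors differing only in unsampled bytes still collide; including the byte count and spreading the samples across the buffer reduces, but does not eliminate, that risk.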

skottmckay (Contributor) commented:

The intended usage of ModelMetadefIdGenerator was to create a deterministic yet unique hash that can be used in the name of the node containing the compiled model, to make it easier to debug issues. It wasn't intended as a cache-key hash.

Where is the caching code? I assume ORT isn't handling that so it's not clear why the ModelMetadefIdGenerator hash needs to be used as the cache key.
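If the cache lives outside ORT, one option (a sketch; ModelFileKey is a hypothetical helper, and FNV-1a is used for brevity where a cryptographic digest such as SHA-256 would be safer against collisions) is to key it on the bytes of the serialized model file, which include the weights:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hash the serialized .onnx file once at cache-population time. Weights are
// part of the file, so two checkpoints of the same architecture get distinct keys.
uint64_t ModelFileKey(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
  for (char c : bytes) {
    h ^= static_cast<uint8_t>(c);
    h *= 1099511628211ULL;  // FNV-1a prime
  }
  return h;
}

int main(int argc, char** argv) {
  if (argc > 1) std::cout << std::hex << ModelFileKey(argv[1]) << "\n";
}

Since the key is computed once per model file rather than per session, the cost is amortized across cache lookups, and the ModelMetadefIdGenerator hash can stay cheap for its original debugging purpose.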
