Not able to load onnx model multilingual-e5-large #21321
Comments
The error shows that model.onnx_data cannot be found. Did you download both the model.onnx and model.onnx_data files and save them in the same folder?
I apologize - you are right. Once specifying: Unfortunately, Spring AI with ai.onnxruntime still does not work. Now I am getting this error: The model is 2.1 GB; I am using a 64-bit JVM 17, and setting the JVM options -Xms16g -Xmx16g did not help.
You can't load a multi-part model (where there is both a model.onnx and a model.onnx_data file) from a byte array.
I have been struggling with this issue for almost two weeks - unfortunately, I am just an ONNX beginner... In Java I need to create an OrtSession, and there are only two constructor options (see the sketch after this list):
A) From a single file
B) From a protobuf byte array
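For reference, a minimal sketch of those two options against the ai.onnxruntime API (the file name is a placeholder for wherever the model actually lives):

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

import java.nio.file.Files;
import java.nio.file.Path;

public class SessionCreation {
  public static void main(String[] args) throws Exception {
    OrtEnvironment env = OrtEnvironment.getEnvironment();

    // A) From a single file: ORT resolves model.onnx_data relative to the
    // model's path, so both files must sit in the same folder.
    try (OrtSession session = env.createSession("model.onnx", new OrtSession.SessionOptions())) {
      System.out.println(session.getInputNames());
    }

    // B) From a protobuf byte array: there is no path, so references to
    // external data (model.onnx_data) cannot be resolved.
    byte[] modelBytes = Files.readAllBytes(Path.of("model.onnx"));
    try (OrtSession session = env.createSession(modelBytes, new OrtSession.SessionOptions())) {
      System.out.println(session.getInputNames());
    }
  }
}
```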
Anyway, in the end you always need "one consolidated onnx export", i.e. some way to "merge" the files. I have spent a lot of time trying to understand the issue, and found that the principle is probably:
The expected resulting structure is without external data, containing just raw_data (example):
I tried to:
The final Python code is:
Unfortunately:
You can't combine them into a single file; it won't load, as it will be over the file size limit. You can load the onnx file and let it read the onnx_data file from disk in the location it is in, or load the onnx_data file in Python, write the initializers out in a format you can easily read in Java, and then add them to the session via SessionOptions.addExternalInitializers.
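A hedged sketch of that second approach, assuming the SessionOptions.addExternalInitializers overload that takes a Map<String, OnnxTensorLike>. The initializer name is taken from the error message quoted later in this issue; the offset, length, and shape are placeholders that in a real run must be read from the external_data fields of the corresponding TensorProto in model.onnx:

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OnnxTensorLike;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

public class ExternalInitializers {
  public static void main(String[] args) throws Exception {
    OrtEnvironment env = OrtEnvironment.getEnvironment();

    try (FileChannel data = FileChannel.open(Path.of("model.onnx_data"), StandardOpenOption.READ)) {
      // Placeholder values: get the real ones from the TensorProto
      // external_data entries in model.onnx.
      long offset = 0L;
      long lengthBytes = 4L * 1024 * 1024; // 1024 * 1024 floats
      long[] shape = {1024, 1024};

      // Memory-map the initializer's slice instead of copying it on-heap.
      ByteBuffer slice = data.map(FileChannel.MapMode.READ_ONLY, offset, lengthBytes)
                             .order(ByteOrder.LITTLE_ENDIAN);
      OnnxTensor tensor = OnnxTensor.createTensor(env, slice.asFloatBuffer(), shape);

      OrtSession.SessionOptions opts = new OrtSession.SessionOptions();
      Map<String, OnnxTensorLike> initializers = Map.of("onnx::MatMul_3326", tensor);
      opts.addExternalInitializers(initializers);

      // With the initializers supplied, the byte-array constructor works
      // even though the model references external data.
      byte[] modelBytes = Files.readAllBytes(Path.of("model.onnx"));
      try (OrtSession session = env.createSession(modelBytes, opts)) {
        System.out.println(session.getInputNames());
      }
    }
  }
}
```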
Thank you for the recommendation of addExternalInitializers. I have tried the following code:
When I uncomment the "Create OnnxTensor and use it as OnnxTensorLike" part, I get the error: Cannot allocate a direct buffer of the requested size and type, size 1024008192, type = FLOAT. I am using Java 17 and added these VM options: -XX:MaxDirectMemorySize=4g -Xmx2g - it did not help. Going to OrtUtil.java, line 492, I see the following condition:
Therefore I commented out the "Create OnnxTensor and use it as OnnxTensorLike" part, and for the first, largest initializer I get: Raw data offset: 1024008192. It seems that even here there is a 2 GB limit (2147483615), and I do not understand why it fails when the initializer is only 1 GB... What am I doing wrong?
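For context, the failing call was presumably the ByteBuffer overload of createTensor. A minimal sketch of that path (buffer size shrunk to stay illustrative); note the explicit OnnxJavaType.FLOAT - per the fix description quoted further below (#21774), supplying a ByteBuffer with a tensor type other than INT8 hit a bug in the offset/position computation in releases of that era, which may explain why a ~1 GB buffer tripped the 2 GB check:

```java
import ai.onnxruntime.OnnxJavaType;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferTensor {
  public static void main(String[] args) throws Exception {
    OrtEnvironment env = OrtEnvironment.getEnvironment();

    // Raw little-endian float data; 16 floats = 64 bytes (illustrative size).
    ByteBuffer raw = ByteBuffer.allocateDirect(64).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < 16; i++) {
      raw.putFloat(i);
    }
    raw.rewind();

    // Creating a FLOAT tensor directly from a ByteBuffer: this is the code
    // path whose offset/position handling was fixed in #21774.
    long[] shape = {4, 4};
    try (OnnxTensor tensor = OnnxTensor.createTensor(env, raw, shape, OnnxJavaType.FLOAT)) {
      System.out.println(tensor.getInfo());
    }
  }
}
```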
This workaround was recommended to me, and it seems to be working (further tests required):
I had this issue too. When will this be fixed?
You can load in the initializers manually by processing the byte stream from the model.onnx_data file.
…ining elements of a different type (#21774)

### Description

Fixes a bug where the buffer offset and position were incorrectly computed if the user supplied a `ByteBuffer` to `createTensor` but set the type of the tensor to something other than `INT8`. This would be more common if the user was trying to load the initializers from a serialized representation and didn't want to bother with the type information (which is the case in #21321).

### Motivation and Context

Partial fix for #21321. The remainder of the fix is to add a helper which allows users to load initializers out of an `onnx_data` file, but that will require adding protobuf as a dependency for the Java API to allow the parsing of an ONNX file separately from the native code. It might be nicer to put that functionality into ORT's C API so it can return the lengths & offsets of the initializers when provided with an ONNX file containing external initializers. We hit this kind of thing in Java more often than in other languages, as in Java models can be supplied as classpath resources which we can easily read, but not materialize on disk for the ORT native library to read.
Describe the issue
I am trying to use Spring AI with the ONNX model multilingual-e5-large:
Model is here:
https://huggingface.co/intfloat/multilingual-e5-large/tree/main/onnx
When I have the following dependencies in build.gradle (not forcing a newer onnxruntime version):
implementation 'org.springframework.ai:spring-ai-transformers'
//implementation group: 'com.microsoft.onnxruntime', name: 'onnxruntime', version: '1.18.0'
I am getting:
ORT_RUNTIME_EXCEPTION - message: Exception during initialization: C:\a_work\1\s\onnxruntime\core\optimizer\initializer.cc:35 onnxruntime::Initializer::Initializer !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.
When I have the following dependencies in build.gradle (forcing a newer onnxruntime version):
implementation 'org.springframework.ai:spring-ai-transformers'
implementation group: 'com.microsoft.onnxruntime', name: 'onnxruntime', version: '1.18.0'
I am getting:
Error code - ORT_FAIL - message: Deserialize tensor onnx::MatMul_3326 failed.GetFileLength for .\model.onnx_data failed:open file model.onnx_data fail, errcode = 2 - unknown error
The code works perfectly with the model all-MiniLM-L6-v2:
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/tree/main/onnx
Unfortunately, at this moment I have no idea how to resolve this issue. Can you help me?
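As a minimal check independent of Spring AI (a sketch; the path assumes both model.onnx and model.onnx_data from the Hugging Face repo were downloaded into the same local folder), loading by file path lets ORT resolve the external data file next to the model, whereas a byte-array load (which a classpath-resource setup effectively uses) has no path to resolve it against and presumably produces the first error above:

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

public class E5LoadCheck {
  public static void main(String[] args) throws Exception {
    OrtEnvironment env = OrtEnvironment.getEnvironment();
    // Hypothetical local path: model.onnx and model.onnx_data side by side.
    String modelPath = "multilingual-e5-large/onnx/model.onnx";
    try (OrtSession session = env.createSession(modelPath, new OrtSession.SessionOptions())) {
      System.out.println("Inputs: " + session.getInputNames());
    }
  }
}
```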
To reproduce
My configuration is:
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
Java
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response