Not able to load onnx model multilingual-e5-large #21321

Open
JirHr opened this issue Jul 11, 2024 · 9 comments
Labels
api:Java issues related to the Java API model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.

Comments

@JirHr

JirHr commented Jul 11, 2024

Describe the issue

I am trying to use Spring AI with the ONNX model multilingual-e5-large.

Model is here:
https://huggingface.co/intfloat/multilingual-e5-large/tree/main/onnx

With these dependencies in build.gradle (not forcing a newer onnxruntime version):
implementation 'org.springframework.ai:spring-ai-transformers'
//implementation group: 'com.microsoft.onnxruntime', name: 'onnxruntime', version: '1.18.0'

I am getting:
ORT_RUNTIME_EXCEPTION - message: Exception during initialization: C:\a_work\1\s\onnxruntime\core\optimizer\initializer.cc:35 onnxruntime::Initializer::Initializer !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.

With these dependencies in build.gradle (forcing a newer onnxruntime version):
implementation 'org.springframework.ai:spring-ai-transformers'
implementation group: 'com.microsoft.onnxruntime', name: 'onnxruntime', version: '1.18.0'

I am getting:
Error code - ORT_FAIL - message: Deserialize tensor onnx::MatMul_3326 failed.GetFileLength for .\model.onnx_data failed:open file model.onnx_data fail, errcode = 2 - unknown error

The same code works perfectly with the model all-MiniLM-L6-v2:
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/tree/main/onnx

Unfortunately, at this moment I have no idea how to resolve this issue. Can you help me?

To reproduce

My configuration is:

@Configuration
public class TransformerConf {
    @Bean("transformersEmbeddingModel")
    public EmbeddingModel embeddingClient() throws Exception {
        TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
        embeddingModel.setTokenizerResource("classpath:/onnx/multilingual-e5-large/tokenizer.json");
        embeddingModel.setModelResource("classpath:/onnx/multilingual-e5-large/model.onnx");
        embeddingModel.afterPropertiesSet();
        return embeddingModel;
    }
}

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Java

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions github-actions bot added api:Java issues related to the Java API model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. platform:windows issues related to the Windows platform labels Jul 11, 2024
@yufenglee
Member

yufenglee commented Jul 11, 2024

The error shows that model.onnx_data cannot be found. Did you download both model.onnx and model.onnx_data and save them in the same folder?
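
A quick way to verify this before creating a session is to check that the external data file sits next to the model. The sketch below is a minimal, standard-library-only check. The class and method names are made up for illustration, and it assumes the external file is named model.onnx_data, as in the Hugging Face export linked above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExternalDataCheck {
    /**
     * Returns true if a regular file called dataFileName exists in the same
     * directory as modelPath. External data locations in an ONNX model are
     * resolved relative to the directory of the model file.
     */
    public static boolean hasSiblingData(Path modelPath, String dataFileName) {
        return Files.isRegularFile(modelPath.resolveSibling(dataFileName));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: a temp directory standing in for the model folder.
        Path dir = Files.createTempDirectory("e5-check-");
        Path model = Files.createFile(dir.resolve("model.onnx"));
        System.out.println(hasSiblingData(model, "model.onnx_data")); // false
        Files.createFile(dir.resolve("model.onnx_data"));
        System.out.println(hasSiblingData(model, "model.onnx_data")); // true
    }
}
```

Note that this only works for files on disk. Resources packed inside a jar have no sibling file that the native library can open.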

@sophies927 sophies927 removed the platform:windows issues related to the Windows platform label Jul 11, 2024
@JirHr
Author

JirHr commented Jul 12, 2024

I apologize - you are right.
Both model.onnx and model.onnx_data exist and I had copied both of them, expecting model.onnx to be the "main" file and the load to be redirected to model.onnx_data.

After specifying:
TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
embeddingModel.setModelResource("classpath:/onnx/multilingual-e5-large/model.onnx_data");
I was able to resolve the issue.

Unfortunately, Spring AI with ai.onnxruntime still does not work. Now I am getting this error:
Caused by: java.lang.OutOfMemoryError: Required array size too large
at java.base/java.io.InputStream.readNBytes(InputStream.java:420) ~[na:na]
at java.base/java.io.InputStream.readAllBytes(InputStream.java:349) ~[na:na]
at org.springframework.util.FileCopyUtils.copyToByteArray(FileCopyUtils.java:149) ~[spring-core-6.1.10.jar:6.1.10]
at org.springframework.core.io.Resource.getContentAsByteArray(Resource.java:151) ~[spring-core-6.1.10.jar:6.1.10]
at org.springframework.ai.transformers.TransformersEmbeddingModel.afterPropertiesSet(TransformersEmbeddingModel.java:193) ~[spring-ai-transformers-1.0.0-M1.jar:1.0.0-M1]

The model is 2.1 GB. I am using a 64-bit JVM 17, and setting the JVM options -Xms16g -Xmx16g did not help.
Thank you very much for the resolution of the initial issue.

@Craigacp
Contributor

You can't load a multi-part model (where there is both model.onnx and model.onnx_data) from the classpath as a byte array; you need to extract it to a temporary location and load it using a file path. This is a limitation of both ORT and Java: ORT won't let you pass in the other model parts as byte arrays, and Java won't let you make a byte array with more than 2^31 - 1 elements. Open an issue on Spring AI, as they'll need to add an alternative load mechanism.
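
The extraction step could look roughly like the sketch below. It uses only the standard library, and the class and method names are hypothetical. It assumes the caller can obtain an InputStream for each part, for example from a classpath resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

public class ModelExtractor {
    /**
     * Copies each named stream into a fresh temporary directory, keeping the
     * file names, so that model.onnx and model.onnx_data end up side by side
     * on disk. Files.copy streams the data, so no single byte[] has to hold
     * a whole part.
     */
    public static Path extractToTempDir(Map<String, InputStream> parts) throws IOException {
        Path dir = Files.createTempDirectory("onnx-model-");
        for (Map.Entry<String, InputStream> part : parts.entrySet()) {
            try (InputStream in = part.getValue()) {
                Files.copy(in, dir.resolve(part.getKey()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return dir;
    }
}
```

The path of the extracted model.onnx (as a String) could then be handed to a file-based session factory such as env.createSession(modelPath, sessionOptions), so that the native library can find model.onnx_data next to it.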

@JirHr
Author

JirHr commented Jul 23, 2024

I have been struggling with this issue for almost two weeks - unfortunately I am just an ONNX beginner.

In Java I need to create an OrtSession, and there are only two constructor options:

A) From a single file

  /**
   * Create a session loading the model from disk.
   *
   * @param env The environment.
   * @param modelPath The path to the model.
   * @param allocator The allocator to use.
   * @param options Session configuration options.
   * @throws OrtException If the file could not be read, or the model was corrupted etc.
   */
  OrtSession(OrtEnvironment env, String modelPath, OrtAllocator allocator, SessionOptions options)
      throws OrtException {
    this(
        createSession(
            OnnxRuntime.ortApiHandle, env.getNativeHandle(), modelPath, options.getNativeHandle()),
        allocator);
  }

B) From protobuf byte array

  /**
   * Creates a session reading the model from the supplied byte array.
   *
   * @param env The environment.
   * @param modelArray The model protobuf as a byte array.
   * @param allocator The allocator to use.
   * @param options Session configuration options.
   * @throws OrtException If the model was corrupted or some other error occurred in native code.
   */
  OrtSession(OrtEnvironment env, byte[] modelArray, OrtAllocator allocator, SessionOptions options)

Either way, in the end you always need "one consolidated ONNX export", i.e. some way to "merge" the files.

I have spent a lot of time trying to understand the issue, and the principle is probably this:
each initializer in model.onnx currently points to model.onnx_data (example):

  initializer {
    dims: 1024
    data_type: 1
    name: "encoder.layer.0.attention.output.LayerNorm.bias"
    external_data {
      key: "location"
      value: "model.onnx_data"
    }
    external_data {
      key: "offset"
      value: "1026146304"
    }
    external_data {
      key: "length"
      value: "4096"
    }
    data_location: EXTERNAL
  }

The expected resulting structure has no external data and contains just raw_data (example):

  initializer {
    dims: 384
    data_type: 1
    name: "encoder.layer.0.attention.output.LayerNorm.bias"
    raw_data: "n\204v......."

I tried to:

  • load model.onnx
  • load model.onnx_data as a byte array
  • change the structure of the model

The final Python code is:

import onnx

def combine_onnx_files2(model_path, data_path, output_path):
    # Load the ONNX model structure
    model = onnx.load(model_path)

    # Load onnx_data
    with open(data_path, 'rb') as f:
        tensor_data = f.read()

    for initializer in model.graph.initializer:
        if initializer.data_location == onnx.TensorProto.EXTERNAL:
            offset = 0
            length = 0
            for data in initializer.external_data:
                if data.key == "offset":
                    offset = int(data.value)
                elif data.key == "length":
                    length = int(data.value)

            raw_data = tensor_data[offset:offset + length]

            del initializer.external_data[:]
            initializer.ClearField("data_location")

            initializer.raw_data = raw_data

    onnx.save(model, output_path)

Unfortunately, running:
protoc --decode=onnx.ModelProto onnx.proto < model_comb.onnx > output_comb.txt
reports that the structure is not correct.
What am I doing wrong?

@Craigacp
Contributor

You can't combine them into a single file; it won't load, as it will be over the 2GB protobuf size limit. You can load the onnx file and let it read the onnx_data file from disk in the location it is in, or load the onnx_data file in Python and write the initializers out in a format you can easily read in Java, then add them to the SessionOptions using addExternalInitializers. I haven't added support for reading onnx_data files directly in Java, though we could look at doing this.

@JirHr
Author

JirHr commented Jul 24, 2024

Thank you for recommending addExternalInitializers.

I have tried the following code:

import ai.onnx.proto.OnnxMl;
import ai.onnxruntime.*;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

public class Main {
    public static void main(String[] args) {
        String modelPath = "/onnx/multilingual-e5-large/model.onnx";
        String dataPath = "/onnx/multilingual-e5-large/model.onnx_data";

        OnnxJavaType typeTest = OnnxJavaType.mapFromClass(Float.class);
        System.out.println("Type size: "+ String.valueOf(typeTest.size));
        System.out.println("Max value: "+ String.valueOf(Integer.MAX_VALUE - (8 * typeTest.size)));

        try {
            // Create the ONNX Runtime environment
            OrtEnvironment env = OrtEnvironment.getEnvironment();
            // Create SessionOptions
            OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
            sessionOptions.setOptimizationLevel(OrtSession.SessionOptions.OptLevel.BASIC_OPT);
            // Load the ONNX model
            OnnxMl.ModelProto model = OnnxMl.ModelProto.parseFrom(new FileInputStream(modelPath));
            OnnxMl.GraphProto graph = model.getGraph();
            // Read _data file
            Map<String, OnnxTensorLike> initializers = new HashMap<>();
            for (OnnxMl.TensorProto initializer : graph.getInitializerList()) {
                if (initializer.getDataLocation() == OnnxMl.TensorProto.DataLocation.EXTERNAL) {
                    long offset = 0;
                    int length = 0;
                    String name = initializer.getName();
                    for (OnnxMl.StringStringEntryProto data : initializer.getExternalDataList()) {
                        if (data.getKey().equals("offset")) {
                            offset = Long.parseLong(data.getValue());
                        } else if (data.getKey().equals("length")) {
                            length = Integer.parseInt(data.getValue());
                        }
                    }
                    byte[] rawData = readExternalData(dataPath, offset, length);
                    ByteBuffer byteBuffer = ByteBuffer.wrap(rawData);
                    System.out.println("Raw data offset: " + String.valueOf(offset));
                    System.out.println("Raw data length: " + String.valueOf(length));
                    System.out.println("Raw data size: " + String.valueOf(rawData.length));
                    System.out.println("Buffer limit: " + String.valueOf(byteBuffer.limit()));


                    // Create OnnxTensor and use it as OnnxTensorLike
                    /*
                    OnnxTensorLike onnxTensorLike = OnnxTensor.createTensor(
                            env,
                            byteBuffer,
                            convertListToLongArray(initializer.getDimsList()),
                            OnnxJavaType.mapFromInt(initializer.getDataType())
                    );

                    initializers.put(name, onnxTensorLike);
                     */
                }
            }

            // Add external initializers to SessionOptions
            sessionOptions.addExternalInitializers(initializers);

            // Load the model and create a session
            OrtSession session = env.createSession(modelPath, sessionOptions);

            // Close resources
            session.close();
            sessionOptions.close();
            env.close();
        } catch (IOException | OrtException e) {
            e.printStackTrace();
        }
    }

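    private static byte[] readExternalData(String dataPath, long offset, int length) throws IOException {
        try (FileInputStream fis = new FileInputStream(dataPath);
             FileChannel fileChannel = fis.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            fileChannel.position(offset);
            // A single read() may return fewer bytes than requested, so loop until the buffer is full.
            while (buffer.hasRemaining()) {
                if (fileChannel.read(buffer) < 0) {
                    throw new IOException("Unexpected end of file in " + dataPath);
                }
            }
            return buffer.array();
        }
    }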

    public static long[] convertListToLongArray(List<Long> longList) {
        long[] longArray = new long[longList.size()];
        for (int i = 0; i < longList.size(); i++) {
            longArray[i] = longList.get(i);
        }
        return longArray;
    }
}

When I uncomment the "Create OnnxTensor and use it as OnnxTensorLike" part, I get this error:

Cannot allocate a direct buffer of the requested size and type, size 1024008192, type = FLOAT

I am using Java 17 and added these VM options: -XX:MaxDirectMemorySize=4g -Xmx2g. That did not help.

In OrtUtil.java, line 492, I see the following condition:

  static BufferTuple prepareBuffer(Buffer data, OnnxJavaType type) {
    if (type == OnnxJavaType.STRING || type == OnnxJavaType.UNKNOWN) {
      throw new IllegalStateException("Cannot create a " + type + " tensor from a buffer");
    }
    int bufferPos;
    long bufferSizeLong = data.remaining() * (long) type.size;
    if (bufferSizeLong > (Integer.MAX_VALUE - (8 * type.size))) {

Therefore I commented out the "Create OnnxTensor and use it as OnnxTensorLike" part. For the first (largest) initializer I get:
Type size: 4 //e.g. OnnxJavaType typeTest = OnnxJavaType.mapFromClass(Float.class);System.out.println("Type size: "+ String.valueOf(typeTest.size));
Max value: 2147483615 //e.g. System.out.println("Max value: "+ String.valueOf(Integer.MAX_VALUE - (8 * typeTest.size)));
Raw data offset: 0 //e.g. initializer offset
Raw data length: 1024008192 //e.g. initializer length
Raw data size: 1024008192 //e.g. check of raw data size
Buffer limit: 1024008192 //e.g. byteBuffer.limit()

Raw data offset: 1024008192
Raw data length: 2105344
Raw data size: 2105344
Buffer limit: 2105344
.....

It seems that the limit here is also 2 GB (2147483615), and I do not understand why it fails when the initializer is only 1 GB. What am I doing wrong?
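
One plausible reading of the excerpt, consistent with the fix description later in this thread, is that data.remaining() counts bytes when a ByteBuffer is passed, yet it is multiplied by the 4-byte float size. The sketch below only reproduces that arithmetic with the numbers from the log. The class and method names are hypothetical and this is not ORT code.

```java
public class BufferSizeCheck {
    /** Size as computed in the excerpt: remaining elements times bytes per element. */
    public static long computedSize(long remaining, int typeSize) {
        return remaining * (long) typeSize;
    }

    /** Upper bound from the excerpt: Integer.MAX_VALUE - (8 * type.size). */
    public static long limit(int typeSize) {
        return Integer.MAX_VALUE - (8L * typeSize);
    }

    /** True when the computed size does not exceed the limit. */
    public static boolean passes(long remaining, int typeSize) {
        return computedSize(remaining, typeSize) <= limit(typeSize);
    }

    public static void main(String[] args) {
        long bytes = 1024008192L; // length of the first initializer, from the log
        int floatSize = 4;        // "Type size: 4"
        // ByteBuffer: remaining() counts bytes, which are then multiplied by 4.
        System.out.println(computedSize(bytes, floatSize) + " passes=" + passes(bytes, floatSize));
        // prints "4096032768 passes=false"
        // FloatBuffer over the same data: remaining() counts floats.
        long floats = bytes / floatSize;
        System.out.println(computedSize(floats, floatSize) + " passes=" + passes(floats, floatSize));
        // prints "1024008192 passes=true"
    }
}
```

Under this reading, a 1 GB float initializer wrapped in a ByteBuffer is treated as if it were 4 GB, which exceeds the 2147483615 bound.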

@JirHr
Author

JirHr commented Jul 24, 2024

I was recommended this workaround, which seems to work (further tests required):

byte[] rawData = readExternalData(dataPath, offset, length);
ByteBuffer byteBuffer = ByteBuffer.wrap(rawData);
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
FloatBuffer floatBuffer = byteBuffer.asFloatBuffer();
OnnxTensorLike onnxTensorLike = OnnxTensor.createTensor(env,floatBuffer,convertListToLongArray(initializer.getDimsList()));
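
Why the byte order and the FloatBuffer view matter can be seen in isolation with the standard-library sketch below. The class and method names are hypothetical. ONNX raw tensor data is stored little-endian, while a Java ByteBuffer defaults to big-endian, so the order must be set before creating the float view.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class FloatView {
    /** Reinterprets little-endian raw bytes as floats without copying them. */
    public static FloatBuffer asLittleEndianFloats(byte[] rawData) {
        return ByteBuffer.wrap(rawData).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
    }

    public static void main(String[] args) {
        // Eight bytes holding two little-endian floats, standing in for a
        // slice of model.onnx_data.
        byte[] raw = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putFloat(1.5f).putFloat(-2.0f).array();
        FloatBuffer floats = asLittleEndianFloats(raw);
        System.out.println(floats.remaining());                  // 2
        System.out.println(floats.get(0) + " " + floats.get(1)); // 1.5 -2.0
    }
}
```

The view reports its length in floats rather than bytes, so a size check that multiplies the element count by the element size sees the true byte length.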

@leichangqing

I had this issue too. When will it be fixed?

@Craigacp
Contributor

You can load in the initializers manually by processing the byte stream from the onnx_data file if you have it in a classpath resource, or load it off disk. This is really a Spring AI issue as they assume that everything can be loaded from classpath resources, but that's not true for ONNX models which are larger than 2GB.

skottmckay pushed a commit that referenced this issue Sep 13, 2024
…ining elements of a different type (#21774)

### Description
Fixes a bug where the buffer offset and position was incorrectly
computed if the user supplied a `ByteBuffer` to `createTensor` but set
the type of the tensor to something other than `INT8`. This would be
more common if the user was trying to load the initializers from a
serialized representation and didn't want to bother with the type
information (which is the case in #21321).

### Motivation and Context
Partial fix for #21321. The remainder of the fix is to add a helper
which allows users to load initializers out of an `onnx_data` file, but
that will require adding protobuf as a dependency for the Java API to
allow the parsing of an ONNX file separately from the native code. It
might be nicer to put that functionality into ORT's C API so it can
return the lengths & offsets of the initializers when provided with an
ONNX file containing external initializers. We hit this kind of thing in
Java more often than other languages as in Java models can be supplied
as classpath resources which we can easily read, but not materialize on
disk for the ORT native library to read.