OutOfMemory Error when Loading a 4GB TinyLlama Model with wasi-nn Interface #9570

Open · maochenxi opened this issue Nov 6, 2024 · 2 comments
Labels: bug (Incorrect behavior in the current implementation that needs fixing)

@maochenxi commented:
Rust code for loading models

use std::convert::TryInto;
use std::fs;
use bytemuck::cast_slice;

pub fn main() {
    // Read the OpenVINO IR: the XML graph description plus the binary weights.
    let xml = fs::read_to_string("fixture/model.xml").unwrap();
    println!("Read graph XML, first 50 characters: {}", &xml[..50]);

    let weights = fs::read("fixture/model.bin").unwrap();
    println!("Read graph weights, size in bytes: {}", weights.len());

    // Loading the graph is the step that runs out of memory for a ~4GB model.
    let graph = unsafe {
        wasi_nn::load(
            &[&xml.into_bytes(), &weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap()
    };
    println!("Loaded graph into wasi-nn with ID: {}", graph);

    let context = unsafe { wasi_nn::init_execution_context(graph).unwrap() };
    println!("Created wasi-nn execution context with ID: {}", context);

    // "Hello, what's the weather like today?"
    let input_text = "你好,今天的天气怎么样?";

    // Placeholder tokenization; `tokenize` already yields i32 token IDs.
    let indexed_tokens: Vec<i32> = tokenize(input_text);

    let tensor_a = wasi_nn::Tensor {
        dimensions: &[1, indexed_tokens.len() as u32],
        r#type: wasi_nn::TENSOR_TYPE_I32,
        data: cast_slice(&indexed_tokens),
    };

    unsafe {
        wasi_nn::set_input(context, 0, tensor_a).unwrap();
    }

    unsafe {
        wasi_nn::compute(context).unwrap();
    }
    println!("Executed graph inference");

    let mut output_buffer = vec![0i32; 1];
    unsafe {
        wasi_nn::get_output(
            context,
            0,
            output_buffer.as_mut_ptr() as *mut u8,
            (output_buffer.len() * std::mem::size_of::<i32>())
                .try_into()
                .unwrap(),
        )
        .unwrap();
    }
    println!("output: {:?}", output_buffer);
}

// Toy tokenizer: maps each character to its Unicode code point. A real
// TinyLlama deployment would use the model's own tokenizer.
fn tokenize(input: &str) -> Vec<i32> {
    input.chars().map(|c| c as i32).collect()
}
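
For reference, the module is built for the wasm32-wasip1 target in release mode (matching the target/wasm32-wasip1/release/ path in the commands below):

rustup target add wasm32-wasip1
cargo build --target wasm32-wasip1 --release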

Steps to Reproduce

I encountered an OutOfMemory error when trying to load a TinyLlama model (roughly 4 GB of parameters) using the wasi-nn interface in Wasmtime. The model is in OpenVINO format. This is the URL of TinyLlama: https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.

Here is the command I used:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm
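
(One plausible way to produce the OpenVINO IR in fixture/, if you need it for reproduction, is optimum-intel's export; this is an assumption, not necessarily how the fixture here was created, and the exported openvino_model.xml/.bin files would need renaming to model.xml/model.bin:)

pip install optimum[openvino]
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T fixture/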

Actual Results

However, it throws the following error:
(screenshot of the OutOfMemory error message)

The error message suggests that the model exceeds Wasmtime's memory allocation limits, even though I set max-memory-size to a larger value, e.g.:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -W max-memory-size=10240000000 -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm

Versions and Environment

Wasmtime version or commit: 24.0.0

Operating system: Arch Linux

Questions

  1. Is there a specific parameter in Wasmtime that can further increase memory allocation or better manage memory for large models?
  2. Are there any other workarounds or configurations within Wasmtime or wasi-nn that could help with loading models of this size?
@bjorn3 (Contributor) commented Nov 6, 2024:

Wasm32 is limited to 4GB of linear memory. Subtract static data and the emulated stack from that, and you have less than 4GB left to fit the weights and all other allocations. Wasm64 allows significantly more memory, but I'm not sure whether wasi-nn works with wasm64. You can try compiling for the wasm64-wasip1 rustc target; make sure to also pass the right flags to wasmtime to enable the memory64 proposal, along the lines of the sketch below.
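
Untested sketch (assuming the wasm64-wasip1 target is available on a nightly toolchain, since wasm64 targets are Tier 3 and need -Z build-std; the memory64 flag spelling is for the Wasmtime 24.x CLI):

cargo +nightly build -Z build-std=std,panic_abort --target wasm64-wasip1 --release
wasmtime run -W memory64=y -S nn --dir=fixture::fixture target/wasm64-wasip1/release/wasi-nn-example.wasm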

@maochenxi (Author) commented:
Thank you for your response! I tried wasm64, and it does seem that wasi-nn does not support wasm64. Therefore, I’ll have to try switching to a smaller model.
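
For example, exporting the weights with int8 compression via optimum-intel might bring them under the wasm32 limit (untested on my side):

optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T --weight-format int8 fixture/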
