OutOfMemory Error when Loading a 4GB TinyLlama Model with wasi-nn Interface #9570

Open · maochenxi opened this issue Nov 6, 2024 · 2 comments
Labels: bug (Incorrect behavior in the current implementation that needs fixing)

@maochenxi commented:
Rust code for loading models

use std::convert::TryInto;
use std::fs;
use bytemuck::cast_slice;

pub fn main() {
    // Read the OpenVINO IR: the XML graph description plus the binary weights.
    let xml = fs::read_to_string("fixture/model.xml").unwrap();
    println!("Read graph XML, first 50 characters: {}", &xml[..50]);

    let weights = fs::read("fixture/model.bin").unwrap();
    println!("Read graph weights, size in bytes: {}", weights.len());

    // Loading the graph is the step that runs out of memory for a ~4GB model.
    let graph = unsafe {
        wasi_nn::load(
            &[&xml.into_bytes(), &weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap()
    };
    println!("Loaded graph into wasi-nn with ID: {}", graph);

    let context = unsafe { wasi_nn::init_execution_context(graph).unwrap() };
    println!("Created wasi-nn execution context with ID: {}", context);

    // "Hello, what's the weather like today?"
    let input_text = "你好,今天的天气怎么样?";

    // Placeholder tokenization; `tokenize` already yields i32 token IDs.
    let indexed_tokens: Vec<i32> = tokenize(input_text);

    let tensor_a = wasi_nn::Tensor {
        dimensions: &[1, indexed_tokens.len() as u32],
        r#type: wasi_nn::TENSOR_TYPE_I32,
        data: cast_slice(&indexed_tokens),
    };

    unsafe {
        wasi_nn::set_input(context, 0, tensor_a).unwrap();
    }

    unsafe {
        wasi_nn::compute(context).unwrap();
    }
    println!("Executed graph inference");

    let mut output_buffer = vec![0i32; 1];
    unsafe {
        wasi_nn::get_output(
            context,
            0,
            output_buffer.as_mut_ptr() as *mut u8,
            (output_buffer.len() * std::mem::size_of::<i32>())
                .try_into()
                .unwrap(),
        )
        .unwrap();
    }
    println!("output: {:?}", output_buffer);
}

// Toy tokenizer: maps each character to its Unicode code point. A real
// TinyLlama deployment would use the model's own tokenizer.
fn tokenize(input: &str) -> Vec<i32> {
    input.chars().map(|c| c as i32).collect()
}
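
For reference, the module is built for the wasm32-wasip1 target in release mode (matching the target/wasm32-wasip1/release/ path in the commands below):

rustup target add wasm32-wasip1
cargo build --target wasm32-wasip1 --release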

Steps to Reproduce

I encountered an OutOfMemory error when trying to load a TinyLlama model (roughly 4 GB of parameters) using the wasi-nn interface in Wasmtime. The model is in OpenVINO format. This is the URL of TinyLlama: https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.

Here is the command I used:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm
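
(One plausible way to produce the OpenVINO IR in fixture/, if you need it for reproduction, is optimum-intel's export; this is an assumption, not necessarily how the fixture here was created, and the exported openvino_model.xml/.bin files would need renaming to model.xml/model.bin:)

pip install optimum[openvino]
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T fixture/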

Actual Results

However, it throws the following error:
(screenshot of the OutOfMemory error message)

The error message suggests that the model exceeds Wasmtime's memory allocation limits, even though I set max-memory-size to a larger value, e.g.:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -W max-memory-size=10240000000 -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm

Versions and Environment

Wasmtime version or commit: 24.0.0

Operating system: Arch Linux

Questions

  1. Is there a specific parameter in Wasmtime that can further increase memory allocation or better manage memory for large models?
  2. Are there any other workarounds or configurations within Wasmtime or wasi-nn that could help with loading models of this size?
@bjorn3 (Contributor) commented Nov 6, 2024:

Wasm32 is limited to 4GB of linear memory. Subtract static data and the emulated stack from that, and you have less than 4GB left to fit the weights and all other allocations. Wasm64 allows significantly more memory, but I'm not sure whether wasi-nn works with wasm64. You can try compiling for the wasm64-wasip1 rustc target; make sure to also pass the right flags to wasmtime to enable the memory64 proposal, along the lines of the sketch below.
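
Untested sketch (assuming the wasm64-wasip1 target is available on a nightly toolchain, since wasm64 targets are Tier 3 and need -Z build-std; the memory64 flag spelling is for the Wasmtime 24.x CLI):

cargo +nightly build -Z build-std=std,panic_abort --target wasm64-wasip1 --release
wasmtime run -W memory64=y -S nn --dir=fixture::fixture target/wasm64-wasip1/release/wasi-nn-example.wasm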

@maochenxi (Author) commented:
Thank you for your response! I tried wasm64, and it does seem that wasi-nn does not support wasm64. Therefore, I’ll have to try switching to a smaller model.
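
For example, exporting the weights with int8 compression via optimum-intel might bring them under the wasm32 limit (untested on my side):

optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T --weight-format int8 fixture/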
