Describe the issue
Hi all, I'm running the Microsoft DirectML NPU inference sample code, which uses ONNX Runtime with DML as the execution provider.
Memory usage keeps increasing slowly in the performance-test loop (see "To reproduce", Part 1).
If I change the input buffer from DML to CPU memory, memory usage increases dramatically (see "To reproduce", Part 2).
Did I miss anything? Thanks.
To reproduce
The issue reproduces every time.
Part 1:
constexpr int fenceValueStart = 2;
constexpr int numIterations = 100000;
for (int i = fenceValueStart; i < (numIterations + fenceValueStart); i++)
{
    session.Run(Ort::RunOptions{ nullptr }, &inputName, &inputTensor, 1, &outputName, &outputTensor, 1);

    // Synchronize with the CPU before queuing more inference runs.
    THROW_IF_FAILED(commandQueue->Signal(fence.Get(), i));
    THROW_HR_IF(E_FAIL, ResetEvent(fenceEvent.get()) == 0);
    THROW_IF_FAILED(fence->SetEventOnCompletion(i, fenceEvent.get()));
    THROW_HR_IF(E_FAIL, WaitForSingleObject(fenceEvent.get(), INFINITE) != WAIT_OBJECT_0);
}
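For context, the fragment above uses inputTensor, fence, and fenceEvent objects created earlier in the sample. A minimal sketch of that setup, paraphrasing the DirectML sample rather than quoting it (the resource-creation details here are assumptions), looks roughly like this:

// Bind the model input to a D3D12 resource through the DML EP's allocation API.
ComPtr<ID3D12Resource> inputResource; // assumed: created via CreateCommittedResource, sized for 1x3x224x224 fp16
void* dmlAllocation = nullptr;
Ort::ThrowOnError(ortDmlApi->CreateGPUAllocationFromD3DResource(inputResource.Get(), &dmlAllocation));

std::vector<int64_t> inputShape = { 1, 3, 224, 224 };
Ort::MemoryInfo dmlMemoryInfo("DML", OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemType::OrtMemTypeDefault);
Ort::Value inputTensor = Ort::Value::CreateTensor(
    dmlMemoryInfo, dmlAllocation, static_cast<size_t>(inputResource->GetDesc().Width),
    inputShape.data(), inputShape.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16);

// Fence and manual-reset event used by the CPU/GPU synchronization in the loop above.
ComPtr<ID3D12Fence> fence;
THROW_IF_FAILED(d3dDevice->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence)));
wil::unique_handle fenceEvent(CreateEvent(nullptr, TRUE, FALSE, nullptr));
THROW_HR_IF(E_FAIL, !fenceEvent);

When the tensor is no longer needed, the allocation should be returned with ortDmlApi->FreeGPUAllocation(dmlAllocation).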
Part 2:
int main()
{
    ComPtr<ID3D12Device1> d3dDevice;
    ComPtr<IDMLDevice> dmlDevice;
    ComPtr<ID3D12CommandQueue> commandQueue;
    InitializeDirectML(d3dDevice.GetAddressOf(), commandQueue.GetAddressOf(), dmlDevice.GetAddressOf());

    // Add the DML execution provider to ORT using the DML device and D3D12 command queue created above.
    if (!dmlDevice)
    {
        printf("No NPU device found\n");
        return 1;
    }

    ////////////////////////////////////////
    // Get the API and set up the environment.
    OrtApi const& ortApi = Ort::GetApi(); // Uses ORT_API_VERSION
    const OrtDmlApi* ortDmlApi;
    Ort::ThrowOnError(ortApi.GetExecutionProviderApi("DML", ORT_API_VERSION, reinterpret_cast<const void**>(&ortDmlApi)));
    Ort::Env environment(ORT_LOGGING_LEVEL_WARNING, "DirectML_Direct3D_TensorAllocation_Test"); // Note: ORT_LOGGING_LEVEL_VERBOSE is useful too.

    ////////////////////////////////////////
    // Set model-specific session options.
    Ort::SessionOptions sessionOptions;
    sessionOptions.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL); // For the DML EP
    sessionOptions.DisableMemPattern();                             // For the DML EP
    Ort::ThrowOnError(ortApi.AddFreeDimensionOverrideByName(sessionOptions, "batch_size", 1));
    Ort::ThrowOnError(ortDmlApi->SessionOptionsAppendExecutionProvider_DML1(sessionOptions, dmlDevice.Get(), commandQueue.Get()));
    Ort::Session session(environment, L"mobilenetv2-7-fp16.onnx", sessionOptions);

    // Create a CPU-memory input tensor (fp16 zeros).
    std::vector<Ort::Value> inputTensors;
    std::vector<int64_t> inputShape = { 1, 3, 224, 224 };
    std::vector<float> my_data(3 * 224 * 224, 0.0f);
    std::vector<Ort::Float16_t> inputTensorValues;
    for (auto n : my_data)
        inputTensorValues.push_back(Ort::Float16_t(n));
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    inputTensors.push_back(Ort::Value::CreateTensor<Ort::Float16_t>(memoryInfo, inputTensorValues.data(), inputTensorValues.size(), inputShape.data(), inputShape.size()));
    std::vector<char const*> inputNames = { "input" };
    std::vector<char const*> outputNames = { "output" };

    ////////////////////////////////////////
    // Execute the model with the given inputs and named outputs.
    for (int i = 0; i < 1000000; i++)
    {
        std::vector<Ort::Value> outputs = session.Run(Ort::RunOptions{}, inputNames.data(), inputTensors.data(), inputTensors.size(), outputNames.data(), outputNames.size());
        outputs.clear();
    }

    return 0;
}
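One experiment worth trying with the CPU-input variant (my own suggestion, not part of the original sample): bind the input and output once with Ort::IoBinding and reuse them across iterations. If memory still grows, the leak is clearly not in the per-Run output Ort::Value churn:

// Sketch: reuse one binding across the whole loop instead of creating
// fresh output values on every session.Run call.
Ort::IoBinding binding(session);
binding.BindInput("input", inputTensors[0]);
Ort::MemoryInfo outputMemoryInfo = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
binding.BindOutput("output", outputMemoryInfo); // ORT allocates the output and reuses it
for (int i = 0; i < 1000000; i++)
{
    session.Run(Ort::RunOptions{}, binding);
}
std::vector<Ort::Value> outputs = binding.GetOutputValues(); // inspect results after the loop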
Hi, for the Part 1 leak, a quick workaround I found was adding a call to ReleaseCompletedReferences() in CommandQueue::QueueReference in CommandQueue.cpp and rebuilding onnxruntime.dll.
Here is the code in detail.
void CommandQueue::QueueReference(IUnknown* object, bool waitForUnsubmittedWork)
{
    // If the CommandQueue is closing, then m_queuedReferences is being cleared -- it is not OK
    // to queue additional references at this time, since those references would be leaked. This
    // affects any objects in m_queuedReferences whose destructors indirectly call QueueReference;
    // for example, an allocation from BucketizedBufferAllocator attempts to queue a reference
    // to its underlying D3D resource when freed. Furthermore, these references are unnecessary
    // since Close() already blocks for scheduled GPU work before clearing m_queuedReferences.
    if (!m_closing)
    {
        QueuedReference queuedReference = {GetLastFenceValue(), object};

        // If something has been recorded into a command list but not submitted yet, it means that the *next* fence
        // value is the one to signal completion.
        if (waitForUnsubmittedWork)
        {
            ++queuedReference.fenceValue;
        }

        m_queuedReferences.push_back(queuedReference);
        ReleaseCompletedReferences(); // new line here
    }
}
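ReleaseCompletedReferences() already exists in CommandQueue.cpp; roughly (paraphrasing the onnxruntime source, so treat this as a sketch rather than an exact quote), it pops every queued reference whose fence value the GPU has already completed:

void CommandQueue::ReleaseCompletedReferences()
{
    // References at or below the completed fence value are no longer needed by
    // in-flight GPU work and can be dropped on the CPU timeline.
    uint64_t completedValue = GetFence()->GetCompletedValue();
    while (!m_queuedReferences.empty() && m_queuedReferences.front().fenceValue <= completedValue)
    {
        m_queuedReferences.pop_front();
    }
}

Calling it from QueueReference keeps m_queuedReferences bounded, instead of letting completed references accumulate until Close() or the next explicit release.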
Urgency
Urgent.
Platform
Windows
OS Version
11 23H2
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
DirectML 1.15.0
Other info
CPU: Intel Ultra 7 155U
NPU driver: 32.0.100.2540
GPU driver(intel): 32.0.15.5585