
Memory leak in NPU inference after each session.Run #21587

Open
WTian-Yu opened this issue Aug 1, 2024 · 1 comment
Labels
ep:DML issues related to the DirectML execution provider

Comments

WTian-Yu commented Aug 1, 2024

Describe the issue

Hi all, I'm running the Microsoft DirectML NPU inference sample code, which uses ONNX Runtime with DML as the execution provider.
Memory keeps increasing slowly during the performance-test loop (see "To reproduce", Part 1).

If I change the input buffer from DML to CPU memory, memory increases dramatically (see "To reproduce", Part 2).

Did I miss anything? Thanks.

To reproduce

The leak reproduces every time.

Part1:

    constexpr int fenceValueStart = 2;
    constexpr int numIterations = 100000;
    for (int i = fenceValueStart; i < (numIterations + fenceValueStart); i++)
    {
        session.Run(Ort::RunOptions{ nullptr }, &inputName, &inputTensor, 1, &outputName, &outputTensor, 1);

        {
            // Synchronize with CPU before queuing more inference runs
            THROW_IF_FAILED(commandQueue->Signal(fence.Get(), i));
            THROW_HR_IF(E_FAIL, ResetEvent(fenceEvent.get()) == 0);
            THROW_IF_FAILED(fence->SetEventOnCompletion(i, fenceEvent.get()));
            THROW_HR_IF(E_FAIL, WaitForSingleObject(fenceEvent.get(), INFINITE) != WAIT_OBJECT_0);
        }
    }
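
In the sample that Part 1 comes from, inputTensor wraps a D3D12 buffer through the OrtDmlApi rather than CPU memory. A minimal sketch of that binding, assuming a ComPtr<ID3D12Resource> named inputBuffer already holds the uploaded input data and ortDmlApi was obtained as in Part 2 below (the names here are illustrative, not from the original report):

// Sketch (assumed names): wrap an existing D3D12 buffer as a DML-device tensor.
void* dmlAllocation = nullptr;
Ort::ThrowOnError(ortDmlApi->CreateGPUAllocationFromD3DResource(inputBuffer.Get(), &dmlAllocation));

std::vector<int64_t> inputShape = { 1, 3, 224, 224 };
Ort::MemoryInfo dmlMemoryInfo("DML", OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemType::OrtMemTypeDefault);
Ort::Value inputTensor = Ort::Value::CreateTensor(
    dmlMemoryInfo,
    dmlAllocation,
    static_cast<size_t>(inputBuffer->GetDesc().Width), // byte size of the D3D12 buffer
    inputShape.data(),
    inputShape.size(),
    ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16);

// ... run the inference loop above ...

// The wrapped allocation is not owned by the tensor and must be freed explicitly.
Ort::ThrowOnError(ortDmlApi->FreeGPUAllocation(dmlAllocation));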

Part2:

// Includes needed to build this snippet stand-alone; InitializeDirectML and the
// model file come from the DirectML NPU sample this code is based on.
#include <wrl/client.h>
#include <d3d12.h>
#include <DirectML.h>
#include <cstdio>
#include <vector>
#include "onnxruntime_cxx_api.h"
#include "dml_provider_factory.h"

using Microsoft::WRL::ComPtr;

// Declared in the sample; creates the D3D12 device, command queue, and DML device on the NPU.
void InitializeDirectML(ID3D12Device1** d3dDevice, ID3D12CommandQueue** commandQueue, IDMLDevice** dmlDevice);

int main()
{
    ComPtr<ID3D12Device1> d3dDevice;
    ComPtr<IDMLDevice> dmlDevice;
    ComPtr<ID3D12CommandQueue> commandQueue;
    InitializeDirectML(d3dDevice.GetAddressOf(), commandQueue.GetAddressOf(), dmlDevice.GetAddressOf());

    // Add the DML execution provider to ORT using the DML Device and D3D12 Command Queue created above.
    if (!dmlDevice)
    {
        printf("No NPU device found\n");
        return 1;
    }

    ////////////////////////////////////////
    // Get API, and setup environment.
    OrtApi const& ortApi = Ort::GetApi(); // Uses ORT_API_VERSION
    const OrtDmlApi* ortDmlApi;
    ortApi.GetExecutionProviderApi("DML", ORT_API_VERSION, reinterpret_cast<const void**>(&ortDmlApi));
    Ort::Env environment(ORT_LOGGING_LEVEL_WARNING, "DirectML_Direct3D_TensorAllocation_Test"); // Note ORT_LOGGING_LEVEL_VERBOSE is useful too.

    ////////////////////////////////////////
    // Set model-specific session options.
    Ort::SessionOptions sessionOptions;
    sessionOptions.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL); // For DML EP
    sessionOptions.DisableMemPattern(); // For DML EP
    ortApi.AddFreeDimensionOverrideByName(sessionOptions, "batch_size", 1);
    ortDmlApi->SessionOptionsAppendExecutionProvider_DML1(sessionOptions, dmlDevice.Get(), commandQueue.Get());

    Ort::Session session(environment, L"mobilenetv2-7-fp16.onnx", sessionOptions);

    std::vector<Ort::Value> inputTensors;
    std::vector<int64_t> inputShape = { 1, 3, 224, 224 };
    std::vector<float> my_data(3 * 224 * 224, 0.0f);
    std::vector<Ort::Float16_t> inputTensorValues;
    for (auto n : my_data)
        inputTensorValues.push_back(Ort::Float16_t(n));

    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    inputTensors.push_back(Ort::Value::CreateTensor<Ort::Float16_t>(memoryInfo, inputTensorValues.data(), inputTensorValues.size(), inputShape.data(), inputShape.size()));

    std::vector<char const*> inputNames = { "input" };
    std::vector<char const*> outputNames = { "output" };

    ////////////////////////////////////////
    // Execute the model with the given inputs and named outputs.
    for (int i = 0; i < 1000000; i++) {
        std::vector<Ort::Value> outputs = session.Run(Ort::RunOptions{}, inputNames.data(), inputTensors.data(), inputTensors.size(), outputNames.data(), outputNames.size());
        outputs.clear();
    }

    return 0;
}

[Screenshot attachment: Screenshot 2024-08-01 165404]
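
For reference, the growth shown in the screenshot can also be logged numerically by sampling the process working set every few thousand iterations. A minimal sketch using the Win32 PSAPI call GetProcessMemoryInfo (PrintWorkingSet is a hypothetical helper, not part of the original repro):

#include <windows.h>
#include <psapi.h> // link with Psapi.lib
#include <cstdio>

// Print the current working set so per-iteration memory growth can be logged.
static void PrintWorkingSet(int iteration)
{
    PROCESS_MEMORY_COUNTERS pmc = {};
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
    {
        printf("iteration %d: working set = %zu KB\n", iteration, pmc.WorkingSetSize / 1024);
    }
}

// Usage inside either reproduction loop:
//     if (i % 1000 == 0) PrintWorkingSet(i);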

Urgency

Urgent.

Platform

Windows

OS Version

11 23H2

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

DirectML 1.15.0

Other info

CPU: Intel Ultra 7 155U
NPU driver: 32.0.100.2540
GPU driver(intel): 32.0.15.5585

github-actions bot added the ep:DML label (issues related to the DirectML execution provider) on Aug 1, 2024
WTian-Yu (Author) commented

Hi, for the Part 1 leak, a quick workaround I found was adding a call to ReleaseCompletedReferences() in CommandQueue::QueueReference in CommandQueue.cpp and rebuilding onnxruntime.dll. Presumably this helps because the queued references otherwise accumulate across Run calls until they are released elsewhere; releasing the already-completed ones on every QueueReference keeps the list bounded.
Here is the detailed code.

void CommandQueue::QueueReference(IUnknown* object, bool waitForUnsubmittedWork)
{
    // If the CommandQueue is closing, then m_queuedReferences is being cleared -- it is not OK
    // to queue additional references at this time, since those references would be leaked. This
    // affects any objects in m_queuedReferences whose destructors indirectly call QueueReference;
    // for example, an allocation from BucketizedBufferAllocator attempts to queue a reference
    // to its underlying D3D resource when freed. Furthermore, these references are unnecessary
    // since Close() already blocks for scheduled GPU work before clearing m_queuedReferences.
    if (!m_closing)
    {
        QueuedReference queuedReference = {GetLastFenceValue(), object};

        // If something has been recorded into a command list but not submitted yet, it means that the *next* fence
        // value is the one to signal completion.
        if (waitForUnsubmittedWork)
        {
            ++queuedReference.fenceValue;
        }

        m_queuedReferences.push_back(queuedReference);

        ReleaseCompletedReferences(); // new line here
    }
}
