Introduce custom external data loader (#21634)
### Description

This PR introduces support for custom external data loaders. An EP can register a custom external data loader to override the default behavior, making it possible to upload initializers directly to GPU.

### Motivation and Context

- In ONNX Runtime Web, WebAssembly uses 32-bit pointers (`sizeof(size_t)==4`), which puts a hard 4GB limit on addressable memory. As ONNX models get larger, this becomes a blocker for supporting medium-sized language models.
- ORT runs out of memory because the current code always loads data into CPU memory, including the .onnx file (protobuf) and external data file(s). However, with a GPU EP the large data does not need to stay on the CPU: ORT only loads the data into memory, uploads it to the GPU, and then releases it.
- Some platforms offer developers a way to upload data directly to GPU. For example, WebGPU allows uploading from any ArrayBuffer (which can be a side buffer that does not count toward the 4GB limit) to GPU directly. This significantly reduces CPU memory usage.

### Design

Classes `IExternalDataLoader` and `ExternalDataLoaderManager` are introduced. They are similar to `DataTransfer` and `DataTransferManager`. `InferenceSession` owns the manager object, and `SessionState` keeps a reference to it.

Added a new method `GetExternalDataLoader` in `IExecutionProvider`. An EP can override the method to register an instance of a custom external data loader.

The key method of an `IExternalDataLoader` class is `LoadTensor`:

```c++
// The tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from the allocation plan).
virtual common::Status LoadTensor(const Env& env,
                                  const std::filesystem::path& data_file_path,
                                  FileOffsetType data_offset,
                                  SafeInt<size_t> data_length,
                                  Tensor& tensor) const;
```

A loader registered by an EP is passed through a few layers and eventually reaches `DeserializeTensorProto()` in the finalizing stage of session initialization, where initializer tensors are created. The behavior is changed to first look up a registered external data loader that can handle the current memory info. If one is available, it is used; otherwise the old code path is followed.
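The lookup-then-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration with stand-in types, not the code from this PR: `MemoryInfo`, `Tensor`, `Loader`, `GpuLoader`, `FindLoader`, and `LoadInitializer` are hypothetical names, and the device strings are made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for OrtMemoryInfo and Tensor; the real types carry much more.
struct MemoryInfo {
  std::string device;  // illustrative values: "Cpu", "WebGPU_Buffer"
};

struct Tensor {
  MemoryInfo location;
  std::string loaded_by;  // records which path filled the tensor
};

// Simplified loader interface, loosely modeled on the one in this change.
class Loader {
 public:
  virtual ~Loader() = default;
  virtual bool CanLoad(const MemoryInfo& target) const = 0;
  virtual void LoadTensor(Tensor& tensor) const = 0;
};

// A loader that only accepts tensors placed in (hypothetical) GPU buffer memory.
class GpuLoader : public Loader {
 public:
  bool CanLoad(const MemoryInfo& target) const override {
    return target.device == "WebGPU_Buffer";
  }
  void LoadTensor(Tensor& tensor) const override { tensor.loaded_by = "custom"; }
};

// Returns the first loader that accepts the memory info, or nullptr if none does.
const Loader* FindLoader(const std::vector<std::unique_ptr<Loader>>& loaders,
                         const MemoryInfo& target) {
  for (const auto& loader : loaders) {
    if (loader->CanLoad(target)) return loader.get();
  }
  return nullptr;
}

// Sketch of the initializer step: prefer a custom loader, otherwise take the default path.
void LoadInitializer(const std::vector<std::unique_ptr<Loader>>& loaders, Tensor& tensor) {
  if (const Loader* loader = FindLoader(loaders, tensor.location)) {
    loader->LoadTensor(tensor);
  } else {
    tensor.loaded_by = "default";  // stands in for the pre-existing CPU loading code
  }
}
```

With one `GpuLoader` registered, a tensor whose memory info names the GPU buffer device goes through the custom loader, while a CPU tensor falls through to the default branch.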
Showing 25 changed files with 448 additions and 102 deletions.
@@ -0,0 +1,100 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#include "core/framework/external_data_loader.h"
#ifndef SHARED_PROVIDER
#include "core/framework/tensor.h"
#endif
#if defined(__wasm__)
#include <emscripten.h>
#endif

namespace onnxruntime {

common::Status IExternalDataLoader::LoadTensor([[maybe_unused]] const Env& env,
                                               [[maybe_unused]] const std::filesystem::path& data_file_path,
                                               [[maybe_unused]] FileOffsetType data_offset,
                                               [[maybe_unused]] SafeInt<size_t> data_length,
                                               [[maybe_unused]] Tensor& tensor) const {
  ORT_NOT_IMPLEMENTED(__FUNCTION__, " is not implemented");
}

#if defined(__wasm__)

common::Status LoadWebAssemblyExternalData(const Env& env,
                                           const std::filesystem::path& data_file_path,
                                           FileOffsetType data_offset,
                                           SafeInt<size_t> data_length,
                                           ExternalDataLoadType load_type,
                                           void* tensor_data) {
  auto err_code = EM_ASM_INT(({
                               // If available, "Module.MountedFiles" is a Map for all preloaded files.
                               if (typeof Module == 'undefined' || !Module.MountedFiles) {
                                 return 1;  // "Module.MountedFiles" is not available.
                               }
                               let fileName = UTF8ToString($0 >>> 0);
                               if (fileName.startsWith('./')) {
                                 fileName = fileName.substring(2);
                               }
                               const fileData = Module.MountedFiles.get(fileName);
                               if (!fileData) {
                                 return 2;  // File not found in preloaded files.
                               }
                               const offset = $1 >>> 0;
                               const length = $2 >>> 0;
                               const dataIdOrBuffer = $3 >>> 0;
                               const loadType = $4;

                               if (offset + length > fileData.byteLength) {
                                 return 3;  // Out of bounds.
                               }

                               try {
                                 const data = fileData.subarray(offset, offset + length);
                                 switch (loadType) {
                                   case 0:
                                     // Load external data to CPU memory.
                                     // Copy the file data (fileData,offset,length) into WebAssembly memory
                                     // (HEAPU8,buffer,length).
                                     HEAPU8.set(data, dataIdOrBuffer);
                                     break;
                                   case 1:
                                     // Load external data to GPU.
                                     Module.jsepUploadExternalBuffer(dataIdOrBuffer, data);
                                     break;
                                   default:
                                     return 4;  // Unknown error occurred in memory copy.
                                 }
                                 return 0;
                               } catch {
                                 return 4;
                               }
                             }),
                             data_file_path.c_str(),
                             static_cast<int32_t>(data_offset),
                             static_cast<int32_t>(data_length),
                             tensor_data,
                             static_cast<int32_t>(load_type));
  const char* err_msg;
  switch (err_code) {
    case 0:
      return Status::OK();
    case 1:
      err_msg = "Module.MountedFiles is not available.";
      break;
    case 2:
      err_msg = "File not found in preloaded files.";
      break;
    case 3:
      err_msg = "Out of bounds.";
      break;
    default:
      err_msg = "Unknown error occurred in memory copy.";
  }
  return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Failed to load external data file \"", data_file_path,
                         "\", error: ", err_msg);
}

#endif

}  // namespace onnxruntime
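For readers less familiar with Emscripten inline JavaScript, the lookup, bounds check, and copy performed above can be approximated in plain C++. This is only an analogy under stated assumptions: `MountedFiles`, `LoadError`, and `CopyExternalData` are hypothetical names, an in-memory map stands in for `Module.MountedFiles`, and error code 1 (map not available) has no counterpart here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the JavaScript-side Module.MountedFiles map.
using MountedFiles = std::unordered_map<std::string, std::vector<std::uint8_t>>;

// Error codes chosen to mirror the ones returned by the inline JavaScript.
enum LoadError : int {
  kOk = 0,
  kFileNotFound = 2,
  kOutOfBounds = 3,
};

// Copies `length` bytes starting at `offset` of the named file into `dest`.
int CopyExternalData(const MountedFiles& files, std::string file_name,
                     std::size_t offset, std::size_t length, std::uint8_t* dest) {
  // Strip a leading "./" so the name matches the key used when the file was mounted.
  if (file_name.rfind("./", 0) == 0) {
    file_name = file_name.substr(2);
  }
  const auto it = files.find(file_name);
  if (it == files.end()) return kFileNotFound;

  const std::vector<std::uint8_t>& data = it->second;
  // Same condition as `offset + length > byteLength`, written so the check cannot overflow.
  if (offset > data.size() || length > data.size() - offset) return kOutOfBounds;

  if (length > 0) std::memcpy(dest, data.data() + offset, length);
  return kOk;
}
```

The GPU branch of the original has no equivalent in this sketch; there the same slice is handed to `Module.jsepUploadExternalBuffer` instead of being copied into the WebAssembly heap.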
@@ -0,0 +1,60 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#pragma once

#include <functional>
#include <vector>
#include <filesystem>

#include "core/common/common.h"
#include "core/common/safeint.h"
#include "core/platform/env.h"

struct OrtMemoryInfo;

namespace onnxruntime {
#ifndef SHARED_PROVIDER
class Tensor;
#endif
class Stream;

namespace common {
class Status;
}

// Data transfer interface.
class IExternalDataLoader {
 public:
  virtual ~IExternalDataLoader() = default;

  virtual bool CanLoad(const OrtMemoryInfo& target_memory_info) const = 0;

  // Tensor should be already allocated with the correct memory info and size.
  virtual common::Status LoadTensor(const Env& env,
                                    const std::filesystem::path& data_file_path,
                                    FileOffsetType data_offset,
                                    SafeInt<size_t> data_length,
                                    Tensor& tensor) const;
};

#if defined(__wasm__)

enum class ExternalDataLoadType {
  CPU = 0,
#if defined(USE_JSEP)
  WEBGPU_BUFFER = 1,
#endif
};

// Entry point for loading external data implementation using inline JavaScript.
common::Status LoadWebAssemblyExternalData(const Env& env,
                                           const std::filesystem::path& data_file_path,
                                           FileOffsetType data_offset,
                                           SafeInt<size_t> data_length,
                                           ExternalDataLoadType load_type,
                                           void* tensor_data);

#endif

}  // namespace onnxruntime
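Note the asymmetry in the interface: `CanLoad` is pure virtual, while `LoadTensor` is virtual with a default body that reports "not implemented". A rough sketch of that pattern with stand-in types is shown here. `Status`, `MemoryInfo`, `LoaderBase`, `IncompleteLoader`, and `CompleteLoader` are hypothetical, and the default is simplified to return an error status, whereas the real default goes through the `ORT_NOT_IMPLEMENTED` macro.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal stand-in for common::Status.
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical stand-in for OrtMemoryInfo.
struct MemoryInfo {
  std::string name;
};

class LoaderBase {
 public:
  virtual ~LoaderBase() = default;

  // Pure virtual: every loader must state which memory it can fill.
  virtual bool CanLoad(const MemoryInfo& target) const = 0;

  // Virtual with a default body: reports "not implemented" unless overridden.
  virtual Status LoadTensor(const std::string& /*data_file_path*/) const {
    return {false, "LoadTensor is not implemented"};
  }
};

// Overrides only CanLoad, so LoadTensor falls back to the base default.
class IncompleteLoader : public LoaderBase {
 public:
  bool CanLoad(const MemoryInfo&) const override { return true; }
};

// Overrides both methods.
class CompleteLoader : public LoaderBase {
 public:
  bool CanLoad(const MemoryInfo& target) const override { return target.name == "Gpu"; }
  Status LoadTensor(const std::string& data_file_path) const override {
    return {true, "loaded " + data_file_path};
  }
};
```

A subclass that forgets to override `LoadTensor` still compiles, so the failure only shows up when the loader is actually selected for a tensor.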
onnxruntime/core/framework/external_data_loader_manager.cc (29 changes: 29 additions & 0 deletions)
@@ -0,0 +1,29 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#include "core/framework/external_data_loader_manager.h"
#include "core/framework/tensor.h"

namespace onnxruntime {
using namespace common;

Status ExternalDataLoaderManager::RegisterExternalDataLoader(std::unique_ptr<IExternalDataLoader> external_data_loader) {
  if (nullptr == external_data_loader) {
    return Status(ONNXRUNTIME, INVALID_ARGUMENT, "external_data_loader registered is nullptr.");
  }
  external_data_loaders_.push_back(std::move(external_data_loader));
  return Status::OK();
}

const IExternalDataLoader* ExternalDataLoaderManager::GetExternalDataLoader(const OrtMemoryInfo& target_memory_info) const {
  for (auto& external_data_loader : external_data_loaders_) {
    if (!external_data_loader->CanLoad(target_memory_info)) {
      continue;
    }

    return external_data_loader.get();
  }
  return nullptr;
}

}  // namespace onnxruntime
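Two properties of the manager above are easy to miss: a null loader is rejected at registration, and lookup returns the first registered loader whose `CanLoad` accepts the memory info, so registration order decides between overlapping loaders. The following sketch illustrates both with stand-in types. `MemoryInfo`, `Loader`, and `Manager` are hypothetical simplifications, and `Register` returns a `bool` here instead of a `Status`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for OrtMemoryInfo.
struct MemoryInfo {
  std::string name;
};

// Simplified loader that accepts exactly one memory name and carries a tag for identification.
class Loader {
 public:
  Loader(std::string tag, std::string accepts)
      : tag_(std::move(tag)), accepts_(std::move(accepts)) {}
  bool CanLoad(const MemoryInfo& target) const { return target.name == accepts_; }
  const std::string& tag() const { return tag_; }

 private:
  std::string tag_;
  std::string accepts_;
};

class Manager {
 public:
  // Rejects null loaders; otherwise appends, preserving registration order.
  bool Register(std::unique_ptr<Loader> loader) {
    if (loader == nullptr) return false;
    loaders_.push_back(std::move(loader));
    return true;
  }

  // Linear scan: the first loader that accepts the target wins.
  const Loader* Get(const MemoryInfo& target) const {
    for (const auto& loader : loaders_) {
      if (loader->CanLoad(target)) return loader.get();
    }
    return nullptr;
  }

 private:
  std::vector<std::unique_ptr<Loader>> loaders_;
};
```

If two loaders both accept the same memory info, the one registered later is never returned by this scan.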
@@ -0,0 +1,28 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#pragma once

#include "core/common/status.h"
#include "core/common/common.h"
#include "core/framework/external_data_loader.h"

namespace onnxruntime {

// The external data loader manager manages all registered external data loaders to allow custom
// external data loading implemented by execution providers.
class ExternalDataLoaderManager {
 public:
  ExternalDataLoaderManager() = default;

  common::Status RegisterExternalDataLoader(std::unique_ptr<IExternalDataLoader> external_data_loader);

  const IExternalDataLoader* GetExternalDataLoader(const OrtMemoryInfo& target_memory_info) const;

 private:
  ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(ExternalDataLoaderManager);

  // It's assumed that external data loaders in this array have no overlap in terms of copying functionality.
  std::vector<std::unique_ptr<IExternalDataLoader>> external_data_loaders_;
};
}  // namespace onnxruntime