Describe the issue
Hi folks -
Does run_async really offer a non-blocking interface to integrated accelerators?
I'm trying to queue a number of outstanding requests to a single accelerator. My intent is to hide the submission latency by pipelining the submissions this way.
However, I cannot get it to work. Whether I submit the requests with the blocking session.run or with the non-blocking session.run_async, the submission rate to the accelerator looks the same.
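The way I've been sanity-checking this: a truly non-blocking run_async should return almost immediately, while a blocking call takes roughly the full inference time. Below is a minimal, standalone sketch of that check (same model as the repro further down; the random input is a placeholder):

import threading
import time

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    'mobilenetv2_035_96.onnx',
    providers=[('OpenVINOExecutionProvider', {'device_type': 'CPU'})])
inp = session.get_inputs()[0]
dummy = np.random.randn(*inp.shape).astype(np.float32)

done = threading.Event()
def cb(outputs, state, err):
    done.set()

# If run_async is non-blocking, this call should return in microseconds;
# if it blocks, it should take about as long as session.run below.
t0 = time.perf_counter()
session.run_async(None, {inp.name: dummy}, cb, None)
print("run_async returned after %.3f ms" % ((time.perf_counter() - t0) * 1e3))
done.wait(10)  # let the in-flight request finish before the blocking call

t0 = time.perf_counter()
session.run(None, {inp.name: dummy})
print("run returned after %.3f ms" % ((time.perf_counter() - t0) * 1e3))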
Kevin
To reproduce
import onnxruntime as ort
import numpy as np
import threading
import time

_OUTSTANDING_REQ_ = 4

model_path = 'mobilenetv2_035_96.onnx'

# Create session options
session_options = ort.SessionOptions()

session = ort.InferenceSession(
    model_path,
    providers=[('OpenVINOExecutionProvider', {'device_type': 'CPU'})],
    sess_options=session_options)
print("Providers:", session.get_providers())

# Get model input information
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
input_type = session.get_inputs()[0].type

# Prepare input data - replace this with real input data matching the model's input shape and type
dummy_input = np.random.randn(*input_shape).astype(np.float32)

_SEC_OFFSET_ = 86400

class run_async_inf:
    def __init__(self):
        self.__event = threading.Event()
        self.__outputs = None
        self.__err = ''

    def fill_outputs(self, outputs, err):
        self.__outputs = outputs
        self.__err = err
        self.__event.set()

    def get_outputs(self):
        if self.__err != '':
            raise Exception(self.__err)
        return self.__outputs

    def wait(self, sec):
        self.__event.wait(sec)
        self.__event.clear()

    def reset(self):
        self.__event = threading.Event()
        self.__outputs = None
        self.__err = ''

def _callback_(outputs: np.ndarray, state: run_async_inf, err: str) -> None:
    state.fill_outputs(outputs, err)

infer_requests = [run_async_inf() for _ in range(_OUTSTANDING_REQ_)]

# Run inference
start_t_s = time.time() % _SEC_OFFSET_
print("> starting asynchronous submissions")
for idx, _infer_request_ in enumerate(infer_requests):
    print("spawning request asynchronously.....", idx)
    session.run_async(None, {input_name: dummy_input}, _callback_, _infer_request_)

for x in range(0, int(40000 / _OUTSTANDING_REQ_)):
    for idx, _infer_request_ in enumerate(infer_requests):
        _infer_request_.wait(10)
        _infer_request_.reset()
        session.run_async(None, {input_name: dummy_input}, _callback_, _infer_request_)

end_t_s = time.time() % _SEC_OFFSET_
duration_in_sec = end_t_s - start_t_s
duration_in_sec = duration_in_sec - duration_in_sec % 1
print("> duration (sec) =", duration_in_sec)

exit(0)
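For comparison, the same pipelining can be approximated with plain blocking session.run calls issued from a small thread pool, since InferenceSession.run is safe to call concurrently from multiple threads. If this variant shows the same flat submission rate, the serialization presumably happens inside the execution provider rather than in run_async itself. A sketch reusing session, input_name, and dummy_input from the repro above (the iteration count is arbitrary):

from concurrent.futures import ThreadPoolExecutor

def submit_one(_):
    # Each worker thread issues a blocking run; with _OUTSTANDING_REQ_
    # workers there are up to that many requests in flight at once.
    return session.run(None, {input_name: dummy_input})

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=_OUTSTANDING_REQ_) as pool:
    list(pool.map(submit_one, range(1000)))
print("> threaded duration (sec) =", time.perf_counter() - start)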
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
OpenVINO
Execution Provider Library Version
1.18.0