[WebNN EP] Cache MLTensors between runs (#22278)
### Description
This change enables caching `MLTensor`s between inference runs. This is
done by keeping a reference to `MLTensor`s alive after they have been
released. `MLTensor`s are only destroyed once the session goes out of
scope.

### Motivation and Context
Creating and destroying `MLTensor`s on every run carries a non-trivial
performance penalty. This penalty shows up when using `ort.Tensor`s with
`location=cpu` for inputs/outputs, or when using the CPU EP as a
fallback EP for unsupported operators. The former can be mitigated by
developers using `ort.Tensor`s with `location=ml-tensor`; the latter
cannot be mitigated by developers.
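The caching idea described above can be sketched as a simple free pool: released tensors are parked for reuse instead of destroyed, and only torn down when the owner goes out of scope. This is a minimal illustrative sketch, not ONNX Runtime's actual implementation; `FakeMLTensor` and `TensorCache` are hypothetical names standing in for the real `MLTensor` and tensor-manager types.

```typescript
// Hypothetical stand-in for a WebNN MLTensor (illustration only).
interface FakeMLTensor {
  byteLength: number;
  destroyed: boolean;
}

// Sketch of the caching strategy: keep released tensors alive in a free
// pool so the next run can reuse them instead of reallocating.
class TensorCache {
  private freePool: FakeMLTensor[] = [];

  // Acquire a tensor, reusing a pooled one of matching size if available.
  acquire(byteLength: number): FakeMLTensor {
    const i = this.freePool.findIndex((t) => t.byteLength === byteLength);
    if (i !== -1) {
      return this.freePool.splice(i, 1)[0]; // cache hit: no allocation
    }
    return { byteLength, destroyed: false }; // cache miss: "allocate"
  }

  // Release after a run: park the tensor for future reuse, do not destroy.
  release(tensor: FakeMLTensor): void {
    this.freePool.push(tensor);
  }

  // Destroy all cached tensors once the owning session/context goes away.
  destroyAll(): void {
    for (const t of this.freePool) {
      t.destroyed = true;
    }
    this.freePool.length = 0;
  }
}
```

Reusing a pooled tensor avoids the allocate/destroy round trip on every inference run, which is the performance penalty this commit targets.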
egalli authored Oct 18, 2024
1 parent b4cb937 commit 1e5bda8
Showing 2 changed files with 166 additions and 128 deletions.
2 changes: 1 addition & 1 deletion js/web/lib/wasm/jsep/backend-webnn.ts
@@ -91,12 +91,12 @@ export class WebNNBackend {
       // Current session is not a WebNN session.
       return;
     }
-    this.tensorManager.releaseTensorsForSession(sessionId);
     this.mlContextBySessionId.delete(sessionId);
     const sessionIds = this.sessionIdsByMLContext.get(mlContext)!;
     sessionIds.delete(sessionId);
     if (sessionIds.size === 0) {
       this.sessionIdsByMLContext.delete(mlContext);
+      this.tensorManager.releaseTensorsForContext(mlContext);
     }
   }

