#108 added a tensor pool that enables re-use of output buffers for different steps of graph execution. The entire pool is currently freed at the end of the run. For recurrent / autoregressive models where the caller invokes `Model::run` in a loop, buffer reuse could be further improved by persisting the pool across runs.
Possible APIs:
1. Add an optional `pool` parameter to `Model::run` which allows the user to specify a pool (sketched below).
2. Make the pool a field of the `Model` or `Graph`. This would require some changes to the pool to enable it to be used from multiple threads concurrently.
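To make the first option concrete, here is a minimal sketch assuming a caller-owned `TensorPool` and a hypothetical `run_with_pool` method. The types below are placeholders standing in for rten's real `Model`, tensor and pool types; the actual signatures may differ.

```rust
// Hypothetical API sketch, not rten's actual interface.
struct Tensor;
struct TensorPool;
struct Model;

impl TensorPool {
    fn new() -> Self {
        TensorPool
    }
}

impl Model {
    /// Hypothetical variant of `Model::run` that draws output buffers from
    /// `pool` and returns them to it, instead of freeing the pool at the end
    /// of the run.
    fn run_with_pool(&self, _inputs: &[Tensor], _pool: &mut TensorPool) -> Vec<Tensor> {
        // Real implementation: interpreter loop allocating via the pool.
        vec![Tensor]
    }
}

fn main() {
    let (model, mut pool) = (Model, TensorPool::new());
    let mut inputs = vec![Tensor];
    // Autoregressive loop: because the pool outlives each call, buffers
    // released at the end of step N can satisfy allocations in step N+1.
    for _step in 0..4 {
        inputs = model.run_with_pool(&inputs, &mut pool);
    }
}
```

The caller-supplied pool keeps the pool single-threaded (one pool per calling thread), which is what makes it simpler than option 2.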
As an extension of this, it would be useful to be able to pass owned tensors as inputs to graph execution, rather than views, so that their buffers can be added to the pool and used to fulfill allocation requests. An example of where this matters is the KV-cache outputs returned from transformer decoder models. These caches are then fed as inputs into the next graph execution. Currently new KV-cache buffers get allocated on each run, but it would be more efficient if they could simply be recycled.
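As a rough illustration of the idea, again using placeholder types and a hypothetical `run_owned` method rather than rten's actual API: inputs passed by value could be donated to the pool once their last consumer has run, so the next step's KV-cache outputs can reuse those allocations.

```rust
// Hypothetical sketch only; real signatures may differ.
struct Tensor;
struct TensorPool;
struct Model;

impl TensorPool {
    fn new() -> Self {
        TensorPool
    }
    /// Recycle a buffer whose contents are no longer needed.
    fn add(&mut self, _t: Tensor) {}
}

impl Model {
    /// Hypothetical run variant that takes inputs by value: once the last
    /// operator reading an input has executed, its buffer is moved into
    /// `pool` rather than dropped, making it available for later outputs.
    fn run_owned(&self, inputs: Vec<Tensor>, pool: &mut TensorPool) -> Vec<Tensor> {
        for t in inputs {
            pool.add(t); // in reality, done after the consuming operator runs
        }
        vec![Tensor] // e.g. logits plus grown KV caches, allocated from `pool`
    }
}

fn main() {
    let (model, mut pool) = (Model, TensorPool::new());
    // KV caches returned from step N are moved into step N+1, so their
    // allocations can be recycled instead of re-allocated on every run.
    let mut kv_caches = vec![Tensor];
    for _step in 0..4 {
        kv_caches = model.run_owned(kv_caches, &mut pool);
    }
}
```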
This was done for sharing between the main graph and subgraphs in #312. That case is simpler because the interpreter loop for a subgraph runs on the same thread as the loop for the parent graph, so it doesn't require making `TensorPool` usable across threads.