Optimize MlasComputeSoftmax with prefetch (microsoft#20393)
The prefetch intrinsic (_mm_prefetch) is used to anticipate memory
accesses by fetching the next row of the input buffer while the current
row is being processed. This reduces the impact of memory latency and
improves the performance of the MlasComputeSoftmax function. As a
result, the worst-case latency of the OCR model dropped by
approximately 50 ms, which equates to a 3% improvement.
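
The idea described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the MLAS implementation: the function name `SoftmaxRowsWithPrefetch`, the scalar softmax loops, and the hard-coded 64-byte cache line size are all hypothetical choices made for the example. The sketch also skips the prefetch after the last row so that it never forms an address past the end of the buffer.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#define SKETCH_HAS_PREFETCH 1
#else
#define SKETCH_HAS_PREFETCH 0
#endif

// Hypothetical helper (not part of MLAS): row-wise softmax over an
// N x D row-major matrix. While row n is processed, the cache lines of
// row n + 1 are requested so they are likely resident when needed.
void SoftmaxRowsWithPrefetch(const float* Input, float* Output,
                             std::size_t N, std::size_t D)
{
    if (D == 0) {
        return;
    }

#if SKETCH_HAS_PREFETCH
    // Assumption: 64-byte cache lines, which is typical for x86.
    constexpr std::size_t CacheLineSize = 64;
#endif

    for (std::size_t n = 0; n < N; n++) {

#if SKETCH_HAS_PREFETCH
        // Issue one prefetch per cache line of the next row.
        if (n + 1 < N) {
            const char* Next = reinterpret_cast<const char*>(Input + D);
            for (std::size_t Offset = 0; Offset < D * sizeof(float);
                 Offset += CacheLineSize) {
                _mm_prefetch(Next + Offset, _MM_HINT_T0);
            }
        }
#endif

        // Find the maximum value for the row (for numerical stability).
        float Maximum = Input[0];
        for (std::size_t i = 1; i < D; i++) {
            Maximum = std::max(Maximum, Input[i]);
        }

        // Compute exp(x - max) and accumulate the sum.
        float Sum = 0.0f;
        for (std::size_t i = 0; i < D; i++) {
            Output[i] = std::exp(Input[i] - Maximum);
            Sum += Output[i];
        }

        // Normalize the row.
        for (std::size_t i = 0; i < D; i++) {
            Output[i] /= Sum;
        }

        Input += D;
        Output += D;
    }
}
```

A prefetch is only a hint: it does not change the results, and whether it helps depends on the row size and on how well the hardware prefetcher already covers the access pattern, so any gain should be measured on the target workload.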
yihonglyu authored Apr 25, 2024
1 parent a077330 commit edffa2a
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions onnxruntime/core/mlas/lib/compute.cpp
@@ -850,8 +850,24 @@ Return Value:
    const float* Input = WorkBlock->Input + n * D;
    float* Output = WorkBlock->Output + n * D;

#if defined(MLAS_SSE2_INTRINSICS)
    // TODO: Use std::hardware_constructive_interference_size
    constexpr size_t CacheLineSize = 64;
    constexpr size_t ElementsPerCacheLine = CacheLineSize / sizeof(float);
#endif

    while (CountN > 0) {

#if defined(MLAS_SSE2_INTRINSICS)
        //
        // Prefetch the next row of the input buffer.
        //

        for (size_t i = 0; i * ElementsPerCacheLine < D; i++) {
            _mm_prefetch((char*)(Input + D) + i * CacheLineSize, _MM_HINT_T0);
        }
#endif

        //
        // Find the maximum value for the row.
        //