CANN Backend support
Introduction
CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI scenarios. It provides multi-layer programming interfaces to help users quickly build AI applications and services based on the Ascend platform.
The CANN backend in CTranslate2 enables running AI models on the Ascend NPU, extending the existing CPU & CUDA workflows. One can find more on Ascend NPU and CANN library here. Examples of projects that support CANN include ONNX Runtime & OpenCV.
resolves #1609
Notes
c output shape incorrectly when input a is transpose? #1583.
Implementation
The CANN backend implementation introduces Device::CANN, similarly to CPU & CUDA.
The CANN workflow can be enabled using -DWITH_CANN=ON in the cmake configuration (see examples/cann). As with CUDA, CANN can coexist alongside the CPU workflow.
The CANN workflow is accessible through the examples (examples/cann/main.cc), the cli, or the Python module.
Operators & primitives were implemented for CANN so that the end-to-end example in the ctranslate2 documentation runs successfully.
Tests
Tests were extended for Device::CANN & the respective DataType. Additional tests were also implemented involving extra/edge cases. Gtest output: gtest_cann.log
Environment Setup
AArch64.run category (current implementation involved CANN 7.0.RC1.alpha001).
docker/cann.
For details about how to set up the development environment and operating environment, see Development and Operating Environment Setup and CANN Software Installation Guide.
Build CANN Python module
The CANN Python module is expected to be built using the respective Docker files. Nevertheless, here we provide a quick way of building it, ideal for testing and benchmarking.
Build CANN C++ example
Samples
Python
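A minimal sketch of what a Python sample could look like, using the standard CTranslate2 Translator API. The device string "cann" is an assumption (it mirrors the existing "cpu" and "cuda" strings and is not confirmed by the text above), and translate_tokens is a hypothetical helper name. The model directory and tokens are the ones from the CLI example.

```python
def translate_tokens(model_dir, tokens, device="cann"):
    """Translate one pre-tokenized sentence and return the best hypothesis.

    Assumption: "cann" is the device string accepted by a CANN-enabled
    build of the Python module; pass "cpu" or "cuda" on other builds.
    """
    # Imported lazily so the helper can be defined without the module installed.
    import ctranslate2

    translator = ctranslate2.Translator(model_dir, device=device)
    # translate_batch takes a list of token lists, one entry per sentence.
    results = translator.translate_batch([tokens])
    return results[0].hypotheses[0]


# Example call (requires a converted model in ./ende_ctranslate2/):
# print(translate_tokens("./ende_ctranslate2/", ["▁H", "ello", "▁world", "!"]))
```

Keeping the device as a parameter makes it easy to run the same call on CPU and NPU and compare the outputs.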
C++
An execution example in C++ can be found in examples/cann.
CLI
echo "▁H ello ▁world !" | ./ct2-translator --model "./ende_ctranslate2/"
Benchmark
We conducted several runs measuring translation latency for a single batch, using all 192 CPU cores and 1 NPU device.
Specifically, the experiments report results for 4 consecutive runs with inputs of 4 and 306 tokens respectively. The NPU proved
faster in all cases.
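The measurement described above can be sketched as a small timing loop. This is an illustrative harness, not the script used for the reported numbers: time_runs and fake_translate are hypothetical names, and fake_translate is a placeholder workload standing in for a single-batch translation call on the CPU or the NPU.

```python
import time


def time_runs(fn, runs=4):
    """Call fn() `runs` times back to back and return each latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_translate():
    # Placeholder workload; replace with the real single-batch translation call.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    for run, latency in enumerate(time_runs(fake_translate, runs=4), start=1):
        print(f"run {run}: {latency * 1000:.3f} ms")
```

Reporting every run separately, rather than only an average, keeps any warm-up cost of the first run visible.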
Input tokens
4 tokens
306 tokens
Hardware
Experiments