Error when build on arm64 with neon support? #11
Comments
@Vincent-Echo We successfully include an arm64/Neon target when we use this card.io-dmz repo to build the card.io-source repo, in order to create the card.io SDK. Are you building for iOS or for another OS? What error messages are you getting?
I am building it for iOS. In the file "processor_support.h", the value of DMZ_HAS_NEON_COMPILETIME is 0 when building for arm64 on iOS, which means dmz is built without NEON support. So I changed the value to 1. Now I get error messages saying that register 'r0' does not exist, in the file conv.cpp.
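The 'r0' error suggests that conv.cpp contains 32-bit ARM inline assembly: on arm64 the general-purpose registers are named x0/w0 and up, and the vector registers are v0 to v31, so 32-bit register names are rejected. The following is a minimal sketch of compile-time detection that keeps "NEON is available" separate from "32-bit NEON assembly can be used". The SKETCH_* macro names and the function are hypothetical and are not the actual contents of processor_support.h; only the compiler-defined macros (__ARM_NEON, __ARM_NEON__, __arm__, __aarch64__) are real.

```cpp
// Hypothetical macro names for illustration only; the real logic in
// processor_support.h is not shown in this thread.

// __ARM_NEON is defined by compilers following the Arm C Language Extensions,
// including for arm64 targets; __ARM_NEON__ is the older spelling.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  #define SKETCH_HAS_NEON_COMPILETIME 1
#else
  #define SKETCH_HAS_NEON_COMPILETIME 0
#endif

// Inline assembly written with 32-bit register names (r0, q0, ...) is only
// valid when targeting 32-bit ARM, so it gets its own, narrower switch.
#if SKETCH_HAS_NEON_COMPILETIME && defined(__arm__) && !defined(__aarch64__)
  #define SKETCH_CAN_USE_ARM32_NEON_ASM 1
#else
  #define SKETCH_CAN_USE_ARM32_NEON_ASM 0
#endif

// Reports which code path this translation unit would compile.
const char* sketch_neon_path() {
#if SKETCH_CAN_USE_ARM32_NEON_ASM
  return "arm32-neon-asm";
#elif SKETCH_HAS_NEON_COMPILETIME
  return "neon-intrinsics";
#else
  return "scalar";
#endif
}
```

With a split like this, forcing the NEON flag to 1 on arm64 would select an intrinsics or scalar path rather than attempting to assemble 32-bit register names.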
@Vincent-Echo my apologies -- you're quite right that our arm64 build sets DMZ_HAS_NEON_COMPILETIME to 0. When we first updated our code to build for arm64 devices, the resulting library performed faster than the existing 32-bit versions of card.io. Therefore, apparently, we didn't even notice that our NEON support was being removed at compile time! I know that the NEON instruction set did change with the move to the arm64 architecture, so it's not too surprising that our NEON code would need updating as well. HOWEVER (and I suspect that this is probably the main reason that we did not notice any drop in performance with our 64-bit builds) the Clang compiler has gotten much smarter than it used to be regarding code vectorization. I strongly suspect that Clang is now automatically generating appropriate vector-processor instructions on its own. That wasn't the case a few years ago, when we needed to explicitly use NEON intrinsics in our code to ensure that time-critical sections would be executed on the vector processor. If you'd like to try to update our NEON code so that it builds successfully for both 32- and 64-bit architectures, that would be great - we'd love to review a Pull Request with such changes. But my guess is that Clang has now evolved to a point where this won't actually affect the performance of card.io.
I very much doubt that Clang's codegen has improved to the point that it will outpace our hand-tuned implementations. Automatic vectorization is very hard, and many of our uses are not the sort of thing that is obviously vectorizable. (The convolution code in particular is not obviously vectorizable, and it was the single slowest operation last time I checked.) I'd love to be wrong about this, of course. A PR with ARM64 NEON implementations--particularly of the 7x7 Sobel convolutions--would be awesome. Yes, processors are now fast enough that it isn't the limiting factor, but it'd still save users' battery life and enable us to do more expensive per-frame things later, say during expiry/name scanning. We might also want to check how Eigen's ARM64 NEON support is coming along. That might also have a big impact on card.io performance.
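One way to write NEON code that builds for both 32- and 64-bit ARM is to use the intrinsics from arm_neon.h instead of inline assembly, with a scalar fallback for other targets. The sketch below is a deliberately simplified stand-in, not the 7x7 Sobel convolution from conv.cpp: it computes a central difference along one row. The function name and the SKETCH_USE_NEON macro are hypothetical; vld1q_s16, vsubq_s16 and vst1q_s16 are standard NEON intrinsics.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
  #define SKETCH_USE_NEON 1
#else
  #define SKETCH_USE_NEON 0
#endif

// Hypothetical helper: central difference along one row,
// out[i] = in[i + 1] - in[i - 1] for interior elements.
// The border elements out[0] and out[n - 1] are set to 0.
void sketch_row_gradient(const int16_t* in, int16_t* out, size_t n) {
  if (n == 0) return;
  out[0] = 0;
  if (n == 1) return;
  out[n - 1] = 0;

  size_t i = 1;
#if SKETCH_USE_NEON
  // Process 8 lanes at a time. The loop bound keeps the highest index read,
  // in[i + 8], within the row.
  for (; i + 8 <= n - 1; i += 8) {
    int16x8_t right = vld1q_s16(in + i + 1);
    int16x8_t left = vld1q_s16(in + i - 1);
    vst1q_s16(out + i, vsubq_s16(right, left));
  }
#endif
  // Scalar tail, which is also the whole computation on non-NEON targets.
  for (; i + 1 < n; ++i) {
    out[i] = static_cast<int16_t>(in[i + 1] - in[i - 1]);
  }
}
```

Because the same intrinsics are available for both architectures, a single source path can serve armv7 and arm64, leaving register allocation to the compiler. Whether this matches the speed of hand-written assembly would need to be measured.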
I must defer to @josharian's much greater experience in this area! I did just try the experiment of adding However, this experiment introduced me to the tl;dr: yes, a Pull Request that enables our NEON code for arm64 would be of great interest. 😺
I did update our Eigen to the latest version, 3.2.4, a couple of months ago. Their website claims ETA: Hmm. Well, reviewing their various notes and statements, I haven't yet found an explicit statement re arm64 NEON.
@dgoldman-ebay to measure the perf impact of a change, things to try include:
As for Eigen, ARM64 NEON is different than ARM NEON, so if they don't mention it, they probably don't support it. Another great opportunity for a motivated hacker to make some contributions. :)
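On measuring the perf impact of a change: one generic approach, offered here as an assumption and not necessarily one of the suggestions in the comment above, is to time the operation repeatedly with a monotonic clock and compare the best times of the two builds. The helper name below is hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper: calls fn() once per trial and returns the shortest
// observed wall-clock time in nanoseconds, or -1 if trials <= 0.
// Taking the minimum reduces noise from scheduling and cache warm-up.
int64_t sketch_best_time_ns(const std::function<void()>& fn, int trials) {
  using clock = std::chrono::steady_clock;
  int64_t best = -1;
  for (int t = 0; t < trials; ++t) {
    const auto start = clock::now();
    fn();
    const auto stop = clock::now();
    const int64_t ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    if (best < 0 || ns < best) best = ns;
  }
  return best;
}
```

Running this around the same operation in a NEON-enabled build and a NEON-disabled build, on the same device and input frames, would give a rough comparison. Timings taken on a simulator would not reflect device performance.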
Thanks, @josharian!
Actually, subsequent googling for
I want to build the dmz code for the arm64 platform with DMZ_HAS_NEON_COMPILETIME = 1, but the build fails. Can dmz support arm64 with NEON? Do you have any suggestions for building it on arm64?