optimize popcount implementation #348

Merged
merged 1 commit into from
Oct 18, 2023

Commits on Oct 17, 2023

  1. optimize popcount implementation

    The gcc backend of rustc currently emits the following for a function that
    implements popcount for a u32 (x86_64 targeting AVX2, using the standard
    Unix calling convention):
    
        popcount:
            mov     eax, edi
            and     edi, 1431655765
            shr     eax
            and     eax, 1431655765
            add     edi, eax
            mov     edx, edi
            and     edi, 858993459
            shr     edx, 2
            and     edx, 858993459
            add     edx, edi
            mov     eax, edx
            and     edx, 252645135
            shr     eax, 4
            and     eax, 252645135
            add     eax, edx
            mov     edx, eax
            and     eax, 16711935
            shr     edx, 8
            and     edx, 16711935
            add     edx, eax
            movzx   eax, dx
            shr     edx, 16
            add     eax, edx
            ret
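
    For readers who do not want to decode the masks by hand, the listing above
    corresponds to the classic parallel (SWAR) bit count. The following is a
    minimal sketch in plain Rust, not the backend's actual code; the function
    name `popcount_swar` is hypothetical and chosen only for illustration.

    ```rust
    /// Parallel (SWAR) bit count that follows the listing step by step.
    /// `popcount_swar` is an illustrative name, not part of the patch.
    fn popcount_swar(x: u32) -> u32 {
        // 1431655765 == 0x5555_5555: add neighbouring bits into 2-bit sums.
        let x = (x & 0x5555_5555) + ((x >> 1) & 0x5555_5555);
        // 858993459 == 0x3333_3333: add 2-bit sums into 4-bit sums.
        let x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);
        // 252645135 == 0x0F0F_0F0F: add 4-bit sums into byte sums.
        let x = (x & 0x0F0F_0F0F) + ((x >> 4) & 0x0F0F_0F0F);
        // 16711935 == 0x00FF_00FF: add byte sums into 16-bit sums.
        let x = (x & 0x00FF_00FF) + ((x >> 8) & 0x00FF_00FF);
        // Final step (movzx / shr 16 / add): add the two 16-bit halves.
        (x & 0x0000_FFFF) + (x >> 16)
    }
    ```

    Each of the five steps costs a shift, two masks, and an add, which is why
    the emitted sequence is so much longer than a single instruction.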
    
    Rather than using this implementation, gcc could be told to use Wegner's
    algorithm.  This would give the same function the following implementation:
    
        popcount:
            xor eax, eax
            xor edx, edx
            popcnt eax, edi
            test edi, edi
            cmove eax, edx
            ret
    
    This patch implements the popcount operation in terms of Wegner's algorithm
    in all cases.
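
    The algorithm named above is the clear-the-lowest-set-bit loop, usually
    credited to Peter Wegner and also widely known as Kernighan's method. The
    following is a minimal sketch in plain Rust, assuming that is the loop
    shape the backend hands to gcc; the patch itself is not shown here, and
    the name `popcount_clear_lowest` is hypothetical.

    ```rust
    /// Counts set bits by repeatedly clearing the lowest one.
    /// `popcount_clear_lowest` is an illustrative name, not part of the patch.
    fn popcount_clear_lowest(mut x: u32) -> u32 {
        let mut count = 0;
        while x != 0 {
            // `x - 1` flips the lowest set bit and every bit below it, so the
            // AND clears exactly one set bit per iteration. The loop guard
            // guarantees x != 0 here, so the subtraction cannot underflow.
            x &= x - 1;
            count += 1;
        }
        count
    }
    ```

    gcc can recognize loops of this shape as a popcount idiom, which is
    consistent with the single `popcnt` plus zero check in the listing above.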
    
    Signed-off-by: Andy Sadler <[email protected]>
    sadlerap committed Oct 17, 2023
    64abf58