diff --git a/README.md b/README.md
index 2a9dfb7..ed2333d 100644
--- a/README.md
+++ b/README.md
@@ -2,10 +2,9 @@
 High Performance Combinatorics in C++ using vector instructions v1.0.1
 
 HPCombi is a C++17 header-only library using the SSE and AVX instruction sets,
-and some equivalents, for very fast manipulation of small combinatorial objects such
-as transformations, permutations, and boolean matrices. The goal
-of this project is to implement various new algorithms and benchmark them on
-various compiler and architectures.
+and some equivalents, for very fast manipulation of small combinatorial objects
+such as transformations, permutations, and boolean matrices. HPCombi implements
+new algorithms and benchmarks them on various compilers and architectures.
 
 HPCombi was initially designed using the SSE and AVX instruction sets, and did
 not work on machines without these instructions (such as ARM). From v1.0.1
diff --git a/include/hpcombi/epu8.hpp b/include/hpcombi/epu8.hpp
index 8c47b5e..c5f9b3b 100644
--- a/include/hpcombi/epu8.hpp
+++ b/include/hpcombi/epu8.hpp
@@ -52,9 +52,9 @@ epu8 stands for *Extended Packed Unsigned, grouped by 8 bits*;
 this is the low level type chosen by Intel for their API to intrinsics,
 ie a SIMD vector of 16 unsigned bytes (16×8 = 128bits).
 Functions using this type use semantically equivalent types,
-eg a _m128 which is 2 vect of 64bits.
-a flag tells the compiler to silently consider those types equivalent.
- */
+eg a _m128 which is a vector containing 2 signed 64 bits integers.
+A flag tells the compiler to silently consider those types equivalent.
+*/
 using epu8 = uint8_t __attribute__((vector_size(16)));
 
 static_assert(alignof(epu8) == 16,
diff --git a/include/hpcombi/hpcombi.hpp b/include/hpcombi/hpcombi.hpp
index c4cacad..33c2b1b 100644
--- a/include/hpcombi/hpcombi.hpp
+++ b/include/hpcombi/hpcombi.hpp
@@ -53,9 +53,9 @@ applying a permutation on a vector only takes a few CPU cycles.
 
 Further ideas are:
 - Vectorization (MMX, SSE, AVX instructions sets) and careful memory alignment,
-- Careful memory management: avoiding all dynamic allocation during the computation,
-- Avoid all unnecessary copies (often needed to rewrite the containers),
-- Due to combinatorial explosion, sets often don’t fit in the computer’s memory or disks and are enumerated on the fly.
+- Careful memory management: avoid all dynamic allocation during the computation,
+- Avoid all unnecessary copies (it is often needed to rewrite the containers),
+- Due to combinatorial explosion, sets often don’t fit in memory or disk and are enumerated on the fly.
 
 Here are some examples, the speedup is in comparison to an implementation without vector instructions: