Victor Derks edited this page Sep 16, 2024 · 1 revision

Introduction

This section of the wiki collects the notes and design decisions made to achieve optimal performance.

General

CharLS uses exceptions to report errors. In principle, exceptions have zero runtime cost as long as none are thrown (see reference 1).
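A minimal sketch of this pattern, assuming a C-style API boundary. All names here (`decode_status`, `read_byte`, `decode_first_byte`) are hypothetical and not the CharLS API. Internal code throws on invalid input, so the common path needs no error-code checks, and the boundary function translates exceptions into a status code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical status codes returned across a C-style API boundary.
enum class decode_status
{
    success,
    invalid_data
};

// Internal helper: reports errors by throwing, so callers need no error-code checks.
std::uint8_t read_byte(const std::uint8_t* data, std::size_t size, std::size_t position)
{
    if (position >= size)
        throw std::runtime_error("unexpected end of data");
    return data[position];
}

// Hypothetical API boundary: no exception escapes, it is translated into a status code.
decode_status decode_first_byte(const std::uint8_t* data, std::size_t size, std::uint8_t* first_byte) noexcept
{
    assert(first_byte != nullptr);
    try
    {
        *first_byte = read_byte(data, size, 0);
        return decode_status::success;
    }
    catch (const std::runtime_error&)
    {
        return decode_status::invalid_data;
    }
}
```

With table-based exception handling, the `try` block typically adds no instructions to the non-throwing path; the cost is paid only when an exception is actually thrown.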

Decoding

Step 9: Decode the mapped error value MErrval

There are two options to read the unary code of the limited-length Golomb code back and convert it to the mapped error value (MErrval):

  1. Follow the procedure documented in F.1, step 9.

  2. Read a byte from the bit stream and use it to look up the mapped error value and the bit count.
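Option 1 can be sketched as follows, assuming the limited-length Golomb code LG(k, LIMIT) of JPEG-LS with MAX = LIMIT - qbpp - 1. The decoder counts the zeros of the unary prefix up to the terminating 1 bit. A count below MAX is combined with the next k bits; a count equal to MAX marks an escape code, followed by MErrval - 1 in qbpp bits. The `bit_reader` class and `decode_mapped_error_value` function are hypothetical, and the sketch ignores the bit stuffing JPEG-LS applies after 0xFF bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical MSB-first bit reader. Simplification: ignores JPEG-LS bit stuffing.
class bit_reader final
{
public:
    explicit bit_reader(std::vector<std::uint8_t> data) : data_{std::move(data)} {}

    // Returns the next bit (0 or 1); bits past the end of the buffer read as 0.
    int read_bit()
    {
        const std::size_t byte_index{position_ / 8};
        const int shift{7 - static_cast<int>(position_ % 8)};
        ++position_;
        if (byte_index >= data_.size())
            return 0;
        return (data_[byte_index] >> shift) & 1;
    }

    // Reads `count` bits, most significant bit first.
    std::int32_t read_bits(int count)
    {
        std::int32_t value{};
        for (int i{}; i < count; ++i)
            value = (value << 1) | read_bit();
        return value;
    }

    std::size_t position() const noexcept { return position_; }

private:
    std::vector<std::uint8_t> data_;
    std::size_t position_{};
};

// Decodes one MErrval coded with LG(k, limit); qbpp is the number of bits of the quantized range.
std::int32_t decode_mapped_error_value(bit_reader& reader, int k, int limit, int qbpp)
{
    assert(k >= 0 && k < 16);
    const int max_unary{limit - qbpp - 1}; // MAX

    // Unary prefix: count the zeros up to the terminating 1 bit.
    int high_bits{};
    while (reader.read_bit() == 0)
    {
        if (++high_bits > max_unary)
            throw std::runtime_error("invalid Golomb code");
    }

    if (high_bits < max_unary)
        return (high_bits << k) | reader.read_bits(k); // regular code

    return reader.read_bits(qbpp) + 1; // escape code: MErrval - 1 stored in qbpp bits
}
```

For example, with k = 2, LIMIT = 32 and qbpp = 8, the byte 0x58 (0101 1000) decodes to MErrval = 5 (bits 0101) followed by MErrval = 0 (bits 100). Reading one bit per loop iteration is what makes this option slower than a table lookup.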

The required lookup tables can be computed once at startup or at compile time. Computing the tables at runtime costs 5 μs (Ryzen 9 5950X). Computing them at compile time (constexpr) would add 256 x 16 x 8 = 32,768 bytes to the .rdata section (instead of the .data section). Because the runtime cost is minimal and paid only once, it is more beneficial not to compute the tables at compile time, which keeps the binary smaller. Loading 32 KiB from the binary would also take some time.
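A sketch of the table-based option, assuming one table of 256 entries per Golomb parameter k = 0..15 and an 8-byte entry holding the decoded value and the code length. This matches the 256 x 16 x 8 estimate above, but `golomb_code`, `create_golomb_table`, and `create_golomb_tables` are hypothetical names, not the CharLS implementation. An entry with a bit count of 0 means the code does not fit in the peeked byte and the decoder must fall back to the bit-by-bit path. The sketch also assumes MAX = LIMIT - qbpp - 1 is at least 8, so an escape code never fits in one byte.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-byte table entry: decoded MErrval and the number of bits the code uses.
struct golomb_code final
{
    std::int32_t mapped_error_value;
    std::int32_t bit_count; // 0: the code does not fit in 8 bits, use the bit-by-bit path
};

using golomb_table = std::array<golomb_code, 256>;  // indexed by the next byte of the stream
using golomb_tables = std::array<golomb_table, 16>; // one table per Golomb parameter k

// Builds the table for one k. Assumes MAX >= 8, so escape codes never fit in a byte.
constexpr golomb_table create_golomb_table(int k)
{
    assert(k >= 0 && k < 16);
    golomb_table table{};
    for (int byte{}; byte < 256; ++byte)
    {
        // Unary prefix: number of leading zero bits in the byte.
        int high_bits{};
        while (high_bits < 8 && (byte & (0x80 >> high_bits)) == 0)
            ++high_bits;

        const int code_length{high_bits + 1 + k};
        if (code_length > 8)
            continue; // entry stays {0, 0}

        // The k bits that follow the terminating 1 bit.
        const int low_bits{(byte >> (8 - code_length)) & ((1 << k) - 1)};
        table[static_cast<std::size_t>(byte)] = golomb_code{(high_bits << k) | low_bits, code_length};
    }
    return table;
}

// Not constexpr: the tables are computed at runtime, once per call.
golomb_tables create_golomb_tables()
{
    golomb_tables tables{};
    for (std::size_t k{}; k < tables.size(); ++k)
        tables[k] = create_golomb_table(static_cast<int>(k));
    return tables;
}

// 256 byte values x 16 values of k x 8 bytes per entry = 32,768 bytes.
static_assert(sizeof(golomb_tables) == 256 * 16 * 8, "tables occupy 32 KiB");
```

Because `create_golomb_tables` is not `constexpr`, a variable such as `static const golomb_tables tables = create_golomb_tables();` is filled in once at startup. Making the function and the variable `constexpr` would instead embed the 32 KiB in the binary, which is the trade-off described above. In this sketch, codes with a large k rarely fit in one byte, so those lookups mostly fall back to the slow path.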

Decoding the test image mg1.jls takes 200 ms with method 2 enabled and 243 ms with method 2 disabled: a significant speedup, as decoding takes about 18% less time.

Compiler settings

Because CharLS is a C++ library, it can target different CPU architectures.

x86

x86-64

Four microarchitecture levels are defined for this CPU architecture (each level includes the previous one; the main additions are listed):

  • x86-64-v1 (baseline): SSE2
  • x86-64-v2: POPCNT, SSE3, SSE4_2 (Windows 11 24H2 requires POPCNT and SSE4.2)
  • x86-64-v3: AVX, AVX2
  • x86-64-v4: AVX512
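To check which level a CPU supports at runtime, a sketch like the following maps a subset of the defining features to a level. It assumes GCC or Clang targeting x86 or x86-64, using their `__builtin_cpu_supports` builtin; with other compilers, such as MSVC (which would use `__cpuid` instead), the hypothetical `detect_cpu_features` returns an empty feature set. Only a few representative features per level are checked, so the result is an approximation.

```cpp
// Subset of the CPU features that define the x86-64 microarchitecture levels.
struct cpu_features final
{
    bool sse2{};     // x86-64-v1
    bool popcnt{};   // x86-64-v2
    bool sse4_2{};   // x86-64-v2
    bool avx2{};     // x86-64-v3
    bool fma{};      // x86-64-v3
    bool avx512f{};  // x86-64-v4
    bool avx512vl{}; // x86-64-v4
};

// Highest level (1-4) satisfied by the feature set, or 0 if even the baseline is missing.
constexpr int x86_64_level(const cpu_features& f) noexcept
{
    if (!f.sse2)
        return 0;
    if (!(f.popcnt && f.sse4_2))
        return 1;
    if (!(f.avx2 && f.fma))
        return 2;
    if (!(f.avx512f && f.avx512vl))
        return 3;
    return 4;
}

// Hypothetical runtime query. Assumption: GCC or Clang on x86/x86-64; otherwise all false.
cpu_features detect_cpu_features()
{
    cpu_features features;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    features.sse2 = __builtin_cpu_supports("sse2");
    features.popcnt = __builtin_cpu_supports("popcnt");
    features.sse4_2 = __builtin_cpu_supports("sse4.2");
    features.avx2 = __builtin_cpu_supports("avx2");
    features.fma = __builtin_cpu_supports("fma");
    features.avx512f = __builtin_cpu_supports("avx512f");
    features.avx512vl = __builtin_cpu_supports("avx512vl");
#endif
    return features;
}
```

For example, `x86_64_level(detect_cpu_features())` returns 3 on a CPU that supports AVX2 and FMA but not AVX-512.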

MSVC

The target CPU architecture is selected with the /arch switch. The accepted values depend on the target family (x86, x64, ARM64); on x64 they select extensions such as AVX, AVX2, and AVX-512. The default is SSE2 (x86-64-v1). A second switch, /favor, accepts blend, AMD64, or INTEL64; the default is blend. AMD recommends the following options for Zen 2 CPUs: /GL /arch:AVX2 /MT /fp:fast /favor:blend, where /GL enables whole program optimization.
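The level a build was compiled for can be checked with predefined macros: MSVC defines `__AVX__`, `__AVX2__`, or `__AVX512F__` for /arch:AVX, /arch:AVX2, or /arch:AVX512, and GCC and Clang define the same macros for the matching -march or -m options. The hypothetical helper below maps these macros to an approximate x86-64 level; for level 2 it also relies on `__SSE4_2__`, which GCC and Clang define.

```cpp
// Hypothetical helper: approximate x86-64 level (1-4) the code was compiled for,
// derived from predefined macros; 0 when not compiling for x86-64.
constexpr int compiled_x86_64_level() noexcept
{
#if defined(_M_X64) || defined(__x86_64__)
#if defined(__AVX512F__)
    return 4; // e.g. MSVC /arch:AVX512, GCC/Clang -march=x86-64-v4
#elif defined(__AVX2__)
    return 3; // e.g. MSVC /arch:AVX2, GCC/Clang -march=x86-64-v3
#elif defined(__AVX__) || defined(__SSE4_2__)
    return 2; // e.g. MSVC /arch:AVX, GCC/Clang -march=x86-64-v2
#else
    return 1; // SSE2 baseline, the MSVC default
#endif
#else
    return 0;
#endif
}

static_assert(compiled_x86_64_level() >= 0 && compiled_x86_64_level() <= 4, "unexpected level");
```

A library could, for example, compare this compile-time level with the level detected at runtime during startup and report an error when the CPU cannot run the build.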

MSVC undocumented options

Compiler front-end options start with /d1; compiler back-end options start with /d2.

  • /d1PP: preserves macro definitions when preprocessing with /P

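  • /d1reportAllClassLayout: prints the memory layout (member offsets and padding) of all classes

  • /d1reportSingleClassLayoutXXX: prints the memory layout of class XXX only (no space between the option and the class name)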

  • /d2cgsummary: prints a detailed breakdown of the time spent in the whole-program-optimization phase of the compiler.

  • /d2archSSE42: enables SSE 4.2

  • /d2vzeroupper: enables compiler-generated vzeroupper instructions (enabled by default; /d2vzeroupper- turns it off)
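The class layout options help spot padding, which wastes cache space when many objects are stored together. The two hypothetical structs below hold the same members, assuming a typical ABI where `std::int32_t` is 4-byte aligned; ordering the members from largest to smallest alignment shrinks the struct from 12 to 8 bytes. They could be inspected with, for example, `cl /c /d1reportSingleClassLayoutpadded_entry layout.cpp` (the file name is hypothetical).

```cpp
#include <cstddef>
#include <cstdint>

// Declaration order forces padding so that `value` is 4-byte aligned.
struct padded_entry
{
    std::uint8_t flag;   // offset 0, then 3 bytes of padding
    std::int32_t value;  // offset 4
    std::uint8_t length; // offset 8, then 3 bytes of tail padding (sizeof == 12)
};

// Same members, largest first: less padding.
struct reordered_entry
{
    std::int32_t value;  // offset 0
    std::uint8_t flag;   // offset 4
    std::uint8_t length; // offset 5, then 2 bytes of tail padding (sizeof == 8)
};

static_assert(offsetof(padded_entry, value) == 4, "value is 4-byte aligned");
static_assert(sizeof(reordered_entry) < sizeof(padded_entry), "reordering removes padding");
```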

GCC

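  • -march: selects the target CPU architecture, for example -march=x86-64-v3

  • -Ofast: enables all -O3 optimizations plus optimizations that are not valid for all standard-compliant programs, such as -ffast-math

  • set(CMAKE_EXPORT_COMPILE_COMMANDS ON): a CMake setting that writes compile_commands.json, which shows the compiler settings used for each source file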
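As a quick cross-check of the flags recorded in compile_commands.json, code compiled with GCC or Clang can query predefined macros: any optimization level other than -O0 defines `__OPTIMIZE__`, and -ffast-math (also enabled by -Ofast) defines `__FAST_MATH__`. The helper below is a hypothetical sketch; MSVC does not define these macros, so it reports false for both there.

```cpp
// Hypothetical summary of optimization-related settings, read from macros
// predefined by GCC and Clang.
struct build_settings final
{
    bool optimized; // __OPTIMIZE__: any -O level other than -O0
    bool fast_math; // __FAST_MATH__: -ffast-math, also enabled by -Ofast
};

constexpr build_settings current_build_settings() noexcept
{
    build_settings settings{false, false};
#if defined(__OPTIMIZE__)
    settings.optimized = true;
#endif
#if defined(__FAST_MATH__)
    settings.fast_math = true;
#endif
    return settings;
}
```

For example, `static_assert(!current_build_settings().fast_math, "...")` would reject builds made with -Ofast or -ffast-math, if a component depends on strict IEEE floating-point behavior.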

References

  1. When a Microsecond Is an Eternity: High Performance Trading Systems in C++, Carl Cook, CppCon 2017 presentation.
  2. Optimizing software in C++: An optimization guide for Windows, Linux, and Mac platforms, Agner Fog.
  3. https://vaibhavsagar.com/blog/2019/09/08/popcount/
  4. https://pureinfotech.com/windows-11-24h2-system-requirements/