Publications

Tomasz Kantecki edited this page Sep 15, 2023 · 9 revisions

AES-GCM (PCLMULQDQ)

Advanced Encryption Standard Galois Counter Mode - Optimized GHASH Function Technology Guide Galois/Counter Mode (GCM) is a mode of operation for authenticated encryption. In this mode, data is encrypted via Advanced Encryption Standard (AES) block cipher and an authentication tag is generated by applying a hash function (GHASH) to the entire ciphertext. This paper introduces novel techniques to further improve the performance of GHASH. These techniques present algorithmic improvements, which can be utilized in any setting for GCM implementation.

Intel® AVX-512 - High Performance IPsec with Intel® Xeon® Scalable Processor Technology Guide Compared to the 3rd Gen Intel® Xeon® Scalable processor, the 4th Gen Intel® Xeon® Scalable processor has several hardware specification improvements relevant to network applications such as IPsec. This technology guide shows that the 4th Gen Intel® Xeon® Scalable processor achieves a significant IPsec performance improvement over its predecessor and reaches a record-breaking milestone of nearly 2 Tbps on a single server platform.

3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Technology Guide This document describes how the latest Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions and Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI), enabled in the 3rd Generation Intel® Xeon® Scalable Processor, are used to significantly increase IPsec throughput and reach 1 Tbps.

Intel® Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode This paper describes the PCLMULQDQ carry-less multiplication instruction and its usage for computing the Galois Hash. It also provides code examples for the usage of PCLMULQDQ, together with the Intel® AES New Instructions, for efficient implementation of AES in Galois Counter Mode (AES-GCM).
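The operation that a carry-less multiply performs can be modeled in a few lines. The sketch below is a plain-Python reference model and is not taken from the paper: `clmul` is a hypothetical helper name, and it accepts arbitrary non-negative integers rather than the 64-bit operands that PCLMULQDQ takes. It treats each integer as a polynomial over GF(2), so partial products are combined with XOR instead of addition.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Hypothetical reference model: each integer is read as a polynomial
    over GF(2), with bit i holding the coefficient of x**i.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    result = 0
    shift = 0
    while b:
        if b & 1:
            # Add (XOR) the shifted copy of a; no carries propagate.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), so 0b11 * 0b11 -> 0b101.
    print(bin(clmul(0b11, 0b11)))
```

For comparison, an ordinary integer multiply of 3 by 3 gives 9 (`0b1001`), because the two middle partial products carry into the next bit instead of cancelling.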

Enabling High Performance Galois-Counter Mode on Intel® Architecture Processors With the recent introduction of AES-NI instructions (including PCLMULQDQ), highly-optimized implementations of GCM mode of operation were made possible on Intel® Architecture Processors. In this paper, we describe techniques to improve GCM performance further and describe a few versions of optimized code with performance data.

AES-GCM for Efficient Authenticated Encryption – Ending the Reign of HMAC-SHA-1? Workshop on Real-World Cryptography, Stanford University, Jan. 9-11, 2013

Optimized Galois-Counter Mode on Intel® Architecture Processors

AES-CBCS

Multi-buffer AVX-512 Accelerated Parallelization of CBCS Common Encryption Mode Combining crypto enhancements on the latest Intel® Xeon® Processors with optimized software implementations can dramatically accelerate MPEG DRM encryption. This reduces the processing resource requirements of the “packager” and reduces the total cost of ownership for an over-the-top service.

CRC

Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction This paper presents a fast and efficient method of computing CRC on IA processors with generic polynomials using the carry-less multiplication instruction – PCLMULQDQ.
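As background for what "generic polynomials" means here, a CRC is the remainder of the message polynomial divided by a generator polynomial over GF(2). The sketch below is a slow bit-at-a-time reference, not the PCLMULQDQ folding method the paper presents; `crc_bitwise` and its parameters are hypothetical names, and it assumes an MSB-first (non-reflected) CRC with a width of at least 8 bits and no final XOR.

```python
def crc_bitwise(data: bytes, poly: int, width: int, init: int = 0) -> int:
    """MSB-first CRC of `data` for an arbitrary generator polynomial.

    Hypothetical reference helper. `poly` is the generator without its
    top (x**width) term, e.g. 0x1021 for the 16-bit CCITT polynomial.
    """
    if width < 8:
        raise ValueError("this sketch assumes width >= 8")
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    crc = init & mask
    for byte in data:
        # Bring the next message byte into the top of the register.
        crc ^= byte << (width - 8)
        for _ in range(8):
            if crc & top_bit:
                # Top bit set: shift it out and subtract (XOR) the generator.
                crc = ((crc << 1) ^ poly) & mask
            else:
                crc = (crc << 1) & mask
    return crc


if __name__ == "__main__":
    # CRC-16/XMODEM parameters: poly 0x1021, init 0, no reflection.
    print(hex(crc_bitwise(b"123456789", 0x1021, 16)))  # 0x31c3
```

Changing `poly`, `width`, and `init` selects a different CRC without touching the loop, which is the flexibility a generic-polynomial implementation has to preserve while processing many bits per step.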

Multi-buffer

Fast Multi-buffer IPsec Implementations on Intel® Architecture Processors This paper describes the Intel® Multi-Buffer Crypto for IPsec Library, a family of highly-optimized software implementations of the core cryptographic processing for IPsec, which provides industry-leading performance on a range of Intel® Processors.

Processing Multiple Buffers in Parallel to Increase Performance on Intel® Architecture Processors

Function Stitching

Fast Cryptographic Computation on Intel® Architecture Processors Via Function Stitching Cryptographic applications often run more than one independent algorithm such as encryption and authentication. This fact provides a high level of parallelism which can be exploited by software and converted into instruction level parallelism to improve overall performance on modern super-scalar processors. We present fast and efficient methods of computing such pairs of functions on IA processors using a method called “function stitching”.

DES

Software Optimizations for DES This paper describes some software optimizations for the classical Data Encryption Standard (DES) cipher that are applicable to modern processor architectures with SIMD instructions. Performance is gained by processing several messages in parallel, compared to processing single messages serially. An added value of the proposed optimizations is that the resulting implementations are also side-channel protected, unlike other implementations found in open source libraries. For comparison, when measured on the latest Intel server processor (Architecture Codename Skylake), our side-channel-safe implementation is 3.2× faster than that of OpenSSL.

RSA

New Instructions Supporting Large Integer Arithmetic on Intel® Architecture Processors New instructions mulx, adcx and adox are being introduced on Intel® Architecture Processors. The adcx and adox instructions are being introduced one generation later than mulx. These new instructions will enable users to develop high-performance implementations of large integer arithmetic on Intel® Architecture.

Fast and Constant-Time Implementation of Modular Exponentiation Modular exponentiation is an important operation that requires a vast amount of computation, so it is crucial to build fast exponentiation schemes. Since cache and data-dependent branching behavior can alter the runtime of an algorithm significantly, it is also important to build an exponentiation scheme with constant run-time. However, such approaches have traditionally added significant overhead to the performance of the exponentiation computations due to costly mitigation steps. We present a novel constant run-time approach that results in the world’s fastest modular exponentiation implementation on IA processors, bringing a 1.6X speedup to the fastest known modular exponentiation implementation in OpenSSL.
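To illustrate what removing data-dependent branching looks like, here is a minimal sketch of a Montgomery ladder with a mask-based conditional swap. This is a well-known textbook technique and is not the approach of the paper, which the abstract does not detail. `modexp_ladder` and `bits` are hypothetical names, and Python integers are not constant-time, so the sketch only shows the control-flow structure: the same multiply and square run for every exponent bit.

```python
def modexp_ladder(base: int, exponent: int, modulus: int, bits: int) -> int:
    """Compute base**exponent % modulus with a Montgomery ladder.

    Hypothetical illustration only. `bits` is a fixed, public loop
    length and must be at least exponent.bit_length().
    """
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive, exponent non-negative")
    if exponent.bit_length() > bits:
        raise ValueError("bits is too small for this exponent")
    r0 = 1 % modulus
    r1 = base % modulus
    for i in reversed(range(bits)):
        bit = (exponent >> i) & 1
        mask = -bit  # 0 when bit == 0, all ones (-1) when bit == 1
        # Branch-free conditional swap of r0 and r1.
        t = (r0 ^ r1) & mask
        r0 ^= t
        r1 ^= t
        # Same two operations regardless of the exponent bit.
        r1 = (r0 * r1) % modulus
        r0 = (r0 * r0) % modulus
        # Swap back under the same mask.
        t = (r0 ^ r1) & mask
        r0 ^= t
        r1 ^= t
    return r0


if __name__ == "__main__":
    print(modexp_ladder(4, 13, 497, bits=8) == pow(4, 13, 497))
```

A production implementation would also need constant-time big-integer arithmetic and memory access patterns that do not depend on the exponent, which this sketch does not attempt.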