Skip to content

Latest commit

 

History

History
208 lines (147 loc) · 10.6 KB

ENCRYPTION.md

File metadata and controls

208 lines (147 loc) · 10.6 KB

Solr Encryption: Getting Started

A Java-level encryption-at-rest solution for Apache Solr.

Overview

This solution provides the encryption of the Lucene index files at the Java level. It encrypts all (or some) the files in a given index with a provided encryption key. It stores the id of the encryption key in the commit metadata (and obviously the key secret is never stored). It is possible to define a different key per Solr Core. This module also provides an EncryptionRequestHandler so that a client can trigger the (re)encryption of a Solr Core index. The (re)encryption is done concurrently while the Solr Core can continue to serve update and query requests.

In addition, the Solr update logs are also encrypted when the Solr Core index is encrypted. When the active encryption key changes for the Solr Core, the re-encryption of the update logs is done synchronously when an old log file is opened for addition. This re-encryption is nearly as fast as a file copy.

Comparing with an OS-level encryption:

  • OS-level encryption [1][2] is more performant and more adapted to let Lucene leverage the OS memory cache. It can manage encryption at block or filesystem level in the OS. This makes it possible to encrypt with different keys per-directory, making multi-tenant use-cases possible. If you can use OS-level encryption, prefer it and skip this Java-level encryption.

  • Java-level encryption can be used when the OS-level encryption management is not possible (e.g. host machine managed by a cloud provider), or when even admin rights should not allow to get clear access to the index files. It has an impact on performance: expect -20% on most queries, -60% on multi-term queries.

[1] https://wiki.archlinux.org/title/Fscrypt

[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Key Management - some coding required

In addition to configuring solrconfig.xml (see the section below), you will need to provide an implementation of the org.apache.solr.encryption.KeySupplier.Factory interface. Your KeySupplier will supply the encryption keys based on your setup (e.g. getting them from a Key Management System). You may also need to implement an extension of org.apache.solr.encryption.EncryptionRequestHandler, overriding the buildKeyCookie method, if your KeySupplier needs specific parameters to get a key.

Installing and Configuring the Encryption Plug-In

  1. Configure the sharedLib directory in solr.xml (e.g. sharedLIb=lib) and place the Encryption plug-in jar file into the specified folder.

solr.xml

<solr>

    <str name="sharedLib">${solr.sharedLib:}</str>

</solr>
  1. Configure the Encryption classes in solrconfig.xml.

solrconfig.xml

<config>

    <directoryFactory name="DirectoryFactory"
                      class="org.apache.solr.encryption.EncryptionDirectoryFactory">
        <str name="keySupplierFactory">com.yourApp.YourKeySupplier$Factory</str>
        <str name="encrypterFactory">org.apache.solr.encryption.crypto.CipherAesCtrEncrypter$Factory</str>
    </directoryFactory>

    <updateHandler class="org.apache.solr.encryption.EncryptionUpdateHandler">
        <updateLog class="org.apache.solr.encryption.EncryptionUpdateLog"/>
    </updateHandler>

    <requestHandler name="/admin/encrypt" class="org.apache.solr.encryption.EncryptionRequestHandler"/>

    <indexConfig>
        <mergePolicyFactory class="org.apache.solr.encryption.EncryptionMergePolicyFactory">
            <str name="wrapped.prefix">delegate</str>
            <str name="delegate.class">org.apache.solr.index.TieredMergePolicyFactory</str>
        </mergePolicyFactory>
    </indexConfig>

    <backup>
        <repository name="encryptionBackupRepository" class="org.apache.solr.encryption.EncryptionBackupRepository" default="true">
            <str name="delegateRepoName">yourBackupRepository</str>
        </repository>
    </backup>

</config>

EncryptionDirectoryFactory is the DirectoryFactory that encrypts/decrypts all (or some) the index files.

keySupplierFactory is a required parameter to specify your implementation of org.apache.solr.encryption.KeySupplier.Factory. This class is used to define your KeySupplier. You may use here the org.apache.solr.encryption.kms.KmsKeySupplier with your implementation of the org.apache.solr.encryption.kms.KmsClient. See more details in the KmsKeySupplier section below.

encrypterFactory is an optional parameter to specify the org.apache.solr.encryption.crypto.AesCtrEncrypterFactory to use. By default CipherAesCtrEncrypter$Factory is used. You can change to LightAesCtrEncrypter$Factory for a more lightweight and efficient implementation (+10% perf), but it calls an internal com.sun.crypto.provider.AESCrypt() constructor which either logs a JDK warning (Illegal reflective access) with JDK 16 and below, or with JDK 17 and above requires to open the access to the com.sun.crypto.provider package with the jvm arg --add-opens=java.base/com.sun.crypto.provider=ALL-UNNAMED. Both support encrypting files up to 17 TB.

EncryptionUpdateHandler replaces the standard DirectUpdateHandler2 (which it extends) to store persistently the encryption key id in the commit metadata. It supports all the configuration parameters of DirectUpdateHandler2.

EncryptionUpdateLog replaces the standard UpdateLog (which it extends) to support the encryption of the update logs.

EncryptionRequestHandler receives (re)encryption requests. See its dedicated EncryptionRequestHandler section below for its usage.

EncryptionMergePolicyFactory is a wrapper above a delegate MergePolicyFactory (e.g. the standard TieredMergePolicyFactory) to ensure all index segments are re-written (re-encrypted).

EncryptionBackupRepository ensures the encrypted files are copied encrypted to a delegate BackupRepository, but still verifies their checksum before the copy. It requires that you define a delegate BackupRepository

Getting keys from a Key Management System with KmsKeySupplier

If you have a Key Management System to manage the encryption keys lifecycle, then you can use the org.apache.solr.encryption.kms.KmsKeySupplier. In this case, it requires that the Solr client sends some key blob to the EncryptionRequestHandler in addition to the key id. The key blob contains an encrypted form of the key secret and enough data for your KMS to decrypt it and provide the clear-text key secret. The key blob is stored in the metadata of each index file. And when needed, the KmsKeySupplier calls your KMS with your KmsClient to decrypt the key blob and store the key secret in an in-memory key cache with automatic wiping of the cache entries after some short duration.

KmsKeySupplier requires to define KmsEncryptionRequestHandler as the EncryptionRequestHandler. It requires the parameters tenantId and encryptionKeyBlob to be sent in the SolrQueryRequest when calling KmsEncryptionRequestHandler.

solrconfig.xml

<config>

    <directoryFactory name="DirectoryFactory"
                      class="org.apache.solr.encryption.EncryptionDirectoryFactory">
        <str name="keySupplierFactory">org.apache.solr.encryption.kms.KmsKeySupplier$Factory</str>
        <str name="kmsClientFactory">com.yourApp.YourKmsClient$Factory</str>
    </directoryFactory>

    <requestHandler name="/admin/encrypt" class="org.apache.solr.encryption.kms.KmsEncryptionRequestHandler"/>

</config>

Calling EncryptionRequestHandler

Once Solr is set up, it is ready to encrypt. To set the encryption key id to use, the Solr client calls the EncryptionRequestHandler at /admin/encrypt.

EncryptionRequestHandler handles an encryption request for a specific Solr core.

The caller provides the mandatory encryptionKeyId request parameter to define the encryption key id to use to encrypt the index files. To decrypt the index to cleartext, the special parameter value no_key_id must be provided.

The encryption processing is asynchronous. The request returns immediately with two response parameters.

  • encryptionState parameter with value either pending, complete, or busy.
  • status parameter with values either success or failure.

The expected usage of this handler is to first send an encryption request with a key id, and to receive a response with status=success and encryptionState=pending. If the caller needs to know when the encryption is complete, it can (optionally) repeatedly send the same encryption request with the same key id, until it receives a response with status=success and encryptionState=complete.

If the handler returns a response with encryptionState=busy, it means that another encryption for a different key id is ongoing on the same Solr core. It cannot start a new encryption until it finishes.

If the handler returns a response with status=failure, it means the request did not succeed and should be retried by the caller (there should be error logs).

If your KeySupplier implementation requires specific parameters to supply keys, you may need to extend the EncryptionRequestHandler with your own class to override the buildKeyCookie method. The key cookie is passed to the KeySupplier to get a key.

Encryption Algorithm

This encryption module implements AES-CTR.

AES-CTR compared to AES-XTS: Lucene produces read-only files per index segment. Since we have a new random IV per file, we don't repeat the same AES encrypted blocks. So we are in a safe write-once case where AES-XTS and AES-CTR have the same strength [1][2]. CTR was chosen because it is simpler.

[1] https://crypto.stackexchange.com/questions/64556/aes-xts-vs-aes-ctr-for-write-once-storage

[2] https://crypto.stackexchange.com/questions/14628/why-do-we-use-xts-over-ctr-for-disk-encryption

Performance Notes

The performance benchmark was run in LUCENE-9379. Here is the summary:

  • An OS-level encryption is faster.
  • Otherwise, expect an average of -20% perf impact on most queries, -60% on multi-term queries.
  • You can use the LightAesCtrEncrypter$Factory to get +10% perf. This is a simple config change. See the solrconfig.xml configuration section above.
  • You can make the Lucene Codec store its FST on heap and expect +15% perf, at the price of more Java heap usage. This requires a code change. See org.apache.lucene.util.fst.FSTStore implementations and usage in org.apache.lucene.codecs.lucene90.blocktree.FieldReader.

Encryption tools

The org.apache.solr.encryption.crypto package contains utility classes to stream encryption/decryption with the AES/CTR/NoPadding transformation. CharStreamEncrypter can encrypt a character stream to a base 64 encoding compatible with JSON, and requires only a small work buffer.