
[BUG] 2.15 using qat_deflate with default docker image crashes node because of missing library #168

Closed
sandervandegeijn opened this issue Jul 25, 2024 · 31 comments
Labels: bug (Something isn't working), untriaged, v2.16.0 (Issues targeting release v2.16.0)

Comments

@sandervandegeijn

sandervandegeijn commented Jul 25, 2024

Describe the bug

Trying to use the qat_deflate compression codec. According to the docs it should be available from 2.14 on. This is not correct, by the way: in 2.14.0 it can't be used at all and immediately returns an error. I created a PR for the docs for that one.

In 2.15 it no longer throws the "codec not supported" error, but it does crash the node.

Related component

Storage

To Reproduce

PUT /telegraf-rabbitmq-dip-tst-2024.07.24-qat
{
  "settings": {
    "index": {
      "codec": "qat_deflate"
    }
  }
}

Then reindex:

POST /_reindex
{
  "source": {
    "index": "telegraf-rabbitmq-dip-tst-2024.07.24"
  },
  "dest": {
    "index": "telegraf-rabbitmq-dip-tst-2024.07.24-qat"
  }
}

Expected behavior

The node should not crash.

Additional Details

Base 2.15.0 Docker image with the S3 plugin installed.

Log:

opensearch-data-nodes-hot-1 opensearch-data-nodes-hot [2024-07-25T20:02:25,225][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-data-nodes-hot-1] fatal error in thread [opensearch[opensearch-data-nodes-hot-1][write][T#3]], exiting
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot java.lang.UnsatisfiedLinkError: /tmp/opensearch-9601680627269700390/libqat-java12656953293004161880.so: libqatzip.so.3: cannot open shared object file: No such file or directory
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries.load(Native Method) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:331) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:197) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:139) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2418) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.Runtime.load0(Runtime.java:852) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.System.load(System.java:2025) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.Native.lambda$loadLibrary$1(Native.java:99) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.Native.loadLibrary(Native.java:102) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.InternalJNI.<clinit>(InternalJNI.java:17) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.QatZipper.<clinit>(QatZipper.java:97) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:34) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:166) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatCompressionMode$QatCompressor.<init>(QatCompressionMode.java:96) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatCompressionMode.newCompressor(QatCompressionMode.java:75) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:118) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.Lucene99QatStoredFieldsFormat.fieldsWriter(Lucene99QatStoredFieldsFormat.java:124) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:535) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:566) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1215) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1160) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1051) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot fatal error in thread [opensearch[opensearch-data-nodes-hot-1][write][T#3]], exiting
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot java.lang.UnsatisfiedLinkError: /tmp/opensearch-9601680627269700390/libqat-java12656953293004161880.so: libqatzip.so.3: cannot open shared object file: No such file or directory
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:331)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:197)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:139)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2418)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.Runtime.load0(Runtime.java:852)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.System.load(System.java:2025)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.Native.lambda$loadLibrary$1(Native.java:99)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.Native.loadLibrary(Native.java:102)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.InternalJNI.<clinit>(InternalJNI.java:17)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at com.intel.qat.QatZipper.<clinit>(QatZipper.java:97)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:34)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:166)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatCompressionMode$QatCompressor.<init>(QatCompressionMode.java:96)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.QatCompressionMode.newCompressor(QatCompressionMode.java:75)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:118)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.codec.customcodecs.Lucene99QatStoredFieldsFormat.fieldsWriter(Lucene99QatStoredFieldsFormat.java:124)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:535)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:566)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1215)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1160)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1051)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
opensearch-data-nodes-hot-1 opensearch-data-nodes-hot 	at java.base/java.lang.Thread.run(Thread.java:1583)

@sandervandegeijn sandervandegeijn added bug Something isn't working untriaged labels Jul 25, 2024
@sandervandegeijn sandervandegeijn changed the title [BUG] 2.15 using qat_deflate crashes node because of missing library [BUG] 2.15 using qat_deflate with default docker image crashes node because of missing library Jul 25, 2024
@dblock
Member

dblock commented Jul 25, 2024

@sandervandegeijn Did you get a chance to dig into this? Might be a 2.16 showstopper.

@dblock dblock added the v2.16.0 Issues targeting release v2.16.0 label Jul 25, 2024
@sandervandegeijn
Author

sandervandegeijn commented Jul 25, 2024

Any suggestions on how to provide more useful info? The only things I need to do are create a new index with the codec and reindex the data into it. As soon as I send the reindex request, the node crashes.

Dockerfile with which we extend the base image with the Azure plugin (sorry, we use S3 on the other cluster; this one uses Azure Blob Storage):

ARG osversion

FROM opensearchproject/opensearch:${osversion}

#RUN /usr/share/opensearch/bin/opensearch-plugin install --batch repository-s3
RUN /usr/share/opensearch/bin/opensearch-plugin install --batch repository-azure

It looks like a CI/CD packaging / path problem to me at first glance.

I hit another thing while testing the compression codecs: using zlib actually increases my index size compared to no codec specified. Still investigating that one.

@peterzhuamazon
Member

peterzhuamazon commented Jul 25, 2024

Hi @sandervandegeijn ,

Could you give us more details on "It looks like a CI/CD packaging / path problem to me at first glance"? I am a bit confused here, as this seems like a code issue related to the core.

Could you test again with the tarball artifact? The Docker release basically runs the tarball.

Also syncing @reta @sarthakaggarwal97 into the discussion, since this looks like an opensearch-project/custom-codecs issue.

Thanks.

@dblock
Member

dblock commented Jul 25, 2024

I was not able to reproduce on the default distribution of 2.15.

$ curl -k -u admin:$OPENSEARCH_PASSWORD -X GET https://localhost:9200/
{
  "name" : "02b751c49f0f",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "HA6t8-RuQQmMmWFn1SnTOw",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.15.0",
    "build_type" : "tar",
    "build_hash" : "61dbcd0795c9bfe9b81e5762175414bc38bbcadf",
    "build_date" : "2024-06-20T03:27:32.562036890Z",
    "build_snapshot" : false,
    "lucene_version" : "9.10.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
$ curl -k -u admin:$OPENSEARCH_PASSWORD -X PUT https://localhost:9200/my_index --json '{
  "settings": {
    "index": {
      "codec": "qat_deflate"
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"my_index"}

$ curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/_reindex --json '{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "their_index"
  }
}'

{"took":6,"timed_out":false,"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}

@sandervandegeijn
Author

sandervandegeijn commented Jul 25, 2024

The reason I suspect a packaging problem is the error:

libqatzip.so.3: cannot open shared object file: No such file or directory

It looks like the library is missing or in the wrong path.

@dblock have you tried doing a reindex into that newly created index? That's when the error occurs on my cluster.

Could it otherwise be that you have the library on your system and it's being linked dynamically?
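To check that hypothesis, a quick diagnostic sketch (assuming a shell inside the affected container, e.g. via `docker exec -it <container> sh`) is to ask the dynamic loader whether it can resolve the QATzip runtime at all:

```shell
# List the loader's cached shared objects and look for the QATzip runtime.
# If nothing matches, libqatzip is simply absent from the image, which would
# explain the UnsatisfiedLinkError above.
ldconfig -p | grep -F libqatzip || echo "libqatzip not found on loader path"
```

On an image where the library is bundled correctly, the grep should print a cache entry for libqatzip.so.3 with its resolved path; on a host where it's installed system-wide, the same command would explain why the crash doesn't reproduce there.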

@dblock
Member

dblock commented Jul 25, 2024

@sandervandegeijn Yes, sorry, I forgot to copy-paste the last part; it works on my machine.

@getsaurabh02
Member

In 2.15 is does not throw the error of not supporting the codec but it does crash the node.

Should we also look at ways to disable this codec while we figure out how to actually get a fix in place? Moreover, since this codec hasn't worked since 2.14 anyway, can we throw a 4xx instead to prevent the crash?

@peterzhuamazon
Member

libqatzip.so.3

Let me do some testing directly on the Docker release image.

Thanks.

@dblock
Member

dblock commented Jul 25, 2024

I tried to add a document to my index and it crashed the node.

$ curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/my_index/_doc --json '{"x":1}'
opensearch-cluster-1  | [2024-07-25T20:58:59,634][INFO ][o.o.p.PluginsService     ] [02b751c49f0f] PluginService:onIndexModule index:[my_index/cuvrKg3HReadlbWVxZfxvA]
opensearch-cluster-1  | [2024-07-25T20:58:59,637][INFO ][o.o.c.m.MetadataMappingService] [02b751c49f0f] [my_index/cuvrKg3HReadlbWVxZfxvA] create_mapping
opensearch-cluster-1  | [2024-07-25T20:58:59,660][INFO ][o.o.p.PluginsService     ] [02b751c49f0f] PluginService:onIndexModule index:[security-auditlog-2024.07.25/nPokmBqBTDqMlbw8Anxscg]
opensearch-cluster-1  | [2024-07-25T20:58:59,668][INFO ][o.o.c.m.MetadataMappingService] [02b751c49f0f] [security-auditlog-2024.07.25/nPokmBqBTDqMlbw8Anxscg] update_mapping [_doc]
opensearch-cluster-1  | [2024-07-25T20:58:59,664][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [02b751c49f0f] fatal error in thread [opensearch[02b751c49f0f][write][T#4]], exiting
opensearch-cluster-1  | java.lang.ExceptionInInitializerError: null
opensearch-cluster-1  | 	at com.intel.qat.QatZipper.<clinit>(QatZipper.java:97) ~[?:?]
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:34) ~[?:?]
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:166) ~[?:?]
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatCompressionMode$QatCompressor.<init>(QatCompressionMode.java:96) ~[?:?]
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatCompressionMode.newCompressor(QatCompressionMode.java:75) ~[?:?]
opensearch-cluster-1  | 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:118) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.Lucene99QatStoredFieldsFormat.fieldsWriter(Lucene99QatStoredFieldsFormat.java:124) ~[?:?]
opensearch-cluster-1  | 	at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:535) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:566) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1  | 	at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1215) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1160) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1051) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-cluster-1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-cluster-1  | 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-cluster-1  | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.
opensearch-cluster-1  | 	at com.intel.qat.Native.loadLibrary(Native.java:68) ~[?:?]
opensearch-cluster-1  | 	at com.intel.qat.InternalJNI.<clinit>(InternalJNI.java:17) ~[?:?]
opensearch-cluster-1  | 	... 30 more
opensearch-cluster-1  | fatal error in thread [opensearch[02b751c49f0f][write][T#4]], exiting
opensearch-cluster-1  | java.lang.ExceptionInInitializerError
opensearch-cluster-1  | 	at com.intel.qat.QatZipper.<clinit>(QatZipper.java:97)
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:34)
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:166)
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatCompressionMode$QatCompressor.<init>(QatCompressionMode.java:96)
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.QatCompressionMode.newCompressor(QatCompressionMode.java:75)
opensearch-cluster-1  | 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:118)
opensearch-cluster-1  | 	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140)
opensearch-cluster-1  | 	at org.opensearch.index.codec.customcodecs.Lucene99QatStoredFieldsFormat.fieldsWriter(Lucene99QatStoredFieldsFormat.java:124)
opensearch-cluster-1  | 	at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50)
opensearch-cluster-1  | 	at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57)
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:535)
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:566)
opensearch-cluster-1  | 	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
opensearch-cluster-1  | 	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558)
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843)
opensearch-cluster-1  | 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483)
opensearch-cluster-1  | 	at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281)
opensearch-cluster-1  | 	at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217)
opensearch-cluster-1  | 	at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011)
opensearch-cluster-1  | 	at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1215)
opensearch-cluster-1  | 	at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1160)
opensearch-cluster-1  | 	at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1051)
opensearch-cluster-1  | 	at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625)
opensearch-cluster-1  | 	at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471)
opensearch-cluster-1  | 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
opensearch-cluster-1  | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
opensearch-cluster-1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
opensearch-cluster-1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
opensearch-cluster-1  | 	at java.base/java.lang.Thread.run(Thread.java:1583)
opensearch-cluster-1  | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.
opensearch-cluster-1  | 	at com.intel.qat.Native.loadLibrary(Native.java:68)
opensearch-cluster-1  | 	at com.intel.qat.InternalJNI.<clinit>(InternalJNI.java:17)
opensearch-cluster-1  | 	... 30 more

@peterzhuamazon
Member

opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.

That seems like a different issue:
opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.


@sandervandegeijn
Author

sandervandegeijn commented Jul 25, 2024

opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.

That seems like a different issue: opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source

Confirmed this one on a Mac with Apple Silicon running the image under docker desktop as well. Didn't expect that error either, from the docs: it should fall back to the software implementation instead of the hardware accelerated one, but it should still work.

Separate bug?

@peterzhuamazon
Copy link
Member

peterzhuamazon commented Jul 25, 2024

I just tried using the docker images and I don't see the issue either:


% docker run -it -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=$OPENSEARCH_PASSWORD" opensearchproject/opensearch:2.15.0

3b3758158c66e63686ab22613cdbf78b1567ad1125105bb48532db0a26265ff6


% docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS        PORTS                                                                                                      NAMES
3b3758158c66   opensearchproject/opensearch:2.15.0   "./opensearch-docker…"   2 seconds ago   Up 1 second   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp, 0.0.0.0:9600->9600/tcp, :::9600->9600/tcp, 9650/tcp   adoring_hoover


% curl -k -u admin:$OPENSEARCH_PASSWORD -X PUT https://localhost:9200/my_index --json '{
  "settings": {
    "index": {
      "codec": "qat_deflate"
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"my_index"}%


% curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/_reindex --json '{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "their_index"
  }
}'

{"took":13,"timed_out":false,"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}%

@peterzhuamazon
Copy link
Member

peterzhuamazon commented Jul 25, 2024

opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.

That seems like a different issue: opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source

Confirmed this one on a Mac with Apple Silicon running the image under docker desktop as well. Didn't expect that error either, from the docs: it should fall back to the software implementation instead of the hardware accelerated one, but it should still work.

Separate bug?

Yeah that probably is a separate issue because in custom-codecs we see this

[opensearch@3b3758158c66 test]$ unzip qat-java-1.1.1.jar
Archive:  qat-java-1.1.1.jar
   creating: META-INF/
  inflating: META-INF/MANIFEST.MF
   creating: com/
   creating: com/intel/
   creating: com/intel/qat/
   creating: com/intel/qat/linux/
   creating: com/intel/qat/linux/amd64/
   creating: META-INF/maven/
   creating: META-INF/maven/com.intel.qat/
   creating: META-INF/maven/com.intel.qat/qat-java/
  inflating: com/intel/qat/Native.class
  inflating: com/intel/qat/QatZipper$Mode.class
  inflating: com/intel/qat/QatException.class
  inflating: com/intel/qat/QatZipper$QatCleaner.class
  inflating: com/intel/qat/QatDecompressorInputStream.class
  inflating: com/intel/qat/package-info.class
  inflating: com/intel/qat/QatCompressorOutputStream.class
  inflating: com/intel/qat/InternalJNI.class
  inflating: com/intel/qat/QatZipper.class
  inflating: com/intel/qat/QatZipper$PollingMode.class
  inflating: com/intel/qat/QatZipper$Algorithm.class
  inflating: com/intel/qat/linux/amd64/libqat-java.so
  inflating: META-INF/maven/com.intel.qat/qat-java/pom.xml
  inflating: META-INF/maven/com.intel.qat/qat-java/pom.properties
  inflating: module-info.class

It seems like it does not support arm64 at this point.

@peterzhuamazon
Copy link
Member

peterzhuamazon commented Jul 25, 2024

Related to this PR in custom-codecs repo

Transfer to there and adding @reta @sarthakaggarwal97 @andrross to take a look. Thanks.

@peterzhuamazon peterzhuamazon transferred this issue from opensearch-project/OpenSearch Jul 25, 2024
@sandervandegeijn
Copy link
Author

sandervandegeijn commented Jul 25, 2024

opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.

That seems like a different issue: opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source

Confirmed this one on a Mac with Apple Silicon running the image under docker desktop as well. Didn't expect that error either, from the docs: it should fall back to the software implementation instead of the hardware accelerated one, but it should still work.
Separate bug?

Yeah that probably is a separate issue because in custom-codecs we see this


It seems like it does not support arm64 at this point.

Still, it should fall back to a pure software implementation, right? Should I open a separate issue in the custom-codecs repo?

The cluster runs on an Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz, so that should work.

@andrross
Copy link
Member

These look like similar issues...the QAT codecs do not gracefully handle the case where it cannot be loaded (either due to incompatible hardware or missing library).

I would propose the following options for the 2.16 release:

  1. Fix the issue so that the service gracefully fails with a proper error message at index creation time if the codec is not supported. (probably not feasible to do this right in the short time that we have, but happy to be proven wrong)
  2. Remove the QAT codecs (this is a breaking change with no work-around available for any existing users that are using one of these codecs)
  3. Add a setting to opt-in to use these new codecs, but make them unavailable by default (this is a breaking change but gives a work-around to re-enable them. any user is still at risk of a node crash if they enable this setting in the wrong environment)

@getsaurabh02
Copy link
Member

getsaurabh02 commented Jul 26, 2024

@andrross thanks for sharing the options.
Do we know if any existing users can successfully upgrade their clusters with indices on the QAT codec without encountering failures, and will it work seamlessly?
If yes, then can we move forward with one of, or a combination of, options 2 and 3 above, where enablement for new indices can be prevented?

@dblock
Copy link
Member

dblock commented Jul 26, 2024

I just tried using the docker images and I don't see the issue either:

Make sure to have a document in the index.

curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/my_index/_doc --json '{"x":1}'

@dblock
Copy link
Member

dblock commented Jul 26, 2024

@andrross Why isn't there a catch all in IndexWriter, maybe initStoredFieldsWriter on failure to initialize any of these codecs?

@peterzhuamazon
Copy link
Member

Able to reproduce:


fatal error in thread [opensearch[039c85055a8f][write][T#4]], exiting
java.lang.UnsatisfiedLinkError: /tmp/opensearch-18442301159217191627/libqat-java8257943536944747477.so: libqatzip.so.3: cannot open shared object file: No such file or directory

@sarthakaggarwal97
Copy link
Collaborator

Do we know if any existing users can successfully upgrade their clusters with indices on the QAT codec without encountering failures, and will it work seamlessly?

If the qat codec is unavailable, I doubt the shards will be green. Users can change the codec before upgrading (and force merge to 1 segment, so that all segments use an old/stable codec), and then upgrade.

Add a setting to opt-in to use these new codecs, but make them unavailable by default

I think this is what we always wanted, but we couldn't reach consensus on how to mark these codecs experimental. Discussion over here: opensearch-project/OpenSearch#13992

I'm good with the 3rd option if we can come up with a mitigation plan to fix the codecs. If we do not see that happening soon, I will vote for the 2nd option.

@andrross
Copy link
Member

@andrross Why isn't there a catch all in IndexWriter, maybe initStoredFieldsWriter on failure to initialize any of these codecs?

@dblock These are java.lang.Error instances, not exceptions. It is generally unsafe to catch Errors as it usually indicates that the JVM is not able to continue operating properly. I suspect the right solution here is to introspect at runtime whether these codecs are available, otherwise don't register them.
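A minimal sketch of that idea (all names here are hypothetical, not the actual custom-codecs API): run the availability probe once at startup and register the QAT codec names only when it passes. On an unsupported platform the names are simply absent, so index creation fails with a normal "unknown codec" error instead of crashing the node later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecRegistry {
    // Builds the codec name map; QAT entries exist only if the probe passed.
    // The codec class names are illustrative placeholders.
    static Map<String, String> availableCodecs(boolean qatAvailable) {
        Map<String, String> codecs = new LinkedHashMap<>();
        codecs.put("default", "LuceneDefaultCodec");
        if (qatAvailable) {
            codecs.put("qat_deflate", "QatDeflateCodec");
            codecs.put("qat_lz4", "QatLz4Codec");
        }
        return codecs;
    }

    public static void main(String[] args) {
        // Without the native library the QAT names are simply never registered.
        System.out.println(availableCodecs(false).keySet()); // prints [default]
    }
}
```

With this shape, the failure mode moves from a fatal `UnsatisfiedLinkError` on a write thread to a validation error at index-creation time.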

@sarthakaggarwal97
Copy link
Collaborator

adding @mulugetam to the discussion (contributor for QAT codec)

@peterzhuamazon
Copy link
Member

peterzhuamazon commented Jul 26, 2024

Hi @sarthakaggarwal97 @mulugetam ,

Do we know if users need to explicitly install libqatzip.so.3 on their machines?
It seems our docker image also does not have this lib, and I can't find it in any existing repositories.
If so, regardless of which option we pick, we need to add it to the docker image.
Please let me know how these are installed, or whether you would include them with the custom-codecs plugin.

Thanks.

@dblock
Copy link
Member

dblock commented Jul 26, 2024

@andrross Why isn't there a catch all in IndexWriter, maybe initStoredFieldsWriter on failure to initialize any of these codecs?

@dblock These are java.lang.Error instances, not exceptions. It is generally unsafe to catch Errors as it usually indicates that the JVM is not able to continue operating properly. I suspect the right solution here is to introspect at runtime whether these codecs are available, otherwise don't register them.

But this is java.lang.UnsatisfiedLinkError. An explicit loadLibrary for codecs along with a catch for this should be ok, no?
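As a sketch of that suggestion (the library name is illustrative): unlike most `Error`s, `UnsatisfiedLinkError` is reasonable to catch at a single, well-defined initialization point, because failure there only means the native library or one of its dependencies is absent.

```java
public class NativeProbe {
    /** Returns true only if the named native library can be loaded. */
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Also thrown when a transitive dependency (e.g. libqatzip.so.3)
            // is missing; caught here instead of killing a write thread later.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("qat-java loadable: " + tryLoad("qat-java"));
    }
}
```

The probe result can then drive whether the QAT codecs are registered at all, rather than deferring the load to the first indexing operation.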

@backslasht
Copy link

1. Fix the issue so that the service gracefully fails with a proper error message at index creation time if the codec is not supported. (probably not feasible to do this right in the short time that we have, but happy to be proven wrong)

I agree, we shouldn't return the codecs if they are not supported in that platform.

3. Add a setting to opt-in to use these new codecs, but make them unavailable by default (this is a breaking change but gives a work-around to re-enable them. any user is still at risk of a node crash if they enable this setting in the wrong environment)

Not a requirement for 2.16, but we should implement this sooner; today, installing custom-codecs pulls in all the codecs, causing issues like this.

@andrross
Copy link
Member

But this is java.lang.UnsatisfiedLinkError. An explicit loadLibrary for codecs along with a catch for this should be ok, no?

@dblock Yes, @sarthakaggarwal97 has implemented something like this. But I don't think we should have a general catch-all in the core layer.

@andrross
Copy link
Member

One thing to note here: from reading the EC2 documentation, I believe only the metal sizes of the m7i, r7i, and c7i instance types have the QAT hardware acceleration. We don't use those instance sizes in any of our testing infrastructure (to my knowledge), so I don't think we're actually testing this codec anywhere in practice. Some of the documentation suggests that it should fall back to software acceleration, but in my tests isQatAvailable always seems to evaluate to false, even on a stock Ubuntu EC2 instance using an Intel processor.

@andrross
Copy link
Member

Fixed by #169. Closing.

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Storage Project Board Jul 26, 2024
@sandervandegeijn
Copy link
Author

Thanks guys
