High Memory Pressure In RestElasticSearchClient #4684

Open
criminosis opened this issue Sep 26, 2024 · 0 comments

Contributor

criminosis commented Sep 26, 2024

Describe the feature:
A few places in RestElasticSearchClient cause unnecessarily high memory pressure.

  1. The submitted mutations are passed in via a collection, so the call site still holds a reference to that collection. Once the mutations are serialized, their unserialized forms are no longer needed. They could be removed from the passed-in collection so that it no longer prevents them from being GC'ed, even though the collection itself may stick around.
  2. The serialized mutations are presented to the client by reading them off an iterator. As long as that iterator lives, the collection of all serialized mutations stays referenced, so requests that have already been sent successfully to ElasticSearch still cannot be garbage collected.
  3. Bulking the serialized requests can allocate unnecessary byte arrays as the ByteArrayOutputStream grows its underlying byte[]. Each time the array grows, the previous one may be released, but a new allocation, usually 2x the previous size, takes its place and may itself be transitory. The final step then copies the final state of the byte[] within the ByteArrayOutputStream, which can be avoided.
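
To illustrate point 3, here is a minimal sketch of one way to avoid both the repeated growth and the final copy. It is not the existing JanusGraph code: `ExposedByteArrayOutputStream` and `BulkBufferSketch` are hypothetical names, and the newline-delimited layout is only an assumption about what the bulk body looks like. The idea is to size the buffer once from the known serialized lengths, then hand out a view over the internal array instead of calling `toByteArray()`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical helper (not part of JanusGraph): exposes the internal buffer without copying. */
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int initialCapacity) {
        super(initialCapacity);
    }

    /** Wraps the internal array; only the first size() bytes are valid. No copy is made. */
    ByteBuffer asByteBuffer() {
        return ByteBuffer.wrap(buf, 0, count);
    }

    /** Current length of the internal array, useful for checking that no growth happened. */
    int capacity() {
        return buf.length;
    }
}

/** Hypothetical sketch of building one bulk body from already-serialized requests. */
class BulkBufferSketch {
    static ExposedByteArrayOutputStream concat(List<byte[]> serialized) {
        // Compute the exact size up front so the stream never has to grow.
        int total = 0;
        for (byte[] request : serialized) {
            // addExact throws instead of silently overflowing past Integer.MAX_VALUE.
            total = Math.addExact(total, Math.addExact(request.length, 1));
        }
        ExposedByteArrayOutputStream out = new ExposedByteArrayOutputStream(total);
        for (byte[] request : serialized) {
            out.write(request, 0, request.length);
            out.write('\n'); // assumed delimiter between requests
        }
        return out;
    }
}
```

Whether the zero-copy view is usable depends on what the HTTP layer accepts: an entity type that takes an array plus offset and length would work, while one that insists on an exactly sized byte[] would bring the copy back.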

As a reminder, the individual elements of each mutation batch can reach 100MB by default. The current client logic splits the mutations into chunks so that each chunk stays below that limit, but a chunk holding a single element is valid. So the copying here may reach non-trivial amounts, especially with concurrent client requests. Anything that releases the memory around these mutations as early as possible maximizes the space JanusGraph can work with if the garbage collector starts getting desperate.
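
Points 1 and 2 could be approached along the following lines. This is a rough sketch under stated assumptions, not the client's actual code: `ReleaseEarlySketch`, `serializeAndRelease`, and `sendAll` are hypothetical names, the caller's list is assumed to be mutable and to tolerate null slots (an `ArrayList` does, an immutable list does not), and the serializer is assumed never to return null.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.ListIterator;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: drop references as soon as each stage is finished with them. */
class ReleaseEarlySketch {

    /**
     * Serializes each mutation and then clears its slot, so the caller's list
     * no longer keeps the unserialized form reachable (point 1).
     */
    static <T> ArrayDeque<byte[]> serializeAndRelease(List<T> mutations,
                                                      Function<T, byte[]> serializer) {
        ArrayDeque<byte[]> serialized = new ArrayDeque<>(mutations.size());
        ListIterator<T> it = mutations.listIterator();
        while (it.hasNext()) {
            serialized.add(serializer.apply(it.next()));
            it.set(null); // the list keeps its size, but not the element
        }
        return serialized;
    }

    /**
     * Sends requests by polling them off the queue rather than iterating,
     * so each request becomes unreachable from the queue once taken (point 2).
     */
    static int sendAll(ArrayDeque<byte[]> pending, Consumer<byte[]> sender) {
        int sent = 0;
        byte[] next;
        while ((next = pending.poll()) != null) {
            sender.accept(next);
            sent++;
        }
        return sent;
    }
}
```

Nulling slots instead of removing elements keeps the pass linear and leaves the list size unchanged, at the cost of the caller seeing nulls afterwards; whether that is acceptable depends on what the call site does with the collection later.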

For what it's worth, I've had a few OOMs occur in and around the current writeTo logic:

Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Unknown Source) ~[?:?]
at java.base/java.io.ByteArrayOutputStream.grow(Unknown Source) ~[?:?]
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source) ~[?:?]
at java.base/java.io.ByteArrayOutputStream.write(Unknown Source) ~[?:?]
at java.base/java.io.OutputStream.write(Unknown Source) ~[?:?]
at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient$RequestBytes.writeTo(RestElasticSearchClient.java:441) ~[janusgraph-es-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient$RequestBytes.access$500(RestElasticSearchClient.java:398) ~[janusgraph-es-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient.buildBulkRequestInput(RestElasticSearchClient.java:450) ~[janusgraph-es-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient.bulkRequest(RestElasticSearchClient.java:556) ~[janusgraph-es-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.es.ElasticSearchIndex.mutate(ElasticSearchIndex.java:909) ~[janusgraph-es-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.util.MetricInstrumentedIndexProvider.lambda$mutate$0(MetricInstrumentedIndexProvider.java:68) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.util.MetricInstrumentedIndexProvider$$Lambda$1408/0x0000000840a53840.run(Unknown Source) ~[?:?]
at org.janusgraph.diskstorage.util.MetricInstrumentedIndexProvider.runWithMetrics(MetricInstrumentedIndexProvider.java:153) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.util.MetricInstrumentedIndexProvider.mutate(MetricInstrumentedIndexProvider.java:68) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.indexing.IndexTransaction$1.call(IndexTransaction.java:151) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.indexing.IndexTransaction$1.call(IndexTransaction.java:148) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:66) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:52) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.indexing.IndexTransaction.flushInternal(IndexTransaction.java:148) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.indexing.IndexTransaction.commit(IndexTransaction.java:129) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.diskstorage.BackendTransaction.commitIndexes(BackendTransaction.java:156) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:1023) [janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1604) [janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:322) [janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:104) [gremlin-core-3.7.2.jar:3.7.2]
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.commit(JanusGraphBlueprintsGraph.java:300) [janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.graphdb.management.JanusGraphManager.lambda$commitAll$2(JanusGraphManager.java:201) [janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.janusgraph.graphdb.management.JanusGraphManager$$Lambda$1061/0x00000008408e1440.accept(Unknown Source) [janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at java.base/java.util.concurrent.ConcurrentHashMap.forEach(Unknown Source) ~[?:?]
at org.janusgraph.graphdb.management.JanusGraphManager.commitAll(JanusGraphManager.java:199) ~[janusgraph-core-1.1.0-20240628-052016.a3393a1.jar:?]
at org.apache.tinkerpop.gremlin.server.handler.AbstractSession.closeTransaction(AbstractSession.java:845) ~[gremlin-server-3.7.2.jar:3.7.2]
at org.apache.tinkerpop.gremlin.server.handler.AbstractSession.handleIterator(AbstractSession.java:545) ~[gremlin-server-3.7.2.jar:3.7.2]


criminosis added a commit to criminosis/janusgraph that referenced this issue Sep 26, 2024
criminosis added a commit to criminosis/janusgraph that referenced this issue Sep 26, 2024
criminosis added a commit to criminosis/janusgraph that referenced this issue Sep 27, 2024
criminosis added a commit to criminosis/janusgraph that referenced this issue Sep 27, 2024