Graylog container stops working with java.lang.OutOfMemoryError: Java heap space #19806

Open
pbrzica opened this issue Jul 2, 2024 · 5 comments

pbrzica commented Jul 2, 2024

Since upgrading from major version 5 to 6, we have noticed a new issue. Every couple of days or so, Graylog first starts logging the following:

docker-graylog-1  | 01:10:49.650 [processbufferprocessor-1] WARN  org.graylog2.streams.StreamRouterEngine - Error matching stream rule <646b7aa40063d45f4807fe8a>  <REGEX/^prod-logging> for stream Random Stream Name
docker-graylog-1  | java.util.concurrent.TimeoutException: null
docker-graylog-1  |     at java.base/java.util.concurrent.FutureTask.get(Unknown Source) ~[?:?]
docker-graylog-1  |     at com.google.common.util.concurrent.SimpleTimeLimiter.callWithTimeout(SimpleTimeLimiter.java:153) ~[graylog.jar:?]
docker-graylog-1  |     at org.graylog2.streams.StreamRouterEngine$Rule.matchWithTimeOut(StreamRouterEngine.java:325) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.streams.StreamRouterEngine.match(StreamRouterEngine.java:206) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.streams.StreamRouter.route(StreamRouter.java:104) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.messageprocessors.StreamMatcherFilterProcessor.route(StreamMatcherFilterProcessor.java:66) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.messageprocessors.StreamMatcherFilterProcessor.process(StreamMatcherFilterProcessor.java:81) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.handleMessage(ProcessBufferProcessor.java:167) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:137) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:107) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:52) [graylog.jar:?]
docker-graylog-1  |     at org.graylog2.shared.buffers.PartitioningWorkHandler.onEvent(PartitioningWorkHandler.java:52) [graylog.jar:?]
docker-graylog-1  |     at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167) [graylog.jar:?]
docker-graylog-1  |     at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122) [graylog.jar:?]
docker-graylog-1  |     at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
docker-graylog-1  |     at java.base/java.lang.Thread.run(Unknown Source) [?:?]
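
(For context: the stack trace shows the rule match going through Guava's SimpleTimeLimiter, so the TimeoutException just means the match did not finish within its time budget, which can also happen when the whole JVM is stalling in GC rather than because the regex itself is slow. Below is a rough, illustrative sketch of that pattern; it is not Graylog's actual code, and the rule and message values are made up.)

import com.google.common.util.concurrent.SimpleTimeLimiter;
import com.google.common.util.concurrent.TimeLimiter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

public class TimedRuleMatchSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        TimeLimiter limiter = SimpleTimeLimiter.create(pool);

        Pattern rule = Pattern.compile("^prod-logging"); // example rule from the warning above
        String source = "prod-logging-host-01";          // made-up message field

        try {
            // The match runs on a pool thread with a deadline; if it cannot finish
            // in time (for example because the JVM is busy with GC), a
            // TimeoutException is thrown, as in the warning above.
            boolean matched = limiter.callWithTimeout(
                    () -> rule.matcher(source).find(), 2, TimeUnit.SECONDS);
            System.out.println("matched: " + matched);
        } catch (TimeoutException e) {
            System.out.println("rule evaluation timed out");
        } finally {
            pool.shutdown();
        }
    }
}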

As time goes by, more and more of these log entries appear, and then everything starts crashing with Java heap space errors. Example:

docker-graylog-1  | 02:16:55.439 [scheduled-daemon-23] ERROR org.graylog2.shared.bindings.SchedulerBindings - Thread scheduled-daemon-23 failed by not catching exception: java.lang.OutOfMemoryError: Java heap space.
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 30"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 25"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 27"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 20"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 2"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "AMQP Connection 127.0.0.1:5672"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "stream-router-62"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 31"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 5"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 31"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 22"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "aws-instance-lookup-refresher-0"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 17"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 15"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 20"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-71-thread-1"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 26"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "cluster-rtt-ClusterId{value='6680ee1f1b2f3946cfffaba1', description='null'}-127.0.0.1:27017"
docker-graylog-1  |
docker-graylog-1  | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "inputbufferprocessor-1"
docker-graylog-1  | 02:52:12.221 [inputbufferprocessor-4] WARN  org.graylog2.shared.buffers.InputBufferImpl - Unable to process event RawMessageEvent{raw=null, uuid=cc5ef3b5-3818-11ef-9476-4a8499687792, encodedLength=1711}, sequence 458134355
docker-graylog-1  | java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
docker-graylog-1  | 02:52:12.221 [scheduled-daemon-0] ERROR org.graylog2.shared.bindings.SchedulerBindings - Thread scheduled-daemon-0 failed by not catching exception: java.lang.OutOfMemoryError: Java heap space.

Most of our logs come from our RabbitMQ input, and we average around 3-4k messages per second per node.
I've tried increasing the heap and decreasing the number of processors, but nothing seems to help.
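
For reference, the heap was raised roughly like this in the compose file (excerpt is illustrative, not our exact configuration; the official image reads extra JVM options from the GRAYLOG_SERVER_JAVA_OPTS environment variable):

# docker-compose.yml excerpt -- illustrative only
services:
  graylog:
    image: graylog/graylog:6.0
    environment:
      # JVM options passed through by the official image; 31g matches the heap size listed in the environment section below
      GRAYLOG_SERVER_JAVA_OPTS: "-Xms31g -Xmx31g"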

I've also attached load and memory graphs (Graylog completely stops working at around 5:00; prior to that, load and memory usage are completely normal).
[Screenshot: load and memory graphs, 2024-07-02 11:01]

Expected Behavior

Graylog doesn't run out of heap space

Current Behavior

Graylog works fine for some time and then at random intervals starts crashing due to heap errors

Possible Solution

I've noticed that increasing the heap increases how long Graylog stays healthy, so is it possible that there is a memory leak somewhere?
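
If it would help confirm a leak, I can try capturing a heap dump the next time it crashes, e.g. by adding standard HotSpot flags like these to the container's JVM options (the dump path is illustrative):

# illustrative addition to GRAYLOG_SERVER_JAVA_OPTS
GRAYLOG_SERVER_JAVA_OPTS: "-Xms31g -Xmx31g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/share/graylog/data/heapdump.hprof"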

Steps to Reproduce (for bugs)

  1. Run Graylog and ingest logs
  2. After some time, Graylog crashes with heap errors (this happens even if the heap is increased by more than 2x)

Context

Self-explanatory.

Your Environment

Using the official Graylog docker image

  • Graylog Version: 6.0.2-1
  • Java Version: OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)
  • ElasticSearch Version: 7.10.2
  • MongoDB Version: 5.0.17-14
  • Operating System: Ubuntu 22.04.4 LTS (in image, host is running on CentOS 7)
  • RabbitMQ Version: 3.7.28
  • Memory: 128 GB (31 GB assigned to the JVM heap for Graylog; Elasticsearch runs on separate servers, so it has plenty of memory)
  • CPU: 32 threads
pbrzica added the bug label Jul 2, 2024
@tellistone

Hello, thanks for raising this; it looks like a memory leak.

re: Error matching stream rule <646b7aa40063d45f4807fe8a> <REGEX/^prod-logging> for stream Random Stream Name

Is the associated stream receiving messages from the RabbitMQ input?

pbrzica commented Jul 3, 2024

Hi, just checked the logs.

The associated stream is receiving logs from RabbitMQ, but we start getting these errors on all of our streams, including ones using GELF TCP inputs (the above was just an example). I am guessing that as memory gets lower, it happens more and more until the heap finally runs out. If it helps, almost all of our streams (76 of them, excluding the system ones) use regex in their stream rules, mainly 2 or 3 rules in the style of:

source: ^prod-
or
channel: ^service$
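
For illustration, in plain Java these rules boil down to anchored prefix or exact matches, so the patterns themselves should not be expensive (field values made up):

import java.util.regex.Pattern;

public class StreamRuleRegexSketch {
    public static void main(String[] args) {
        // roughly what the two rule styles above amount to
        Pattern sourceRule  = Pattern.compile("^prod-");
        Pattern channelRule = Pattern.compile("^service$");

        System.out.println(sourceRule.matcher("prod-logging-host-01").find()); // true
        System.out.println(channelRule.matcher("service").find());             // true
    }
}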

I can try setting up some more Graylog metrics if you think they'd be helpful (just let me know if you have any specific ones in mind).

I'll also try stopping/starting the inputs after some time to see if that can maybe help.

pbrzica commented Jul 4, 2024

Just noticed #19629.
I don't know if anything specific uses it, but we will update to 6.0.4 today and report back.

pbrzica commented Jul 15, 2024

Reporting back: Graylog has been up for 11 days on version 6.0.4 without any issues.

thll commented Jul 16, 2024

Thanks for the feedback, @pbrzica. Very much appreciated!
