This repository has been archived by the owner on Oct 2, 2023. It is now read-only.

ClamAV stopped scanning files #93

Open · nidhigwari opened this issue Jul 21, 2022 · 11 comments

Comments

@nidhigwari commented Jul 21, 2022

We use ClamAV to scan files for our application, using the S3/SQS/ClamAV integration. It seems to have stopped working suddenly.
ClamAV version: 0.103.6

@andreaswittig (Contributor) commented:

Sorry, we do not provide support for this free/open-source project. Check out our solution bucketAV, which includes professional support: https://bucketav.com

andreaswittig closed this as not planned on Jul 21, 2022
@rmerrellgr commented:

Oddly enough, we had this happen to us yesterday as well. Every instance of the s3-virusscan that we had running on a t3.micro suddenly died at the same time. Log inspection led us to find that they had all run out of RAM and the OOM killer killed clamd, but when systemd tried restarting it, it couldn't. I don't know enough about how clamd works when it phones home to get signature updates, but one theory is that it pulled an update yesterday that maxed out all the RAM on the smaller instances. We fixed it by launching new instances.
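
For anyone wanting to confirm the same theory on their own instances, here is a minimal sketch that scans the kernel journal for oom-killer events involving clamd. It assumes systemd-journald is readable by the current user; on Amazon Linux the same kernel lines also land in /var/log/messages.

```python
# Minimal sketch: look for oom-killer events that involved clamd.
# Assumes journalctl is available and readable by the current user.
import subprocess

def clamd_oom_events(since="2 days ago"):
    # `journalctl -k` restricts output to kernel messages, where the OOM killer logs.
    out = subprocess.run(
        ["journalctl", "-k", "--since", since, "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        line for line in out.splitlines()
        if "oom-killer" in line or ("Killed process" in line and "clamd" in line)
    ]

if __name__ == "__main__":
    for line in clamd_oom_events():
        print(line)
```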

andreaswittig reopened this on Jul 21, 2022
@andreaswittig (Contributor) commented:

@nidhigwari Sorry, I was too fast and harsh.

@rmerrellgr Thanks for providing more context.

@nidhigwari (Author) commented:

Thanks @rmerrellgr!
We have launched new instances, but the service is still not working. We also see a freshclam-related error: "WARNING: FreshClam previously received error code 429 or 403 from the ClamAV Content Delivery Network (CDN). This means that you have been rate limited or blocked by the CDN."

@andreaswittig (Contributor) commented Jul 21, 2022

@nidhigwari ClamAV introduced very strict throttling limits. We have been running into those limits as well and are now hosting our own mirror of the malware database.
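
For illustration only, a rough sketch of what such a mirror can look like: serve previously downloaded .cvd/.cdiff files over plain HTTP and point freshclam on the scan instances at that host (e.g. via PrivateMirror in freshclam.conf). The directory, port, and how the files are kept current (for example with the cvdupdate tool) are assumptions, not part of this project.

```python
# Rough sketch of a private ClamAV database mirror: serve already-downloaded
# .cvd/.cdiff files over HTTP so freshclam on the scan instances can pull from
# this host instead of the public CDN. MIRROR_DIR and PORT are assumptions;
# keeping the files up to date is out of scope here.
import functools
import http.server

MIRROR_DIR = "/srv/clamav-mirror"  # contains main.cvd, daily.cvd, bytecode.cvd, ...
PORT = 8080

Handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=MIRROR_DIR)

with http.server.ThreadingHTTPServer(("0.0.0.0", PORT), Handler) as httpd:
    print(f"Serving ClamAV databases from {MIRROR_DIR} on port {PORT}")
    httpd.serve_forever()
```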

@andreaswittig (Contributor) commented:

> @rmerrellgr: Oddly enough, we had this happen to us yesterday as well. Every instance of the s3-virusscan that we had running on a t3.micro suddenly died at the same time. […]

Is it possible that you tried to scan a "large" S3 object? Did you check the dead-letter queue?
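
In case it helps with that check, here is a quick boto3 sketch for looking at the dead-letter queue depth. The queue name is an assumption; substitute whatever name your stack actually created.

```python
# Quick sketch: check how many messages ended up in the dead-letter queue.
# The queue name below is an assumption; use the one your stack created.
import boto3

sqs = boto3.client("sqs")

dlq_url = sqs.get_queue_url(QueueName="s3-virusscan-dead-letter-queue")["QueueUrl"]
attrs = sqs.get_queue_attributes(
    QueueUrl=dlq_url,
    AttributeNames=["ApproximateNumberOfMessages"],
)["Attributes"]

print("Messages in dead-letter queue:", attrs["ApproximateNumberOfMessages"])
```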

@michaelwittig (Contributor) commented:

@rmerrellgr what is the value of the SwapSize parameter?

@awsnicolemurray commented:

> @nidhigwari ClamAV introduced very strict throttling limits. We have been running into those limits as well and are now hosting our own mirror of the malware database.

What is the recommendation? How does the customer determine whether the issue is caused by throttling? Currently no files are being scanned, and the issue impacts the dev, staging, and prod environments. All appear to have been impacted on the same day.

Please help us understand what changes were made since July 15th so we can determine the best course of action for troubleshooting.

@rmerrellgr commented Jul 21, 2022

@andreaswittig Nope, no large file scans (no scans at all for some time before the crash, actually). But as I suspected, this is what we found in the logs:

  Jul 20 11:41:09 clamd[27447]: Database correctly reloaded (8622752 signatures)
  Jul 20 11:41:11 clamd[27447]: Activating the newly loaded database...
  Jul 20 11:41:13 kernel: amazon-cloudwat invoked oom-killer:
  (followed by 100+ lines of OOM killer output, which ultimately led to clamd being killed)
  Jul 20 11:41:13 kernel: Killed process 27447 (clamd)
  Jul 20 11:41:13 systemd: Unit [email protected] entered failed state.
  Jul 20 11:41:14 systemd: [email protected] holdoff time over, scheduling restart.
  Jul 20 11:48:14 systemd: [email protected] start operation timed out. Terminating.

At that point it just loops forever, trying to start back up but failing. I decided it would be easier to launch replacement instances and be done with it.

I think it's safe to say that this isn't a Widdix problem. We have production workloads running on larger instance types, and they did not suffer the same fate. I just found it peculiar that our dev servers died unexpectedly and then someone else reported that theirs did as well. I do not believe any action needs to be taken on your part, however.

And to answer your other question, these t3.micro instances have SwapSize set to 2 in the CloudFormation config.
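
Side note, in case it is useful for comparing instances: a small sketch for checking whether the configured swap is actually active, by reading /proc/meminfo (Linux only).

```python
# Small sketch: report memory and swap figures from /proc/meminfo (Linux only),
# useful for checking whether the configured swap is actually active.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = value.strip()
    return info

if __name__ == "__main__":
    m = meminfo()
    for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
        print(f"{key}: {m.get(key, 'n/a')}")
```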

@andreaswittig (Contributor) commented:

@awsnicolemurray I'd recommend checking the logs.
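
For example, a minimal sketch for spotting the CDN rate-limit warnings in the freshclam log; the log path is an assumption and varies by distribution and configuration.

```python
# Minimal sketch: look for CDN rate-limit / block markers in the freshclam log.
# The log path is an assumption and differs per distribution and configuration.
FRESHCLAM_LOG = "/var/log/freshclam.log"

MARKERS = ("429", "403", "rate limited", "blocked by the CDN")

with open(FRESHCLAM_LOG) as f:
    hits = [line.rstrip() for line in f if any(m in line for m in MARKERS)]

if hits:
    print("Possible CDN rate limiting / blocking:")
    for line in hits[-10:]:
        print(" ", line)
else:
    print("No rate-limit markers found in", FRESHCLAM_LOG)
```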

@andreaswittig (Contributor) commented:

@rmerrellgr Interesting, I haven't observed anything like this before.
