[BUG] Lock acquire/release for snapshot jobs emit warning and error logs #14428

Closed
spapadop opened this issue Jun 18, 2024 · 3 comments
Labels
bug Something isn't working Storage:Snapshots

Comments

@spapadop

Describe the bug

I have a simple snapshot management job running daily:

{
  "name": "daily-monit-qa-backup_v2",
  "description": "Daily snapshot policy",
  "schema_version": 19,
  "creation": {
    "schedule": {
      "cron": {
        "expression": "0 12 * * *",
        "timezone": "Europe/Zurich"
      }
    }
  },
  "deletion": {
    "schedule": {
      "cron": {
        "expression": "0 12 * * *",
        "timezone": "Europe/Zurich"
      }
    },
    "condition": {
      "max_age": "3d",
      "min_count": 1
    }
  },
  "snapshot_config": {
    "indices": "monit_qa*",
    "ignore_unavailable": true,
    "repository": "s3-monitqa1-bucket",
    "partial": true
  },
  "schedule": {
    "interval": {
      "start_time": 1718098008729,
      "period": 1,
      "unit": "Minutes"
    }
  },
  "enabled": true,
  "last_updated_time": 1718709061300,
  "enabled_time": 1718098008729
}

Every day it produces around 70 warning logs like:

Cannot acquire lock for snapshot management job daily-monit-qa-backup_v2

followed by 2 error logs:

Could not release lock [.opendistro-ism-config-daily-monit-qa-backup_v2-sm-policy] for daily-monit-qa-backup_v2-sm-policy.

These two error logs trigger two failure notifications if a notification channel is configured for the policy.

However, the snapshot actually ends with a "SUCCESS" status, so these logs seem insignificant, or at least not worth digging into. I am not sure what should happen here. I would suggest downgrading these logs to "INFO" or "DEBUG"; they should definitely not be "ERROR", as that makes the failure notification functionality unreliable.
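
For anyone triaging the same symptom, the following is a minimal sketch of how the two messages quoted above could be counted in a log file. Only the message fragments come from this issue; the surrounding log line layout, the function name `tally_lock_messages`, and the `PATTERNS` table are assumptions made for illustration.

```python
import re
from collections import Counter

# Message fragments quoted in this issue. Everything else on the log line
# (timestamp, level, logger name) is ignored, since its layout is an assumption.
PATTERNS = {
    "acquire_failed": re.compile(
        r"Cannot acquire lock for snapshot management job (\S+)"
    ),
    "release_failed": re.compile(r"Could not release lock \[([^\]]+)\]"),
}


def tally_lock_messages(lines):
    """Count lock messages, keyed by (kind, job name or lock id)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group(1))] += 1
    return counts
```

Passing an open log file to `tally_lock_messages` would give per-job counts that can be compared against the roughly 70 warnings and 2 errors per day reported here.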

Related component

Storage:Snapshots

To Reproduce

  1. Create a daily snapshot policy targeting an S3 bucket, like the one specified above.
  2. Observe the WARN/ERROR logs emitted when the snapshot is created.

Expected behavior

If the snapshot is successful, it should produce no ERROR logs.

Additional Details

Plugins
All default ones + repository-s3

Host/Environment (please complete the following information):

  • OS: AlmaLinux
  • Version: 9.4

Additional context
Tested on OpenSearch v2.11.1

@gbbafna
Collaborator

gbbafna commented Jun 27, 2024

[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10 ]

This needs to be moved to the ISM plugin. Can you please open an issue in the ISM plugin repo?

@gbbafna gbbafna closed this as completed Jun 27, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Storage Project Board Jun 27, 2024
@jetnet

jetnet commented Nov 12, 2024

Any news on this? It happens in our environment after upgrading from 2.12 to 2.17.1.
Now the warning appears after every OS restart, and no new snapshots get created.
Workaround: delete the snapshot policies and create them again (just updating them with the same content/config does not help).
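
The delete-and-recreate workaround could be scripted roughly as below. This is a sketch under stated assumptions, not a tested procedure: it assumes the Snapshot Management endpoints live under `_plugins/_sm/policies/<name>`, that the GET response wraps the policy in an `sm_policy` field, and that only the fields listed in `CREATE_FIELDS` should be sent back on create. The helper names `to_create_body` and `recreate_policy` are hypothetical, and authentication and TLS setup are left out.

```python
import json
import urllib.request

# Fields assumed to be user-settable on create. The stored document shown
# earlier in this issue also carries name, schema_version, schedule and
# timestamps, which are treated here as server-managed (an assumption).
CREATE_FIELDS = (
    "description",
    "enabled",
    "creation",
    "deletion",
    "snapshot_config",
    "notification",
)


def to_create_body(stored_policy):
    """Keep only the fields assumed to be accepted when creating a policy."""
    return {k: stored_policy[k] for k in CREATE_FIELDS if k in stored_policy}


def recreate_policy(base_url, name, opener=urllib.request.urlopen):
    """Fetch a policy, delete it, then create it again with the same settings."""
    url = f"{base_url}/_plugins/_sm/policies/{name}"
    # Read the current definition before deleting anything.
    with opener(urllib.request.Request(url, method="GET")) as resp:
        stored = json.load(resp)["sm_policy"]
    body = json.dumps(to_create_body(stored)).encode("utf-8")
    with opener(urllib.request.Request(url, method="DELETE")):
        pass
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with opener(request) as resp:
        return json.load(resp)
```

Because the policy is deleted before it is created again, it seems prudent to save the fetched definition to a file first, so it can be restored by hand if the final request fails.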

@Pigueiras

@jetnet opensearch-project/index-management#1199
