not able to clear alarm in etcd 3.5.3 version by using etcdctl alarm disarm command #17318

Closed
4 tasks
rahulbapumore opened this issue Jan 24, 2024 · 18 comments

@rahulbapumore
Contributor

Bug report criteria

What happened?

A NOSPACE alarm was raised on the etcd cluster. The DB size later dropped below ETCD_QUOTA_BACKEND_BYTES, but we still can't put values into the DB because the NOSPACE alarm still exists and has not been cleared.
When we try to clear it using the etcdctl alarm disarm command, a timeout error appears in the logs.

What did you expect to happen?

The alarm should be cleared and it should be possible to insert data again.

How can we reproduce it (as minimally and precisely as possible)?

NA

Anything else we need to know?

No response

Etcd version (please run commands below)

ETCD version 3.5.3

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

@rahulbapumore
Contributor Author

Hi @ahrtr ,
We are facing this issue and it is blocking us from inserting data. Any input on clearing the alarm?

Thanks

@ahrtr
Member

ahrtr commented Jan 24, 2024

Please provide detailed steps to reproduce the issue, or the detailed steps of what you did.

@ahrtr
Member

ahrtr commented Jan 24, 2024

Please ensure you have correctly compacted and defragmented the db, and finally disarmed the alarm.
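
For readers following along, here is a minimal sketch of that compact, defragment, disarm sequence, modeled on the space-quota procedure in the etcd maintenance guide. It assumes etcdctl v3 with the endpoints (and any TLS settings) already exported through the environment. The function names `current_revision` and `reclaim_space_and_disarm` are hypothetical helpers, not etcd commands, and this is not necessarily the exact procedure the maintainers had in mind.

```shell
#!/bin/sh
# Sketch only: compact, defragment, then disarm.
# Assumes etcdctl v3 and that ETCDCTL_ENDPOINTS (plus any TLS settings) is exported.

# Read the current revision from the first endpoint's status (JSON output).
current_revision() {
  etcdctl endpoint status --write-out=json \
    | grep -E -o '"revision":[0-9]+' \
    | grep -E -o '[0-9]+' \
    | head -n 1
}

# Hypothetical helper: runs the three maintenance steps in order and
# stops at the first one that fails.
reclaim_space_and_disarm() {
  rev=$(current_revision)
  [ -n "$rev" ] || { echo "could not read current revision" >&2; return 1; }
  etcdctl compact "$rev" || return 1      # discard history older than $rev
  etcdctl defrag --cluster || return 1    # release freed space on every member
  etcdctl alarm disarm || return 1        # clear the NOSPACE alarm
  etcdctl alarm list                      # confirm whether any alarm remains
}
```

Call `reclaim_space_and_disarm` from a shell that can reach the cluster. If a step times out, raising etcdctl's global `--command-timeout` value may be worth trying, although that is a guess and not something confirmed in this thread.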

References:

@Elbehery could you follow up on this ticket until it's closed?

@Elbehery
Member

On it 🙏🏽

@rahulbapumore
Contributor Author

Hi @Elbehery ,
We think our issue is similar to #14379.
Is there any workaround or procedure we can follow to resolve it?

Thanks

@ahrtr
Member

ahrtr commented Jan 24, 2024

Please also double-check whether you can still see this issue on the latest release, 3.5.11.

@rahulbapumore
Contributor Author

rahulbapumore commented Jan 24, 2024

@Elbehery
We haven't seen it on the latest version, 3.5.11, but we need a workaround for a live deployment that is running 3.5.3.
Even though we have db space available, we are not able to put data into etcd because of the alarm.

@rahulbapumore
Contributor Author

@Elbehery ,
Also, etcdctl member list does not show any alarm,
and etcdctl endpoint health does not show any alarm either,
but etcdctl endpoint status --write-out=table shows the NOSPACE alarm in its output.
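
A direct way to check for active alarms is `etcdctl alarm list`. The sketch below wraps it in a small check. It assumes etcdctl v3 with endpoints already exported, and that alarm lines look like `memberID:<id> alarm:NOSPACE`, as in the etcd documentation. `has_nospace_alarm` is a hypothetical helper name, not an etcd command.

```shell
#!/bin/sh
# Sketch only: query the cluster's active alarms directly.
# Assumes etcdctl v3 with endpoints and TLS settings already exported.

# Hypothetical helper: returns 0 if any member reports NOSPACE, 1 otherwise.
has_nospace_alarm() {
  etcdctl alarm list | grep -q 'alarm:NOSPACE'
}
```

For example, `has_nospace_alarm && echo "NOSPACE still active"` could be run before and after a disarm attempt to see whether the alarm was actually cleared.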

@Elbehery
Member

@rahulbapumore

Can you please give details on how to reproduce this?

Also, can you paste some logs?

@rahulbapumore
Contributor Author

Will deleting the WAL folder resolve this issue?
@Elbehery

@Elbehery
Member

Did you try restarting the etcd pod on the node that raised the alarm?

When the pod starts, it will re-read the snapshot and the WAL, which might help.

@Elbehery
Member

@rahulbapumore ^^

@rahulbapumore
Contributor Author

Yes, I tried deleting the pods, but the issue is the same.

@Elbehery
Member

Can you upgrade to the latest release?

Also, please describe the environment you use. If some logs are available, that would be great.

@rahulbapumore
Contributor Author

rahulbapumore commented Jan 25, 2024

Hi @Elbehery ,
Will upgrading resolve the alarm issue?

@rahulbapumore
Contributor Author

We are running etcd inside Kubernetes pods managed by a StatefulSet.
We will try to get the logs and provide them to you.

@rahulbapumore
Contributor Author

Hi @Elbehery, will upgrading resolve the alarm issue?

Any update on this?

@Elbehery
Member

I really can't help without logs.

Did you try restarting the node that caused the alarm?
