
Postgres-instance stopped working "could not write lock file "postmaster.pid": No space left on device" #275

Open
rg2609 opened this issue Jul 15, 2024 · 1 comment
rg2609 commented Jul 15, 2024

I have installed the Crunchy Data Postgres Operator using the Helm chart, and I'm using NFS-backed storage for the PVC with 200 GB of capacity. When I run kubectl get pods, the output shows that one of the four containers in the instance pod has stopped working.

kubectl get pods | grep -E "READY|postgres-instance"
NAME                                    READY   STATUS             RESTARTS   AGE
abc-postgres-instance1-c7ck-0           3/4     Running            0          32d

The error that I'm encountering is:

2024-07-12 21:20:21,317 INFO:  stderr=2024-07-12 21:20:21.317 UTC [2615762] FATAL:  could not write lock file "postmaster.pid": No space left on device

Upon further investigation, I found that the /pgdata/pg16_wal is using up all the space.

When I run kubectl exec -it dravoka-postgres-instance1-c7ck-0 -- /bin/bash and then df -h, I get the following output:

Filesystem                                                      Size  Used Avail Use% Mounted on
overlay                                                         124G   51G   74G  41% /
tmpfs                                                            64M     0   64M   0% /dev
172.16.215.54:/export/pvc-c862217c-39c4-4be0-a8a1-1717c450b2d1  196G  196G     0 100% /pgdata
/dev/root                                                       124G   51G   74G  41% /tmp
tmpfs                                                            57G   24K   57G   1% /pgconf/tls
tmpfs                                                            57G   24K   57G   1% /etc/database-containerinfo
tmpfs                                                            57G   16K   57G   1% /etc/patroni
tmpfs                                                            57G     0   57G   0% /dev/shm
tmpfs                                                            57G   24K   57G   1% /etc/pgbackrest/conf.d
tmpfs                                                            57G   12K   57G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                                            32G     0   32G   0% /proc/acpi
tmpfs                                                            32G     0   32G   0% /proc/scsi
tmpfs                                                            32G     0   32G   0% /sys/firmware
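
A quick way to confirm which directory on the data volume is consuming the space is to run du inside the Postgres container (a sketch; the pod name is taken from the output above, and the container name "database" is an assumption based on the operator's defaults):

# Assumed container name "database"; adjust to your cluster.
kubectl exec -it dravoka-postgres-instance1-c7ck-0 -c database -- du -sh /pgdata/pg16 /pgdata/pg16_wal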

Is there a way to limit the WAL size and archive old WAL files, while also deleting the old archived files?

dsessler7 (Contributor) commented
Hey @rg2609!

When we see the WAL fill up like this, it is almost always due to backups not being run frequently enough. When a backup runs, it captures the current state of the database, which allows Postgres to clear out WAL files that are no longer needed.
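
To check whether stalled WAL archiving (rather than raw write volume) is what is pinning the WAL, you can look at pg_stat_archiver and the pgBackRest info output from inside the pod (a sketch; the container name "database" is an assumption, as is pgbackrest being on that container's PATH):

# A growing failed_count means archive_command is failing and Postgres keeps
# every WAL segment on the data volume until archiving succeeds again.
kubectl exec -it dravoka-postgres-instance1-c7ck-0 -c database -- psql -c "SELECT archived_count, failed_count, last_archived_wal, last_failed_wal FROM pg_stat_archiver;"
# Show the backups and archive range that pgBackRest currently knows about.
kubectl exec -it dravoka-postgres-instance1-c7ck-0 -c database -- pgbackrest info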

I recommend that you set up a schedule for your backups, or if you already have a schedule set, try increasing the frequency with which you run your backups so that the WAL can be regularly flushed out. Here are the docs for all things backups:

https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery
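
As a rough sketch of what that could look like (not taken verbatim from the docs above; repo1, the cron expressions, and the retention count are assumptions to adjust for your environment), you would add schedules and a retention policy to the PostgresCluster spec:

spec:
  backups:
    pgbackrest:
      global:
        # Keep the two most recent full backups; expiring older backups also
        # lets pgBackRest remove the WAL that was archived before them.
        repo1-retention-full: "2"
        repo1-retention-full-type: count
      repos:
      - name: repo1
        schedules:
          # Weekly full backup, daily incrementals (standard cron syntax).
          full: "0 1 * * 0"
          incremental: "0 1 * * 1-6"

Note that retention only expires backups and their archived WAL in the repository; the WAL on the data volume itself is freed once archiving succeeds and checkpoints recycle the segments.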

If you need further assistance, I recommend that you join our Discord group and ask questions there as it is a more active forum for the postgres-operator community.

dsessler7 added the question (Further information is requested) and triaged labels on Jul 22, 2024