[Agent-based Installer] PostgreSQL database fails to initialize properly in OKD 4.18 installation #2086

Closed
devopsdonker opened this issue Jan 13, 2025 · 2 comments

Comments

@devopsdonker

Description

When attempting to use the agent-based installer for OKD 4.18, the PostgreSQL database service (assisted-service-db.service) fails to initialize properly, preventing the installation from proceeding. This should work out of the box but currently requires manual intervention.

Environment

  • OKD Version: 4.18
  • Installation Method: Agent-based installer
  • Network Configuration: 172.10.10.0/24 (machine network)
  • Boot: USB drive written with the dd command
  • Storage: NVMe drive (1.8 TB)
  • Service Image: quay.io/okd/scos-content@sha256:7f597fb44334b5f5b6296321934df6476db527ba24d0b0e679f91f3ad771ac31

Current Behavior

The assisted-service-db.service fails to start with the following sequence:

  1. Initial failure due to lock file directory issues
  2. Service attempts to restart but fails repeatedly
  3. PostgreSQL starts briefly but then immediately shuts down

Logs

Jan 13 10:31:33 master1.donker.vip podman[344792]: waiting for server to start.... done
Jan 13 10:31:33 master1.donker.vip podman[344792]: server started
Jan 13 10:31:34 master1.donker.vip postgres-container[344813]: waiting for server to shut down.... done
Jan 13 10:31:34 master1.donker.vip postgres-container[344813]: server stopped

Earlier attempts showed:

FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": No such file or directory
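
The two signatures above (the missing lock-file directory and the start-then-stop sequence) can be picked out of journal output mechanically. The following is a minimal sketch, not part of the installer: the function name `find_symptoms` and the returned keys are hypothetical, and it assumes the log text has the same shape as the lines quoted in this report.

```python
import posixpath
import re

# Pattern for the PostgreSQL error quoted in this report.
LOCK_FILE_ERROR = re.compile(
    r'FATAL:\s+could not create lock file "(?P<path>[^"]+)": '
    r"No such file or directory"
)


def find_symptoms(log_text: str) -> dict:
    """Summarize the two failure signatures seen in the journal output.

    Hypothetical helper for triage; the key names are illustrative only.
    """
    # Directories in which PostgreSQL could not create its lock file.
    missing_dirs = sorted(
        {
            posixpath.dirname(match.group("path"))
            for match in LOCK_FILE_ERROR.finditer(log_text)
        }
    )

    # Locate the first "server started" line, then look for a later
    # "server stopped" line, which matches the brief start/stop cycle.
    lines = [line.rstrip() for line in log_text.splitlines()]
    started_at = next(
        (i for i, line in enumerate(lines) if line.endswith("server started")),
        None,
    )
    started_then_stopped = started_at is not None and any(
        line.endswith("server stopped") for line in lines[started_at + 1:]
    )

    return {
        "missing_lock_dirs": missing_dirs,
        "started_then_stopped": started_then_stopped,
    }
```

One way to use it would be to pass in the text captured from `journalctl -u assisted-service-db.service`; a non-empty `missing_lock_dirs` list then points at the directory that needs to exist before PostgreSQL starts.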

Expected Behavior

The PostgreSQL database service should:

  1. Initialize correctly on first boot
  2. Create necessary directories and files automatically
  3. Start and remain running to support the installation process

Technical Analysis

The issues appear to be:

  1. Lock file directory (/var/run/postgresql) is not properly created/mounted in the container
  2. Service dependencies may not be properly ordered
  3. Potential permission issues with the PostgreSQL data directory
  4. Possible race condition in service startup sequence

Network Configuration Used

networkType: OVNKubernetes
clusterNetwork:
- cidr: 10.128.0.0/14
  hostPrefix: 23
serviceNetwork:
- 172.30.0.0/16
machineNetwork:
- cidr: 172.10.10.0/24

Environment Variables Present

POSTGRESQL_DATABASE=installer
POSTGRESQL_PASSWORD=admin
POSTGRESQL_USER=admin

Impact

This issue prevents new OKD cluster installations with the agent-based installer and forces manual intervention and troubleshooting that a standard installation should not require.

Suggested Fix

Consider implementing one or more of the following:

  1. Ensure the container runtime properly creates and sets permissions for /var/run/postgresql
  2. Add pre-start checks in the systemd service to verify directory existence and permissions
  3. Include proper volume mounts in the default container configuration
  4. Review service startup sequence to prevent race conditions
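
As an illustration of items 1 and 2, a pre-start check could create the socket directory and confirm it is writable before PostgreSQL is launched. This is a minimal sketch under stated assumptions, not the installer's actual logic: the function name `ensure_socket_dir` and the default mode `0o775` are hypothetical, and whether the directory must exist on the host or inside the container depends on how the unit mounts it, which this report does not show.

```python
import os
from pathlib import Path


def ensure_socket_dir(path: str, mode: int = 0o775) -> Path:
    """Create the socket/lock directory if missing and verify it is usable.

    Hypothetical pre-start check; the mode is an assumed default.
    """
    socket_dir = Path(path)
    newly_created = not socket_dir.exists()

    # Raises FileExistsError if the path exists but is not a directory.
    socket_dir.mkdir(parents=True, exist_ok=True)

    if newly_created:
        # mkdir's mode argument is filtered by the umask, so set it explicitly.
        socket_dir.chmod(mode)

    # PostgreSQL needs write and search permission to create its lock file.
    if not os.access(socket_dir, os.W_OK | os.X_OK):
        raise PermissionError(f"{socket_dir} is not writable by the current user")

    return socket_dir
```

A call such as `ensure_socket_dir("/var/run/postgresql")` could, for example, be run from an `ExecStartPre=` step of the systemd unit. It fails loudly instead of letting PostgreSQL hit the lock-file error later, and it is safe to run repeatedly on restarts.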

Additional Notes

Multiple attempts to resolve this through systemd service modifications and manual directory creation indicate that the problem lies in the service configuration rather than in the local environment.

@titou10titou10

@devopsdonker please delete this issue; you created two similar issues (see #2087).

@GingerGeek
Member

Thank you for your issue report. This is a duplicate of #2071


3 participants