[enterprise-4.7] Document how to configure kdump on CoreOS #28657

Merged
3 changes: 3 additions & 0 deletions _topic_map.yml
@@ -388,6 +388,9 @@ Topics:
File: verifying-node-health
- Name: Troubleshooting CRI-O container runtime issues
File: troubleshooting-crio-issues
- Name: Troubleshooting operating system issues
File: troubleshooting-operating-system-issues
Distros: openshift-enterprise,openshift-origin
- Name: Troubleshooting Operator issues
File: troubleshooting-operator-issues
- Name: Investigating pod issues
105 changes: 105 additions & 0 deletions modules/investigating-kernel-crashes.adoc
@@ -0,0 +1,105 @@
// Module included in the following assemblies:
//
// * support/troubleshooting/troubleshooting-operating-system-issues.adoc

[id="investigating-kernel-crashes"]
= Investigating kernel crashes

== Enabling `kdump`

The `kdump` service, included in `kexec-tools`, provides a crash-dumping mechanism. You can use this service to save the contents of the system’s memory for later analysis.

{op-system} ships with `kexec-tools`, but manual configuration is required to enable `kdump`.

.Procedure

Perform the following steps to enable `kdump` on {op-system}.

. To reserve memory for the crash kernel when the first kernel boots, add the `crashkernel` kernel argument by entering the following command:
+
[source, terminal]
----
# rpm-ostree kargs --append='crashkernel=256M'
----

. By default, `kdump` saves the vmcore to `/var/crash`. You can also write the dump over the network or to another location on the local system by editing `/etc/kdump.conf`. For example, assuming that `/var/usrlocal/cores` exists, enter the following command to edit `/etc/kdump.conf` so that the vmcore is saved to `/var/usrlocal/cores`:
+
[source, terminal]
----
# sed -i "s/^path.*/path \/var\/usrlocal\/cores/" /etc/kdump.conf
----
+
For additional information, see the `kdump.conf(5)` manual page, which documents all available options for the `/etc/kdump.conf` configuration file, and the comments in `/etc/kdump.conf` and `/etc/sysconfig/kdump`.
ifdef::openshift-enterprise[]
Also refer to the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#configuring-the-kdump-target_configuring-kdump-on-the-command-line[RHEL `kdump` documentation] for further information on configuring the dump target.
endif::[]

ifdef::openshift-enterprise[]
. Because `kdump` has trouble finding the correct boot image location on {op-system}, you must set the `KDUMP_BOOTDIR` variable manually in `/etc/sysconfig/kdump`. You can derive the `ostree` boot location from `/proc/cmdline`. For example:
+
[source, terminal]
----
# BOOT_LOC=/boot$(cat /proc/cmdline | egrep -o "/ostree/.*/vmlinuz" | sed -e "s|/vmlinuz||g")
# sed -i "s|^#KDUMP_BOOTDIR=\"/boot\"|KDUMP_BOOTDIR=\"${BOOT_LOC}\"|" /etc/sysconfig/kdump
----

. Configure `/etc/sysconfig/kdump` so that it does not use the default link:https://man7.org/linux/man-pages/man2/kexec_file_load.2.html[file-based `kexec` syscall] to load the crash kernel:
+
[source, terminal]
----
# sed -i "s|^KEXEC_ARGS=\"-s\"|KEXEC_ARGS=\"\"|" /etc/sysconfig/kdump
----
endif::[]

. Enable the `kdump` systemd service.
+
[source, terminal]
----
# systemctl enable kdump.service
----

. Reboot your system.
+
[source, terminal]
----
# systemctl reboot
----

. Verify that `kdump` has loaded a crash kernel: confirm that `kdump.service` started and exited successfully, and that `cat /sys/kernel/kexec_crash_loaded` prints `1`.
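
The final verification step can be sketched as a small shell helper. This is only an illustration, not part of the documented procedure: the function name `kdump_ready` and its two optional path arguments are assumptions introduced here so that the logic can be tried against sample files as well as on a real node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with kexec-tools): report whether kdump looks ready.
# Both paths are parameters so the checks can be exercised against sample files.
kdump_ready() {
    local cmdline_file="${1:-/proc/cmdline}"
    local loaded_file="${2:-/sys/kernel/kexec_crash_loaded}"

    # The crashkernel= argument must be on the running kernel's command line.
    if ! grep -q 'crashkernel=' "$cmdline_file" 2>/dev/null; then
        echo "crashkernel argument missing"
        return 1
    fi

    # kexec_crash_loaded reads 1 once a crash kernel has been loaded.
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "crash kernel not loaded"
        return 1
    fi

    echo "kdump ready"
}
```

On a node, calling `kdump_ready` with no arguments reads the live kernel files. Running `systemctl status kdump.service` alongside it shows whether the service itself started and exited successfully.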

== Enabling kdump on day-1
The `kdump` service is intended to be enabled on individual nodes to debug kernel problems, and enabling it on every node in the cluster is not recommended. Although machine-specific machine configs are not yet supported, you can still enable `kdump` on day-1 by performing the previous steps through a `systemd` unit in a `MachineConfig` object. Note that this approach enables `kdump` on all nodes in the cluster. To use it, create a `MachineConfig` object and inject it into the set of manifest files that Ignition uses during cluster setup. See "Customizing nodes" in the _Installing -> Installation configuration_ section for more information and examples on how to use Ignition configs.
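
As a rough sketch of what such a manifest might look like, the following shell function writes a `MachineConfig` file that adds the `crashkernel` kernel argument and enables `kdump.service`. The function name `write_kdump_machineconfig`, the object name `99-<role>-kdump`, and the Ignition version `3.2.0` are assumptions made for illustration. The sketch does not cover the `KDUMP_BOOTDIR` and `KEXEC_ARGS` edits from the procedure above, which would need an additional `systemd` unit or file entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a MachineConfig manifest that enables kdump for one role.
# Arguments: <output directory> [role] [crashkernel size]
write_kdump_machineconfig() {
    local out_dir="$1"
    local role="${2:-worker}"
    local size="${3:-256M}"
    local manifest="${out_dir}/99-${role}-kdump.yaml"

    mkdir -p "$out_dir"
    # The object name and Ignition version below are assumptions; adjust to your release.
    cat > "$manifest" <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-${role}-kdump
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  kernelArguments:
    - crashkernel=${size}
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: kdump.service
          enabled: true
EOF
    # Print the path of the file that was written.
    echo "$manifest"
}
```

For example, `write_kdump_machineconfig <installation_directory>/openshift` would place the file next to the other manifests before the Ignition configs are generated, assuming that directory was created by the installer's manifest step.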

== Testing the kdump configuration

ifdef::openshift-enterprise[]
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#testing-the-kdump-configuration_installing-and-configuring-kdump[Testing the kdump configuration] section in the {op-system-base} documentation for `kdump`.
endif::[]

ifdef::openshift-origin[]
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_2:_Capturing_the_Dump[Capturing the Dump] section in the {op-system-base} documentation for `kdump`.
endif::[]
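
A common way to test the configuration is to force a kernel crash through the SysRq interface. The following is a minimal sketch under that assumption, not a replacement for the linked documentation. The function name `trigger_test_crash` and its confirmation argument are hypothetical, added here so that the crash cannot be triggered by accident.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deliberately crash the kernel to test kdump.
# Refuses to do anything unless the caller passes the literal word "crash".
trigger_test_crash() {
    if [ "${1:-}" != "crash" ]; then
        echo "refusing: pass 'crash' to confirm" >&2
        return 1
    fi
    # Enable all SysRq functions, then ask the kernel to crash immediately.
    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger
}
```

Running `trigger_test_crash crash` as root crashes the node immediately, so use it only on a node that you can afford to reboot. After the node comes back up, check the configured dump path (`/var/crash` by default) for the vmcore.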

== Analyzing a core dump

ifdef::openshift-enterprise[]
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#analyzing-a-core-dump_installing-and-configuring-kdump[Analyzing a core dump] section in the {op-system-base} documentation for `kdump`.
endif::[]

ifdef::openshift-origin[]
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_3:_Dump_Analysis[Dump Analysis] section in the {op-system-base} documentation for `kdump`.
endif::[]

.Additional resources
ifdef::openshift-origin[]
* link:https://docs.fedoraproject.org/en-US/fedora-coreos/debugging-kernel-crashes/[Fedora CoreOS Docs on debugging kernel crashes]
* link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes[Setting up kdump in Fedora]
endif::[]
ifdef::openshift-enterprise[]
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide[Setting up kdump in RHEL]
endif::[]
* link:https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html[Linux kernel documentation for kdump]
* kdump.conf(5) — a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options
* kexec(8) — a manual page for `kexec`
* link:https://access.redhat.com/site/solutions/6038[Red Hat Knowledgebase article] regarding `kexec` and `kdump`.
@@ -0,0 +1,11 @@
[id="troubleshooting-operating-system-issues"]
= Troubleshooting operating system issues
include::modules/common-attributes.adoc[]
:context: troubleshooting-operating-system-issues

toc::[]

{product-title} runs on {op-system}. You can follow these procedures to troubleshoot problems related to the operating system.

// Investigating kernel crashes
include::modules/investigating-kernel-crashes.adoc[leveloffset=+1]