From 45160d5ec01d84f50c31fb63de7f49da922fa241 Mon Sep 17 00:00:00 2001 From: Kelvin Fan Date: Tue, 15 Dec 2020 23:24:31 -0500 Subject: [PATCH] Document how to configure kdump on CoreOS RHCOS 4.7 includes `kexec-tools` (required for `kdump`) so investigating kernel crashes through `kdump` is now supported. --- _topic_map.yml | 3 + modules/investigating-kernel-crashes.adoc | 105 ++++++++++++++++++ ...oubleshooting-operating-system-issues.adoc | 11 ++ 3 files changed, 119 insertions(+) create mode 100644 modules/investigating-kernel-crashes.adoc create mode 100644 support/troubleshooting/troubleshooting-operating-system-issues.adoc diff --git a/_topic_map.yml b/_topic_map.yml index 7cc21152f820..25642d43beac 100644 --- a/_topic_map.yml +++ b/_topic_map.yml @@ -388,6 +388,9 @@ Topics: File: verifying-node-health - Name: Troubleshooting CRI-O container runtime issues File: troubleshooting-crio-issues + - Name: Troubleshooting operating system issues + File: troubleshooting-operating-system-issues + Distros: openshift-enterprise,openshift-origin - Name: Troubleshooting Operator issues File: troubleshooting-operator-issues - Name: Investigating pod issues diff --git a/modules/investigating-kernel-crashes.adoc b/modules/investigating-kernel-crashes.adoc new file mode 100644 index 000000000000..2d60ad184f44 --- /dev/null +++ b/modules/investigating-kernel-crashes.adoc @@ -0,0 +1,105 @@ +// Module included in the following assemblies: +// +// * support/troubleshooting/troubleshooting-operating-system-issues.adoc + +[id="investigating-kernel-crashes"] += Investigating kernel crashes + +== Enabling `kdump` + +The `kdump` service, included in `kexec-tools`, provides a crash-dumping mechanism. You can use this service to save the contents of the system’s memory for later analysis. + +{op-system} ships with `kexec-tools`, but manual configuration is required to enable `kdump`. + +.Procedure + +Perform the following steps to enable `kdump` on {op-system}. + +. To reserve memory for the crash kernel during the first kernel booting, provide kernel arguments by entering the following command: ++ +[source, terminal] +---- +# rpm-ostree kargs --append='crashkernel=256M' +---- + +. By default, the path in which the vmcore will be saved is `/var/crash`. It is also possible to write the dump over the network or to some other location on the local system by editing `/etc/kdump.conf`. For example, assuming `/var/usrlocal/cores` exists, enter the following command to edit `/etc/kdump.conf` to save the vmcore to `/var/usrlocal/cores`: ++ +[source, terminal] +---- +# sed -i "s/^path.*/path \/var\/usrlocal\/cores/" /etc/kdump.conf +---- ++ +For additional information, see `kdump.conf`, a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options, and note the comments in `/etc/kdump.conf` and `/etc/sysconfig/kdump`. +ifdef::openshift-enterprise[] +Also refer to the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#configuring-the-kdump-target_configuring-kdump-on-the-command-line[RHEL `kdump` documentation] for further information on configuring the dump target. +endif::[] ++ + +ifdef::openshift-enterprise[] +. Because `kdump` has trouble finding the correct bootimage location on {op-system}, the `KDUMP_BOOTDIR` variable must be manually set in `/etc/kdump.conf`. You can use `/proc/cmdline` to figure out the `ostree` boot location. For example: ++ +[source, terminal] +---- +# BOOT_LOC=/boot$(cat /proc/cmdline | egrep -o "/ostree/.*/vmlinuz" | sed -e "s|/vmlinuz||g") +# sed -i "s|^#KDUMP_BOOTDIR=\"/boot\"|KDUMP_BOOTDIR=\"${BOOT_LOC}\"|" /etc/sysconfig/kdump +---- + +. Configure `/etc/sysconfig/kdump` to not use the default link:https://man7.org/linux/man-pages/man2/kexec_file_load.2.html[file-based `kexec` syscall] for loading the crash kernel. ++ +[source, terminal] +---- +# sed -i "s|^KEXEC_ARGS=\"-s\"|KEXEC_ARGS=\"\"|" /etc/sysconfig/kdump +---- +endif::[] + +. Enable the `kdump` systemd service. ++ +[source, terminal] +---- +# systemctl enable kdump.service +---- + +. Reboot your system. ++ +[source, terminal] +---- +# systemctl reboot +---- + +. Ensure that `kdump` has loaded a crash kernel by checking that the `kdump.service` has started and exited successfully and that `cat /sys/kernel/kexec_crash_loaded` prints `1`. + +== Enabling kdump on day-1 +The `kdump` service is intended to be enabled per-node to debug kernel problems. It is not recommended to enable `kdump` on all of your nodes in the cluster. Although machine-specific machine configs are not yet supported, you can perform the previous steps through a `systemd` unit in a `MachineConfig` object on day-1 and have kdump enabled on all nodes in the cluster. You can create a machine config object and inject that object into the set of manifest files used by Ignition during cluster setup. See "Customizing nodes" in the _Installing -> Installation configuration_ section for more information and examples on how to use Ignition configs. + +== Testing the kdump configuration + +ifdef::openshift-enterprise[] +See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#testing-the-kdump-configuration_installing-and-configuring-kdump[Testing the kdump configuration] section in the {op-system-base} documentation for `kdump`. +endif::[] + +ifdef::openshift-origin[] +See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_2:_Capturing_the_Dump[Capturing the Dump] section in the {op-system-base} documentation for `kdump`. +endif::[] + +== Analyzing a core dump + +ifdef::openshift-enterprise[] +See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#analyzing-a-core-dump_installing-and-configuring-kdump[Analyzing a core dump] section in the {op-system-base} documentation for `kdump`. +endif::[] + +ifdef::openshift-origin[] +See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_3:_Dump_Analysis[Dump Analysis] section in the {op-system-base} documentation for `kdump`. +endif::[] + +.Additional resources +ifdef::openshift-origin[] +* link:https://docs.fedoraproject.org/en-US/fedora-coreos/debugging-kernel-crashes/[Fedora CoreOS Docs on debugging kernel crashes] +* link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes[Setting up kdump in Fedora] +endif::[] +ifdef::openshift-enterprise[] +* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide[Setting up kdump in RHEL] +endif::[] +* link:https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html[Linux kernel documentation for kdump] +* kdump.conf(5) — a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options +* kexec(8) — a manual page for `kexec` +* link:https://access.redhat.com/site/solutions/6038[Red Hat Knowledgebase article] regarding `kexec` and `kdump`. diff --git a/support/troubleshooting/troubleshooting-operating-system-issues.adoc b/support/troubleshooting/troubleshooting-operating-system-issues.adoc new file mode 100644 index 000000000000..f647ff2d993e --- /dev/null +++ b/support/troubleshooting/troubleshooting-operating-system-issues.adoc @@ -0,0 +1,11 @@ +[id="troubleshooting-operating-system-issues"] += Troubleshooting operating system issues +include::modules/common-attributes.adoc[] +:context: troubleshooting-operating-system-issues + +toc::[] + +{product-title} runs on {op-system}. You can follow these procedures to troubleshoot problems related to the operating system. + +// Investigating kernel crashes +include::modules/investigating-kernel-crashes.adoc[leveloffset=+1]