Skip to content

Commit

Permalink
Merge pull request #28657 from openshift-cherrypick-robot/cherry-pick…
Browse files Browse the repository at this point in the history
…-28164-to-enterprise-4.7

[enterprise-4.7] Document how to configure kdump on CoreOS
  • Loading branch information
Bob Furu authored Jan 18, 2021
2 parents 032d6ef + 45160d5 commit 60e548a
Show file tree
Hide file tree
Showing 3 changed files with 119 additions and 0 deletions.
3 changes: 3 additions & 0 deletions _topic_map.yml
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,9 @@ Topics:
File: verifying-node-health
- Name: Troubleshooting CRI-O container runtime issues
File: troubleshooting-crio-issues
- Name: Troubleshooting operating system issues
File: troubleshooting-operating-system-issues
Distros: openshift-enterprise,openshift-origin
- Name: Troubleshooting Operator issues
File: troubleshooting-operator-issues
- Name: Investigating pod issues
Expand Down
105 changes: 105 additions & 0 deletions modules/investigating-kernel-crashes.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
// Module included in the following assemblies:
//
// * support/troubleshooting/troubleshooting-operating-system-issues.adoc

[id="investigating-kernel-crashes"]
= Investigating kernel crashes

== Enabling `kdump`

The `kdump` service, included in `kexec-tools`, provides a crash-dumping mechanism. You can use this service to save the contents of the system’s memory for later analysis.

{op-system} ships with `kexec-tools`, but manual configuration is required to enable `kdump`.

.Procedure

Perform the following steps to enable `kdump` on {op-system}.

. To reserve memory for the crash kernel during the first kernel booting, provide kernel arguments by entering the following command:
+
[source, terminal]
----
# rpm-ostree kargs --append='crashkernel=256M'
----

. By default, the path in which the vmcore will be saved is `/var/crash`. It is also possible to write the dump over the network or to some other location on the local system by editing `/etc/kdump.conf`. For example, assuming `/var/usrlocal/cores` exists, enter the following command to edit `/etc/kdump.conf` to save the vmcore to `/var/usrlocal/cores`:
+
[source, terminal]
----
# sed -i "s/^path.*/path \/var\/usrlocal\/cores/" /etc/kdump.conf
----
+
For additional information, see `kdump.conf`, a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options, and note the comments in `/etc/kdump.conf` and `/etc/sysconfig/kdump`.
ifdef::openshift-enterprise[]
Also refer to the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#configuring-the-kdump-target_configuring-kdump-on-the-command-line[RHEL `kdump` documentation] for further information on configuring the dump target.
endif::[]
+

ifdef::openshift-enterprise[]
. Because `kdump` has trouble finding the correct bootimage location on {op-system}, the `KDUMP_BOOTDIR` variable must be manually set in `/etc/kdump.conf`. You can use `/proc/cmdline` to figure out the `ostree` boot location. For example:
+
[source, terminal]
----
# BOOT_LOC=/boot$(cat /proc/cmdline | egrep -o "/ostree/.*/vmlinuz" | sed -e "s|/vmlinuz||g")
# sed -i "s|^#KDUMP_BOOTDIR=\"/boot\"|KDUMP_BOOTDIR=\"${BOOT_LOC}\"|" /etc/sysconfig/kdump
----

. Configure `/etc/sysconfig/kdump` to not use the default link:https://man7.org/linux/man-pages/man2/kexec_file_load.2.html[file-based `kexec` syscall] for loading the crash kernel.
+
[source, terminal]
----
# sed -i "s|^KEXEC_ARGS=\"-s\"|KEXEC_ARGS=\"\"|" /etc/sysconfig/kdump
----
endif::[]

. Enable the `kdump` systemd service.
+
[source, terminal]
----
# systemctl enable kdump.service
----

. Reboot your system.
+
[source, terminal]
----
# systemctl reboot
----

. Ensure that `kdump` has loaded a crash kernel by checking that the `kdump.service` has started and exited successfully and that `cat /sys/kernel/kexec_crash_loaded` prints `1`.

== Enabling kdump on day-1
The `kdump` service is intended to be enabled per-node to debug kernel problems. It is not recommended to enable `kdump` on all of your nodes in the cluster. Although machine-specific machine configs are not yet supported, you can perform the previous steps through a `systemd` unit in a `MachineConfig` object on day-1 and have kdump enabled on all nodes in the cluster. You can create a machine config object and inject that object into the set of manifest files used by Ignition during cluster setup. See "Customizing nodes" in the _Installing -> Installation configuration_ section for more information and examples on how to use Ignition configs.

== Testing the kdump configuration

ifdef::openshift-enterprise[]
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#testing-the-kdump-configuration_installing-and-configuring-kdump[Testing the kdump configuration] section in the {op-system-base} documentation for `kdump`.
endif::[]

ifdef::openshift-origin[]
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_2:_Capturing_the_Dump[Capturing the Dump] section in the {op-system-base} documentation for `kdump`.
endif::[]

== Analyzing a core dump

ifdef::openshift-enterprise[]
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#analyzing-a-core-dump_installing-and-configuring-kdump[Analyzing a core dump] section in the {op-system-base} documentation for `kdump`.
endif::[]

ifdef::openshift-origin[]
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_3:_Dump_Analysis[Dump Analysis] section in the {op-system-base} documentation for `kdump`.
endif::[]

.Additional resources
ifdef::openshift-origin[]
* link:https://docs.fedoraproject.org/en-US/fedora-coreos/debugging-kernel-crashes/[Fedora CoreOS Docs on debugging kernel crashes]
* link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes[Setting up kdump in Fedora]
endif::[]
ifdef::openshift-enterprise[]
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide[Setting up kdump in RHEL]
endif::[]
* link:https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html[Linux kernel documentation for kdump]
* kdump.conf(5) — a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options
* kexec(8) — a manual page for `kexec`
* link:https://access.redhat.com/site/solutions/6038[Red Hat Knowledgebase article] regarding `kexec` and `kdump`.
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
[id="troubleshooting-operating-system-issues"]
= Troubleshooting operating system issues
include::modules/common-attributes.adoc[]
:context: troubleshooting-operating-system-issues

toc::[]

{product-title} runs on {op-system}. You can follow these procedures to troubleshoot problems related to the operating system.

// Investigating kernel crashes
include::modules/investigating-kernel-crashes.adoc[leveloffset=+1]

0 comments on commit 60e548a

Please sign in to comment.