Fixes to onboard x86_64 servers in the baremetal qe infra (#36487)
* Support Dell IPMI power commands

On Dell servers, `ipmitool power (off|on|reset)` returns an error when the host is in a state that does not allow the requested transition. Replace `power reset` with two commands (`power off`, then `power on`), and ignore any `power off` error, since it only reports that state validation.

* Set the efi boot order after installing RHCOS in UPI/UEFI/PXE scenarios

Some servers' firmware pushes any newly detected boot option to the tail of the boot order.
When other boot options are present and bootable, such a server boots from them instead of the new one.
As a (temporary?) workaround, we add the boot option manually.
NOTE: it is assumed that old OSes' boot options are removed from the boot options list during the wipe operations.
xrefs: https://bugzilla.redhat.com/show_bug.cgi?id=1997805
       coreos/fedora-coreos-tracker#946
       coreos/fedora-coreos-tracker#947
aleskandro authored Feb 23, 2023
1 parent 22eeef2 commit d091404
Showing 4 changed files with 41 additions and 3 deletions.
@@ -31,7 +31,7 @@ function prepare_bmc() {
chassis bootparam set bootflag force_pxe options=PEF,watchdog,reset,power
ipmitool -I lanplus -H "$bmc_address" \
-U "$bmc_user" -P "$bmc_pass" \
-power off
+power off || echo "Already off"
}

function update_image_registry() {
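
The `power off || echo "Already off"` change treats every `power off` failure as "already off". As a rough sketch of an alternative, not part of this commit, the script could query `power status` first and send `power off` only when the chassis reports being on, so that other failures (unreachable BMC, bad credentials) still surface. The function name `power_off_if_on` is hypothetical, and the sketch assumes the status output contains "is off" when the host is down, as in `Chassis Power is off`.

```shell
# Hypothetical helper, not part of the commit: power a host off only when the
# BMC reports it as on. Any failure of the status query itself is returned to
# the caller instead of being reported as "Already off".
power_off_if_on() {
  local bmc_address=$1 bmc_user=$2 bmc_pass=$3
  local status
  # Assumption: `power status` prints a line such as "Chassis Power is on".
  status=$(ipmitool -I lanplus -H "$bmc_address" \
    -U "$bmc_user" -P "$bmc_pass" \
    power status) || return 1
  case "$status" in
    *"is off"*)
      echo "Already off"
      return 0
      ;;
  esac
  ipmitool -I lanplus -H "$bmc_address" \
    -U "$bmc_user" -P "$bmc_pass" \
    power off
}
```

Whether the extra round trip to the BMC is worth it depends on how often the Dell state-validation errors occur compared with genuine failures; the commit's one-liner is simpler and may be sufficient for this CI use.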
@@ -61,7 +61,21 @@ function reset_host() {
chassis bootparam set bootflag force_pxe options=PEF,watchdog,reset,power
ipmitool -I lanplus -H "$bmc_address" \
-U "$bmc_user" -P "$bmc_pass" \
-power reset
+power off || echo "Already off"
+# If the host is not already powered off, the power on command can fail while the host is still powering off.
+# Let's retry the power on command multiple times to make sure the command is received in the correct state.
+for i in {1..10} max; do
+if [ "$i" == "max" ]; then
+echo "Failed to reset $bmc_address"
+return 1
+fi
+ipmitool -I lanplus -H "$bmc_address" \
+-U "$bmc_user" -P "$bmc_pass" \
+power on && break
+echo "Failed to power on $bmc_address, retrying..."
+sleep 5
+done
+
if ! wait_for_power_down "$bmc_address" "$bmc_user" "$bmc_pass" "${name}"; then
echo "$bmc_address" >> /tmp/failed
fi
@@ -60,6 +60,17 @@ systemd:
--delete-karg console=ttyS0,115200n8 $(join_by_semicolon "${console_kargs}" "--append-karg console=" "") \
--ignition-url ${base_url%%*(/)}/${role}.ign \
--insecure-ignition --copy-network
+# Some servers' firmware push any new detected boot options to the tail of the boot order.
+# When other boot options are present and bootable, such a server will boot from them instead of the new one.
+# As a (temporary?) workaround, we manually add the boot option.
+# NOTE: it's assumed that old OSes boot options are removed from the boot options list during the wipe operations.
+# xrefs: https://bugzilla.redhat.com/show_bug.cgi?id=1997805
+# https://github.com/coreos/fedora-coreos-tracker/issues/946
+# https://github.com/coreos/fedora-coreos-tracker/issues/947
+ExecStart=/usr/bin/bash -c ' \
+ARCH=\$(uname -m | sed "s/x86_64/x64/;s/aarch64/aa64/"); \
+/usr/sbin/efibootmgr -c -d "$root_device" -p 2 -c -L "Red Hat CoreOS" -l "\\\\EFI\\\\redhat\\\\shim\$ARCH.efi" \
+'
ExecStart=/usr/bin/systemctl --no-block reboot
StandardOutput=kmsg+console
StandardError=kmsg+console
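
The `sed` expression in the hunk above maps the `uname -m` machine name to the suffix used in the shim file name (`x86_64` to `x64`, `aarch64` to `aa64`). A minimal sketch of the same mapping as standalone functions follows; the names `efi_arch_suffix` and `shim_loader_path` are hypothetical and do not appear in the commit. Unlike the `sed` version, which passes any other machine name through unchanged, this sketch fails explicitly on an unknown architecture.

```shell
# Hypothetical helpers, not part of the commit.

# Map a `uname -m` machine name to the suffix used in the shim file name.
efi_arch_suffix() {
  case "$1" in
    x86_64) echo "x64" ;;
    aarch64) echo "aa64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Build the loader path that is passed to `efibootmgr -l`.
shim_loader_path() {
  local suffix
  suffix=$(efi_arch_suffix "$1") || return 1
  # printf turns each "\\" in the format string into a single backslash.
  printf '\\EFI\\redhat\\shim%s.efi\n' "$suffix"
}

# Example: shim_loader_path "$(uname -m)"
```

In the unit file itself the path needs extra escaping because it passes through several quoting layers; the sketch shows only the final value.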
@@ -129,7 +129,20 @@ function reset_host() {
chassis bootparam set bootflag force_pxe options=PEF,watchdog,reset,power
ipmitool -I lanplus -H "$bmc_address" \
-U "$bmc_user" -P "$bmc_pass" \
-power reset
+power off || echo "Already off"
+# If the host is not already powered off, the power on command can fail while the host is still powering off.
+# Let's retry the power on command multiple times to make sure the command is received in the correct state.
+for i in {1..10} max; do
+if [ "$i" == "max" ]; then
+echo "Failed to reset $bmc_address"
+return 1
+fi
+ipmitool -I lanplus -H "$bmc_address" \
+-U "$bmc_user" -P "$bmc_pass" \
+power on && break
+echo "Failed to power on $bmc_address, retrying..."
+sleep 5
+done
}

function approve_csrs() {
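
The same `power on` retry loop is added to two `reset_host` functions. As a sketch only, the loop could be factored into a small generic helper; the name `retry` and its argument order (maximum attempts, delay in seconds, then the command) are assumptions for illustration and are not part of this commit.

```shell
# Hypothetical helper, not part of the commit: run a command up to
# max_attempts times, sleeping `delay` seconds between failed attempts.
# Returns 0 on the first success and 1 if every attempt fails.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed: $*" >&2
    # No need to sleep after the final attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Possible use inside reset_host, mirroring the 10 attempts and 5 s delay above:
#   retry 10 5 ipmitool -I lanplus -H "$bmc_address" \
#     -U "$bmc_user" -P "$bmc_pass" power on \
#     || { echo "Failed to reset $bmc_address"; return 1; }
```

Compared with the `for i in {1..10} max` idiom, the helper keeps the failure branch outside the loop body, at the cost of one more function to maintain in each script.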