New system not able to unlock after running role #150

aheath1992 · 2024-02-01T21:02:55Z

A new system is unable to unlock after running the nbde_client role. The role run reports no errors from Ansible, but upon reboot the system stops at the LUKS encryption screen.

    - name: Import nbde_client role
      ansible.builtin.import_role:
        name: linux-system-roles.nbde_client
      vars:
        nbde_client_bindings:
          - device: "{{ root_disk | d('/dev/vda2') }}"
            encryption_password: "{{ current_password }}"
            servers: "{{ tang_servers }}"

richm · 2024-02-01T21:06:33Z

What version of the role are you using?
What version of ansible are you using?
What is the platform/version of your control node?
What is the platform/version of your managed node?
@sergio-correia what other debugging information do we need?

aheath1992 · 2024-02-01T22:15:03Z

What version of the role are you using - 1.71.1
What version of ansible are you using - 2.16
What is the platform/version of your control node - fedora 39
What is the platform/version of your managed node - RHEL 8.8

sergio-correia · 2024-02-02T09:21:34Z

Hello. Here is some more info that may help debug this:

- what is the version of clevis in the managed node?
- what are all the encrypted devices in the managed node? is it just /dev/vda2 or do we have others?
- please check whether the initrd from the managed node includes the clevis machinery to perform the unlocking in early boot (if that is the case); something like `lsinitrd | grep clevis` can help here
- also, check whether network is enabled for early boot, which it will need in order to access the tang servers; the role includes "rd.neednet=1" to indicate this. It will likely be included in the initrd in a file named etc/cmdline.d/01-default.conf. Something like this could help to verify it: `lsinitrd /boot/initramfs-$(uname -r).img etc/cmdline.d/01-default.conf`
- if you have access to the tang server, also please check whether there are any requests coming from the client (clevis)
- check also whether the clevis-luks-askpass.path unit is enabled: `systemctl status clevis-luks-askpass.path`; it will be used if we are going to decrypt a disk in the late boot phase
- check also the clevis configuration for the specified device; e.g.: `clevis luks list -d /dev/vda2`

@richm: I wonder if it makes sense to have some "action"/"state" to collect some of this information from the managed hosts, to help troubleshoot such issues?
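
As a rough sketch of what such a collection step might look like, here is a plain shell script that simply wraps the checks listed above. It is not part of the role: the `run_check` helper and the `DEVICE` variable are names made up for this illustration, and the `lsblk` line is an extra assumption about how one might list the encrypted devices.

```shell
#!/bin/sh
# Hypothetical helper script: gathers the checks listed above in one report.
# DEVICE is an assumption; override it to match the bound LUKS device.
DEVICE="${DEVICE:-/dev/vda2}"

# Run a command if its binary exists; never abort the whole report.
run_check() {
    label="$1"; shift
    echo "== $label"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(skipped: $1 not found)"
    fi
}

run_check "clevis package" rpm -q clevis
run_check "block devices" lsblk -o NAME,FSTYPE,MOUNTPOINT
run_check "clevis in initrd" sh -c 'lsinitrd | grep clevis'
run_check "early-boot network" lsinitrd "/boot/initramfs-$(uname -r).img" etc/cmdline.d/01-default.conf
run_check "askpass path unit" systemctl --no-pager status clevis-luks-askpass.path
run_check "clevis bindings" clevis luks list -d "$DEVICE"
```

Most of these need root on the managed node, and the output may contain host names or server URLs that should be redacted before posting.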

aheath1992 · 2024-02-02T13:55:16Z

what is the version of clevis in the managed node? - clevis-15-15.el8.x86_64
what are all the encrypted devices in the managed node? is it that /dev/vda2 or do we have others? - just the root device in this case /dev/vda2
please check whether the initrd from the managed node includes the clevis machinery to perform the unlocking in early boot (if that is the case); something like lsinitrd | grep clevis can help here

lsinitrd | grep clevis
clevis
clevis-pin-null
clevis-pin-sss
clevis-pin-tang
clevis-pin-tpm2
lrwxrwxrwx   1 root     root           48 Jan 20  2023 etc/systemd/system/cryptsetup.target.wants/clevis-luks-askpass.path -> /usr/lib/systemd/system/clevis-luks-askpass.path
-rwxr-xr-x   1 root     root         1679 Jan 20  2023 usr/bin/clevis
-rwxr-xr-x   1 root     root         1654 Oct 28  2020 usr/bin/clevis-decrypt
-rwxr-xr-x   1 root     root         1148 Jan 20  2023 usr/bin/clevis-decrypt-null
-rwxr-xr-x   1 root     root        25296 Jan 20  2023 usr/bin/clevis-decrypt-sss
-rwxr-xr-x   1 root     root         3560 Jan 20  2023 usr/bin/clevis-decrypt-tang
-rwxr-xr-x   1 root     root         5121 Oct 28  2020 usr/bin/clevis-decrypt-tpm2
-rw-r--r--   1 root     root        32885 Jan 20  2023 usr/bin/clevis-luks-common-functions
-rwxr-xr-x   1 root     root         2115 Oct 28  2020 usr/bin/clevis-luks-list
-rwxr-xr-x   1 root     root         2466 Jan 20  2023 usr/libexec/clevis-luks-askpass
-rw-r--r--   1 root     root          302 Oct 28  2020 usr/lib/systemd/system/clevis-luks-askpass.path
-rw-r--r--   1 root     root          190 Jan 20  2023 usr/lib/systemd/system/clevis-luks-askpass.service

lsinitrd /boot/initramfs-$(uname -r).img etc/cmdline.d/01-default.conf
 rd.neednet=1

systemctl status clevis-luks-askpass.path
● clevis-luks-askpass.path - Forward Password Requests to Clevis Directory Watch
   Loaded: loaded (/usr/lib/systemd/system/clevis-luks-askpass.path; enabled; vendor preset: enabled)
   Active: active (waiting) since Fri 2024-02-02 13:54:24 UTC; 24s ago
     Docs: man:clevis-luks-unlockers(7)

clevis luks list -d /dev/vda2
1: sss '{"t":1,"pins":{"tang":[{"url":"http://tang1"},{"url":"http://tang2"}]}}'

sergio-correia · 2024-02-02T14:36:12Z

At first glance, it looks OK. Could you also check journalctl to see whether any useful information shows up, please? (I forgot to mention this earlier, but feel free to redact any IP addresses if required.)

journalctl -xf -u clevis-luks-askpass.service

aheath1992 · 2024-02-02T15:25:34Z

Feb 02 15:21:20 clevis-test.ansi-001.prod.iad2.dc.redhat.com clevis-luks-askpass[11941]: Error communicating with the server http://tang1
Feb 02 15:21:20 clevis-test.ansi-001.prod.iad2.dc.redhat.com clevis-luks-askpass[11942]: Error communicating with the server http://tang2

telnet tang1 80
Trying tang1...
Connected to tang1.
Escape character is '^]'.

telnet tang2 80
Trying tang2...
Connected to tang2.
Escape character is '^]'.
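
A successful telnet only shows that the TCP port is reachable. As a further sketch, the Tang advertisement could also be requested over HTTP from the `/adv` endpoint, which is what clevis needs to fetch from the server. `check_tang` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: request the advertisement from a Tang server's /adv
# endpoint and report whether an HTTP-level reply came back.
check_tang() {
    url="$1"
    # -s: silent, -f: treat HTTP errors as failures, --max-time: bound the wait
    if curl -sf --max-time 5 "$url/adv" >/dev/null; then
        echo "$url: advertisement received"
    else
        echo "$url: no advertisement (curl exit status $?)"
        return 1
    fi
}
```

Calling `check_tang http://tang1` and `check_tang http://tang2` from the managed node would cover both servers in the binding.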

richm · 2024-07-17T12:54:35Z

@sergio-correia any idea?

xeluior · 2024-07-30T20:29:00Z

I've been seeing this as well. I have found that adding the _netdev option to the relevant fstab entry allows the unlocking to proceed (tested on Rocky 8 and 9 clients, both early and late boot, and Debian 11 and 12 clients, late boot only). I have added an awk script task into my playbook after the role runs to add this option.

    - name: Update fstab options
      ansible.builtin.shell: |
        # Look up the mapper name for this device in /etc/crypttab
        name="$(awk '$2 == "{{ item.device }}" { print $1 }' /etc/crypttab | head -n 1)"
        # Append _netdev to the mount options of the matching fstab entry
        awk -v mapper_path="/dev/mapper/$name" '{
          if ($1 == mapper_path && index($4, "_netdev") == 0) {
            $4 = $4 ",_netdev"
          }
          print
        }' /etc/fstab > /tmp/fstab
        # Report a change only when the rewritten file differs
        diff -q /tmp/fstab /etc/fstab || echo changed
        mv /tmp/fstab /etc/fstab
      loop: '{{ nbde_client_bindings }}'
      register: fstab
      changed_when: '"changed" in fstab.stdout'

I believe this behavior is tied to systemd's ordering of mount units: it orders fstab entries marked with _netdev after network-online.target, which is necessary for clevis to work. (ref)

sergio-correia · 2024-08-02T13:42:08Z

Yeah, this is likely in the right direction.

We may need to have _netdev in crypttab to mark the device as requiring network. To prevent a dependency loop, we would also need to add _netdev to fstab if the device is listed there for a mount point. Additionally, we may also have to enable the remote-cryptsetup.target unit.
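
To make the crypttab part concrete, here is a minimal sketch, assuming the standard four-column crypttab layout (name, device, key file, options). The `crypttab_add_netdev` function and the `luks-root` mapping name are hypothetical, and this is not something the role does today.

```shell
#!/bin/sh
# Sketch: print a crypttab with _netdev added to one entry's options field.
# Usage: crypttab_add_netdev <mapping-name> <crypttab-file>
crypttab_add_netdev() {
    awk -v name="$1" '
        $1 == name {
            if (NF < 3) $3 = "none"        # key file column is optional
            if (NF < 4) $4 = "_netdev"     # no options column yet
            else if (("," $4 ",") !~ /,_netdev,/) $4 = $4 ",_netdev"
        }
        { print }
    ' "$2"
}

# Example use on a managed node (not run here; luks-root is a made-up name):
#   crypttab_add_netdev luks-root /etc/crypttab > /etc/crypttab.new
#   systemctl enable remote-cryptsetup.target
```

The function only prints the result, so the output can be reviewed before it replaces the real file.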

xeluior · 2024-08-20T17:29:11Z

I have some more information from testing this role that should probably be considered here. I did have to add the _netdev option to both /etc/fstab and /etc/crypttab for automatic unlock. This works fine; however, on systemd versions < 245, the crypttab generator creates an ordering issue with the dev-mapper-{name}.device unit that hangs shutdown indefinitely. This can be fixed by also adding the x-systemd.requires=systemd-cryptsetup@{name}.service option to the appropriate device in /etc/fstab.

I have an Ansible-native solution in the playbook I used to deploy this, which I could turn into a PR, but it requires several new options per device in nbde_client_bindings so that it can create the appropriate crypttab and fstab entries.

EDIT: the systemd issue mentioned is systemd/systemd#8472
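
For illustration only, the fstab side of this workaround could be sketched in shell roughly as follows. `fstab_add_opt` and the `luks-root` mapping name are hypothetical, and the sketch assumes the fstab entry refers to the device by its /dev/mapper path.

```shell
#!/bin/sh
# Sketch: print an fstab with one mount option appended to the entry of a
# mapped device, leaving the entry alone if the option is already present.
# Usage: fstab_add_opt <mapping-name> <option> <fstab-file>
fstab_add_opt() {
    awk -v dev="/dev/mapper/$1" -v opt="$2" '
        $1 == dev && index("," $4 ",", "," opt ",") == 0 { $4 = $4 "," opt }
        { print }
    ' "$3"
}

# Example use on a managed node (not run here; luks-root is a made-up name):
#   ver=$(systemctl --version | awk 'NR==1 { print $2 }')
#   [ "$ver" -lt 245 ] && fstab_add_opt luks-root \
#       "x-systemd.requires=systemd-cryptsetup@luks-root.service" /etc/fstab
```

Entries that reference the filesystem by UUID or label would need a different match than the /dev/mapper path used here.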
