
Automatically add Longhorn device to the group id 6 (disk group) by default #79

Merged: 1 commit into longhorn:master on Mar 8, 2024

Conversation

@james-munson left a comment


I agree. LGTM.

@markhillgit

Should we be using the symbolic name "disk" for the group instead of hard coding the number 6? I don't know if it is guaranteed by linux or posix that disk will always be the number 6 (although it's 6 in every example I have looked at so far.)


PhanLe1010 commented Mar 7, 2024

Should we be using the symbolic name "disk" for the group instead of hard coding the number 6? I don't know if it is guaranteed by linux or posix that disk will always be the number 6 (although it's 6 in every example I have looked at so far.)

Thanks @markhillgit for the feedback! I thought about this before but had a few concerns:

  1. From the user's (the pod's) point of view, a fixed number (6) is more predictable. Users can safely define the pod to have group number 6 and know that it will always work:
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
        supplementalGroups:
        - 6
    If Longhorn dynamically chose a number based on OS information (for example, 6 on CentOS, RHEL, and Ubuntu, but 493 on the SUSE OS family), users would have a harder time, as they would need to figure out which group to put into the pod (6 or 493). So dynamically choosing the group helps neither us nor the user.
  2. Secondly (a minor point): the Go function os.Chown only accepts an int for the group ID, not a string. We could certainly work around that, but as pointed out in point 1, there is no benefit to dynamically choosing the group for the Longhorn device.

WDYT?


ejweber commented Mar 7, 2024

I was annoyed that I didn't understand exactly WHY disk device nodes typically have group disk ownership (vs Longhorn device nodes, which have group root ownership).

I also found it strange that the device nodes representing a Longhorn disk in OTHER locations DO have group disk ownership.

For example, on my Ubuntu node, /dev/longhorn/pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82 and /dev/sda are both the same Longhorn device.

root@eweber-v126-worker-9c1451b4-kgxdq:~# ls -l /dev/longhorn/pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82
brw-rw---- 1 root root 8, 0 Mar  7 21:41 /dev/longhorn/pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82
root@eweber-v126-worker-9c1451b4-kgxdq:~# ls -l /dev/sda 
brw-rw---- 1 root disk 8, 0 Mar  7 21:41 /dev/sda

I think the answer is udev rules.

While a Longhorn volume is attaching:

root@eweber-v126-worker-9c1451b4-kgxdq:~# udevadm monitor --property
...
UDEV  [16132.166051] add      /devices/platform/host3/session3/target3:0:0/3:0:0:1/block/sda (block)
ACTION=add
DEVPATH=/devices/platform/host3/session3/target3:0:0/3:0:0:1/block/sda
SUBSYSTEM=block
DEVNAME=/dev/sda
DEVTYPE=disk
DISKSEQ=25
SEQNUM=3980
USEC_INITIALIZED=16132151254
.SYSFS_PATH=/sys/class/scsi_device/3:0:0:1/device
SCSI_TPGS=0
SCSI_TYPE=disk
SCSI_VENDOR=IET
SCSI_VENDOR_ENC=IET\x20\x20\x20\x20\x20
SCSI_MODEL=VIRTUAL-DISK
SCSI_MODEL_ENC=VIRTUAL-DISK\x20\x20\x20\x20
SCSI_REVISION=0001
ID_SCSI=1
ID_SCSI_INQUIRY=1
SCSI_IDENT_SERIAL=beaf11
SCSI_IDENT_LUN_T10=IET_00010001
SCSI_IDENT_LUN_NAA_LOCAL=3000000100000001
SCSI_IDENT_LUN_NAA_REGEXT=60000000000000000e00000000010001
ID_VENDOR=IET
ID_VENDOR_ENC=IET\x20\x20\x20\x20\x20
ID_MODEL=VIRTUAL-DISK
ID_MODEL_ENC=VIRTUAL-DISK\x20\x20\x20\x20
ID_REVISION=0001
ID_TYPE=disk
ID_WWN_WITH_EXTENSION=0x60000000000000000e00000000010001
ID_WWN=0x6000000000000000
ID_BUS=scsi
ID_SERIAL=360000000000000000e00000000010001
ID_SERIAL_SHORT=60000000000000000e00000000010001
ID_SCSI_SERIAL=beaf11
MPATH_SBIN_PATH=/sbin
.SAVED_FM_WAIT_UNTIL=
DM_MULTIPATH_DEVICE_PATH=0
ID_PATH=ip-10.42.2.84:3260-iscsi-iqn.2019-10.io.longhorn:pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82-lun-1
ID_PATH_TAG=ip-10_42_2_84_3260-iscsi-iqn_2019-10_io_longhorn_pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82-lun-1
.ID_FS_TYPE_NEW=
ID_FS_TYPE=
SYSTEMD_WANTS=iscsid.service
MAJOR=8
MINOR=0
DEVLINKS=/dev/disk/by-id/scsi-1IET_00010001 /dev/disk/by-id/scsi-33000000100000001 /dev/disk/by-id/scsi-360000000000000000e00000000010001 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 /dev/disk/by-path/ip-10.42.2.84:3260-iscsi-iqn.2019-10.io.longhorn:pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82-lun-1 /dev/disk/by-id/scsi-SIET_VIRTUAL-DISK_beaf11
TAGS=:systemd:
CURRENT_TAGS=:systemd:
...

Checking the udev rules for the block subsystem, it looks like group disk ownership is automatically handled when creating the block device.

root@eweber-v126-worker-9c1451b4-kgxdq:~# grep -e 'GROUP="disk"' -e SUBSYSTEM=="block" /lib/udev/rules.d/*
/lib/udev/rules.d/50-udev-default.rules:SUBSYSTEM=="block", GROUP="disk"

We do not rely on udev to create the /dev/longhorn/... device node. Instead, we use mknod ourselves. According to its man page, it creates the device node with the group ID of the calling process.

https://man7.org/linux/man-pages/man2/mknod.2.html

So I guess it makes sense that we must change the ownership as you are doing in this PR. If we bind mounted /dev/sda (for example) instead of /dev/longhorn/..., I think we would not.

@ejweber left a comment


LGTM.


shuo-wu commented Mar 7, 2024

Checking the udev rules for the block subsystem, it looks like group disk ownership is automatically handled when creating the block device.

We do not rely on udev to create the /dev/longhorn/... device node. Instead, we use mknod ourselves. According to its man page, it creates the device node with the group ID of the calling process.

Interesting findings! I am wondering if it's possible to use udev rather than mknod; then we wouldn't need to worry about conventions like the disk group.
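For illustration only, if the Longhorn device node were created through udev, a rule along these lines (hypothetical, not shipped by Longhorn; it matches on the longhorn IQN visible in the ID_PATH property from the udevadm output above) could assign the disk group by name:

```
# Hypothetical rule, not shipped by Longhorn: match block devices whose
# ID_PATH contains the longhorn IQN and assign them group "disk" by name.
SUBSYSTEM=="block", ENV{ID_PATH}=="*longhorn*", GROUP="disk"
```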

@PhanLe1010 PhanLe1010 marked this pull request as draft March 8, 2024 00:22
@PhanLe1010

Converting to a draft PR because we also need to change the ownership for the block device created by the v2 engine.


PhanLe1010 commented Mar 8, 2024

Interesting findings! I am wondering if it's possible to use udev rather than mknod; then we wouldn't need to worry about conventions like the disk group.

Hi @shuo-wu, I think even if we use udev, we should still stick with the convention that the Longhorn device has a fixed group ID of 6. This makes it easier for users to define their workload pods: a user adds group 6 and knows that the pod will work on every node it is deployed to, no matter which underlying OS distro is used. We don't have to follow the default value that the OS chooses for the disk group.

If Longhorn instead watched the OS and set the Longhorn device to whatever disk group ID the OS chooses, it would sometimes be 6 (the most common case) and sometimes 493 (SUSE OS). This means a pod with group ID 6 would work when deployed to some nodes and fail on others, which is inconsistent behavior IMO. With that approach, the pod would have to include all possible values (6, 493, ...) to run on every node.

@PhanLe1010 PhanLe1010 marked this pull request as ready for review March 8, 2024 01:28

PhanLe1010 commented Mar 8, 2024

The v2 engine will be fixed at longhorn/go-spdk-helper#86. This PR is ready for merging since the test passed: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6602/

@shuo-wu shuo-wu merged commit 8f2980e into longhorn:master Mar 8, 2024
3 checks passed

innobead commented Mar 8, 2024

If Longhorn instead watched the OS and set the Longhorn device to whatever disk group ID the OS chooses, it would sometimes be 6 (the most common case) and sometimes 493 (SUSE OS). This means a pod with group ID 6 would work when deployed to some nodes and fail on others, which is inconsistent behavior IMO. With that approach, the pod would have to include all possible values (6, 493, ...) to run on every node.

If the disk group ID is different from the default 6, then users need to handle it on their own to add the ID to the SupplementalGroups, right? If yes, we need to document this.


PhanLe1010 commented Mar 8, 2024

If the disk group ID is different from the default 6, then users need to handle it on their own to add the ID to the SupplementalGroups, right? If yes, we need to document this.

@innobead The merged PR doesn't dynamically change the Longhorn device to have the same group ID as the OS's default for the disk group. The PR hard-codes group 6 on the Longhorn device. Therefore, users don't need to worry about finding the default value of the underlying OS and adding it to the pod. Instead, users just need to:

  1. Run the pod with user in group 6
  2. OR run the pod with root user

We will document this point in our docs.


innobead commented Mar 8, 2024

  1. Run the pod with user in group 6

Assuming that 6 is always available, even if the disk group ID is different from 6 on some hosts, right?

@PhanLe1010

Assuming that 6 is always available, even if the disk group ID is different from 6 on some hosts, right?

I think yes, users can always add group 6 to the pod.Spec.SecurityContext regardless of what ID the underlying OS chooses for the disk group name. I will run a sanity test with Longhorn on SUSE (the SUSE OS chooses ID 493 for the disk group) to confirm.


innobead commented Mar 8, 2024

Sounds good. Well done.


PhanLe1010 commented Mar 9, 2024

I think yes, users can always add group 6 to the pod.Spec.SecurityContext regardless of what ID the underlying OS chooses for the disk group name. I will run a sanity test with Longhorn on SUSE (the SUSE OS chooses ID 493 for the disk group) to confirm.

@innobead This is the test result:

  1. SUSE OS info
    ec2-user@phan-v622-block-vol-pool1-a52c31f3-qjstc:~> cat /etc/os-release
    NAME="SLES"
    VERSION="15-SP5"
    VERSION_ID="15.5"
    PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"
    ID="sles"
    ID_LIKE="suse"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:suse:sles:15:sp5"
    DOCUMENTATION_URL="https://documentation.suse.com/"
    
  2. The OS chooses group ID 490 for the disk group name
    ec2-user@phan-v622-block-vol-pool1-a52c31f3-qjstc:~> cat /etc/group
    ...
    dialout:x:491:
    disk:x:490:
    ...
    
  3. Deploy a block PVC and a pod with a non-root user, adding group ID 6 to the container's user
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: longhorn-block-vol
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Block
      storageClassName: longhorn
      resources:
        requests:
          storage: 2Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: block-volume-test
      namespace: default
    spec:
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
        supplementalGroups:
        - 6
      containers:
        - name: block-volume-test
          image: ubuntu:20.04
          command: ["sleep", "360000"]
          imagePullPolicy: IfNotPresent
          volumeDevices:
            - devicePath: /dev/longhorn/testblk
              name: block-vol
      volumes:
        - name: block-vol
          persistentVolumeClaim:
            claimName: longhorn-block-vol
    
  4. Longhorn device has group id 6
    ec2-user@phan-v622-block-vol-pool1-a52c31f3-qjstc:~> ls -l /dev/longhorn/
    total 0
    brw-rw---- 1 root 6 8, 0 Mar  9 03:29 pvc-d0473f41-2ed5-4c7d-905b-4bd0a1281ebe
    
  5. Read/write to the block device is ok.
    I have no name!@block-volume-test:/$ dd if=/dev/urandom of=/dev/longhorn/testblk bs=1M count=1 && dd if=/dev/longhorn/testblk bs=1M count=1 | md5sum 
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00614914 s, 171 MB/s
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0031772 s, 330 MB/s
    b2bc78b33f9c8bfeab553bdebb68712c  -
    
  6. Conclusion: the pod just needs to add group 6 (or run as root) and it will work on any OS
