
Automatically add Longhorn device to the group id 6 (disk group) by default #79

Merged: 1 commit into longhorn:master on Mar 8, 2024

Conversation

@james-munson left a comment


I agree. LGTM.

@markhillgit

Should we be using the symbolic name "disk" for the group instead of hard coding the number 6? I don't know if it is guaranteed by linux or posix that disk will always be the number 6 (although it's 6 in every example I have looked at so far.)


PhanLe1010 commented Mar 7, 2024

Should we be using the symbolic name "disk" for the group instead of hard coding the number 6? I don't know if it is guaranteed by linux or posix that disk will always be the number 6 (although it's 6 in every example I have looked at so far.)

Thanks @markhillgit for the feedback! I thought about this before but had a few concerns:

  1. From the user's (the pod's) point of view, a fixed number (6) is more predictable. Users can safely define the pod to have group number 6 and know that it will always work:
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
        supplementalGroups:
        - 6
    If Longhorn dynamically chose a number based on OS information (for example, 6 on CentOS, RHEL, and Ubuntu, but 493 on the SUSE OS family), users would have a harder time, as they would need to figure out which group to put into the pod (6 or 493). So dynamically choosing the group helps neither us nor the user.
  2. Secondly (a minor point): the Go function os.Chown only accepts an int for the group ID, not a string. We could certainly work around that, but as pointed out in point 1, there is no benefit to dynamically choosing the group for the Longhorn device.

WDYT?


ejweber commented Mar 7, 2024

I was annoyed that I didn't understand exactly WHY disk device nodes typically have group disk ownership (vs Longhorn device nodes, which have group root ownership).

I also found it strange that the device nodes representing a Longhorn disk in OTHER locations DO have group disk ownership.

For example, on my Ubuntu node, /dev/longhorn/pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82 and /dev/sda are both the same Longhorn device.

root@eweber-v126-worker-9c1451b4-kgxdq:~# ls -l /dev/longhorn/pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82
brw-rw---- 1 root root 8, 0 Mar  7 21:41 /dev/longhorn/pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82
root@eweber-v126-worker-9c1451b4-kgxdq:~# ls -l /dev/sda 
brw-rw---- 1 root disk 8, 0 Mar  7 21:41 /dev/sda

I think the answer is udev rules.

While a Longhorn volume is attaching:

root@eweber-v126-worker-9c1451b4-kgxdq:~# udevadm monitor --property
...
UDEV  [16132.166051] add      /devices/platform/host3/session3/target3:0:0/3:0:0:1/block/sda (block)
ACTION=add
DEVPATH=/devices/platform/host3/session3/target3:0:0/3:0:0:1/block/sda
SUBSYSTEM=block
DEVNAME=/dev/sda
DEVTYPE=disk
DISKSEQ=25
SEQNUM=3980
USEC_INITIALIZED=16132151254
.SYSFS_PATH=/sys/class/scsi_device/3:0:0:1/device
SCSI_TPGS=0
SCSI_TYPE=disk
SCSI_VENDOR=IET
SCSI_VENDOR_ENC=IET\x20\x20\x20\x20\x20
SCSI_MODEL=VIRTUAL-DISK
SCSI_MODEL_ENC=VIRTUAL-DISK\x20\x20\x20\x20
SCSI_REVISION=0001
ID_SCSI=1
ID_SCSI_INQUIRY=1
SCSI_IDENT_SERIAL=beaf11
SCSI_IDENT_LUN_T10=IET_00010001
SCSI_IDENT_LUN_NAA_LOCAL=3000000100000001
SCSI_IDENT_LUN_NAA_REGEXT=60000000000000000e00000000010001
ID_VENDOR=IET
ID_VENDOR_ENC=IET\x20\x20\x20\x20\x20
ID_MODEL=VIRTUAL-DISK
ID_MODEL_ENC=VIRTUAL-DISK\x20\x20\x20\x20
ID_REVISION=0001
ID_TYPE=disk
ID_WWN_WITH_EXTENSION=0x60000000000000000e00000000010001
ID_WWN=0x6000000000000000
ID_BUS=scsi
ID_SERIAL=360000000000000000e00000000010001
ID_SERIAL_SHORT=60000000000000000e00000000010001
ID_SCSI_SERIAL=beaf11
MPATH_SBIN_PATH=/sbin
.SAVED_FM_WAIT_UNTIL=
DM_MULTIPATH_DEVICE_PATH=0
ID_PATH=ip-10.42.2.84:3260-iscsi-iqn.2019-10.io.longhorn:pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82-lun-1
ID_PATH_TAG=ip-10_42_2_84_3260-iscsi-iqn_2019-10_io_longhorn_pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82-lun-1
.ID_FS_TYPE_NEW=
ID_FS_TYPE=
SYSTEMD_WANTS=iscsid.service
MAJOR=8
MINOR=0
DEVLINKS=/dev/disk/by-id/scsi-1IET_00010001 /dev/disk/by-id/scsi-33000000100000001 /dev/disk/by-id/scsi-360000000000000000e00000000010001 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 /dev/disk/by-path/ip-10.42.2.84:3260-iscsi-iqn.2019-10.io.longhorn:pvc-a9ce5e72-faf2-480b-b6e6-6e93e1b26d82-lun-1 /dev/disk/by-id/scsi-SIET_VIRTUAL-DISK_beaf11
TAGS=:systemd:
CURRENT_TAGS=:systemd:
...

Checking the udev rules for the block subsystem, it looks like group disk ownership is automatically handled when creating the block device.

root@eweber-v126-worker-9c1451b4-kgxdq:~# grep -e 'GROUP="disk"' -e SUBSYSTEM=="block" /lib/udev/rules.d/*
/lib/udev/rules.d/50-udev-default.rules:SUBSYSTEM=="block", GROUP="disk"

We do not rely on udev to create the /dev/longhorn/... device node. Instead, we use mknod ourselves. According to its man page, it creates the device node with the group ID of the calling process.

https://man7.org/linux/man-pages/man2/mknod.2.html

So I guess it makes sense that we must change the ownership as you are doing in this PR. If we bind mounted /dev/sda (for example) instead of /dev/longhorn/..., I think we would not.

@ejweber left a comment


LGTM.


shuo-wu commented Mar 7, 2024

Checking the udev rules for the block subsystem, it looks like group disk ownership is automatically handled when creating the block device.

We do not rely on udev to create the /dev/longhorn/... device node. Instead, we use mknod ourselves. According to its man page, it creates the device node with the group ID of the calling process.

Interesting findings! I am wondering if it's possible to use udev rather than mknod; then we wouldn't need to worry about conventions like the disk group.
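For illustration only, if the Longhorn device node were created through udev, a rule along these lines (hypothetical, not shipped by Longhorn; it matches on the longhorn IQN visible in the ID_PATH property from the udevadm output above) could assign the disk group by name:

```
# Hypothetical rule, not shipped by Longhorn: match block devices whose
# ID_PATH contains the longhorn IQN and assign them group "disk" by name.
SUBSYSTEM=="block", ENV{ID_PATH}=="*longhorn*", GROUP="disk"
```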

@PhanLe1010 PhanLe1010 marked this pull request as draft March 8, 2024 00:22
@PhanLe1010

Converting to a draft PR because we also need to change the ownership for the block device created by the v2 engine.


PhanLe1010 commented Mar 8, 2024

Interesting findings! I am wondering if it's possible to use udev rather than mknod; then we wouldn't need to worry about conventions like the disk group.

Hi @shuo-wu, I think even if we use udev, we should still stick with the convention that the Longhorn device has a fixed group ID of 6. This makes it easier for users to define their workload pods: a user adds group 6 and knows that the pod will work on every node it is deployed to, no matter which underlying OS distro is used. We don't have to follow the default value that the OS chooses for the disk group.

If Longhorn instead watched the OS and set the Longhorn device to whatever disk group ID the OS chooses, it would sometimes be 6 (the most common case) and sometimes 493 (SUSE OS). This means a pod with group ID 6 would work when deployed to some nodes and fail on others, which is inconsistent behavior IMO. With that approach, the pod would have to include all possible values (6, 493, ...) to run on every node.

@PhanLe1010 PhanLe1010 marked this pull request as ready for review March 8, 2024 01:28

PhanLe1010 commented Mar 8, 2024

The v2 engine will be fixed at longhorn/go-spdk-helper#86. This PR is ready for merging since the test passed: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6602/

@shuo-wu shuo-wu merged commit 8f2980e into longhorn:master Mar 8, 2024
3 checks passed

innobead commented Mar 8, 2024

If Longhorn instead watched the OS and set the Longhorn device to whatever disk group ID the OS chooses, it would sometimes be 6 (the most common case) and sometimes 493 (SUSE OS). This means a pod with group ID 6 would work when deployed to some nodes and fail on others, which is inconsistent behavior IMO. With that approach, the pod would have to include all possible values (6, 493, ...) to run on every node.

If the disk group ID is different from the default 6, then users need to handle it on their own to add the ID to the SupplementalGroups, right? If yes, we need to document this.


PhanLe1010 commented Mar 8, 2024

If the disk group ID is different from the default 6, then users need to handle it on their own to add the ID to the SupplementalGroups, right? If yes, we need to document this.

@innobead The merged PR doesn't dynamically change the Longhorn device to have the same group ID as the OS's default for the disk group. The PR hard-codes group 6 on the Longhorn device. Therefore, users don't need to worry about finding the default value of the underlying OS and adding it to the pod. Instead, users just need to:

  1. Run the pod with user in group 6
  2. OR run the pod with root user

We will document this point in our docs.


innobead commented Mar 8, 2024

  1. Run the pod with user in group 6

Assuming that 6 is always available, even if the disk group ID is different from 6 on some hosts, right?

@PhanLe1010

Assuming that 6 is always available, even if the disk group ID is different from 6 on some hosts, right?

I think yes, users can always add group 6 to the pod.Spec.SecurityContext regardless of what ID the underlying OS chooses for the disk group name. I will run a sanity test with Longhorn on SUSE (the SUSE OS chooses ID 493 for the disk group) to confirm.


innobead commented Mar 8, 2024

Sounds good. Well done.


PhanLe1010 commented Mar 9, 2024

I think yes, users can always add group 6 to the pod.Spec.SecurityContext regardless of what ID the underlying OS chooses for the disk group name. I will run a sanity test with Longhorn on SUSE (the SUSE OS chooses ID 493 for the disk group) to confirm.

@innobead This is the test result:

  1. SUSE OS info
    ec2-user@phan-v622-block-vol-pool1-a52c31f3-qjstc:~> cat /etc/os-release
    NAME="SLES"
    VERSION="15-SP5"
    VERSION_ID="15.5"
    PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"
    ID="sles"
    ID_LIKE="suse"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:suse:sles:15:sp5"
    DOCUMENTATION_URL="https://documentation.suse.com/"
    
  2. The OS chooses group ID 490 for the disk group name
    ec2-user@phan-v622-block-vol-pool1-a52c31f3-qjstc:~> cat /etc/group
    ...
    dialout:x:491:
    disk:x:490:
    ...
    
  3. Deploy a block PVC and a pod with a non-root user, adding group ID 6 to the container's user
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: longhorn-block-vol
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Block
      storageClassName: longhorn
      resources:
        requests:
          storage: 2Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: block-volume-test
      namespace: default
    spec:
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
        supplementalGroups:
        - 6
      containers:
        - name: block-volume-test
          image: ubuntu:20.04
          command: ["sleep", "360000"]
          imagePullPolicy: IfNotPresent
          volumeDevices:
            - devicePath: /dev/longhorn/testblk
              name: block-vol
      volumes:
        - name: block-vol
          persistentVolumeClaim:
            claimName: longhorn-block-vol
    
  4. Longhorn device has group id 6
    ec2-user@phan-v622-block-vol-pool1-a52c31f3-qjstc:~> ls -l /dev/longhorn/
    total 0
    brw-rw---- 1 root 6 8, 0 Mar  9 03:29 pvc-d0473f41-2ed5-4c7d-905b-4bd0a1281ebe
    
  5. Read/write to the block device is ok.
    I have no name!@block-volume-test:/$ dd if=/dev/urandom of=/dev/longhorn/testblk bs=1M count=1 && dd if=/dev/longhorn/testblk bs=1M count=1 | md5sum 
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00614914 s, 171 MB/s
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0031772 s, 330 MB/s
    b2bc78b33f9c8bfeab553bdebb68712c  -
    
  6. Conclusion: the pod just needs to add group 6 (or run as root) and it will work on any OS
