Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS#10640: Added clarification point to disk partition BM doc #67707

Merged
merged 1 commit into from
Mar 22, 2024

Conversation

dfitzmau
Copy link
Contributor

@dfitzmau dfitzmau commented Nov 10, 2023

OCPBUGS-10640

Version(s):
4.16, 4.15 and 4.14

For 4.13 and 4.12, ensure you do not remove:

For disk sizes larger than 100GB, and especially disk sizes larger than 1TB, create a separate /var partition. See "Creating a separate /var partition" and this link:https://access.redhat.com/solutions/5587281[Red Hat Knowledgebase article] for more information.

Link to docs preview:
Disk partioning

  • SME has approved this change.
  • QE has approved this change.

Additional information:

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 10, 2023
@openshift-ci-robot
Copy link

@dfitzmau: This pull request references Jira Issue OCPBUGS-10640, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Version(s):

Issue:

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 10, 2023
@dfitzmau dfitzmau changed the title OCPBUGS-10640: Added clarification point to disk partition BM doc OCPBUGS#10640: Added clarification point to disk partition BM doc Nov 10, 2023
@openshift-ci-robot openshift-ci-robot removed jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 10, 2023
@openshift-ci-robot
Copy link

@dfitzmau: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

In response to this:

OCPBUGS-10640

Version(s):
4.11 to 4.15

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ocpdocs-previewbot
Copy link

ocpdocs-previewbot commented Nov 10, 2023

🤖 Tue Mar 19 17:09:23 - Prow CI generated the docs preview:
https://67707--ocpdocs-pr.netlify.app

@dfitzmau dfitzmau force-pushed the OCPBUGS-10640 branch 2 times, most recently from 9fbda97 to 77f1bdc Compare November 14, 2023 09:33
@andreaskaris
Copy link

andreaskaris commented Nov 17, 2023

That doesn't quite cut it for me.

Just as a reminder:
nodefs monitors the disk that contains /var/lib/kubelet
imagefs monitors the disk that contains /var/lib/containers

Thus, in the default configuration, nodefs monitors /, imagefs monitors /.

My issues with the doc:

Looking at https://docs.openshift.com/container-platform/4.14/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-advanced_disk_installing-bare-metal

  • " This is officially supported for mounting /var or a subdirectory of /var, such as /var/lib/etcd, on a separate partition, but not both."
    This sentence IMO is problematic. We should at least advise against mounting /var in its entirety. When you mount /var in full to a separate partition, the following happens:
    kubelet monitors imagefs = nodefs = /var. Kubelet stops monitoring the root disk.
    The better scheme is to mount /var/lib/containers to a separate partition, in that case:
    imagefs = /var/lib/containers, nodefs=/, therefore imagefs monitors the container location and nodefs monitors the entire root filesystem including logs, etc

  • "The use of custom partitions could result in those partitions not being monitored by OpenShift Container Platform or alerted on. If you are overriding the default partitioning, see Understanding OpenShift File System Monitoring (eviction conditions) for more information about how OpenShift Container Platform monitors your host file systems."
    The documentation already contains this - correct - warning, I think I missed that when I created https://issues.redhat.com/browse/OCPBUGS-10640. Be it as it may, I would like to see our recommendations keep this in mind and be rewritten in the spirit of this warning (this means instead of highlighting mounting /var, I'd recommend highlighting mounting /var/lib/containers instead).

  • "For disk sizes larger than 100GB, and especially larger than 1TB, create a separate /var partition."
    That advice is misguided, and the same warning appears 2x in the same document (redundant), once with, once without a link to https://access.redhat.com/solutions/5587281
    a) for "large" disk sizes - who considers 100GB large in 2023?
    b) Look at https://access.redhat.com/solutions/5587281: it talks about resizing disks ("The affected Filesystem has been resized (extended)."), and OCP file systems are (nearly?) never resized. If we absolutely want to keep the warning because we're being extra cautious, we should add to it that this only happens with resized disks.
    --> Large partition + xfs + rhel8 + resized = problematic
    --> Large partition + xfs + rhel8 = not problematic
    Investigate if the issue persists in RHEL9, because if not, starting with OCP 4.13 we might drop the warning altogether if RHEL 9 fixed this.

  • "The following procedure sets up a separate /var partition"
    I'd change the entire example to /var/lib/containers instead of /var

@kannon92
Copy link

kannon92 commented Dec 3, 2023

My issues with the doc:

Looking at https://docs.openshift.com/container-platform/4.14/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-advanced_disk_installing-bare-metal

  • " This is officially supported for mounting /var or a subdirectory of /var, such as /var/lib/etcd, on a separate partition, but not both."
    This sentence IMO is problematic. We should at least advise against mounting /var in its entirety. When you mount /var in full to a separate partition, the following happens:
    kubelet monitors imagefs = nodefs = /var. Kubelet stops monitoring the root disk.

I don't know if this is true. I don't think Kubelet will ever stop monitoring the root disk. It will monitor the disk that /var/lib/kubelet is on and consider that the root filesystem. So pods, volumes, ephemeral storage and logs will be on this filesystem.

The better scheme is to mount /var/lib/containers to a separate partition, in that case:
imagefs = /var/lib/containers, nodefs=/, therefore imagefs monitors the container location and nodefs monitors the entire root filesystem including logs, etc

This is correct.

The other issue we should keep in mind is that we may allow users to modify the location for /var/lib/containers. They can change the graphroot in storage.conf. I wrote an upstream blog post about this. Currently we don't really provide support for it but I just want to bring more awareness to this fact.

The blog post I wrote (which is still WIP) is here

@dfitzmau
Copy link
Contributor Author

dfitzmau commented Dec 4, 2023

Thanks, @kannon92 , @andreaskaris , and @djuran . From reading the comments, I'll take it the bullet points under "There are two cases where you might want to override the default partitioning when installing RHCOS on an OpenShift Container Platform cluster node:" are problematic and need to be removed and replaced with suggestive text that points towards mounting /var/lib/containers in a separate partition?

@andreaskaris
Copy link

andreaskaris commented Dec 4, 2023

I don't know if this is true. I don't think Kubelet will ever stop monitoring the root disk. It will monitor the disk that /var/lib/kubelet is on and consider that the root filesystem. So pods, volumes, ephemeral storage and logs will be on this filesystem.

So some of the confusion might be coming from my wording, and if so, I'm sorry about that. What I mean to say is that (and correct me if I'm wrong):

kubelet considers the root filesystem the disk that /var/lib/kubelet is on.
Let's say / is mounted on /dev/sda1, and /var is mounted on /dev/sda2.
In that case, kubelet considers the root filesystem /var (because that's what /var/lib/kubelet is under) and monitors the contents of /dev/sda2 as nodefs. It will also monitor imagefs = /var/lib/containers, but that's also on /var. So kubelet will only monitor /var/, thus /dev/sda2.
The kubelet will no longer monitor the filesystem that's holding /root, /etc/, /usr, etc. assuming that all of these reside on /.

@kannon92
Copy link

kannon92 commented Dec 4, 2023

I don't know if this is true. I don't think Kubelet will ever stop monitoring the root disk. It will monitor the disk that /var/lib/kubelet is on and consider that the root filesystem. So pods, volumes, ephemeral storage and logs will be on this filesystem.

So some of the confusion might be coming from my wording, and if so, I'm sorry about that. What I mean to say is that (and correct me if I'm wrong):

kubelet considers the root filesystem the disk that /var/lib/kubelet is on. Let's say / is mounted on /dev/sda1, and /var is mounted on /dev/sda2. In that case, kubelet considers the root filesystem /var (because that's what /var/lib/kubelet is under) and monitors the contents of /dev/sda2 as nodefs. It will also monitor imagefs = /var/lib/containers, but that's also on /var. So kubelet will only monitor /var/, thus /dev/sda2. The kubelet will no longer monitor the filesystem that's holding /root, /etc/, /usr, etc. assuming that all of these reside on /.

That is all correct.

@openshift-ci openshift-ci bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 5, 2023
@dfitzmau
Copy link
Contributor Author

dfitzmau commented Dec 5, 2023

Hi @andreaskaris and @kannon92 , I updated the PR based on your feedback. I'm not too versed in OCP storage components, so please feel free to suggest better sentences. I assume I should remove the https://access.redhat.com/solutions/5587281 link for 4.14 + docs as it applies to RHEL 8?

As this section is highly confusing its current format, maybe simplifying it would be better?

@dfitzmau
Copy link
Contributor Author

dfitzmau commented Jan 4, 2024

/retest

@dfitzmau dfitzmau force-pushed the OCPBUGS-10640 branch 2 times, most recently from 991a118 to 3995d66 Compare January 4, 2024 12:06
@kcarmichael08 kcarmichael08 added peer-review-in-progress Signifies that the peer review team is reviewing this PR branch/enterprise-4.14 branch/enterprise-4.15 labels Mar 19, 2024
Copy link
Contributor

@kcarmichael08 kcarmichael08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of suggestions. Nice job! (Added labels but wasn't sure about 4.16, so didn't add that one.)

modules/installation-user-infra-machines-advanced.adoc Outdated Show resolved Hide resolved
modules/installation-user-infra-machines-advanced.adoc Outdated Show resolved Hide resolved
@kcarmichael08 kcarmichael08 added peer-review-done Signifies that the peer review team has reviewed this PR and removed peer-review-in-progress Signifies that the peer review team is reviewing this PR labels Mar 19, 2024
@kcarmichael08 kcarmichael08 added this to the Continuous Release milestone Mar 19, 2024
Copy link

openshift-ci bot commented Mar 19, 2024

@dfitzmau: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@gpei
Copy link

gpei commented Mar 22, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 22, 2024
@dfitzmau
Copy link
Contributor Author

/label merge-review-needed

@ShaunaDiaz
Copy link
Contributor

/remove-label merge-review-needed

@ShaunaDiaz ShaunaDiaz merged commit 32acc01 into openshift:main Mar 22, 2024
2 checks passed
@openshift-ci openshift-ci bot removed the merge-review-needed Signifies that the merge review team needs to review this PR label Mar 22, 2024
@ShaunaDiaz
Copy link
Contributor

/cherrypick enterprise-4.14

@ShaunaDiaz
Copy link
Contributor

/cherrypick enterprise-4.15

@ShaunaDiaz
Copy link
Contributor

/cherrypick enterprise-4.16

@openshift-cherrypick-robot

@ShaunaDiaz: new pull request created: #73615

In response to this:

/cherrypick enterprise-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@ShaunaDiaz: new pull request created: #73616

In response to this:

/cherrypick enterprise-4.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@ShaunaDiaz: new pull request created: #73617

In response to this:

/cherrypick enterprise-4.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch/enterprise-4.14 branch/enterprise-4.15 branch/enterprise-4.16 lgtm Indicates that a PR is ready to be merged. peer-review-done Signifies that the peer review team has reviewed this PR size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.