Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transposefs: Only autosave-xfs for much larger filesystems #2565

Merged
merged 1 commit into from
Aug 29, 2023

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Aug 24, 2023

The change in #2320
has been very problematic for OpenShift because our default node
configuration is always over the threshold, and that causes
significant latency on instance provisioning.

Experimentally bumping to 400 allocation groups, which is about 700GiB.
This is comfortably about the default OpenShift node root disk sizes,
and returns us to the prior status quo.

While we're here, rework the logging a bit so that we always
log the agcount for debugging purposes.


@cgwalters cgwalters marked this pull request as draft August 24, 2023 17:12
@cgwalters
Copy link
Member Author

Draft since this could use some more testing

@cgwalters
Copy link
Member Author

@cgwalters
Copy link
Member Author

Here's a script I wrote to analyze xfs AG counts as the system is grown:

#!/bin/bash
set -euo pipefail
initial=$1
shift
grown=$1
shift

runv() {
    echo "$ $@"
    "$@"
} 

tmpd=$(mktemp -d -p /var/tmp)
target="${tmpd}/backing-file"
runv truncate -s "$initial" "$target"
runv mkfs.xfs "${target}"
runv truncate -s "$grown" "$target"
mntpath="${tmpd}/mnt"
mkdir "${mntpath}"
runv mount -o loop -t xfs "${target}" "${mntpath}"
runv xfs_growfs "${mntpath}" 1>/dev/null
runv xfs_info "${mntpath}"
umount "${mntpath}"
rm ${tmpd} -rf

@cgwalters cgwalters changed the title transposefs: Only autosave-xfs 1TiB filesystems transposefs: Only autosave-xfs for much larger filesystems Aug 24, 2023
@cgwalters cgwalters marked this pull request as ready for review August 24, 2023 18:51
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. A few comments

@dustymabe
Copy link
Member

I added the whitespace fixes over in #2566 so you can ignore that CI failure here.

@cgwalters cgwalters force-pushed the bump-transposefs branch 4 times, most recently from 88fc4dc to fafe7cd Compare August 28, 2023 18:48
@dustymabe
Copy link
Member

I added the whitespace fixes over in #2566 so you can ignore that CI failure here.

I broke the whitespace fixes out into #2570

dustymabe
dustymabe previously approved these changes Aug 28, 2023
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cgwalters cgwalters enabled auto-merge (rebase) August 28, 2023 19:41
@cgwalters cgwalters disabled auto-merge August 28, 2023 22:08
@cgwalters cgwalters added the needs-rework/failing-ci Legitimate CI failure label Aug 28, 2023
travier
travier previously approved these changes Aug 29, 2023
The change in coreos#2320
has been very problematic for OpenShift because our default node
configuration is *always* over the threshold, and that causes
significant latency on instance provisioning.

Experimentally bumping to 400 allocation groups, which is about 700GiB.
This is comfortably about the default OpenShift node root disk sizes,
and returns us to the prior status quo.

While we're here, rework the logging a bit so that we *always*
log the `agcount` for debugging purposes.

Also:

- Only log to stdout for normal conditions
- Include the name of the systemd unit in the test description
  so we can cross-reference
- tests: Hoist the expected agcount of 4 to a common variable
@cgwalters cgwalters dismissed stale reviews from travier and dustymabe via c2e4ef8 August 29, 2023 14:39
@cgwalters
Copy link
Member Author

Blah, missed a case of output redirection. Fixed now and cleaned up.

@cgwalters cgwalters enabled auto-merge (rebase) August 29, 2023 14:43
@cgwalters cgwalters removed the needs-rework/failing-ci Legitimate CI failure label Aug 29, 2023
local threshold
threshold=400
if [ "$agcount" -lt "${threshold}" ]; then
echo "autosave-xfs: ${root_part} agcount=$agcount is lower than threshold=${threshold}" >&2
echo 0
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally optional: this return can be dropped now.

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants