Fix duration measurement issues in the client #24344

Open
fmeum wants to merge 1 commit into base: master

@fmeum fmeum commented Nov 15, 2024

Even with `CLOCK_MONOTONIC`, we still occasionally see Bazel servers fail to start up because the start time is larger than the end time. Make this non-fatal by truncating the duration to 0 and emitting a warning with the start and end times to facilitate further investigation.

Also flip the conditions for command and extraction wait time, which previously were only included if *not* known.
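
The truncation described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ClampedDurationMillis` and its parameters are hypothetical names, and the timestamps are assumed to be plain millisecond counts. With unsigned values, a naive `end_ms - start_ms` would wrap around to a huge number when `start_ms` is larger, so the check happens before the subtraction.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical helper (not Bazel's actual code): returns end_ms - start_ms,
// truncating to 0 and logging both timestamps when start_ms > end_ms.
uint64_t ClampedDurationMillis(uint64_t start_ms, uint64_t end_ms,
                               std::ostream& warn = std::cerr) {
  if (end_ms < start_ms) {
    // Include both values in the warning so the anomaly can be investigated.
    warn << "WARNING: start time (" << start_ms
         << " ms) is larger than end time (" << end_ms
         << " ms); reporting a duration of 0\n";
    return 0;
  }
  return end_ms - start_ms;
}

// Usage example: a normal interval and an inverted one.
void Example() {
  assert(ClampedDurationMillis(100, 250) == 150);
  assert(ClampedDurationMillis(250, 100) == 0);  // also prints a warning
}
```

Keeping the inverted case non-fatal trades a slightly wrong metric for a server that still starts, while the logged timestamps preserve the evidence.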

@fmeum fmeum requested a review from meisterT November 15, 2024 16:46
@github-actions github-actions bot added the team-Rules-CPP and awaiting-review labels Nov 15, 2024

fmeum commented Nov 15, 2024

@bazel-io fork 8.0.0

Contributor

@tjgq tjgq left a comment

tjgq commented Nov 15, 2024

High-level comment: I realize that I'm asking for a larger refactor, but I'm wondering if we could simply use std::chrono::duration for DurationMillis, and std::optional<std::chrono::duration> for ExtractionDurationMillis (because we either extracted and have a known duration, or we didn't extract and don't know it; the other two combinations don't appear to be useful).

WDYT?

Ignore. My suggestion doesn't really gel with what you're trying to do. See my other comments instead.
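
For reference, the withdrawn suggestion might have looked roughly like the sketch below. Everything here is an assumption for illustration: `TimingSketch` and `ExtractionMillisOrZero` are hypothetical names, and `std::chrono::milliseconds` is just one concrete instantiation of `std::chrono::duration`. It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical struct, not part of the PR.
struct TimingSketch {
  // Always has a value; 0 when the command did not wait.
  std::chrono::milliseconds command_wait{0};
  // Empty when no extraction happened, so "not extracted" and
  // "unknown duration" are the same state.
  std::optional<std::chrono::milliseconds> extraction;
};

// Converts the optional duration to a plain count, using 0 when absent.
uint64_t ExtractionMillisOrZero(const TimingSketch& t) {
  return t.extraction ? static_cast<uint64_t>(t.extraction->count()) : 0;
}

// Usage example.
void Example() {
  TimingSketch t;
  assert(ExtractionMillisOrZero(t) == 0);
  t.extraction = std::chrono::milliseconds(1500);
  assert(ExtractionMillisOrZero(t) == 1500);
}
```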

@@ -557,14 +557,14 @@ static void AddLoggingArgs(const LoggingInfo &logging_info,

// The time in ms a command had to wait on a busy Blaze server process.
// This is part of startup_time.
- if (command_wait_duration_ms.IsUnknown()) {
+ if (!command_wait_duration_ms.IsUnknown()) {
Contributor

In the spirit of avoiding double negatives (which I suspect might have contributed to the bug), can we call the method IsKnown() instead?

Contributor

Thinking some more about it: if the only consequence of IsUnknown() is that we don't set a --command_wait_time flag, why not set it to 0 in that case? The flags library can't distinguish between 0 and unset anyway.
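
The two ideas in this thread can be compared with a small sketch. It is not the PR's code: the `DurationMillis` below is a simplified stand-in, `IsKnown()` is assumed to mean "differs from `kUnknownDuration`", and both `Add...` helpers are hypothetical. Only the `--command_wait_time` flag name and the `kUnknownDuration = 0` constant come from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for the struct under review (assumption).
struct DurationMillis {
  static constexpr uint64_t kUnknownDuration = 0;
  uint64_t millis = kUnknownDuration;
  // Positive predicate, so call sites read without a double negative.
  bool IsKnown() const { return millis != kUnknownDuration; }
};

// Option 1: only pass the flag when the duration is known.
void AddIfKnown(const DurationMillis& d, std::vector<std::string>* args) {
  if (d.IsKnown()) {
    args->push_back("--command_wait_time=" + std::to_string(d.millis));
  }
}

// Option 2: always pass the flag; an unknown duration is sent as 0.
void AddAlways(const DurationMillis& d, std::vector<std::string>* args) {
  args->push_back("--command_wait_time=" + std::to_string(d.millis));
}

// Usage example.
void Example() {
  std::vector<std::string> args;
  AddIfKnown(DurationMillis{}, &args);    // unknown: nothing added
  assert(args.empty());
  AddIfKnown(DurationMillis{42}, &args);  // known: flag added
  assert(args.back() == "--command_wait_time=42");
}
```

If the receiving side cannot tell 0 from an unset flag, as the comment above states, the two options behave the same from the server's point of view and the second one removes a branch.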

// Value representing that a timing event never occurred or is unknown.
static constexpr uint64_t kUnknownDuration = 0;
};

// DurationMillis that tracks if an archive was extracted.
struct ExtractionDurationMillis : DurationMillis {
Contributor

I wonder if we really need this class: we either did an extraction and know the time it took, or we didn't and we don't; we never inspect archive_extracted, except in tests. Wouldn't a DurationMillis suffice, with "unknown" signifying "not extracted"?

@@ -731,7 +732,7 @@ LockHandle AcquireLock(const std::string& name, const blaze_util::Path& path,
}
}

- *wait_time = elapsed_time;
+ *wait_time = elapsed_time->millis;
Contributor

Returning a DurationMillis through an out-parameter seems less awkward.
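
One possible reading of this suggestion is sketched below: the out-parameter carries the whole `DurationMillis` so the callee does not have to unwrap `->millis`. The struct and `AcquireLockSketch` are hypothetical stand-ins, and the wait is simulated by a parameter; the real `AcquireLock` signature is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the struct under review (assumption).
struct DurationMillis {
  uint64_t millis = 0;
};

// Hypothetical stand-in for a lock acquisition that reports how long it
// waited. The wait itself is simulated by simulated_wait_ms.
bool AcquireLockSketch(uint64_t simulated_wait_ms, DurationMillis* wait_time) {
  DurationMillis elapsed_time{simulated_wait_ms};
  // Assign the struct as a whole; the caller decides how to unwrap it.
  *wait_time = elapsed_time;
  return true;
}

// Usage example.
void Example() {
  DurationMillis wait;
  assert(AcquireLockSketch(7, &wait));
  assert(wait.millis == 7);
}
```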
