Add more DTrace tooling #1585

Merged 2 commits on Dec 6, 2024

@leftwo leftwo commented Dec 6, 2024

Added the get-up-state bash and D scripts for looking at the sled as a whole and producing a high-level summary.
The script gathers selected DTrace stats for 10 seconds, then prints a summary.

In the first example here we have 3 unique propolis-server processes.
We print one line for each PID/session pair (a single PID can have multiple sessions).

  PID  SESSION DS0 DS1 DS2   NEXT_JOB DELTA CONN   ELR   ELC   ERR   ERN
 9972 b5d1cbe7 ACT ACT ACT      65953    12    3     0     0     0     0
12059 69cc7aa8 ACT ACT ACT       2095     0    3     0     0     0     0
12059 924f18ed ACT ACT ACT       1444     0    3     0     0     0     0
12059 d7e7d0fd ACT ACT ACT      30292     0    3     0     0     0     0
12172 74ddab44 ACT ACT ACT     688093    83    3     0     0     0     0
12172 a151673e ACT ACT ACT       2198     0    3     0     0     0     0
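Because a single PID can own several sessions, it can be handy to roll the per-session lines up per process. The following is a minimal sketch, not part of this PR: `sessions_per_pid` is a hypothetical helper name, and it assumes the summary keeps the column order shown above with whitespace-separated fields and no spaces inside a state name.

```shell
# Hypothetical helper (not part of the PR): roll the summary up per PID.
# Assumes column 1 is PID and column 7 is DELTA, as in the table above.
sessions_per_pid() {
    # Only process data rows: first field numeric, all 12 columns present.
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 { n[$1]++; d[$1] += $7 }
         END { for (p in n) printf "%s %d %d\n", p, n[p], d[p] }' | sort -n
}
```

Piping the script's output through `sessions_per_pid` prints one line per PID with its session count and the sum of its DELTA values. The header line is skipped because its first field is not numeric.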

I've hacked together three-letter summaries of the downstairs states.
Not every state has a three-letter summary, but I've captured the common ones.

DELTA is the number of jobs that went through this PID/session in the 10 seconds we were watching.
CONN is the number of times the upstairs has connected to a downstairs (the sum of all client connections).
ELR is the number of extents that have been live repaired.
ELC is the number of extents that were checked during live repair (LR) but needed no repair.
ERR is the number of extents that were reconciled (happens on startup).
ERN is the number of extents that still need to be reconciled.
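Since DELTA counts jobs over the 10-second sampling window, dividing it by the window length gives an approximate jobs-per-second rate. The sketch below illustrates that arithmetic; `job_rate` is a hypothetical helper name, and the column position of DELTA (field 7) is taken from the example tables in this description.

```shell
# Hypothetical helper (not part of the PR): convert DELTA to jobs/second.
# Assumes a 10-second sampling window and DELTA in column 7.
job_rate() {
    awk -v window=10 '$1 ~ /^[0-9]+$/ && NF >= 12 {
        # Print PID, session, and the average job rate over the window.
        printf "%s %s %.1f\n", $1, $2, $7 / window
    }'
}
```

For example, a session with a DELTA of 83 averaged about 8.3 jobs per second while the script was watching.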

Here is another example. In this case you can see that some extents were reconciled when propolis first started.

  PID  SESSION DS0 DS1 DS2   NEXT_JOB DELTA CONN   ELR   ELC   ERR   ERN
 9200 5827dcae ACT ACT ACT      15326     0    3     0     0     0     0
11977 9ab0865f ACT ACT ACT       1309     0    3     0     0     0     0
12595 4878f9f0 ACT ACT ACT      16944     0    3     0     0     0     0
13891 fb840f9f ACT ACT ACT  464968478 38777    3     0     0   400     0
13931 d5613d2e ACT ACT ACT      94948     0    3     0     0    24     0
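To pick out the sessions that did reconciliation work from a longer listing, the ERR and ERN columns can be filtered directly. This is a sketch under the same assumptions as the table layout above (ERR in field 11, ERN in field 12); `reconciled_sessions` is a hypothetical name, not something this PR adds.

```shell
# Hypothetical helper (not part of the PR): show sessions that have
# reconciled extents (ERR > 0) or still have extents to reconcile (ERN > 0).
reconciled_sessions() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 && ($11 > 0 || $12 > 0) {
        # Print PID, session, ERR, ERN.
        print $1, $2, $11, $12
    }'
}
```

Fed the example above, this keeps only the lines for PIDs 13891 and 13931.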

This status script found the #1579 bug.
Updated upstairs_count.d to include barrier operations.

Alan Hanson added 2 commits December 6, 2024 07:18
Added the get-up-state bash and d script for looking at the
sled overall and producing a high level summary.

Updated upstairs_count.d to include barrier operations.
@leftwo leftwo merged commit 2925f29 into main Dec 6, 2024
16 checks passed
@leftwo leftwo deleted the alan/dtrace-up-state branch December 6, 2024 17:19