CMS: probe checking the number and volume of rules per rse, activity and state #151

Open
wants to merge 2 commits into
base: master
from

@eachristgr eachristgr commented Nov 8, 2024

Related to dmwm/CMSRucio#747

The script acts on rules with the following states:

  • OK
  • REPLICATING
  • STUCK

The script pushes the following metrics by using PrometheusPusher:

  • rule_count_per_rse_activity_state: The number of rules per rse, activity and state
  • rule_volume_per_rse_activity_state: The total size (in bytes) of rules per rse, activity and state
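To illustrate what the two metrics represent, here is a minimal sketch of the grouping in plain Python. It is not the probe's actual query (that runs against the Rucio database); the sample rows, RSE names, and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (rse, activity, state, size_in_bytes)
rules = [
    ("T2_CH_CERN", "User Subscriptions", "OK", 1000),
    ("T2_CH_CERN", "User Subscriptions", "OK", 500),
    ("T1_US_FNAL_Disk", "Production Output", "STUCK", 200),
]


def aggregate(rows):
    """Return (count, volume) dicts keyed by (rse, activity, state)."""
    count = defaultdict(int)
    volume = defaultdict(int)
    for rse, activity, state, size in rows:
        key = (rse, activity, state)
        count[key] += 1      # feeds rule_count_per_rse_activity_state
        volume[key] += size  # feeds rule_volume_per_rse_activity_state
    return dict(count), dict(volume)


rule_count, rule_volume = aggregate(rules)
```

Each key in the two dictionaries corresponds to one label combination pushed as a gauge.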

Tested it on Integration and it works as far as I can see.
More details about the implementation are commented in the script.

FYI @haozturk

eachristgr added a commit to eachristgr/CMSRucio that referenced this pull request Nov 11, 2024
# - The information is retrieved using the following tables in the specified order: rules -> contents -> dids.
# - For these states, volume information can be retrieved from the contents and dids tables.
# However, only rules with container DIDs are counted. A rule created using a DID of a different type is ignored.
# - For rules in this state, the information about the destination RSE is available (possibly only) in the rules table (ReplicationRule) under rse_expression.
Contributor

Please break up this and all the other long lines.

Author

Thanks to @ericvaandering for the feedback.
I’ve reformatted the comments by breaking up the long lines. Please let me know if anything else is needed.

ericvaandering added a commit to dmwm/CMSRucio that referenced this pull request Nov 27, 2024
@eachristgr eachristgr added the CMS label Dec 2, 2024
# as the DB will need to update this index as well.
# - For this reason, handling rules of these states is better implemented using HDFS dumps.

rse = case((models.ReplicationRule.rse_expression.startswith("rse="), func.substr(models.ReplicationRule.rse_expression, 5, func.length(models.ReplicationRule.rse_expression) - 4)), else_=models.ReplicationRule.rse_expression).label("rse")
Contributor

Wrap this too. Here the parenthesis give you an easy way to do it.
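For readers following the logic rather than the formatting: the `case(...)` expression above strips a leading `rse=` from `rse_expression` and otherwise leaves the expression unchanged. A plain-Python equivalent, as a sketch only (the function name `strip_rse_prefix` and the example values are hypothetical and not part of the PR):

```python
def strip_rse_prefix(rse_expression: str) -> str:
    """Mirror the SQL CASE: drop a leading 'rse=' if present."""
    prefix = "rse="
    if rse_expression.startswith(prefix):
        # SQL substr(expr, 5, length(expr) - 4) is 1-indexed,
        # so it corresponds to the Python slice expr[4:].
        return rse_expression[len(prefix):]
    # Any other expression is returned as-is.
    return rse_expression
```

Expressions that do not start with `rse=` (for example a tier or tag expression) are passed through untouched, so they end up as their own label value.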


with PrometheusPusher() as manager:
    for rse, activity, state, count, volume in query.all():
        manager.gauge(name='rule_count_per_rse_activity_state.{rse}.{activity}.{state}', documentation='Number of rules in a given rse, activity and state').labels(rse=rse, activity=activity, state=state).set(count)
Contributor

And these two
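One possible way to wrap such a chained call is to put the whole chain in parentheses and break before each method. The sketch below only demonstrates the line-wrapping style: `StubPusher` and `StubGauge` are hypothetical stand-ins written for this example so that it runs on its own, and the sample row is made up. They are not the real `PrometheusPusher`.

```python
class StubGauge:
    """Hypothetical stand-in: records the last value set per label set."""

    def __init__(self, name):
        self.name = name
        self.values = {}
        self._key = None

    def labels(self, **labels):
        self._key = tuple(sorted(labels.items()))
        return self

    def set(self, value):
        self.values[self._key] = value


class StubPusher:
    """Hypothetical stand-in used as a context manager, like the original."""

    def __init__(self):
        self.gauges = {}

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def gauge(self, name, documentation):
        if name not in self.gauges:
            self.gauges[name] = StubGauge(name)
        return self.gauges[name]


# Made-up row: (rse, activity, state, count, volume)
rows = [("T2_CH_CERN", "User Subscriptions", "OK", 3, 1500)]

COUNT_NAME = "rule_count_per_rse_activity_state.{rse}.{activity}.{state}"

with StubPusher() as manager:
    for rse, activity, state, count, volume in rows:
        (
            manager.gauge(
                name=COUNT_NAME,
                documentation="Number of rules in a given rse, activity and state",
            )
            .labels(rse=rse, activity=activity, state=state)
            .set(count)
        )
```

The outer parentheses let each step of the chain sit on its own line without backslash continuations.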

with PrometheusPusher() as manager:
    for rse, activity, state, count, volume in query.all():
        # print(rse, activity, state.name, count, volume)
        manager.gauge(name='rule_count_per_rse_activity_state.{rse}.{activity}.{state}', documentation='Number of rules in a given rse, activity and state').labels(rse=rse, activity=activity, state=state.name).set(count)
Contributor

And these two

2 participants