CMS: probe checking the number and volume of rules per rse, activity and state #151
# - The information is retrieved using the following tables in the specified order: rules -> contents -> dids.
# - For these states volume information can be retrieved from contents and dids tables.
# Although, only rules with container dids are counted. If a rule is created using a did of a different type it will be ignored.
# - For rules in this state, the information about the destination rse is (only maybe?) available at the rules table (ReplicationRule) under rse_expression.
Please break up this and all the other long lines.
Thanks to @ericvaandering for the feedback.
I’ve reformatted the comments by breaking up the long lines. Please let me know if anything else is needed.
# as the DB will need to update this index as well.
# - For this reason, handling rules of these states is better implemented using HDFS dumps.

rse = case((models.ReplicationRule.rse_expression.startswith("rse="), func.substr(models.ReplicationRule.rse_expression, 5, func.length(models.ReplicationRule.rse_expression) - 4)), else_=models.ReplicationRule.rse_expression).label("rse")
Wrap this too. Here the parenthesis give you an easy way to do it.
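For readers following the thread, the `case` expression above strips a leading `rse=` from `rse_expression` and otherwise keeps the expression unchanged. The sketch below reproduces that logic in plain SQL through the standard-library `sqlite3` module, so it runs without Rucio or SQLAlchemy. The `rules` table layout and the sample RSE names are assumptions made for illustration only, and SQLite's `LIKE` is case-insensitive for ASCII by default, which may differ from the production database.

```python
import sqlite3

# Hypothetical in-memory table standing in for the rules table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (rse_expression TEXT)")
conn.executemany(
    "INSERT INTO rules VALUES (?)",
    [
        ("rse=T2_XX_Example",),   # prefixed form: the prefix is stripped
        ("T1_YY_Example",),       # plain RSE name: kept as-is
        ("tier=2&country=XX",),   # other expressions: kept as-is
    ],
)

# Same idea as the case(...) expression in the diff: substr is 1-indexed,
# so position 5 is the first character after "rse=".
query = """
    SELECT rse_expression,
           CASE
               WHEN rse_expression LIKE 'rse=%'
               THEN substr(rse_expression, 5, length(rse_expression) - 4)
               ELSE rse_expression
           END AS rse
    FROM rules
    ORDER BY rowid
"""

resolved = [rse for _, rse in conn.execute(query)]
print(resolved)
```

Note that any expression not starting with `rse=` is passed through untouched, so a multi-RSE expression ends up as its own label value.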

with PrometheusPusher() as manager:
    for rse, activity, state, count, volume in query.all():
        manager.gauge(name='rule_count_per_rse_activity_state.{rse}.{activity}.{state}', documentation='Number of rules in a given rse, activity and state').labels(rse=rse, activity=activity, state=state).set(count)
And these two
with PrometheusPusher() as manager:
    for rse, activity, state, count, volume in query.all():
        # print(rse, activity, state.name, count, volume)
        manager.gauge(name='rule_count_per_rse_activity_state.{rse}.{activity}.{state}', documentation='Number of rules in a given rse, activity and state').labels(rse=rse, activity=activity, state=state.name).set(count)
And these two
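One possible way to wrap the gauge call is to put the whole chain inside parentheses and break before each method. The sketch below is only an illustration of that layout: `StubPusher` and `StubGauge` are hypothetical stand-ins that mimic the `gauge(...).labels(...).set(...)` chain seen in the diff, not the real `PrometheusPusher`, and the sample row is made up.

```python
class StubGauge:
    """Hypothetical stand-in that records the last value set per label set."""

    def __init__(self, store, name):
        self._store = store
        self._name = name
        self._labels = {}

    def labels(self, **labels):
        self._labels = labels
        return self

    def set(self, value):
        key = (self._name, tuple(sorted(self._labels.items())))
        self._store[key] = value


class StubPusher:
    """Hypothetical stand-in for PrometheusPusher; it pushes nothing."""

    def __init__(self):
        self.values = {}

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def gauge(self, name, documentation=""):
        return StubGauge(self.values, name)


# Made-up row in the same shape as the query result in the diff.
rows = [("T2_XX_Example", "User Subscriptions", "OK", 3, 1500)]

with StubPusher() as manager:
    for rse, activity, state, count, volume in rows:
        # The parentheses allow one call per line with no backslashes.
        (
            manager.gauge(
                name="rule_count_per_rse_activity_state.{rse}.{activity}.{state}",
                documentation="Number of rules in a given rse, activity and state",
            )
            .labels(rse=rse, activity=activity, state=state)
            .set(count)
        )

print(manager.values)
```

The volume gauge could be wrapped the same way.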
Related to dmwm/CMSRucio#747
The script acts on rules with the following states:
- OK
- REPLICATING
- STUCK

The script pushes the following metrics by using `PrometheusPusher`:
- `rule_count_per_rse_activity_state`: The number of rules per `rse`, `activity` and `state`
- `rule_volume_per_rse_activity_state`: The total size (in bytes) of rules per `rse`, `activity` and `state`
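As a rough illustration of what the two metrics represent, the sketch below groups a handful of rules by `(rse, activity, state)` and computes the count and total bytes for each group. It is an in-memory stand-in, not the probe's code: the actual script does the grouping in the database, and the record layout, the `aggregate` helper, and the sample values here are all assumptions.

```python
from collections import defaultdict

# Hypothetical rule records; in the probe these come from a DB query.
rules = [
    {"rse": "T2_XX_Example", "activity": "User Subscriptions", "state": "OK", "bytes": 1000},
    {"rse": "T2_XX_Example", "activity": "User Subscriptions", "state": "OK", "bytes": 500},
    {"rse": "T2_XX_Example", "activity": "User Subscriptions", "state": "STUCK", "bytes": 200},
    {"rse": "T1_YY_Example", "activity": "Production Output", "state": "REPLICATING", "bytes": 4000},
]


def aggregate(records):
    """Return {(rse, activity, state): (rule_count, total_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        key = (record["rse"], record["activity"], record["state"])
        totals[key][0] += 1                 # rule_count_per_rse_activity_state
        totals[key][1] += record["bytes"]   # rule_volume_per_rse_activity_state
    return {key: tuple(value) for key, value in totals.items()}


metrics = aggregate(rules)
for (rse, activity, state), (count, volume) in sorted(metrics.items()):
    print(rse, activity, state, count, volume)
```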
Tested it on Integration and it works as far as I can see.
More details about the implementation are commented in the script.
FYI @haozturk