Releases: distributed-system-analysis/pbench
v0.73.0 (agent only)
Release notes TBD.
Installation
The basic installation notes can be found here.
There are no installation changes since the v0.72.0 release, but the release notes for that release describe some important changes. In particular, two pbench repos are needed: the original pbench repo and the version-specific pbench-0.73 repo. The former contains some support RPMs that are needed by every release; the latter contains the RPM of the pbench-agent for the current release. The Ansible playbooks and roles take care of these details, so you are encouraged to use them for installation. Otherwise, please adjust your installation procedures accordingly. Note that you will have to set pbench_copr_repo to pbench-0.73 in your inventory file.
In the case where you cannot or do not want to use Ansible, the two repos can be enabled like this (but you will have to manually install the config file and ssh key file):
dnf copr enable ndokos/pbench
dnf copr enable ndokos/pbench-0.73
The RPM version you should get when installing is 0.73.0-1g7bd2a1a6a.
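The RPM version strings quoted in these notes appear to combine the release version, a build sequence number, and a g-prefixed abbreviated commit hash (here 7bd2a1a6a, which lines up with the top changelog entry 7bd2a1a). That layout is inferred from the strings themselves, not from a documented format. Under that assumption, a minimal Python sketch for checking an installed version might look like this; parse_agent_rpm_version is a hypothetical helper name, not part of pbench:

```python
import re
from typing import NamedTuple


class AgentRpmVersion(NamedTuple):
    version: str   # release version, e.g. "0.73.0"
    sequence: int  # build sequence number, e.g. 1
    commit: str    # abbreviated commit hash, e.g. "7bd2a1a6a"


# Assumed layout: <version>-<sequence>g<commit>, inferred from the release notes.
_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)g([0-9a-f]+)$")


def parse_agent_rpm_version(text: str) -> AgentRpmVersion:
    """Split a version string such as '0.73.0-1g7bd2a1a6a' into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    version, sequence, commit = match.groups()
    return AgentRpmVersion(version, int(sequence), commit)


if __name__ == "__main__":
    parsed = parse_agent_rpm_version("0.73.0-1g7bd2a1a6a")
    # Compare against the abbreviated hash of the first changelog entry.
    print(parsed.version, parsed.commit.startswith("7bd2a1a"))
```

Comparing the parsed commit against the first changelog entry is a quick way to confirm that the installed RPM was built from the expected commit.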
Changelog
The complete changelog for this release (including server and dashboard changes that are not described above) is as follows:
- 7bd2a1a Minor fixes found during release (#3541)
- 6da0cfc Enable CodeQL runs on release branches
- e00389f Version lock redis to <5.0.0 -b0.73 branch (#3526)
- b867548 Remove special handling for PCP installation. We will now get the PCP RPMs from the distro repos instead of from upstream.
- e6f09c7 Add RH CA certs to the containerized Agent base container
- 7e22b8d Fix server tests on b0.73 branch (#3535)
- 1f8ac5b Set agent version (#3517)
- b4ef1dd Handle ENOSPC more cleanly (#3513)
- 9b071de Remove directory from disposition header (#3515)
- 647e9f2 Nits and stuff (#3514)
- 8db79f5 Allow deletion of uploaded relay datasets (#3512)
- b2e56ac Enhance host in Keycloak valid redirects list and TLS cert (#3503)
- c3688cd "Note" non-fatal INTAKE behaviors (#3510)
- 780adca Add pytest-timeout and refactor requirements.txt files (#3509)
- ca36407 Attempt to improve upload performance (#3501)
- 7682537 Add support for Python profiling of Server requests (#3502)
- fac8d5f Support benchmark type fallback (#3507)
- bc7e692 Update the pbench_demo script to use --server (#3508)
- db87083 Fix possible use-before-assign bug
- a97e6c9 Increase Nginx limits for internal responses (#3504)
- 2d0b189 Allow for private CA on the Agent side (#3494)
- 971ed19 Fix some problems with archive-only behavior (#3497)
- e119384 Generate Comparison Charts (#3500)
- abfc995 Fix corruption during CONFLICT upload (#3498)
- 5329c0f Improve client benchmark identification (#3496)
- e82943e Identify benchmark type (#3495)
- 4435b3d Generate charts using Quisby (#3490)
- 5e1ae3c Bug fix in api request (#3493)
- 32c6c5c UI to interact with Relay server (#3484)
- 26b1349 Sweep one missed piece of crontab fluff (#3491)
- 1361bcc Gracefully report duplicate usernames (#3481)
- 0714e72 Remove working containers when commiting a container image (#3488)
- 30ddc0b Fix OIDC access to verify TLS connection (#3487)
- aaf890c crontab exorcism (#3483)
- ec6f42b Provide dataset summary info on upload (#3480)
- 3ab34f9 Gracefully handle connection errors in RELAY API (#3482)
- 60f13e1 Correct OIDC URL generation and raise max payload size (#3486)
- 61be9df Harmonize extract with API logic (#3473)
- b6d63ed Fixes to ansible roles (#3479)
- 9008b1e Add ADMIN roles through config file (#3475)
- 75d4d91 Re-enable the dev dashboard (#3477)
- d6b2ed9 Clarify what tool-specific options are (#3444) (#3462)
- 7d5defa Fix transient "Area 51" failures (#3476)
- 8d54280 Make jenkins/runlocal platform independent (#3474)
- 3b8bf2f Add TLS on Keycloak server (#3427)
- 775da21 Fix handling of labeled hosts by pbench-postprocess-tools (fwd port of #3456) (#3472)
- bfdc613 Add URI to TOC API (#3471)
- 74ce886 Fixes for user-tool (forward port to main of #3440) (#3465)
- 1eebe0f Compare datasets - Integrate Quisby into Pbench Server API (#3470)
- 999f797 PBENCH-1014 Using Tarball.extract in Inventory API for extracting files from tarball (#3105)
- fa225e2 Remove HTTP access to the canned Pbench Server (#3451)
- 68d543d PBENCH-1127 Implementation of Quisby API (#3463)
- d8e6b81 Work around requests version conflict (#3469)
- c2b7472 Add basic relay support to Pbench Agent (#3460)
- b091da3 Fix dashboard logout (#3468)
- 8561ce6 Fix JWT OIDC decode yet again (#3466)
- fb64f6f Add OpenSSL to container build (#3467)
- eb37ffc update readthedocs config and add installation docs (#3436)
- d302664 User Profile Page UI (#3459)
- 12708ed Dust off the README.md and pick some other nits (#3452)
- b99fc77 Disable redis protected mode (#3434) (#3458)
- 4826076 Man pages for pbench-agent commands (#3442)
- b9aebb4 Dashboard API key copy button (#3450)
- c810bf1 Build fail fix (#3455)
- 22ade07 Pbench Dashboard Doc (#3431)
- fd290c4 Add support for SSL access to the Pbench Server
- 1f4087f Adapt to the change from F36 to F38 in the CI base container. (#3449)
- 2b2fbfb Update CI container to Fedora-38 (#3433)
- 6b25590 Specify version number of the dashboard in footer (#3447)
- 1dbad11 Some API documentation fixes (#3441)
- f5a6b12 Touch up Content-Disposition header (#3439)
- f417cc1 Implement POST /api/v1/relay (#3425)
- 6d766a7 Don't report API Keys as errors (#3438)
- d336170 PBENCH-1031 - CacheManager CacheMap (#3396)
- 905fe4e PBENCH-1165 (#3437)
- a2d0f01 Remove Fedora-36 support from the Agent. Update default usage to F38.
- 5b2e9ed PBENCH-1155 (#3430)
- 5f1d2c0 PBENCH-1153 (#3429)
- 64449cc Restructure Readthedocs documentation (#3428)
- 49ffc55 Open firewall ports immediately as well as permanently (#3419)
- e26d925 Dashboard API Key Management (#3423)
- 5d3ac3c Allow keeping functional test datasets (#3418)
- e5d4efa Ansible fixes (#3420)
- 9509d23 GET and DELETE method for managing api_key (#3410)
- ddc3a81 Fix the section name in wait_for_oidc_server function (#3415)
- 10d5704 Unit test on jwt exception type instead of string (#3417)
- 918d435 Update CodeQL scanning - exclude certain Python files - scan only certain Javascript directories
- 54c53f2 Support for dynamic pagination (#3386)
- 0a5a032 Return "raw" API parameters in pagination (#3412)
- 245bb0d Restore flexibility in keycloak-js dependency (#3391)
- 872e79b Get the uri schema from NGINX (#3413)
- 97517b8 Build pbench-tools-kill (#3405)
- 87822ef Unix socket for the nginx and gunicorn communication (#3411)
- 5feaef9 Restore floating Flask / Werkzeug dependency (#3409)
- 0e3badf Check for API key on any failure to validate access token
- 7dc871e Add API Key functional testing (#3401)
- 92fed82 Add invalid algorithm exception check (#3399)
- a9175cd Lock flask and Werkzeug below 2.3.0 for now (#3400)
- 0d5dff1 Add AuthToken.API_KEY SQL ENUM (#3398)
- ded34c1 Add fedora-38 to the default container list for the Agent build
- 6d9c40f Add typed filtering (#3385)
- d1977e2 Avoid Sync message overflow (#3389)
- 2108087 Allow user to change server.deletion date (#3237)
- f3bea9f Retire the previous containerized Agent example from contrib and update the README.md.
- a2ad6c2 Add explicit test_auth dependencies (#3392)
- c74bd4f Quick fix (#3390)
- 5fa129e Reinstate mistakenly deleted test (#3379)
- 473463f Remove pbench-generate-token agent CLI functionality (#3383)
- e6c3a54 Generation of API key on pbench-server (#3368)
- 6c8bb0a Remove directory check, replace with index.html
- 9a57940 Fix various problems in the pbench_results_push tests (#3378)
- 677ae18 Support generic sorting (#3373)
- 6a5b263 Update Nginx log format to v2
- e0dc06d pbench-results-push: logging fixes and better reporting (#3348)
- 0745bb5 Fix a typo in jenkins/runlocal which inadvertently revokes RHEL 9 support
- af615a2 Integrate daterange into datasets API (#3371)
- 82cc01b API documentation improvements (#3365)
- 2914daf Remove crontab (#3369)
- c10bb8a Api (#3367)
- 78dd636 Move set-expiration of Server container to a conditional pipeline stage
- 8d54f43 Sync openid-connect branch with main
- 5c0c0d4 Get OIDC access tokens once the authentication redirect is successful (#3250)
- 16ffcb8 Rework the User table (#3251)
- 06b0718 Enable OIDC redirect in dashboard (#3233)
- 0c92e77 Add OIDC user to the functional test (#3235)
- 65e7d36 Use of oidc configuration in the server unit test as default (#3249)
- 7175eb0 Make the OAuth2 client public in our Keycloak config (#3243)
- b0f3624 Remove unused userinfo and online token validation (#3239)
- c64793f Remove hyphen from openid-connect section of the endpoints api (#3241)
- 4a92493 Fix (and test) some /server/audit query parameters (#3362)
- 114d321 Update nginx JSON to add missing type and host fields
- 8932900 Update metadata (#3361)
- 8518937 Update contrib scripts for Perfconf demo (#3363)
- 2d1e6af Fix /datasets?filter to select across namespaces (#3359)
- 91980b0 Add Prettier formatting check to the Dashboard lint check (#3358)
- afb9bef Support limited non-alphanumeric metadata keys (#3354)
- 97e17ca Gracefully handle duplicate name with unique MD5 (#3355)
- 1dc3a3e Pagination fix (#3344)
- 1f5cd39 Expose aggregate metadata namespace for UI (#3345)
- ab2ff68 Config file changes (#3349)
v0.72.2 (agent only)
This release fixes a problem that was introduced with the release of v5.0.0 of the python redis module. That broke the tool-meister. We "fix" the problem by version-locking the module to its pre-5.0.0 incarnation. A more comprehensive fix is not yet available, so the imminent v0.73 release will use the same version lock "fix".
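The version lock amounts to an upper bound on the installed redis module version. As an illustration only, the check below compares dotted version strings against the 5.0.0 bound named above; satisfies_lock is a hypothetical helper, and real packaging tools also handle pre-release and local version tags, which this sketch does not:

```python
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '4.6' into (4, 6, 0) so that short versions compare correctly."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # pad missing components with zeros
    return tuple(parts)


def satisfies_lock(installed: str, upper_bound: str = "5.0.0") -> bool:
    """Return True when the installed version is strictly below the bound."""
    return parse_version(installed) < parse_version(upper_bound)


if __name__ == "__main__":
    for candidate in ("4.6.0", "5.0.0"):
        print(candidate, satisfies_lock(candidate))
```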
The RPM version you should get when installing is 0.72.2-1g366e5323e.
v0.72.1 (agent only)
This release is a point release for v0.72.0. It fixes one problem that was found just as we were releasing v0.72.0. The problem was that pbench-postprocess-tools was not handling labeled hosts correctly (labels may be attached to hosts during tool registration). See issue #3454 and PR #3456 for a fuller description.
Installation is as described in the v0.72.0 release notes. The version of the pbench-agent RPM for this release is 0.72.1-2gf8fb65d92.
For the rest of the changes, see the v0.72.0 release notes.
v0.72.0 (agent only)
This is a minor release of the Pbench agent. It consists mostly of bug fixes and deletions of deprecated components that were announced previously.
The most visible parts of the changes are summarized below. The full change log can be found at Changelog, but note that most of the 408 commits are not for the agent: the list includes server and dashboard changes which are already incorporated into the current production Pbench server and dashboard.
Installation
The COPR repo names have changed: the pbench-agent RPM is now found in the pbench-0.72 repo. The reason for this change was that COPR does not allow us to keep different versions of the RPM in the same repo: it deleted the older one as soon as a newer one was built. We needed that capability, however, so we chose to go with separate repos for each release.
There are some RPMs that are shared between versions (e.g. pbench-sysstat). We maintain those in the original COPR pbench repo. The upshot is that the user now has to install two COPR repos (if doing it manually). The Ansible roles have been modified to do that, so if you are installing through Ansible, you don’t need to worry about that, except for adding the following line to the [servers:vars] section of the inventory file:
pbench_repo_name = pbench-0.72
The inventory file should look like this:
[servers]
<host1>
<host2>
...
[servers:vars]
pbench_repo_name = pbench-0.72
pbench_key_url = <URL to directory containing the key>
pbench_config_url = <URL to directory containing the config file>
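If you generate inventory files from a script, the layout above can be produced along these lines. This is a minimal sketch: render_inventory is a hypothetical helper, and the host names and URLs in the usage example are placeholders, not real endpoints.

```python
from typing import Iterable


def render_inventory(hosts: Iterable[str], repo_name: str,
                     key_url: str, config_url: str) -> str:
    """Return Ansible INI inventory text using the variable names from these notes."""
    lines = ["[servers]"]
    lines.extend(hosts)  # one host per line
    lines.append("[servers:vars]")
    lines.append(f"pbench_repo_name = {repo_name}")
    lines.append(f"pbench_key_url = {key_url}")
    lines.append(f"pbench_config_url = {config_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder hosts and URLs, for illustration only.
    print(render_inventory(
        ["host1.example.com", "host2.example.com"],
        "pbench-0.72",
        "https://example.com/pbench/keys",
        "https://example.com/pbench/config",
    ), end="")
```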
The version of the pbench-agent RPM for this release is 0.72.0-1g4391fbc01.
Ansible roles
New Ansible roles have been uploaded to Ansible Galaxy, so you will need to update your installation of those roles:
ansible-galaxy collection install pbench.agent -f
In addition, there is no default for pbench_repo_name any longer (see above for an explanation). You will need to set it in your inventory file like this:
...
[servers:vars]
pbench_repo_name = pbench-0.72
pbench_key_url = <URL for ssh key>
pbench_config_url = <URL for config file>
New user-visible utilities
The pbench-tools-kill script has been added. This is a new utility intended to provide complete cleanup if some Tool Meister component refuses to start. This is usually because there are old processes running and keeping network ports busy. The tool cleans up such errant processes so you can start afresh.
Bug fixes and enhancements
The Tool Meister subsystem has undergone a few fixes and some enhancements, primarily in logging and reporting of status; also, the state-signals work was integrated into the Tool Meister (thanks Mustafa Eyceoz!). pbench-linpack has had some fixes (primarily thanks to Lukas Doktor). In addition, pbench-specjbb, pbench-uperf and pbench-fio have had bug fixes.
The user-tool script was broken and that necessitated a few changes to the Tool Meister and also to pbench-postprocess-tools. Thanks to Keith Valin for finding the problem and to Michey Mehta for debugging it.
As usual, if you find problems, please open an issue on GitHub.
Support for latest RHEL and Fedora versions
v0.72.0 supports RHEL 8.8, RHEL 9.2 and Fedora 37 and 38, in addition to the previously supported RHEL versions. Fedora 36 is not supported any longer (primarily because COPR has dropped it).
Deprecation notices and deletions of previously deprecated items
A default tool set was implicitly used by pbench-register-tool-set. It was deprecated in v0.71.0 and is still deprecated in v0.72.0; it will finally go away in the next release, and you will need to choose a tool set explicitly when registering tools. The name for what used to be the default tool set is legacy. In addition, there are light, medium and heavy tool sets. Not supplying an argument for the tool set still produces only a warning, but it will become an error in the next release.
The pbench-generate-token command (see the "Futures" section below) is deprecated and will be deleted in the next release of the agent.
The following have been previously deprecated and have now been deleted: pbench-run-benchmark, pbench-cyclictest, pbench-dbench, pbench-iozone, pbench-migrate, pbench-netperf. In addition, two contributed bench-scripts, pbench-bzt and pbench-mpt, have been deleted.
The stockpile subproject has been removed, as well as the script pbench-avg-stddev, which was unused.
Futures
The following describes some details about future directions. One component of that is containerization. The Pbench server on production is already running in a container. Here we describe the current, experimental version of the containerized agent. The second component is user authentication and ownership of datasets. That is work in progress and what we describe here is what is available in the v0.72.0 release of the Pbench agent and the current production Pbench server.
This section is meant as a foretaste of things to come. We expect most users to continue using RPMs for installing the agent and the pbench-move-results (or pbench-copy-results) utility to upload datasets, just as with previous versions of the Pbench agent.
If you'd like to kick the tires a bit, read on and feel free to experiment, although we recommend that you don't use the following for "real" data. Things should work but that's not guaranteed: if you do venture forth and encounter problems, we would really like to know about them. Thanks in advance!
Containerized Pbench agent
There is an experimental Pbench agent container, intended as a demonstration project, available in the contrib/containerized-pbench directory of the b0.72 branch (the branch that was used to cut this release). The directory contains a README file, a pbench command and a demo script, pbench-demo.
The demo script (which is to be thought of more as "executable documentation" than anything else at this point) uses the pbench command to execute a series of commands inside a container. The first time that the pbench command runs, it realizes that there is no container yet, so it downloads the pbench-agent-all-fedora-36:b0.72 image from quay.io and starts the container. It then executes the first command that it was given inside that container. Subsequent invocations of the pbench command execute their arguments inside that container, first registering tools, then listing the tools, then running a simple fio benchmark under pbench-user-benchmark and finally pushing the results to the configured Pbench server. Although this is a very simple set of commands, it indicates how things would go in a more complicated invocation.
There are a couple of significant caveats: this version of the demo script does NOT use pbench-move-results to send the results to the server (although it could be modified to do so). Instead it passes an authentication token to the pbench-results-move command (see below) to push the results. That token is generated by the pbench-generate-token script, which is invoked at the very beginning of the demo script: that script asks for a user ID and a password and then generates and stores that token in a file (the file is stored in a directory which is mapped into the container from the outside, so the token persists beyond the run of the demo script). That means you have to have a user ID and a password on the Pbench server before generating the token.
To create a user ID with a password, you have to visit the Pbench dashboard and click on Login in the upper right. That will pop up a login/sign up dialog through which you can create an account that will then allow you to generate a token (or log in to the dashboard and look around). N.B. All data sent to the AWS Pbench Satellite or pbench.app.intlab.redhat.com “pass-through” server is owned by a legacy user: it’s all visible, but can’t be modified.
The trouble is that this is a very temporary arrangement: we expect that very soon, you will be able to use Red Hat SSO for logging in and generating the token. The accounts created as above will go away, as will the pbench-generate-token script, which is already deprecated. Any datasets submitted through this mechanism will therefore be orphaned, hence the recommendation to use this to kick the tires, not for storing "real" results that you don't want to lose. We are NOT planning to migrate any such results.
New utilities to upload datasets as an authenticated user
The existing pbench-move-results command works in exactly the same way as before: it uses ssh/scp to copy the results to a pass-through server, but there is no notion of a user owning those results. Although we expect that to continue to be the main mode of operation for users of v0.72.0, we are moving towards a future where users will authenticate using SSO and that authenticated identity will become the owner of the datasets that are submitted by that user. The new commands pbench-results-move and pbench-results-push use an HTTP PUT to send results to a v1.0 Pbench Server which now provides a RESTful API to its services (pbench-results-push is used by the pass-through server to send the results to a Pbench server using a legacy user ID). The new commands will eventually supplant the existing `pbench-move-resu...
Pbench Server release updating v0.69.10
The v0.69.10-server release is a maintenance release for the Pbench Server. It removes the use of the pbench-report-status CLI interface from all the "Background Tasks" (cron jobs) and adds JSON log records for reporting purposes (e.g. see the mmjsonparse module of rsyslog).
v0.71.0 (agent only)
This is a very significant "minor" release of the pbench-agent code base, primarily to deliver the new "Tool Meister" sub-system.
NOTE WELL:
- The notion of a "default" tool set is being deprecated and will be removed in the upcoming Pbench Agent v1.0 release. To replace it, the Pbench Agent is introducing a few named tool sets. See "Default Tool Set is Deprecated; Named tool sets introduced" below.
- All tools registered prior to installing v0.71 must be re-registered; tools registered locally, or remotely, on a host with v0.69 or earlier version of the pbench-agent will be ignored. See "Tool registration kept local to the host where registration happens" below.
This release also delivers:
- Support for RHEL 9 & CentOS Stream 9
- Support of Prometheus and PCP tool data collection
- Independence of Pbench Agent "tool" Scripts
- Removal of gratuitous manipulation of networking firewalls
- Removal of gratuitous software installation, only checks for requirements
  - True for both tools and benchmark convenience script requirements
  - Change to check command versions instead of RPM versions for pbench-fio, pbench-linpack, and pbench-uperf
- The pbench-linpack benchmark convenience script now provides result graphs, JSON data files, and supports execution on one or more local / remote hosts
- Required use of --user with pbench-move-results / pbench-copy-results
- Support for the new HTTP PUT method of posting tar balls
- Removal of the dependency on the SCL (Software Collections Library)
- Dropped support for the pbench-trafficgen benchmark convenience script
- Deprecation announcements for unused benchmark convenience scripts: pbench-run-benchmark, pbench-cyclictest, pbench-dbench, pbench-iozone, pbench-migrate, and pbench-netperf
- Semi-Public CLI Additions, Changes, and Removals
- Many, many, bug fixes and behavioral improvements
You can review the Full ChangeLog on GitHub (all 560+ commits, tags b0.69-bp to v0.71.0), or read a summary with relevant details below.
We did not bump the "major" release version number with these changes because we still don't consider all the necessary functionality in place for such a major version bump.
Note that work on the v0.71 release started in earnest with the v0.69.3-agent release (tagged as b0.69-bp). A number of bug fixes and behaviors from the v0.71 work have already been back-ported and delivered in the various v0.69.* releases since then. These release notes will highlight only the behavioral changes that have not been back-ported previously.
This release supports RHEL 7.9, RHEL 8.6, RHEL 9, CentOS-Stream 8, CentOS-Stream 9 and Fedora 35. For various reasons, it does NOT support earlier versions of RHEL 8 (ansible vs. ansible-core dependency problem), RHEL 9.1 (missing repos) or Fedora 36 (python-3.10 problems). If you need support for any of these, please talk to us: we will do our best to accommodate you in some way, but there is no guarantee.
Installation
There are no installation changes in this release: see the Getting Started Guide for how to install or update.
After installation or update, you should have version 0.71.0-3g85910732a of the pbench-agent RPM installed.
RPMs are available from Fedora COPR, covering Fedora 35 (x86_64 only), EPEL 7, 8, & 9 (x86_64 and aarch64), and CentOS Stream 8 & 9 (x86_64 and aarch64), but please note there are problems with some distros as described above.
There are Ansible playbooks available via Ansible Galaxy to install the pbench-agent, and the pieces needed (key and configuration files) to be able to send results to a Pbench Server. To use the RPMs provided above via COPR with the playbooks, your inventory file needs to include the fedoraproject_username variable set to ndokos, for example:
...
[servers:vars]
fedoraproject_username = ndokos
pbench_repo_name = pbench-test
...
Alternatively, one can specify fedoraproject_username on the command line, rather than having it specified in the inventory file:
ansible-playbook -i <inventory> <playbook> -e '{fedoraproject_username: ndokos}' -e '{pbench_repo_name: pbench-test}'
NOTE WELL: If the inventory file also has a definition for pbench_repo_url_prefix (which was standard practice before fedoraproject_username was introduced), it needs to be deleted, otherwise it will override the default repo URL and the fedoraproject_username change will not take effect.
While we don't include installation instructions for the new node-exporter and dcgm tools in the published documentation, you can find a manual installation procedure for the Prometheus "node_exporter" and references to the Nvidia "DCGM" documentation in the agent/tool-scripts/README.
Container images built using the above RPMs are available in the Pbench organization in the Quay.io container image repository using tags latest, v0.71.0, and 0b7f55850.
Summary of Changes
Default Tool Set is Deprecated; Named tool sets introduced
The notion of a "default" tool set is being deprecated and will be removed in the upcoming Pbench Agent v1.0 release. In preparation for this deprecation, we have added additional named tool sets for users to consider replacing the "default" tool set.
This deprecation announcement is to address the very heavy-weight tools employed by the "default" tool set, including pidstat, proc-interrupts, and perf (aka perf record).
The four named tool sets added are:
- legacy: iostat, mpstat, perf, pidstat, proc-interrupts, proc-vmstat, sar, turbostat (the current "default" tool set)
- light: vmstat
- medium: ${light}, iostat, sar (this will be the new default tool set in Pbench Agent v1.0)
- heavy: ${medium}, perf, pidstat, proc-interrupts, proc-vmstat, turbostat
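The ${light} and ${medium} references mean that each larger tool set includes the smaller one. The sketch below expands those references to show the full list of tools in each set. It is an illustration under assumptions: the TOOL_SETS dictionary simply mirrors the list above rather than the actual syntax of the agent's configuration file, and expand_tool_set is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Mirrors the tool set list in these notes; not read from a real config file.
TOOL_SETS: Dict[str, str] = {
    "legacy": "iostat, mpstat, perf, pidstat, proc-interrupts, "
              "proc-vmstat, sar, turbostat",
    "light": "vmstat",
    "medium": "${light}, iostat, sar",
    "heavy": "${medium}, perf, pidstat, proc-interrupts, proc-vmstat, turbostat",
}

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def expand_tool_set(name: str, tool_sets: Dict[str, str] = TOOL_SETS,
                    _seen: Tuple[str, ...] = ()) -> List[str]:
    """Return the tools in a named set, expanding ${other-set} references."""
    if name in _seen:
        raise ValueError(f"circular tool set reference: {name}")
    tools: List[str] = []
    for item in tool_sets[name].split(","):
        item = item.strip()
        reference = _REFERENCE.fullmatch(item)
        if reference:
            # Recurse into the referenced set, tracking the chain to catch cycles.
            tools.extend(expand_tool_set(reference.group(1), tool_sets,
                                         _seen + (name,)))
        else:
            tools.append(item)
    return tools


if __name__ == "__main__":
    for set_name in ("light", "medium", "heavy"):
        print(set_name, expand_tool_set(set_name))
```

Expanded this way, heavy contains every tool in legacy except mpstat, plus vmstat.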
Users are not required to use the pre-defined tool sets: a user may register whatever tools they like; or, a user may define a custom, named tool set in /opt/pbench-agent/config/pbench-agent.cfg (follow the pattern of the default tool set definitions in /opt/pbench-agent/config/pbench-agent-default.cfg -- note, we don't support modifications to the default configuration file).
In addition to the "default" tool set deprecation, the --toolset option is also deprecated and will be removed with the Pbench Agent v1.0 release. This is due to the fact that a tool set name will also be required going forward with the v1.0 release.
As a reminder, if you are using the "default" tool set, you need to ensure the pbench-sysstat, perf, and kernel-tools (which provides turbostat) RPMs are installed.
Support for RHEL 9 & CentOS Stream 9
Support for RHEL & CentOS Stream 9 is provided in this release.
The New "Tool Meister" Sub-System
The "Tool Meister" sub-system (introduced by PR #1248) is the major piece of functionality delivered with the release of v0.71 of the pbench-agent.
This is a significant change, where the pbench-agent first orchestrates the instantiation of a "Tool Meister" process on all hosts registered with tools, using a Redis instance to coordinate their operation, and the new "Tool Data Sink" process handles the collection of data into the pbench run directory hierarchy. This effectively eliminates all remote SSH operations for individual tools except the initial one per host to create each Tool Meister instance.
One Tool Meister instance is created per registered host, and then a single Tool Data Sink instance is created on the host where the benchmark convenience script is run. The Tool Meister instances are responsible for running the registered tools on their respective host, collecting the data generated as appropriate. The Tool Data Sink is responsible for collecting and storing locally all data sent to it from the deployed Tool Meister instances.
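The resulting topology can be pictured with a small planning function: one Tool Meister per host that has registered tools, and a single Tool Data Sink on the host running the benchmark script. This is a conceptual sketch only; DeploymentPlan and plan_deployment are hypothetical names and do not reflect the agent's actual code. The empty-plan case follows the statement elsewhere in these notes that the sub-system is not deployed when no tools are registered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeploymentPlan:
    # host name -> tools that host's Tool Meister would run
    tool_meisters: Dict[str, List[str]] = field(default_factory=dict)
    # host where the single Tool Data Sink would run, if any
    data_sink_host: Optional[str] = None


def plan_deployment(registered: Dict[str, List[str]],
                    controller: str) -> DeploymentPlan:
    """Plan one Tool Meister per host with tools and one sink on the controller."""
    tool_meisters = {host: sorted(tools)
                     for host, tools in registered.items() if tools}
    if not tool_meisters:
        # No tools registered anywhere: nothing is deployed.
        return DeploymentPlan()
    return DeploymentPlan(tool_meisters, controller)


if __name__ == "__main__":
    # Placeholder host names, for illustration only.
    plan = plan_deployment(
        {"hostA": ["vmstat", "iostat"], "hostB": ["sar"]},
        controller="hostA",
    )
    print(len(plan.tool_meisters), plan.data_sink_host)
```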
User-Controlled Orchestration of "Tool Meister" Sub-System via Container Images
Container images are provided for the constituent components of the Tool Meister sub-system, the Tool Meister image and the Tool Data Sink image. The images allow for the orchestration of the Tool Meister sub-system to be handled by the user instead of automatically by the pbench-agent.
The "Tool Meister" Sub-System with No Tools
While this is not a new feature of the Pbench Agent, it is worth noting that when no tools are registered, the "Tool Meister" sub-system is not deployed and the bench scripts still execute normally.
Tool registration kept local to the host where registration happens
Along with the new "Tool Meister" sub-system comes a subtle, but significant, change to how tools are registered.
Prior to v0.71, tool registration for remote hosts was recorded locally, and also remotely via ssh.
With v0.71, tools are recorded only locally when they are registered and the validation of remote hosts is deferred until the workload is run. During its initialization, the Tool Meister sub-system now reports when registered tools are not present on registered hosts, and, if a tool is not installed, an error message will be displayed, and the "bench-script" will exit with a failure code.
The registered tools are recorded in a local directory off of the "pbench_run" directory, by default /var/lib/pbench-agent/tools-v1-<name>, where <name> is the name of the Tool Group under which the tools were registered.
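As a small illustration of that naming pattern, the helper below builds the directory path for a given Tool Group. The function name tool_group_dir and the group name used in the example are hypothetical; only the default base directory and the tools-v1- prefix come from the text above.

```python
from pathlib import PurePosixPath


def tool_group_dir(group: str,
                   pbench_run: str = "/var/lib/pbench-agent") -> PurePosixPath:
    """Return the directory where tools for the given Tool Group are recorded."""
    return PurePosixPath(pbench_run) / f"tools-v1-{group}"


if __name__ == "__main__":
    # "mygroup" is a placeholder Tool Group name.
    print(tool_group_dir("mygroup"))
```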
...
v0.71.0-beta.0 (agent-only)
This is a very significant "minor" release of the pbench-agent code base, primarily to deliver the new "Tool Meister" sub-system.
It also delivers:
- Support for RHEL 9 & CentOS Stream 9
- Tool registration kept local to the host where registration happens
- Support of Prometheus and PCP tool data collection
- Independence of Pbench Agent "tool" Scripts
- Reduction of the default tool set to iostat, sar, & vmstat tools
- Removal of gratuitous manipulation of networking firewalls
- Removal of gratuitous software installation, only checks for requirements
  - True for both tools and benchmark convenience script requirements
  - Change to check command versions instead of RPM versions for pbench-fio, pbench-linpack, and pbench-uperf
- The pbench-linpack benchmark convenience script now provides result graphs, JSON data files, and supports execution on one or more local / remote hosts
- Required use of --user with pbench-move-results / pbench-copy-results
- Support for the new HTTP PUT method of posting tar balls
- Removal of the dependency on the SCL (Software Collections Library)
- Dropped support for the pbench-trafficgen benchmark convenience script
- Deprecation announcements for unused benchmark convenience scripts: pbench-run-benchmark, pbench-cyclictest, pbench-dbench, pbench-iozone, pbench-migrate, and pbench-netperf
- Semi-Public CLI Additions, Changes, and Removals
- Many, many, bug fixes and behavioral improvements
You can review the Full ChangeLog on GitHub (all 560+ commits, tags b0.69-bp to v0.71.0-beta.0), or read a summary with relevant details below.
We did not bump the "major" release version number with these changes because we still don't consider all the necessary functionality in place for such a major version bump.
Note that work on the v0.71 release started in earnest with the v0.69.3-agent release (tagged as b0.69-bp). A number of bug fixes and behaviors from the v0.71 work have already been back-ported and delivered in the various v0.69.* releases since then. These release notes will highlight only the behavioral changes that have not been back-ported previously.
Installation
There are no installation changes in this release: see the Getting Started Guide for how to install or update.
After installation or update, you should have version 0.71.0-XXgXXXXXXXXX of the pbench-agent RPM installed.
RPMs are available from Fedora COPR, covering Fedora 34, 35, 36, EPEL 7, 8, 9, and CentOS Stream 8 & 9.
There are Ansible playbooks available via Ansible Galaxy to install the pbench-agent, and the pieces needed (key and configuration files) to be able to send results to a Pbench Server. To use the RPMs provided above via COPR with the playbooks, your inventory file needs to include the fedoraproject_username variable set to ndokos, for example:
...
[servers:vars]
fedoraproject_username = ndokos
pbench_repo_name = pbench-test
...
Alternatively, one can specify fedoraproject_username on the command line, rather than having it specified in the inventory file:
ansible-playbook -i <inventory> <playbook> -e '{fedoraproject_username: ndokos}' -e '{pbench_repo_name: pbench-test}'
NOTE WELL: If the inventory file also has a definition for pbench_repo_url_prefix
(which was standard practice before fedoraproject_username
was introduced), it needs to be deleted, otherwise it will override the default repo URL and the fedoraproject_username
change will not take effect.
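Before running the playbooks, it can help to check an inventory file for a leftover pbench_repo_url_prefix definition. The following is a minimal sketch, not part of the pbench playbooks: the function name find_repo_prefix_overrides and the sample inventory contents (including the example.invalid URL) are hypothetical. It only scans for key = value lines such as those found in a [servers:vars] section, and it will not catch a variable set inline on a host line.

```python
def find_repo_prefix_overrides(inventory_text):
    """Return (line_number, line) pairs that define pbench_repo_url_prefix.

    Assumes an INI-style inventory where variables appear one per line
    as "key = value"; comment lines starting with '#' or ';' are skipped.
    """
    hits = []
    for number, line in enumerate(inventory_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue
        key = stripped.split("=", 1)[0].strip()
        if key == "pbench_repo_url_prefix":
            hits.append((number, stripped))
    return hits


# Hypothetical inventory used only for illustration.
sample_inventory = """[servers]
a.example.com

[servers:vars]
fedoraproject_username = ndokos
pbench_repo_name = pbench-test
pbench_repo_url_prefix = https://example.invalid/repo
"""

for number, line in find_repo_prefix_overrides(sample_inventory):
    print(f"line {number}: remove '{line}'")
```

Any line reported by the check is one that would override the default repo URL and should be deleted from the inventory.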
While we don't include installation instructions for the new node-exporter
and dcgm
tools in the published documentation, you can find a manual installation procedure for the Prometheus "node_exporter" and references to the Nvidia "DCGM" documentation in the agent/tool-scripts/README
.
Container images built using the above RPMs are available in the Pbench organization in the Quay.io container image repository using tags beta
, v0.71.0-XX
, and XXXXXXXXX
.
Summary of Changes
Support for RHEL 9 & CentOS Stream 9
Support for RHEL 9 and CentOS Stream 9 is provided in this release.
The New "Tool Meister" Sub-System
The "Tool Meister" sub-system (introduced by PR #1248) is the major piece of functionality delivered with the release of v0.71
of the pbench-agent.
This is a significant change: the pbench-agent first orchestrates the instantiation of a "Tool Meister" process on every host registered with tools, using a Redis instance to coordinate their operation, while the new "Tool Data Sink" process handles the collection of data into the pbench run directory hierarchy. This effectively eliminates all remote SSH operations for individual tools except the initial one per host to create each Tool Meister instance.
One Tool Meister instance is created per registered host, and then a single Tool Data Sink instance is created on the host where the benchmark convenience script is run. The Tool Meister instances are responsible for running the registered tools for that host, collecting the data generated as appropriate. The Tool Data Sink is responsible for collecting and storing locally all data sent to it from the deployed Tool Meister instances.
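The layout described above can be modeled roughly as follows. This is an illustrative sketch only, not pbench-agent code: the names Deployment and plan_deployment, the host names, and the shape of the registration data are all hypothetical. It also assumes that only hosts other than the one running the benchmark script need an SSH launch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    tool_meisters: tuple  # hosts that each get exactly one Tool Meister
    data_sink: str        # host running the single Tool Data Sink
    ssh_launches: tuple   # remote hosts reached by one SSH operation each


def plan_deployment(registered_tools, controller):
    """Model where Tool Meister and Tool Data Sink instances would run.

    registered_tools maps a host name to the list of tools registered
    for it; controller is the host where the benchmark script is run.
    """
    hosts = tuple(sorted(h for h, tools in registered_tools.items() if tools))
    if not hosts:
        # These notes state that the sub-system is not deployed
        # when no tools are registered.
        return None
    remotes = tuple(h for h in hosts if h != controller)
    return Deployment(tool_meisters=hosts, data_sink=controller, ssh_launches=remotes)


plan = plan_deployment(
    {
        "controller.example.com": ["iostat"],
        "a.example.com": ["iostat", "sar"],
        "b.example.com": ["vmstat"],
    },
    controller="controller.example.com",
)
print(plan)
```

In this model the number of SSH operations grows with the number of remote hosts rather than with the number of registered tools, which is the point of the change.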
User Orchestration of "Tool Meister" Sub-System
Container images are provided for the constituent components of the Tool Meister sub-system, the Tool Meister image and the Tool Data Sink image. The images allow for the orchestration of the Tool Meister sub-system to be handled by the user instead of automatically by the pbench-agent.
The "Tool Meister" Sub-System with No Tools
While this is not a new feature of the Pbench Agent, it is worth noting that when no tools are registered, the "Tool Meister" sub-system is not deployed and the bench scripts still execute normally.
All Tool Registration Handled Locally
Along with the new "Tool Meister" sub-system comes a subtle, but significant, change to how tools are registered.
Prior to v0.71, tool registration for remote hosts was recorded locally, and also remotely via ssh.
With v0.71, tools are recorded only locally when they are registered, and the validation of remote hosts is deferred until the workload is run. During its initialization, the Tool Meister sub-system reports any registered tools that are not present on their registered hosts; if a tool is not installed, an error message is displayed and the "bench-script" exits with a failure code.
The registered tools are recorded in a local directory off of the "pbench_run" directory, by default /var/lib/pbench-agent/tools-v1-<name>
, where <name>
is the name of the Tool Group under which the tools were registered.
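As a small illustration of the naming scheme, the directory for a given Tool Group can be derived as shown below. This is a sketch under stated assumptions: the helper name tool_group_dir is hypothetical, and the group name "default" is used only as an example.

```python
from pathlib import PurePosixPath

# Default "pbench_run" directory, as given in these notes.
DEFAULT_PBENCH_RUN = PurePosixPath("/var/lib/pbench-agent")


def tool_group_dir(group, pbench_run=DEFAULT_PBENCH_RUN):
    """Return the local directory that records tools for a Tool Group."""
    return PurePosixPath(pbench_run) / f"tools-v1-{group}"


print(tool_group_dir("default"))  # /var/lib/pbench-agent/tools-v1-default
```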
The process of registering tools on local or remote hosts no longer validates that those tools are available during tool registration. The Tool Meister sub-system now reports when registered tools are not present on registered hosts before beginning a benchmark workload. An error message will be displayed, and the particular "bench-script" will exit with a failure code.
All tools registered prior to installing v0.71
must be re-registered; tools registered locally or remotely on a host with v0.69 or earlier of the pbench-agent
will be ignored.
New Support for Prometheus and PCP-based Tools
The new "Tool Meister" sub-system enables support of Prometheus and PCP-based tools for data collection.
The existing tools supported prior to the v0.71 release can be categorized as "Transient" tools. By transient we mean that a given tool is started immediately before and stopped immediately after the execution of a benchmark workload. For example, when using pbench-fio -b 4,16,32 -t read,write
, the transient tools are started immediately before each fio
job is executed, and stopped immediately following its completion, for each of the six fio
jobs that would be run.
A new category is introduced for Prometheus and PCP called "Persistent" tools. Persistent tools are started once at the beginning of a benchmark convenience script and stopped at its end. Using the previous pbench-fio
example, persistent tools would be started before any of the six fio
jobs begin and would be stopped once all six end.
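The difference between the two categories can be sketched by counting start and stop events for the example above, where three block sizes and two test types yield six fio jobs. This is only a model of the lifecycle, not pbench-agent code: the function names are hypothetical, and the order in which the jobs are iterated is an assumption.

```python
from itertools import product


def fio_jobs(block_sizes, test_types):
    """Return one (test_type, block_size) pair per fio job."""
    return list(product(test_types, block_sizes))


def tool_events(jobs, persistent):
    """List the start/run/stop events seen by one tool across all jobs."""
    events = []
    if persistent:
        events.append("start")      # once, before any job begins
    for test_type, block_size in jobs:
        if not persistent:
            events.append("start")  # immediately before each job
        events.append(f"run {test_type}/{block_size}")
        if not persistent:
            events.append("stop")   # immediately after each job
    if persistent:
        events.append("stop")       # once, after all jobs end
    return events


jobs = fio_jobs(block_sizes=[4, 16, 32], test_types=["read", "write"])
transient = tool_events(jobs, persistent=False)
persistent = tool_events(jobs, persistent=True)

print(len(jobs))                  # 6 jobs
print(transient.count("start"))   # 6 starts for a transient tool
print(persistent.count("start"))  # 1 start for a persistent tool
```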
When persistent tools are used, data is continuously collected from the data sources ("exporters", in the case of Prometheus, and "PMCDs", in the case of PCP) and stored local to the execution of the Tool Data Sink.
Note that for transient tools, where data for the transient tool is collected locally on the host where the tool is registered, the collected data is usually sent to the Tool Data Sink when the benchmark workload finishes, though in some cases the data won't be sent until the very end to avoid impacting the behavior of the benchmark workload (e.g. pbench-specjbb2005
).
Prometheus tools: node-exporter and dcgm
Two new pbench "tools" have been added, node-exporter
and dcgm
. If either or both of these new tools are registered (e.g. via pbench-register-tool --name=node-exporter --remotes=a.example.com
), then the Tool Meister sub-system will run the node_exporter
code on the hosts (in this case, a.example.com
) and a local instance of Prometheus to collect the data. The collected Prometheus data is stored in the pbench result directory as a tar ball at: ${pbench_run}/<script>_<config>_YYYY.MM.DDTHH.mm.ss/tools-<group>/prometheus
.
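The location pattern above can be expanded as in the following sketch. It is not pbench-agent code: the helper name prometheus_data_path and the sample values (script "fio", config "test", group "default", and the timestamp) are hypothetical, and the timestamp layout is read directly from the YYYY.MM.DDTHH.mm.ss pattern.

```python
from datetime import datetime
from pathlib import PurePosixPath


def prometheus_data_path(pbench_run, script, config, started, group):
    """Build the documented location of the collected Prometheus data."""
    stamp = started.strftime("%Y.%m.%dT%H.%M.%S")  # YYYY.MM.DDTHH.mm.ss
    result_dir = f"{script}_{config}_{stamp}"
    return PurePosixPath(pbench_run) / result_dir / f"tools-{group}" / "prometheus"


# All argument values here are examples only.
path = prometheus_data_path(
    "/var/lib/pbench-agent", "fio", "test", datetime(2022, 5, 1, 12, 30, 0), "default"
)
print(path)  # /var/lib/pbench-agent/fio_test_2022.05.01T12.30.00/tools-default/prometheus
```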
...
v0.71.0-alpha.0 (agent-only)
This is a very significant "minor" release of the pbench-agent code base, primarily to deliver the new "Tool Meister" sub-system.
It also delivers:
- Support for RHEL 9 & CentOS 9
- Tool registration kept local to the host where registration happens
- Support of Prometheus and PCP tool data collection
- The default tool set has been reduced to iostat, sar, & vmstat tools
- Removal of gratuitous manipulation of networking firewalls
- Removal of gratuitous software installation, only checks for requirements
- True for both tools and benchmark script requirements
- Change to check command versions instead of RPM versions for pbench-fio, pbench-linpack, and pbench-uperf
- The pbench-linpack benchmark script now provides result graphs, JSON data files, and supports execution on one or more local / remote hosts
- Required use of --user with pbench-move-results / pbench-copy-results
- Support for the new HTTP PUT method of posting tar balls
- Removal of the dependency on the SCL (Software Collections Library)
- Support for pbench-trafficgen benchmark script dropped entirely
- Deprecation announcements for unused benchmark convenience scripts: pbench-run-benchmark, pbench-cyclictest, pbench-dbench, pbench-iozone, pbench-migrate, and pbench-netperf
- Many, many, bug fixes and behavioral improvements
You can review the Full ChangeLog on GitHub (all 550+ commits, tags b0.69-bp
to v0.71.0-alpha.0
), or read a summary with relevant details below.
We did not bump the "major" release version number with these changes because we do not yet consider all the functionality necessary for such a bump to be in place.
Note that work on the v0.71
release started in earnest with the v0.69.3-agent
release (tagged as b0.69-bp
). A number of bug fixes and behaviors from the v0.71
work have already been back-ported and delivered in the various v0.69.*
releases since then. These release notes will highlight only the behavioral changes that have not been back-ported previously.
Installation
There are no installation changes in this release: see the Getting Started Guide for how to install or update.
After installation or update, you should have version 0.71.0-XXgXXXXXXXXX
of the pbench-agent
RPM installed.
RPMs are available from Fedora COPR, covering Fedora 35 and 36, and EPEL 7, 8, and 9.
There are Ansible playbooks available via Ansible Galaxy to install the pbench-agent
, and the pieces needed (key and configuration files) to be able to send results to a server. To use the RPMs provided above via COPR with the playbooks, an inventory file needs to include the fedoraproject_username
variable set to portante
, for example:
...
[servers:vars]
fedoraproject_username = portante
...
Alternatively, one can specify fedoraproject_username
on the command line, rather than having it specified in the inventory file:
ansible-playbook -i <inventory> <playbook> -e '{fedoraproject_username: portante}'
NOTE WELL: If the inventory file also has a definition for pbench_repo_url_prefix
(which was standard practice before fedoraproject_username
was introduced), it needs to be deleted, otherwise it will override the default repo URL and the fedoraproject_username
change will not take effect.
While we don't include installation instructions for the new node-exporter
and dcgm
tools in the published documentation, you can find a manual installation procedure for the Prometheus "node_exporter" and references to the Nvidia "DCGM" documentation in the agent/tool-scripts/README
.
Container images built using the above RPMs are available in the Pbench organization in the Quay.io container image repository using tags beta
, v0.71.0-XX
, and XXXXXXXXX
.
Summary of Changes
Support for RHEL 9 & CentOS 9
Support for RHEL 9 and CentOS 9 is provided in this release. Note that since RHEL 9 has not yet reached general availability (GA), some further changes may still be needed to support it.
The New "Tool Meister" Sub-System
The "Tool Meister" sub-system (introduced by PR #1248) is the major piece of functionality delivered with the release of v0.71
of the pbench-agent.
This is a significant change, where the pbench-agent first orchestrates the instantiation of a "Tool Meister" process on all hosts registered with tools, using a Redis instance to coordinate their operation, and the new "Tool Data Sink" process handles the collection of data into the pbench run directory hierarchy. This effectively eliminates all remote SSH operations for individual tools except one per host to orchestrate the creation of the Tool Meister instance.
One Tool Meister instance is created per registered host, and then a single Tool Data Sink instance is created on the host where the benchmark script is run. The Tool Data Sink is responsible for collecting and storing locally all data sent to it from the deployed Tool Meister instances.
User Orchestration of "Tool Meister" Sub-System
Container images are provided for the constituent components of the Tool Meister sub-system, the Tool Meister image and the Tool Data Sink image. The images allow for the orchestration of the Tool Meister sub-system to be handled by the user instead of automatically by the pbench-agent.
The "Tool Meister" Sub-System with No Tools
While this is not a new feature of the Pbench Agent, it is worth noting that when no tools are registered, the "Tool Meister" sub-system is not deployed and the bench scripts still execute normally.
All Tool Registration Handled Locally
Along with the new "Tool Meister" sub-system comes another subtle, but significant, change to how tools are registered.
With the v0.71 release, the record of which tools are registered on which hosts is kept local to the host on which pbench-register-tool
or pbench-register-tool-set
is invoked.
Prior to v0.71, tool registration for remote hosts was recorded locally, and remotely via ssh.
The registered tools are recorded in a local directory off of the "pbench_run" directory, by default /var/lib/pbench-agent/tools-v1-<name>
, where <name>
is the name of the Tool Group under which the tools were registered.
The process of registering tools on local or remote hosts no longer validates that those tools are available during tool registration. The Tool Meister sub-system now reports when registered tools are not present on registered hosts before beginning a benchmark run. An error message will be displayed, and the particular "bench-script" will exit with a failure code.
All tools registered prior to installing v0.71.0-alpha
must be re-registered; tools registered locally or remotely on a host with v0.69 or earlier of the pbench-agent
will be ignored.
New Support for Prometheus and PCP-based Tools
The new "Tool Meister" sub-system enables support of Prometheus and PCP-based tools for data collection.
The existing tools supported prior to the v0.71 release can be categorized as "Transient" tools. By transient we mean that a given tool is started immediately before and stopped immediately after the execution of a benchmark workload. For example, when using pbench-fio -b 4,16,32 -t read,write
, the transient tools are started immediately before each fio
job is executed, and stopped immediately following its completion, for each of the six fio
jobs that would be run.
A new category is introduced for Prometheus and PCP called "Persistent" tools. Persistent tools are started once at the beginning of a benchmark script and stopped at its end. Using the previous pbench-fio
example, persistent tools would be started before any of the six fio
jobs begin, and would be stopped once all six end.
When persistent tools are used, data is continuously collected from the data sources ("exporters", in the case of Prometheus, and "PMCDs", in the case of PCP) and stored local to the execution of the Tool Data Sink.
Note that for transient tools, where data for the transient tool is collected locally on the host where the tool is registered, the collected data is sent to the Tool Data Sink when the benchmark script determines that doing so will not affect the behavior of the benchmark itself.
Prometheus tools: node-exporter and dcgm
Two new pbench "tools" have been added, node-exporter
and dcgm
. If one registers either or both of these new tools (e.g. via pbench-register-tool --name=node-exporter
), then the Tool Meister sub-system will run the node_exporter
code on the registered hosts, and a local instance of Prometheus to collect the data. The collected Prometheus data is stored in the pbench result directory as a tar ball at: ${pbench_run}/<script>_<config>_YYYY.MM.DDTHH.mm.ss/tools-<group>/prometheus
.
For the duration of the run, the Prometheus instance is available on localhost:9090
if one desires to review the metrics being collected live.
NOTE WELL: like all the other "tools" the pbench-agent
supports, the node-exporter
and dcgm
tools themselves need to be installed separately on the registered hosts.
The new dcgm
tool, an Nvidia-based install, requires Python 2, which might conflict with the Pbench Agent's Python 3 operational requirement in some cases.
The PCP tool
Just like the new Prometheus based tools, you can register "PCP" as a persistent tool using: `pbench-register-tool -...
v0.71.0-qe.02 (agent-only)
Removed package-lock.json
v0.69.10 (agent-only)
This is an agent-only release, changing pbench-trafficgen
to use the bench-trafficgen
repo.
Beyond the pbench-trafficgen
work, we also have fixes for pbench-specjbb
recorded metadata, improved error handling for pbench-move/copy-results
, and documentation for the pbench-clear-tools -r
option.
This release also includes server-side commits which will not be released in an RPM.
What's Changed (Agent)
- Remove errant envars tox setting by @portante in #2419
- Improve error handling for ssh failures from pbench-move/copy-results by @portante in #2432
- Avoid hardcoded tools location on bench-scripts by @portante in #2519
- Backport PR #2522 specjbb mdlog fix by @portante in #2523
- Added missing -r option in pbench-clear-tools by @riya-17 in #2586
- Add option to allow chroot specs for COPR builds (b0.69) by @ndokos in #2621
- Move to perftool-incubator/bench-trafficgen by @portante in #2722
What's Changed (Server)
Server side commits which will not show up in an RPM.
- Add a re-unpack cronjob that avoids indexing by @portante in #2350
- Generate raw statistics from archived tar balls by @portante in #2279
- Add reporting for the last 2 weeks by @portante in #2431
- Pbench put shim server by @riya-17 in #2414
- remove 'test_' prefix from the fixture in conftest by @riya-17 in #2455
- Emit full exception on backup directory failure by @portante in #2563
- Start using syslog identifier by @portante in #2569
- Sort all items in the tarball stats report by @webbnh in #2637
Full Changelog: v0.69.9...v0.69.10