Implement units information (#144)
* First units working version using std containers

Adds structure to hold units of monitored values alongside
the name
Add new Interface method, get_unit_info(), that fills in
the JSON container with unit information
Dummy calls added to all classes, but real one implemented
for cpumon

* Refactor with parameter class

Add a class dedicated for holding parameter names and units in a
separate header file

Move the definition of the cpu parameters into the class itself

* Add option to specify the clang-format binary

This is useful on systems like Debian where multiple
clang-formats are available

Also move the PYTHON_TEST definition up a
bit in tests to make sure that it is defined when used

* Implement units for cpumon and iomon

Move the definition for monitored components into the headers
for the classes
Use the standard dumper for unit output
Add option for units output into the JSON file

* Clang format

* Update memmon to new parameter scheme

* Add netmon parameters and units

* Added parameters and units for NVIDIA GPUs

* Add countmon and wallmon parameters

* Add empty initializers for member variables

* Add unit test for units

Fail if there are extra or missing parameters comparing
the JSON outputs with the Units field

* Apply clang-format

* Update documentation for monitor classes

* Update units for pure numbers to be "1"; add documentation

Use "1" for pure numerical values (consistent with plotting script).

Add missing documentation on the --units option.

Tidy minor points from the review

Co-authored-by: clang-format <clang-format@github-actions>
graeme-a-stewart and clang-format authored Jul 2, 2020
1 parent 17aebca commit 6b356d6
Showing 24 changed files with 269 additions and 62 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -82,7 +82,7 @@ The `prmon` binary is invoked with the following arguments:

```sh
prmon [--pid PPP] [--filename prmon.txt] [--json-summary prmon.json] \
[--interval 30] [--suppress-hw-info] [--netdev DEV] \
[--interval 30] [--suppress-hw-info] [--units] [--netdev DEV] \
[-- prog arg arg ...]
```

@@ -91,6 +91,7 @@ prmon [--pid PPP] [--filename prmon.txt] [--json-summary prmon.json] \
* `--json-summary` output file for summary data written in JSON format
* `--interval` time, in seconds, between monitoring snapshots
* `--suppress-hw-info` flag that turns off hardware information collection
* `--units` add information on units for each metric to JSON file
* `--netdev` restricts network statistics to one (or more) network devices
* `--` after this argument the following arguments are treated as a program to invoke
and remaining arguments are passed to it; `prmon` will then monitor this process
@@ -104,7 +105,9 @@ In the `filename` output file, plain text with statistics written every

In the `json-summary` file values for the maximum and average statistics
are given in JSON format. This file is rewritten every `interval` seconds
with the current summary values.
with the current summary values. Use the `--units` option to see exactly
which units are used for each metric (the value of `1` for a unit means
it is a pure number).

Monitoring of CPU, I/O and memory is reliably accurate, at least to within
the sampling time. Monitoring of network I/O is **not reliable** unless the
4 changes: 4 additions & 0 deletions cmake/clang-format.cmake
@@ -3,6 +3,10 @@
# Requires clang-format to be available in the
# environment

# Setup the target version of clang-format we will use
set(CLANG_FORMAT "clang-format" CACHE STRING "Clang format binary")
message(STATUS "Setting clang-format test binary to '${CLANG_FORMAT}' (use -DCLANG_FORMAT to change)")

# Get all project files
file(GLOB_RECURSE ALL_SOURCE_FILES *.cpp *.h)

16 changes: 13 additions & 3 deletions doc/ADDING_MONITORS.md
@@ -8,10 +8,16 @@ All of the prmon monitors are concrete implementations of the virtual
`Imonitor.h` interface. Inside the `package/src` directory you will find all of
the current examples and these are an excellent guide.

## `util.h`
## `parameter.h`

Every monitor describes its column headers (*what* it will monitor), so add
short, descriptive names for the columns you will output.
Every monitor describes its column headers (*what* it will monitor) and the
units for this parameter as a vector of `parameter` classes. This is a very
simple class, initialised with three strings:
- name of parameter
- units for maximum/continuous value
- units for average value
If the latter is an empty string, there is no meaningful value
for the average and so nothing is output for it.
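
As a rough illustration of the description above, the following is a minimal sketch of such a three-string class and of how an empty average unit could be skipped when units are collected. Everything here is an assumption made for illustration: the `sketch` namespace, the `unit_table` type (a plain `std::map` standing in for the JSON container) and the `"Max"`/`"Avg"` keys are hypothetical and are not taken from the real `parameter.h`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the parameter class: three strings fixed at
// construction (name, unit for max/continuous value, unit for average).
class parameter {
 public:
  parameter(std::string name, std::string max_unit, std::string avg_unit)
      : name_{std::move(name)},
        max_unit_{std::move(max_unit)},
        avg_unit_{std::move(avg_unit)} {
    assert(!name_.empty());  // a column header needs a non-empty name
  }

  const std::string& get_name() const { return name_; }
  const std::string& get_max_unit() const { return max_unit_; }
  const std::string& get_avg_unit() const { return avg_unit_; }

 private:
  std::string name_;
  std::string max_unit_;
  std::string avg_unit_;
};

using parameter_list = std::vector<parameter>;

// Assumption: a nested map replaces the JSON container so that the sketch
// only needs the standard library. Layout: section -> (name -> unit).
using unit_table = std::map<std::string, std::map<std::string, std::string>>;

// Record the units of every parameter; an empty unit string means
// "no meaningful value", so no entry is written for it.
inline void fill_units(unit_table& units, const parameter_list& params) {
  for (const auto& p : params) {
    if (!p.get_max_unit().empty())
      units["Max"][p.get_name()] = p.get_max_unit();
    if (!p.get_avg_unit().empty())
      units["Avg"][p.get_name()] = p.get_avg_unit();
  }
}

}  // namespace sketch
```

With a list such as `{{"utime", "s", ""}, {"rchar", "B", "B/s"}}`, this sketch records a maximum unit for both parameters but an average unit only for `rchar`.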

## Data Structures

@@ -27,6 +33,10 @@ maximum values and average values. These are updated each update cycle.
Use an RAII pattern, so that on initialisation the monitor is valid and ready
to be used.

For most monitors a vector of monitored quantities is constructed from the
parameter class, as the units are only needed once. Counters are mostly
set to zero as well.
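
The constructor pattern described here might look roughly like the sketch below. The `examplemon` class, its accessors and the simplified `parameter` struct are hypothetical and exist only to show the idea of deriving the name vector from the parameter list once and zeroing the counters; a real monitor also has to implement the full `Imonitor` interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical, simplified parameter: name plus max and average units.
struct parameter {
  std::string name;
  std::string max_unit;
  std::string avg_unit;
  const std::string& get_name() const { return name; }
};

using parameter_list = std::vector<parameter>;

// Hypothetical monitor showing only the construction step.
class examplemon {
 public:
  examplemon() : names{}, stats{}, peak_stats{} {
    // The units are only needed once, so the monitor keeps a plain
    // vector of names for its update and output loops.
    names.reserve(params.size());
    for (const auto& param : params) {
      names.push_back(param.get_name());
      stats[param.get_name()] = 0;       // counters start at zero
      peak_stats[param.get_name()] = 0;
    }
    assert(names.size() == params.size());
  }

  const std::vector<std::string>& get_names() const { return names; }
  unsigned long long get_stat(const std::string& name) const {
    return stats.at(name);
  }

 private:
  // Declared first so it is initialised before the constructor body runs.
  const parameter_list params = {{"rchar", "B", "B/s"},
                                 {"wchar", "B", "B/s"}};
  std::vector<std::string> names;
  std::map<std::string, unsigned long long> stats;
  std::map<std::string, unsigned long long> peak_stats;
};

}  // namespace sketch
```

Because the object is fully populated when the constructor returns, this follows the RAII approach mentioned above: a constructed monitor is immediately usable.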

## Registry

All monitors should use the `REGISTER_MONITOR` macro to register themselves. The
1 change: 1 addition & 0 deletions package/src/Imonitor.h
@@ -30,6 +30,7 @@ class Imonitor {
unsigned long long elapsed_clock_ticks) = 0;

virtual void const get_hardware_info(nlohmann::json& hw_json) = 0;
virtual void const get_unit_info(nlohmann::json& unit_json) = 0;
virtual bool const is_valid() = 0;
};

20 changes: 14 additions & 6 deletions package/src/countmon.cpp
@@ -15,16 +15,19 @@
// Constructor; uses RAII pattern to be valid
// after construction
countmon::countmon()
: count_params{prmon::default_count_params},
: count_params{},
count_stats{},
count_peak_stats{},
count_average_stats{},
count_total_stats{},
iterations{0L} {
for (const auto& count_param : count_params) {
count_stats[count_param] = 0;
count_peak_stats[count_param] = 0;
count_average_stats[count_param] = 0;
count_total_stats[count_param] = 0;
count_params.reserve(params.size());
for (const auto& param : params) {
count_params.push_back(param.get_name());
count_stats[param.get_name()] = 0;
count_peak_stats[param.get_name()] = 0;
count_average_stats[param.get_name()] = 0;
count_total_stats[param.get_name()] = 0;
}
}

@@ -80,3 +83,8 @@ std::map<std::string, double> const countmon::get_json_average_stats(

// Collect related hardware information
void const countmon::get_hardware_info(nlohmann::json& hw_json) { return; }

void const countmon::get_unit_info(nlohmann::json& unit_json) {
prmon::fill_units(unit_json, params);
return;
}
6 changes: 6 additions & 0 deletions package/src/countmon.h
@@ -13,10 +13,15 @@
#include <vector>

#include "Imonitor.h"
#include "parameter.h"
#include "registry.h"

class countmon final : public Imonitor {
private:
// Setup the parameters to monitor here
const prmon::parameter_list params = {{"nprocs", "1", "1"},
{"nthreads", "1", "1"}};

// Which network count parameters to measure and output key names
std::vector<std::string> count_params;

@@ -42,6 +47,7 @@ class countmon final : public Imonitor {

// This is the hardware information getter that runs once
void const get_hardware_info(nlohmann::json& hw_json);
void const get_unit_info(nlohmann::json& unit_json);
bool const is_valid() { return true; }
};
REGISTER_MONITOR(Imonitor, countmon, "Monitor number of processes and threads")
13 changes: 11 additions & 2 deletions package/src/cpumon.cpp
@@ -15,8 +15,12 @@

// Constructor; uses RAII pattern to be valid
// after construction
cpumon::cpumon() : cpu_params{prmon::default_cpu_params}, cpu_stats{} {
for (const auto& cpu_param : cpu_params) cpu_stats[cpu_param] = 0;
cpumon::cpumon() : cpu_params{}, cpu_stats{} {
cpu_params.reserve(params.size());
for (const auto& param : params) {
cpu_params.push_back(param.get_name());
cpu_stats[param.get_name()] = 0;
}
}

void cpumon::update_stats(const std::vector<pid_t>& pids) {
@@ -120,3 +124,8 @@ void const cpumon::get_hardware_info(nlohmann::json& hw_json) {

return;
}

void const cpumon::get_unit_info(nlohmann::json& unit_json) {
prmon::fill_units(unit_json, params);
return;
}
8 changes: 8 additions & 0 deletions package/src/cpumon.h
@@ -13,11 +13,17 @@
#include <vector>

#include "Imonitor.h"
#include "parameter.h"
#include "registry.h"

class cpumon final : public Imonitor {
private:
// Setup the parameters to monitor here
const prmon::parameter_list params = {{"utime", "s", ""}, {"stime", "s", ""}};

// Which network cpu parameters to measure and output key names
// This will be filled at initialisation, taking the names
// from the above params
std::vector<std::string> cpu_params;

// Container for total stats
@@ -36,6 +42,8 @@ class cpumon final : public Imonitor {

// This is the hardware information getter that runs once
void const get_hardware_info(nlohmann::json& hw_json);

void const get_unit_info(nlohmann::json& unit_json);
bool const is_valid() { return true; }
};
REGISTER_MONITOR(Imonitor, cpumon, "Monitor cpu time used")
13 changes: 11 additions & 2 deletions package/src/iomon.cpp
@@ -11,8 +11,12 @@

// Constructor; uses RAII pattern to be valid
// after construction
iomon::iomon() : io_stats{} {
for (const auto& io_param : prmon::default_io_params) io_stats[io_param] = 0;
iomon::iomon() : io_params{}, io_stats{} {
io_params.reserve(params.size());
for (const auto& param : params) {
io_params.push_back(param.get_name());
io_stats[param.get_name()] = 0;
}
}

void iomon::update_stats(const std::vector<pid_t>& pids) {
@@ -57,3 +61,8 @@ std::map<std::string, double> const iomon::get_json_average_stats(

// Collect related hardware information
void const iomon::get_hardware_info(nlohmann::json& hw_json) { return; }

void const iomon::get_unit_info(nlohmann::json& unit_json) {
prmon::fill_units(unit_json, params);
return;
}
8 changes: 8 additions & 0 deletions package/src/iomon.h
@@ -12,10 +12,17 @@
#include <vector>

#include "Imonitor.h"
#include "parameter.h"
#include "registry.h"

class iomon final : public Imonitor {
private:
// Setup the parameters to monitor here
const prmon::parameter_list params = {{"rchar", "B", "B/s"},
{"wchar", "B", "B/s"},
{"read_bytes", "B", "B/s"},
{"write_bytes", "B", "B/s"}};

// Which network io parameters to measure and output key names
std::vector<std::string> io_params;

@@ -35,6 +42,7 @@ class iomon final : public Imonitor {

// This is the hardware information getter that runs once
void const get_hardware_info(nlohmann::json& hw_json);
void const get_unit_info(nlohmann::json& unit_json);
bool const is_valid() { return true; }
};
REGISTER_MONITOR(Imonitor, iomon, "Monitor input and output activity")
25 changes: 19 additions & 6 deletions package/src/memmon.cpp
@@ -16,12 +16,20 @@

// Constructor; uses RAII pattern to be valid
// after construction
memmon::memmon() : mem_params{prmon::default_memory_params}, iterations{0} {
for (const auto& mem_param : mem_params) {
mem_stats[mem_param] = 0;
mem_peak_stats[mem_param] = 0;
mem_average_stats[mem_param] = 0;
mem_total_stats[mem_param] = 0;
memmon::memmon()
: mem_params{},
mem_stats{},
mem_peak_stats{},
mem_average_stats{},
mem_total_stats{},
iterations{0} {
mem_params.reserve(params.size());
for (const auto& param : params) {
mem_params.push_back(param.get_name());
mem_stats[param.get_name()] = 0;
mem_peak_stats[param.get_name()] = 0;
mem_average_stats[param.get_name()] = 0;
mem_total_stats[param.get_name()] = 0;
}
}

@@ -117,3 +125,8 @@ void const memmon::get_hardware_info(nlohmann::json& hw_json) {

return;
}

void const memmon::get_unit_info(nlohmann::json& unit_json) {
prmon::fill_units(unit_json, params);
return;
}
10 changes: 10 additions & 0 deletions package/src/memmon.h
@@ -12,10 +12,19 @@
#include <vector>

#include "Imonitor.h"
#include "parameter.h"
#include "registry.h"

class memmon final : public Imonitor {
private:
// Default parameter list
// const static std::vector<std::string> default_memory_params{"vmem", "pss",
// "rss", "swap"};
const prmon::parameter_list params = {{"vmem", "kB", "kB"},
{"pss", "kB", "kB"},
{"rss", "kB", "kB"},
{"swap", "kB", "kB"}};

// Which network memory parameters to measure and output key names
std::vector<std::string> mem_params;

@@ -41,6 +50,7 @@ class memmon final : public Imonitor {

// This is the hardware information getter that runs once
void const get_hardware_info(nlohmann::json& hw_json);
void const get_unit_info(nlohmann::json& unit_json);
bool const is_valid() { return true; }
};
REGISTER_MONITOR(Imonitor, memmon, "Monitor memory usage")
10 changes: 9 additions & 1 deletion package/src/netmon.cpp
@@ -15,7 +15,10 @@
// network device streams and to take initial values
// to the monitor relative differences
netmon::netmon(std::vector<std::string> netdevs)
: interface_params{prmon::default_network_if_params}, network_if_streams{} {
: interface_params{}, network_if_streams{} {
interface_params.reserve(params.size());
for (const auto& param : params) interface_params.push_back(param.get_name());

if (netdevs.size() == 0) {
monitored_netdevs = get_all_network_devs();
} else {
@@ -111,3 +114,8 @@ std::map<std::string, double> const netmon::get_json_average_stats(

// Collect related hardware information
void const netmon::get_hardware_info(nlohmann::json& hw_json) { return; }

void const netmon::get_unit_info(nlohmann::json& unit_json) {
prmon::fill_units(unit_json, params);
return;
}
11 changes: 9 additions & 2 deletions package/src/netmon.h
@@ -19,12 +19,18 @@
#include <vector>

#include "Imonitor.h"
#include "parameter.h"
#include "registry.h"

class netmon final : public Imonitor {
private:
// Which network interface parameters to measure (in this simple case
// these are also the output key names)
// Setup the parameters to monitor here
const prmon::parameter_list params = {{"rx_bytes", "B", "B/s"},
{"rx_packets", "1", "1/s"},
{"tx_bytes", "B", "B/s"},
{"tx_packets", "1", "1/s"}};

// Vector for network interface parameters to measure (will be constructed)
std::vector<std::string> interface_params;

// Which network interfaces to monitor
@@ -71,6 +77,7 @@ class netmon final : public Imonitor {

// This is the hardware information getter that runs once
void const get_hardware_info(nlohmann::json& hw_json);
void const get_unit_info(nlohmann::json& unit_json);
bool const is_valid() { return true; }
};
REGISTER_MONITOR(Imonitor, netmon, "Monitor network activity (device level)")
21 changes: 14 additions & 7 deletions package/src/nvidiamon.cpp
@@ -18,16 +18,18 @@

// Constructor; uses RAII pattern to be valid after construction
nvidiamon::nvidiamon()
: nvidia_params{prmon::default_nvidia_params},
nvidia_stats{},
: nvidia_stats{},
nvidia_peak_stats{},
nvidia_average_stats{},
nvidia_total_stats{},
iterations{0L} {
for (const auto& nvidia_param : nvidia_params) {
nvidia_stats[nvidia_param] = 0;
nvidia_peak_stats[nvidia_param] = 0;
nvidia_average_stats[nvidia_param] = 0;
nvidia_total_stats[nvidia_param] = 0;
nvidia_params.reserve(params.size());
for (const auto& param : params) {
nvidia_params.push_back(param.get_name());
nvidia_stats[param.get_name()] = 0;
nvidia_peak_stats[param.get_name()] = 0;
nvidia_average_stats[param.get_name()] = 0;
nvidia_total_stats[param.get_name()] = 0;
}

// Attempt to execute nvidia-smi
@@ -206,3 +208,8 @@ void const nvidiamon::get_hardware_info(nlohmann::json& hw_json) {
}
return;
}

void const nvidiamon::get_unit_info(nlohmann::json& unit_json) {
prmon::fill_units(unit_json, params);
return;
}