Community calls
Vasileios Karakasis edited this page Apr 5, 2022
This page holds (temporarily) the agenda and minutes of the bi-weekly community conference calls.
Development updates
- The OSU microbenchmarks library tests have been merged
- Almost done with the extended syntax of `valid_systems` and `valid_prog_environs` (https://github.com/eth-cscs/reframe/pull/2479)
- We had to reimplement how valid systems/environments are selected in order to make it work with fixtures
- The implementation also fixes the bug with the `--skip-system-check` and `--skip-prgenv-check` options when using fixtures.
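
The notes do not spell out how the selection works. As a rough mental model only, `valid_systems` and `valid_prog_environs` can be read as filters over every (partition, environment) pair of the current system. The sketch below assumes plain shell-style wildcards; the system layout and the `select_combinations` helper are hypothetical and do not reproduce ReFrame's actual implementation (the part that had to be reworked for fixtures).

```python
from fnmatch import fnmatchcase

# Hypothetical system layout: partition name -> programming environments it offers.
PARTITIONS = {
    "login": ["builtin", "gnu"],
    "gpu": ["gnu", "cray", "nvidia"],
    "mc": ["gnu", "cray"],
}


def _matches(name, patterns):
    """Return True if `name` matches any shell-style wildcard pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def select_combinations(valid_systems, valid_prog_environs,
                        system="mysys", partitions=PARTITIONS):
    """Return the (system:partition, environment) pairs a test would run on.

    A partition is kept if its full 'system:partition' name matches a pattern
    in `valid_systems` (a bare system name selects all of its partitions);
    each of its environments is then kept if it matches `valid_prog_environs`.
    """
    selected = []
    for part, environs in partitions.items():
        fullname = f"{system}:{part}"
        if system not in valid_systems and not _matches(fullname, valid_systems):
            continue
        for env in environs:
            if _matches(env, valid_prog_environs):
                selected.append((fullname, env))
    return selected
```

For example, `select_combinations(["*:gpu"], ["gnu", "cray"])` returns `[("mysys:gpu", "gnu"), ("mysys:gpu", "cray")]`.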
- Still WIP: Distributing a set of tests over multiple nodes (https://github.com/eth-cscs/reframe/pull/2458)
- Vasileios Karakasis (CSCS)
- Victor Holanda (CSCS)
- Theofilos Manitaras (CSCS)
- Simon Branford (Univ. of Birmingham)
- We will delay 3.11.0 by two weeks (work stalled due to the team's limited availability), but a release candidate will be published today.
- Draft PRs
- Syntax extensions for `valid_systems` and `valid_prog_environs`: https://github.com/eth-cscs/reframe/pull/2479
- OSU microbenchmarks library test (https://github.com/eth-cscs/reframe/pull/2421)
- Still requires a bit of fine tuning, but it will soon be ready to merge.
- Generating node-pinned tests (https://github.com/eth-cscs/reframe/pull/2458)
- We needed to address some limitations in how tests can be generated dynamically
- https://github.com/eth-cscs/reframe/pull/2470
- https://github.com/eth-cscs/reframe/pull/2474
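
The linked PRs cover the actual mechanism in ReFrame. In plain Python, the idea of a node-pinned test can be sketched as generating one subclass per node with the built-in `type()`. `SingleNodeCheck`, `make_node_pinned_tests` and the use of Slurm's `--nodelist` option are illustrative assumptions, not ReFrame's API.

```python
class SingleNodeCheck:
    """Hypothetical stand-in for a single-node test."""

    node = None  # hostname this variant is pinned to

    def job_options(self):
        # Slurm-style node pinning, assumed here purely for illustration.
        return [f"--nodelist={self.node}"]


def make_node_pinned_tests(base, nodes):
    """Create one subclass of `base` per node, each pinned to that node."""
    variants = []
    for node in nodes:
        # Derive a valid class name, e.g. 'SingleNodeCheck_nid001'.
        name = f"{base.__name__}_{node.replace('-', '_')}"
        variants.append(type(name, (base,), {"node": node}))
    return variants
```

Because each variant is a distinct class, a test loader that discovers classes would see one independent test per node.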
- Vasileios Karakasis (CSCS)
- Theofilos Manitaras (CSCS)
- Eirini Koutsaniti (CSCS)
- Jg Piccinali (CSCS)
- Kenneth Hoste (HPC-UGent)
- Åke Sandgren (Umeå Univ.)
- Rafael Sarmiento (CSCS)
- Carlos Rosales (Amazon)
- Richard Henwood (Arm)
- Simon Branford (Univ. of Birmingham)
- We will skip 3.10.2 and target 3.11.0 for March 22, with two dev releases in between.
- Bug fixes
- Fixed weird behaviour when overriding hooks within the same test (https://github.com/eth-cscs/reframe/pull/2436)
- Fixed sub-configuration selection when running tests (https://github.com/eth-cscs/reframe/pull/2438)
- Do not set up Spack shell support (https://github.com/eth-cscs/reframe/pull/2424)
- Enhancements
- Control which attributes, variables or parameters can be logged (https://github.com/eth-cscs/reframe/pull/2428); the current behaviour can cause problems with Logstash, resulting in lost records.
- Remove pipeline timings from output.
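
The minutes do not describe the new logging controls or the exact Logstash failure. Assuming (the notes do not say so) that records get lost when some attribute cannot be encoded by a JSON-based handler, an opt-in list of loggable attributes could look like the sketch below. `make_log_record` and its `loggable` argument are hypothetical; ReFrame's real logging configuration may differ.

```python
import json


def make_log_record(test, loggable):
    """Build a JSON log record containing only the attributes in `loggable`.

    Missing attributes are skipped; values that JSON cannot encode are
    converted to strings, so one bad value cannot spoil the whole record.
    """
    record = {}
    for name in loggable:
        if not hasattr(test, name):
            continue
        value = getattr(test, name)
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            value = str(value)
        record[name] = value
    return json.dumps(record, sort_keys=True)
```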
- OSU library test and the associated CSCS tests PR (under review): https://github.com/eth-cscs/reframe/pull/2421
- Next sprint: https://github.com/eth-cscs/reframe/milestone/76
- Community feedback
- Extension of the `valid_systems` and `valid_prog_environs` syntax is still work in progress. What if we supported basic compiler abstractions here, as in Spack?
- Vasileios: There are no plans for compiler auto-detection and auto-generation of the `environments` configuration section.
- Kenneth: this could quickly become a time-consuming task, since compiler versions etc. are also relevant
- Kenneth: this seems like an opportunity for a common Python library that could be leveraged by ReFrame, Spack, EasyBuild, ...
- kind of similar to `archspec` (cf. the `-mtune` & co. options that `archspec` knows about), but compiler flags for OpenMP are out of scope there...
- Richard: Delegate the compilation task fully onto Spack and use the compiler info to generate the ReFrame config on-the-fly. Then ReFrame tests are monkey-patched to parametrise them over the various specs.
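
As a sketch of the kind of abstraction Kenneth suggests, a shared library could expose a lookup from compiler family to OpenMP flag, the piece that `archspec` leaves out. The table and the `openmp_flags` function below are hypothetical and ignore compiler versions entirely, which is precisely the complication raised above.

```python
# Hypothetical mapping from compiler family to its OpenMP flag (versions ignored).
OPENMP_FLAGS = {
    "gcc": "-fopenmp",
    "clang": "-fopenmp",
    "intel": "-qopenmp",  # classic Intel compilers (icc/icpc/ifort)
    "nvhpc": "-mp",       # NVIDIA HPC SDK (formerly PGI)
}


def openmp_flags(compiler_family):
    """Return the OpenMP flag list for a compiler family, or raise if unknown."""
    try:
        return [OPENMP_FLAGS[compiler_family]]
    except KeyError:
        raise ValueError(f"no OpenMP flag known for {compiler_family!r}") from None
```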
- Use cases of running a test session continuously until a time limit is reached: https://github.com/eth-cscs/reframe/issues/619
- could be used for burn-in testing, simulating user workloads, ...
- also related to exploring the range of combinations for multi-node tests, since often not enough tests are generated to actually fill a system
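
A time-bounded session loop, as requested in issue 619, could look roughly like this. `run_until` and the `run_session` callback are hypothetical stand-ins for a full ReFrame run; the clock is injected so the loop can be tested without actually waiting.

```python
import time


def run_until(deadline_s, run_session, clock=time.monotonic):
    """Run test sessions back to back until `deadline_s` seconds have elapsed.

    A session that starts before the deadline is allowed to finish, so the
    total wall time may overshoot the limit by up to one session.
    Returns the list of per-session results.
    """
    start = clock()
    results = []
    while clock() - start < deadline_s:
        results.append(run_session(len(results)))
    return results
```

Letting the last session finish rather than killing it mid-run keeps every recorded result complete, at the cost of a possible overshoot.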
- Meeting frequency
- AOB
- Vasileios Karakasis (CSCS)
- Victor Holanda (CSCS)
- Theofilos Manitaras (CSCS)
- Jg Piccinali (CSCS)
- Stefan Wolfsheimer (SURF)
- Kenneth Hoste (HPC-UGent)
- Åke Sandgren (Umeå Univ.)
- Ben Fulton (Indiana Univ.)
- Caspar van Leeuwen (SURF)
- Rafael Sarmiento (CSCS)
- Carlos Rosales (Amazon)
- Development updates
- ReFrame 3.10.0 is out: https://github.com/eth-cscs/reframe/releases/tag/v3.10.0
- ReFrame 3.10.1 planned for today: https://github.com/eth-cscs/reframe/milestone/74?closed=1
- Next sprint: https://github.com/eth-cscs/reframe/milestone/75
- Added new labels to tag each issue with the part of the framework it refers to
- We plan to migrate the repo under `github.com/reframe-hpc`.
- Community feedback on use cases
- Do you use or plan to use ReFrame to test and deploy software stack, e.g., using Spack/EasyBuild?
- Feedback: This is an interesting feature for both Spack and EasyBuild for exploring different build configurations, but it's not likely to be used for deploying the software stack.
- Towards relaxing `valid_systems` and `valid_prog_environs`: https://github.com/eth-cscs/reframe/issues/1987
- Key challenge here is to also integrate the `resources` that can be defined in the configuration, which are currently accessed through `extra_resources` inside the test.
- There are three types of system-related attributes: features, key/value properties and scheduler resources.
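
To make the attribute kinds more concrete, here is a toy matcher covering features and key/value properties (scheduler resources are left out). The `+feat` / `-feat` / `%key=value` mini-syntax, the partition data and the helper names are assumptions made for illustration; consult the issue above and the ReFrame documentation for the syntax that was actually adopted.

```python
# Hypothetical partitions: a set of features plus key/value properties.
PARTITIONS = {
    "gpu": {"features": {"gpu", "mpi"}, "props": {"gpu_arch": "sm_80"}},
    "mc": {"features": {"mpi"}, "props": {"cpu_arch": "zen2"}},
    "login": {"features": set(), "props": {}},
}


def matches(partition, requirement):
    """Check one space-separated requirement string against a partition.

    '+feat' requires a feature, '-feat' forbids it and '%key=value' requires
    a property to have that value (assumed mini-syntax).
    """
    for token in requirement.split():
        if token.startswith("+"):
            if token[1:] not in partition["features"]:
                return False
        elif token.startswith("-"):
            if token[1:] in partition["features"]:
                return False
        elif token.startswith("%"):
            key, _, value = token[1:].partition("=")
            if partition["props"].get(key) != value:
                return False
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return True


def valid_partitions(requirements, partitions=PARTITIONS):
    """Partitions satisfying at least one requirement string (OR across strings)."""
    return [name for name, part in partitions.items()
            if any(matches(part, req) for req in requirements)]
```

Tokens within one string are combined with AND, separate strings with OR, so `["+mpi -gpu"]` selects only `mc`.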
- Automatically submit a single-node job on every node of a ReFrame partition: https://github.com/eth-cscs/reframe/issues/2334
- would be very useful to find "bad nodes" in a given reservation
- automatically submit a separate copy of a test to each node
- for now, nothing combinatorial (it explodes quickly beyond 2 nodes...)
- combinations could be picked as N out of M nodes at random, or strided through a set of 100 nodes (1-10, 11-20, etc.)
- a selection mechanism is really needed when running 16-node tests on 100 available nodes
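
The two selection strategies mentioned above (N out of M at random, or consecutive strides such as 1-10, 11-20) are easy to sketch. `pick_random` and `pick_strided` are hypothetical helpers operating on a plain list of node names.

```python
import random


def pick_random(nodes, n, seed=None):
    """Pick `n` distinct nodes uniformly at random (reproducible with `seed`)."""
    return sorted(random.Random(seed).sample(nodes, n))


def pick_strided(nodes, n):
    """Split `nodes` into consecutive groups of `n` (e.g. 1-10, 11-20, ...).

    A trailing group smaller than `n` is dropped, since it could not host an
    n-node test.
    """
    return [nodes[i:i + n] for i in range(0, len(nodes) - n + 1, n)]
```

With 100 nodes and 16-node tests, the strided variant yields six disjoint groups and leaves four nodes unused; the random variant covers the system only statistically but can be repeated with different seeds.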
- Caspar: could tests somehow indicate that they want to use flexible allocation?
- example: gpuburn to check thermal throttling of GPUs ("hardware test")
- tests that aim to validate working software are probably less interesting to run with flexible allocation
- idea: `--flex-alloc-singlenode=idle:testXYZ,testABC` => only run these 2 specific single-node tests across all nodes
- Theo: Should the tests in such a scenario share a single stage directory to avoid redundant builds?
- Åke: This case should be addressed by fixtures, where the build part of the test is a fixture and you only dynamically parametrise the run test.
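
Åke's point is that the expensive build should happen once and be reused by every per-node run. ReFrame fixtures can be shared between tests depending on their scope; the sketch below only models that sharing in plain Python with `functools.lru_cache`, and `build`, `run_on_nodes` and the stage path are hypothetical.

```python
from functools import lru_cache

BUILD_LOG = []  # records every (partition, environ) pair that triggered a build


@lru_cache(maxsize=None)
def build(partition, environ):
    """Pretend to compile the benchmark once per (partition, environ) pair."""
    BUILD_LOG.append((partition, environ))
    return f"/stage/{partition}/{environ}/bench"  # hypothetical executable path


def run_on_nodes(partition, environ, nodes):
    """Run the same binary on every node; the build is shared through the cache."""
    exe = build(partition, environ)
    return {node: f"{exe} on {node}" for node in nodes}
```

Running on any number of nodes with the same partition and environment triggers exactly one build; a new environment triggers a second one.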
- Maintenance of scheduler backends
- AOB
- Welcome and introductions
- Briefly introduce yourself: where are you using (or planning to use) ReFrame?
- Development status
- Team & contributions
- Core team (@ekouts, @rsarm, @teojgo, @vkarak, @victorusu)
- Contributions are more than welcome!
- Development model
- Release train model: A new release every two weeks; releases are not delayed; whatever is ready and merged gets released
- Semantic versioning: `<major>.<minor>.<patch>`
- Patch-level bumps (every two weeks): bug fixes and new features (no deprecations)
- Minor version bumps (every 6–8 weeks): introduction of major features (deprecations are allowed, but backward compatibility is ensured)
- Major version bumps: backward compatibility may be broken.
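
The numbering rules above are mechanical enough to encode. `next_version` is a hypothetical helper that applies them to a plain `major.minor.patch` string.

```python
def next_version(version, change):
    """Return the next version under the release policy described above.

    `change` is 'patch' (bug fixes and new features, no deprecations),
    'minor' (major features, deprecations allowed) or 'major'
    (backward-incompatible changes). Pre-release suffixes are not handled.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "major":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown change type: {change!r}")
```

Note that this policy allows new features in patch releases, whereas strict Semantic Versioning 2.0.0 reserves backward-compatible new functionality for minor bumps.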
- Upcoming major features scheduled for 3.10.
- Asynchronous builds (https://github.com/eth-cscs/reframe/pull/2194)
- New test naming scheme (https://github.com/eth-cscs/reframe/pull/2355)
- Outlook for HPC Test library
- Proof-of-concept in `hpctestlib/` (documentation: https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html)
- Continue with creating library tests from our microbenchmarks
- Still unclear: community contributions, library location (different repo?), moving to stable
- Discuss issues that need resolution (feature requests, bugs)
- Discuss interesting use cases