[WIP] What is a SEV
Suraj Subramanian edited this page Jul 24, 2023
"OSS CI SEV" is the incident response process for PyTorch OSS CI, including incidents that break the HUD status, trunk health, PR health, or CI infrastructure stability. The goal of the `ci: sev` process is to maintain a healthy trunk for a better developer experience.
- [OSS] PyTorch Metrics Platform: https://metrics.pytorch.org/
- [FB Only] Green HUD Top Level Metrics: https://fburl.com/unidash/961dprzj
Create an issue that clearly states the scope and the impact area. Tag the issue with the `ci: sev` label so that it appears on the HUD: https://hud.pytorch.org/build2/pytorch-master
- Raise awareness. Showing SEV events on the HUD helps tree-hugger oncalls tell whether a given test failure is caused by a SEV or by flaky infrastructure.
- Notify the team that owns the affected tests.
- Escalate the issue with the `high priority` label if necessary.
- After the issue is resolved, simply close the issue (but don't remove the `ci: sev` label).
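
The filing and closing steps above can be sketched as request payloads for the GitHub REST API (`POST /repos/{owner}/{repo}/issues` to create, `PATCH /repos/{owner}/{repo}/issues/{issue_number}` to close). This is a minimal sketch, not the team's tooling: the helper names `build_sev_issue` and `build_close_request`, the example title, and the Scope/Impact body layout are hypothetical, and nothing here sends a request.

```python
import json

SEV_LABEL = "ci: sev"               # label that makes the issue appear on the HUD
ESCALATION_LABEL = "high priority"  # optional escalation label


def build_sev_issue(title, scope, impact, escalate=False):
    """Build a payload for POST /repos/{owner}/{repo}/issues.

    The Scope/Impact body layout is an assumed template, not an official one.
    """
    labels = [SEV_LABEL]
    if escalate:
        labels.append(ESCALATION_LABEL)
    body = f"## Scope\n{scope}\n\n## Impact\n{impact}\n"
    return {"title": title, "body": body, "labels": labels}


def build_close_request():
    """Build a payload for PATCH /repos/{owner}/{repo}/issues/{issue_number}.

    Only the state changes. The labels field is left out on purpose so the
    existing `ci: sev` label stays on the closed issue.
    """
    return {"state": "closed"}


if __name__ == "__main__":
    issue = build_sev_issue(
        title="Example: trunk is red",  # hypothetical title
        scope="Hypothetical scope description",
        impact="Hypothetical impact description",
        escalate=True,
    )
    print(json.dumps(issue, indent=2))
    print(json.dumps(build_close_request()))
```

Leaving `labels` out of the close request matters because passing a `labels` list when updating an issue replaces the whole label set.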
- Gather the recent SEV issues: https://github.com/pytorch/pytorch/issues?q=is%3Aissue+label%3A%22ci%3A+sev%22+
- Summarize what we can do to prevent similar issues in the future
- Actionable Items
- Improved Detection
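
The search link used for gathering SEV issues can also be built programmatically, which is handy when narrowing the query. The sketch below assumes the `ci: sev` label and the `pytorch/pytorch` repository from this page; the function name `sev_issues_url` and its `only_open` parameter are hypothetical.

```python
from urllib.parse import urlencode


def sev_issues_url(repo="pytorch/pytorch", label="ci: sev", only_open=False):
    """Return a GitHub issue-search URL for issues carrying the given label."""
    # The label contains a space, so it is quoted inside the search query.
    query = f'is:issue label:"{label}"'
    if only_open:
        # `is:open` is a standard GitHub search qualifier.
        query += " is:open"
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"https://github.com/{repo}/issues?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(sev_issues_url())
    print(sev_issues_url(only_open=True))
```

With the defaults, this produces the same query as the link above, minus its trailing `+`.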