feat(anta): Limit concurrency #680

Draft
wants to merge 19 commits into
base: main
from

Conversation

carl-baillargeon
Contributor

@carl-baillargeon carl-baillargeon commented May 16, 2024

Description

This PR improves the test runner by introducing a generator-based approach for managing test coroutines and setting a configurable limit on the number of concurrent tests.

  • Instead of loading all test coroutines into a list, the runner now uses a generator to yield tests. This approach prevents memory overload and improves performance when dealing with a large number of tests.

  • A limit on the number of concurrent tests is introduced to avoid overwhelming the runner. This limit is configurable with a hidden environment variable.

Implementation:
The generator yields test coroutines, ensuring that only a limited number of tests are scheduled and run concurrently.
Upon reaching the concurrency limit, the runner waits for some tests to complete before scheduling new ones from the generator.
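
For illustration, a minimal sketch of the scheduling loop described above, using only `asyncio`. It is not the code from `anta/runner.py`: the names `run_limited`, `fake_test`, and `demo` are hypothetical, and the real runner may differ in how it tracks and reports results.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncGenerator, AsyncIterator, Coroutine
from typing import Any


async def run_limited(
    tests: AsyncIterator[Coroutine[Any, Any, Any]], limit: int
) -> AsyncGenerator[Any, None]:
    """Yield test results as they complete, keeping at most `limit` tests scheduled."""
    aws = tests.__aiter__()  # the `aiter()` built-in is not available in Python 3.9
    pending: set[asyncio.Task[Any]] = set()
    exhausted = False
    while True:
        # Pull new coroutines from the generator until the limit is reached.
        while not exhausted and len(pending) < limit:
            try:
                coro = await aws.__anext__()
            except StopAsyncIteration:
                exhausted = True
            else:
                pending.add(asyncio.create_task(coro))
        if not pending:
            return
        # Wait for at least one test to finish before scheduling more.
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            yield task.result()


async def demo(n_tests: int = 20, limit: int = 3) -> tuple[list[int], int]:
    """Run hypothetical tests and report the results and the peak concurrency."""
    running = 0
    peak = 0

    async def fake_test(i: int) -> int:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.001)  # stands in for a device request
        running -= 1
        return i

    async def generate() -> AsyncGenerator[Coroutine[Any, Any, int], None]:
        # Coroutines are created lazily, one at a time, instead of in a list.
        for i in range(n_tests):
            yield fake_test(i)

    results = [r async for r in run_limited(generate(), limit)]
    return sorted(results), peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because coroutines are only created when the loop has room for them, the number of live tasks stays bounded by `limit` regardless of how many tests the generator can produce.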

Fixes: #713

Checklist:

  • Update FAQ to reference Scaling ANTA documentation
  • Update Scaling ANTA documentation - Add numbers @dlobato
  • Add section on JSON catalogs vs YAML (faster)
  • Need to retest on digital twin @dlobato
  • Support None timeouts
  • Confirm if PoolTimeout default should be set to None
  • Update benchmark + unit tests
  • ci: add codspeed to benchmark ANTA #826
  • Confirm that the number of open file descriptors never exceeds device * max_connections (100 by default)

sonarcloud bot commented Jun 12, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Contributor

github-actions bot commented Jul 4, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

anta/runner.py (outdated, resolved)
anta/runner.py (outdated, resolved)
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

sonarcloud bot commented Aug 28, 2024

Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@mtache mtache changed the title refactor(anta): Refactor runner to use a generator with a limit feat(anta): Limit concurrency Oct 2, 2024
Contributor

github-actions bot commented Oct 2, 2024

Conflicts have been resolved. A maintainer will review the pull request shortly.

codspeed-hq bot commented Oct 2, 2024

CodSpeed Performance Report

Merging #680 will not alter performance

Comparing carl-baillargeon:refactor/runner_limit (ad22107) with main (8ac4477)

Summary

✅ 6 untouched benchmarks

⁉️ 2 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | carl-baillargeon:refactor/runner_limit | Change |
|---|---|---|---|
| ⁉️ test_get_coroutines[1-device] | 48.1 ms | N/A | N/A |
| ⁉️ test_get_coroutines[2-devices] | 94.7 ms | N/A | N/A |

Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Contributor

github-actions bot commented Nov 4, 2024

Conflicts have been resolved. A maintainer will review the pull request shortly.

anta/runner.py (resolved)
Comment on lines +43 to +46
The limits are set using the following environment variables:
- ANTA_MAX_CONNECTIONS: Maximum number of allowable connections.
- ANTA_MAX_KEEPALIVE_CONNECTIONS: Number of allowable keep-alive connections.
- ANTA_KEEPALIVE_EXPIRY: Time limit on idle keep-alive connections in seconds.
Collaborator

Link to httpx doc
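
As a rough sketch of how the three variables quoted above could be read, the snippet below parses them into the keyword arguments accepted by `httpx.Limits`. The helper name `limits_from_env`, the fallback to a default on a missing or invalid value, and the use of HTTPX's documented defaults (100, 20, 5.0) are assumptions for illustration, not the behavior implemented in this PR.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any, Callable


def limits_from_env(env: Mapping[str, str] | None = None) -> dict[str, Any]:
    """Build keyword arguments for `httpx.Limits` from ANTA_* environment variables."""
    source = os.environ if env is None else env

    def read(name: str, cast: Callable[[str], Any], default: Any) -> Any:
        raw = source.get(name)
        if raw is None:
            return default
        try:
            return cast(raw)
        except ValueError:
            # Assumption: an invalid value silently falls back to the default.
            return default

    return {
        # Defaults below are HTTPX's documented defaults, assumed here.
        "max_connections": read("ANTA_MAX_CONNECTIONS", int, 100),
        "max_keepalive_connections": read("ANTA_MAX_KEEPALIVE_CONNECTIONS", int, 20),
        "keepalive_expiry": read("ANTA_KEEPALIVE_EXPIRY", float, 5.0),
    }


# With httpx installed, the result could be passed on as:
#     limits = httpx.Limits(**limits_from_env())
```

Keeping the parsing separate from the HTTPX call makes the fallback rules easy to unit test without a network client.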

Collaborator

@gmuloc gmuloc left a comment

Lots of comments on the guide, but a very big thank you for this amazing guide :)

🎉 🎈

Comment on lines 35 to +38
DEFAULT_NOFILE = 16384
"""Default number of open file descriptors for the ANTA process."""
DEFAULT_MAX_CONCURRENCY = 10000
"""Default maximum number of tests to run concurrently."""
Collaborator

We may need to move all of these to constants to make it simpler to know where things are.



def adjust_rlimit_nofile() -> tuple[int, int]:
"""Adjust the maximum number of open file descriptors for the ANTA process.

The limit is set to the lower of the current hard limit and the value of the ANTA_NOFILE environment variable.

- If the `ANTA_NOFILE` environment variable is not set or is invalid, `DEFAULT_NOFILE` is used.
+ If the `ANTA_NOFILE` environment variable is not set or is invalid, `DEFAULT_NOFILE` is used (16384).
Collaborator

Probably best not to put the value here - we will then forget to update it :p
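
To make the rule in the docstring above concrete, here is a small sketch of the "lower of the hard limit and `ANTA_NOFILE`" logic using the standard `resource` module (Unix only). The helper names `resolve_nofile` and `adjust_rlimit_nofile_sketch` are hypothetical, and treating non-positive values as invalid is an assumption rather than something stated in the PR.

```python
from __future__ import annotations

import os

DEFAULT_NOFILE = 16384


def resolve_nofile(raw: str | None, hard_limit: int | None, default: int = DEFAULT_NOFILE) -> int:
    """Return the soft limit to request; `hard_limit=None` means unlimited."""
    try:
        wanted = int(raw) if raw is not None else default
    except ValueError:
        wanted = default
    if wanted <= 0:
        # Assumption: non-positive values are treated as invalid.
        wanted = default
    return wanted if hard_limit is None else min(wanted, hard_limit)


def adjust_rlimit_nofile_sketch() -> tuple[int, int]:
    """Apply the resolved soft limit and return the resulting (soft, hard) pair."""
    import resource  # Unix only, imported lazily so the pure helper works anywhere

    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    cap = None if hard == resource.RLIM_INFINITY else hard
    target = resolve_nofile(os.environ.get("ANTA_NOFILE"), cap)
    # Only the soft limit is changed; the hard limit is left untouched.
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Referring to `DEFAULT_NOFILE` by name, as in the sketch, is one way to keep documentation and code from drifting apart.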

The result of each completed test.
"""
# NOTE: The `aiter` built-in function is not available in Python 3.9
aws = tests_generator.__aiter__() # pylint: disable=unnecessary-dunder-call
Collaborator

What does `aws` stand for here? :)


# 🚀 Scaling ANTA: A Comprehensive Guide

**Table of Contents:**
Collaborator

Do we need this? This is redundant with:
[Screenshot 2024-11-14 at 14 48 12]

@@ -179,6 +179,7 @@ nav:
- Debug commands: cli/debug.md
- Tag Management: cli/tag-management.md
- Advanced Usages:
- Scaling ANTA: advanced_usages/scaling.md
Collaborator

not first - should be last or let's use alphabetical order


### Results Management

ANTA can output results in various formats:
Collaborator

Do you need to repeat these? I would just point to the NRFU page documentation and say you use JSON (less maintenance).

3. Optional: You can merge the JSON results and generate a JUnit report using the following Python script (requires `junitparser`):

```python
from pathlib import Path
Collaborator

please use snippets for this and we will try to test it :)

Collaborator

Should we provide this by default in ANTA as a feature?

Comment on lines +607 to +609
- 📞 Reach out to your Arista SE for guidance
- 📝 Document your specific use case on [GitHub](https://github.com/aristanetworks/anta)
- 🔍 Share your findings with the community
Collaborator

repeat from the beginning - probably should be the same (or just a button that goes back to the beginning?:)


## 📚 References

- **Python AsyncIO**
Collaborator

This needs 4 spaces; it does not render properly:

Screenshot 2024-11-14 at 15 20 40

@@ -7,8 +7,7 @@

from typing import TYPE_CHECKING

from anta.result_manager import ResultManager
-from anta.runner import get_coroutines, prepare_tests
+from anta.runner import prepare_tests
Collaborator

we can add back get_coroutines for now

sonarcloud bot commented Nov 14, 2024

Development

Successfully merging this pull request may close these issues.

Add options to control HTTPX resource limits
3 participants