Agent Evaluation is a generative AI-powered framework for testing virtual agents.
Internally, Agent Evaluation implements an LLM agent (the evaluator) that orchestrates conversations with your own agent (the target) and evaluates the responses as the conversation unfolds.
- Built-in support for popular AWS services, including Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. You can also bring your own agent and test it with Agent Evaluation.
- Orchestrate concurrent, multi-turn conversations with your agent while evaluating its responses.
- Define hooks to perform additional tasks such as integration testing (see the sketch after this list).
- Incorporate Agent Evaluation into CI/CD pipelines to speed up delivery while maintaining the stability of agents in production environments.
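As an illustration of the hooks feature, a hook for integration testing might seed fixture data before the evaluator starts a conversation and verify side effects afterward. The sketch below is illustrative only: the `Hook` base class, the `pre_evaluate`/`post_evaluate` callback names, and the argument types are assumptions; consult the full documentation for the exact interface.

```python
# Illustrative sketch only: the Hook base class and the pre_evaluate/post_evaluate
# callback names are assumptions; check the documentation for the exact interface
# and argument types.
import logging

from agenteval import Hook

logger = logging.getLogger(__name__)


class IntegrationTestHook(Hook):
    """Hypothetical hook that seeds and tears down fixture data around a test."""

    def pre_evaluate(self, test, trace):
        # Called before the evaluator starts the conversation for this test:
        # a good place to insert fixture records the target agent will look up.
        logger.info("Preparing fixtures for test: %s", test)

    def post_evaluate(self, test, test_result, trace):
        # Called after the conversation ends: verify side effects (integration
        # testing) and clean up anything created in pre_evaluate.
        logger.info("Finished test %s with result %s", test, test_result)
```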
To get started, please visit the full documentation here. To contribute, please refer to CONTRIBUTING.md.
Shout out to these awesome contributors: