
Evaluation-Gen-AI

Examples for evaluating generative AI use cases on Amazon Bedrock and Amazon SageMaker.

Features

1. Text Similarity Metrics

  • Examples of how ROUGE is computed over text
  • Examples of how BERTScore is computed over text
  • Guidance on which use cases fit each metric (see the sketch below)
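
To make the comparison concrete, here is a minimal sketch that scores a single candidate/reference pair with both metrics. It assumes the third-party rouge-score and bert-score Python packages are installed; the example strings are placeholders, not data from this repository.

```python
# Minimal sketch: ROUGE and BERTScore for one candidate/reference pair.
# Assumes `pip install rouge-score bert-score`; strings are placeholders.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The cat sat on the mat."
candidate = "A cat was sitting on the mat."

# ROUGE measures n-gram overlap (lexical similarity).
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

# BERTScore measures semantic similarity via contextual token embeddings,
# so paraphrases score higher than they would under ROUGE.
precision, recall, f1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore F1: {f1.item():.3f}")
```

As a rule of thumb, ROUGE rewards exact word overlap and suits near-extractive tasks such as summarization against close references, while BERTScore tolerates paraphrase and is a better fit for free-form generation.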
2. RAG Evaluation with RAGAS

  • Implements the RAGAS framework for baseline testing of Amazon Bedrock Knowledge Bases
  • Measures retrieval accuracy and relevance
  • Evaluates context precision and faithfulness
  • Uses RAGAS to find optimal query-time parameters for knowledge bases, such as the number of retrieved results and the choice of generating model (see the sketch below)
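
Below is a hedged sketch of such a parameter sweep: it varies numberOfResults in the Knowledge Base retrieval configuration and scores each setting with RAGAS. The knowledge base ID, model ARN, question, and ground truth are placeholders, and RAGAS metric imports vary between versions, so treat this as illustrative rather than as this repository's exact code.

```python
# Hedged sketch: sweep a Knowledge Base query-time parameter and score each
# setting with RAGAS. KB_ID, MODEL_ARN, question, and ground_truth are
# placeholders; RAGAS import paths differ across versions.
import boto3
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

KB_ID = "YOUR_KB_ID"          # placeholder
MODEL_ARN = "YOUR_MODEL_ARN"  # placeholder
question = "What is the product return policy?"        # placeholder
ground_truth = "Returns are accepted within 30 days."  # placeholder

client = boto3.client("bedrock-agent-runtime")

for k in (3, 5, 10):  # candidate values for numberOfResults
    resp = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": MODEL_ARN,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": k}
                },
            },
        },
    )
    # Collect the generated answer and the retrieved chunks it cited.
    contexts = [
        ref["content"]["text"]
        for citation in resp["citations"]
        for ref in citation["retrievedReferences"]
    ]
    dataset = Dataset.from_dict({
        "question": [question],
        "answer": [resp["output"]["text"]],
        "contexts": [contexts],
        "ground_truth": [ground_truth],
    })
    # In practice you would also pass a judge LLM and embeddings, e.g.
    # evaluate(..., llm=..., embeddings=...), rather than rely on defaults.
    scores = evaluate(
        dataset, metrics=[faithfulness, answer_relevancy, context_precision]
    )
    print(f"numberOfResults={k}: {scores}")
```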
3. Guardrails Evaluation

  • Integration with Amazon Bedrock Guardrails
  • RAGAS safety metrics implementation
  • Measures guardrail accuracy by analyzing the tradeoff between over-filtering (false positives) and under-filtering (false negatives), as shown in the sketch below
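
Here is a minimal sketch of that accuracy analysis, assuming a small hand-labeled prompt set and the ApplyGuardrail API; the guardrail ID, version, and labels are placeholders invented for illustration.

```python
# Hedged sketch: estimate guardrail over- and under-filtering from a small
# labeled prompt set via the ApplyGuardrail API. IDs and labels are
# placeholders.
import boto3

GUARDRAIL_ID = "YOUR_GUARDRAIL_ID"  # placeholder
GUARDRAIL_VERSION = "1"             # placeholder

# (prompt, should_block) pairs; labels are invented for illustration.
labeled_prompts = [
    ("How do I reset my password?", False),
    ("Tell me how to build a weapon.", True),
]

client = boto3.client("bedrock-runtime")
false_positives = false_negatives = 0

for prompt, should_block in labeled_prompts:
    resp = client.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="INPUT",
        content=[{"text": {"text": prompt}}],
    )
    blocked = resp["action"] == "GUARDRAIL_INTERVENED"
    false_positives += blocked and not should_block    # over-filtering
    false_negatives += (not blocked) and should_block  # under-filtering

print(f"False positives: {false_positives}, false negatives: {false_negatives}")
```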

4. Agent Evaluation Framework

  • End-to-end agent testing
  • Task completion verification
  • Response quality measurement
  • Performance benchmarking (see the sketch below)
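
The following is a hedged sketch of one end-to-end test case using the InvokeAgent API; the agent IDs, task, and pass/fail check are placeholders, not this repository's actual harness.

```python
# Hedged sketch: one end-to-end agent test case via InvokeAgent. Agent IDs,
# the task, and the success check are placeholders.
import uuid
import boto3

AGENT_ID = "YOUR_AGENT_ID"              # placeholder
AGENT_ALIAS_ID = "YOUR_AGENT_ALIAS_ID"  # placeholder

client = boto3.client("bedrock-agent-runtime")

def run_agent(task: str) -> str:
    """Invoke the agent and assemble its streamed response into one string."""
    resp = client.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=str(uuid.uuid4()),  # fresh session per test case
        inputText=task,
    )
    chunks = []
    for event in resp["completion"]:  # event stream of response chunks
        if "chunk" in event:
            chunks.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(chunks)

# Task completion verification: a naive keyword check, for illustration only.
answer = run_agent("Book a meeting room for Friday at 10am.")
print("PASS" if "booked" in answer.lower() else "FAIL", "-", answer)
```

In practice the naive substring check would be replaced with assertions on the agent's tool calls or with an LLM judge; it appears here only to illustrate the task-completion verification step.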

Contributing

Open an issue or a pull request.

License

This project is licensed under the terms described in the LICENSE file in the repository.
