Evaluation-Gen-AI

Examples for evaluating generative AI use cases on Amazon Bedrock and Amazon SageMaker.

Features

  • Examples of how ROUGE is computed over text
  • Examples of how BERTScore is computed over text
  • Guidance on which use cases fit each metric
  • Implements the RAGAS framework for baseline testing of Amazon Bedrock Knowledge Bases
  • Measures retrieval accuracy and relevance
  • Evaluates context precision and faithfulness
  • Uses RAGAS to find optimal query-time parameters for knowledge bases, such as the number of retrieved results and the choice of generation model
  • Integration with Amazon Bedrock Guardrails
  • RAGAS safety metrics implementation
  • Measures guardrail accuracy by analyzing the tradeoff between over-filtering (false positives) and under-filtering (false negatives)
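To make the first bullet concrete, here is a minimal sketch of how a ROUGE-1 score can be computed over text from unigram overlap. This is illustrative only; real evaluations typically use a library such as `rouge-score`, which also handles stemming and longer n-grams.

```python
from collections import Counter

def rouge_1(reference: str, candidate: str) -> dict:
    """Compute ROUGE-1 precision, recall, and F1 from unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped matching: each token counts at most min(ref, cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

print(rouge_1("the cat sat on the mat", "the cat lay on the mat"))
```

Five of six tokens overlap in this example, so precision, recall, and F1 all come out to 5/6.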
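The context-precision bullet can be sketched in the RAGAS style: given per-rank relevance judgments (typically produced by an LLM judge) for the retrieved chunks, average precision@k over the positions where a chunk is relevant. The function below is an assumed simplification of the metric, not the RAGAS implementation itself.

```python
def context_precision(relevance_flags: list[int]) -> float:
    """relevance_flags: 0/1 relevance judgments for retrieved chunks,
    in rank order. Returns the mean of precision@k over the ranks k
    where the chunk at rank k is relevant."""
    precisions = []
    hits = 0
    for k, rel in enumerate(relevance_flags, start=1):
        hits += rel
        if rel:
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For judgments `[1, 0, 1]`, precision@1 is 1 and precision@3 is 2/3, giving a context precision of 5/6; ranking relevant chunks earlier raises the score.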
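The over-filtering vs. under-filtering tradeoff for guardrails reduces to standard classification metrics over a labeled test set of prompts. A hedged sketch, assuming you have already counted guardrail decisions against ground-truth labels:

```python
def guardrail_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """tp: harmful prompts blocked; fp: benign prompts blocked
    (over-filtering); fn: harmful prompts allowed (under-filtering);
    tn: benign prompts allowed."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # low => over-filtering
    recall = tp / (tp + fn) if tp + fn else 0.0     # low => under-filtering
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

print(guardrail_metrics(tp=8, fp=2, fn=2, tn=88))
```

Tightening a guardrail configuration generally trades precision (more benign prompts blocked) for recall (fewer harmful prompts missed), so both numbers should be tracked together.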

Agent Evaluation Framework

  • End-to-end agent testing
  • Task completion verification
  • Response quality measurement
  • Performance benchmarking
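A minimal sketch of what such an end-to-end harness might look like. `invoke_agent` is a hypothetical placeholder for your actual agent call (e.g. a Bedrock Agents invocation), and the keyword check is a deliberately simple stand-in for task-completion verification:

```python
import time

def evaluate_agent(invoke_agent, cases):
    """Run each test case through the agent, checking task completion
    (required keywords present in the response) and recording latency."""
    results = []
    for case in cases:
        start = time.perf_counter()
        answer = invoke_agent(case["prompt"])
        latency = time.perf_counter() - start
        completed = all(kw.lower() in answer.lower()
                        for kw in case["must_contain"])
        results.append({"id": case["id"], "completed": completed,
                        "latency_s": latency})
    pass_rate = sum(r["completed"] for r in results) / len(results)
    return pass_rate, results
```

In practice the keyword check would be replaced by an LLM judge or structured output validation, and latencies aggregated into percentiles for benchmarking.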

Contributing

Open an issue or submit a pull request.

License

This project is licensed under the terms of the LICENSE file in the repository.