Examples for evaluating generative AI use cases on Amazon Bedrock and Amazon SageMaker.
- Examples of how ROUGE is computed over text
- Examples of how BERTScore is computed over text
- Guidance on which use cases fit each metric
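To make the first bullet concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) computed over raw text; it is an illustration of the metric, not the implementation used in this repository:

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1: unigram overlap between a candidate and a reference text."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# 5 of 6 unigrams overlap in each direction, so P = R = F1 = 5/6
print(rouge_1("the cat sat on the mat", "the cat lay on the mat"))
```

BERTScore has the same precision/recall/F1 shape, but scores each candidate–reference token pair by cosine similarity of contextual embeddings instead of exact n-gram match, which is why it tolerates paraphrase where ROUGE does not.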
- Implements the RAGAS framework for baseline testing of Amazon Bedrock Knowledge Bases
- Measures retrieval accuracy and relevance
- Evaluates context precision and faithfulness
- Uses RAGAS to find optimal query-time parameters for knowledge bases, such as the number of retrieved results and the choice of generating model
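As a rough illustration of the context-precision idea above: given the ranked chunks a knowledge base returns, score how well the relevant ones are concentrated at the top. RAGAS uses an LLM to judge relevance; this sketch assumes the binary relevance judgments are already given, and the formula (mean of precision@k over the relevant ranks) is a simplification of the library's metric:

```python
def context_precision(relevance: list[int]) -> float:
    """Context precision over ranked retrieved chunks.

    `relevance` is a binary list: 1 if the chunk at that rank was judged
    relevant to the question, 0 otherwise. The score rewards placing
    relevant chunks near the top of the retrieval results.
    """
    hits = 0
    weighted = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            weighted += hits / k  # precision@k at each relevant rank
    return weighted / hits if hits else 0.0

# Relevant chunks at ranks 1 and 3 of 4 retrieved: (1/1 + 2/3) / 2
print(context_precision([1, 0, 1, 0]))
```

Sweeping the number of retrieved results and re-scoring is one way to pick the query-time parameters mentioned above.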
- Integration with Bedrock Guardrails
- RAGAS safety metrics implementation
- Measures guardrail accuracy by analyzing the tradeoff between over-filtering (false positives) and under-filtering (false negatives)
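The over-/under-filtering tradeoff can be sketched as a confusion-matrix computation over a labeled prompt set. The function and labels below are hypothetical scaffolding, not part of the Bedrock Guardrails API:

```python
def guardrail_metrics(predicted_blocked: list[bool],
                      should_block: list[bool]) -> dict:
    """Compare guardrail decisions against labeled expectations.

    False positives = benign prompts blocked (over-filtering);
    false negatives = harmful prompts allowed through (under-filtering).
    """
    pairs = list(zip(predicted_blocked, should_block))
    tp = sum(p and s for p, s in pairs)
    fp = sum(p and not s for p, s in pairs)
    fn = sum(s and not p for p, s in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"false_positives": fp, "false_negatives": fn,
            "precision": precision, "recall": recall}

# One correct block, one over-filter, one miss, one correct pass
print(guardrail_metrics(
    predicted_blocked=[True, True, False, False],
    should_block=[True, False, True, False],
))
```

Tightening a guardrail configuration typically raises recall (fewer misses) at the cost of precision (more benign prompts blocked), which is the tradeoff this measurement makes visible.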
- End-to-end agent testing
- Task completion verification
- Response quality measurement
- Performance benchmarking
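A minimal sketch of the task-completion-verification step, assuming a task is specified as a set of required facts the agent's final answer must contain (the function name and example strings are hypothetical; real suites would add latency benchmarks and an LLM grader for response quality):

```python
def verify_task_completion(response: str, required_facts: list[str]) -> dict:
    """End-to-end check: did the agent's final answer cover every
    required fact for the task? Matching is case-insensitive substring
    containment, which is deliberately simple."""
    found = [f for f in required_facts if f.lower() in response.lower()]
    missing = [f for f in required_facts if f not in found]
    return {
        "completed": not missing,
        "coverage": len(found) / len(required_facts) if required_facts else 1.0,
        "missing": missing,
    }

result = verify_task_completion(
    "Your order #1234 ships Tuesday via UPS.",
    required_facts=["#1234", "Tuesday"],
)
print(result)  # completed=True, coverage=1.0, missing=[]
```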
Contributions are welcome: open an issue or a pull request.
This project is licensed under the terms of the LICENSE file in the repository.