- Cognitive Search Index implementation from Challenge 4
- Install the required libraries listed in the `requirements.txt` file via `pip install -r requirements.txt` if you have not already.
As LLMs grow in popularity and use around the world, the need to manage and monitor their outputs becomes increasingly important. In this challenge, you will learn how to evaluate the outputs of LLMs and how to identify and mitigate potential biases in the model.
Questions you should be able to answer by the end of this challenge:
- What are services and tools to identify and evaluate harms and data leakage in LLMs?
- What are ways to evaluate truthfulness and reduce hallucinations?
- What are methods to evaluate a model if you don't have a ground truth dataset for comparison?
Sections in this Challenge:
- Identifying harms and detecting Personally Identifiable Information (PII)
- Evaluating truthfulness using Ground-Truth Datasets
- Evaluating truthfulness using GPT without Ground-Truth Datasets
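To make the ground-truth idea concrete, here is a minimal sketch of two metrics commonly used for question answering: exact match and token-level F1, both computed after simple normalization. It is not the notebook's code. The function names `normalize`, `exact_match`, and `token_f1` are hypothetical, and the normalization rules (lowercasing, dropping punctuation and English articles) are assumptions modeled on common question-answering evaluation practice.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    These normalization rules are an assumption for this sketch.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model answer and ground-truth answer.
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(token_f1("Paris, France", "Paris"))
```

Exact match is strict and penalizes verbose but correct answers, while token F1 gives partial credit for overlap. Neither detects a fluent answer that is factually wrong but shares many tokens with the reference, which is one reason the challenge also covers GPT-based evaluation.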
You will run the following Jupyter notebook for this challenge. It can be found in your Codespace under the notebooks folder. If you are working locally or in the cloud, you can find it in the `/Notebooks` folder of the `Resources.zip` file.
CH-05-ResponsibleAI.ipynb
To complete this challenge successfully, you should be able to:
- Articulate Responsible AI principles with OpenAI
- Demonstrate methods and approaches for evaluating LLMs
- Identify tools available to identify and mitigate harms in LLMs
- Overview of Responsible AI practices for Azure OpenAI models
- Azure Cognitive Services - What is Content Filtering
- Azure AI Content Safety tool
- Azure Content Safety Annotations feature
- OpenAI PII Detection Plugin
- Hugging Face Evaluate Library
- Question Answering Evaluation using LangChain
- OpenAI Technical Whitepaper on evaluating models (see Section 3.1)