Commit message:

* Updated README
* Missed one replacement
* New docs for evaluations
* Added a few more articles to the docs
* Updated readme
* Quick updates

---
title: Datasets
description: Learn how to create and manage your datasets.
---

## Overview

Datasets contain values to use as parameters for running evaluations in batches. For example, you can upload a dataset of customer support tickets and use it to evaluate the performance of a chatbot.

## How it works

To create a dataset, navigate to the "Datasets" page and click on the "Upload dataset" button. You'll see a form with the following fields:

- Name: The name of the dataset
- Delimiter: The delimiter that separates the columns of the CSV file (as used in its first, header row)
- File: The file to upload

Click the "Create dataset" button to upload the dataset.
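
For reference, a minimal comma-delimited dataset for the customer support example above might look like this (the column names are illustrative; during a batch run you map your prompt's parameters to whichever columns your file actually has):

```csv
ticket_subject,ticket_body
"Refund request","I was charged twice for my subscription this month."
"Login issue","I can't reset my password from the mobile app."
```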

Once the dataset is uploaded, you can use it to run evaluations in batches. Learn how to do it in the [running evaluations](/guides/evaluations/running-evaluations#running-evaluations-in-batch-mode) guide.

---
title: Overview
description: 'Learn how to create and connect evaluations to your prompts.'
---

## What is an evaluation?

Evaluations help you assess the quality of your LLM outputs. Latitude supports two types of evaluations:

- **LLM evaluations**: You can use LLM evaluators to score your LLM outputs.
- **Human evaluations (HITL) [Coming soon]**: You can manually review the logs and score them based on your criteria.

## How do they work?

A Latitude project can have any number of evaluations available to connect to prompts. You can create evaluations in the **Evaluations** tab of your workspace. Latitude comes with a set of built-in evaluations to get you started: simply import them into your project.

Once you've created an evaluation, you can connect it to a prompt by navigating to the prompt and clicking on the **Evaluations** tab. Then you can select the evaluation you want to connect to the prompt.

After connecting an evaluation to a prompt, you can:

- Activate a live evaluation: This will start evaluating the prompt in real time. For every new log, the evaluation will run and the result will be displayed on the evaluation's page.
- Run in batch: You can choose whether to run the evaluation on existing logs or automatically generate a batch of logs to run the evaluation on.

To learn more about how to connect and run evaluations, check out the [Running evaluations](/guides/evaluations/running-evaluations) guide.

## How do I create an evaluation?

You can create an evaluation from scratch or import an existing one and edit it.

### Creating an evaluation from scratch

Go to the **Evaluations** tab of your project and click on the **Create evaluation** button. You'll have to provide a name for the evaluation and select the type of evaluation you want to create. We support three types of evaluations, depending on the output you expect:

- **Number**: Helpful when you want to score outputs on a range, for example, a score between 0 and 10. You'll have to provide a minimum and maximum value for the evaluation.
- **Boolean**: Useful for true/false questions. For example, you can use this to evaluate whether the output contains harmful content.
- **Text**: A free-form text evaluation. For example, you can use this to generate feedback on the output of a prompt.

Number and Boolean evaluations expect a specific format for the evaluation result. Make sure your evaluation prompt returns either a score or a boolean value (true/false) and that the output is a JSON object with the following format:

```json
{
  "result": <result>,
  "reason": <reason>
}
```

We use this format to parse the evaluation result and display aggregated metrics on the evaluations page. Make sure to include this format in your evaluation prompt. If you're not sure how to do this, all of our templates include this format, so you can use them as a reference.
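
For example, a Number evaluation that scores an output 8 out of 10 might return:

```json
{
  "result": 8,
  "reason": "The answer is accurate and well structured, but it misses the example the user asked for."
}
```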

### Importing an evaluation

Importing an evaluation is simple: navigate to the **Evaluations** tab of your project and you'll see a few templates to get you started. Click on the template you want to import and the evaluation will be created for you.

You can edit an imported evaluation just like one created from scratch, so feel free to customize it to your needs.

---
title: Running Evaluations
description: 'Learn how to run evaluations on your prompts.'
---

Once you've created evaluations and connected them to any of your prompts, you can run them on live logs or in batch mode. This guide will walk you through the process of running evaluations.

## Prerequisites

- You have already connected one or more evaluations to your prompt.
- To run evaluations in batch mode, you need to have a dataset created in your project. Learn more about [creating datasets](/guides/datasets/creating-datasets).

## Steps to run evaluations

1. **Navigate to the document**
   Go to the specific document where you've connected the evaluations.

2. **Access the evaluations tab**
   Look for the "Evaluations" tab or section within the document view. This is where you'll find all the connected evaluations.

3. **Select evaluations to run**
   You should see a list of connected evaluations. Click on the one you want to run.

4. **Run the evaluation in batch mode**
   Click on the "Run in batch" button to start the evaluation process. Learn more about [running evaluations in batch mode](/guides/evaluations/running-evaluations#running-evaluations-in-batch-mode).

5. **Run the evaluation in live mode**
   Activate the "Evaluate production logs" toggle in the top right corner to turn on live evaluation. Learn more about [running evaluations in live mode](/guides/evaluations/running-evaluations#running-evaluations-in-live-mode).

By following these steps, you should be able to successfully run your connected evaluations and gain valuable insights into the performance of your prompts.

## Running evaluations in batch mode

When you run evaluations in batch mode, you can either create new logs from a dataset or use existing logs.

- **Create new logs from a dataset**: Select the option "Generate from dataset" as the source for the logs. Choose the dataset you want to use, the number of logs to generate, and how the prompt parameters map to the dataset columns (see the conceptual sketch right after this list).
- **Use existing logs [Coming soon]**: Select the option "Use existing logs" as the source for the logs. Choose how many logs you want to use, and the evaluation will run on the logs you selected.
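
Conceptually, the mapping pairs each prompt parameter with a dataset column. The UI presents this with form controls; the JSON below is only an illustrative sketch with hypothetical parameter and column names:

```json
{
  "customer_name": "name",
  "question": "support_question"
}
```

Each generated log then fills `customer_name` and `question` from the corresponding columns of one dataset row.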

Click the "Run evaluation" button to start the evaluation process. You'll see the status of the batch evaluation just above the logs table. Once it's finished, the charts will update with the results of the evaluation, and you can check the evaluation logs to drill down into the results.

## Running evaluations in live mode

Evaluations running in live mode will run on all new logs generated in your project. This is useful if you want to monitor the performance of your prompts in real time.

We recommend keeping a few key evaluations running in live mode to spot degradations in response quality as soon as they happen. Sometimes new model releases or changes in parameters can lead to a drop in response quality, so this is a good way to catch those issues early.

---
title: Quick start
description: Learn how to get started with Latitude
---

## Overview

This quick start guide will walk you through the process of setting up and using Latitude, whether you choose to use Latitude Cloud or self-host the platform. By the end of this guide, you'll have created your first prompt, tested it, and learned how to evaluate and deploy it.

Latitude offers two deployment options:

1. **Latitude Cloud**: A fully managed solution that allows you to get started quickly without worrying about infrastructure.
2. **Latitude Self-Hosted**: An open-source version that you can deploy and manage on your own infrastructure for complete control and customization.

Choose the option that best fits your needs and follow the corresponding instructions below.

## Latitude Cloud

To get started with Latitude, follow these steps:

1. **Sign up for Latitude**: Visit our [website](https://latitude.so) and follow the instructions to create your account.

2. **Create a new project**: Once logged in, create a new project to organize your prompts and evaluations.

3. **Write your first prompt**: Navigate to the Editor and create a new prompt. Start with a simple task, like generating a short story or answering a question.

4. **Test your prompt**: Use the playground to test your prompt with different inputs and see the model's responses.

5. **Evaluate in batch**: Before deploying, you can upload a dataset and run a batch evaluation to assess your prompt's performance across various scenarios.

6. **Deploy your prompt**: Once you're satisfied with your prompt's performance in batch evaluation, deploy it as an endpoint for easy integration with your applications (a hypothetical code sketch follows this list).

7. **Monitor and evaluate**: Use the Logs section to review your prompt's performance over time. Set up ongoing evaluations to systematically assess and improve your prompt's output.

8. **Iterate and improve**: Based on the evaluation results, refine your prompt or create new versions to enhance its performance.

9. **Collaborate with your team**: Invite team members to your Latitude workspace to collaborate on prompt engineering and evaluations.
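
To give a flavor of step 6, here is a minimal, purely hypothetical sketch of calling a deployed prompt over HTTP from TypeScript. The endpoint URL, request body, and response shape are assumptions made for illustration only; refer to the API documentation for the actual routes, payloads, and authentication details.

```typescript
// Hypothetical example: the URL, headers, and payload shape are placeholders,
// not Latitude's documented API contract.
async function runDeployedPrompt(question: string): Promise<string> {
  const response = await fetch("https://example.com/your-prompt-endpoint", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer YOUR_API_KEY", // placeholder credential
    },
    body: JSON.stringify({ parameters: { question } }), // assumed request shape
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.output; // assumed response field
}

runDeployedPrompt("Summarize my last support ticket").then(console.log);
```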

For more detailed information on each step, explore our documentation or join our [community](https://join.slack.com/t/trylatitude/shared_invite/zt-17dyj4elt-rwM~h2OorAA3NtgmibhnLA) for support and discussions.

## Latitude Self-Hosted

Follow the instructions in the [self-hosted guide](https://docs.latitude.so/self-hosted/quick-start) to get started with Latitude Self-Hosted.

After setting up Latitude Self-Hosted, you can follow the same steps as in the Latitude Cloud guide to create, test, evaluate, and deploy your prompts.

---
title: Logs
description: Learn how to use the logs page to monitor your prompts and evaluate their performance.
---

## Overview

Latitude stores all the logs generated by your prompts in a database. You can use the logs page to monitor your prompts and evaluate their performance.

## How it works

Every time you run a prompt, whether from the API or from the UI, a new log is created.

To access the logs page, navigate to a prompt and click on the "Logs" tab. You'll see a table with all the logs generated by the prompt, along with metadata such as the timestamp, the prompt version used, latency, tokens used, and cost.

Clicking on a log will display a side panel with the full details of the log, including the list of messages.
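
As a rough illustration (the field names below are hypothetical, not the exact schema), the information attached to a single log could look like this:

```json
{
  "timestamp": "2024-09-12T10:32:05Z",
  "prompt_version": "v3",
  "latency_ms": 1240,
  "tokens": 356,
  "cost_usd": 0.0021,
  "messages": [
    { "role": "user", "content": "Where is my order?" },
    { "role": "assistant", "content": "Your order shipped yesterday and should arrive tomorrow." }
  ]
}
```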

## Creating logs for evaluations

You can also create logs for evaluation purposes without actually running the prompt. This is useful when you want to run evaluations on a large number of inputs.

For a detailed guide on running evaluations in batches, refer to the [Running Evaluations](/guides/evaluations/running-evaluations#running-evaluations-in-batch-mode) guide.

## Coming soon

- Filtering and sorting
- Exporting logs to a CSV file
- Deleting logs
- Visualizations for certain metrics like latency, tokens used, and cost