feat: Add how-to for submitting existing datasets to studio
MerlinKallenbornAA committed Oct 30, 2024
1 parent 53a335a commit dc16df4
Showing 3 changed files with 78 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -150,7 +150,7 @@ The how-tos are quick lookups about how to do things. Compared to the tutorials,
| [...define a task](./src/documentation/how_tos/how_to_define_a_task.ipynb) | How to come up with a new task and formulate it |
| [...implement a task](./src/documentation/how_tos/how_to_implement_a_task.ipynb) | Implement a formulated task and make it run with the Intelligence Layer |
| [...debug and log a task](./src/documentation/how_tos/how_to_log_and_debug_a_task.ipynb) | Tools for logging and debugging in tasks |
-| [...use Studio with traces](./src/documentation/how_tos/how_to_use_studio_with_traces.ipynb) | Submitting Traces to Studio for debugging |
+| [...use Studio with traces](./src/documentation/how_tos/studio/how_to_use_studio_with_traces.ipynb) | Submitting Traces to Studio for debugging |
| **Analysis Pipeline** | |
| [...implement a simple evaluation and aggregation logic](./src/documentation/how_tos/how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb) | Basic examples of evaluation and aggregation logic |
| [...create a dataset](./src/documentation/how_tos/how_to_create_a_dataset.ipynb) | Create a dataset used for running a task |
@@ -0,0 +1,77 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from uuid import uuid4\n",
"\n",
"from intelligence_layer.connectors import StudioClient\n",
"from intelligence_layer.evaluation import InMemoryDatasetRepository"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to upload existing datasets to Studio\n",
"<div class=\"alert alert-info\"> \n",
"\n",
"Make sure your account has permissions to use the Studio application.\n",
"\n",
"For an on-prem or local installation, please contact the corresponding team.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"0. Extract `Dataset` and `Examples` from your `DatasetRepository`.\n",
"\n",
"1. Initialize a `StudioClient` with a project.\n",
" - Use an existing project or create a new one with the `StudioClient.create_project` function.\n",
"2. Submit your `Dataset` along with the corresponding `Examples` using the client.\n",
"\n",
"### Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Step 0: load the dataset and its examples from the repository that holds them.\n",
"# A fresh `InMemoryDatasetRepository` is empty; replace it with your own repository.\n",
"existing_dataset_repo = InMemoryDatasetRepository()\n",
"existing_dataset = existing_dataset_repo.dataset(dataset_id=\"my_existing_dataset_id\")\n",
"assert existing_dataset, \"Make sure your dataset still exists.\"\n",
"\n",
"existing_examples = existing_dataset_repo.examples(\n",
"    existing_dataset.id, input_type=str, expected_output_type=str\n",
")\n",
"\n",
"# Step 1: create a client and a new project (a random UUID serves as the project name).\n",
"project_name = str(uuid4())\n",
"studio_client = StudioClient(project=project_name)\n",
"my_project = studio_client.create_project(project=project_name)\n",
"\n",
"# Step 2: submit the dataset together with its examples.\n",
"studio_dataset_id = studio_client.submit_dataset(\n",
"    dataset=existing_dataset, examples=existing_examples\n",
")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
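
The three steps in the notebook above could be wrapped in one helper that raises a clear error, rather than an `assert`, when the dataset is missing. The following is only a sketch under stated assumptions: it relies solely on the calls that appear in the notebook (`dataset`, `examples`, `submit_dataset`) with the keyword arguments shown there. The function name `upload_existing_dataset` and the `Fake*` classes are hypothetical stand-ins so that the example runs without `intelligence_layer` or a Studio instance; they are not part of the library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


def upload_existing_dataset(
    repo: Any,
    client: Any,
    dataset_id: str,
    input_type: type = str,
    expected_output_type: type = str,
) -> Any:
    """Fetch a dataset and its examples from `repo` and submit them via `client`.

    `repo` and `client` only need the methods used in the notebook; the
    return value is whatever `client.submit_dataset` returns.
    """
    # Step 0: look up the dataset and fail loudly if it is gone.
    dataset = repo.dataset(dataset_id=dataset_id)
    if dataset is None:
        raise ValueError(f"Dataset {dataset_id!r} not found in the repository.")
    examples = repo.examples(
        dataset.id, input_type=input_type, expected_output_type=expected_output_type
    )
    # Step 2: hand both to the (already initialized) client.
    return client.submit_dataset(dataset=dataset, examples=examples)


# --- Hypothetical stand-ins for demonstration only (not intelligence_layer) ---
@dataclass
class FakeDataset:
    id: str
    name: str


class FakeRepository:
    def __init__(self, datasets: Dict[str, FakeDataset], examples: Dict[str, List[Any]]):
        self._datasets = datasets
        self._examples = examples

    def dataset(self, dataset_id: str):
        return self._datasets.get(dataset_id)

    def examples(self, dataset_id: str, input_type: type, expected_output_type: type):
        return list(self._examples.get(dataset_id, []))


class FakeStudioClient:
    def __init__(self) -> None:
        self.submitted: List[Any] = []

    def submit_dataset(self, dataset: Any, examples: Any) -> str:
        # Record the submission and return a made-up identifier.
        self.submitted.append((dataset, list(examples)))
        return f"studio-{dataset.id}"
```

With the real library, `repo` would be the `DatasetRepository` that holds the dataset and `client` the `StudioClient` created in Step 1; whether the real `dataset` call returns `None` for an unknown ID is an assumption suggested by the `assert` in the notebook.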
