Updated target paths to no longer show org
ngrayluna committed Nov 18, 2024
1 parent c456952 commit c8fff78
Showing 1 changed file with 35 additions and 35 deletions.
70 changes: 35 additions & 35 deletions colabs/wandb_registry/zoo_wandb.ipynb
@@ -185,22 +185,21 @@
"source": [
"## Track and publish dataset \n",
"\n",
"Within the Dataset registry we will create a collection called \"zoo-dataset-tensors\". A collection is a set of linked artifact versions in a registry. \n",
"Within the Dataset registry we will create a collection called \"zoo-dataset-tensors\". A *collection* is a set of linked artifact versions in a registry. \n",
"\n",
"To create a collection we need to do two things:\n",
"1. Specify the collection and registry we want to link our artifact version to. To do this, we specify a \"target path\" for our artifact version.\n",
"2. Use the `run.link_artifact` method and pass our artifact object and the target path.\n",
"2. Use the `wandb.run.link_artifact` method and pass our artifact object and the target path.\n",
"\n",
"#### Define target path of the collection\n",
"\n",
"The target path of a collection consists of three parts:\n",
"* The name of your W&B Organization\n",
"The target path of a collection consists of two parts:\n",
"* The name of the registry\n",
"* The name of the collection within the registry\n",
"\n",
"If you know these three fields, you can create the full name yourself with string concatanation, f-strings, and so forth:\n",
"If you know these two fields, you can create the full name yourself with string concatenation, f-strings, and so forth:\n",
"```python\n",
"target_path = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"target_path = f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"```"
]
},
@@ -213,13 +212,13 @@
"\n",
"Let's publish our dataset to the Dataset registry in a collection called \"zoo-dataset-tensors\". To do this, we will \n",
"\n",
"1. Get or create the target path. For this notebook we will programmatically create the target path\n",
"1. Get or create the target path. (For this notebook we will programmatically create the target path).\n",
"1. Initialize a run\n",
"1. Create an Artifact object\n",
"1. Create an artifact object\n",
"2. Add each split dataset as individual files to the artifact object\n",
"3. Link the artifact object to the collection with `run.link_artifact()`. Here we specify the target path and the artifact we want to link.\n",
"\n",
"First, let's create the target path. In the following code cell, replace the values specified in `<>` with the name of your organization:"
"First, let's create the target path:"
]
},
{
@@ -229,12 +228,11 @@
"metadata": {},
"outputs": [],
"source": [
"ORG_NAME = \"<INSERT-YOUR-ORG-NAME>\"\n",
"REGISTRY_NAME = \"Dataset\"\n",
"COLLECTION_NAME = \"zoo-dataset-tensors\"\n",
"\n",
"# Path to link the artifact to a collection\n",
"dataset_target_path = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\""
"dataset_target_path = f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\""
]
},
{
@@ -291,7 +289,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Decsribe how we split the training dataset for future reference, reproducibility.\n",
"# Describe how we split the training dataset for future reference and reproducibility.\n",
"config = {\n",
" \"random_state\" : 42,\n",
" \"test_size\" : 0.25,\n",
@@ -358,9 +356,10 @@
"artifact.add_file(local_path=\"zoo_dataset_X_test.pt\", name=\"zoo_dataset_X_test\")\n",
"artifact.add_file(local_path=\"zoo_labels_y_test.pt\", name=\"zoo_labels_y_test\")\n",
"\n",
"# Create a target path for our artifact in the registry\n",
"REGISTRY_NAME = \"Dataset\"\n",
"COLLECTION_NAME = \"zoo-dataset-tensors-split\"\n",
"target_dataset_path=f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"target_dataset_path=f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"\n",
"run.link_artifact(artifact=artifact, target_path=target_dataset_path)\n",
"\n",
@@ -390,7 +389,7 @@
"source": [
"## Define a model\n",
"\n",
"The following cells show how to create a simple neural network classifier with PyTorch. There is nothing unique about this model, so we'll gloss over this section."
"The following cells show how to create a simple neural network classifier with PyTorch. There is nothing unique about this model, so we will not go into the details of this code block."
]
},
{
@@ -457,22 +456,23 @@
"source": [
"## Train model\n",
"\n",
"Next, let's train, save, and model artifacts to W&B.\n",
"Next, let's train a model using the training data we published to the registry earlier in this notebook. After we train the model, we will publish that model to W&B.\n",
"\n",
"We'll train the model using the training data we published to the Dataset registry. To use the an artifact from a registry, we need to provide the name of the artifact. The name of the artifact looks similar to a filepath. In fact, this filepath is almost identical to the target path we used in a previous step to publish our artifact, except that we must specify the specific artifact version we want to use following the name of the collection: \n",
"To do this, let's first get the artifact we published to the \"Dataset\" registry. To retrieve an artifact from a registry, we need to know the name of that artifact. The name of an artifact in a registry consists of the prefix `wandb-registry-`, the name of the registry, the name of the collection, and the artifact version:\n",
"\n",
"```python\n",
"# Target path for publishing an artifact version to a registry\n",
"f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"\n",
"# Artifact name/filepath for downloading and using artifacts published to a registry\n",
"f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}\"\n",
"```\n",
"Since we only linked one artifact version, the version we'll use is `v0`. (W&B uses 0 indexing).\n",
"\n",
"```python\n",
"# Artifact name/filepath for downloading and using artifact publsihed in a registry\n",
"f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}\"\n",
"```\n",
"\n",
"Since we only linked on artifact version, the version we'll use is `v0`. (W&B uses 0 indexing)."
"Note that the name of an artifact is nearly identical to the target path we specified in a previous step when we published our artifact to the registry; the only difference is the version number:\n",
"\n",
"```python\n",
"# Target path for publishing an artifact version to a registry\n",
"f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"```"
]
},
{
@@ -486,7 +486,7 @@
"\n",
"# Get dataset artifacts from registry\n",
"VERSION = 0\n",
"artifact_name = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME.lower()}/{COLLECTION_NAME}:v{VERSION}\"\n",
"artifact_name = f\"wandb-registry-{REGISTRY_NAME.lower()}/{COLLECTION_NAME}:v{VERSION}\"\n",
"dataset_artifact = run.use_artifact(artifact_or_name=artifact_name)\n",
"\n",
"# Download only the training data\n",
@@ -549,17 +549,17 @@
"id": "2666d37e-1232-4609-8e2c-78af670585ab",
"metadata": {},
"source": [
"The preceeding cell might look intimidating. Let's break it down:\n",
"The preceding cell might look intimidating. Let's break it down:\n",
"\n",
"* First, we download the dataset from the Dataset registry and load it as a tensor\n",
"* Next, we create a simple training loop\n",
" * Within the training loop we log the loss for each step\n",
" * We checkpoint(save) the model every time the remainder of the epoch divided by 100 is 0 and the loss is lower than the previously recorded loss.\n",
" * We then add the saved PyTorch model to the Artifact. \n",
" * We then add the saved PyTorch model to our artifact object.\n",
"\n",
"A couple of things to note:\n",
"1. The preceeding code cell adds a single artifact version to W&B. You can confirm this by navigating to your project workspace, select **Artifacts** in the left navigation, and under **models** click the name of the artifact (starts with `zoo-{run.id}`). You will see a single model with version `v0`.\n",
"2. At this point, we have only tracked the model artifact within our team's project. Anyone outside of our team does not have access to the model we created. To make this model accessible to members outside of our team, we will need to publish our model to the registry. "
"1. The preceding code cell adds a single artifact version to W&B. You can confirm this by navigating to your project workspace, selecting **Artifacts** in the left navigation, and, under **models**, clicking the name of the artifact (starts with `zoo-{run.id}`). You will see a single model with version `v0`.\n",
"2. At this point, we have only tracked the model artifact within our team's project. Anyone outside of our team does not have access to the model we created. To make this model accessible to members outside of our team, we will need to publish our model to the registry. "
]
},
{
@@ -614,7 +614,7 @@
"\n",
"```python\n",
"# Target path used to link artifact to registry\n",
"target_path = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"target_path = f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"```"
]
},
@@ -628,7 +628,7 @@
"REGISTRY_NAME = \"Model\"\n",
"COLLECTION_NAME = \"Zoo_Classifier_Models\"\n",
"\n",
"target_path = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"target_path = f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}\"\n",
"print(\"Target path: \", target_path)"
]
},
@@ -671,11 +671,11 @@
"\n",
"Let's say that you did not know exactly which model version to use. You can check the lineage of all artifact versions on the W&B App UI. The lineage shows which artifacts were used as input to a run and which artifacts were the output of a given run.\n",
"\n",
"For example, the image below shows the Zoo_Classifier_Models collection within the model registry. Highlighted in yellow is the current model artifact version that is linked to the registry.\n",
"For example, the image below shows the \"Zoo_Classifier_Models\" collection within the model registry. Highlighted in yellow is the current model artifact version that is linked to the registry.\n",
"\n",
"From left to right we see that the run \"trim-rain-2\" was responsible for creating the \"split_zoo_dataset\" artifact. (Recall that this is the dataset artifact that contains the test and training data).\n",
"\n",
"We then see that the \"golden-sunset-3\" run used the \"split_zoo_dataset\" artifact for training. Within this run, we created a model artifact. The speciic artifact version we linked to Zoo_Classifier_Models is called `zoo-wyhak4o0:v10`.\n",
"We then see that the \"golden-sunset-3\" run used the \"split_zoo_dataset\" artifact for training. Within this run, we created a model artifact. The specific artifact version we linked to \"Zoo_Classifier_Models\" is called `zoo-wyhak4o0:v10`.\n",
"\n",
"![](./images/dag_model_registry.png)\n",
"\n",
@@ -735,7 +735,7 @@
"COLLECTION_NAME = \"Zoo_Classifier_Models\"\n",
"VERSION = 0\n",
"\n",
"model_artifact_name = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}\"\n",
"model_artifact_name = f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}\"\n",
"print(f\"Model artifact name: {model_artifact_name}\")"
]
},
@@ -819,7 +819,7 @@
"COLLECTION_NAME = \"zoo-dataset-tensors-split\"\n",
"VERSION = 0\n",
"\n",
"data_artifact_name = f\"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}\"\n",
"data_artifact_name = f\"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}\"\n",
"print(f\"Dataset artifact name: {data_artifact_name}\")"
]
},
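
The pattern this commit applies across every hunk (dropping the organization prefix from both target paths and artifact names) can be sketched in a few lines of plain Python. This is only an illustration: the helper names `registry_target_path` and `registry_artifact_name` are hypothetical, appear nowhere in the notebook, and are not part of the W&B API. They just mirror the f-strings in the changed cells.

```python
def registry_target_path(registry_name: str, collection_name: str) -> str:
    """Build the path used to link an artifact version to a registry collection.

    Hypothetical helper: mirrors the org-less f-string introduced by this commit.
    """
    return f"wandb-registry-{registry_name}/{collection_name}"


def registry_artifact_name(registry_name: str, collection_name: str, version: int) -> str:
    """Build the name used to retrieve a specific linked artifact version.

    Same as the target path, with a ':v<version>' suffix appended.
    """
    return f"{registry_target_path(registry_name, collection_name)}:v{version}"


# Values taken from the notebook cells shown in the diff
dataset_target_path = registry_target_path("Dataset", "zoo-dataset-tensors-split")
dataset_artifact_name = registry_artifact_name("Dataset", "zoo-dataset-tensors-split", 0)

print(dataset_target_path)
print(dataset_artifact_name)
```

In the notebook itself, the first string is what gets passed as `target_path` to `run.link_artifact`, and the second is what gets passed to `run.use_artifact`.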