Skip to content

Commit

Permalink
switch artifact format
Browse files Browse the repository at this point in the history
  • Loading branch information
oeway committed Nov 18, 2024
1 parent ddccad8 commit a0728c5
Show file tree
Hide file tree
Showing 4 changed files with 65 additions and 55 deletions.
24 changes: 14 additions & 10 deletions docs/artifact-manager.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,9 +41,11 @@ collection = await artifact_manager.create(
manifest=gallery_manifest,
config={"permissions": {"*": "r", "@": "r+"}}
)
print("Dataset Gallery created.")
print("Dataset Gallery created with ID:", collection.id)
```

**Tips: The returned `collection.id` is the unique identifier for the collection with the format `workspace_id/alias`. You can use this ID to refer to the collection in subsequent operations. You can also use only `alias` as a shortcut, however, this only works in the same workspace.**

### Step 3: Adding a Dataset to the Gallery

After creating the gallery, you can add datasets to it, with each dataset having its own metadata and permissions.
Expand All @@ -65,6 +67,8 @@ dataset = await artifact_manager.create(
print("Dataset added to the gallery.")
```

**Tips: The `version="stage"` parameter stages the dataset for review and commit. You can edit the dataset before committing it to the collection.**

### Step 4: Uploading Files to the Dataset with Download Statistics

Each dataset can contain multiple files. Use pre-signed URLs for secure uploads and set `download_weight` to track file downloads.
Expand Down Expand Up @@ -701,10 +705,10 @@ The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifes

### Endpoints:

- `/{workspace}/artifacts/{artifact_id}` for fetching the artifact manifest.
- `/{workspace}/artifacts/{artifact_id}/children` for listing all artifacts in a collection.
- `/{workspace}/artifacts/{artifact_id}/files` for listing all files in the artifact.
- `/{workspace}/artifacts/{artifact_id}/files/{file_path:path}` for downloading a file from the artifact (will be redirected to a pre-signed URL).
- `/{workspace}/artifacts/{artifact_alias}` for fetching the artifact manifest.
- `/{workspace}/artifacts/{artifact_alias}/children` for listing all artifacts in a collection.
- `/{workspace}/artifacts/{artifact_alias}/files` for listing all files in the artifact.
- `/{workspace}/artifacts/{artifact_alias}/files/{file_path:path}` for downloading a file from the artifact (will be redirected to a pre-signed URL).


### Request Format:
Expand All @@ -718,7 +722,7 @@ The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifes
The path parameters are used to specify the artifact or file to access. The following parameters are supported:

- **workspace**: The workspace in which the artifact is stored.
- **artifact_id**: The id of the artifact to access. This can be an uuid generated by `create` or `edit` function, or it can be an alias of the artifact under the current workspace. Note that this artifact_id can only be the uuid or the alias without the workspace id.
- **artifact_alias**: The alias or id of the artifact to access. This can be an artifact id generated by `create` or `edit` function, or it can be an alias of the artifact under the current workspace. Note that this artifact_alias should not contain the workspace.
- **file_path**: Optional, the relative path to a file within the artifact. This is optional and only required when downloading a file.

### Query Parameters:
Expand All @@ -738,13 +742,13 @@ Qury parameters are passed after the `?` in the URL and are used to control the

### Response:

For `/{workspace}/artifacts/{artifact_id}`, the response will be a JSON object representing the artifact manifest. For `/{workspace}/artifacts/{artifact_id}/__files__/{file_path:path}`, the response will be a pre-signed URL to download the file. The artifact manifest will also include any metadata such as download statistics, e.g. `view_count`, `download_count`. For private artifacts, make sure if the user has the necessary permissions.
For `/{workspace}/artifacts/{artifact_alias}`, the response will be a JSON object representing the artifact manifest. For `/{workspace}/artifacts/{artifact_alias}/__files__/{file_path:path}`, the response will be a pre-signed URL to download the file. The artifact manifest will also include any metadata such as download statistics, e.g. `view_count`, `download_count`. For private artifacts, make sure if the user has the necessary permissions.

For `/{workspace}/artifacts/{artifact_id}/children`, the response will be a list of artifacts in the collection.
For `/{workspace}/artifacts/{artifact_alias}/children`, the response will be a list of artifacts in the collection.

For `/{workspace}/artifacts/{artifact_id}/files`, the response will be a list of files in the artifact, each file is a dictionary with the `name` and `type` fields.
For `/{workspace}/artifacts/{artifact_alias}/files`, the response will be a list of files in the artifact, each file is a dictionary with the `name` and `type` fields.

For `/{workspace}/artifacts/{artifact_id}/files/{file_path:path}`, the response will be a pre-signed URL to download the file.
For `/{workspace}/artifacts/{artifact_alias}/files/{file_path:path}`, the response will be a pre-signed URL to download the file.

### Example: Fetching a public artifact with download statistics

Expand Down
2 changes: 1 addition & 1 deletion hypha/VERSION
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
{
"version": "0.20.39.post3"
"version": "0.20.39.post4"
}
71 changes: 38 additions & 33 deletions hypha/artifact.py
Original file line number Diff line number Diff line change
Expand Up @@ -145,17 +145,19 @@ def __init__(
self._artifacts_dir = artifacts_dir

# HTTP endpoint for getting an artifact
@router.get("/{workspace}/artifacts/{artifact_id}")
@router.get("/{workspace}/artifacts/{artifact_alias}")
async def get_artifact(
workspace: str,
artifact_id: str,
artifact_alias: str,
silent: bool = False,
version: str = None,
user_info: self.store.login_optional = Depends(self.store.login_optional),
):
"""Get artifact metadata, manifest, and config (excluding secrets)."""
try:
artifact_id = self._validate_artifact_id(artifact_id, {"ws": workspace})
artifact_id = self._validate_artifact_id(
artifact_alias, {"ws": workspace}
)
session = await self._get_session(read_only=True)
async with session.begin():
# Fetch artifact along with parent
Expand Down Expand Up @@ -201,18 +203,20 @@ async def get_artifact(
await session.close()

# HTTP endpoint for listing child artifacts
@router.get("/{workspace}/artifacts/{artifact_id}/children")
@router.get("/{workspace}/artifacts/{artifact_alias}/children")
async def list_children(
workspace: str,
artifact_id: str,
artifact_alias: str,
page: int = 0,
page_size: int = 100,
order_by: str = None,
user_info: self.store.login_optional = Depends(self.store.login_optional),
):
"""List child artifacts of a specified artifact."""
try:
artifact_id = self._validate_artifact_id(artifact_id, {"ws": workspace})
artifact_id = self._validate_artifact_id(
artifact_alias, {"ws": workspace}
)
session = await self._get_session(read_only=True)
async with session.begin():
parent_artifact, _ = await self._get_artifact_with_permission(
Expand Down Expand Up @@ -265,10 +269,10 @@ async def list_children(
await session.close()

# HTTP endpoint for retrieving files within an artifact
@router.get("/{workspace}/artifacts/{artifact_id}/files/{path:path}")
@router.get("/{workspace}/artifacts/{artifact_alias}/files/{path:path}")
async def get_file(
workspace: str,
artifact_id: str,
artifact_alias: str,
path: str = "",
silent: bool = False,
version: str = None,
Expand All @@ -277,7 +281,9 @@ async def get_file(
):
"""Retrieve a file within an artifact or list files in a directory."""
try:
artifact_id = self._validate_artifact_id(artifact_id, {"ws": workspace})
artifact_id = self._validate_artifact_id(
artifact_alias, {"ws": workspace}
)
session = await self._get_session(read_only=True)
if token:
user_info = await self.store.parse_user_token(token)
Expand Down Expand Up @@ -375,21 +381,27 @@ def end_transaction(session, transaction):

return session

async def _get_artifact(self, session, artifact_id):
"""
Retrieve an artifact by its unique ID or by parent ID and alias.
"""
def _get_artifact_id_cond(self, artifact_id):
"""Get the SQL condition for an artifact ID."""
if "/" in artifact_id:
assert len(artifact_id.split("/")) == 2, "Invalid artifact ID format."
assert (
len(artifact_id.split("/")) == 2
), "Invalid artifact ID format, it should be `workspace/alias`."
ws, alias = artifact_id.split("/")
query = select(ArtifactModel).where(
return and_(
ArtifactModel.workspace == ws,
ArtifactModel.alias == alias,
)
else:
query = select(ArtifactModel).where(
ArtifactModel.id == artifact_id,
)
return ArtifactModel.id == artifact_id

async def _get_artifact(self, session, artifact_id):
"""
Retrieve an artifact by its unique ID or by parent ID and alias.
"""
query = select(ArtifactModel).where(
self._get_artifact_id_cond(artifact_id),
)
result = await session.execute(query)
artifact = result.scalar_one_or_none()
if not artifact:
Expand All @@ -400,19 +412,7 @@ async def _get_artifact_with_parent(self, session, artifact_id):
"""
Retrieve an artifact and its parent artifact in a single SQL query.
"""
if "/" in artifact_id:
assert (
len(artifact_id.split("/")) == 2
), "Invalid artifact ID format, it should be `workspace/alias`."
ws, alias = artifact_id.split("/")
query = select(ArtifactModel).where(
ArtifactModel.workspace == ws,
ArtifactModel.alias == alias,
)
else:
query = select(ArtifactModel).where(
ArtifactModel.id == artifact_id,
)
query = select(ArtifactModel).where(self._get_artifact_id_cond(artifact_id))

result = await session.execute(query)
artifact = result.scalar_one_or_none()
Expand All @@ -421,14 +421,16 @@ async def _get_artifact_with_parent(self, session, artifact_id):
parent_artifact = None
if artifact and artifact.parent_id:
parent_query = select(ArtifactModel).where(
ArtifactModel.id == artifact.parent_id,
self._get_artifact_id_cond(artifact.parent_id)
)
parent_result = await session.execute(parent_query)
parent_artifact = parent_result.scalar_one_or_none()
return artifact, parent_artifact

def _generate_artifact_data(self, artifact):
artifact_data = model_to_dict(artifact)
artifact_data["id"] = f"{artifact.workspace}/{artifact.alias}"
artifact_data["_id"] = artifact.id
# Exclude 'secrets' from artifact_data to prevent exposure
artifact_data.pop("secrets", None)
return artifact_data
Expand Down Expand Up @@ -571,7 +573,7 @@ async def _increment_stat(self, session, artifact_id, field, increment=1.0):
raise ValueError("Field must be 'view_count' or 'download_count'.")
stmt = (
update(ArtifactModel)
.where(ArtifactModel.id == artifact_id)
.where(self._get_artifact_id_cond(artifact_id))
.values({field: getattr(ArtifactModel, field) + increment})
.execution_options(synchronize_session="fetch")
)
Expand Down Expand Up @@ -909,6 +911,7 @@ async def create(
parent_artifact, _ = await self._get_artifact_with_permission(
user_info, parent_id, "create", session
)
parent_id = parent_artifact.id
if workspace:
assert (
workspace == parent_artifact.workspace
Expand Down Expand Up @@ -970,6 +973,8 @@ async def create(
)
if existing_id:
id = existing_id
else:
alias = id

parent_permissions = (
parent_artifact.config["permissions"] if parent_artifact else {}
Expand Down
23 changes: 12 additions & 11 deletions tests/test_artifact.py
Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,7 @@ async def test_serve_artifact_endpoint(minio_server, fastapi_server, test_user_t

# Ensure the public artifact is available via HTTP
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.id}"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.alias}"
)
assert response.status_code == 200
assert "Public Example Dataset" in response.json()["manifest"]["name"]
Expand Down Expand Up @@ -211,7 +211,7 @@ async def test_serve_artifact_endpoint(minio_server, fastapi_server, test_user_t
token = await api.generate_token()
# Ensure the private artifact is available via HTTP (requires authentication or special permissions)
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{private_dataset.id}",
f"{SERVER_URL}/{api.config.workspace}/artifacts/{private_dataset.alias}",
headers={"Authorization": f"Bearer {token}"},
)
assert response.status_code == 200
Expand All @@ -220,7 +220,7 @@ async def test_serve_artifact_endpoint(minio_server, fastapi_server, test_user_t

# If no authentication is provided, the server should return a 403 Forbidden status code
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{private_dataset.id}"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{private_dataset.alias}"
)
assert response.status_code == 403

Expand Down Expand Up @@ -277,7 +277,7 @@ async def test_http_file_and_directory_endpoint(

# Retrieve the file via HTTP
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.id}/files/example.txt"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.alias}/files/example.txt"
)
assert response.status_code == 200
assert response.text == file_contents
Expand All @@ -290,7 +290,7 @@ async def test_http_file_and_directory_endpoint(

# Attempt to list directory contents (should be successful after attempting file)
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.id}/files/"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.alias}/files/"
)
assert response.status_code == 200
assert "example.txt" in [file["name"] for file in response.json()]
Expand Down Expand Up @@ -709,21 +709,21 @@ async def test_http_artifact_endpoint(minio_server, fastapi_server, test_user_to

# Retrieve the dataset manifest via HTTP
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.id}"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.alias}"
)
assert response.status_code == 200
assert "Test Dataset" in response.json()["manifest"]["name"]

# List the files in the dataset directory via HTTP
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.id}/files/"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.alias}/files/"
)
assert response.status_code == 200
assert "example.txt" in [file["name"] for file in response.json()]

# Retrieve the file content via HTTP
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.id}/files/example.txt"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{dataset.alias}/files/example.txt"
)
assert response.status_code == 200
assert response.text == file_content
Expand All @@ -736,7 +736,7 @@ async def test_http_artifact_endpoint(minio_server, fastapi_server, test_user_to

# Retrieve the collection's children to verify the dataset presence
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{collection.id}/children"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{collection.alias}/children"
)
assert response.status_code == 200
item = find_item(response.json(), "id", dataset.id)
Expand Down Expand Up @@ -982,7 +982,7 @@ async def test_artifact_manager_with_collection(

# Retrieve the collection via HTTP to confirm creation
response = requests.get(
f"{SERVER_URL}/{api.config.workspace}/artifacts/{collection.id}"
f"{SERVER_URL}/{api.config.workspace}/artifacts/{collection.alias}"
)
assert response.status_code == 200
assert "test-collection" in response.json()["manifest"]["name"]
Expand Down Expand Up @@ -1094,7 +1094,8 @@ async def test_artifact_edge_cases_with_collection(
}

with pytest.raises(
Exception, match=r".*Artifact with ID 'non-existent-parent' does not exist.*"
Exception,
match=r".*Artifact with ID 'ws-user-user-1/non-existent-parent' does not exist.*",
):
await artifact_manager.create(
type="collection",
Expand Down

0 comments on commit a0728c5

Please sign in to comment.