
[FEATURE] Allow downloading ML models from Cloud providers buckets #1371

Open · IanMenendez opened this issue Sep 25, 2023 · 20 comments
Labels: enhancement (New feature or request), feature

@IanMenendez

Is your feature request related to a problem?
There is no way to upload ML models to OpenSearch from Google Cloud Storage buckets (or other cloud providers' object stores). The only option is to make the bucket objects public and use the public URL, which exposes the ML models to anyone.

This makes uploading a model quite tedious: because we do not want to expose our ML models to the public internet, we have to upload the model in chunks from a local machine.

What solution would you like?
ModelHelper.downloadAndSplit currently uses ai.djl.training.util.DownloadUtils to download ML models. The solution would be to add a helper function (or reuse an existing one, if available) that accepts Google Cloud Storage (gs://bucket/model.zip) and AWS S3 (s3://bucket/model.zip) style links.
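
For illustration only, here is a minimal sketch of how such a helper could route on the URI scheme, keeping DJL's DownloadUtils for plain HTTP(S) URLs. The class and method names are hypothetical, not an existing ml-commons API:

import java.net.URI;
import java.nio.file.Path;

// Hypothetical sketch: route the download based on the URI scheme and keep
// DJL's DownloadUtils for plain HTTP(S) URLs.
public final class ModelUrlRouter {

    public static void download(String url, Path target) throws Exception {
        String scheme = URI.create(url).getScheme();
        if ("s3".equals(scheme) || "gs".equals(scheme)) {
            downloadFromCloud(URI.create(url), target);
        } else {
            // Existing behavior used by ModelHelper.downloadAndSplit
            ai.djl.training.util.DownloadUtils.download(url, target.toString());
        }
    }

    private static void downloadFromCloud(URI uri, Path target) {
        // Placeholder: fetch uri.getHost() (bucket) / uri.getPath() (key)
        // into `target` with the matching cloud SDK (AWS S3, GCS).
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}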

What alternatives have you considered?
Making the bucket publicly accessible to anyone, or uploading the model from a local machine, which makes the process tedious.

Do you have any additional context?
No

@IanMenendez added the enhancement (New feature or request) and untriaged labels on Sep 25, 2023
@ylwu-amzn (Collaborator) commented Sep 25, 2023

Hi @IanMenendez, thanks, this is a good suggestion.

Actually, ml-commons supports uploading a model from a local file now. You can refer to https://opensearch-project.github.io/opensearch-py-ml/examples/demo_deploy_cliptextmodel.html and https://opensearch-project.github.io/opensearch-py-ml/examples/demo_transformer_model_train_save_upload_to_openSearch.html#Step-5:-Upload-the-model-to-OpenSearch-cluster

@IanMenendez (Author)

@ylwu-amzn I am aware that uploading models from local files is supported, but going down that path sacrifices simplicity:

  • We would need to manually manage and upload each chunk of the model, whereas uploading by URL only requires specifying the URL.
  • It adds steps when sharing ML models within our company, since we would have to download the model from the bucket and then upload it, instead of just specifying the model's bucket URL.

@ylwu-amzn (Collaborator)

Makes sense. I agree that supporting ML models from cloud providers is a good feature. I will add this to the dev plan, but we would appreciate it if you would like to contribute.

@rishabh1815769

@ylwu-amzn I would like to contribute to this feature as part of the OSCI-2023 program.

@austintlee (Collaborator)

@ylwu-amzn can we assign this one to @rishabh1815769? @rishabh1815769 have you made any progress on this?

@ylwu-amzn (Collaborator)

Thanks @austintlee. Sure, assigning to @rishabh1815769. One suggestion: share your design on this issue first so others can help review.

@khoaisohd commented Nov 7, 2023

Hi experts, do we have any updates on this feature? The current project I'm working on needs it to let customers upload their models from OCI (Oracle Cloud Infrastructure) Object Storage.
I did a design and a simple POC to verify the design against OCI Object Storage. Would it be possible for me to contribute this feature?

@ylwu-amzn (Collaborator)

@rishabh1815769 any update?

@khoaisohd Thanks, I think you can post your design to this issue first, so the community can discuss it together.

@khoaisohd commented Nov 7, 2023

Problem Statement

Today, customers can upload models to OpenSearch from public URLs or the local file system. There is no way to upload ML models from OCI Object Storage buckets (or other cloud providers' object stores). The only option is to make the bucket objects public and use the public URL, which exposes the ML models to anyone. This makes uploading a model quite tedious, since we have to upload the model in chunks from a local machine to avoid exposing ML models to the public internet. To address these pain points, we need a solution for uploading ML models to OpenSearch from cloud providers.

Scope

Support uploading models from cloud providers.

Assumption

The OpenSearch cluster runs on compute instances that have connectivity and permission to download model files from the cloud provider's storage service. For example, on OCI, instance principal and resource principal tokens are already provisioned on the OpenSearch cluster nodes.

Proposal

To allow customers to upload models from cloud providers to OpenSearch, we need to extend the model input so that customers can provide information about the cloud provider and the model file location. Furthermore, the cloud provider Java SDK dependencies (OCI SDK, AWS SDK) are added to the ml-commons plugin so that it can download files from the cloud provider's storage service.

When the customer uploads a model from a cloud provider, the ml-commons plugin does the following (see the sketch after this list):

  • Look at the scheme of the URI to determine the cloud provider storage service.
    • S3 format: s3://{bucket}/{key} or s3://{access-point-arn}/{key}
    • GCS format: gs://{bucket}/{model-file}
    • OCI Object Storage format: oci-os://{namespace}/{bucket}/{model-file}
  • Get auth settings from the model input to build the client authentication details.
  • Download the model from the cloud provider storage service using the provider's SDK.
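
For illustration, a sketch of the scheme-detection step under the formats above. The CloudProvider enum and ModelLocation record are hypothetical names (assuming a recent JDK for the record), and the s3://{access-point-arn}/{key} form would need extra handling, since ARNs do not parse as plain URI hosts:

import java.net.URI;

enum CloudProvider { S3, GCS, OCI_OBJECT_STORAGE }

// Hypothetical holder for the parsed location (namespace is OCI-only).
record ModelLocation(CloudProvider provider, String namespace, String bucket, String key) {}

final class ModelUriParser {
    static ModelLocation parse(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath().replaceFirst("^/", "");
        switch (uri.getScheme()) {
            case "s3":     // s3://{bucket}/{key}
                return new ModelLocation(CloudProvider.S3, null, uri.getHost(), path);
            case "gs":     // gs://{bucket}/{model-file}
                return new ModelLocation(CloudProvider.GCS, null, uri.getHost(), path);
            case "oci-os": // oci-os://{namespace}/{bucket}/{model-file}
                String[] parts = path.split("/", 2);
                return new ModelLocation(CloudProvider.OCI_OBJECT_STORAGE, uri.getHost(), parts[0], parts[1]);
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + uri.getScheme());
        }
    }
}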

Model additional attributes

Depending on the cloud provider, we might need additional settings.

OCI Object Storage additional settings

  • oci_os_endpoint: the endpoint of the OCI Object Storage service
  • oci_client_auth_type: how the model object is accessed; we support the following principal types:
    • resource_principal: needs no additional settings
    • instance_principal: needs no additional settings
    • user_principal: needs the additional settings below
  • oci_client_auth_tenant_id: the user's tenancy ID (only required for user principal)
  • oci_client_auth_user_id: the user ID (only required for user principal)
  • oci_client_auth_fingerprint: the API key fingerprint (only required for user principal)
  • oci_client_auth_pemfilepath: the API key PEM file path (only required for user principal)

Sample for resource principal

{
    "name": "phuong6",
    "model_group_id": "jj1nh4sBNGgnXkAx2296",
    "version": "1.0.0",
    "description": "BYOM Model",
    "model_format": "TORCH_SCRIPT",
    "function_name": "SPARSE_ENCODING",
    "model_content_hash_value": "e13b74006290a9d0f58c1376f9629d4ebc05a0f9385f40db837452b167ae9021",
    "url":  "oci-os://idee4xpu3dvm/phuong-bucket/traced_small_model.zip",
    "oci_os_endpoint": "https://objectstorage.uk-london-1.oraclecloud.com",
    "oci_client_auth_type": "resource_principal"
}

Sample for instance principal

{
    "name": "phuong6",
    "model_group_id": "jj1nh4sBNGgnXkAx2296",
    "version": "1.0.0",
    "description": "BYOM Model",
    "model_format": "TORCH_SCRIPT",
    "function_name": "SPARSE_ENCODING",
    "model_content_hash_value": "e13b74006290a9d0f58c1376f9629d4ebc05a0f9385f40db837452b167ae9021",
    "url":  "oci-os://idee4xpu3dvm/phuong-bucket/traced_small_model.zip",
    "oci_os_endpoint": "https://objectstorage.uk-london-1.oraclecloud.com",
    "oci_client_auth_type": "instance_principal"
}

Sample for user principal

{
    "name": "phuong6",
    "model_group_id": "jj1nh4sBNGgnXkAx2296",
    "version": "1.0.0",
    "description": "BYOM Model",
    "model_format": "TORCH_SCRIPT",
    "function_name": "SPARSE_ENCODING",
    "model_content_hash_value": "e13b74006290a9d0f58c1376f9629d4ebc05a0f9385f40db837452b167ae9021",
    "url":  "oci-os://idee4xpu3dvm/phuong-bucket/traced_small_model.zip",
    "oci_os_endpoint": "https://objectstorage.uk-london-1.oraclecloud.com",
    "oci_client_auth_type": "user_principal",
    "oci_client_auth_tenant_id": "ocid1.tenancy.region1..aaaaaaaa6gqokctiy6pncv6jooomauqibkkhduaohvikdrwi6ze2n5o5v3kq",
    "oci_client_auth_user_id": "ocid1.user.region1..aaaaaaaaedtgkbsw6xjifceubh6w72wdsslgyf5zsco7yooxrt4v6f7dcnuq",
    "oci_client_auth_fingerprint": "3a:01:de:90:39:f4:b1:2f:02:75:77:c1:21:f2:20:24",
    "oci_client_auth_pemfilepath": "~/.oci/oci_api_key.pem"
}
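
For illustration, a sketch of how these settings could be mapped onto the OCI Java SDK's standard authentication providers. The provider classes and builders are from the public com.oracle.bmc.auth package (worth double-checking against the SDK version in use); the settings map and the build method are hypothetical:

import java.util.Map;

import com.oracle.bmc.auth.BasicAuthenticationDetailsProvider;
import com.oracle.bmc.auth.InstancePrincipalsAuthenticationDetailsProvider;
import com.oracle.bmc.auth.ResourcePrincipalAuthenticationDetailsProvider;
import com.oracle.bmc.auth.SimpleAuthenticationDetailsProvider;
import com.oracle.bmc.auth.SimplePrivateKeySupplier;

final class OciAuth {
    // Map oci_client_auth_type to an OCI SDK authentication provider.
    static BasicAuthenticationDetailsProvider build(Map<String, String> settings) {
        switch (settings.get("oci_client_auth_type")) {
            case "resource_principal":
                return ResourcePrincipalAuthenticationDetailsProvider.builder().build();
            case "instance_principal":
                return InstancePrincipalsAuthenticationDetailsProvider.builder().build();
            case "user_principal":
                return SimpleAuthenticationDetailsProvider.builder()
                    .tenantId(settings.get("oci_client_auth_tenant_id"))
                    .userId(settings.get("oci_client_auth_user_id"))
                    .fingerprint(settings.get("oci_client_auth_fingerprint"))
                    .privateKeySupplier(new SimplePrivateKeySupplier(settings.get("oci_client_auth_pemfilepath")))
                    .build();
            default:
                throw new IllegalArgumentException("Unsupported oci_client_auth_type");
        }
    }
}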

@khoaisohd

I haven't done a POC for S3 to validate the design; however, we can propose something similar to the following, based on what we have in the notification plugin.

S3 additional settings

  • aws_region: the region where the bucket objects live
  • aws_role_arn: the ARN of the IAM role to assume

Sample S3 model input

{
    "name": "phuong6",
    "model_group_id": "jj1nh4sBNGgnXkAx2296",
    "version": "1.0.0",
    "description": "BYOM Model",
    "model_format": "TORCH_SCRIPT",
    "function_name": "SPARSE_ENCODING",
    "model_content_hash_value": "e13b74006290a9d0f58c1376f9629d4ebc05a0f9385f40db837452b167ae9021",
    "url":  "s3://phuong_bucket/traced_small_model.zip",
    "aws_region": "us-east-1",
    "aws_role_arn": "arn:aws:iam::123456789012:role/S3AccessRole"
}
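
For illustration, a sketch of building the S3 client from these two settings with the AWS SDK v2. The STS and S3 builder calls are public SDK API (worth verifying against the SDK version in use); the session name is an arbitrary placeholder:

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;

final class S3Clients {
    // Build an S3 client that assumes aws_role_arn in aws_region.
    static S3Client build(String awsRegion, String awsRoleArn) {
        StsClient sts = StsClient.builder().region(Region.of(awsRegion)).build();
        return S3Client.builder()
            .region(Region.of(awsRegion))
            .credentialsProvider(StsAssumeRoleCredentialsProvider.builder()
                .stsClient(sts)
                .refreshRequest(AssumeRoleRequest.builder()
                    .roleArn(awsRoleArn)
                    .roleSessionName("ml-commons-model-download") // placeholder session name
                    .build())
                .build())
            .build();
    }
}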

@IanMenendez (Author)

@khoaisohd This implementation looks good to me, and it is exactly how we implemented this in our internal systems. Just as a note:

  Download the model from the cloud provider storage service using the provider's SDK.

Some SDKs (GCloud, for example; no idea about others) can read objects in chunks, so you can upload models in chunks without ever downloading the whole model. Essentially: read a chunk into a buffer from the cloud provider -> upload it to OpenSearch. A minimal sketch is below.
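
For illustration, a minimal sketch of that flow. uploadChunk is a placeholder for whatever registers a chunk with the cluster (e.g. the model-chunk upload API):

import java.io.InputStream;
import java.util.Arrays;

final class StreamingUpload {
    // Read fixed-size chunks from the provider's object stream and hand each
    // one to the chunk-upload step, without writing the model to disk.
    static void streamToOpenSearch(InputStream modelStream, int chunkSize) throws Exception {
        int chunkNumber = 0;
        byte[] buffer = new byte[chunkSize];
        int read;
        while ((read = modelStream.readNBytes(buffer, 0, chunkSize)) > 0) {
            uploadChunk(chunkNumber++, Arrays.copyOf(buffer, read));
        }
    }

    static void uploadChunk(int chunkNumber, byte[] chunk) {
        // Placeholder: register the chunk with the cluster here.
    }
}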

@khoaisohd

@IanMenendez Thank you for reviewing the design doc. It is an awesome idea to use the SDK to read objects in chunks and upload them directly to OpenSearch; I think other providers support reading an object in chunks as well.

Furthermore, that idea applies not only to uploading models from cloud providers, but also to the public URL and file system paths. In the current implementation, the inputStream (from a public URL or the file system) is already copied into the outputStream (for the target downloaded file) chunk by chunk.

However, for the scope of this feature, should we continue with the approach we use today, downloading the model to the local file system and then splitting it into small files, to avoid a major refactoring? After that, we can have a follow-up PR for the optimization you suggested, applied to all use cases: public URL, local file system, cloud providers, etc.

@austintlee (Collaborator)

For S3, you might find this useful.

@austintlee (Collaborator)

I would also prefer a streaming approach that does not leave a copy on local disk.

@khoaisohd

@austintlee @IanMenendez @ylwu-amzn, we can use the streaming approach for cloud providers, and we can make it work for the current public URL and file system options as well, to avoid leaving a copy on local disk. I'll update the design to use the streaming approach; a minimal sketch of a shared source abstraction follows the list below.

Furthermore, do we have any concerns about the proposed customer experience for this feature?

  • The URI scheme is used to detect the cloud provider and the location of the model file, for example s3://{bucket}/{key} or oci-os://{namespace}/{bucket}/{model-file}.
  • Additional settings, such as authentication details, are provided as model inputs; to keep things clear, those settings get a prefix, for example aws_region, aws_role_arn, oci_client_auth_type.
  • Auth tokens or keys are assumed to be available on the OpenSearch node so that we can build the SDK client to read the model file from the cloud provider.
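
For illustration, the shared streaming abstraction could be as small as a source interface that every input type implements (names hypothetical):

import java.io.IOException;
import java.io.InputStream;

// Hypothetical provider-agnostic source, so the same streaming split/upload
// path serves public URLs, the local file system, and cloud providers.
interface ModelBlobSource {
    InputStream open() throws IOException; // stream the model bytes
    long size() throws IOException;        // lets the caller plan the chunk count
}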

@sam-herman (Collaborator) commented Jan 24, 2024

@ylwu-amzn @IanMenendez @rishabh1815769 what is the status of this?
@khoaisohd actually has an RFC and an already working change which we can contribute in a PR. Can you please let us know?

@IanMenendez (Author)

  @ylwu-amzn @IanMenendez @rishabh1815769 what is the status of this? @khoaisohd actually has an RFC and an already working change which we can contribute in a PR. Can you please let us know?

As far as I know, no one is working on a change. Please go ahead and create a PR.

@ylwu-amzn (Collaborator)

@samuel-oci feel free to cut an RFC to discuss first.

@sam-herman (Collaborator)

  @samuel-oci feel free to cut an RFC to discuss first.

Sounds good. @khoaisohd can you create an RFC with the proposed solution? I think you can also provide two options:

  1. Add the info in the model attributes, as you mentioned in [FEATURE] Allow downloading ML models from Cloud providers buckets #1371 (comment)
  2. Leverage the connector interface

@sam-herman (Collaborator)

For anyone interested, the RFC is here:
#1926

@mingshl moved this from Backlog to In Progress in ml-commons projects, Mar 12, 2024