[BUG] Failed deploy/load results in a new model group #1597

Open
austintlee opened this issue Nov 4, 2023 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@austintlee
Collaborator

What is the bug?
While working on #844, I ran into a case where loading a model without properly setting only_run_on_ml_node resulted in an IllegalArgumentException, but ml-commons still created a model group in the model group index.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. ./gradlew run
  2. Run
curl -XPOST http://localhost:9200/_plugins/_ml/models/_upload -H "Content-Type: application/json"  -d'
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}' | jq

Confirm you get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No eligible node found to execute this request. It's best practice to provision ML nodes to serve your models. You can disable this setting to serve the model on your data node for development purposes by disabling the \"plugins.ml_commons.only_run_on_ml_node\" configuration using the _cluster/setting api"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No eligible node found to execute this request. It's best practice to provision ML nodes to serve your models. You can disable this setting to serve the model on your data node for development purposes by disabling the \"plugins.ml_commons.only_run_on_ml_node\" configuration using the _cluster/setting api"
  },
  "status": 400
}
  3. Now check the model group index:
curl -XPOST http://localhost:9200/_plugins/_ml/model_groups/_search -H "Content-Type: application/json"  -d'{"query": {"match_all": {}}}' | jq
{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".plugins-ml-model-group",
        "_id": "COp8m4sBAa-YxbqmAojc",
        "_version": 1,
        "_seq_no": 0,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "created_time": 1699120677451,
          "access": "public",
          "latest_version": 0,
          "last_updated_time": 1699120677451,
          "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2"
        }
      }
    ]
  }
}

I am not able to delete this.
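
The reproduction steps above can be sketched as a small script. This is only an illustration, not part of ml-commons: the helper names (upload_status, model_group_count, left_orphan_group) and the OPENSEARCH_URL variable are hypothetical, and it assumes a local cluster without security plus jq on the PATH. It reuses the same two requests shown in the steps.

```shell
# Sketch only: helper names and OPENSEARCH_URL are assumptions, not ml-commons tooling.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Send the same upload request as step 2 and print only the HTTP status code.
upload_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -XPOST "$OPENSEARCH_URL/_plugins/_ml/models/_upload" \
    -H "Content-Type: application/json" -d '{
      "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
      "version": "1.0.1",
      "model_format": "TORCH_SCRIPT"
    }'
}

# Print the total number of model groups, using the same search as step 3 (needs jq).
model_group_count() {
  curl -s -XPOST "$OPENSEARCH_URL/_plugins/_ml/model_groups/_search" \
    -H "Content-Type: application/json" -d '{"query": {"match_all": {}}}' \
    | jq -r '.hits.total.value'
}

# Succeeds when the upload was rejected (status >= 400) yet a model group exists.
# $1 = HTTP status of the upload, $2 = number of model groups found.
left_orphan_group() {
  [ "$1" -ge 400 ] && [ "$2" -gt 0 ]
}

# Usage against a freshly started cluster (./gradlew run):
#   if left_orphan_group "$(upload_status)" "$(model_group_count)"; then
#     echo "reproduced: failed upload left a model group behind"
#   fi
```

On a cluster that already contains model groups, the match_all count would need to be compared before and after the upload instead.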

What is the expected behavior?
A clear and concise description of what you expected to happen.

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins


@austintlee added the bug and untriaged labels on Nov 4, 2023
@dhrubo-os
Collaborator

Which OpenSearch (OS) version are you using?

@austintlee
Collaborator Author

3.0.0-SNAPSHOT (main).

@dhrubo-os
Collaborator

@rbhavna could you please look into this?

@rbhavna
Collaborator

rbhavna commented Nov 15, 2023

@austintlee thanks for the catch. We recently made a code change to delete the model group when the version update fails, but it looks like that change does not handle this particular scenario. Let me take a look and fix it. In the meantime this should not block your testing: you should still be able to upload the "huggingface/sentence-transformers/all-MiniLM-L12-v2" model.
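
For anyone hitting the same error on a development cluster with no ML nodes, the error message itself points at one way forward: disable plugins.ml_commons.only_run_on_ml_node through the cluster settings API, then retry the upload. A minimal sketch, assuming a local cluster at http://localhost:9200 without security; the helper names and the OPENSEARCH_URL variable are hypothetical, and choosing a persistent (rather than transient) setting is a choice made here, not something stated in the thread.

```shell
# Sketch only: helper names and OPENSEARCH_URL are assumptions for illustration.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Build the cluster settings body; $1 is "true" or "false".
settings_body() {
  printf '{"persistent":{"plugins.ml_commons.only_run_on_ml_node":%s}}' "$1"
}

# Allow models to run on data nodes (development use only, per the error message).
allow_models_on_data_nodes() {
  curl -s -XPUT "$OPENSEARCH_URL/_cluster/settings" \
    -H "Content-Type: application/json" \
    -d "$(settings_body false)"
}

# Usage against a running cluster:
#   allow_models_on_data_nodes | jq
#   ...then re-run the _upload request from the reproduction steps.
```

This only sidesteps the error; it does not remove a model group left behind by an earlier failed upload.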

Projects
Status: In Progress