TypeError: MlClient.put_trained_model() got an unexpected keyword argument 'prefix_strings' when using eland to import a model with "--ingest-prefix" #2448

Closed
qubit4zj opened this issue Mar 13, 2024 · 6 comments · Fixed by #2449

@qubit4zj

Support prefix_strings for MlClient.put_trained_model()

Using eland to import the model intfloat/multilingual-e5-small with this command:

docker run -it --rm docker.elastic.co/eland/eland eland_import_hub_model --cloud-id [cloud_id] -u elastic -p [pwd] --hub-model-id intfloat/multilingual-e5-small --task-type text_embedding --ingest-prefix "query: " --search-prefix "query: "
fails with the error: TypeError: MlClient.put_trained_model() got an unexpected keyword argument 'prefix_strings'
Logs:
2024-03-13 06:23:31,065 INFO : Establishing connection to Elasticsearch
2024-03-13 06:23:31,168 INFO : Connected to cluster named 'b4cbfe03eea046129e4ed159d14c302d' (version: 8.12.1)
2024-03-13 06:23:31,169 INFO : Loading HuggingFace transformer tokenizer and model 'intfloat/multilingual-e5-small'
tokenizer_config.json: 100%
STAGE:2024-03-13 06:23:42 1:1 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2024-03-13 06:23:43 1:1 ActivityProfilerController.cpp:300] Completed Stage: Collection
2024-03-13 06:23:48,172 INFO : Creating model with id 'test'
Traceback (most recent call last):
  File "/usr/local/bin/eland_import_hub_model", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/eland/cli/eland_import_hub_model.py", line 295, in main
    ptm.put_config(config=config)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/_pytorch_model.py", line 78, in put_config
    self._client.ml.put_trained_model(model_id=self.model_id, **config_map)
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_sync/client/utils.py", line 426, in wrapped
    return api(*args, **kwargs)
TypeError: MlClient.put_trained_model() got an unexpected keyword argument 'prefix_strings'
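
For readers unfamiliar with this failure mode, here is a minimal sketch of how the TypeError above can arise. It is not the real client: put_trained_model below is a hypothetical stand-in with a reduced signature, and config_map is an assumed example of what eland might unpack into the call (the "ingest" and "search" keys are an assumption based on the --ingest-prefix and --search-prefix flags).

```python
from typing import Any, Dict, List, Optional


def put_trained_model(
    *,
    model_id: str,
    tags: Optional[List[str]] = None,
    body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for a client method that has no `prefix_strings` parameter."""
    return {"model_id": model_id, "tags": tags, "body": body}


# Assumed shape of the config; the real eland config_map may differ.
config_map = {
    "tags": ["demo"],
    "prefix_strings": {"ingest": "query: ", "search": "query: "},
}

error_message = None
try:
    # Unpacking passes every key as a keyword argument, including the
    # one the function signature does not declare.
    put_trained_model(model_id="test", **config_map)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The error is raised by Python itself when the call is made, before any request is sent to the server.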

Elasticsearch version (bin/elasticsearch --version):

elasticsearch-py version (elasticsearch.__versionstr__): 8.12.0

Please make sure the major version matches the Elasticsearch server you are running.

Description of the problem including expected versus actual behavior:

Please add "prefix_strings" as an input parameter to:
https://github.com/elastic/elasticsearch-py/blob/main/elasticsearch/_sync/client/ml.py#L3654

def put_trained_model(
    self,
    *,
    model_id: str,
    ...
    body: t.Optional[t.Dict[str, t.Any]] = None,
    prefix_strings: t.Optional[t.Dict[str, t.Any]] = None,  # proposed new parameter
) -> ObjectApiResponse[t.Any]:
    ...
    if tags is not None:
        __body["tags"] = tags
    if prefix_strings is not None:  # proposed new branch
        __body["prefix_strings"] = prefix_strings
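
The proposed change can be sketched in isolation as follows. This is only an illustration of the pattern in the snippet above (optional keyword arguments copied into the request body when set); build_put_trained_model_body is a hypothetical helper, not part of elasticsearch-py, and the "ingest"/"search" keys are assumed.

```python
from typing import Any, Dict, List, Optional


def build_put_trained_model_body(
    *,
    tags: Optional[List[str]] = None,
    prefix_strings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: copy only the arguments that were provided into the body."""
    body: Dict[str, Any] = {}
    if tags is not None:
        body["tags"] = tags
    if prefix_strings is not None:
        body["prefix_strings"] = prefix_strings
    return body


# Arguments left as None are omitted from the body entirely.
example_body = build_put_trained_model_body(
    prefix_strings={"ingest": "query: ", "search": "query: "},
)
print(example_body)
```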

Steps to reproduce:

pquentin transferred this issue from elastic/elasticsearch-py Mar 13, 2024
pquentin transferred this issue from elastic/eland Mar 13, 2024
@pquentin
Member

Thanks @qubit4zj for the issue, and sorry for the double transfer. The Python client is generated from this repository, and prefix_strings is missing from the specification:

body: {
  /**
   * The compressed (GZipped and Base64 encoded) inference definition of the
   * model. If compressed_definition is specified, then definition cannot be
   * specified.
   */
  compressed_definition?: string
  /**
   * The inference definition for the model. If definition is specified, then
   * compressed_definition cannot be specified.
   */
  definition?: Definition
  /**
   * A human-readable description of the inference trained model.
   */
  description?: string
  /**
   * The default configuration for inference. This can be either a regression
   * or classification configuration. It must match the underlying
   * definition.trained_model's target_type. For pre-packaged models such as
   * ELSER the config is not required.
   */
  inference_config?: InferenceConfigCreateContainer
  /**
   * The input field names for the model definition.
   */
  input?: Input
  /**
   * An object map that contains metadata about the model.
   */
  metadata?: UserDefinedValue
  /**
   * The model type.
   * @server_default tree_ensemble
   */
  model_type?: TrainedModelType
  /**
   * The estimated memory usage in bytes to keep the trained model in memory.
   * This property is supported only if defer_definition_decompression is true
   * or the model definition is not supplied.
   */
  model_size_bytes?: long
  /**
   * The platform architecture (if applicable) of the trained mode. If the model
   * only works on one platform, because it is heavily optimized for a particular
   * processor architecture and OS combination, then this field specifies which.
   * The format of the string must match the platform identifiers used by Elasticsearch,
   * so one of, `linux-x86_64`, `linux-aarch64`, `darwin-x86_64`, `darwin-aarch64`,
   * or `windows-x86_64`. For portable models (those that work independent of processor
   * architecture or OS features), leave this field unset.
   */
  platform_architecture?: string
  /**
   * An array of tags to organize the model.
   */
  tags?: string[]
}

@elastic/ml-core Would one of you be able to fix the definition here? Thank you 🙏
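
As a rough illustration of the gap, the sketch below compares a model config against the body fields listed in the specification excerpt above. SPEC_BODY_FIELDS is copied from that excerpt; fields_missing_from_spec and the example config are hypothetical and only show how an unsupported key such as prefix_strings would be spotted.

```python
from typing import Any, Dict, List

# Body fields taken from the specification excerpt quoted in this comment.
SPEC_BODY_FIELDS = frozenset(
    {
        "compressed_definition",
        "definition",
        "description",
        "inference_config",
        "input",
        "metadata",
        "model_type",
        "model_size_bytes",
        "platform_architecture",
        "tags",
    }
)


def fields_missing_from_spec(config: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return config keys the spec excerpt does not define."""
    return sorted(key for key in config if key not in SPEC_BODY_FIELDS)


# Assumed example config; a real eland config may contain other keys.
example_config = {
    "description": "demo model",
    "model_type": "pytorch",
    "prefix_strings": {"ingest": "query: ", "search": "query: "},
}
missing = fields_missing_from_spec(example_config)
print(missing)
```

Because the client is generated from the specification, any key reported here has no matching keyword argument in the generated method.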

davidkyle self-assigned this Mar 13, 2024
@davidkyle
Member

Thanks for the bug report, @qubit4zj.

If you are running Elasticsearch 8.12, the multilingual-e5-small model can be downloaded and installed via the UI:

https://www.elastic.co/guide/en/machine-learning/8.12/ml-nlp-e5.html

@davidkyle
Member

The prefix_strings option was added to the model config used in the response in #2363, but it is missing from the PUT request. I've added it in #2449.

@pquentin
Member

Thanks David for your help!

@qubit4zj This will be in the next version of elasticsearch-py, which will be released in a few weeks.

@qubit4zj
Author

@pquentin @davidkyle Thanks for your update! I'll try the new version once it's ready.

@pquentin
Member

Now that this is in the main branch of elasticsearch-py, you can also test it using pip install "elasticsearch@git+https://github.com/elastic/elasticsearch-py" if you want.
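
One way to confirm that an installed build includes the fix is to inspect the method signature. The sketch below is a generic check using the standard library; accepts_keyword, old_api and new_api are hypothetical names, and the stand-in functions only mimic the before and after signatures.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def accepts_keyword(func: Callable[..., Any], name: str) -> bool:
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def old_api(*, model_id: str, tags: Optional[list] = None) -> None:
    """Hypothetical stand-in for the signature before the fix."""


def new_api(
    *,
    model_id: str,
    tags: Optional[list] = None,
    prefix_strings: Optional[Dict[str, Any]] = None,
) -> None:
    """Hypothetical stand-in for the signature after the fix."""


print(accepts_keyword(old_api, "prefix_strings"))
print(accepts_keyword(new_api, "prefix_strings"))
```

With elasticsearch-py installed, the same helper could be pointed at MlClient.put_trained_model; whether the client's decorators expose the underlying signature to inspect is an assumption worth verifying.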
