Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Connector payload validation #2748

Closed
yuye-aws opened this issue Jul 25, 2024 · 21 comments
Closed

[RFC] Connector payload validation #2748

yuye-aws opened this issue Jul 25, 2024 · 21 comments
Labels
RFC Request For Comments from the OpenSearch Community

Comments

@yuye-aws
Copy link
Member

yuye-aws commented Jul 25, 2024

This RFC aims to resolve the bug issue resulted from payload validation in connector. You can refer to this issue on how to reproduce the bug.

How the bug works

The connector creates the payload and then validates it.

String payload = connector.createPayload(action, parameters);
connector.validatePayload(payload);

During payload creation, all parameters within ${parameters.foo} will be replaced by values in parameters

public <T> T createPayload(String action, Map<String, String> parameters) {
Optional<ConnectorAction> connectorAction = findAction(action);
if (connectorAction.isPresent() && connectorAction.get().getRequestBody() != null) {
String payload = connectorAction.get().getRequestBody();
payload = fillNullParameters(parameters, payload);
StringSubstitutor substitutor = new StringSubstitutor(parameters, "${parameters.", "}");
payload = substitutor.replace(payload);
if (!isJson(payload)) {
throw new IllegalArgumentException("Invalid payload: " + payload);
}
return (T) payload;
}
return (T) parameters.get("http_body");
}

For payload validation, any string with format like ${parameters.foo} will lead to an exception, which is possible for every parameter.

default void validatePayload(String payload) {
if (payload != null && payload.contains("${parameters")) {
Pattern pattern = Pattern.compile("\\$\\{parameters\\.([^}]+)}");
Matcher matcher = pattern.matcher(payload);
StringBuilder errorBuilder = new StringBuilder();
while (matcher.find()) {
String parameter = matcher.group(1);
errorBuilder.append(parameter).append(", ");
}
String error = errorBuilder.substring(0, errorBuilder.length() - 2).toString();
throw new IllegalArgumentException("Some parameter placeholder not filled in payload: " + error);
}
}

Solution

We propose a new validation and creation process for the connector payload.

  1. From the action request body, get all parameters which needs to be filled in with actual values.
  2. Validates that all parameters in step 1 have been filled.

Open question

  1. Do we still allow users to recursively fill in parameters? If so, we can add a connector setting like recursive_parameters (false by default). We need to instruct user that setting recursive_parameters to be true may lead to the bug.
@yuye-aws yuye-aws changed the title [RFC] Refactor connector payload validation [RFC] Connector payload validation Jul 25, 2024
@yuye-aws
Copy link
Member Author

yuye-aws commented Jul 25, 2024

Implemented this RFC in a draft PR: #2724. Despite being able to resolve the bug, the draft PR restricts that connector payload can only fill in required parameters. If users wants to fill in parameters with parameter map {"prompt": "${parameters.role}", "role": "bot"} via multiple steps:

${parameters.prompt} -> You are a ${parameters.role} -> You are a bot

Then the created payload would be You are a ${parameters.role}

@yuye-aws
Copy link
Member Author

yuye-aws commented Jul 25, 2024

Perhaps we can deal with the drawbacks with two different regex. The format ${parameters.foo} is only used for payload validation and required parameters. Another format (just an example) @{parameters.foo} can be further used to fill in aforementioned multi-step parameters.

@yuye-aws
Copy link
Member Author

Any comments would be appreciated :)

@zane-neo
Copy link
Collaborator

zane-neo commented Jul 25, 2024

Implemented this RFC in a draft PR: #2724. Despite being able to resolve the bug, the draft PR restricts that connector payload can only fill in required parameters. If users wants to fill in parameters with parameter map {"prompt": "${parameters.role}", "role": "bot"} via multiple steps:

${parameters.prompt} -> You are a ${parameters.role} -> You are a bot

Then the created payload would be You are a ${parameters.role}

This seem a pretty advanced use case, I don't think this exist now. No need to consider this until we received customer's callout.

@yuye-aws
Copy link
Member Author

yuye-aws commented Jul 25, 2024

Implemented this RFC in a draft PR: #2724. Despite being able to resolve the bug, the draft PR restricts that connector payload can only fill in required parameters. If users wants to fill in parameters with parameter map {"prompt": "${parameters.role}", "role": "bot"} via multiple steps:
${parameters.prompt} -> You are a ${parameters.role} -> You are a bot
Then the created payload would be You are a ${parameters.role}

This seem a pretty advanced use case, I don't think this exist now. No need to consider this until we received customer's callout.

The current createPayload function implicitly supports this use case. I am just wondering whether it would be a breaking change after my fix.

@ylwu-amzn
Copy link
Collaborator

Suggest to add a new parameter to parameters : skip_validating_missing_parameters, Default value is false , so we can keep backward compatible. If set as true, we will skip validating if there are any missing parameters. This solution just have one line of code change. This can solve the problem and effort is small.

@yuye-aws
Copy link
Member Author

yuye-aws commented Jul 26, 2024

Updated in PR: #2724. Can you help review? @ylwu-amzn @jngz-es @zane-neo

@b4sjoo b4sjoo moved this to Untriaged in ml-commons projects Jul 30, 2024
@b4sjoo b4sjoo added RFC Request For Comments from the OpenSearch Community and removed untriaged labels Jul 30, 2024
@b4sjoo b4sjoo moved this from Untriaged to In Progress in ml-commons projects Jul 30, 2024
@yuye-aws
Copy link
Member Author

yuye-aws commented Jul 31, 2024

Reached an agreement on the following solution:

  1. Introduce a new parameter named recursive_parameter_enabled to connector, the default value is true.
  2. If recursive_parameter_enabled is true, then follow the old connector behavior.
  3. If recursive_parameter_enabled is false, then validate only required parameters in the connector, nested parameter will not be filled.

@ylwu-amzn
Copy link
Collaborator

hi, Yuye, have you guys discussed pros/cons of using recursive_parameter_enabled and c?

By adding recursive_parameter_enabled, I see one cons is we need to add more changes: check required parameters and check if recursive_parameter_enabled flag is true. While the recursive_parameter_enabled one just need to check if recursive_parameter_enabled flag is true, it's simpler.

Can you explain how to get required parameters? Like we discussed before, I would suggest to fix the problem directly rather than introducing new unnecessary logic which may brings new issues.

@yuye-aws
Copy link
Member Author

yuye-aws commented Aug 1, 2024

Can you explain how to get required parameters?

I'll showcase with an example

  1. Create a raw payload from the action. Suppose the payload is {"input": "${parameters.input}", "temperature": "${parameters.temperature}"} and parameter map is {"input": "you are a ${parameters.role}", "temperature": "0.01"}.
  2. Extract required parameter from the raw payload. The required parameter is {"input", "temperature"}. In this case, role is not the required parameter.

@yuye-aws
Copy link
Member Author

yuye-aws commented Aug 1, 2024

Like we discussed before, I would suggest to fix the problem directly rather than introducing new unnecessary logic which may brings new issues.

The parameter is recursive_parameter_enabled is enabled by default. There would be no impact for users who are not aware of this parameter.

#2724 (comment) shows a drawback of the current connector, i.e., users may not want to recursively filling in parameter in their payload. We need a way for user to configure this behavior.

@zane-neo
Copy link
Collaborator

zane-neo commented Aug 1, 2024

Reached an agreement on the following solution:

  1. Introduce a new parameter named recursive_parameter_enabled to connector, the default value is true.
  2. If recursive_parameter_enabled is true, then follow the old connector behavior.
  3. If recursive_parameter_enabled is false, then validate only required parameters in the connector, nested parameter will not be filled.

Not supporting agent level override this parameter?

@yuye-aws
Copy link
Member Author

yuye-aws commented Aug 2, 2024

Not supporting agent level override this parameter?

If we are supporting this, how about introducing parameter to agents? @jngz-es @Zhangxunmt @ylwu-amzn

@yuye-aws
Copy link
Member Author

yuye-aws commented Aug 2, 2024

We can iterate and optimize. How about supporting agent level override in another PR?

@zane-neo
Copy link
Collaborator

zane-neo commented Aug 5, 2024

Not supporting agent level override this parameter?

If we are supporting this, how about introducing parameter to agents? @jngz-es @Zhangxunmt @ylwu-amzn

We have to support this case, because different agent might have different behaviors in terms of validating the payload, missing this can cause application running in unexpected status and customer not aware of it at all.

@ylwu-amzn
Copy link
Collaborator

If we are supporting this, how about introducing parameter to agents? @jngz-es @Zhangxunmt @ylwu-amzn

I think this already supported? User can set parameters in tool configuration when create agent.

@ylwu-amzn
Copy link
Collaborator

ylwu-amzn commented Aug 6, 2024

Can you explain how to get required parameters?

I'll showcase with an example

1. Create a raw payload from the action. Suppose the payload is `{"input": "${parameters.input}", "temperature": "${parameters.temperature}"}` and parameter map is `{"input": "you are a ${parameters.role}", "temperature": "0.01"}`.

2. Extract required parameter from the raw payload. The required parameter is `{"input", "temperature"}`. In this case, `role` is not the required parameter.

Like I said, you fixed one problem by adding something new. This required parameter is not a general feature. I think the general parameter validation feature should though model interface. @b4sjoo built model interface to validate input/output parameter list and type. Why we build another feature for the same thing?

Your example is the basic use case. We should consider more cases. For example connector http body is

{
  "prompt": "You are a helpful PPL query translate expert. Please  help translate this ${parameters.question} to PPL query."
}

The required parameter should be question, not prompt. The model interface is for defining parameter list and type. Phase1 already released (@b4sjoo built this feature). I think we should use model interface.

@yuye-aws
Copy link
Member Author

yuye-aws commented Aug 8, 2024

{
"prompt": "You are a helpful PPL query translate expert. Please help translate this ${parameters.question} to PPL query."
}

Do you expect user to send such a prompt parameter? If so, the users can simply follow the default (old) behavior.

@yuye-aws
Copy link
Member Author

yuye-aws commented Aug 8, 2024

{
"prompt": "You are a helpful PPL query translate expert. Please help translate this ${parameters.question} to PPL query."
}

Can you think of some use case where the user needs to disable the recursive_parameter_enabled setting but still expects filling in nested parameter?

@b4sjoo
Copy link
Collaborator

b4sjoo commented Aug 12, 2024

Side note

You can always use model interface to control the eligibility of a field name, for instance, consider an input like this:

{
    "inputs": "What is the meaning of life?"
}

Can represent a general schema like this:

General schema

{
      "type": "object",
      "properties": {
        "inputs": {
          "type": "string"
        }
    }
}

This represents any JSON object input, when it has a field called input inside, it must be a string entry. For instance:

# Valid case 1
{
    "inputs": "bar" # 'inputs' is a string
}

# Valid case 2
{
    "foo": [1,2,3] # no field called 'inputs' is detected, so everything is eligible in this JSON object
}

# Valid case 3
{
    "foo": [1,2,3],
    "inputs": "bar" # we only check if 'inputs' field is a string or not, other fields can be in any structure
}

# Invalid case 1
{
    "inputs": 3 # 'inputs' is a number
}

'required' Keyword

However, if we want the input field to be required, we can use a required keyword:

{
      "type": "object",
      "properties": {
        "inputs": {
          "type": "string"
        }
    },
    "required": ["inputs"]
}

So that:

# Valid case 1
{
    "inputs": "bar" # 'inputs' is presented, and is a string
}

# Valid case 2
{
    "foo": [1,2,3], # we only check if 'inputs' field is presented and if it is a string or not, other fields can be in any structure
    "inputs": "bar" 
}

# Invalid case 1
{
    "inputs": 3 # 'inputs' is a number
}

# Invalid case 2
{
    "foo": [1,2,3] # with 'required' keyword, a field called 'inputs' must be presented in this case
}

'additionalProperties' Keyword

In the meantime, if we don't want any other fields to be introduced by user, we can use `additionalProperties' Keyword:

{
      "type": "object",
      "properties": {
        "inputs": {
          "type": "string"
        }
    },
    "required": ["inputs"],
    "additionalProperties": false
}

So that:

# Valid case 1
{
    "inputs": "bar" # 'inputs' is presented, and is a string
}

# Invalid case 1
{
    "prompt": "${parameters.inputs}", # 'additionalProperties' is false, but there are fields other than 'inputs' presented
    "inputs": "bar" 
}

# Invalid case 2
{
    "inputs": 3 # 'inputs' is a number
}

# Invalid case 3
{
    "foo": [1,2,3] # 'additionalProperties' is false, but there are fields other than 'inputs' presented; 'inputs' is in 'required' field, but no field called 'inputs' exists.
}

'not' keyword with subschema

Last but not least, we can use a 'not' keyword to introduce a subschema to reject certain field. That means if we don't want some certain fields to be introduced by user, we can use 'not' keyword to reject those fields:

{
      "type": "object",
      "required": ["inputs"],
      "additionalProperties": true,
      "not": {
        "required": ["prompts"]
    }
}

Then

# Valid case 1
{
    "inputs": "bar" # 'inputs' is presented, and is a string
}

# Valid case 2
{
    "foo": [1,2,3], # 'additionalProperties' is true, and this field does not occur in the 'not' subschema, so this is a valid case
    "inputs": "bar" 
}

# Invalid case 1
{
    "inputs": 3 # 'inputs' is a number
}

# Invalid case 2
{
    "foo": [1,2,3] # with 'required' keyword, a field called 'inputs' must be presented in this case
}

# Invalid case 3
{
    "prompt": "${parameters.inputs}", # Although 'additionalProperties' is true, 'prompt' field is rejected by 'not' subschema
    "inputs": "bar" 
}

You can always check https://json-schema.org/learn/glossary to get more information on JSON schema.
Hope this helps.

@yuye-aws
Copy link
Member Author

Closing this issue as this PR: #2812 has been merged

@github-project-automation github-project-automation bot moved this from In Progress to Done in ml-commons projects Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request For Comments from the OpenSearch Community
Projects
Development

No branches or pull requests

4 participants