Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Make SQL align with PPL to support evaluate nested fields and update fields. #2933

Open
qianheng-aws opened this issue Aug 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@qianheng-aws
Copy link
Contributor

qianheng-aws commented Aug 13, 2024

Is your feature request related to a problem?
Currently in OpenSearch, SQL doesn't support update or evaluate nested fields like PPL. To make them align, we may need to make SQL support these features as well.

e.g.

  1. Update fields
    PPL supports updating the fields while SQL produces a new field with the same field_name which may leads to error.
# PPL
POST _plugins/_ppl/_explain
{
  "query": """
    source=opensearch_dashboards_sample_data_flights | eval FlightTimeMin = 1
  """
}

# response
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[FlightNum, Origin, OriginLocation, DestLocation, FlightDelay, DistanceMiles, FlightTimeMin, OriginWeather, dayOfWeek, AvgTicketPrice, Carrier, FlightDelayMin, OriginRegion, DestAirportID, FlightDelayType, timestamp, Dest, FlightTimeHour, Cancelled, DistanceKilometers, OriginCityName, DestWeather, OriginCountry, DestCountry, DestRegion, DestCityName, OriginAirportID]"
    },
    "children": [
      {
        "name": "EvalOperator",
        "description": {
          "expressions": {
            "FlightTimeMin": "1"
          }
        },
        "children": [
          {
            "name": "OpenSearchIndexScan",
            "description": {
              "request": """OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={"from":0,"size":10000,"timeout":"1m"}, searchDone=false)"""
            },
            "children": []
          }
        ]
      }
    ]
  }
}

# SQL
POST _plugins/_sql
{
  "query": "SELECT *, 1 as FlightTimeMin FROM opensearch_dashboards_sample_data_flights LIMIT 5"
}

# response error
"{\n  \"error\": {\n    \"reason\": \"Invalid SQL query\",\n    \"details\": \"Multiple entries with same key: FlightTimeMin=1 and FlightTimeMin=1030.7704\",\n    \"type\": \"IllegalArgumentException\"\n  },\n  \"status\": 400\n}"
  1. Evaluate nested fields
    PPL supports evaluating fields from left to right while SQL cannot resolve the reference.
# PPL
POST _plugins/_ppl/_explain
{
  "query": """
    source=opensearch_dashboards_sample_data_flights | eval FlightTimeMinPlusOne = FlightTimeMin + 1, FlightTimeMinPlusTwo = FlightTimeMinPlusOne + 1 | fields FlightTimeMinPlusOne, FlightTimeMinPlusTwo
  """
}

# response
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[FlightTimeMinPlusOne, FlightTimeMinPlusTwo]"
    },
    "children": [
      {
        "name": "EvalOperator",
        "description": {
          "expressions": {
            "FlightTimeMinPlusTwo": "+(FlightTimeMinPlusOne, 1)",
            "FlightTimeMinPlusOne": "+(FlightTimeMin, 1)"
          }
        },
        "children": [
          {
            "name": "OpenSearchIndexScan",
            "description": {
              "request": """OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={"from":0,"size":10000,"timeout":"1m"}, searchDone=false)"""
            },
            "children": []
          }
        ]
      }
    ]
  }
}

# SQL
POST _plugins/_sql/_explain
{
  "query": "SELECT  (FlightTimeMin + 1) as FlightTimeMinPlusOne, (FlightTimeMinPlusOne + 1) as FlightTimeMinPlusTwo FROM opensearch_dashboards_sample_data_flights LIMIT 5"
}

# response error
"{\n  \"error\": {\n    \"reason\": \"Invalid SQL query\",\n    \"details\": \"can't resolve Symbol(namespace=FIELD_NAME, name=FlightTimeMinPlusOne) in type env\",\n    \"type\": \"SemanticCheckException\"\n  },\n  \"status\": 400\n}"

What solution would you like?
These 2 features are all supported in EvalOperator. But when compiling SQL, it won't use EvalOperator now but only ProjectOperator, which contributes to the differences above.

To make them align, there could be 2 optional solutions:

  1. Make SQL generates EvalOperator as well like PPL
  2. Enhance ProjectOperator to support these 2 features.

What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

@dblock
Copy link
Member

dblock commented Sep 2, 2024

[Weekly Catch All Triage - 1]

@YANG-DB
Copy link
Member

YANG-DB commented Nov 29, 2024

@qianheng-aws are you working of this feature ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants