[BUG] PPL can't extract value from map when the key is nested field alike #565

Closed
A-Gray-Cat opened this issue Aug 13, 2024 · 3 comments
Labels
bug (Something isn't working), Lang:PPL (Pipe Processing Language support)

Comments

A-Gray-Cat commented Aug 13, 2024

What is the bug?
The unmapped field is a map structure that looks like this:

{"userIdentity.sessionContext.sessionIssuer.type":"Role","tlsDetails.clientProvidedHostHeader":"dynamodb.us-east-1.amazonaws.com","userIdentity.sessionContext.sessionIssuer.userName":"lambda_rii_finding_etl_REDACTED","userIdentity.sessionContext.sessionIssuer.principalId":"REDACTED","recipientAccountId":"REDACTED","readOnly":"true","tlsDetails.tlsVersion":"TLSv1.3","managementEvent":"true","tlsDetails.cipherSuite":"TLS_AES_256_GCM_SHA384","userIdentity.sessionContext.sessionIssuer.accountId":"REDACTED"}

SQL can retrieve a value with select unmapped['userIdentity.sessioncontext.sessionIssuer.type']. PPL cannot do the same: when the key has more than one level of depth, the query errors out.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Run the following query:
source = securitylake.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0 | fields unmapped.userIdentity.sessioncontext.sessionIssuer.type

It will throw this error:

{"Message":"Fail to analyze query. Cause: Can't extract value from unmapped#347[userIdentity]: need struct type but got string"}

However, if we only query for:

source = securitylake.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0 | fields unmapped.userIdentity

This will run successfully.

Please note that userIdentity.sessioncontext.sessionIssuer.type is not a JSON path but a single key string. The unmapped field is a map of key-value pairs whose keys merely look like nested JSON paths.
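
The distinction can be illustrated with a minimal sketch in plain Scala, outside of Spark and PPL. It is only a model of the data shape, not the analyzer's actual logic: the object name DottedKeyLookup and both helper functions are hypothetical, and the keys and values are copied from the sample record above. Looking up the whole dotted string as one key succeeds, while using only the first path segment as the key finds nothing. This appears consistent with the reported error, where unmapped[userIdentity] resolves to a string that cannot be descended into any further.

```scala
// Simplified model of the `unmapped` field: a flat map whose keys contain dots.
// Keys and values are taken from the sample record in this report.
object DottedKeyLookup {
  val unmapped: Map[String, String] = Map(
    "userIdentity.sessionContext.sessionIssuer.type" -> "Role",
    "tlsDetails.tlsVersion" -> "TLSv1.3",
    "readOnly" -> "true"
  )

  // Treat the whole dotted string as a single map key (what the SQL form does).
  def byWholeKey(key: String): Option[String] = unmapped.get(key)

  // Hypothetical path-style resolution: only the first dot-separated segment
  // is used as the map key, which is all a flat map could resolve.
  def byFirstSegment(path: String): Option[String] =
    unmapped.get(path.split('.').head)

  def main(args: Array[String]): Unit = {
    val key = "userIdentity.sessionContext.sessionIssuer.type"
    println(byWholeKey(key))     // Some(Role)
    println(byFirstSegment(key)) // None: there is no key named "userIdentity"
  }
}
```

A key without dots, such as readOnly, behaves the same under both lookups, which is why the problem only shows up for keys that resemble nested paths.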

What is the expected behavior?
The correct value is extracted.

What is your host/environment?

  • Version 2.13

A-Gray-Cat added the bug and untriaged labels Aug 13, 2024
YANG-DB self-assigned this Aug 13, 2024
YANG-DB added the Lang:PPL label and removed the untriaged label Aug 13, 2024
YANG-DB moved this to Todo in PPL Commands Aug 16, 2024
YANG-DB removed their assignment Aug 21, 2024
Member

YANG-DB commented Aug 23, 2024

Hi @A-Gray-Cat,
Can you please validate that the following test actually represents this issue?

 protected def createStructNestedTable2(testTable: String): Unit = {
    sql(s"""
           | CREATE TABLE $testTable
           | (
           |   unmapped  STRUCT<userIdentity: STRUCT<sessioncontext: STRUCT<sessionIssuer: STRUCT<type: STRING>>>>
           | )
           | USING JSON
           |""".stripMargin)

    sql(s"""
           | INSERT INTO $testTable
           | VALUES
           | ( STRUCT(STRUCT(STRUCT(STRUCT("example_type1")))) )
           |""".stripMargin)
  }

 test("aaa") {
    val pplFrame = sql(s"""
                       | source = $testTable | fields unmapped.userIdentity.sessioncontext.sessionIssuer.type
                       | """.stripMargin)

    // Retrieve the results
    val pplResults: Array[Row] = pplFrame.collect()
    assert(pplResults.length == 1)
    val expectedResults: Array[Row] = Array(Row("example_type1"))
    // Compare the results
    implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, String](_.getAs[String](0))
    assert(pplResults.sorted.sameElements(expectedResults.sorted))

    val sqlFrame = sql(s"""
                       | select unmapped.userIdentity.sessioncontext.sessionIssuer.type from $testTable
                       | """.stripMargin)

    // Retrieve the results
    val sqlResults: Array[Row] = sqlFrame.collect()
    assert(sqlResults.length == 1)
    assert(sqlResults.sorted.sameElements(expectedResults.sorted))
  }

Please advise...

Author

Hello,

The schema of unmapped is actually a map with string keys and string values:
{
  "key": "string",
  "value": "string"
}
An example entry looks like this:
{"userIdentity.sessioncontext.sessionIssuer.type": "example"}

Here are the raw logs of an unmapped field queried using Athena:

{userIdentity.sessionContext.sessionIssuer.type=Role, tlsDetails.clientProvidedHostHeader=glue.us-east-2.amazonaws.com, userIdentity.sessionContext.sessionIssuer.userName=AmazonSecurityLakeMetaStoreManagerV2, userIdentity.sessionContext.sessionIssuer.principalId=REDACTED, recipientAccountId=REDACTED, readOnly=true, tlsDetails.tlsVersion=TLSv1.3, managementEvent=true, userIdentity.sessionContext.sessionIssuer.accountId=REDACTED, tlsDetails.cipherSuite=TLS_AES_128_GCM_SHA256, additionalEventData.lakeFormationPrincipal=arn:aws:iam::REDACTED:role/service-role/AmazonSecurityLakeMetaStoreManagerV2}
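
To make the flat key-value shape of that output explicit, here is a small sketch in plain Scala that parses the Athena-style rendering into a Map. The object name AthenaMapText and its parse function are hypothetical, and the sketch assumes that no key or value contains the ", " separator and that the first "=" in each entry separates the key from the value. Both assumptions hold for the sample above but are not guaranteed in general.

```scala
// Hypothetical helper: turns Athena's "{k1=v1, k2=v2}" text rendering of a
// map column into a Scala Map. Assumes ", " never occurs inside a key or value.
object AthenaMapText {
  def parse(text: String): Map[String, String] =
    text.trim.stripPrefix("{").stripSuffix("}")
      .split(", ")
      .filter(_.nonEmpty)
      .map { entry =>
        // Split on the first '=' only, so values may themselves contain '='.
        val idx = entry.indexOf('=')
        require(idx > 0, s"malformed entry: $entry")
        entry.substring(0, idx) -> entry.substring(idx + 1)
      }
      .toMap

  def main(args: Array[String]): Unit = {
    val sample =
      "{userIdentity.sessionContext.sessionIssuer.type=Role, readOnly=true, tlsDetails.tlsVersion=TLSv1.3}"
    val parsed = parse(sample)
    // Each dotted name is one complete key; there is no nested structure.
    println(parsed("userIdentity.sessionContext.sessionIssuer.type"))
    println(parsed.contains("userIdentity"))
  }
}
```

Every entry ends up as one string key mapped to one string value, so a lookup has to supply the full dotted key.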

vmmusings assigned A-Gray-Cat and unassigned A-Gray-Cat Aug 27, 2024
salyh moved this from Todo to In Progress in PPL Commands Aug 29, 2024
YANG-DB moved this from In Progress to InReview in PPL Commands Aug 29, 2024
Member

YANG-DB commented Oct 9, 2024

closing this issue with similar cause for the closing this

YANG-DB closed this as completed Oct 9, 2024
github-project-automation bot moved this from InReview to Done in PPL Commands Oct 9, 2024