Definition of submodelDescriptor to support parquet files #942

Open
thomas-henn opened this issue Jun 21, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@thomas-henn

thomas-henn commented Jun 21, 2024

Description

The definition of a submodel descriptor in an Asset Administration Shell should be specified for the use of parquet files, with respect to the
"interface" (https://admin-shell-io.github.io/aas-specs-antora/IDTA-01002/v3.1/specification/interfaces-payload.html#_endpoint) and the
"semanticId".

Acceptance Criteria

  • [criteria 1]
  • [criteria 2]
  • [criteria 3]

Additional Information

Child of Feature: eclipse-tractusx/sig-release#721
linked to: eclipse-tractusx/sldt-semantic-models#762
Possible solutions: definition_submodeldescriptors_parquet.md

@thomas-henn thomas-henn added the enhancement New feature or request label Jun 21, 2024
@arnoweiss
Contributor

This might work similarly to the PCF use case, where the submodelDescriptor.endpoint.interface property is used as a discriminator for how to access the "submodel". Afaik, parquet is a compressed, columnar file format, so there'd have to be a very precise spec on

  1. how to obtain the file (http/s3/?) and how to parameterize that call
  2. how to decompress it
  3. how to parse the plaintext payload into a domain-model (like a SAMM aspect)

I'm interested in contributing here.
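A minimal Python sketch of the discriminator idea, assuming the consumer has already parsed an endpoint from a submodelDescriptor into a dict. The handler functions, the strings they return and the interface value "PARQUET-FILE" are hypothetical placeholders; real handlers would perform the calls from points 1 to 3 above.

```python
from typing import Any, Callable

Endpoint = dict[str, Any]


def fetch_via_submodel_api(endpoint: Endpoint) -> str:
    # Hypothetical: a real handler would call the Submodel API at href.
    return "GET " + endpoint["protocolInformation"]["href"]


def fetch_parquet_file(endpoint: Endpoint) -> str:
    # Hypothetical: a real handler would negotiate a file transfer, then
    # decompress and parse the parquet payload.
    return "NEGOTIATE " + endpoint["protocolInformation"]["subprotocolBody"]


# Registry keyed by submodelDescriptor.endpoint.interface (the discriminator).
HANDLERS: dict[str, Callable[[Endpoint], str]] = {
    "SUBMODEL-3.0": fetch_via_submodel_api,
    "PARQUET-FILE": fetch_parquet_file,  # hypothetical interface value
}


def access(endpoint: Endpoint) -> str:
    """Dispatch on the endpoint's interface value; reject unknown values."""
    interface = endpoint.get("interface")
    handler = HANDLERS.get(interface)
    if handler is None:
        raise ValueError(f"no handler for interface {interface!r}")
    return handler(endpoint)
```

Unknown interface values fail loudly instead of silently falling back to the Submodel API.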

@tunacicek
Contributor

tunacicek commented Jul 17, 2024

@BirgitBoss : Thanks for your input.
Four possible solutions (which need to be evaluated) to deliver parquet files:

  1. Define the parquet file in the aspect as BLOB type and add the payload (this could be very large)
  2. Define the parquet file in the aspect as File type and add the path to the parquet file from which the requester can download it (two steps are needed to download the file)
  3. Define only the meta information, like the link to the FTP server, in the aspect
  4. Use the API /submodel/submodel-elements/{idShortPath}/attachment, which delivers a zip

@arnoweiss
Copy link
Contributor

Is there an explicit requirement

  • to have the submodelDescriptor point to an S3/BlobStorage resource?
  • to have (bidirectional) transformation rules from a nestable format (like SAMM) to a tabular format?

I'm inclined to reuse as much of the href/subprotocolBody mechanism from the SUBMODEL-3.0 interface as possible, assuming that access will always be negotiated via DSP catalogs.

@tunacicek
Copy link
Contributor

tunacicek commented Aug 1, 2024

Hi @arnoweiss ,

  1. There is no explicit requirement to use file transfer via a bucket. But for larger files, it makes sense to use this kind of transfer.
  2. We plan to create a new model which maps the flattened hierarchy onto the columns of the parquet file (with rules like using "_" or "."). Therefore we have the second story:
    [New Model]: Asepct Model to Handle Parquet Files sldt-semantic-models#762

I added an md file to the description (Possible solutions) which describes two solutions for defining the submodel descriptors.
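To make the flattening rule from point 2 concrete, here is a minimal Python sketch of a bidirectional mapping between a nested, SAMM-like JSON payload and a single flat row of column names. The separator "." is one of the two candidates mentioned above; the field names in the test data are illustrative only and not taken from the ProductDescription model. The sketch assumes that property names never contain the separator and that the payload contains no lists and no empty objects; a real rule set would have to cover those cases.

```python
from typing import Any

SEP = "."  # one of the two candidate separators ("_" or ".")


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Nested dict -> one row of {column name: value}."""
    row: dict[str, Any] = {}
    for key, value in obj.items():
        column = f"{prefix}{SEP}{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, column))
        else:
            row[column] = value
    return row


def unflatten(row: dict[str, Any]) -> dict[str, Any]:
    """One row of {column name: value} -> nested dict."""
    nested: dict[str, Any] = {}
    for column, value in row.items():
        *parents, leaf = column.split(SEP)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

For example, {"typology": {"shortName": "Wheel"}} becomes the column typology.shortName. If "_" were chosen instead, unflatten would split any property name that itself contains "_".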

@tunacicek
Contributor

tunacicek commented Aug 1, 2024

See also the md file in the description (Possible solutions): definition_submodeldescriptors_parquet.md

Definition of Submodel Descriptors for Parquet Data

Solution 1: File transfer via S3 bucket with EDC AWS extension

Introduction to the file transfer process in EDC

  1. Provider uploads the parquet file to an S3 bucket.
  2. Provider creates an edc-asset. The dataAddress includes information about the S3 bucket and the file name:
    {
      "edc:dataAddress": {
        "edc:type": "AmazonS3",
        "edc:bucket": "<provider-bucket>",
        "edc:keyName": "<file-name>"
      }
    }
    See also full example here: edc-asset for file transfer
  3. Provider creates a shell with shell descriptors that reference the edc-asset and the control plane URL.
  4. Consumer calls the DTR via EDC and reads the submodel descriptor from the shell.
  5. Consumer starts the negotiation/transfer process and provides its own bucket information:
    {
      "edc:dataDestination": {
        "edc:type": "AmazonS3",
        "edc:bucket": "<consumer-bucket>",
        "edc:keyName": "<file-name>",
        "edc:accessKeyId": "<access-key-id>",
        "edc:secretAccessKey": "<secretAccessKey>"
      }
    }
  6. Provider receives the request, downloads the requested file from its bucket and uploads it to the consumer bucket.
  7. Consumer can download the file from their bucket.

SubmodelDescriptor for Parquet Data (via File Transfer)

For file transfer via an S3 bucket, the submodelDescriptor can be defined as follows:

{
...
  "submodelDescriptors": [
    {
      "idShort": "quality-data",
      "id": "<uuid>",
      "endpoints": [
        {
          // TODO: Check which interface can be used here. See: https://github.com/admin-shell-io/questions-and-answers?tab=readme-ov-file#id47
          "interface": "SUBMODEL-3.0",
          "protocolInformation": {
            // TODO: href is required. Check whether NONE can be inserted.
            "href": "not used for transfer via s3 bucket",
            "endpointProtocol": "HTTP",
            "endpointProtocolVersion": [
              "1.1"
            ],
            "subprotocol": "",
            "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
            "subprotocolBodyEncoding": "application/octet-stream;type=parquet-snappy",
            "securityAttributes": [
              {
                "type": "NONE",
                "key": "NONE",
                "value": "NONE"
              }
            ]
          }
        }
      ],
      "semanticId": {
        "type": "ExternalReference",
        "keys": [
          {
            "type": "Submodel",
            "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#ProductDescription"
          }, 
           {
              "type": "Submodel",
              // TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or "."). Outcome of the issue: https://github.com/eclipse-tractusx/sldt-semantic-models/issues/762
              "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
           }
        ]
      },
      "description": [
        {
          "language": "en",
          "text": "submodel-descriptor for quality data which will be transferred via S3 bucket."
        }
      ]
    }
  ]
}
| Parameter | Value | Description |
| --- | --- | --- |
| href | "" | Not used for transfer via S3 bucket |
| subprotocolBody | id=<edc-asset-id>;dspEndpoint=<controlplane-url> | Contains the edc-asset ID and the URL of the provider's EDC control plane |
| subprotocolBodyEncoding | application/octet-stream;type=parquet-snappy | Format of the file. The file is transferred via the bucket. |
| semanticId.keys[].value | urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription | Aspect model which maps the flattened hierarchy to the model |
| description[].text | submodel-descriptor for quality data which will be transferred via S3 bucket. | Further description for the consumer |
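A consumer has to split these strings back into their parts. The Python sketch below assumes the semicolon-separated key=value layout of subprotocolBody shown in the table and reads the type parameter "parquet-snappy" as <file format>-<compression codec>; that reading, the ParquetEndpoint class and the "none" default are assumptions for illustration. Values that themselves contain ";" would need escaping, which is not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParquetEndpoint:
    asset_id: str      # edc-asset ID from subprotocolBody
    dsp_endpoint: str  # provider control plane URL from subprotocolBody
    media_type: str    # e.g. application/octet-stream
    file_format: str   # e.g. parquet
    codec: str         # e.g. snappy


def _pairs(text: str) -> dict[str, str]:
    """Parse 'k=v;k=v' into a dict, splitting each entry at its first '='."""
    out: dict[str, str] = {}
    for part in text.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            out[key.strip()] = value.strip()
    return out


def parse_protocol_information(info: dict[str, str]) -> ParquetEndpoint:
    body = _pairs(info["subprotocolBody"])
    media_type, _, params = info["subprotocolBodyEncoding"].partition(";")
    file_format, _, codec = _pairs(params).get("type", "").partition("-")
    return ParquetEndpoint(
        asset_id=body["id"],
        dsp_endpoint=body["dspEndpoint"],
        media_type=media_type.strip(),
        file_format=file_format,
        codec=codec or "none",  # assumption: no suffix means uncompressed
    )
```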

Solution 2: File transfer via S3 bucket without EDC AWS extension

Introduction to the file transfer process in EDC

  1. Provider uploads the parquet file to an S3 bucket.
  2. Provider creates a shell with shell descriptors containing the edc-asset, the control plane URL, the S3 bucket link and the credentials to access the provider's S3 bucket.
  3. Consumer calls the DTR via EDC and reads the submodel descriptor from the shell.
  4. Consumer reads the credentials and downloads the file from the provider's S3 bucket.

SubmodelDescriptor for Parquet Data (via File Transfer)

For file transfer via an S3 bucket, the submodelDescriptor can be defined as follows:

{
...
  "submodelDescriptors": [
    {
      "idShort": "quality-data",
      "id": "<uuid>",
      "endpoints": [
        {
          // TODO: Check which interface can be used here. See: https://github.com/admin-shell-io/questions-and-answers?tab=readme-ov-file#id47
          "interface": "SUBMODEL-3.0",
          "protocolInformation": {
            // Link to the provider's AWS S3 object (virtual-hosted-style HTTPS URL)
            "href": "https://example-bucket.s3.us-east-2.amazonaws.com/productDescription.parquet",
            "endpointProtocol": "HTTP",
            "endpointProtocolVersion": [
              "1.1"
            ],
            "subprotocol": "",
            "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
            "subprotocolBodyEncoding": "application/octet-stream;type=parquet-snappy",
            // TODO: Check whether credentials for the S3 bucket can be used here
            "securityAttributes": [
              {
                // TODO: Clarify which enum can be used
                "type": "NONE",
                // S3 securityKey
                "key": "SecurityKey",
                // S3 accessKey
                "value": "AccessKey"
              }
            ]
          }
        }
      ],
      "semanticId": {
        "type": "ExternalReference",
        "keys": [
          {
            "type": "Submodel",
            "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#ProductDescription"
          }, 
           {
              "type": "Submodel",
              // TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or "."). Outcome of the issue: https://github.com/eclipse-tractusx/sldt-semantic-models/issues/762
              "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
           }
        ]
      },
      "description": [
        {
          "language": "en",
          "text": "submodel-descriptor for quality data which will be transferred via S3 bucket."
        }
      ]
    }
  ]
}
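For Solution 2, the consumer needs the bucket, region and object key to download the file with an AWS SDK. A minimal Python sketch that extracts them from a virtual-hosted-style href of the form https://<bucket>.s3.<region>.amazonaws.com/<key>; path-style and legacy S3 URLs are not handled, and the function name is hypothetical. The authenticated download itself, using the credentials from securityAttributes, is not shown.

```python
from urllib.parse import unquote, urlparse

AWS_SUFFIX = ".amazonaws.com"


def parse_s3_https_url(href: str) -> tuple[str, str, str]:
    """Split https://<bucket>.s3.<region>.amazonaws.com/<key> into its parts."""
    parsed = urlparse(href)
    host = parsed.hostname or ""
    bucket, sep, rest = host.partition(".s3.")
    if parsed.scheme != "https" or not sep or not rest.endswith(AWS_SUFFIX):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {href!r}")
    region = rest[: -len(AWS_SUFFIX)]
    key = unquote(parsed.path.lstrip("/"))
    if not bucket or not region or not key:
        raise ValueError(f"incomplete S3 URL: {href!r}")
    return bucket, region, key
```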

Solution 3: File transfer via HTTPS

If the file is not too large, it can be transferred via HTTPS in the usual way. For this case the submodelDescriptor can be defined as follows:

{
...
  "submodelDescriptors": [
    {
      "idShort": "quality-data",
      "id": "<uuid>",
      "endpoints": [
        {
          "interface": "productDescription",
          "protocolInformation": {
            "href": "<provider-edc-dataplane-url>/<path-to-download-file>",
            "endpointProtocol": "HTTP",
            "endpointProtocolVersion": [
              "1.1"
            ],
            "subprotocol": "",
            "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
            "subprotocolBodyEncoding": "application/octet-stream;type=parquet-snappy",
            "securityAttributes": [
              {
                "type": "NONE",
                "key": "NONE",
                "value": "NONE"
              }
            ]
          }
        }
      ],
      "semanticId": {
        "type": "ExternalReference",
        "keys": [
          {
            "type": "Submodel",
            // TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or ".")
            "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
          }
        ]
      },
      "description": [
        {
          "language": "en",
          "text": "submodel-descriptor for quality data which will be transferred via HTTPS."
        }
      ]
    }
  ]
}
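For Solution 3, a minimal Python sketch of the download step using only the standard library, assuming the consumer already holds the data plane href and whatever authorization header the EDC transfer provides (the header content is an assumption; the thread does not specify it). Before the bytes go to a real Parquet reader, the sketch checks the 4-byte magic number PAR1 that every Parquet file starts and ends with.

```python
import urllib.request

PARQUET_MAGIC = b"PAR1"
# Lower bound on file size: leading magic + 4-byte footer length + trailing magic.
MIN_PARQUET_SIZE = 12


def looks_like_parquet(data: bytes) -> bool:
    """Cheap sanity check; it does not validate the footer metadata."""
    return (
        len(data) >= MIN_PARQUET_SIZE
        and data.startswith(PARQUET_MAGIC)
        and data.endswith(PARQUET_MAGIC)
    )


def download_parquet(href: str, headers: dict[str, str], timeout: float = 30.0) -> bytes:
    """Fetch the file from the provider's data plane and sanity-check it."""
    request = urllib.request.Request(href, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    if not looks_like_parquet(data):
        raise ValueError("response body does not look like a Parquet file")
    return data
```

Because the whole body is read into memory, this only suits the "not too large" files this solution targets; larger files would be streamed to disk instead.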

@agg3fe

agg3fe commented Sep 13, 2024

As discussed with Birgit, the submodel interface value 'productDescription' does not conform to the IDTA standards. We need to use 'SUBMODEL-3.0' as the interface value.
We need to create a simple submodel/aspect, like the ones we used before. That new aspect model will contain the reference to the Parquet file as a property. It will also contain the semantics of that Parquet file and the required conversion fields/values.

We need to check which mechanism is currently used through EDC to exchange the consumer bucket credentials. With this newly defined submodel, the consumer won't be able to provide the bucket credentials, so we need to find a way to make an async call to the provider and get a response with the Parquet file location.

An example submodel is attached: ProductDesriptionAsParquetFile.txt

@tunacicek
Contributor

See also the md file in the description (Possible solutions): definition_submodeldescriptors_parquet.md

Hi @thomas-henn, could you please review the solutions?

Projects
Status: In Review

4 participants