QLDB streaming to Kinesis sends invalid ION data #405

Closed
ori-maci opened this issue Apr 24, 2023 · 8 comments

@ori-maci

I am currently working with QLDB -> Kinesis stream -> AWS lambda (node.js 18+) and I am encountering the following when loading an ion record in the lambda:

"errorType": "Runtime.UnhandledPromiseRejection",
    "errorMessage": "TypeError: The encoded data was not valid for encoding utf-8",
    "reason": {
        "errorType": "TypeError",
        "errorMessage": "The encoded data was not valid for encoding utf-8",
        "code": "ERR_ENCODING_INVALID_ENCODED_DATA",

Package versions are:

    "ion-js": "^4.3.0",
    "jsbi": "^4.3.0"

Here is example code that reproduces the issue:

import { load } from 'ion-js';

const data = "84mawgpANDk2ZTVmYzM4NmUwZjU3ZTQ1MmFjM2U0OWI0NWE1NDUwYjU5MGQ1OWI1ZjI2Mzg3MDQyYzBhMTEzNzVhYTQ4MQpANDljNmM0MzE2Y2Q2YzlkNjM0Zjk0N2FkNzA1NjVkYjYxZmU0MmE4ODI1ZmU1MmYxMWM2YzVjYTYwNmE0ZDc0ZhrhCAgAGtwI4AEA6u4Cu4GD3gK2h74Cso1xbGRiU3RyZWFtQXJuinJlY29yZFR5cGWHcGF5bG9hZIxibG9ja0FkZHJlc3OIc3RyYW5kSWSKc2VxdWVuY2VOb410cmFuc2FjdGlvbklkjo5ibG9ja1RpbWVzdGFtcIlibG9ja0hhc2iLZW50cmllc0hhc2iOkXByZXZpb3VzQmxvY2tIYXNojo9lbnRyaWVzSGFzaExpc3SOj3RyYW5zYWN0aW9uSW5mb4pzdGF0ZW1lbnRziXN0YXRlbWVudIlzdGFydFRpbWWOj3N0YXRlbWVudERpZ2VzdIlkb2N1bWVudHOOlkFkM0FXYlpQNnp1OUhFVGROZzZwcEGJdGFibGVOYW1lh3RhYmxlSWSOkXJldmlzaW9uU3VtbWFyaWVzhGhhc2iKZG9jdW1lbnRJZN4Gl4qOzGFybjphd3M6cWxkYjp1cy1lYXN0LTE6MDEzMjQyMTkwMTg0OnN0cmVhbS9zdGctT3JkZXJzL0tyVTVjVXpkVzM1S0hQWkVjM043STSLjUJMT0NLX1NVTU1BUlmM3gW1jd6djo6WMHJFTjFheHM3UWpKQ25KWXVQQzVSYo8iHguQjpYydXl4WjBZOHk3NDRuT09ydURodWJqkWuAD+eEj5O4osMAhJKuoEluX8OG4PV+RSrD5JtFpUULWQ1ZtfJjhwQsChE3WqSBk66gqfuETMGDyOUE0M6bw5Xy5LZrR6g/Pg3yw7rLHAgTNvmUrqDEt5D5RfCLqxqoN9aYlp53vM43fzQ11k8jdmeUO3fbuJW+AYiuoCIpiZTEUHJR54AxsdM/V6dZXNbY/3behh9NlifFnzxVrqBJxsQxbNbJ1jT5R61wVl22H+QqiCX+UvEcbFymBqTXT66gNM1SpPLZqpH3L2W+RiST6lCt/1CXHpwnu09d7WnTWzSuoEncf6mN7E2qxjhSlH0vwaEgcI3+osU2iA0O5mJ8tDjvlt4CtZe+Aobe2JiOplNFTEVDVCAqIGZyb20gT3JkZXJzIFdIRVJFIG9yZGVySWQgPSA/mWqAD+eEj5O4osMtmq6gJCAdJXhOXYA4eNcQSo7MnuvZHel3OYTzj2e9tSmqptreAamYjvdVUERBVEUgT3JkZXJzIGFzIHQgQlkgaWQgU0VUIHQub3JkZXJTdGF0dXMgPSA/LCB0LnVwZGF0ZWRBdCA9ID8sIHQudXBkYXRlZEJ5ID0gPywgdC5sbXNEYWlseUV2ZW50ID0gPyBXSEVSRSBvcmRlcklkID0gP5lqgA/nhI+TuKLDXJquoL2d2K96cDxjFa3q4DjahDo322blvdjH8Y4p+zzSgFqEm96onN6lnYZPcmRlcnOejpYwbHVMVzBqOXVlR0Vtemp3OTZKWVBsl7IhAZ++vt68oK6gScbEMWzWydY0+UetcFZdth/kKogl/lLxHGxcpgak10+hjpZBZDNBV2JaUDZ6dTlIRVRkTmc2cHBBGsgJCAEawwngAQDq7gPngYPeA+KHvgPejXFsZGJTdHJlYW1Bcm6KcmVjb3JkVHlwZYdwYXlsb2FkiXRhYmxlSW5mb4l0YWJsZU5hbWWHdGFibGVJZIhyZXZpc2lvboxibG9ja0FkZHJlc3OIc3RyYW5kSWSKc2VxdWVuY2VOb4RoYXNohGRhdGGHb3JkZXJJZIlwcm9ncmFtSWSOjm9yZ2FuaXphdGlvbklkjpZsb2FuWWVhcmx5RHVyYXRpb25EYXlzjo5leHRlcm5hbExvYW5JZItvcmRlckFjdGlvbo6TZnVuZG
luZ1N1YnNpZGlhcnlJZIxjdXN0b21lck5hbWWOkGludGVyZXN0UmF0ZVR5cGWOkGZsb2F0aW5nUmF0ZVR5cGWHZmVlVHlwZYp0ZXJtRGF0ZUF0hW5vdGVzhmFtb3VudIxpbnRlcmVzdFJhdGWJZmVlQW1vdW50jo53cml0ZU9mZkFtb3VudI6Tb3duZXJPcmdhbml6YXRpb25JZIljcmVhdGVkQnmJY3JlYXRlZEF0gmlki29yZGVyU3RhdHVziXVwZGF0ZWRBdIl1cGRhdGVkQnmNbG1zRGFpbHlFdmVudIlldmVudFR5cGWOj2FjY3J1ZWRJbnRlcmVzdIhtZXRhZGF0YYZ0eFRpbWWEdHhJZN4F0oqOzGFybjphd3M6cWxkYjp1cy1lYXN0LTE6MDEzMjQyMTkwMTg0OnN0cmVhbS9zdGctT3JkZXJzL0tyVTVjVXpkVzM1S0hQWkVjM043STSLjpBSRVZJU0lPTl9ERVRBSUxTjN4E7I3eoY6GT3JkZXJzj46WMGx1TFcwajl1ZUdFbXpqdzk2SllQbJDeBMSR3p2SjpYwckVOMWF4czdRakpDbkpZdVBDNVJikyIeC5SuoEnGxDFs1snWNPlHrXBWXbYf5CqIJf5S8RxsXKYGpNdPld4DuJaOpGEwNTlmN2M3LWMyMGYtNDk3OS05MWIwLTc5NmE0MjBlMThjNpeOpGRkNzQzYjFkLTAyYmEtNGM1Ny1iMmRkLTE0ZTM4ZmU1YTg1YpiOpDY5MGZhNDlmLWU5ZDUtNDhjNC1iNDliLWQ3MDFlN2E0YmU2YpkiAWiaijg1NzY1NDYyNjibIQGcjqRhNDRiZjliYy1jYzU0LTRiZTUtYjlkZi1kOTAxYzQ5MDM4MDGdikJyYXZvIFRpbGWeIJ8goIChgKKAoyMBhqCkSEAVAAAAAAAApSCmIKeOpDYwOGFjZTI0LWY1MGItNDdmYy04MzdkLWE4NDlmNWI4MzFhZKiOlmFkbWluQGFyY2FkaWFmdW5kcy5jb22pjpgyMDIzLTA0LTEzVDIwOjAxOjM0LjcwOFqqjpZBZDNBV2JaUDZ6dTlIRVRkTmc2cHBBqyECrI6YMjAyMy0wNC0xNVQxOTo1NjozNS4zNjFarY6Wb21hY2lAYXJjYWRpYWZ1bmRzLmNvba7esq8hA6mOmDIwMjMtMDQtMTVUMDQ6MDA6MDAuMDAwWrBIQC0qqqqqqqukSEAVAAAAAAAAsd7Cqo6WQWQzQVdiWlA2enU5SEVUZE5nNnBwQYUhLbJrgA/nhI+TuKLDAIKzjpYydXl4WjBZOHk3NDRuT09ydURodWJq5Zv539x9CYjfG+BB7dduxQ==";

const res = Buffer.from(data, 'base64')

const ionRecord = load(res); // Error is thrown here

After I talked to the ion-js team, they identified the following as the root cause: amazon-ion/ion-js#753 (comment)

Your base64 data does not start with the Ion Version Marker (IVM), so your data is not Ion binary format. Therefore, the load function assumes that you are trying to provide UTF-8 encoded Ion text, which is why you get a UTF-8 encoding error.

This issue happens intermittently as well. Any help or insight would be greatly appreciated.
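
The quoted diagnosis can be checked before calling load. Binary Ion 1.0 streams open with the four-byte version marker E0 01 00 EA, so a small guard can report when a payload is not binary Ion. This is a minimal sketch; the helper name startsWithIonVersionMarker is hypothetical and not part of ion-js.

```typescript
// The four-byte binary version marker that opens every Ion 1.0 binary stream.
const ION_BINARY_VERSION_MARKER = [0xe0, 0x01, 0x00, 0xea];

// Hypothetical helper (not an ion-js API): true if `bytes` begins with the marker.
// A Node.js Buffer is a Uint8Array, so a decoded Kinesis payload can be passed directly.
function startsWithIonVersionMarker(bytes: Uint8Array): boolean {
  if (bytes.length < ION_BINARY_VERSION_MARKER.length) {
    return false;
  }
  return ION_BINARY_VERSION_MARKER.every((expected, i) => bytes[i] === expected);
}
```

Logging the result of this check next to the failing record would show whether a given payload is binary Ion before load falls back to treating it as UTF-8 text.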

@battesonb
Member

Could you please provide the strategy you are using to obtain the base64 blob? I assume you're using the aws-kcl package. It looks like you might have the Kinesis partition key and sequence number inside of that blob.

@ori-maci
Author

ori-maci commented Apr 24, 2023

I am just using the ion-js library:

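import { dumpPrettyText, load, Value } from 'ion-js';
import type { KinesisStreamHandler, KinesisStreamRecordPayload } from 'aws-lambda';

/**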
 * A lambda that streams to Redshift from QLDB via Kinesis stream
 */
export const lambdaHandler: KinesisStreamHandler = async (event) => {
  const envStage = process.env.envStage ?? '';

  await initClient();
  await executeQuery(SQL_CODE);

  logger.info(`Records count for ${envStage}`, { count: event.Records.length });
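  // Await each record in turn; forEach would not wait for the async callbacks,
  // so a failure would surface as an unhandled promise rejection.
  for (const record of event.Records) {
    logger.info('DEBUG: Received event:', JSON.stringify(event, null, 2));
    await processRecords([record.kinesis]);
  }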

  logger.info(`DEBUG: Successfully processed ${event.Records.length} records.`);
};

export async function processRecords(records: KinesisStreamRecordPayload[]) {
  await Promise.all(
    records.map(async (record) => {
      // Kinesis data is base64 encoded so decode here
      let payload;
      let ionRecord;
      try {
        logger.info(`processRecords attempting to load ion record: ${record.data}`);
        payload = Buffer.from(record.data, 'base64');

        // payload is the actual ion binary record published by QLDB to the stream
        ionRecord = load(payload) as Value;
      } catch (error) {
        logger.error(`processRecords failed to load ion record error:${error}`);
        throw error;
      }

      logger.info(`ionRecord is ${JSON.stringify(ionRecord, null, 2)}`);

      const recordType = ionRecord?.get('recordType')?.stringValue();
      // Only process records where the record type is REVISION_DETAILS
      if (recordType !== REVISION_DETAILS) {
        logger.info(
          `processRecords Skipping record of type ${dumpPrettyText(
            recordType,
          )}`,
        );
        logger.info(`processRecords The other record is:`, ionRecord);
      } else {
        logger.info('processRecords The Ion Record is:', ionRecord);
        logger.info('processRecords Revision_details record found');
        await processIon(ionRecord);
      }
    }),
  );
}

Ok, so not using the aws-kcl library is the issue then?

@battesonb
Member

battesonb commented Apr 24, 2023

I think so. Lambda is likely interpreting your processRecords function as a Lambda handler, so your records: KinesisStreamRecordPayload[] parameter is actually a Lambda event.

@ori-maci
Author

Oh, take a look at the full code snippet, now updated above, @battesonb. I included the Lambda handler that calls processRecords, and you will see that I break the Lambda event down to the raw record and try to parse the data it contains.

I couldn't find how the KCL library would handle this any differently.

@battesonb
Member

battesonb commented Apr 24, 2023

Are you using Kinesis Data Stream Record aggregations? The stream in the QLDB console will list whether it's enabled. There's an NPM package called aws-kinesis-agg for deaggregating batched results. This would explain why it's an intermittent issue.

You'd essentially just do the following:

import { deaggregateSync, UserRecord } from "aws-kinesis-agg";

// inside your handler
deaggregateSync(record.kinesis, true, (err, records) => {
  if (records) {
    processRecords(records); // Note the type change
  } // else handle the error
});

https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-deaggregation.html

Alternatively, you can turn off record aggregations.
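
Because processRecords is async, a handler may prefer to get the deaggregated records back as a return value and await the processing itself, rather than working inside the callback. Below is a minimal sketch of such a wrapper. It assumes the callback fires before the call returns, as the Sync suffix suggests. The names collectUserRecords, UserRecordLike and CallbackDeaggregator are hypothetical, and the types are simplified stand-ins rather than the library's own.

```typescript
// Simplified stand-in for a deaggregated user record (assumption: base64 payload in `data`).
interface UserRecordLike {
  data: string;
}

// Callback-style deaggregator, modeled on how the snippet above calls deaggregateSync.
type CallbackDeaggregator<R> = (
  record: R,
  computeChecksums: boolean,
  callback: (err: Error | null, records?: UserRecordLike[]) => void,
) => void;

// Hypothetical wrapper: returns the user records, or throws the reported error.
// Assumes the deaggregator invokes its callback synchronously.
function collectUserRecords<R>(
  deaggregate: CallbackDeaggregator<R>,
  record: R,
): UserRecordLike[] {
  const outcome: { err?: Error; records?: UserRecordLike[] } = {};
  deaggregate(record, true, (err, records) => {
    if (err) {
      outcome.err = err;
    } else {
      outcome.records = records ?? [];
    }
  });
  if (outcome.err) {
    throw outcome.err;
  }
  return outcome.records ?? [];
}
```

Inside the handler this might be used as `await processRecords(collectUserRecords(deaggregateSync, record.kinesis))`, provided the library's actual types line up with the simplified ones assumed here.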

@battesonb
Member

The first few bytes of your buffer line up with the magic number specified in the repository: https://github.com/awslabs/kinesis-aggregation/blob/master/node/lib/common.js#L17

<Buffer f3 89 9a c2 ...
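
The same comparison can be made in code, which may help when logging intermittent failures. This is a minimal sketch that checks a decoded payload against the four bytes shown above; the helper name isKplAggregated is hypothetical and not part of aws-kinesis-agg.

```typescript
// Magic prefix of an aggregated Kinesis record, matching the buffer dump above.
const KPL_MAGIC_NUMBER = [0xf3, 0x89, 0x9a, 0xc2];

// Hypothetical helper: true if the decoded payload starts with the aggregation magic number.
// A Node.js Buffer is a Uint8Array, so Buffer.from(record.data, 'base64') can be passed in.
function isKplAggregated(bytes: Uint8Array): boolean {
  if (bytes.length < KPL_MAGIC_NUMBER.length) {
    return false;
  }
  return KPL_MAGIC_NUMBER.every((expected, i) => bytes[i] === expected);
}
```

A payload for which this returns true needs deaggregation before any of its contents are handed to load.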

@ori-maci
Author

@battesonb I see that I have this enabled. I will try your suggestion and get back to you.

I appreciate your help on this

@ori-maci
Author

Ok looks like that was definitely the root cause. Thank you @battesonb!
