QLDB streaming to Kinesis sends invalid ION data #405

ori-maci opened this issue Apr 24, 2023 · 8 comments

I am currently working with QLDB -> Kinesis stream -> AWS Lambda (Node.js 18+), and I am encountering the following error when loading an Ion record in the Lambda:

"errorType": "Runtime.UnhandledPromiseRejection",
    "errorMessage": "TypeError: The encoded data was not valid for encoding utf-8",
    "reason": {
        "errorType": "TypeError",
        "errorMessage": "The encoded data was not valid for encoding utf-8",

Package versions are:

    "ion-js": "^4.3.0",
    "jsbi": "^4.3.0"

Here is example code to reproduce this issue:

import { load } from 'ion-js';

const data = "84mawgpANDk2ZTVmYzM4NmUwZjU3ZTQ1MmFjM2U0OWI0NWE1NDUwYjU5MGQ1OWI1ZjI2Mzg3MDQyYzBhMTEzNzVhYTQ4MQpANDljNmM0MzE2Y2Q2YzlkNjM0Zjk0N2FkNzA1NjVkYjYxZmU0MmE4ODI1ZmU1MmYxMWM2YzVjYTYwNmE0ZDc0ZhrhCAgAGtwI4AEA6u4Cu4GD3gK2h74Cso1xbGRiU3RyZWFtQXJuinJlY29yZFR5cGWHcGF5bG9hZIxibG9ja0FkZHJlc3OIc3RyYW5kSWSKc2VxdWVuY2VOb410cmFuc2FjdGlvbklkjo5ibG9ja1RpbWVzdGFtcIlibG9ja0hhc2iLZW50cmllc0hhc2iOkXByZXZpb3VzQmxvY2tIYXNojo9lbnRyaWVzSGFzaExpc3SOj3RyYW5zYWN0aW9uSW5mb4pzdGF0ZW1lbnRziXN0YXRlbWVudIlzdGFydFRpbWWOj3N0YXRlbWVudERpZ2VzdIlkb2N1bWVudHOOlkFkM0FXYlpQNnp1OUhFVGROZzZwcEGJdGFibGVOYW1lh3RhYmxlSWSOkXJldmlzaW9uU3VtbWFyaWVzhGhhc2iKZG9jdW1lbnRJZN4Gl4qOzGFybjphd3M6cWxkYjp1cy1lYXN0LTE6MDEzMjQyMTkwMTg0OnN0cmVhbS9zdGctT3JkZXJzL0tyVTVjVXpkVzM1S0hQWkVjM043STSLjUJMT0NLX1NVTU1BUlmM3gW1jd6djo6WMHJFTjFheHM3UWpKQ25KWXVQQzVSYo8iHguQjpYydXl4WjBZOHk3NDRuT09ydURodWJqkWuAD+eEj5O4osMAhJKuoEluX8OG4PV+RSrD5JtFpUULWQ1ZtfJjhwQsChE3WqSBk66gqfuETMGDyOUE0M6bw5Xy5LZrR6g/Pg3yw7rLHAgTNvmUrqDEt5D5RfCLqxqoN9aYlp53vM43fzQ11k8jdmeUO3fbuJW+AYiuoCIpiZTEUHJR54AxsdM/V6dZXNbY/3behh9NlifFnzxVrqBJxsQxbNbJ1jT5R61wVl22H+QqiCX+UvEcbFymBqTXT66gNM1SpPLZqpH3L2W+RiST6lCt/1CXHpwnu09d7WnTWzSuoEncf6mN7E2qxjhSlH0vwaEgcI3+osU2iA0O5mJ8tDjvlt4CtZe+Aobe2JiOplNFTEVDVCAqIGZyb20gT3JkZXJzIFdIRVJFIG9yZGVySWQgPSA/mWqAD+eEj5O4osMtmq6gJCAdJXhOXYA4eNcQSo7MnuvZHel3OYTzj2e9tSmqptreAamYjvdVUERBVEUgT3JkZXJzIGFzIHQgQlkgaWQgU0VUIHQub3JkZXJTdGF0dXMgPSA/LCB0LnVwZGF0ZWRBdCA9ID8sIHQudXBkYXRlZEJ5ID0gPywgdC5sbXNEYWlseUV2ZW50ID0gPyBXSEVSRSBvcmRlcklkID0gP5lqgA/nhI+TuKLDXJquoL2d2K96cDxjFa3q4DjahDo322blvdjH8Y4p+zzSgFqEm96onN6lnYZPcmRlcnOejpYwbHVMVzBqOXVlR0Vtemp3OTZKWVBsl7IhAZ++vt68oK6gScbEMWzWydY0+UetcFZdth/kKogl/lLxHGxcpgak10+hjpZBZDNBV2JaUDZ6dTlIRVRkTmc2cHBBGsgJCAEawwngAQDq7gPngYPeA+KHvgPejXFsZGJTdHJlYW1Bcm6KcmVjb3JkVHlwZYdwYXlsb2FkiXRhYmxlSW5mb4l0YWJsZU5hbWWHdGFibGVJZIhyZXZpc2lvboxibG9ja0FkZHJlc3OIc3RyYW5kSWSKc2VxdWVuY2VOb4RoYXNohGRhdGGHb3JkZXJJZIlwcm9ncmFtSWSOjm9yZ2FuaXphdGlvbklkjpZsb2FuWWVhcmx5RHVyYXRpb25EYXlzjo5leHRlcm5hbExvYW5JZItvcmRlckFjdGlvbo6TZnVuZG
luZ1N1YnNpZGlhcnlJZIxjdXN0b21lck5hbWWOkGludGVyZXN0UmF0ZVR5cGWOkGZsb2F0aW5nUmF0ZVR5cGWHZmVlVHlwZYp0ZXJtRGF0ZUF0hW5vdGVzhmFtb3VudIxpbnRlcmVzdFJhdGWJZmVlQW1vdW50jo53cml0ZU9mZkFtb3VudI6Tb3duZXJPcmdhbml6YXRpb25JZIljcmVhdGVkQnmJY3JlYXRlZEF0gmlki29yZGVyU3RhdHVziXVwZGF0ZWRBdIl1cGRhdGVkQnmNbG1zRGFpbHlFdmVudIlldmVudFR5cGWOj2FjY3J1ZWRJbnRlcmVzdIhtZXRhZGF0YYZ0eFRpbWWEdHhJZN4F0oqOzGFybjphd3M6cWxkYjp1cy1lYXN0LTE6MDEzMjQyMTkwMTg0OnN0cmVhbS9zdGctT3JkZXJzL0tyVTVjVXpkVzM1S0hQWkVjM043STSLjpBSRVZJU0lPTl9ERVRBSUxTjN4E7I3eoY6GT3JkZXJzj46WMGx1TFcwajl1ZUdFbXpqdzk2SllQbJDeBMSR3p2SjpYwckVOMWF4czdRakpDbkpZdVBDNVJikyIeC5SuoEnGxDFs1snWNPlHrXBWXbYf5CqIJf5S8RxsXKYGpNdPld4DuJaOpGEwNTlmN2M3LWMyMGYtNDk3OS05MWIwLTc5NmE0MjBlMThjNpeOpGRkNzQzYjFkLTAyYmEtNGM1Ny1iMmRkLTE0ZTM4ZmU1YTg1YpiOpDY5MGZhNDlmLWU5ZDUtNDhjNC1iNDliLWQ3MDFlN2E0YmU2YpkiAWiaijg1NzY1NDYyNjibIQGcjqRhNDRiZjliYy1jYzU0LTRiZTUtYjlkZi1kOTAxYzQ5MDM4MDGdikJyYXZvIFRpbGWeIJ8goIChgKKAoyMBhqCkSEAVAAAAAAAApSCmIKeOpDYwOGFjZTI0LWY1MGItNDdmYy04MzdkLWE4NDlmNWI4MzFhZKiOlmFkbWluQGFyY2FkaWFmdW5kcy5jb22pjpgyMDIzLTA0LTEzVDIwOjAxOjM0LjcwOFqqjpZBZDNBV2JaUDZ6dTlIRVRkTmc2cHBBqyECrI6YMjAyMy0wNC0xNVQxOTo1NjozNS4zNjFarY6Wb21hY2lAYXJjYWRpYWZ1bmRzLmNvba7esq8hA6mOmDIwMjMtMDQtMTVUMDQ6MDA6MDAuMDAwWrBIQC0qqqqqqqukSEAVAAAAAAAAsd7Cqo6WQWQzQVdiWlA2enU5SEVUZE5nNnBwQYUhLbJrgA/nhI+TuKLDAIKzjpYydXl4WjBZOHk3NDRuT09ydURodWJq5Zv539x9CYjfG+BB7dduxQ==";

const res = Buffer.from(data, 'base64')

const ionRecord = load(res); // Error is thrown here

After talking to the ion-js team they mentioned the following is the root cause: amazon-ion/ion-js#753 (comment)

Your base64 data does not start with the Ion Version Marker (IVM), so your data is not Ion binary format. Therefore, the load function assumes that you are trying to provide UTF-8 encoded Ion text, which is why you get a UTF-8 encoding error.
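As a minimal sketch of the check described above: Ion 1.0 binary data begins with the four-byte version marker E0 01 00 EA, so the leading bytes of a decoded payload can be inspected before it is handed to load. The helper name startsWithIonVersionMarker is hypothetical and not part of ion-js; the sample bytes f3 89 9a c2 are the first four bytes of the payload in this issue.

```typescript
// Ion 1.0 binary version marker (IVM): the four bytes E0 01 00 EA.
const ION_BINARY_VERSION_MARKER: number[] = [0xe0, 0x01, 0x00, 0xea];

// Hypothetical helper (not an ion-js API): true when the payload
// begins with the Ion binary version marker.
function startsWithIonVersionMarker(bytes: Uint8Array): boolean {
  return (
    bytes.length >= ION_BINARY_VERSION_MARKER.length &&
    ION_BINARY_VERSION_MARKER.every((expected, i) => bytes[i] === expected)
  );
}

// Leading bytes of the payload from this issue: not Ion binary.
console.log(startsWithIonVersionMarker(new Uint8Array([0xf3, 0x89, 0x9a, 0xc2]))); // false
// A payload that does begin with the IVM.
console.log(startsWithIonVersionMarker(new Uint8Array([0xe0, 0x01, 0x00, 0xea]))); // true
```

A Node.js Buffer is a Uint8Array, so the result of Buffer.from(data, 'base64') can be passed to this helper directly.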

This issue happens intermittently as well. Any help or insight would be greatly appreciated.

Could you please provide the strategy you are using to obtain the base64 blob? I assume you're using the aws-kcl package. It looks like you might have the Kinesis partition key and sequence number inside of that blob.

ori-maci commented Apr 24, 2023

I am just using the ion-js library:

/**
 * A lambda that streams to Redshift from QLDB via Kinesis stream
 */
export const lambdaHandler: KinesisStreamHandler = async (event) => {
  const envStage = process.env.envStage ?? '';

  await initClient();
  await executeQuery(SQL_CODE);
  logger.info(`Records count for ${envStage}`, { count: event.Records.length });
  event.Records.forEach(async (record, index) => {
    logger.info('DEBUG: Received event:', JSON.stringify(event, null, 2));
    await processRecords([record.kinesis]);
  });
  logger.info(`DEBUG: Successfully processed ${event.Records.length} records.`);
};

export async function processRecords(records: KinesisStreamRecordPayload[]) {
  await Promise.all(
    records.map(async (record) => {
      // Kinesis data is base64 encoded so decode here
      let payload;
      let ionRecord;
      try {
        logger.info(`processRecords attempting to load ion record: ${record.data}`);
        payload = Buffer.from(record.data, 'base64');

        // payload is the actual ion binary record published by QLDB to the stream
        ionRecord = load(payload) as Value;
      } catch (error) {
        logger.error(`processRecords failed to load ion record error:${error}`);
        throw error;
      }
      logger.info(`ionRecord is ${JSON.stringify(ionRecord, null, 2)}`);

      const recordType = ionRecord?.get('recordType')?.stringValue();
      // Only process records where the record type is REVISION_DETAILS
      if (recordType !== REVISION_DETAILS) {
        logger.info(
          `processRecords Skipping record of type ${dumpPrettyText(…)}`
        );
        logger.info(`processRecords The other record is:`, ionRecord);
      } else {
        logger.info('processRecords The Ion Record is:', ionRecord);
        logger.info('processRecords Revision_details record found');
        await processIon(ionRecord);
      }
    })
  );
}

Ok, so not using the aws-kcl library is the issue then?

battesonb commented Apr 24, 2023

I think so. Lambda is likely interpreting your processRecords function as a Lambda handler. So your records: KinesisStreamRecordPayload[] field is actually a Lambda event.

Oh, take a look at the full code snippet, now updated above, @battesonb. I included the Lambda handler that calls processRecords, and you will see I am breaking the Lambda event down to the raw record and trying to parse the data it has.

I couldn't find how the KCL library would handle this any differently.

battesonb commented Apr 24, 2023

Are you using Kinesis Data Stream Record aggregations? The stream in the QLDB console will list whether it's enabled. There's an NPM package called aws-kinesis-agg for deaggregating batched results. This would explain why it's an intermittent issue.

You'd essentially just do the following:

import { deaggregateSync, UserRecord } from "aws-kinesis-agg";

// inside your handler
deaggregateSync(record.kinesis, true, (err, records) => {
  if (records) {
    processRecords(records); // Note the type change
  } // else handle the error
});

Alternatively, you can turn off record aggregations.

The first few bytes of your buffer line up with the magic number specified in the repository:

<Buffer f3 89 9a c2 ...
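A rough sketch of how that leading-byte comparison could be automated: the aggregated record format prefixes each aggregated record with the magic number F3 89 9A C2, while a plain Ion binary record starts with E0 01 00 EA. The names PayloadKind, hasPrefix, and classifyPayload are hypothetical and exist only for illustration; the sketch inspects the prefix and does not deaggregate anything.

```typescript
// Hypothetical labels for what a decoded Kinesis payload might contain.
type PayloadKind = "kpl-aggregated" | "ion-binary" | "other";

// Magic number that prefixes an aggregated Kinesis record.
const KPL_MAGIC: number[] = [0xf3, 0x89, 0x9a, 0xc2];
// Ion 1.0 binary version marker.
const ION_IVM: number[] = [0xe0, 0x01, 0x00, 0xea];

// True when `bytes` starts with every byte of `prefix`, in order.
function hasPrefix(bytes: Uint8Array, prefix: number[]): boolean {
  return bytes.length >= prefix.length && prefix.every((b, i) => bytes[i] === b);
}

// Classify a decoded payload by its first four bytes only.
function classifyPayload(bytes: Uint8Array): PayloadKind {
  if (hasPrefix(bytes, KPL_MAGIC)) return "kpl-aggregated";
  if (hasPrefix(bytes, ION_IVM)) return "ion-binary";
  return "other";
}

// The buffer shown above starts with f3 89 9a c2.
console.log(classifyPayload(new Uint8Array([0xf3, 0x89, 0x9a, 0xc2, 0x0a]))); // "kpl-aggregated"
```

With a check like this, a handler could route aggregated payloads through deaggregation and pass only Ion binary payloads to load.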


@battesonb I see that I have this enabled. I will try your suggestion and get back to you.

I appreciate your help on this

Ok looks like that was definitely the root cause. Thank you @battesonb!

