This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

DynamoDB load always uses full scan instead of the specified global secondary index in Python #101

Open
anujpareek opened this issue Jul 9, 2021 · 0 comments


@anujpareek

anujpareek commented Jul 9, 2021

In Python I'm trying to read a DynamoDB table through a global secondary index, with a provided schema and filters. It's a very large table, and a full table scan takes approximately 4 hours, so we created a global secondary index to improve performance. However, we're not sure whether this library supports reading from an index, or perhaps we're using it incorrectly. Currently we use the following code, which always does a full scan. I tried adding the commented-out line to use the index, but that didn't work, and I couldn't find any examples of this.

    # Parentheses instead of backslash continuations, since a comment line
    # would otherwise break the chain.
    dynamo_df = (
        spark.read.schema(table_schema)
        .option("tableName", "table")
        # .option("indexName", "x-y-global-secondary-index")
        .option("region", region)
        .option("throughput", 2500)
        .format("dynamodb")
        .load()
    )

    filtered_df = dynamo_df.filter((dynamo_df.x == x) & (dynamo_df.y > y))
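
For context, what we're ultimately after is the equivalent of a DynamoDB Query against the index, which avoids touching the base table at all. Here is a minimal boto3 sketch of that access pattern, assuming "x" is the index's partition key and "y" its sort key (boto3 is shown only to illustrate the intent, not as a proposed workaround):

    import boto3
    from boto3.dynamodb.conditions import Key

    # Assumption: "x" is the GSI's hash key and "y" its range key.
    dynamodb = boto3.resource("dynamodb", region_name=region)
    table = dynamodb.Table("table")

    # Query the index directly instead of scanning the whole table.
    response = table.query(
        IndexName="x-y-global-secondary-index",
        KeyConditionExpression=Key("x").eq(x) & Key("y").gt(y),
    )
    items = response["Items"]

Is there a supported way to get the same index-targeted read through this connector?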

Appreciate the help!
