This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Incomplete schema inference while reading from DynamoDB table #90

Open
siah210 opened this issue Jan 21, 2021 · 3 comments

Comments

@siah210

siah210 commented Jan 21, 2021

DynamoDB Table:
[Screenshot of the DynamoDB table, which contains the attributes s_id, created_on, p_id, and num]

I am reading the above table using the following code:

val df = spark.read
        .option("tableName", config.tableName)
        .option("region", config.ddbConfig.region)
        .format("dynamodb")
        .load()
df.show()

Result:
+----+-------------------+----+
|s_id|         created_on|p_id|
+----+-------------------+----+
| 002|2018-11-20 12:01:19|   2|
| 001|2018-11-19 12:01:19|   1|
| 006|2018-11-20 12:01:19|   6|
| 005|2018-11-19 12:01:20|   5|
| 004|2018-12-19 12:01:19|   4|
| 003|2019-11-19 12:01:19|   3|
+----+-------------------+----+

The "num" column was missing from the df. Why did this happen? Is there any flag which I need to set to ensure complete schema inference?

@Aniruddha-2016

Aniruddha-2016 commented Feb 8, 2021

You can pass the userSchema option together with your schema; otherwise, the connector builds the schema from the data on the first page of the DynamoDB table.

@siah210
Author

siah210 commented Feb 22, 2021

Thanks! This helps.

This library returned an empty DataFrame when I tried to read a DDB table with both a range key and a hash key. Is this known behaviour?

@phitotient

@siah210 You should pass the schema with the .schema() parameter, just as you would for a normal DataFrame. That should work.
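
For reference, a minimal sketch of that approach. The table name, region, and the column types for p_id and num are placeholder assumptions (not from the original post), and how strictly the connector honours a user-supplied schema may depend on the connector version:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("ddb-read").getOrCreate()

// Explicit schema listing every attribute, including "num", so nothing
// is dropped by inference from the first scanned page of the table.
// The column types below are assumptions based on the sample output above.
val ddbSchema = StructType(Seq(
  StructField("s_id", StringType, nullable = true),
  StructField("created_on", StringType, nullable = true),
  StructField("p_id", LongType, nullable = true),
  StructField("num", LongType, nullable = true)
))

val df = spark.read
  .schema(ddbSchema)                   // user-supplied schema instead of inference
  .option("tableName", "my_table")     // placeholder table name
  .option("region", "eu-west-1")       // placeholder region
  .format("dynamodb")
  .load()

df.show()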
