Unveiling the Full Dataset Structure: Leveraging platform_sdk.dataset_reader in AEPP #12
Comments
Thanks for bringing up the idea, @yoyo6022.
Hello @yoyo6022, here is the basic documentation: https://github.com/adobe/aepp/blob/main/docs/schema.md#schemamanager We will need to write more documentation in the future, but if you are familiar with Python and notebooks, you may be able to learn by experimenting with it, as all of the docstrings are provided.
To better understand a dataset's structure, I propose making platform_sdk.dataset_reader accessible from AEPP. This would let us unpack the entire dataset and view it in full, including the nested fields. Currently, AEPP supports loading data through the queryservice module: you specify a SQL query, and the result is loaded into a pandas DataFrame. However, each column in the DataFrame represents only the first level of the nested objects in the schema, unless we manually unpack a specific object in the query. For example, "select web.* from table_abc" returns the fields nested one level down, under the "web" object.
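To illustrate the difference between first-level columns and fully unpacked fields, here is a minimal sketch using plain pandas. It does not call aepp or platform_sdk; the records and field names are hypothetical, loosely shaped like a nested "web" object, and `pd.json_normalize` stands in for whatever unpacking a reader would perform.

```python
import pandas as pd

# Hypothetical records with a nested "web" object; field names are illustrative only.
records = [
    {
        "_id": "a1",
        "web": {
            "webPageDetails": {"name": "home", "URL": "https://example.com/"},
            "webReferrer": {"type": "search"},
        },
    },
    {
        "_id": "a2",
        "web": {
            "webPageDetails": {"name": "cart", "URL": "https://example.com/cart"},
            "webReferrer": {"type": "internal"},
        },
    },
]

# One column per top-level field: each "web" cell still holds a nested dict.
top_level = pd.DataFrame(records)

# Every nested leaf becomes its own dot-separated column.
flat = pd.json_normalize(records, sep=".")

print(list(top_level.columns))
print(sorted(flat.columns))
```

In the first frame, inspecting the schema requires digging into each "web" cell by hand; in the second, the column names alone expose the full field hierarchy.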
With platform_sdk.dataset_reader, we can load the data with its nested fields already unpacked, which gives a more complete view of the dataset. Having access to every field makes the data's structure much easier to understand. It also makes querying, processing, and manipulating the data more efficient, since we no longer need to manually unpack individual objects and the values are no longer nested within each field.
Example of using the SDK dataset reader, which automatically unpacks all the nested fields under the "web" object (screenshot not preserved).