Unveiling the Full Dataset Structure: Leveraging platform_sdk.dataset_reader in AEPP #12

Open
yoyo6022 opened this issue Jun 28, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@yoyo6022

yoyo6022 commented Jun 28, 2023

To improve our understanding of a dataset's structure, I propose making platform_sdk.dataset_reader accessible from AEPP. This would let us unpack the entire dataset and view it in full, including the nested fields. Currently, AEPP supports data loading through the queryservice module: you specify a SQL query and the result is loaded into a pandas DataFrame. However, each column in the DataFrame only represents the first level of the nested objects in the schema, unless we manually unpack a specific object in the query. For example, "select web.* from table_abc" returns the second-level fields nested under the "web" object.

With platform_sdk.dataset_reader, we can load the data with its nested fields already unpacked. This gives a more complete view of the dataset and a clearer picture of its structure, since every field it contains is visible. It also makes querying, data processing, and data manipulation more efficient, because we no longer need to unpack individual objects manually and the values are no longer nested within each field.
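To make the difference concrete without relying on either library, here is a minimal sketch using only pandas. The records and field names (`id`, `webPageDetails`, `webReferrer`) are made up for illustration, and `pd.json_normalize` is a stand-in for the idea of unpacking, not how the SDK dataset reader works internally.

```python
import pandas as pd

# Hypothetical records with a nested "web" object (illustrative only).
records = [
    {
        "id": "a1",
        "web": {
            "webPageDetails": {"name": "home", "URL": "https://example.com/"},
            "webReferrer": {"type": "search"},
        },
    },
    {
        "id": "a2",
        "web": {
            "webPageDetails": {"name": "cart", "URL": "https://example.com/cart"},
            "webReferrer": {"type": "internal"},
        },
    },
]

# First-level view: one column per top-level field, so every cell of
# the "web" column holds a whole nested dict.
nested_df = pd.DataFrame(records)

# Unpacked view: nested dicts are expanded into dot-separated columns.
flat_df = pd.json_normalize(records, sep=".")

print(list(nested_df.columns))
print(list(flat_df.columns))
```

In the first DataFrame the `web` column has to be unpacked by hand before its values can be used, while the second exposes each leaf field as its own column.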

Example of using the SDK dataset reader, which automatically unpacks all the nested fields under the "web" object:
image

@pitchmuc added the enhancement (New feature or request) label Jun 29, 2023
@pitchmuc
Contributor

Thanks for bringing up the idea, @yoyo6022.
We will consider it for future development.
FYI: the SDK dataset reader and this library are two different projects that work in different environments and connect to different sources.
I do not mean it is not doable, but it is not as easy as it may sound.

@pitchmuc
Contributor

pitchmuc commented Dec 4, 2023

Hello @yoyo6022,
I am coming back to this.
Have you checked the latest version of aepp, and especially the SchemaManager part?
It is not as efficient as the SDK reader, because it will not provide the values in the fields, but there is a way to flatten the schema structure and work with the field paths to use Query Service more efficiently.

Here is the basic documentation: https://github.com/adobe/aepp/blob/main/docs/schema.md#schemamanager

We will need to work on more documentation in the future, but if you are familiar with Python and notebooks, you may be able to learn by playing with it, as all of the docstrings are provided.
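The idea of flattening a schema into field paths and then using those paths in a Query Service statement can be sketched in plain Python. This is only an illustration under stated assumptions: the nested `schema` dict, the `flatten_paths` and `build_select` helpers, and the underscore alias convention are all hypothetical and are not part of the aepp or SchemaManager API.

```python
def flatten_paths(schema: dict, prefix: str = "") -> list:
    """Return dot-separated paths for every leaf field of a nested dict."""
    paths = []
    for name, child in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict) and child:
            # Object node: recurse into its children.
            paths.extend(flatten_paths(child, path))
        else:
            # Leaf node: record the full path.
            paths.append(path)
    return paths


def build_select(paths: list, table: str) -> str:
    """Build a SELECT that exposes each leaf path as its own column."""
    # Assumption: aliases replace dots with underscores to stay simple.
    columns = ", ".join(f"{p} AS {p.replace('.', '_')}" for p in paths)
    return f"SELECT {columns} FROM {table}"


# Hypothetical schema structure; leaf values are just type names.
schema = {
    "web": {
        "webPageDetails": {"name": "string", "URL": "string"},
        "webReferrer": {"type": "string"},
    }
}

paths = flatten_paths(schema)
query = build_select(paths, "table_abc")
print(query)
```

A query built this way returns one flat column per leaf field, which approximates the unpacked view requested above while still going through a SQL query.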
