This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Provide option to not flatten json data #29

Open · mpcarter wants to merge 2 commits into master

Conversation

mpcarter

Problem

Data from sources like MongoDB does not always follow a consistent schema, so the columns generated by flattening JSON data may drift over time. This causes problems when maintaining a long-term pipeline.

Proposed changes

Add a configuration option to control flattening behavior. It defaults to True to preserve the original behavior.

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@koszti
Collaborator

koszti commented Oct 1, 2020

This is cool, but newly added functionality should stay in sync with the other PipelineWise targets. pipelinewise-target-snowflake and pipelinewise-target-redshift implement the data_flattening_max_level option, and we should implement the same parameter rather than introducing a new one for a similar purpose:

data_flattening_max_level (from target-snowflake):

(Default: 0) Object type RECORD items from taps can be loaded into VARIANT columns as JSON (default) or we can flatten the schema by creating columns automatically.

When value is 0 (default) then flattening functionality is turned off.

I think what you're looking for is data_flattening_max_level: 0
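For reference, the option would sit alongside the other target settings in the config file. A minimal sketch (the s3_bucket key is shown only as a placeholder for the target's existing required settings):

```json
{
  "s3_bucket": "my-bucket",
  "data_flattening_max_level": 0
}
```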

Would it be possible to implement the same option so all targets use the same parameters?
You can see how it's implemented by searching for data_flattening_max_level in target_snowflake/db_sync.py. We could also implement unit tests similar to those in target_snowflake/tests/unit/test_db_sync.
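The level-limited flattening described above can be sketched roughly as follows. This is a simplified illustration modeled on the behavior described for target-snowflake, not the actual db_sync.py code; the function name, separator, and JSON serialization of unflattened objects are assumptions:

```python
import json


def flatten_record(record, parent_key=(), sep='__', level=0, max_level=0):
    """Flatten nested dicts up to max_level.

    With max_level=0 (the default), nested objects are not flattened at
    all; they are serialized as JSON strings instead of becoming columns.
    """
    items = []
    for key, value in record.items():
        new_key = parent_key + (key,)
        if isinstance(value, dict) and level < max_level:
            # Still below the flattening limit: recurse one level deeper.
            items.extend(
                flatten_record(value, new_key, sep=sep,
                               level=level + 1, max_level=max_level).items()
            )
        else:
            # At the limit (or a scalar): keep the value as-is, serializing
            # any remaining nested object to a JSON string.
            flat_value = json.dumps(value) if isinstance(value, dict) else value
            items.append((sep.join(new_key), flat_value))
    return dict(items)
```

So `flatten_record({"a": 1, "b": {"c": 2}}, max_level=0)` keeps `b` as a JSON string, while `max_level=1` produces a `b__c` column.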

Let me know your thoughts. 🙇

@mpcarter
Author

mpcarter commented Oct 2, 2020

Makes sense to me. I'll see if I can model the change on the snowflake implementation. Given that the current default behavior for target-s3 is to flatten, do you think that should still be the case, even though the default for snowflake is to not flatten? I'm not sure what default level should be used otherwise.

Changing the default behavior might have an unintended effect on anyone already using target-s3.

@koszti
Collaborator

koszti commented Oct 8, 2020

Good question. If people use this target via the main PipelineWise, we can automatically set the default data flattening level to a certain value by changing this line in tap_properties.py, and so keep the current behaviour.

But what should the default level be? There is no option for an unlimited flattening depth. Should we set it to a "big enough" number like 100? What do you think, or do you maybe have a better idea?
