Add support for patternProperties #26

pcorbel · 2019-06-24T08:55:54Z

Issue:
If a tap has fields that are not explicitly declared, but are covered by a

"patternProperties": {
  ".+": {}
}

block, they won't be loaded into Redshift.
With another target (such as target-csv), however, the fields are available.
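
To illustrate which fields are affected, here is a minimal sketch that splits a record's keys into those declared under "properties" and those only covered by "patternProperties". The helper name `split_fields`, the sample schema, and the sample record are hypothetical and not part of target-redshift. JSON Schema patterns are treated as unanchored, hence `re.search`.

```python
import re


def split_fields(schema, record):
    """Split a record into explicitly declared fields, fields only
    covered by patternProperties, and fields covered by neither."""
    declared = schema.get("properties", {})
    patterns = schema.get("patternProperties", {})
    explicit, by_pattern, unmatched = {}, {}, {}
    for key, value in record.items():
        if key in declared:
            explicit[key] = value
        elif any(re.search(pattern, key) for pattern in patterns):
            by_pattern[key] = value
        else:
            unmatched[key] = value
    return explicit, by_pattern, unmatched


# Hypothetical schema and record, for illustration only.
schema = {
    "properties": {"id": {"type": "integer"}},
    "patternProperties": {".+": {}},
}
record = {"id": 1, "summary": "Example", "customfield_10000": "x"}

explicit, by_pattern, unmatched = split_fields(schema, record)
```

In this sketch, the fields that end up in `by_pattern` are the ones this issue reports as missing from Redshift.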

How to reproduce:

data.jsonl.txt
(a file with a schema and an example record)
issues.csv.txt
(a CSV generated by the following command)
cat data.jsonl | target-csv

Version:
Python 3.7.3
target-csv==0.3.0
target-redshift==0.0.7

Documentation:
The link to the target-csv flatten function

AlexanderMann · 2019-06-24T13:55:38Z

@pcorbel so I took a look at things in target-csv, and the way I think it actually works is that it simply takes the first record for a stream and then uses that record's keys as the headers for all subsequent records.

Since it's not doing any batching, it doesn't have any ability to add new keys as they come up.

So for instance, if you have:

{"patternProperties": {
  "a.+": {"type": "integer"}
}}

{"a": 123}
{"a_1": 456}
{"a_2": 789}
...

You'll get:

a
123
NULL
NULL

All records will pass validation, but you'll still effectively lose the data in a_1 and a_2.
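
The behavior described above can be sketched as follows. This is only an illustration of "headers come from the first record", not target-csv's actual code; the function name `rows_from_first_record_headers` is hypothetical.

```python
def rows_from_first_record_headers(records):
    """Fix the column headers from the first record, then project every
    record onto those headers. Keys that appear later are dropped."""
    records = list(records)
    if not records:
        return [], []
    headers = list(records[0].keys())
    # dict.get returns None for headers a record does not have.
    rows = [[record.get(header) for header in headers] for record in records]
    return headers, rows


headers, rows = rows_from_first_record_headers(
    [{"a": 123}, {"a_1": 456}, {"a_2": 789}]
)
```

Here `headers` contains only `a`, so the values 456 and 789 never reach a column.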

Is this the same as your experience and is this how you were expecting the support herein to work?

pcorbel · 2019-06-24T14:50:35Z

@AlexanderMann With the taps I work with, all keys are always represented, so I'll always have

{"a": 123, "a_1": null, "a_2": null}
{"a": null, "a_1": 456, "a_2": null}
{"a": null, "a_1": null, "a_2": 789}

and I think a lot of APIs/taps work that way.

It would still be imperfect, but implementing this the way the CSV target does would be an enormous improvement.
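
As a rough sketch of what that could look like, assuming the first record carries every key: copy the schema and promote each key of the first record that matches a "patternProperties" pattern into an explicit "properties" entry. The function name `expand_pattern_properties` is hypothetical; this is not how target-redshift or target-postgres currently work.

```python
import copy
import re


def expand_pattern_properties(schema, first_record):
    """Return a copy of the schema in which every key of the first record
    that matches a patternProperties pattern is declared explicitly."""
    expanded = copy.deepcopy(schema)
    properties = expanded.setdefault("properties", {})
    patterns = schema.get("patternProperties", {})
    for key in first_record:
        if key in properties:
            continue  # already declared explicitly
        for pattern, subschema in patterns.items():
            # JSON Schema patterns are unanchored, hence re.search.
            if re.search(pattern, key):
                properties[key] = copy.deepcopy(subschema)
                break
    return expanded


# Hypothetical schema and first record, for illustration only.
schema = {"patternProperties": {".+": {}}}
first_record = {"a": 123, "a_1": None, "a_2": None}

expanded = expand_pattern_properties(schema, first_record)
```

Like the CSV approach, this would still miss any key that first appears in a later record.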

pcorbel · 2019-07-23T14:34:51Z

@AlexanderMann Hello Alexander, could you give me some pointers on where to begin implementing it the target-csv way?

AlexanderMann · 2019-07-31T15:13:21Z

Hey @pcorbel. So there's an issue over in datamill-co/target-postgres#129 which deals with a problem similar to the one you're describing here.

I'm thinking that your suggestion of doing what target-csv does, simply taking the fields from the first record, is a good alternative.

Since this repo depends on target-postgres pretty heavily, whatever solution we come up with for target-redshift will most likely affect target-postgres.

(also, sorry about the tardy reply, vacation and then wrapping up a project with a client)

AlexanderMann · 2019-07-31T15:52:06Z

Also, @pcorbel forgot to ask, but what taps are you using which are leveraging patternProperties?

pcorbel · 2019-08-05T08:16:46Z

@AlexanderMann I was using tap-jira, and most of the fields in an issue were discarded because they were in a patternProperties field.
