Fix NString value to JSON #341

Merged 1 commit on Apr 10, 2024

@t3t5u t3t5u commented Mar 13, 2024

Fix the NString value for JSON so that it behaves the same as StringColumnSetter.

@t3t5u t3t5u requested a review from a team as a code owner March 13, 2024 10:36
@t3t5u
Contributor Author

t3t5u commented Mar 13, 2024

@dmikurube @hiroyuki-sato
Please review. 🙇‍♂️

@@ -61,6 +61,6 @@ public void timestampValue(final Instant v) throws IOException, SQLException
     @Override
     public void jsonValue(Value v) throws IOException, SQLException
     {
-        defaultValue.setNString();
+        batch.setNString(v.toJson());
Contributor Author

Mainly for compatibility with the following changes.
https://github.com/embulk/embulk-output-jdbc/pull/320/files
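
The effect of the one-line change can be sketched in isolation. The following is a minimal illustration only, not the plugin's actual classes: `RecordingBatch`, `jsonValueBefore`, and `jsonValueAfter` are hypothetical stand-ins, the JSON value is passed as already-serialized text, and the default value is assumed to be null (which matches the "actual: null" reported in this thread).

```java
import java.util.ArrayList;
import java.util.List;

final class NStringJsonSketch {
    /** Hypothetical stand-in for the batch insert target; it records what gets written. */
    static final class RecordingBatch {
        final List<String> written = new ArrayList<>();

        void setNString(String v) {
            written.add(v);
        }

        void setNull() {
            written.add(null);
        }
    }

    /** Old behavior: the JSON value is ignored and the default value (assumed null) is written. */
    static void jsonValueBefore(RecordingBatch batch, String jsonText) {
        batch.setNull(); // stands in for defaultValue.setNString()
    }

    /** New behavior: the JSON value is written as its JSON text. */
    static void jsonValueAfter(RecordingBatch batch, String jsonText) {
        batch.setNString(jsonText); // stands in for batch.setNString(v.toJson())
    }

    public static void main(String[] args) {
        String jsonText = "{\"sample\":\"json\"}";

        RecordingBatch before = new RecordingBatch();
        jsonValueBefore(before, jsonText);

        RecordingBatch after = new RecordingBatch();
        jsonValueAfter(after, jsonText);

        System.out.println("before: " + before.written.get(0));
        System.out.println("after:  " + after.written.get(0));
    }
}
```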

@hiroyuki-sato
Member

hiroyuki-sato commented Mar 13, 2024

Hello, @t3t5u

Thank you for creating this PR.
Could you explain why this PR is needed?
Did #320 cause a problem?

@hiroyuki-sato
Member

hiroyuki-sato commented Mar 13, 2024

Hello, @t3t5u

Thank you for your comment.
LGTM👍

Just to double-check, please allow me to confirm.
Is my understanding correct?

This PR fixes the case where the input data contains JSON data and a column uses the type nvarchar in column_options: the JSON string is now written instead of null.
This problem was introduced by #320. (embulk-output-sqlserver: v0.10.3)

  • input data: my_col_1: '{"sample":"json"}'

example config

out:
  type: sqlserver
  column_options:
    my_col_1: {type: 'NVARCHAR'}
  • expect: {"sample":"json"}
  • actual: null

@dmikurube, @hito4t Please take a look when you get a chance.

@dmikurube
Member

@t3t5u Setting a JSON value into an NString in its stringified form sounds basically more reasonable than always setting null, but there are two things we are always concerned about:

  1. compatibility "with earlier versions"
  2. (in case of JDBC plugins) unexpected impacts on other types of JDBC plugins (e.g. you showed an example with SQL Server, but how about MySQL, PostgreSQL, or any other possible JDBC plugins like Oracle, DB2, other third-party.)

Just making a small change can easily cause an incompatibility like you experienced in #320 as shown here. #320 looked compatible enough, but it was actually not.

Please confirm more with (try iterating over) possible combinations of input Embulk data types x output RDB data types x JDBC driver types.
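
One way to keep such a check systematic is to enumerate the combinations up front. This is only a sketch: the class and method names are hypothetical, and the values listed are the plugins, Embulk types, and column_options variants that appear in this discussion, not an exhaustive list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CombinationMatrix {
    /** Builds every plugin / Embulk type / column option combination as a readable label. */
    static List<String> combinations(List<String> embulkTypes,
                                     List<String> columnOptions,
                                     List<String> plugins) {
        List<String> out = new ArrayList<>();
        for (String plugin : plugins) {
            for (String embulkType : embulkTypes) {
                for (String option : columnOptions) {
                    out.add(plugin + " / " + embulkType + " / " + option);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> embulkTypes =
                Arrays.asList("boolean", "long", "double", "string", "timestamp", "json");
        List<String> columnOptions =
                Arrays.asList("(none)", "type: TEXT", "value_type: string", "value_type: nstring");
        List<String> plugins =
                Arrays.asList("mysql", "postgresql", "redshift", "sqlserver");

        List<String> all = combinations(embulkTypes, columnOptions, plugins);
        for (String label : all) {
            System.out.println(label);
        }
        System.out.println("total: " + all.size());
    }
}
```

Not every combination is meaningful for every plugin, so the list is a starting checklist rather than a set of required test cases.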

@t3t5u
Contributor Author

t3t5u commented Mar 13, 2024

Thank you for your response.

@hiroyuki-sato

This PR fixes the case where the input data contains JSON data and a column uses the type nvarchar in column_options: the JSON string is now written instead of null.
This problem was introduced by #320. (embulk-output-sqlserver: v0.10.3)

This understanding is correct.

@dmikurube

Please confirm more with (try iterating over) possible combinations of input Embulk data types x output RDB data types x JDBC driver types.

Okay. But please give me some time.

@t3t5u
Contributor Author

t3t5u commented Mar 27, 2024

@dmikurube @hiroyuki-sato

The following are the results of the confirmation.

First, NCHAR or NVARCHAR has no meaning in MySQL, PostgreSQL, Redshift, Oracle, or Db2, only in SQL Server.

Even if syntactically available, NCHAR or NVARCHAR is essentially Unicode-restricted CHAR or VARCHAR, and as far as I can find out, there is no type that maps to java.sql.Types.NCHAR, java.sql.Types.NVARCHAR, java.sql.Types.LONGNVARCHAR, or java.sql.Types.NCLOB on JDBC, like in SQL Server.
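
For reference, the four `java.sql.Types` constants named above can be checked with a small helper. This is a sketch only; `NationalTypes` and `isNationalCharacterType` are hypothetical names and are not part of embulk-output-jdbc.

```java
import java.sql.Types;

final class NationalTypes {
    /** Returns true if the java.sql.Types code is one of the national character types. */
    static boolean isNationalCharacterType(int sqlType) {
        switch (sqlType) {
            case Types.NCHAR:
            case Types.NVARCHAR:
            case Types.LONGNVARCHAR:
            case Types.NCLOB:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NVARCHAR: " + isNationalCharacterType(Types.NVARCHAR));
        System.out.println("VARCHAR:  " + isNationalCharacterType(Types.VARCHAR));
    }
}
```

With a live connection, the same check could presumably be applied to the `DATA_TYPE` column of the `ResultSet` returned by `DatabaseMetaData.getTypeInfo()` to see whether a given driver reports any of these types.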

References:

And I have confirmed MySQL, PostgreSQL, Redshift, and SQL Server using the following scripts.

Please confirm the results.

@hiroyuki-sato
Member

@t3t5u Thank you for your work. Can you show us the test results just in case?

I read this file change, but it doesn't contain test results.
https://github.com/trocco-io/embulk-output-jdbc/pull/9/files

@t3t5u
Contributor Author

t3t5u commented Mar 29, 2024

@hiroyuki-sato

Added raw result logs (credentials are masked).
trocco-io@3d57644

Thank you.

@hiroyuki-sato
Member

hiroyuki-sato commented Mar 29, 2024

Hello, @t3t5u Thank you for your work.

Questions

  • Which version started setting a null value when column_options sets value_type: nstring? (I thought it was #320, "Use NVARCHAR(max) for CLOB, instead of TEXT, when creating a table in SQLServer", but it seems to be an earlier version.)
    • In other words, I would like to know which of the following is the case:
    • value_type: nstring has not worked properly since the first version.
    • value_type: nstring worked at one time, but some commit (???) broke compatibility.
  • Do we need to check other json values like below? (Just question.)
    • "test_json_nstring": null
    • "test_json_nstring": 12345
    • "test_json_nstring": 123.45
    • "test_json_nstring": "hoge"
  • What do you think about adding this test code to this project (e.g. for MySQL and PostgreSQL)?

I confirmed that all of the test results are exactly the same as this description.
trocco-io#9 (comment)

Input

input data1

| column | value |
|---|---|
| test_boolean | true |
| test_long | 123 |
| test_double | 1.23 |
| test_string | あいうえお |
| test_timestamp | 1999-12-31 23:59:59.000000 +0000 |
| test_json | { "キー": "値" } |
| test_json_text | { "キー": "値" } |
| test_json_string | { "キー": "値" } |
| test_json_nstring | { "キー": "値" } |

input data2

| column | value |
|---|---|
| test_boolean | false |
| test_long | 456 |
| test_double | 4.56 |
| test_string | かきくけこ |
| test_timestamp | 2000-01-01 00:00:00.000000 +0000 |
| test_json | [{ "キー1": "値1"},{"キー2": "値2"}] |
| test_json_text | [{ "キー1": "値1"},{"キー2": "値2"}] |
| test_json_string | [{ "キー1": "値1"},{"キー2": "値2"}] |
| test_json_nstring | [{ "キー1": "値1"},{"キー2": "値2"}] |

| column | embulk type | column_options (MySQL, PostgreSQL) | Redshift | SQL Server |
|---|---|---|---|---|
| test_boolean | boolean | | | |
| test_long | long | | | |
| test_double | double | | | |
| test_string | string | | | |
| test_timestamp | timestamp | (TIMESTAMPZ(PG),DATETIME(MySQL)) | {type: TIMESTAMPZ } | {type: DATETIME } |
| test_json | json | | | |
| test_json_text | json | {type: TEXT} | {type: TEXT} | {type: TEXT } |
| test_json_string | json | {type: TEXT, value_type: string } | {type: 'VARCHAR(65535)', value_type: string} | {type: 'VARCHAR(max)', value_type: string } |
| test_json_nstring | json | {type: TEXT, value_type: nstring } | {type: 'VARCHAR(65535)', value_type: nstring} | {type: 'VARCHAR(max)', value_type: nstring } |

Test MySQL(default,8.3.0), PostgreSQL(default,42.7.3) and Redshift

input data1

| column | before | after | comment |
|---|---|---|---|
| test_boolean | true | true | |
| test_long | 123 | 123 | |
| test_double | 1.23 | 1.23 | |
| test_string | あいうえお | あいうえお | |
| test_timestamp | 1999-12-31 23:59:59.000000 +0000 | 1999-12-31 23:59:59.000000 +0000 | |
| test_json | { "キー": "値" } | { "キー": "値" } | |
| test_json_text | { "キー": "値" } | { "キー": "値" } | |
| test_json_string | { "キー": "値" } | { "キー": "値" } | |
| test_json_nstring | null | { "キー": "値" } | FIXED THIS |

input data2

| column | before | after | comment |
|---|---|---|---|
| test_boolean | false | false | |
| test_long | 456 | 456 | |
| test_double | 4.56 | 4.56 | |
| test_string | かきくけこ | かきくけこ | |
| test_timestamp | 2000-01-01 00:00:00.000000 +0000 | 2000-01-01 00:00:00.000000 +0000 | |
| test_json | [{ "キー1": "値1"},{"キー2": "値2"}] | [{ "キー1": "値1"},{"キー2": "値2"}] | |
| test_json_text | [{ "キー1": "値1"},{"キー2": "値2"}] | [{ "キー1": "値1"},{"キー2": "値2"}] | |
| test_json_string | [{ "キー1": "値1"},{"キー2": "値2"}] | [{ "キー1": "値1"},{"キー2": "値2"}] | |
| test_json_nstring | null | [{ "キー1": "値1"},{"キー2": "値2"}] | FIXED THIS |

SQL Server

input data1

| column | before | after | comment |
|---|---|---|---|
| test_boolean | true | true | |
| test_long | 123 | 123 | |
| test_double | 1.23 | 1.23 | |
| test_string | あいうえお | あいうえお | |
| test_timestamp | 1999-12-31 23:59:59.000000 +0000 | 1999-12-31 23:59:59.000000 +0000 | |
| test_json | null | { "キー": "値" } | FIXED THIS |
| test_json_text | { "??": "?" } | { "??": "?" } | Multibyte strings are garbled |
| test_json_string | { "??": "?" } | { "??": "?" } | Multibyte strings are garbled |
| test_json_nstring | null | { "キー": "値" } | FIXED THIS |

input data2

| column | before | after | comment |
|---|---|---|---|
| test_boolean | false | false | |
| test_long | 456 | 456 | |
| test_double | 4.56 | 4.56 | |
| test_string | かきくけこ | かきくけこ | |
| test_timestamp | 2000-01-01 00:00:00.000000 +0000 | 2000-01-01 00:00:00.000000 +0000 | |
| test_json | null | [{ "キー1": "値1"},{"キー2": "値2"}] | FIXED THIS |
| test_json_text | [{ "??1": "?1"},{"??2": "?2"}] | [{ "??1": "?1"},{"??2": "?2"}] | Multibyte strings are garbled |
| test_json_string | [{ "??1": "?1"},{"??2": "?2"}] | [{ "??1": "?1"},{"??2": "?2"}] | Multibyte strings are garbled |
| test_json_nstring | null | [{ "キー1": "値1"},{"キー2": "値2"}] | FIXED THIS |

result.tgz

@t3t5u
Contributor Author

t3t5u commented Apr 1, 2024

@hiroyuki-sato

Which version started setting a null value when column_options sets value_type: nstring?

Since early versions, if value_type is explicitly set to nstring, json columns have been null in all JDBC plugins.

Before #320, if value_type was not set, there was no type that mapped to nstring by default.
After #320, even if value_type is not set, json (and string) columns are now mapped to nstring by default in sqlserver.

Do we need to check other json values like below?

Updated test scripts & logs.

In summary, json with nstring was always null before the fix; after the fix, it is null only if the input is null.

What do you think about adding this test code to this project?

I don't think these scripts are suitable for use in CI.

  • Need to prepare the environment.
  • Results need to be visually confirmed.

I think it should only be used as a reference.

@hiroyuki-sato
Member

Hello, @t3t5u

I confirmed that this fix looks good to me. 👍
I have one thing to confirm about versioning (what the next version number will be).

@dmikurube , @hito4t Please take a look when you get a chance.

Input data

input data1

| column | value |
|---|---|
| test_boolean | null |
| test_long | null |
| test_double | null |
| test_string | null |
| test_timestamp | null |
| test_json | null |
| test_json_text | null |
| test_json_string | null |
| test_json_nstring | null |

input data2

| column | value |
|---|---|
| test_boolean | null |
| test_long | null |
| test_double | null |
| test_string | null |
| test_timestamp | null |
| test_json | 12345 |
| test_json_text | 12345 |
| test_json_string | 12345 |
| test_json_nstring | 12345 |

input data3

| column | value |
|---|---|
| test_boolean | null |
| test_long | null |
| test_double | null |
| test_string | null |
| test_timestamp | null |
| test_json | 123.45 |
| test_json_text | 123.45 |
| test_json_string | 123.45 |
| test_json_nstring | 123.45 |

input data4

| column | value |
|---|---|
| test_boolean | null |
| test_long | null |
| test_double | null |
| test_string | null |
| test_timestamp | null |
| test_json | hoge |
| test_json_text | hoge |
| test_json_string | hoge |
| test_json_nstring | hoge |

MySQL, PostgreSQL, Redshift

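input data1

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | null | null | null | |
| test_json_text | null | null | null | |
| test_json_string | null | null | null | |
| test_json_nstring | null | null | null | |

input data2

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | 12345 | 12345 | 12345 | |
| test_json_text | 12345 | 12345 | 12345 | |
| test_json_string | 12345 | 12345 | 12345 | |
| test_json_nstring | 12345 | null | 12345 | FIX THIS |

input data3

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | 123.45 | 123.45 | 123.45 | |
| test_json_text | 123.45 | 123.45 | 123.45 | |
| test_json_string | 123.45 | 123.45 | 123.45 | |
| test_json_nstring | 123.45 | null | 123.45 | FIX THIS |

input data4

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | hoge | hoge | hoge | |
| test_json_text | hoge | hoge | hoge | |
| test_json_string | hoge | hoge | hoge | |
| test_json_nstring | hoge | null | hoge | FIX THIS |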

SQL Server

input data1

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | null | null | null | |
| test_json_text | null | null | null | |
| test_json_string | null | null | null | |
| test_json_nstring | null | null | null | |

input data2

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | 12345 | null | 12345 | FIX THIS |
| test_json_text | 12345 | 12345 | 12345 | |
| test_json_string | 12345 | 12345 | 12345 | |
| test_json_nstring | 12345 | null | 12345 | FIX THIS |

input data3

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | 123.45 | null | 123.45 | FIX THIS |
| test_json_text | 123.45 | 123.45 | 123.45 | |
| test_json_string | 123.45 | 123.45 | 123.45 | |
| test_json_nstring | 123.45 | null | 123.45 | FIX THIS |

input data4

| column | input | before | after | comment |
|---|---|---|---|---|
| test_boolean | null | null | null | |
| test_long | null | null | null | |
| test_double | null | null | null | |
| test_string | null | null | null | |
| test_timestamp | null | null | null | |
| test_json | hoge | null | hoge | FIX THIS |
| test_json_text | hoge | hoge | hoge | |
| test_json_string | hoge | hoge | hoge | |
| test_json_nstring | hoge | null | hoge | FIX THIS |

Member

@dmikurube dmikurube left a comment

Sorry to be late, and thanks for the great investigations, @t3t5u, and summarizing them, also @hiroyuki-sato !

Finally it looks good to me! Let me merge this, and release a next version. (It'd be simply v0.10.6 as it's a "fix".)

Cc'ing: @hito4t, but I guess he would be okay with it.


Apart from this change itself, we would still be very happy if you could consider contributing:

  1. These tests into this repo somehow
    • No need to be right now, which can be in another pull request, weeks/months later
    • No need to be runnable as-is in CI immediately, but just manual test script(s) would help
    • Just the idea to test it is still helpful for later
  2. A quick document about the relations of types that you confirmed
    • Not just in this pull request discussion, but nice to have it as a part of this repo
    • Such a document would be in Markdown under embulk-output-jdbc/docs/
    • If you have any hesitation about, for example, documentation, English, or format, I could help with that

The JDBC plugins have a long history, and we have very little information about them. Any tests and documents would help!

@dmikurube dmikurube added this to the v0.10.6 milestone Apr 10, 2024
@dmikurube
Member

Merging...

@dmikurube dmikurube merged commit e013f37 into embulk:master Apr 10, 2024
4 checks passed
@hito4t
Contributor

hito4t commented May 14, 2024

I also think the new specification is desirable. 👍
