SNOW-1871175: Add support for specifying a schema string for `DataFrame.create_dataframe` #2828

sfc-gh-jdu · 2025-01-07T06:05:27Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-1871175
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
Please describe how your code solves the related issue.

Adds support for passing the schema as a string to `create_dataframe`.

sfc-gh-aling · 2025-01-09T22:36:05Z

src/snowflake/snowpark/_internal/type_utils.py

+            bracket_depth += 1
+            # we don't store the opening bracket in 'inside_chars'
+            # if bracket_depth was 0 -> 1, to skip the outer bracket
+            if bracket_depth > 1:
+                inside_chars.append(c)


Is this form allowed: `array<<<...>>>`?

yea for something like array<array<...>>, added a comment here
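To make the depth rule in this hunk concrete, here is a minimal sketch of stripping only the outermost bracket pair while keeping nested ones. The function name `extract_inner` is hypothetical and the body is a reconstruction from the hunk above, not the code in the PR.

```python
def extract_inner(s: str) -> str:
    """Return the text inside the outermost <...> or (...) pair of `s`.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    bracket_depth = 0
    inside_chars = []
    for c in s:
        if c in "<(":
            bracket_depth += 1
            # Skip the outermost opening bracket (depth 0 -> 1); keep nested ones.
            if bracket_depth > 1:
                inside_chars.append(c)
        elif c in ">)":
            bracket_depth -= 1
            if bracket_depth < 0:
                raise ValueError(f"Mismatched bracket in '{s}'.")
            # Skip the outermost closing bracket (depth 1 -> 0); keep nested ones.
            if bracket_depth > 0:
                inside_chars.append(c)
        elif bracket_depth > 0:
            # Ordinary character inside the brackets.
            inside_chars.append(c)
    if bracket_depth != 0:
        raise ValueError(f"Mismatched bracket in '{s}'.")
    return "".join(inside_chars)


print(extract_inner("array<array<int>>"))  # array<int>
print(extract_inner("decimal(10, 2)"))     # 10, 2
```

In this sketch `array<<<int>>>` is still balanced, so the helper returns `<<int>>`; rejecting that would be the job of whatever parses the inner text as a type.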

sfc-gh-aling · 2025-01-09T22:43:24Z

src/snowflake/snowpark/_internal/type_utils.py

+    for field_def in field_defs:
+        # Try splitting on colon first, else whitespace
+        if ":" in field_def:
+            left, right = field_def.split(":", 1)


Should we consider multiple-colon cases like "a:b:c"? Or is this already handled by upstream/downstream logic?

nah, PySpark's simpleString format only considers the first colon or whitespace.
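A small sketch of the "first colon, else first whitespace" rule discussed here. The name `split_name_and_type` and the error message are assumptions made for illustration; only the `split(":", 1)` call comes from the hunk above.

```python
from typing import Tuple


def split_name_and_type(field_def: str) -> Tuple[str, str]:
    """Split one field definition into (name, type string).

    Hypothetical helper for illustration; not the PR's implementation.
    """
    field_def = field_def.strip()
    if ":" in field_def:
        # Only the first colon separates name and type; later colons stay in the type.
        left, right = field_def.split(":", 1)
    else:
        # No colon: fall back to splitting on the first run of whitespace.
        parts = field_def.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"Cannot split '{field_def}' into name and type.")
        left, right = parts
    return left.strip(), right.strip()


print(split_name_and_type("a:b:c"))                # ('a', 'b:c')
print(split_name_and_type("name string"))          # ('name', 'string')
print(split_name_and_type("m: map<string, int>"))  # ('m', 'map<string, int>')
```

Under this rule `"a:b:c"` becomes the name `a` with the type string `b:c`, so any rejection would have to come from the later type lookup rather than from the split itself.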

sfc-gh-aling · 2025-01-09T22:46:31Z

src/snowflake/snowpark/_internal/type_utils.py

+    for i, c in enumerate(s):
+        if c in ["<", "("]:
+            bracket_depth += 1
+        elif c in [">", ")"]:
+            bracket_depth -= 1
+            if bracket_depth < 0:
+                raise ValueError(f"Mismatched bracket in '{s}'.")


I feel this bracket-check logic is repeated multiple times.
Do you think it's possible to check the bracket matching once, as an initial step over the whole input string, so that the downstream logic can focus only on extracting the names and types?

We need to parse the brackets anyway to split fields and extract names and types. There is indeed some duplication in validating whether the bracket expression is well-formed, and maybe we can remove it. But to keep each function self-contained, maybe let's still keep it? These cases are also covered in the tests.
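The point that splitting and validation share one bracket walk can be sketched as follows. `split_top_level_fields` is a hypothetical name, and the comma handling is an assumption; only the depth tracking and the error message come from the hunk above.

```python
from typing import List


def split_top_level_fields(s: str) -> List[str]:
    """Split `s` on commas that are not nested inside <...> or (...).

    Hypothetical helper for illustration; not the PR's implementation.
    """
    parts = []
    bracket_depth = 0
    start = 0
    for i, c in enumerate(s):
        if c in ["<", "("]:
            bracket_depth += 1
        elif c in [">", ")"]:
            bracket_depth -= 1
            if bracket_depth < 0:
                raise ValueError(f"Mismatched bracket in '{s}'.")
        elif c == "," and bracket_depth == 0:
            # A comma at depth 0 ends the current field definition.
            parts.append(s[start:i].strip())
            start = i + 1
    if bracket_depth != 0:
        raise ValueError(f"Mismatched bracket in '{s}'.")
    parts.append(s[start:].strip())
    return parts


print(split_top_level_fields("a: int, b: map<string, int>, c: decimal(10, 2)"))
# ['a: int', 'b: map<string, int>', 'c: decimal(10, 2)']
```

Because the depth counter is already needed to tell top-level commas from nested ones, the mismatch check costs only a few extra lines in each function, which is the trade-off weighed in the reply above.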

sfc-gh-aling · 2025-01-09T22:49:18Z

src/snowflake/snowpark/session.py

+                    f"Invalid schema string: {schema}. "
+                    f"You should provide a valid schema string representing a struct type."
+                )
+        if isinstance(schema, (StructType, str)):


Would the schema still be of type str after being processed by `type_string_to_type_object`?
I feel we don't need the str in the isinstance check here.

Suggested change

-        if isinstance(schema, (StructType, str)):
+        if isinstance(schema, StructType):

yea, the str was left in by mistake
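Why the `str` branch is redundant can be shown with a self-contained sketch. Everything here is a stand-in: `StructType` is a toy dataclass rather than the Snowpark class, `parse_schema_string` is a placeholder for `type_string_to_type_object` with a deliberately naive comma split, and `normalize_schema` is a hypothetical name. Only the error message is taken from the hunk above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructType:
    """Toy stand-in, NOT snowflake.snowpark.types.StructType."""
    fields: List[str] = field(default_factory=list)


def parse_schema_string(schema: str):
    """Placeholder for type_string_to_type_object (assumed behavior)."""
    if not schema.strip():
        return None
    # Naive split, good enough for this illustration only.
    return StructType([f.strip() for f in schema.split(",")])


def normalize_schema(schema):
    """Convert a schema string to a StructType; pass other inputs through."""
    if isinstance(schema, str):
        parsed = parse_schema_string(schema)
        if not isinstance(parsed, StructType):
            raise ValueError(
                f"Invalid schema string: {schema}. "
                f"You should provide a valid schema string representing a struct type."
            )
        schema = parsed
    # A str has either been converted or rejected by now,
    # so checking for StructType alone is sufficient.
    if isinstance(schema, StructType):
        return schema
    return schema  # other schema forms are out of scope for this sketch


print(normalize_schema("a: int, b: string"))
# StructType(fields=['a: int', 'b: string'])
```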

init

c773837

sfc-gh-jdu requested review from a team as code owners January 7, 2025 06:05

sfc-gh-jdu requested review from sfc-gh-yuwang, sfc-gh-aalam and sfc-gh-jrose January 7, 2025 06:05

fix

0f073dc

sfc-gh-aling reviewed Jan 9, 2025

address comment

64fe8b4

sfc-gh-jdu requested a review from sfc-gh-aling January 9, 2025 23:21
