[SPARK-45554][PYTHON] Introduce flexible parameter to `assertSchemaEqual`

### What changes were proposed in this pull request?

This PR proposes to add three new parameters to `assertSchemaEqual`: `ignoreNullable`, `ignoreColumnOrder` and `ignoreColumnName`, to give users more flexibility in schema testing.

### Why are the changes needed?

To enhance the utility of `assertSchemaEqual` by accommodating common schema comparison scenarios that users might encounter, without requiring manual adjustments or workarounds.

### Does this PR introduce _any_ user-facing change?

Yes. `assertSchemaEqual` now has the option to use the three new parameters:

Parameter | Type | Comment
-- | -- | --
ignoreNullable | Boolean [optional] | Specifies whether a column’s nullable property is included when checking for schema equality.<br><br>When set to True (default), the nullable property of the columns being compared is not taken into account, and the columns are considered equal even if they have different nullable settings.<br><br>When set to False, columns are considered equal only if they have the same nullable setting.
ignoreColumnOrder | Boolean [optional] | Specifies whether to compare columns in the order they appear in the DataFrames or by column name.<br><br>When set to False (default), columns are compared in the order they appear in the DataFrames.<br><br>When set to True, a column in the expected DataFrame is compared to the column with the same name in the actual DataFrame.<br><br>ignoreColumnOrder cannot be set to True if ignoreColumnName is also set to True.
ignoreColumnName | Boolean [optional] | Specifies whether to fail the initial schema equality check if the column names in the two DataFrames are different.<br><br>When set to False (default), column names are checked and the function fails if they are different.<br><br>When set to True, the function succeeds even if column names are different. Column data types are compared for columns in the order they appear in the DataFrames.<br><br>ignoreColumnName cannot be set to True if ignoreColumnOrder is also set to True.

### How was this patch tested?

Added usage examples into doctest for each parameter.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43450 from itholic/SPARK-45554.

Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
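
As a rough illustration of the three options described above, the sketch below models their documented behavior in plain Python. It is not the PySpark implementation: `Field` and `schemas_match` are hypothetical names invented for this example, data types are represented as plain strings, and the function returns a boolean rather than raising an assertion error.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a schema field (not a PySpark class)."""
    name: str
    data_type: str
    nullable: bool = True


def schemas_match(
    actual: List[Field],
    expected: List[Field],
    ignore_nullable: bool = True,
    ignore_column_order: bool = False,
    ignore_column_name: bool = False,
) -> bool:
    """Model of the comparison rules in the parameter table (assumed semantics)."""
    # The description states that these two options cannot both be True.
    if ignore_column_order and ignore_column_name:
        raise ValueError(
            "ignore_column_order and ignore_column_name cannot both be True"
        )
    if len(actual) != len(expected):
        return False
    if ignore_column_order:
        # Pair columns by name instead of by position.
        actual = sorted(actual, key=lambda f: f.name)
        expected = sorted(expected, key=lambda f: f.name)
    for a, e in zip(actual, expected):
        if not ignore_column_name and a.name != e.name:
            return False
        if a.data_type != e.data_type:
            return False
        if not ignore_nullable and a.nullable != e.nullable:
            return False
    return True


# Same columns, different order and different nullable settings.
s1 = [Field("id", "long", True), Field("name", "string", True)]
s2 = [Field("name", "string", False), Field("id", "long", False)]
# Same types in the same positions, different column names.
s3 = [Field("key", "long", True), Field("label", "string", True)]

by_position = schemas_match(s1, s2)                         # False
by_name = schemas_match(s1, s2, ignore_column_order=True)   # True
strict_nullable = schemas_match(
    s1, s2, ignore_column_order=True, ignore_nullable=False
)                                                           # False
names_ignored = schemas_match(s1, s3, ignore_column_name=True)  # True
```

With the defaults, only positional name and type mismatches cause a failure, which mirrors the documented default of `ignoreNullable=True` with the other two options set to False.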