Support read unstructured excel file #901
Open
I have confirmed that the current DataFrame.readExcel function only supports structured Excel files.
Ideally everyone would create Excel files in a structured format. But for unstructured files like the one shown in the image, the current DataFrame behavior of always treating the first row as the header makes the data hard to work with.
Python's pandas library supports reading such files (for example, read_excel with header=None), which makes them easy to work with.
To support unstructured Excel files, I've added a parameter called withDefaultHeader.
When it is set to true, headers are generated automatically instead of being read from the first row, and name repair operates as [NameRepairStrategy.MAKE_UNIQUE].
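To make the intended behavior concrete, here is a minimal, self-contained sketch in plain Kotlin. It is not the code from this PR and does not call the DataFrame library: the Table class, toTable, defaultHeader, the Excel-style letter names, and the numeric-suffix scheme in makeUnique are all assumptions chosen for illustration.

```kotlin
// Hypothetical sketch only: not the Kotlin DataFrame implementation.

// Illustrative stand-in for a parsed sheet (assumed type).
data class Table(val header: List<String>, val rows: List<List<Any?>>)

// Makes duplicate names unique by appending a numeric suffix.
// The exact suffix scheme is an assumption, standing in for MAKE_UNIQUE.
fun makeUnique(names: List<String>): List<String> {
    val used = mutableSetOf<String>()
    return names.map { name ->
        var candidate = name
        var suffix = 1
        // add() returns false when the name is already taken.
        while (!used.add(candidate)) {
            candidate = "$name$suffix"
            suffix++
        }
        candidate
    }
}

// Generates names "A", "B", ..., "Z", "AA", ... like Excel column letters.
// Which names the PR actually generates is not stated; this is an assumption.
fun defaultHeader(columnCount: Int): List<String> =
    (0 until columnCount).map { index ->
        var n = index
        val sb = StringBuilder()
        do {
            sb.insert(0, 'A' + n % 26)
            n = n / 26 - 1
        } while (n >= 0)
        sb.toString()
    }

// withDefaultHeader = true: every row is data and the header is generated.
// withDefaultHeader = false: the first row becomes the header (current behavior).
fun toTable(cells: List<List<Any?>>, withDefaultHeader: Boolean): Table {
    if (cells.isEmpty()) return Table(emptyList(), emptyList())
    return if (withDefaultHeader) {
        val width = cells.maxOf { it.size }
        Table(makeUnique(defaultHeader(width)), cells)
    } else {
        val header = cells.first().map { it?.toString() ?: "" }
        Table(makeUnique(header), cells.drop(1))
    }
}
```

With withDefaultHeader = true, a sheet whose first row is a title or a merged cell keeps that row as data, so no information is lost to a malformed header.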
Is there a better approach?