readExcel created <Comparable> column #150

LeandroC89 · 2022-08-14T21:10:51Z

Hello,
I found an annoying situation where, after readExcel, the schema is printed as:

F1: Comparable
F2: String
F3: String
F4: String
F5: Comparable
F6: String

This creates 3 kinds of issues:

1. If a filter is used, sometimes only one of the entries is retrieved even if there are 2 matching the given filter.
2. If an update or sort operation is used, this error occurs: class java.lang.Double cannot be cast to class java.lang.String (java.lang.Double and java.lang.String are in module java.base of loader 'bootstrap')
3. If convert { "F1"() }.to() is called, this error occurs: Can't find converter from kotlin.Comparable<*> to kotlin.String

For situation 1, I tried to update or convert the column to String, which is how I discovered situations 2 and 3.
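
The filter and sort failures can be reproduced without the library. The following is a minimal sketch in plain Kotlin, assuming a column whose cells are a mix of String and Double. The list `f1` and both helper functions are hypothetical stand-ins, not DataFrame API.

```kotlin
// Hypothetical stand-in for a column inferred as Comparable:
// the same spreadsheet column yields both String and Double cells.
val f1: List<Comparable<*>> = listOf("42", 42.0, "abc", 7.0)

// Equality is type-sensitive: the String "42" never equals the Double 42.0,
// so a filter on one representation silently skips the other.
fun filterEquals(cells: List<Comparable<*>>, key: Comparable<*>): List<Comparable<*>> =
    cells.filter { it == key }

// Sorting has to compare cells pairwise. Comparing a String with a Double
// (or the reverse) fails with ClassCastException at runtime.
@Suppress("UNCHECKED_CAST")
fun naiveSort(cells: List<Comparable<*>>): List<Comparable<*>> =
    cells.sortedWith(Comparator { a, b -> (a as Comparable<Any>).compareTo(b) })
```

With this data, `filterEquals(f1, "42")` returns only the String cell, and `naiveSort(f1)` throws instead of returning a sorted list.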

Thanks


koperagen · 2022-08-16T12:03:22Z

Hi! Would it be clearer what's going on if the schema were printed like this?

F1: Comparable (String & Double)
F2: String
F3: String
F4: String
F5: Comparable (String & Double)
F6: String

LeandroC89 · 2022-08-16T23:39:03Z

Hi,
my issue is not with the schema itself but with using Comparable as a type.

If you have 2 occurrences of the same value in the same column, Excel's filter detects both instances, but the DataFrame filter may return only 1. (I suppose this is because one is treated as a String and the other as a Double. Whatever the reason, it is a huge risk, as filter suddenly becomes unreliable.)

The same happens with the sort operation: instead of sorting, it throws an exception, since it can't compare a String with a Double.

If one tries to use convert to change types, either it isn't possible, or you have to define the column type as either String or Double, since Comparable is not a valid type. And then you get the same exception as before.

If one tries to convert to String, it fails. Likewise for the update function.

My workaround was to add a new column holding the .toString() of the Comparable-typed column and use it instead.

My point is: if Comparable is unreliable and the library clearly isn't prepared for it, wouldn't it be better to simply remove it? Make it so if a Text is found on a column of Number up until then, then the whole column would be type String.

I even made sure I had selected all populated Excel cells and formatted them as Text beforehand (not that Excel is any good at typing unless you go through each cell, double-click, and press Enter, but that would take forever).

Thank you.

koperagen · 2022-08-17T11:48:36Z

> Make it so if a Text is found on a column of Number up until then, then the whole column would be type String.

This sounds good, I think. If it means no loss of data (i.e. all those numbers can be converted back by something like it.toDoubleOrNull()), then we can probably do it.
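
A sketch of that fallback rule, in plain Kotlin and independent of the library: if any cell in an otherwise numeric column is text, every cell is rendered as a String, and the numeric cells can still be recovered with toDoubleOrNull(). Both function names are hypothetical.

```kotlin
// Hypothetical fallback: as soon as one cell is text, treat the whole column as String.
fun unifyColumn(cells: List<Any?>): List<Any?> =
    if (cells.any { it is String }) cells.map { it?.toString() } else cells

// Recover the numeric cells afterwards; text that is not a number becomes null.
fun recoverNumbers(cells: List<Any?>): List<Double?> =
    cells.map { (it as? String)?.toDoubleOrNull() }
```

One caveat of this approach: a text cell that happens to look like a number (for example "42") would also come back as a Double, so the values survive but the original cell type does not.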

But I would still like to see if we can improve the experience of handling situations where this weird type shows up in input data.

Because, in fact, you can tidy up this column like this:
df.convert { F1 }.with(Infer.Type) { (it as? Double)?.let { it.toString() } }
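
One detail worth checking in the lambda above: taken on its own as a Kotlin expression, `(it as? Double)?.let { it.toString() }` evaluates to null for any cell that is not a Double, so cells that are already String would be turned into null. The sketch below shows that per-cell behavior in plain Kotlin next to a variant that keeps String cells. Both function names are hypothetical, and the variant is only an assumption about the intent.

```kotlin
// The lambda body as posted, applied to a single cell.
fun asPosted(cell: Any?): String? = (cell as? Double)?.let { it.toString() }

// Variant that also keeps cells which are already String
// (an assumption about what the conversion is meant to do).
fun keepingStrings(cell: Any?): String? =
    (cell as? Double)?.toString() ?: (cell as? String)
```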

Another solution could be saving F1 as a ColumnGroup:
F1
    string: String
    double: Double

So you could write:
df.convert { F1 }.with { it.string ?: it.double?.toString() }
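
To make the shape of that proposal concrete, here is a plain-Kotlin sketch that models one F1 cell as a pair of nullable fields, mirroring the string/double layout above. `F1Cell`, `splitCell`, and `asText` are hypothetical names for illustration only; in the proposal the ColumnGroup would be produced by the library itself.

```kotlin
// Hypothetical model of one row of the proposed F1 group:
// exactly one of the two fields is expected to be non-null.
data class F1Cell(val string: String?, val double: Double?)

// Route a raw cell into the field that matches its runtime type.
fun splitCell(cell: Any?): F1Cell = when (cell) {
    is String -> F1Cell(string = cell, double = null)
    is Double -> F1Cell(string = null, double = cell)
    else -> F1Cell(string = null, double = null)
}

// Same expression as in the convert call above: prefer the text, else render the number.
fun F1Cell.asText(): String? = string ?: double?.toString()
```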

What do you think? My concern is that the first solution is probably easy to miss, and the second one can be confusing, because you suddenly get a ColumnGroup instead of a DataColumn. Maybe we should print the schema after read operations by default in notebooks and add some extra information, I don't know.

LeandroC89 · 2022-08-19T08:20:03Z

Hello,
and thank you for your explanation.

I'm having some difficulty getting inferType to work properly. I managed to get it working like this
(I wanted nulls as "" in this case):

.convert { "F1"<Any>() }
    .with(inferType = true) { it.toString() }

I used Any because Comparable is not a valid column type, whether used for String access or for column accessors.
(Using String would cause an issue with inferType.)

I still had to deal with the ".0" resulting from the Double-to-String conversion, but it's already something I can work with.
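
For the ".0" suffix, one option is to format whole-number doubles as integers before converting. This is a plain-Kotlin sketch, assuming cells are String, Double, or null and that null should become "". `cellToText` is a hypothetical helper, not part of the library.

```kotlin
import kotlin.math.abs

// Hypothetical per-cell formatter: null becomes "", whole-number doubles lose the ".0".
fun cellToText(cell: Any?): String = when (cell) {
    null -> ""
    // Only whole numbers in a range that Long represents exactly are shortened;
    // NaN, infinities, and fractional values fall through to Double.toString().
    is Double ->
        if (cell % 1.0 == 0.0 && abs(cell) < 1e15) cell.toLong().toString()
        else cell.toString()
    else -> cell.toString()
}
```

A helper like this could be called inside the convert lambda in place of it.toString().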

But as for the Comparable use case:

It can cause issues when using sort
Filters may become unreliable if data isn't properly converted beforehand
Comparable is not accepted as a column type, so one has to resort to one of the concrete column types or Any

Does it make sense to keep it as a possible column type when reading a file? (Just adding some remarks; I already have a workaround for my main issue thanks to your reply 👍)

Thank you!

Jolanrensen · 2023-10-09T14:36:22Z

@koperagen had other ideas for handling multiple types in one column, which I summarized here: #466.

koperagen · 2024-07-05T11:10:28Z

Found a good way to solve it after all: #745

zaleslaw added this to the 0.11.0 milestone Apr 25, 2023

zaleslaw added the question Further information is requested label Apr 25, 2023

zaleslaw self-assigned this Jun 12, 2023

zaleslaw modified the milestones: 0.11.0, 0.12.0, Backlog Jun 19, 2023

Jolanrensen self-assigned this Jun 22, 2023

Jolanrensen modified the milestones: Backlog, 0.12.0 Jun 22, 2023

zaleslaw removed their assignment Jun 23, 2023

Jolanrensen added the invalid This issue/PR doesn't seem right label Oct 9, 2023

Jolanrensen mentioned this issue Oct 9, 2023

Union type columns #466

Open

Jolanrensen mentioned this issue Oct 9, 2023

Improved rendering of types with arguments, like Comparable<*> #467

Merged

Jolanrensen modified the milestones: 0.12.0, Backlog Nov 7, 2023

zaleslaw removed the invalid This issue/PR doesn't seem right label Apr 8, 2024

koperagen modified the milestones: Backlog, 0.14.0 Jul 5, 2024

koperagen closed this as completed Jul 5, 2024
