readExcel created <Comparable> column #150
Hi! Would it help make clear what's going on if the schema were printed like this?
Hi, if the same value occurs twice in one column, Excel's filter detects both instances, but the DataFrame filter may return only one. (I suppose this is because one is read as a String and the other as a Double. Whatever the reason, it is a huge risk, as filtering suddenly becomes unreliable.)
The sort operation has the same problem: instead of sorting, it throws an exception because it can't compare a String with a Double.
If one tries to use convert to change the type, it either isn't possible, or you have to declare the column type as either String or Double, since Comparable is not a valid type, and then you get the same exception as before. Converting to String fails as well. The same goes for the update function.
My workaround was to add a new column containing the Comparable-typed column's .toString() value and to use that column instead.
My point is: if Comparable is unreliable and the library clearly isn't prepared for it, wouldn't it be better to simply remove it? If a text value is found in a column that has contained only numbers up to that point, the whole column could be typed as String.
I even made sure I had selected all populated Excel cells and set them to the Text format beforehand (not that Excel is any good at types unless you go through each cell, double-click it and press Enter, but that would take forever).
Thank you.
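The filter and sort failures described in the comment above can be reproduced without Excel. This is a sketch in plain Kotlin, not the DataFrame API: it assumes that a numeric cell arrives as a Double and a text cell as a String in the same column (`mixedCells`, `filterRaw`, `sortRaw` and `asText` are hypothetical names). It shows why an equality filter misses one occurrence, why a comparison-based sort throws, and why comparing the `toString()` values, as in the workaround, behaves.

```kotlin
// Hypothetical sample of one mixed column: "42" appears once as a text cell
// (String) and once as a numeric cell (Double), so the only common element
// type is Comparable<*>.
val mixedCells: List<Comparable<*>> = listOf("42", 42.0, "abc", 7.0)

// Equality filter: "42" == 42.0 is false, so only the String occurrence matches.
fun filterRaw(cells: List<Comparable<*>>, target: String): List<Comparable<*>> =
    cells.filter { it == target }

// Sorting by the raw values makes String.compareTo / Double.compareTo receive
// an argument of the other type, which fails with a ClassCastException.
@Suppress("UNCHECKED_CAST")
fun sortRaw(cells: List<Comparable<*>>): List<Comparable<*>> =
    cells.sortedWith(Comparator<Comparable<*>> { a, b -> (a as Comparable<Any>).compareTo(b) })

// Workaround from the comment: compare the toString() values instead.
// Every element is now a String, so sorting is well defined (lexicographic).
fun asText(cells: List<Comparable<*>>): List<String> =
    cells.map { it.toString() }
```

Note that the text version still keeps the two "42" cells distinct ("42" vs "42.0"), so the workaround fixes sorting but not, by itself, the duplicate-value filter.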
This sounds good, I think. If it means no loss of data (i.e. all those numbers can be converted back with something like it.toDoubleOrNull()), then we can probably do it. But I would still like to see whether we can improve the experience of handling situations where this weird type shows up in the input data, because you can, in fact, tidy up this column like this:
Another solution could be saving F1 as a ColumnGroup:
So you could:
What do you think? My concern is that the first solution is probably easy to miss, and the second one can be confusing, because you suddenly get a ColumnGroup instead of a DataColumn. Maybe we should print the schema after read operations by default in notebooks and add some extra information, not sure.
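Both ideas in this reply can be sketched in plain Kotlin, again outside the DataFrame API. `roundTrips` checks the "no loss of data" condition: every numeric cell survives conversion to text and back via `toDoubleOrNull()`. `SplitColumn` and `splitByType` are hypothetical names for a ColumnGroup-like layout with one nullable sub-column per underlying type, aligned row by row; how an actual ColumnGroup would name or fill its sub-columns is an assumption here.

```kotlin
// "No loss of data" check: converting each numeric cell to text and parsing
// it back with toDoubleOrNull() must give the same value.
fun roundTrips(cells: List<Any?>): Boolean =
    cells.filterIsInstance<Double>().all { it.toString().toDoubleOrNull() == it }

// Hypothetical stand-in for the ColumnGroup idea: one nullable sub-column per
// underlying type. Row i holds its value in exactly one of them, or in
// neither for an empty cell.
data class SplitColumn(val number: List<Double?>, val text: List<String?>)

fun splitByType(cells: List<Any?>): SplitColumn =
    SplitColumn(
        number = cells.map { it as? Double },
        text = cells.map { it as? String },
    )
```

The split keeps every value in its original type, which avoids any conversion, at the cost of the reader having to know which sub-column to look in.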
Hello, I'm having some difficulty getting inferType to work properly. I managed to get it working like this:
    .convert { "F1"<Any>() }
        .with(inferType = true) { it.toString() }
I used Any because Comparable is not a valid column type, whether used for String access or for column accessors.
I still had to deal with the ".0" resulting from the Double-to-String conversion, but it's already something I can work with.
But as for the Comparable use case: does it make sense to keep it as a possible column type when reading a file?
(Just adding some remarks; I already have a workaround for my main issue thanks to your reply 👍)
Thank you!
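For the ".0" suffix mentioned above, one option is a formatter that prints whole-number Doubles without a fractional part. `cellToText` is a hypothetical helper, not part of DataFrame; presumably it could be passed in place of `it.toString()` in the `convert ... with` call above, though that combination is untested here. It goes through `BigDecimal` rather than `toLong()` so that very large values are not clamped to the Long range.

```kotlin
// Hypothetical helper: render a cell as text, dropping the ".0" that
// Double.toString() adds to whole numbers (42.0 -> "42", 1.5 -> "1.5").
fun cellToText(value: Any?): String? = when (value) {
    null -> null
    // BigDecimal avoids the range loss of toLong(); stripTrailingZeros() plus
    // toPlainString() yields "100" rather than "1E+2". NaN and infinities have
    // no BigDecimal form, so they keep their default text.
    is Double ->
        if (value.isNaN() || value.isInfinite()) value.toString()
        else value.toBigDecimal().stripTrailingZeros().toPlainString()
    else -> value.toString()
}
```

With this, a "42" text cell and a 42.0 numeric cell both become "42", so an equality filter on the resulting text column finds both occurrences.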
@koperagen had other ideas for handling multiple types in one column, which I summarized here: #466.
Found a good way to solve it after all: #745
Hello,
I found an annoying situation where a schema is printed as:
This creates 3 kinds of issues:
For situation 1, I tried to update or convert the column to a String, which is how I discovered situations 2 and 3.
Thanks