You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, the reader functions use a default "ParseFixer" by default. This means that errors are swallowed / autocorrected until some threshold number of errors is reached. This is a problem for usability; and it may hide unexpected bugs (at least one discovered so far #72 ).
This Issue requests the following, in descending order of priority:
turn off ParseFixer by default
probably redesign the whole ParseFixer architecture, currently quite convoluted
easily allow user to specify their own fixer policy
The default ParseFixer makes debugging difficult; during debugging we usually want errors to be raised immediately. There is a way to kind of achieve this by monkeypatching a _called_from_test attribute onto the ParseFixer, but that's much too convoluted.
There should be an easy way to run read_csv() and read_excel() without any ParseFixer. Right now read_csv(..., , fixer=None) doesn't do that, because "None" gets interpreted as, use the default ParseFixer.
Principle of least surprise: the default parse fixer is quite aggressive. It will more or less silently replace omitted values by "defaults", fixer second-guesses what the user "really" intended to write; but the user may not have wanted these values and they could do damage downstream.
Possible use cases for a ParseFixer
I can kind of see the following use cases, with my comments:
Ergonomics: Don't crash on the first tiny error, but instead collect all errors as they are encoutered, and crash at the end, reporting all errors at once. So if there are 10 errors, don't have to run the reader 10 times to reveal them all. <<< this is an option I think makes sense to have... but that's not what the current default fixer does.
Fix errors as they are encountered, according to a user-specified policy. <<< This is not something that should be done by default like it is now. But it would be okay to give the user the option to do so, on their own terms, if they explicitly specify this.
Report all encountered errors, for user checking in an external tool, e.g. the StarTable Editor that @BennyLassen proposed. <<< This is a good idea. But again, it should not be the default that we force-feed to all pdtable users. At best make it an option to be specified explicitly.
The text was updated successfully, but these errors were encountered:
Think we are in agreement here. My take would be that the only purpose would be to allow a channel for reporting parser errors. Since we already have the block stream, the simplest way of doing so would probably be to emit Error blocks. That way, users would even be free to implement "fixers" on top.
Currently, the reader functions use a default "ParseFixer" by default. This means that errors are swallowed / autocorrected until some threshold number of errors is reached. This is a problem for usability; and it may hide unexpected bugs (at least one discovered so far #72 ).
This Issue requests the following, in descending order of priority:
Rationale: multiple reasons:
I suspect there could be more hidden bugs in the way the default fixer "fixes" "errors". Quotation marks here, because sometimes the fixes actually make things worse and creates errors by fixing things that weren't errors in the first place (cf. Reading table blocks fails when there is a comment to the right of the column name row #72 ).
The default ParseFixer makes debugging difficult; during debugging we usually want errors to be raised immediately. There is a way to kind of achieve this by monkeypatching a
_called_from_test
attribute onto the ParseFixer, but that's much too convoluted.There should be an easy way to run
read_csv()
andread_excel()
without any ParseFixer. Right nowread_csv(..., , fixer=None)
doesn't do that, because "None" gets interpreted as, use the default ParseFixer.Principle of least surprise: the default parse fixer is quite aggressive. It will more or less silently replace omitted values by "defaults", fixer second-guesses what the user "really" intended to write; but the user may not have wanted these values and they could do damage downstream.
Possible use cases for a ParseFixer
I can kind of see the following use cases, with my comments:
Ergonomics: Don't crash on the first tiny error, but instead collect all errors as they are encoutered, and crash at the end, reporting all errors at once. So if there are 10 errors, don't have to run the reader 10 times to reveal them all. <<< this is an option I think makes sense to have... but that's not what the current default fixer does.
Fix errors as they are encountered, according to a user-specified policy. <<< This is not something that should be done by default like it is now. But it would be okay to give the user the option to do so, on their own terms, if they explicitly specify this.
Report all encountered errors, for user checking in an external tool, e.g. the StarTable Editor that @BennyLassen proposed. <<< This is a good idea. But again, it should not be the default that we force-feed to all pdtable users. At best make it an option to be specified explicitly.
The text was updated successfully, but these errors were encountered: