Allow for escaping the delimiter like the python counterpart (Write) #234
Forcing checking for escaping is very slow. We're using |
I assumed performance would take a hit, but not by that much; thanks for commenting. I totally agree with you that it's undesirable. As I mentioned, it would have to be a new option then, since escape is not an `Option`. I've been pondering this a little, but I'd like to hear from @BurntSushi first on what the philosophies are, and whether this feature would even be considered in Writer. All I know is that it exists in Python's standard library, and I personally prefer this CSV style to all others. You save a few bytes in storage, reading and writing, and it's simple and predictable (just escape delimiter + escape char). It's also (for me) more readable raw.
So basically, I don't have time to really dig into this right now, but I'll say these things based on skimming the comments above:
One question I personally have is whether or where this particular dialect of CSV is used. @leaty Have you seen it used in the wild? Or is this just something you happen to personally use?
That's absolutely fine, no need to rush. It was more a "should I elaborate for upstream or not" moment since what I've done already works for my immediate purposes. As long as it's beneficial to the library as a whole I'll fix it up and await the review, days or months from now doesn't really matter. Answer in your own time.
I am of course biased, and I can't speak for other companies or organizations, but my company has used this style in all CSV data paths for years and I've personally become a strong advocate for it. Disregarding the benefit of it generally resulting in fewer bytes, it's also useful in some minor cases where you have no access to any good CSV libraries, or if you're reading it raw in a shell for some obscure debugging reason, since you can literally just temporarily replace
My thoughts are one of the following:
This is a proof of concept for escaping the delimiter in Writer. However, it overrides the default behavior of `QuoteStyle::Never`, i.e. it will escape rather than produce potentially broken CSV.
One con of this: if you force `Never` because you know there is nothing to escape by quoting, it will still check whether it has to escape something and, if so, escape using the escape char. That's naturally going to be slower, although probably negligible.
Another con is that the escape char will always be escaped when it's on `Never`. This breaks reading it if you don't enable escape in Reader after #233. I would likely need to add a new option to avoid these cons because escape is not an `Option` in Writer.
Basically, any delimiter/escape char is prepended with the escape char. E.g.
Examples below; note though that `\\` in `StringRecord` is just escaped output for `\`:
Writing:
Reading with #233:
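For readers who want to see the rule itself in isolation (prepend the escape char to any delimiter or escape char, and take the char after an escape literally when reading), here is a rough standalone sketch. It is not the code from this PR and does not use the csv crate's API: `escape_field`, `write_record` and `split_record` are hypothetical helpers, `,` and `\` are assumed as delimiter and escape char, and quoting and embedded newlines are ignored entirely.

```rust
/// Hypothetical helper: prepend `escape` to every delimiter or escape char.
fn escape_field(field: &str, delimiter: char, escape: char) -> String {
    let mut out = String::with_capacity(field.len());
    for c in field.chars() {
        if c == delimiter || c == escape {
            out.push(escape);
        }
        out.push(c);
    }
    out
}

/// Hypothetical helper: escape each field and join with the delimiter.
fn write_record(fields: &[&str], delimiter: char, escape: char) -> String {
    fields
        .iter()
        .map(|f| escape_field(f, delimiter, escape))
        .collect::<Vec<_>>()
        .join(&delimiter.to_string())
}

/// Hypothetical helper: split one line on unescaped delimiters and undo
/// the escaping. Assumes a single record with no quoting or newlines.
fn split_record(line: &str, delimiter: char, escape: char) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut chars = line.chars();
    while let Some(c) = chars.next() {
        if c == escape {
            // The char following an escape is taken literally.
            if let Some(next) = chars.next() {
                current.push(next);
            }
        } else if c == delimiter {
            fields.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }
    fields.push(current);
    fields
}

fn main() {
    let line = write_record(&["a,b", r"c\d", "plain"], ',', '\\');
    println!("{}", line); // prints: a\,b,c\\d,plain

    let fields = split_record(&line, ',', '\\');
    // Debug output shows a single backslash as `\\`.
    println!("{:?}", fields); // prints: ["a,b", "c\\d", "plain"]
}
```

The per-char check in `escape_field` is also where the performance concern raised in the comments comes from: every field has to be scanned even when nothing needs escaping.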