Readers? #91

Open
marcusklaas opened this issue Oct 27, 2015 · 10 comments

Comments

@marcusklaas

It would be convenient to have an object that implements Read, so one could, for example, easily and efficiently read from a file in an encoding other than UTF-8.

@SimonSapin
Collaborator

It sounds like you want something that implements not std::io::Read (which is a stream of bytes) but another trait for a Unicode stream. But as discussed in this RFC: rust-lang/rfcs#57, doing it for reading is tricky. Read::read takes a &mut [u8] argument, writes to it, and returns the number of bytes written. Doing the same with &mut str might require zeroing the buffer first, or something similar, because the contents of a str must always be well-formed UTF-8.
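To make the well-formedness problem concrete, here is a minimal sketch using only the standard library. It shows that a chunk of bytes which ends in the middle of a multi-byte character is perfectly fine as a &[u8] but cannot be viewed as a str until the rest arrives. The helper name `valid_prefix` is made up for illustration and is not part of any crate discussed here.

```rust
use std::str;

/// Splits `chunk` into its longest valid UTF-8 prefix and the leftover bytes.
/// The leftover is either an incomplete character (more input is needed) or
/// invalid data; `Utf8Error::error_len` can tell the two cases apart.
fn valid_prefix(chunk: &[u8]) -> (&str, &[u8]) {
    match str::from_utf8(chunk) {
        // Whole chunk is valid: the leftover is the empty tail.
        Ok(s) => (s, &chunk[chunk.len()..]),
        Err(e) => {
            let (good, rest) = chunk.split_at(e.valid_up_to());
            // `good` was just validated by `from_utf8`, so this cannot fail.
            (str::from_utf8(good).unwrap(), rest)
        }
    }
}

fn main() {
    // 'é' encodes as the two bytes 0xC3 0xA9 in UTF-8.
    let bytes = "café".as_bytes();

    // Pretend a byte-oriented reader handed back only the first four bytes.
    let (text, leftover) = valid_prefix(&bytes[..4]);
    assert_eq!(text, "caf");
    assert_eq!(leftover, &[0xC3u8][..]);
}
```

A byte-oriented reader can simply return the four bytes. A str-oriented one would have to hold back the dangling 0xC3 somewhere until the next read.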

I’m experimenting with things that could help here. I’ll post again when there’s something more fully formed to show.

@marcusklaas
Author

Sorry for my vague description. I meant some kind of adapter between a stream of bytes in, for example, Windows-1252 and a stream of bytes in UTF-8. The Unicode stream would be very nice, but there's a lot of code that already works with std::io::Read.
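A rough sketch of what such an adapter could look like, using only the standard library. To keep it short it assumes ISO-8859-1 rather than Windows-1252, because in ISO-8859-1 every byte value equals its Unicode code point; Windows-1252 differs in the 0x80 to 0x9F range and would need a lookup table there. The type name `Latin1ToUtf8` is hypothetical and is not an API of this crate.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical adapter: reads ISO-8859-1 bytes from `inner` and yields the
/// same text as UTF-8 bytes through `Read`.
struct Latin1ToUtf8<R> {
    inner: R,
    /// UTF-8 bytes already transcoded but not yet handed to the caller.
    pending: VecDeque<u8>,
}

impl<R: Read> Latin1ToUtf8<R> {
    fn new(inner: R) -> Self {
        Latin1ToUtf8 { inner, pending: VecDeque::new() }
    }
}

impl<R: Read> Read for Latin1ToUtf8<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        if self.pending.is_empty() {
            // Refill: pull one chunk of raw bytes and transcode all of it.
            let mut raw = [0u8; 1024];
            let n = self.inner.read(&mut raw)?;
            for &b in &raw[..n] {
                // In ISO-8859-1 the byte value is the code point.
                let mut utf8 = [0u8; 4];
                self.pending
                    .extend(char::from(b).encode_utf8(&mut utf8).bytes());
            }
        }
        // Hand over as much as fits; the remainder waits for the next call.
        // If the inner reader was at EOF, `pending` is empty and this is 0.
        let n = buf.len().min(self.pending.len());
        for (dst, src) in buf.iter_mut().zip(self.pending.drain(..n)) {
            *dst = src;
        }
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    // "café" in ISO-8859-1: 'é' is the single byte 0xE9.
    let latin1: &[u8] = b"caf\xE9";
    let mut text = String::new();
    Latin1ToUtf8::new(latin1).read_to_string(&mut text)?;
    assert_eq!(text, "café");
    Ok(())
}
```

Because the output side is still plain std::io::Read, existing code that expects UTF-8 bytes from a reader can consume it unchanged. A real implementation would plug a proper decoder in where the `char::from(b)` line is.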

@SimonSapin
Collaborator

That sounds like it could be built on top of "raw" decoders.

@SimonSapin
Collaborator

… probably with an impl of encoding::types::StringWriter for &mut [u8], to be used with the argument to Read::read.

@bbigras
Contributor

bbigras commented Aug 13, 2016

Any progress?
Anything changed since last time that would make it easier?

@mitsuhiko

I just came across the same issue myself. Would this be something that is in the scope of the crate?

@BurntSushi

I have to write these impls for a project of mine and would also like to hear whether @lifthrasiir thinks they might be in scope for this crate.

I've also started a conversation on the encoding_rs crate: hsivonen/encoding_rs#8

@BurntSushi

BurntSushi commented Mar 13, 2017

To cross-pollinate a bit here from the encoding_rs crate... @SimonSapin and I worked on our own versions of Read trait implementations (except @SimonSapin did quite a bit more!). @SimonSapin's work is in this PR: hsivonen/encoding_rs#9. My work is here: https://github.com/BurntSushi/ripgrep/blob/75f1855a91ca00b5d0e62740595b1b91bc5142a2/src/decoder.rs

The big idea here is that implementing these traits is quite tricky, and neither of our implementations is fully correct. Mine gets most of the way there, but doesn't support single-byte reads, which means the Read::bytes adapter method doesn't work at all. It's possible to make this work, but it requires a bit more book-keeping.
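For anyone wondering why single-byte reads matter so much, the sketch below logs the buffer size of every read call made while draining a reader through Read::bytes. With the standard library's generic implementation, each call asks for exactly one byte, so an adapter that needs room for a whole multi-byte character in the output buffer never gets it. `SizeLog` and `sizes_requested_by_bytes` are names made up for this illustration.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that records the buffer size of every `read` call.
struct SizeLog<R> {
    inner: R,
    sizes: Vec<usize>,
}

impl<R: Read> Read for SizeLog<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.sizes.push(buf.len());
        self.inner.read(buf)
    }
}

/// Returns the buffer sizes requested while draining `data` via `Read::bytes`.
fn sizes_requested_by_bytes(data: &[u8]) -> Vec<usize> {
    let mut log = SizeLog { inner: data, sizes: Vec::new() };
    // `by_ref` borrows the reader so `log` can be inspected afterwards.
    for byte in log.by_ref().bytes() {
        byte.expect("reading from a slice cannot fail");
    }
    log.sizes
}

fn main() {
    let sizes = sizes_requested_by_bytes(b"abc");
    // Every call offered a one-byte buffer.
    assert!(sizes.iter().all(|&n| n == 1));
    println!("{:?}", sizes);
}
```

The extra book-keeping amounts to keeping already-decoded output bytes inside the adapter between calls, so that a character can be handed out one byte at a time.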

@mitsuhiko

I wonder if the traits are misdesigned for non-UTF-8 usage. It's weird that they work with both strings and bytes.

@BurntSushi

In my case, I very much wanted to avoid ever materializing a &str and the costs associated with it. So operating on &[u8] is perfect.


5 participants