Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide buffer parser i.e. runParserPtr :: ParserIO e a -> Ptr Word8 -> Int -> _ #57

Open
raehik opened this issue Mar 25, 2024 · 11 comments

Comments

@raehik
Copy link
Collaborator

raehik commented Mar 25, 2024

Brief preamble. I'm using flatparse for some binary file parsing. I'm parsing a filetable, where I know the precise length. I'm operating on Ptrs because I want the core to be source-agnostic (whether working on bytestrings directly, or file handles etc.). I could copy the bytes into a bytestring and parse, but I figure, why don't I parse an address directly? To my (limited) knowledge, it would seem sensible, assuming the lifetime is handled externally (e.g. using withForeignPtr).

Assuming good intentions (no lying about pointers or their length), would a runParserPtr :: ParserIO e a -> Ptr Word8 -> Int -> _ be useful and safe? With lots of warnings on pitfalls of course.

I can provide some example code if useful.

@raehik
Copy link
Collaborator Author

raehik commented Mar 25, 2024

Something I note is that since flatparse carries around a ForeignPtrContents in its internal state, we have to come up with one in such cases. FinalPtr :: ForeignPtrContents exists and seems to be intended for such cases (though I've not seen or used it before).

@raehik
Copy link
Collaborator Author

raehik commented Apr 7, 2024

The problem I've come against here is that Result e a has OK a B.ByteString. runParserPtr can't return a ByteString because it doesn't have "memory rights" to the bytes it's parsing. We can only return an Int indicating the number of bytes left unparsed.

On a related note, what is the reason for ferrying a ForeignPtrContents along in the parser type? Flicking through code, I don't see it used in parser combinators. Can we not use something like withForeignPtr in the parser runner to keep the inner Ptr/Addr# in scope? Then I could imagine adapting Result to enable returning an Int in these such cases.

Edit: I see that the ForeignPtrContents is used in some combinators to enable non-copying bytestring parsing e.g. byteStringOf. So any lying to GHC (like attaching a FinalPtr empty finalizer to an arbitrary Ptr) would be very unsafe. I'll just copy my Ptr Word8 to a fresh ByteString before parsing.

I might try writing a buffer parser parallel to flatparse but without carrying around the ForeignPtrContents.

@AndrasKovacs
Copy link
Owner

IIRC I added ForeignPtrContents specifically to enable ByteString creation. I found it convenient and observed that it's reliably eliminated by GHC when it's not used.

@raehik
Copy link
Collaborator Author

raehik commented Apr 8, 2024

As I understand it's convenient because all the ByteStrings we generate where we use the ForeignPtrContents are just references to parts of our input, rather than being copied. I think that's usually OK (though could be a source of memory leakage?). But in the case where we're parsing a Ptr Word8, where all we know is 1) it's in scope and 2) it's readable, we can't do that as we don't have any control over the input's lifecycle. (I'm new to low-level Haskell and I might be wrong about this.)

It's not a complaint with the library since we advertise it as a ByteString parser, not a buffer one. But it's an interesting little hidden detail in the library design.

@AndrasKovacs
Copy link
Owner

AndrasKovacs commented Apr 8, 2024

How can it happen that we have a Ptr and don't know anything about its provenance? I would say that any such situation is a bug. And if we know where the Ptr comes from we can create a finalizer for it.

@raehik
Copy link
Collaborator Author

raehik commented Apr 8, 2024

What about parsing from a buffer provided by allocaBytes :: Int -> (Ptr a -> IO b) -> IO b? I would be happy to use FinalPtr for the finalizer. The problem is that some of our combinators use the finalizer to create "free" bytestrings (without copying). If I understand correctly, that wouldn't work in this case.

@AndrasKovacs
Copy link
Owner

AndrasKovacs commented Apr 8, 2024

It would work with as much safety guarantee as allocaBytes in general, where we shouldn't let the Ptr escape, and there's no runtime or compile time check to enforce this. I can imagine situations where it's convenient to pack FinalPtr in bytestrings and use them while we're still under the alloca. Users who are parsing from alloca-d Ptr already do manual memory management, so why not let them manually scope bytestrings.

That said, if we add runParserPtr, the docs for the bytestring operations should mention the lifetime issues and there should be also copying variants of bytestring creation for convenience.

@raehik
Copy link
Collaborator Author

raehik commented Apr 8, 2024

I can imagine situations where it's convenient to pack FinalPtr in bytestrings and use them while we're still under the alloca.

You're right, now I understand that I'm very happy with the internal parser design. Thanks for your patience and explanations :)
Could I start a PR adding runParserPtr? Should it take a ForeignPtrContents, or should we always fill in with FinalPtr?

@AndrasKovacs
Copy link
Owner

AndrasKovacs commented Apr 8, 2024

Btw, is there a standard function that turns Ptr into an "unsafe" bytestring with FinalPtr? If there's such thing I don't necessarily see need for runParserPtr.

@raehik
Copy link
Collaborator Author

raehik commented Apr 8, 2024

I can't find any Ptr a -> Int -> ByteString on Hoogle. I realize now I could pack a fake bytestring without much work. But it feels like a convenient place to document lifetime issues for anyone trying to do similar things. And I like to avoid importing Data.ByteString.Internal in my code where possible.

@AndrasKovacs
Copy link
Owner

I guess we can just go with runParserPtr then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants