Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary protocol [design] #24

Open
bergey opened this issue Jul 5, 2019 · 9 comments
Open

binary protocol [design] #24

bergey opened this issue Jul 5, 2019 · 9 comments

Comments

@bergey
Copy link
Contributor

bergey commented Jul 5, 2019

Postgres (and libpq) support both text & binary formats in the wire protocol. postgresql-simple currently only uses the text format. Would you be interested in adding support for the binary format? If so, I'll post some design ideas here.

I think we can upgrade many queries to the binary protocol without changes to downstream code, and fall back to the text when we don't have a binary decoder for some custom field type.

@phadej
Copy link
Collaborator

phadej commented Jul 5, 2019

If there's good upgrade story, I'm interested. Otherwise there is https://hackage.haskell.org/package/postgresql-binary (which I never used)

@bergey
Copy link
Contributor Author

bergey commented Jul 6, 2019

I also use postgresql-binary, by way of hasql. I have some ideas to contribute code there, also. =)

In the design below, my goal is that most application authors can pick up the speed improvements without changing their code. I think we should also provide some interface that gives more control, but I think most programmers won't want to think about encoding issues.

  • queries should use binary if possible
  • queries should fall back to text if any of the columns don't have binary decoders
  • downstream instances of type classes should keep compiling

Key changes to get there:

  class FromField a where
      fromField :: FieldParser a
      fromFieldBinary :: Maybe (FieldParser a)
      fromFieldBinary = Nothing
 
  data RowParser a = RowParser
      { textRowParser :: ReaderT Row (StateT PQ.Column Conversion) a
      , binaryRowParser :: Maybe (ReaderT Row (StateT PQ.Column Conversion) a)
      } deriving Functor

  instance Applicative RowParser where
      pure a = RowParser (pure a) (Just (pure a))
      f <*> (RowParser text binary) = RowParser (f <*> text) (fmap (f <*>) binary)

  field :: FromField a => FromRow a
  field = RowParser (fieldWith fromField) (fieldWith <$> fromFieldBinary)

The binary decoder for any given field is optional. Downstream instances of FromField keep compiling, with the default Nothing. FromRow does not change, but the RowParser type carries both ways of parsing. This way the Applicative combines both parsers at the same time, so typical FromRow instances like (,,) <$> field <*> field <*> field still work.

What do you think? Worth turning into a PR that compiles?

@phadej
Copy link
Collaborator

phadej commented Jul 6, 2019

Sorry, for asking dumb questions:

  • So for each query we can select whether the response will use binary protocol or not?
  • binary protocol isn't used in queries itself (i.e. no need to change ToField)
    (I wonder if there's some benefit in executeMany like use cases)
    • I'm ok with leaving that out, I just want to know if that's possible

That change doesn't look too intrusive.

Note: there's MaybeT which make working with RowParser simpler. Maybe :)

I'd happy to see how it would look like, if it's doable in few hours. Feel free to implement only FromField Int and FromField Text, that should be enough to try it almost for real.


Thanks for contributing!

@phadej
Copy link
Collaborator

phadej commented Jul 6, 2019

I'm kind of worried, that maybe users want rather be sure that the binary protocol is used,
so we'd rather have FromBinaryField and FromBinaryRecord classes duplicating things,
but OTOH users will know which works, and be able to fallback to non-binary option...

Yet again, I wonder how something like beam would work. In FromField they
might get some benefit here and there for free, but then no benefit.

So maybe the final design would have both Maybe field in FromField and still have FromBinaryField.

Something to think about.

Real use cases would help motivate. I think that guaranteed case would be important for some (many). My DB usage is usually very small data-wise, so I don't care that much.

EDIT: e.g. in aeson there's toJSON and toEncoding, and one cannot be sure at which point something intermediate forces everything beyond go though toJSON... a compromise...

@bergey
Copy link
Contributor Author

bergey commented Jul 6, 2019

Good questions; I just didn't want my earlier comment to get too long.

libpq lets us decide for each query, whether to use binary or text encoding. Every field in the row needs to use the same choice. I imagine we'll want 3 variants (not serious suggestions for names):

  • queryText always uses text encoding (might help for debugging)
  • queryBinary always uses binary encoding (runtime error / log if it can't?)
  • queryBest uses binary if available, text otherwise

libpq also supports binary encoding for values sent to the database. I think we could make parallel changes in ToField. I'd like to do this someday, but my sense is it won't be as valuable. I rarely insert many rows at once in a latency-sensitive program, but I read many rows at once all the time.

I'm sure you're right that some users will really want to know that they're getting the fast path. I'm not sure how to balance that with the users who won't think to read the release notes, but should get the fast path anyway.

I'd be fine having both the Maybe in FromField and FromBinaryField. It's potentially confusing when writing instances, but probably not very many of us do that. In that case, I wonder if it's better to change the default implementation to fromFieldBinary :: FromBinaryField a => Maybe (RowParser a). I have no idea how many FromField instances there are outside postgresql-simple.

Thanks for the feedback!

@phadej
Copy link
Collaborator

phadej commented Jul 6, 2019

runtime error / log if it can't?)

That's one problem. I'd prefer compile-time error.


There are at least FromField custom instances for custom newtype wrappers (e.g. over Text, UTCTime, Int...) and CREATE ENUM types.

@chshersh
Copy link

chshersh commented Sep 19, 2019

I found this issue, and I wanted to share that there's postgres-wire project which implements efficient native binary PostgreSQL protocol:

It's not finished yet but could be a decent alternative. You can find more context about the status of the project under the following issue:

@jessekempf
Copy link

Hi @phadej:

A few days ago I started working on a proof of concept of binary protocol support after tracking down some performance problems with a Persistent-based personal project of mine. I've taken a diametrically opposite approach, altering the core types like so:

data TextFormat
data BinaryFormat

type FieldParser fmt a = Field -> Maybe ByteString -> Conversion a

class FromField fmt a where
    fromField :: FieldParser fmt a

class FromRow fmt a where
    fromRow :: RowParser fmt a
    default fromRow :: (Generic a, GFromRow fmt (Rep a)) => RowParser fmt a
    fromRow = to <$> gfromRow

query_ :: (FromRow TextFormat r) => Connection -> Query -> IO [r]
query_ = queryWith_ fromRow

queryBinary_ :: (FromRow BinaryFormat r) => Connection -> Query -> IO [r]
queryBinary_ = queryBinaryWith_ fromRow

Some parts of the existing codebase are format-agnostic, but the concrete FromField a decoders are now all FromField TextFormat a.

I've got support for numeric types working and the test suite passing.

There's an unavoidable semantic change between using PostgreSQL simple query and PostgreSQL extended query. Extended Query is needed in order to tell the server whether to respond using textual or binary formats, but it only accepts one command at a time. A string like BEGIN; SELECT * FROM myTable; COMMIT is a parse error under extended query rules.

There's a second unavoidable semantic change between using text and binary formats, which is that the text format is guaranteed to be stable in a way that the binary format is not -- the integer_datetimes compile-time switch changes the internal storage format of dates and times, but also changes the binary format from floating point to integer.

At least for a viable proof of concept, I'd suggest bifurcating the API is preferable.

The question I've got is, is this a worthwhile approach to pursue?

@phadej
Copy link
Collaborator

phadej commented Aug 6, 2020

My best understanding is that having someone develop a separate postgresql-simple-binary package is easier to get right, than to try to leave with both.

When there is that, one can try to share common parts, but I don't think that postgresql-simple is a place for iterative experimentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants