Skip to content

meooow25/parser-regex

Repository files navigation

parser-regex

Hackage Haskell-CI

Regex based parsers

Features

  • Parsers based on regular expressions, capable of parsing regular languages. There are no extra features that would make parsing non-regular languages possible.
  • Regexes are composed using combinators.
  • Resumable parsing of sequences of any type containing values of any type.
  • Special support for Text and String in the form of convenient combinators and operations like find and replace.
  • Parsing runtime is linear in the length of the sequence being parsed. No exponential backtracking.

Example

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional)
import Data.Text (Text)

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

data URI = URI
  { scheme    :: Maybe Text
  , authority :: Maybe Text
  , path      :: Text
  , query     :: Maybe Text
  , fragment  :: Maybe Text
  } deriving Show

-- ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
-- A non-validating regex to extract parts of a URI, from RFC 3986
-- Translated:
uriRE :: REText URI
uriRE = URI
  <$> optional (R.someTextOf (CS.not ":/?#") <* R.char ':')
  <*> optional (R.text "//" *> R.manyTextOf (CS.not "/?#"))
  <*> R.manyTextOf (CS.not "?#")
  <*> optional (R.char '?' *> R.manyTextOf (CS.not "#"))
  <*> optional (R.char '#' *> R.manyText)
>>> R.reParse uriRE "https://github.com/meooow25/parser-regex?tab=readme-ov-file#parser-regex"
Just (URI { scheme = Just "https"
          , authority = Just "github.com"
          , path = "/meooow25/parser-regex"
          , query = Just "tab=readme-ov-file"
          , fragment = Just "parser-regex" })

Documentation

Please find the documentation on Hackage: parser-regex

Already familiar with regex patterns? See the Regex pattern cheat sheet.

Alternatives

regex-applicative

regex-applicative is the primary inspiration for this library, and provides a similar set of features. parser-regex attempts to be a more fully-featured library built on the ideas of regex-applicative.

Traditional regex libraries

Other alternatives are more traditional regex libraries that use regex patterns, like regex-tdfa and regex-pcre/ regex-pcre-builtin.

Reasons to use parser-regex over traditional regex libraries:

  • You prefer parser combinators over regex patterns
  • You need more powerful parsing capabilities than just submatch extraction
  • You need to parse a sequence type that is not supported by these regex libraries

Reasons to use traditional regex libraries over parser-regex:

  • The terseness of regex patterns is better suited for your use case
  • You need something very fast, and adversarial input is not a concern. Use regex-pcre/regex-pcre-builtin.

For a more detailed comparison of regex libraries, see here.

Contributing

Questions, bug reports, documentation improvements, code contributions welcome! Please open an issue as the first step.