Streaming parser #205

Open: wants to merge 6 commits into base: master
48 changes: 47 additions & 1 deletion yaml/src/Data/Yaml/Internal.hs
@@ -9,7 +9,9 @@ module Data.Yaml.Internal
    , prettyPrintParseException
    , Warning(..)
    , parse
    , parseStream
    , Parse
    , ParseState (..)
    , decodeHelper
    , decodeHelper_
    , decodeAllHelper
@@ -52,7 +54,7 @@ import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Builder.Scientific (scientificBuilder)
import Data.Char (toUpper, ord)
import Data.List
-import Data.Conduit ((.|), ConduitM, runConduit)
+import Data.Conduit ((.|), ConduitM, runConduit, yield)
import qualified Data.Conduit.List as CL
import qualified Data.HashSet as HashSet
import Data.Map (Map)
@@ -217,6 +219,50 @@ parseAll = do
            _ -> missed documentStart
    missed event = liftIO $ throwIO $ UnexpectedEvent event Nothing

-- | Parse a YAML file (as a stream of events) into a stream of values.
-- It only accepts documents whose top-level entity is an array.
--
-- In combination with 'Y.decodeFile', it can be used to consume
-- a YAML file that consists of a very large list of values.
parseStream :: ReaderT JSONPath (ConduitM Event Value Parse) ()
parseStream = do
Contributor Author

I can change the name of this if there's a better name for it.

    streamStart <- lift CL.head
    case streamStart of
        Nothing ->
            -- empty string input
            return ()
        Just EventStreamStart ->
            -- empty file input, comment only string/file input
            parseDocs
        _ -> missed streamStart
  where
    parseDocs = do
        documentStart <- lift CL.head
        case documentStart of
            Just EventStreamEnd -> return ()
            Just EventDocumentStart -> do
                parseSubStream
                requireEvent EventDocumentEnd
                parseDocs
            _ -> missed documentStart
    missed event = liftIO $ throwIO $ UnexpectedEvent event Nothing

parseSubStream :: ReaderT JSONPath (ConduitM Event Value Parse) ()
parseSubStream = do
    me <- lift CL.head
    case me of
        Just (EventSequenceStart _ _ a) -> parseArrayStreaming 0 a
        _ -> liftIO $ throwIO $ UnexpectedEvent me $ Just $ EventSequenceStart NoTag AnySequence Nothing
Contributor Author

I added some values here for the "expected" event, hoping that would be more helpful than using Nothing.


parseArrayStreaming :: Int -> Y.Anchor -> ReaderT JSONPath (ConduitM Event Value Parse) ()
parseArrayStreaming !n a = do
    me <- lift CL.peek
    case me of
        Just EventSequenceEnd -> lift $ CL.drop 1
        _ -> do
            local (Index n :) parseO >>= lift . yield
            parseArrayStreaming (n+1) a

parseScalar :: ByteString -> Anchor -> Style -> Tag
            -> ReaderT JSONPath (ConduitM Event o Parse) Text
parseScalar v a style tag = do
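For readers following the control flow of `parseStream` / `parseArrayStreaming` in the diff above, here is a minimal, self-contained sketch of the same idea: check whether the next event closes the top-level sequence, and otherwise parse one complete element and emit it before looking further. It is not the PR's code. The `Event` and `Value` types, `parseValue`, and `streamElements` are hypothetical simplified stand-ins that use only `base`, and a lazy list plays the role of the conduit.

```haskell
-- Hypothetical, simplified stand-ins for the libyaml events and aeson values
-- used by the real parser. Tags, anchors and mappings are left out on purpose.
data Event
  = StreamStart
  | StreamEnd
  | DocumentStart
  | DocumentEnd
  | SequenceStart
  | SequenceEnd
  | Scalar String
  deriving (Eq, Show)

data Value = Str String | Arr [Value]
  deriving (Eq, Show)

-- Parse exactly one complete value and return the events that follow it.
-- Nested sequences are built in full, as in the non-streaming parser.
parseValue :: [Event] -> (Value, [Event])
parseValue (Scalar s : rest) = (Str s, rest)
parseValue (SequenceStart : rest) = go [] rest
  where
    go acc (SequenceEnd : more) = (Arr (reverse acc), more)
    go acc more =
      let (v, more') = parseValue more
      in go (v : acc) more'
parseValue other = error ("unexpected event: " ++ show (take 1 other))

-- Emit the elements of each document's top-level sequence one at a time.
-- Each element is produced as soon as its own events have been consumed,
-- so a consumer can stop early without reading the rest of the input.
streamElements :: [Event] -> [Value]
streamElements [] = []  -- empty input: nothing to emit
streamElements (StreamStart : afterStart) = docs afterStart
  where
    docs (StreamEnd : _) = []
    docs (DocumentStart : SequenceStart : body) = items body
    docs other =
      error ("expected a top-level sequence, got: " ++ show (take 2 other))

    -- Mirrors the peek-then-yield loop: stop at the closing event,
    -- otherwise parse one element and continue with what is left.
    items (SequenceEnd : DocumentEnd : next) = docs next
    items body =
      let (v, rest) = parseValue body
      in v : items rest
streamElements other = error ("unexpected event: " ++ show (take 1 other))
```

Because each element is emitted before the rest of the input is inspected, something like `take 2 (streamElements events)` returns even when `events` never reaches its closing `SequenceEnd`. That is the property that matters for a file consisting of one very large list.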
2 changes: 1 addition & 1 deletion yaml/yaml.cabal
@@ -5,7 +5,7 @@ cabal-version: 1.12
-- see: https://github.com/sol/hpack

name: yaml
-version: 0.11.7.0
+version: 0.11.8.0
Contributor Author

I didn't change this myself. I don't know why it kept happening.

synopsis: Support for parsing and rendering YAML documents.
description: README and API documentation are available at <https://www.stackage.org/package/yaml>
category: Data