-
Notifications
You must be signed in to change notification settings - Fork 0
Handling (async) exceptions in haskell: snap server (case study)
(See new home for this post)
(The wiki is editable for registered github users. Feel free to fix any typos, style issue or any other error or issue. English is foreign language for me, sorry)
Exception handling is hard, and asynchronous exceptions make it even harder.
But there are common patterns, that simplifies exception handling and make our
life much easier. Here we will explore a widely used open source library,
snap-server
, and identify common issues, difficulties and mistakes. Also we'll
see how to avoid most of the mistakes and describe useful patterns. Thanks to
Gregory Collins for letting me use snap-server here.
(I probably should note, that I'm snapframework user, and I like it. So the criticism here is friendly in it's nature.)
The post is organized into a series of examples of real code. I'll use this specific source code tree. We'll mostly discuss simple, obvious cases.
sendFileFunc :: Socket -> SendFileHandler
sendFileFunc sock !_ builder fPath offset nbytes = bracket acquire closeFd go
where
sockFd = Fd (fdSocket sock)
acquire = openFd fPath ReadOnly Nothing defaultFileFlags
go fileFd = do sendHeaders builder sockFd
sendFile sockFd fileFd offset nbytes
That is an example how your exception handling code should look like. Here
bracket
does most of heavy lifting to prepare safe environment. Acquire and
cleanup actions effectively are library functions (from unix
package).
So the function on itself is perfect. But it relies on library functions to be correct. Lets formulate explicitly the contract we are expecting:
-
openFd
should not leak file descriptor if it fails.
That is pretty natural requirement, because we will not get the file
descriptor at all if openFd
fails, so there is nothing we can do on our
side.
-
closeFd
should not leak file descriptor if it fails.
That doesn't look so natural, but the reason is the same -- there is nothing we can do in case of failure.
Note: the contract should be preserved even in case of async exception, because there is no good way to distinguish async and sync exception.
In the ideal world we should not simply assume the contract is preserved, we should read documentation instead. Unfortunately the is no explicit contract in the documentation. Probably we should pester the maintainer. Anyway, we can either look for other library or use the existing.
One may think that we should inspect source code of openFd
and closeFd
. It
may be useful, but it doesn't solve the issue. Unless the contract is
explicitly states, the author is free to change it. But you can inspect the
particular version of the unix
package and commit yourself to it.
withLoggers afp efp act =
bracket (do mvar <- newMVar ()
let f s = withMVar mvar
(const $ S.hPutStr stderr s >> hFlush stderr)
alog <- maybeSpawnLogger f afp
elog <- maybeSpawnLogger f efp
return (alog, elog))
(\(alog, elog) -> do
maybe (return ()) stopLogger alog
maybe (return ()) stopLogger elog)
(\(alog, elog) -> act ( liftM logMsg alog <|> maybeIoLog afp
, liftM logMsg elog <|> maybeIoLog efp))
The function spawns two loggers (using maybeSpawnLogger
) and stops them
(using stopLogger
) on exit. It is not important what loggers do, but you
can check the source code if you are interested.
Do you see any issue here? Probably not. Lets say the function has no obvious issues. (Actually I found one minor issue, but I spent half an hour reading code.) But it doesn't mean the function is correct.
I convinced myself that maybeSpawnLogger
and stopLogger
never throw
exception. (Well, stopLogger
is interruptible, so it actually can
sometimes.) But is there any reason we need to assume that? What if someone
change them when refactoring? Sometimes we need to rely on actions not to
throw exception, but not here. Lets fix it:
withLoggers afp efp act =
bracket (do mvar <- newMVar ()
let f s = withMVar mvar
(const $ S.hPutStr stderr s >> hFlush stderr)
bracketOnError
(maybeSpawnLogger f afp)
(maybe (return ()) stopLogger)
(\alog -> do
elog <- maybeSpawnLogger f efp
return (alog, elog))
(\(alog, elog) -> do
maybe (return ()) stopLogger alog
`finally`
maybe (return ()) stopLogger elog)
(\(alog, elog) -> act ( liftM logMsg alog <|> maybeIoLog afp
, liftM logMsg elog <|> maybeIoLog efp))
Here we did two fixes. First, we use bracketOnError
to ensure the first
logger will be stopped if we fail to spawn the second. Also finally
is used
to ensure the second logger will be stopped if the first one fails to stop.
Pretty complicated, but now it is at least correct.
Note that the code is correct w.r.t. both sync and async exception. Here we did nothing special to handle async exceptions.
However the function still relies on maybeSpawnLogger
and stopLogger
to
preserve the contract (see example #1).
The function is still far from ideal. In the acquire action it does
unnecessary work -- allocates MVar
. Is there any reason to do that inside
bracket
? Probably not. Lets simplify the code to make reasoning easer:
withLoggers afp efp act = do
mvar <- newMVar ()
let f s = withMVar mvar
(const $ S.hPutStr stderr s >> hFlush stderr)
bracket (do bracketOnError
(maybeSpawnLogger f afp)
(maybe (return ()) stopLogger)
(\alog -> do
elog <- maybeSpawnLogger f efp
return (alog, elog))
(\(alog, elog) -> do
maybe (return ()) stopLogger alog
`finally`
maybe (return ()) stopLogger elog)
(\(alog, elog) -> act ( liftM logMsg alog <|> maybeIoLog afp
, liftM logMsg elog <|> maybeIoLog efp))
The function is very complex and hard to reason still. But notice that there is no dependency between loggers here, so we can handle them separately. Lets use two brackets, one per logger:
withLoggers afp efp act = do
mvar <- newMVar ()
let f s = withMVar mvar
(const $ S.hPutStr stderr s >> hFlush stderr)
bracket (maybeSpawnLogger f afp)
(maybe (return ()) stopLogger) $ \alog -> do
bracket (maybeSpawnLogger f efp)
(maybe (return ()) stopLogger) $ \elog -> do
act ( liftM logMsg alog <|> maybeIoLog afp
, liftM logMsg elog <|> maybeIoLog efp))
Now the function consists of two nested handlers, that are very similar to example #1. It is obviously correct.
I'd like to draw your attention to the next. We reason about exception handling locally. Obviously all functions we use here affect exception safety of our code, but we use the contract (see example #1) to build a wall between our code and it's dependencies. That is the only way to handle exceptions. Sometimes the contract is more complex, and we have to provide safe environment for our dependencies, but that should be explicitly stated in documentation.
We'll discuss maybeSpawnLogger
and stopLogger
later, and you'll see that
the contract protects us from caller site too, allowing local reasoning.
Lets inspect maybeSpawnLogger
. It is a simple wrapper over other function:
maybeSpawnLogger f (ConfigFileLog fp) =
liftM Just $ newLoggerWithCustomErrorFunction f fp
maybeSpawnLogger _ _ = return Nothing
newLoggerWithCustomErrorFunction :: (ByteString -> IO ())
-> FilePath -- ^ log file to use
-> IO Logger
newLoggerWithCustomErrorFunction errAction fp = do
q <- newIORef mempty
dw <- newEmptyMVar
th <- newEmptyMVar
let lg = Logger q dw fp th errAction
mask_ $ do
tid <- forkIOLabeledWithUnmaskBs "snap-server: logging" $
loggingThread lg
putMVar th tid
return lg
In most cases it is wrong, so don't use it unless you exactly know what you
are doing. If newLoggerWithCustomErrorFunction
action is called without
async exceptions masked, then it will leak resource. It is too late to mask
async exceptions here. Probably it should be stated in documentation to the
function, but actually it is clear enough -- it should be used only inside
bracket
or similar context. There is nothing wrong with mask_
here, it is
just unnecessary. But it gives you false feeling of safety, and can be a sign
of other issues.
One can think that mask_
here is useful because protects at least part of
the code from async exceptions. But that is not true, async exceptions will be
postponed till end of mask_
, but then will be delivered anyway (unless
masked on caller site). The result is the same -- orphan thread.
Lets identify all points where something can go wrong. First of all, the
resource here is a thread, so we should make sure it doesn't become orphan on
failure. Everything above forkIOLabeledWithUnmaskBs
is irrelevant, because
the thread is not yet created here. The only point if failure is putMVar
.
To be carefull we should protect forkIOLabeledWithUnmaskBs
against failure in
putMVar
, but it is not necessary.
From documentation it is clear, that it will not throw async exception unless
the MVar
is full. Here it is empty for sure. (Note: it is documented not in
Control.Concurrent.MVar
, but in Control.Exception
.)
Unfortunately the documentation for putMVar
doesn't say anything about sync
exceptions. (And it can throw at least BlockedIndefinitelyOnMVar
, but not in
our case.) But it is very common to rely on it not to throw sync exception, so
nobody will break that assumption ever. We can think about it as a
documentation bug.
Conclusion: the function is safe.
-- | Kills a logger thread, causing any unwritten contents to be
-- flushed out to disk
stopLogger :: Logger -> IO ()
stopLogger lg = withMVar (_loggingThread lg) killThread
Lets remember the contract: function should release resource even in case of failure. Well, it is not always possible, but it should do it's best.
There is two IO
actions here, withMVar
and killThread
. Both are
potential points of failure.
Documentation for withMVar
states:
it is only atomic if there are no other producers for this MVar.
The wording probably can be better. But it will not throw if the MVar
is
full initially and nobody tries to put anything into it until exit from
withMVar
. In out case it is true -- the mvar is not used anywhere after
spawning the thread. But we need to inspect the code around to find that out,
so it would be better to document the design.
killThread
is problematic though. AFAIK it can't throw sync exceptions, but
it can throw async exception even when wrapped into uninterruptibleMask
(!!!). Also it is interruptible even when doesn't block. You should read it's
documentation carefully before using it.
In this particular case it is enough to wrap killThread
into
uninterruptibleMask
. Other option is to provide eventual guarantees, e.g.
spawn other thread to do killThread
asynchronously:
stopLogger lg = withMVar (_loggingThread lg) $ \threadId ->
killThread threadId `onException` void (forkIO $ killThread threadId)
Here the ThreadId
of the helper thread is unknown to anybody else, so we can
be sure nobody will send async exception to it.
runLoops = E.bracket (mapM newLoop [0 .. (nLoops - 1)])
(mapM_ killLoop)
(mapM_ waitLoop)
Do you see the issue here?
Never assume anything. What if one of newLoop
fails? Then all already
created loops will leak. What if one of killLoop
fails? Then all subsequent
loops will not be killed.
Note: It is possible that newLoop
and killLoop
never throw (not in our
case though), but the principle of local reasoning forces us to handle
exceptions here or at least write a comment. We probably should fold the killLoop
s
with finally
. Something like ResourceT
can simplify this case a lot.
Internally snap-server
does a lot of non-local manipulations with mask state.
It is hard to extract any meaningful example because of non-locality. I'll
simply provide a link
The restore
seems to be changing masking state, but it is not clear how it
affects the code. Probably there are good reasons for such design though.
Also eatException
is used a lot, and that is a bad sign.
bracket
is not the only source of errors, but the same approach can be
applied to most cases.
I didn't have a goal to uncover all issues with exception handling here. And I'm sure there are cases much harder to find and fix. But in most cases you'll avoid a lot of mistakes if you follow the basic principles:
-
Don't assume anything
-
Keep it simple
-
Divide and conquer
-
Reason locally
Thanks to all authors of snap-server
for excellent library, I really appreciate your work.