Polysemy - Part III - Tests
This is part of a series on effect handling in Haskell using Polysemy.
One of the benefits of writing pure code is that it’s so easy to test. You provide input, you get output, you assert on the output, that’s it. But “real world applications” have many functions with effects all over the place. And we also need to test those to ensure quality.
The problem is, how do you test effectful code? As the name indicates, naive tests would have various effects, rendering them “hard” to both write and run.
A solution particularly favored in languages with no clear effect boundary is to use ephemeral “containerized” environments, like Docker containers, to run their PostgreSQL databases, Kafka clusters, etc., during tests. These ephemeral containers lower the pain of testing effectful code, but with limited benefits: they remain slow and they are rather complex to write/maintain.
Another solution, when effectful code is well separated from business logic, is to mock the effects, i.e. to “replace” them with fake logic that suits each test exactly. These mocks usually have a greater LOC cost per test compared to containerized environments, but are extremely fast, and easier to maintain over time.
And guess what? Effect frameworks like Polysemy make it pretty simple to mock effects in tests.
Mocking effects
All we need to do to mock an effect is to change the interpreter layer! The effect declaration and the effect use in business code remain unchanged.
I will reuse the example from my previous post:
import Polysemy
data Log m a where
  LogInfo :: String -> Log m ()
makeSem ''Log
myBusinessFunction :: Member Log r => Integer -> Integer -> Sem r Integer
myBusinessFunction m n = do
  logInfo $ "myBusinessFunction was called with parameters " <> show m <> 
            " and " <> show n
  let result = m + n
  logInfo $ "myBusinessFunction result is " <> show result
  pure result
logToIO :: Member (Embed IO) r => Sem (Log ': r) a -> Sem r a
logToIO = interpret (\(LogInfo stringToLog) -> embed $ putStrLn stringToLog)

Step 1: No mocking
Let’s first write a test where the logging still happens in IO (logging to stdout):
import Polysemy
import Test.HUnit
test_1and2is3 = TestCase $ do
  result <- runM . logToIO $ myBusinessFunction 1 2
  result @?= 3

The test passes but stdout was polluted with the logs:
Cases: 1  Tried: 0  Errors: 0  Failures: 0myBusinessFunction was called with parameters 1 and 2
myBusinessFunction result is 3
Cases: 1  Tried: 1  Errors: 0  Failures: 0

Imagine hundreds of tests running: your terminal (or your CI logs) will quickly get cluttered. Even worse: if our logging interpreter logged to a file, each test run would create and write into such a log file! This would also not scale well with other effects, like database calls.
We can do better.
Step 2: Disable logs
Our goal is to write tests for myBusinessFunction without actually logging to stdout.
Let’s write another interpreter logToSilence for the Log effect, except this interpreter will simply ignore the log and do nothing:
logToSilence :: Sem (Log ': r) a -> Sem r a
logToSilence = interpret (\(LogInfo _) -> pure ())
test_1and2is3 = TestCase $ do
  result <- runM . logToSilence $ myBusinessFunction 1 2
  result @?= 3

Now the logs are clean, the effect was nicely interpreted:

Cases: 1  Tried: 1  Errors: 0  Failures: 0

Note, we previously had to run in IO because logToIO required it. Our silencing interpreter is pure, though, so we can replace Polysemy’s runM with run and work with pure code:
test_1and2is3 = TestCase $
  let result = run . logToSilence $ myBusinessFunction 1 2
  in  result @?= 3

Step 3: Test effects too
Rather than silencing those logs, maybe logging is part of our requirements. In that case, we should actually check that the function logs correctly!
Let’s replace our silencing interpreter with another one, that records all logs, so that we can check exactly what was logged. We will rely on another pre-existing Polysemy effect, namely Polysemy.Writer, which is the Polysemy equivalent of Writer or WriterT.
Long story short, a Writer a effect allows you to write values of type a, which are glued together using mappend (hence the Monoid constraint). Note that you can’t read those values back as long as you are in code under this effect. When the effect is interpreted, the result is a pair: first the accumulated a, then the value returned by the effectful code. For our needs, a ~ [String], i.e. we record each log and we get the list of all logged lines when interpreting this Writer [String] effect:
import Polysemy
import Polysemy.Writer
import Test.HUnit
logToRecord :: Member (Writer [String]) r => Sem (Log ': r) a -> Sem r a
logToRecord = interpret (\(LogInfo stringToLog) -> tell [stringToLog])
test_1and2is3 = TestCase $
  let (logs, result) = run . runWriter . logToRecord $ myBusinessFunction 1 2
  in do
      result @?= 3
      logs @?= [ "myBusinessFunction was called with parameters 1 and 2"
               , "myBusinessFunction result is 3" ]

So what’s going on here?
- logToRecord interprets the Log effect in terms of Writer [String], i.e. we record all the logged lines as a list of strings (using tell from Polysemy.Writer to add logs)
- We run this Writer effect using runWriter (this will aggregate all recorded logs thanks to the Monoid constraint, using list appending)
- We can now assert both on the business result 3 and on the logged lines
That’s it! We have successfully removed the IO effect, our tests are pure, yet we can fully assert on both the business results and the effects!
Note: Technically we should use runWriterAssocR instead of runWriter since the monoid is a list, for performance reasons, but this is beyond the scope of this post.
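As a minimal sketch of the note above, swapping runWriter for runWriterAssocR is a one-word change in the test pipeline. The module below restates the Log effect from this post so that it compiles on its own; the recorded binding and the list of language extensions (the set commonly recommended for Polysemy) are assumptions of this sketch, not part of the original code.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, GADTs, LambdaCase, PolyKinds,
             RankNTypes, ScopedTypeVariables, TemplateHaskell,
             TypeApplications, TypeFamilies, TypeOperators #-}
module Main where

import Polysemy
import Polysemy.Writer

-- The Log effect and business function, restated from earlier in this post
data Log m a where
  LogInfo :: String -> Log m ()

makeSem ''Log

myBusinessFunction :: Member Log r => Integer -> Integer -> Sem r Integer
myBusinessFunction m n = do
  logInfo $ "myBusinessFunction was called with parameters " <> show m <>
            " and " <> show n
  let result = m + n
  logInfo $ "myBusinessFunction result is " <> show result
  pure result

logToRecord :: Member (Writer [String]) r => Sem (Log ': r) a -> Sem r a
logToRecord = interpret (\(LogInfo stringToLog) -> tell [stringToLog])

-- Assumption: "recorded" is a name introduced for this sketch.
-- Same pipeline as before, with runWriterAssocR in place of runWriter.
recorded :: ([String], Integer)
recorded = run . runWriterAssocR . logToRecord $ myBusinessFunction 1 2

main :: IO ()
main = print recorded
```

Both interpreters return the same (logs, result) pair, so the assertions of the test do not need to change.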
It is exactly the same for property based tests (PBT). Say we want to (quick)check our business function is associative (on the result) and check that the last log will be the same (others will not because of order of application):
import Polysemy
import Polysemy.Writer
import Test.QuickCheck
logToRecord :: Member (Writer [String]) r => Sem (Log ': r) a -> Sem r a
logToRecord = interpret (\(LogInfo stringToLog) -> tell [stringToLog])
test_associative = \a b c ->
  let
    (logsAB_then_C, resultAB_then_C) = run . runWriter . logToRecord $ do
      resultAB <- myBusinessFunction a b
      myBusinessFunction resultAB c
    (logsA_then_BC, resultA_then_BC) = run . runWriter . logToRecord $ do
      resultBC <- myBusinessFunction b c
      myBusinessFunction a resultBC
   in
    last logsAB_then_C == last logsA_then_BC && resultAB_then_C == resultA_then_BC

You can find the full code example on my Github repo.
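To actually run such a property, one option is to hand it to QuickCheck's quickCheck. The following is a sketch under assumptions: runRecorded and prop_associative are hypothetical names introduced here, and the property is restated with an explicit type and QuickCheck's === and .&&. combinators so that a failure reports the mismatching values.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, GADTs, LambdaCase, PolyKinds,
             RankNTypes, ScopedTypeVariables, TemplateHaskell,
             TypeApplications, TypeFamilies, TypeOperators #-}
module Main where

import Polysemy
import Polysemy.Writer
import Test.QuickCheck

-- The Log effect and business function, restated from earlier in this post
data Log m a where
  LogInfo :: String -> Log m ()

makeSem ''Log

myBusinessFunction :: Member Log r => Integer -> Integer -> Sem r Integer
myBusinessFunction m n = do
  logInfo $ "myBusinessFunction was called with parameters " <> show m <>
            " and " <> show n
  let result = m + n
  logInfo $ "myBusinessFunction result is " <> show result
  pure result

logToRecord :: Member (Writer [String]) r => Sem (Log ': r) a -> Sem r a
logToRecord = interpret (\(LogInfo stringToLog) -> tell [stringToLog])

-- Hypothetical helper: interpret a Log program purely, returning (logs, result)
runRecorded :: Sem '[Log, Writer [String]] a -> ([String], a)
runRecorded = run . runWriter . logToRecord

-- Hypothetical property: same result and same last log for both groupings
prop_associative :: Integer -> Integer -> Integer -> Property
prop_associative a b c =
  let (logsL, resL) = runRecorded (myBusinessFunction a b >>= \ab -> myBusinessFunction ab c)
      (logsR, resR) = runRecorded (myBusinessFunction b c >>= myBusinessFunction a)
  in resL === resR .&&. last logsL === last logsR

main :: IO ()
main = quickCheck prop_associative
```

Giving the property an explicit Integer signature lets QuickCheck pick the generators without any extra annotation at the call site.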
This first example showed how to change the interpreter to mock/record the effect behavior, but a major part of mocking effects is to return a dummy value instead of executing the effect to retrieve the value (e.g. database access or environment variable).
I think it’s interesting to showcase another example where the effect action has a return value other than ().
Intermediary example: environment variables
Let’s consider the use case of environment variable access.
Many applications need to read some global configuration, often passed by Kubernetes/Rancher through environment variables or secret files. A database URL, the logging level, a port number, an API key, you name it.
The effect declaration
import Polysemy
data Configuration m a where
  ReadConf :: String -> Configuration m (Maybe String)
makeSem ''Configuration

No surprise here. When you readConf, you have to pass the parameter name to read, and you get back a Maybe String (Nothing if the parameter is not configured).
The effect use in business code
import Data.Maybe
import Polysemy
myBusinessFunction :: Member Configuration r => Int -> Sem r (Either String Int)
myBusinessFunction amount = do
  maybeMinimumAmount <- fmap (fmap read) (readConf "MINIMUM_AMOUNT")
  let minimumAmount = fromMaybe 500 maybeMinimumAmount
  pure $ if amount >= minimumAmount
           then Right amount
           else Left $ show amount ++ " is lower than the minimum allowed amount " ++ show minimumAmount

This function reads the MINIMUM_AMOUNT configuration setting, or uses 500 as default value, then checks that the passed value is greater than or equal to the minimum amount.
Note: the double fmap may look weird. We want to convert the resulting String to an Int, but there are 2 layers to map over: Sem r and Maybe.
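To see the nested mapping in isolation, here is a small sketch that needs only base. A list stands in for the outer layer and Maybe is the inner one; the names rawSettings, parsed and parsedSafely are made up for this illustration. It also shows a variant with readMaybe, since read crashes on a value that is not a number.

```haskell
module Main where

import Text.Read (readMaybe)

-- Two nested functors: the outer list stands in for the effect layer,
-- the inner Maybe models a setting that may be missing.
rawSettings :: [Maybe String]
rawSettings = [Just "250", Nothing]

-- The outer fmap goes through the list, the inner one through Maybe
parsed :: [Maybe Int]
parsed = fmap (fmap read) rawSettings

-- Variant: readMaybe turns an unparsable value into Nothing instead of crashing
parsedSafely :: [Maybe Int]
parsedSafely = fmap (>>= readMaybe) [Just "250", Just "abc", Nothing]

main :: IO ()
main = do
  print parsed        -- [Just 250,Nothing]
  print parsedSafely  -- [Just 250,Nothing,Nothing]
```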
Again, the business code is not concerned with how the configuration is retrieved. Is it from an environment variable? A file? A cache? A hardcoded value? Or a combination of those? This decision is up to the interpreter!
The interpreters
Here is an example of an interpreter that reads from environment variables:
import System.Environment
import Polysemy
confToIO :: Member (Embed IO) r => Sem (Configuration ': r) a -> Sem r a
confToIO = interpret (\(ReadConf envVarName) -> embed $ lookupEnv envVarName)

Now let’s write a mock interpreter for our tests!
As stated in the introduction, mocking means that each test gets to decide the behavior of effects. In this particular case, it means the decision of how to transform the configuration name (e.g. MINIMUM_AMOUNT) to a value (of type Maybe String) is up to each test, not to the interpreter.
Said differently, the interpreter should take as argument how to do this transformation.
In a functional language, it means: the interpreter should take as argument the function String -> Maybe String, and each test should pass such a function (the mock behavior).
confToMock :: (String -> Maybe String) -> Sem (Configuration ': r) a -> Sem r a
confToMock mockLookupEnv = interpret (\(ReadConf envVarName) -> pure $ mockLookupEnv envVarName)

And now a couple of unit tests showing how to use it:
import Test.HUnit
test_defaultMinimumAmount_lower = TestCase $
  let 
    mockLookupEnv _ = Nothing
    result = run . confToMock mockLookupEnv $ myBusinessFunction 400
  in 
    result @?= Left "400 is lower than the minimum allowed amount 500"
test_minimumAmount_greater = TestCase $
  let 
    mockLookupEnv "MINIMUM_AMOUNT" = Just "250"
    mockLookupEnv _                = Nothing
    result = run . confToMock mockLookupEnv $ myBusinessFunction 400
  in 
    result @?= Right 400

As you can see, each test provides the mocking function mockLookupEnv which is then injected in the interpreter.
You can find the full code example on my Github repo.
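When a test needs several settings, the mock function can also be derived from a table of values instead of being written by hand. The sketch below assumes a hypothetical helper confFromMap built on top of confToMock, using Data.Map from the containers package; the effect and business function are restated so that the module stands alone, and the language extensions are the set commonly recommended for Polysemy.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, GADTs, LambdaCase, PolyKinds,
             RankNTypes, ScopedTypeVariables, TemplateHaskell,
             TypeApplications, TypeFamilies, TypeOperators #-}
module Main where

import qualified Data.Map as Map
import Data.Maybe (fromMaybe)
import Polysemy

-- The Configuration effect and business function, restated from this post
data Configuration m a where
  ReadConf :: String -> Configuration m (Maybe String)

makeSem ''Configuration

myBusinessFunction :: Member Configuration r => Int -> Sem r (Either String Int)
myBusinessFunction amount = do
  maybeMinimumAmount <- fmap (fmap read) (readConf "MINIMUM_AMOUNT")
  let minimumAmount = fromMaybe 500 maybeMinimumAmount
  pure $ if amount >= minimumAmount
           then Right amount
           else Left $ show amount ++ " is lower than the minimum allowed amount " ++ show minimumAmount

confToMock :: (String -> Maybe String) -> Sem (Configuration ': r) a -> Sem r a
confToMock mockLookupEnv = interpret (\(ReadConf envVarName) -> pure $ mockLookupEnv envVarName)

-- Hypothetical helper: answer every readConf from a fixed table of settings
confFromMap :: Map.Map String String -> Sem (Configuration ': r) a -> Sem r a
confFromMap settings = confToMock (\name -> Map.lookup name settings)

-- Hypothetical helper: run the business function against a given table
checkWith :: [(String, String)] -> Int -> Either String Int
checkWith settings amount =
  run . confFromMap (Map.fromList settings) $ myBusinessFunction amount

main :: IO ()
main = do
  print (checkWith [("MINIMUM_AMOUNT", "250")] 400)  -- Right 400
  print (checkWith [] 400)  -- Left "400 is lower than the minimum allowed amount 500"
```

The business code is untouched: only the function handed to confToMock changes from one test to the next.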
Conclusion
As shown in this post, testing (by mocking) is nice and inexpensive in Haskell with an effect framework like Polysemy.
The separation between effect declaration, effect use and effect interpretation is a big plus: tests only need to change the interpreter to mock effects, all other things remaining equal.
We have been using this technique for more than 6 months in my team, and we truly enjoy it. Most of us have experience with mocking techniques and frameworks in other languages (e.g. Java, JavaScript) but testing in Haskell with Polysemy is far more enjoyable, and by a long shot.
Remember: all the things shown here are loosely coupled to the testing libraries (HUnit and QuickCheck) and the effect library (Polysemy). You can achieve similar benefits with any testing or effect library that relies on the same separation of effect declaration, effect use and effect interpretation.
Enjoy testing!