name: inverse
layout: true
class: center, middle, inverse
Licensed under CC BY 4.0. Code examples: OSI-approved MIT license.
layout: false
- More robust code
- As projects grow, it becomes easier to break things without noticing immediately
- In particular for non-modular, entangled projects (effects at a distance)
- Testing helps detect errors early: programming without constantly testing your changes will introduce errors, and these are extremely time-consuming to detect and fix
- Interpreted, dynamically typed, imperative languages need to be tested
- So do compiled languages, but the compiler catches at least some errors
- We publish scientific papers based on scientific code
- Users and computing center personnel may not be able to identify an incorrect installation without automated testing
- Users of your code publish papers based on results produced by your code
- Satisfying instant visual feedback
- Avoid coding constipation
- Avoid "gold-plating" the code
- Easier for external developers to contribute to the project
- Tests can be good documentation; up to date by definition
- Help to modularize/encapsulate
- Unit tests sharpen and document interfaces
- Enable reuse and migration
- Faster coding: tests allow you to make big changes quickly; refactoring becomes safer
- You start with the design; this often leads to better design
- Well structured code is easy to test
template: inverse
- Test frequently (each commit)
- Test automatically (Travis CI)
- Test with numerical tolerance
- Think about code coverage (https://coveralls.io)
- Write test, red, green, refactor
- Test one unit: module or even single function
- Sharpen interfaces
- Speed up testing
- Good documentation of the capabilities and dependencies of a module
- In fact, it is documentation that is up to date by definition
- Write tests before writing code
- Very often we know the result that a function is supposed to produce
- Development cycle:
    - Write the test
    - Write an empty function template
    - Test that the test fails
    - Program until the test passes
    - Perhaps refactor
    - Move on
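- A minimal sketch of this cycle with py.test (the temperature-conversion function is invented for illustration):

def test_fahrenheit_to_celsius():
    # step 1: write the test first
    assert fahrenheit_to_celsius(32.0) == 0.0

def fahrenheit_to_celsius(temp_f):
    # step 2: empty template; running py.test now fails (red)
    pass

# steps 3-5: program until the test passes (green), e.g.
# return (temp_f - 32.0)*5.0/9.0, then perhaps refactor and move on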
- As in car assembly, we have to test all components individually and also whether the components work together
- A functionality test typically runs the entire code for a set of input examples and compares the results with reference outputs
- Hopefully with numerical tolerance
- Typically we need both unit and functionality regression tests
- Neither is a guarantee that the code is "always" correct
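- A sketch of such a regression test (the module, function, and reference value are invented for illustration):

import pytest
from mycode import compute_energy  # hypothetical: runs the entire calculation

def test_total_energy_regression():
    # reference output produced by a trusted earlier version of the code
    reference = -76.02676
    result = compute_energy('h2o.inp')
    # compare with numerical tolerance, not digit by digit
    assert result == pytest.approx(reference, abs=1.0e-6)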
- Real example:
    - In one code module somebody forgot to remove debug code and committed it
    - This did not break functionality
    - It slowed down that part of the code by a factor of 2
    - Nobody noticed for one year because we had no automatic detection of performance degradation
- Good idea:
    - Create benchmark calculations to cover various performance-critical modules
    - Run these benchmarks regularly (weekly?)
    - Monitor the timing
    - If timing changes significantly, alarm bells should be ringing
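- A very simple sketch of such a check (function and reference timing are invented; a real setup would record timings and monitor the trend):

import time
from mycode import run_benchmark_case  # hypothetical performance-critical routine

def test_benchmark_timing():
    start = time.perf_counter()
    run_benchmark_case()
    elapsed = time.perf_counter() - start
    reference_seconds = 12.0  # timing recorded on the benchmark machine
    # alarm bells: fail if the code got significantly slower
    assert elapsed < 1.5 * reference_seconds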
- If I break the code and all tests pass, who is to blame?
- A code that is not tested will break sooner or later
- Code coverage measures and documents which lines of code have been traversed during a test run
- It is possible to have line-by-line coverage (example later)
- In all files that have zero coverage you can multiply all numbers by 1e5 and all tests will happily pass
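- With py.test, line-by-line coverage can be measured for instance with the pytest-cov plugin (the module name is an example):

$ pip install pytest-cov
$ py.test --cov=example --cov-report=term-missing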
- You should require developers to run the test set before each `git push`
- Professional projects require running the test set before each commit
- In extreme cases after each save
- With a `pre-commit` hook you can make sure that commits that break tests are rejected (a minimal sketch follows below)
- Total time to test matters
- If the test set takes 7 hours to run, it is likely that nobody will run it
- Identify a fast, essential test set that has sufficient coverage and is short enough to be run before each commit or push
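- A minimal sketch of such a hook, saved as .git/hooks/pre-commit and made executable (it assumes the fast test set runs with `make test`):

#!/usr/bin/env python
# Git rejects the commit if this script exits with a non-zero code
import subprocess
import sys

# run the fast essential test set before every commit
sys.exit(subprocess.call(['make', 'test']))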
- Test each commit on every branch
- Makes it possible for the mainline maintainer to see whether a modification breaks functionality before accepting the merge
- Test before committing (use the Git staging area)
- Fix broken tests immediately (dirty dishes effect)
- Do not deactivate tests "temporarily"
- Think about coverage (physics and lines of code)
- Go public with your testing dashboard and issues/tickets
- Dogfooding: use the code/scripts that you ship
- Think about input sanitization (a green testing dashboard does not prevent me from entering unexpected keyword combinations)
- Test controlled errors: if something is expected to fail, test for that (see the sketch after this list)
- Make testing easy to run (`make test`)
- Make testing easy to analyze
- Do not flood screen with pages of output in case everything runs OK
- Test with numerical tolerance (extremely annoying to compare digits by eye)
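- A sketch of testing a controlled error with py.test (module, function, and exception are invented for illustration):

import pytest
from mycode import square_root  # hypothetical: rejects negative input

def test_square_root_rejects_negative():
    # the expected failure is itself the behavior we test for
    with pytest.raises(ValueError):
        square_root(-1.0)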
- Python:
    - py.test
    - nose
    - doctest
    - unittest
- C(++):
    - Google Test
    - CppUnit
    - Boost.Test
    - UnitTest++
    - https://github.com/philsquared/Catch
- Fortran:
    - pFUnit
    - Fruit
- Very easy to set up
$ pip install pytest # ideally into a virtualenv
- Very easy to use
def get_word_lengths(s):
    """
    Returns a list of integers representing
    the word lengths in string s.
    """
    return [len(w) for w in s.split()]


def test_get_word_lengths():
    text = "Three tomatoes are walking down the street"
    assert get_word_lengths(text) == [5, 8, 3, 7, 4, 3, 6]
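- Assuming the code above is saved in example.py, the test is collected and run with:

$ py.test example.py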
- Example output: https://travis-ci.org/bast/pytest-demo/builds/104182942
- Example project: https://github.com/bast/pytest-demo
- Widely used
- Very rich in functionality
- Source: https://github.com/google/googletest
#include "gtest/gtest.h"
#include "example.h"
TEST(example, add)
{
double res;
res = add_numbers(1.0, 2.0);
ASSERT_NEAR(res, 3.0, 1.0e-11);
}
- Example output: https://travis-ci.org/bast/gtest-demo/builds/104190982
- Example project: https://github.com/bast/gtest-demo
- Very rich in functionality
- Source: http://pfunit.sourceforge.net
- Requires modern Fortran compilers (uses F2003 standard)
@test
subroutine test_add_numbers()

   use hello
   use pfunit_mod

   implicit none

   real(8) :: res

   call add_numbers(1.0d0, 2.0d0, res)
   @assertEqual(res, 3.0d0)

end subroutine
- Example output: https://travis-ci.org/bast/pfunit-demo/builds/104193675
- Example project: https://github.com/bast/pfunit-demo
- CTest runs a set of tests through bash/perl/python scripts and reports to CDash, which aggregates the results
- The test scripts can be anything you like
include(CTest)
enable_testing()

add_test(my_test_name ${PROJECT_BINARY_DIR}/my_test_script --flags)
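- After building, the whole test set can be run from the build directory with:

$ ctest --output-on-failure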
- CTest looks for the return codes: 0/non-0 (success/failure)
- CTest is a powerful test runner, no need to implement one
- Reporting to CDash is easy
$ make Nightly
- CDash example: https://testboard.org
- https://wiki.jenkins-ci.org/display/JENKINS/ChuckNorris+Plugin
- https://github.com/codedance/Retaliation
"Retaliation is a Jenkins CI build monitor that automatically coordinates a foam missile counter-attack against the developer who breaks the build. It does this by playing a pre-programmed control sequence to a USB Foam Missile Launcher to target the offending code monkey."
- GitHub plus Travis plus Coveralls is a killer combination - use it!
- Free for public repositories
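- A minimal .travis.yml sketch for the pytest example above (the Python version is only an example):

language: python
python:
  - "2.7"
install:
  - pip install pytest
script:
  - py.test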