
13 June 2023

In this meeting we asked participants to share their experiences exploring the version control, reproducibility, and testing exercises in our example repository.

This repository serves as an introduction to testing models and ensuring that their outputs are reproducible. It contains a simple stochastic model that draws samples from a normal distribution, and some tests that check whether the model outputs are consistent with our expectations.
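As a rough sketch of what such a model might look like (this is not the repository's actual code, and the function and parameter names here are hypothetical):

    import numpy as np

    def run_model(mean, stdev, num_samples):
        """Draw samples from a normal distribution."""
        return np.random.normal(loc=mean, scale=stdev, size=num_samples)

Note that each call to run_model() here draws from NumPy's global random number generator, so repeated calls will return different samples; this is the reproducibility issue discussed below.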

What is a reproducible environment?

The exercise description was deliberately very open, but it may have been too vague:

Define a reproducible environment in which the model can run.

We avoided listing possible details for people to consider, such as software and package versions. Perhaps a better approach would have been to ask:

If this code were provided as supporting materials for a paper, what other information would you need in order to run it and be confident of obtaining the same results as the original authors?

The purpose of a reproducible environment is to define all of these details, so that you never have to say to someone "well, it runs fine on my machine".
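For example (one possible approach among many; the version numbers here are only illustrative), a Python project can pin exact package versions in a requirements.txt file, so that others can recreate the same environment:

    # requirements.txt
    # Recreate this environment with: pip install -r requirements.txt
    numpy==1.24.3
    pytest==7.3.1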

Reproducibility and stochasticity

Many participants observed that the model was not reproducible unless we used a random number generator (RNG) with a known seed, which would ensure that the model produces the same output each time we run it.

But what if you're using a package or library that internally uses its own RNG and/or seed? This may not be something you can fix, but you should be able to detect it by running the model multiple times with the same seed and checking whether you get identical results each time.

Another important question was raised: do you, or should you, include the RNG seed in your published code? This is probably a good idea, and suggested solutions included setting the seed at the very start of your code (so that it's immediately visible) or including it as a required model parameter.
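As a minimal sketch (a revised version of the hypothetical run_model() above), the seed can be made a required parameter, and a repeated-run check can detect hidden sources of randomness:

    import numpy as np

    def run_model(mean, stdev, num_samples, seed):
        """Draw samples from a normal distribution, using a seeded RNG.

        Making the seed a required parameter means that it is always
        recorded by the caller, and every run can be reproduced.
        """
        rng = np.random.default_rng(seed)
        return rng.normal(loc=mean, scale=stdev, size=num_samples)

    def test_identical_outputs():
        """Two runs with the same seed should produce identical outputs.
        If they do not, some dependency is using its own internal RNG."""
        first = run_model(mean=0.0, stdev=1.0, num_samples=1000, seed=12345)
        second = run_model(mean=0.0, stdev=1.0, num_samples=1000, seed=12345)
        assert np.array_equal(first, second)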

Writing test cases

Tip

Write a test case every time you find a bug: ensure that the test case finds the bug, then fix the bug, then ensure that the test case passes.

A test case is a piece of code that checks that something behaves as expected. This can range from checking that a mathematical function returns an expected value to running many model simulations and verifying that a summary statistic falls within an expected range.

Rather than trying to write a single test that checks many different properties of a piece of code, it can be much simpler and quicker to write many separate tests that each check a single property. This can provide more detailed feedback when one or more test cases fail.
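For example (a sketch that reuses the hypothetical, seeded run_model() from above), each of the following tests checks a single property, so a failure in any one of them points to a specific problem:

    def test_sample_mean():
        """The sample mean should be close to the true mean."""
        samples = run_model(mean=5.0, stdev=1.0, num_samples=100_000, seed=12345)
        assert abs(samples.mean() - 5.0) < 0.05

    def test_sample_stdev():
        """The sample standard deviation should be close to the true value."""
        samples = run_model(mean=5.0, stdev=1.0, num_samples=100_000, seed=12345)
        assert abs(samples.std() - 1.0) < 0.05

    def test_sample_count():
        """The model should return the requested number of samples."""
        samples = run_model(mean=5.0, stdev=1.0, num_samples=100, seed=12345)
        assert len(samples) == 100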

Note

This approach is similar to how we rely on multiple public health interventions to protect against disease outbreaks! Consider each test case as a slice of Swiss cheese — many imperfect tests can provide a high degree of confidence in our code.

Writing test cases for conditions that may fail

If you are testing a stochastic model, you may find certain test cases are difficult to write.

For example, consider a stochastic SIR model where you want to test that an intervention reduces the number of cases in an outbreak. You may, however, observe that in a small proportion of simulations the intervention has no effect (or it may even increase the number of cases).

One approach is to run many pairs of simulations and check only that the intervention reduced the number of cases in at least X% of the pairs, as shown in the sketch below. You will need to decide how many simulations to run and what an appropriate value for X is, but that's okay! Remember the Swiss cheese analogy, mentioned above.
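A sketch of this approach, where run_sir_model() and its arguments are hypothetical placeholders for your own model:

    def test_intervention_usually_reduces_cases():
        """Check that the intervention reduces the number of cases in at
        least 90% of paired simulations (each pair shares a seed)."""
        num_pairs = 100
        num_reduced = 0
        for seed in range(num_pairs):
            baseline = run_sir_model(seed=seed, intervention=False)
            intervened = run_sir_model(seed=seed, intervention=True)
            if intervened.total_cases < baseline.total_cases:
                num_reduced += 1
        # Here X% = 90%; choose a value appropriate for your model.
        assert num_reduced / num_pairs >= 0.9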

Testing frameworks

If you have more than 2 or 3 test cases, it's a good idea to use a testing framework to automatically find your test cases, run each test, record whether it passed or failed, and report the results. These frameworks are usually specific to a single programming language.

Some commonly-used frameworks include pytest (for Python), testthat (for R), and the MATLAB unit testing framework.
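For example, if the hypothetical test functions above are saved in a file named test_model.py, running the pytest command will automatically discover and run every function whose name begins with test_, and report the results. Frameworks also offer useful features such as parametrised tests:

    import pytest

    @pytest.mark.parametrize("mean", [-10.0, 0.0, 10.0])
    def test_sample_mean(mean):
        """pytest runs this test once for each parameter value, and
        reports each run as a separate test case."""
        samples = run_model(mean=mean, stdev=1.0, num_samples=100_000, seed=12345)
        assert abs(samples.mean() - mean) < 0.05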

GitHub Actions

Multiple participants reported difficulties in setting up GitHub Actions and knowing how to adapt the available templates to their needs.

We will aim to provide a GitHub Actions workflow for each model, and add comments to explain how to adapt these templates.
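As a rough sketch of what such a workflow might look like for a Python model (the file path, action versions, and package names here are assumptions, and will differ for each model):

    # .github/workflows/tests.yml
    name: tests

    on: [push, pull_request]

    jobs:
      tests:
        runs-on: ubuntu-latest
        steps:
          # Check out the repository.
          - uses: actions/checkout@v3
          # Install Python.
          - uses: actions/setup-python@v4
            with:
              python-version: "3.11"
          # Install the model dependencies, then run the test cases.
          - run: pip install numpy pytest
          - run: pytest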

Warning

One downside of using GitHub Actions is the limited computation time: the free plan includes 2,000 minutes per month. This may not be suitable for large agent-based models and other long-running tasks.

Pull requests

At the time of writing, three participants have contributed pull requests:

  • TK added a default seed so that the model outputs are reproducible.

  • Micheal added a MATLAB version of the model and the test cases.

  • Cam added several features, such as recording metadata about the Python environment and testing that the model outputs are reproducible.

Tip

If you make your own copy ("fork") of the example repository, you can create as many commits as you want. GitHub will display a message that says:

This branch is N commits ahead of rrcop:master.

Click on the "N commits ahead" link to see a summary of your new commits. You can then click the big green "Create pull request" button.

This will not modify the example repository. Instead, it will create an overview of the differences between your code and the example repository. We can then review these changes and make suggestions, and you can add new commits in response, before we decide whether to merge these changes into the example repository.