19 February 2024

In this meeting we asked participants to suggest goals and activities to achieve in 2024.

Note

If you were unable to attend the meeting, you can send us suggestions via email.

We have identified the following goals for 2024:

  • Develop orientation materials for new students and staff;

  • Share examples of project repositories and model implementations;

  • Build expertise in testing your own code; and

  • Use peer code review to share knowledge and develop coding skills.

See the sections below for further details.

Orientation materials

The first suggestion was to develop orientation materials for new students, postdocs, people coming from wet-lab backgrounds, etc. Suggested topics included:

  • How to set up common tools on your laptop (e.g., Python and R);
  • How to organise your files;
  • How to write Markdown documents;
  • How to format and lint your code;
  • How to set up git on your own device and use platforms such as GitHub;
  • How to recover old versions of files;
  • How to make your code device-agnostic, so it can run on HPC platforms, virtual machines, and can easily be migrated to new devices; and
  • How to plan for reproducibility from the beginning, rather than waiting until you're preparing a publication.

Info

Some of these topics are already covered in Git is my lab book; see the links above.

There was broad interest in having a checklist and example workflows for people to follow, particularly for projects that involve some form of code "hand over", so that the recipients experience few problems in running the code themselves.

We aim to address these topics in Git is my lab book, with assistance and feedback from the community. See the How to contribute page for details.

Example projects and model templates

Building on the idea of orientation materials, a number of participants suggested providing example projects and different types of models.

The most commonly used languages in our community are:

  • R
  • Python
  • Matlab
  • Julia
  • C++

As an example, we could demonstrate how to write an age-stratified SEIR ODE model in R and Python, and how to write agent-based models in vector and object-oriented forms.
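To give a sense of what such an example might look like, here is a minimal sketch of an age-stratified SEIR ODE model in Python. It is an illustration only: the two age groups, the contact matrix, and all parameter values are made up, and the function names (`seir_rhs`, `rk4_step`, `run_seir`) are our own. It uses a hand-written fixed-step Runge-Kutta integrator so that it runs without any third-party packages; a real project would more likely use an established ODE solver.

```python
"""Minimal age-stratified SEIR model (illustrative parameter values only)."""


def seir_rhs(state, beta, sigma, gamma, contact, pop):
    """Return (dS, dE, dI, dR) for each age group.

    `state` is a list of (S, E, I, R) tuples, one tuple per age group.
    """
    n = len(state)
    derivs = []
    for a in range(n):
        s, e, i, _ = state[a]
        # Force of infection on group a: contacts with each group b,
        # weighted by the fraction of group b that is infectious.
        foi = beta * sum(contact[a][b] * state[b][2] / pop[b] for b in range(n))
        derivs.append((-foi * s, foi * s - sigma * e, sigma * e - gamma * i, gamma * i))
    return derivs


def add_scaled(state, derivs, h):
    """Return state + h * derivs, element by element."""
    return [
        tuple(x + h * dx for x, dx in zip(row, drow))
        for row, drow in zip(state, derivs)
    ]


def rk4_step(state, dt, *params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = seir_rhs(state, *params)
    k2 = seir_rhs(add_scaled(state, k1, dt / 2), *params)
    k3 = seir_rhs(add_scaled(state, k2, dt / 2), *params)
    k4 = seir_rhs(add_scaled(state, k3, dt), *params)
    new_state = []
    for row, a, b, c, d in zip(state, k1, k2, k3, k4):
        new_state.append(tuple(
            x + dt / 6 * (ka + 2 * kb + 2 * kc + kd)
            for x, ka, kb, kc, kd in zip(row, a, b, c, d)
        ))
    return new_state


def run_seir(state, days, dt, beta, sigma, gamma, contact, pop):
    """Integrate the model for `days` days and return the final state."""
    for _ in range(round(days / dt)):
        state = rk4_step(state, dt, beta, sigma, gamma, contact, pop)
    return state


# Hypothetical two-group example: children and adults.
POP = [20_000.0, 80_000.0]
# Made-up daily contact rates; CONTACT[a][b] is contacts that a person in
# group a has with people in group b.
CONTACT = [[10.0, 4.0], [1.0, 8.0]]
# Ten infectious children, everyone else susceptible.
INITIAL = [(19_990.0, 0.0, 10.0, 0.0), (80_000.0, 0.0, 0.0, 0.0)]

final = run_seir(INITIAL, days=200, dt=0.1, beta=0.03, sigma=1 / 3,
                 gamma=1 / 5, contact=CONTACT, pop=POP)
for group, (s, e, i, r) in zip(("children", "adults"), final):
    print(f"{group}: S={s:.0f} E={e:.0f} I={i:.0f} R={r:.0f}")
```

Keeping the model equations (`seir_rhs`) separate from the solver and from the parameter values also makes each piece easier to test on its own, for example by checking that the population of each age group is conserved.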

Info

GitHub allows you to create template repositories, which might be a useful way of sharing such examples. We could host these template repositories in our Community of Practice GitHub organisation.

How (and why!) to begin testing your code

We asked everyone whether they'd ever found bugs in their code, and were relieved to see that yes, all of us have made mistakes! Writing test cases is one way to check that your code is behaving in the way that you expect.

  • But it can be hard to figure out how to actually write useful tests.

  • You can make your code easier to test if you structure your code well, and consider how to test it when you start coding.

As an example, Cam mentioned that he had written a stochastic model of a hospital ward, in which there was a queue of tasks. At the end of a shift, some tasks may not have been done, and these are put back on the queue for the next shift. Cam discovered there was a bug in the way this was done, and fixed it. However, later on he reintroduced the same bug. This is precisely the situation where regression tests are useful. In brief:

  • When you discover a bug in your code, write a test that detects the bug, and do this before you fix it.

  • You now have a test that will identify this bug if you ever make the same mistake again.

  • But you need to run this test whenever you modify your code.
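The steps above can be sketched in Python as follows. This is a hypothetical illustration, not Cam's actual code: the notes do not describe what the bug was, so we have invented one (unfinished tasks being returned to the queue in reverse order), and the function name `end_shift` is our own.

```python
from collections import deque


def end_shift(queue, unfinished):
    """Put unfinished tasks back at the front of the queue, in order.

    Hypothetical bug: `queue.extendleft(unfinished)` looks right, but
    `extendleft` adds items one at a time and so reverses their order.
    Reversing the input first keeps the original order.
    """
    queue.extendleft(reversed(unfinished))
    return queue


def test_unfinished_tasks_keep_their_order():
    """Regression test: fails if the order-reversal bug is reintroduced."""
    queue = deque(["task C", "task D"])
    end_shift(queue, ["task A", "task B"])
    assert list(queue) == ["task A", "task B", "task C", "task D"]


if __name__ == "__main__":
    test_unfinished_tasks_keep_their_order()
    print("regression test passed")
```

Because the function name starts with `test_`, a test runner such as pytest would collect and run it automatically alongside any other tests.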

Continuous Integration (CI) is one way to run tests automatically, whenever you push a commit to a platform such as GitHub or GitLab. See the list of resources shared in our previous meeting for some examples of using CI.

In our community we have a number of people with familiarity and expertise in testing infectious disease models and related code. We need to share this knowledge and help others in the community to learn how to test their code.

Peer code review

We talked about how we improve our writing skills by receiving feedback on drafts from colleagues, supervisors, etc., and how a similar approach can be extremely useful for improving our coding skills.

Info

A goal for this year is to review each other's code! Note that we have developed some guidelines for peer code review.

Suggestions included:

  • Before submitting a paper to a journal, ask someone else to run your code.

  • Rob has been working on a within-host malaria model, which is written in R and uses Continuous Integration to generate R Markdown documents every time the code is updated. He is happy to share this for code review, so that:

      • Everyone can see a working example of continuous integration; and

      • He can receive feedback on the code, since he is not an experienced R programmer.

  • A number of participants expressed a willingness to participate in peer code review.

  • Angus noted that it can be difficult to identify a discrete chunk of digestible code to offer up for peer review.

We can coordinate peer code review through our Community of Practice GitHub organisation.

Sharing code but not the original data

Samik mentioned that in a recent paper, The impact of Covid-19 vaccination in Aotearoa New Zealand: A modelling study, the code is provided in a public GitHub repository, but that they do not have permission to share the original health data.

Info

We frequently encounter this issue when working with public health and clinical data sets.

What are the best practices in this case?

  • One option is to generate dummy data by, e.g., resampling or adding noise to the original data. You could then inform the reader that they should obtain similar, but not necessarily identical, results.

  • You can also use a checksum algorithm (such as SHA-2) and include the checksum for each original data file in the public repository. This would allow users who can obtain access to the original data to verify that they are using identical data.
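As a minimal sketch of the checksum option, the following Python code uses SHA-256 (a member of the SHA-2 family) from the standard library's `hashlib` module. The function names and the idea of storing the checksums in a single text file are our own assumptions, not an established convention of this community. Each line of the checksum file uses the "checksum, two spaces, file name" layout that the `sha256sum` command-line tool prints.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(data_dir, checksum_file):
    """Record one checksum line per file found directly in data_dir."""
    lines = [
        f"{sha256_of_file(path)}  {path.name}"
        for path in sorted(Path(data_dir).iterdir())
        if path.is_file()
    ]
    Path(checksum_file).write_text("\n".join(lines) + "\n")


def verify_checksums(data_dir, checksum_file):
    """Return the names of files that are missing or do not match."""
    problems = []
    for line in Path(checksum_file).read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        path = Path(data_dir) / name
        if not path.is_file() or sha256_of_file(path) != expected:
            problems.append(name)
    return problems
```

The data custodian would run `write_checksums` once and commit the resulting file to the public repository, keeping it outside the data directory so that it is not checksummed itself. Anyone who later obtains the original data can run `verify_checksums` and should get back an empty list.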