15 August 2023¶

Info

See the Resources section for links to useful resources that were mentioned in this meeting.

Changes to practices¶

In this meeting we asked everyone what changes (if any) they have made to their research and reproducibility practices since our last meeting.

A common theme was improving how we note and record our past actions. For example:

Eamon has begun recording the commit ID ("hash") of the code that was used to generate each set of outputs. This allows him to easily retrieve the exact version of the code that was used to generate any past result and, e.g., generate other outputs of interest.
Pan talked about how their group records raw separately from, but grouped with, the analysis code and processed data that were generated from these raw data. They also record every step of their model-fitting process, which may not always go as smoothly as expected.

This ensures that stakeholders who want to use these models to run their own scenarios can reproduce the baseline scenarios without being modelling experts themselves.

The model is available as an online app.

Rob has begun working on an existing malaria model, which was implemented in R as a series of scripts that shared many global variables. He wanted to restructure code to better understand it, so he used version control to record the simulation outputs and ensure that he didn't change the model's behaviour as he restructured the code. On several occasions he modified parts of the code and discovered that these changes unexpectedly affected the simulation outputs. This is a manual equivalent of using continuous integration.

How do you structure a project?¶

Gizem asked the group "How do you choose an appropriate project structure, especially if the project changes over time?"

Phrutsamon: the TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data.

Rob: there may not be a single perfect solution that addresses everyone's needs. But look back at past projects, and try to imagine how the current project might change in the future. And if you're using version control, don't be afraid to experiment with different project structures — you can always revert back to an earlier commit.

Reviewing code as part of (manuscript) peer review¶

Rob asked the group "Has anyone reviewed supporting code when reviewing a manuscript?"

Ruarai read through R code that was provided with a paper, but was unable to run all of it — some of the code produced errors.
Similarly, Rob has read R code provided with a paper that used hard-coded paths that did not exist (e.g., "C:\Users\<Author Name>\..."), tried to run code in source files that did not exist, and read data from CSV files that did not exist.

Info

Pan mentioned a fantastic exercise for research students.

Pick a modelling paper that is relevant to their research project, and ask the student to:

read it;
understand it; and
reproduce the figures.

This teaches the students that reproducibility is very important, and shows them what they need to do when they publish their own results.

It's important to pick a relatively simple paper, so that this task isn't too complicated for the student. And if the paper is written by a colleague or collaborator, you can contact them to ask for extra details, etc.

Using Shiny to make models available/reproducible¶

Pan asked the group "What do you think about (the extra work involved in) turning R code into Shiny applications, to show that the model is reproducible, and do so in a way that lets others easily make use it?"

An objective of the COVID-19 International Modelling Consortium (CoMo) is to make models available and usable for non-modellers — turning models into something that anyone with minimal knowledge can explore.

The model is available as a Shiny app, and is continually being updated and refined. It is currently at version 19! Pan's group is trying to ensure that existing users update to the most recent version, because it can be very challenging and time-consuming to create scenario templates for older model versions. Templates are a good way to help the user define their scenario-specific settings, but it's a nightmare when you change the model version — it's like working with a new model.

Eamon: this is similar to when software platforms make changes to their APIs. Can you make backwards-compatible changes, or automatically transform old templates to make them compatible with the latest model version? This kind of work is simple to fund when your software is a commercial product, but it's much harder to find funding for academic outputs.
Pan: It's a lot of extra work, without any money to support it. For this consortium we hired several programmers, some for the coding, some specifically for the Shiny app, it involved a lot of resources. That project has now ended, but we've learned a lot and have a good network of collaborators. We still have monthly meetings! This was a special case with COVID-19, because the context changed so quickly. It would be much less of a problem with other diseases, which we better understood.
Gizem: very much in favour of using Shiny to make models available, and recently made a Shiny app for their latest project (currently under review). Because the model is very complicated, we had to pre-calculate model results for specific parameter combinations, and only allow users to choose between these parameter combinations. One reviewer asked for a modified figure to show results for slightly different parameter values, and it was quite simple to address.

Hadley Wickham has written a very good book about developing R Shiny applications. Gizem read a chapter of this book each morning, but found it necessary to practice in order to really understand how to use Shiny.

Info

Learning by doing (experiential learning) is a highly-effective way of convincing people to change their practices. It can be greatly enhanced by engaging as a community.

Resources¶

Teaching reproducibility and responsible workflows¶

The Journal of Statistics and Data Science Education published a special issue: Teaching Reproducibility in November 2022. The accompanying editorial article highlights:

Integrating reproducibility into our practice and our teaching can seem intimidating initially. One way forward is to start small. Make one small change to add an element of exposing students to reproducibility in one class, then make another the next semester. Our students can get much of the benefit of reproducible and responsible workflows even if we just make a few small changes in our teaching. These efforts will help them to make more trustworthy insights from data. If it leads, by way of some virtuous cycle, to us improving our own practice, then even better! Improving our teaching through providing curricular guidance about reproducible science will take time and effort that should pay off in the long term.

This journal issue was followed by an invited paper session with the following presentations:

Collaborative writing workflows: building blocks towards reproducibility
Opinionated practices for teaching reproducibility: motivation, guided instruction, and practice
From teaching to practice: Insights from the Toronto Reproducibility Conferences
Teaching reproducibility and responsible workflow: an editor's perspective

Project templates¶

The TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data:

Documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.

Using Shiny¶

Mastering Shiny: an online book that teaches how to create web applications with R and Shiny.
CoMo Consortium App: the COVID-19 International Modelling Consortium (CoMo) has developed a web application for an age-structured, compartmental SEIRS model.

Continuous integration examples for R¶

Building reproducible analytical pipelines with R: this article shows how to use GitHub Actions to run R code when you push new commits to a GitHub repository.
GitHub Actions for the R language: this repository provides a variety of GitHub actions for R projects, such as installing specific versions of R and R packages.

Continuous integration examples for Python¶

GitHub Actions for Python: the GitHub Actions documentation includes examples of building and testing Python projects.

Other continuous integration examples¶

See the GitHub actions for Git is my lab book, available here. For example, the build action performs the following actions:

Check out the repository, using actions/checkout;
Install mdBook and other required tools, using make.
Build a HTML version of the book, using mdBook.