11 July 2094

Nefel Tellioglu: Lessons learned from pneumococcal vaccine modelling

In this meeting Nefel gave a presentation about a pneumococcal vaccine (PCV) evaluation project for government, sharing her experiences in developing a model from scratch under tight deadlines.

Attendance: 6 in person, 6 online.

Info

We welcome presentations about research projects and experiences that relate in any way to reproducibility and good computational research practices. Presentations can be short, informal, and free-form.

Please contact us if you have anything you might like to present!

Computational performance

Key message

Optimisation is a skill that takes time to learn.

This project involved constructing an agent-based model (ABM) of pneumococcal disease, incorporating various vaccination assumptions and intervention strategies. Nefel was familiar with an existing ABM framework written in Python, but found that the project requirements (a large population size and long simulation time-frames) meant that a different approach was required.

Asking for help with a new skill: optimising the model for each vaccine type and for multiple strains

Nefel ended up implementing a model from scratch, using the Polars data frame library to represent each individual as a separate row in a single population data frame. This library is designed for high performance, and the resulting model ran quickly enough for the purposes of this project.

An introduction to Polars workshop?

Nefel asked whether other people would be interested in an "Introduction to Polars" workshop, and a number of participants indicated interest.

Workflows and deadlines

Key message

Using version control makes it much easier to fix your code when it breaks.

Nefel made frequent use of a git repository (hosted on GitHub) in the early stages of the project. She found it very useful during the model prototyping phase, when adding new features frequently broke the code in some way. Having immediate access to previous versions of the code made it much easier to revert changes and fix the code.

However, she stopped using it when the project reached a series of tight deadlines.

Asking for extensions

Key message

Being able to provide advance warning of potential delays, and to explain the reasons why they might occur, is extremely helpful for everyone. This allows project leaders and stakeholders to adjust their plans and expectations.

It's generally hard to estimate feasible timelines in advance. This is especially difficult when exploring a new problem, and when a range of future extensions are anticipated.

These kinds of conversations can feel extremely uncomfortable. Several participants reflected on their own experiences, and agreed that informing their supervisors about potential problems as early as possible was the best approach.

Things can take longer than expected due to the research nature of building a new infectious disease model. Where possible, avoid promising that a model will be completed by a certain time. Instead, give stakeholders regular updates about progress and challenges, so that they can appreciate how much effort is being applied to the problem.

Gizem: stakeholders may not know what they want or need from the model. It's really helpful to clarify this early in the project, and doing so depends on having a good working relationship.

Eamon: writing your code in a modular way can help make it easier to implement those future extensions. Experience also helps in designing your code so that future extensions only modify small parts of your model. But avoid trying to make your code as abstract and extensible as possible.

Rob: if you know that the model will be applied to many different scenarios in the future, try to separate the code that defines the location of data files from the code that uses those data. That makes it easier to run your model using different sets of input data.
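One way to act on Rob's suggestion is sketched below, assuming a hypothetical layout in which each scenario has its own directory of input files. The names `ScenarioInputs`, `inputs_for`, `count_rows`, and the file names are illustrative only and do not come from the project discussed in the meeting.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ScenarioInputs:
    """Locations of the input files for one scenario (hypothetical layout)."""

    population_file: Path
    vaccine_file: Path


def inputs_for(scenario: str, data_dir: Path = Path("data")) -> ScenarioInputs:
    """The only place that knows where the data files live."""
    base = data_dir / scenario
    return ScenarioInputs(
        population_file=base / "population.csv",
        vaccine_file=base / "vaccine.csv",
    )


def count_rows(csv_file: Path) -> int:
    """Model-side code: it uses whichever file it is given."""
    with csv_file.open(newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))
```

With this split, running the model on a different scenario means calling `inputs_for("some_other_scenario")` and passing the result along; the code that reads and uses the data does not need to change.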

Key message

There are a number of high-performance data frame libraries.

Polars primarily supports Python, Rust, and JavaScript. There is also an R package, which has several extensions.

Other high-performance data frame options for R:

  • data.table: a high-performance data.frame replacement;

  • DBI: a package for working with various databases; and

  • dbplyr: a database backend for dplyr.

DuckDB is another high-performance library for working with databases and tabular data, and is available for many languages including R, Python, and Julia. It also integrates with Polars, allowing you to query Polars data frames and to save outputs as Polars data frames.

Conclusions

Key message

Once a project is completed, it's worth reflecting on what worked well, and on what you would do differently next time.

Nefel finished by reflecting on what she might do differently next time, and highlighting two key points:

  • Begin with a clearer understanding of the skills required for the project, such as modelling large populations and code optimisation.

  • Where there are potential skill gaps, involve other people in the project who can contribute relevant expertise.

Next meeting

At our next meeting — currently scheduled for Thursday 8 August — we will work on finalising our Orientation Guide checklist, collect supporting materials for each item on the checklist, and begin drafting content where no suitable supporting materials can be found.