17 April 2023¶
This is our initial meeting. The goal is to welcome people to the community and outline how we envision running these Community of Practice meetings.
Theme: Reproducible Research¶
Outline the theme and scope for this community.
This is open to all researchers who share an interest in reproducible research and/or related topics and practices; no prior knowledge is required.
For example, consider these questions:
-
Can you reproduce your current results on a new computer?
-
Can someone else reproduce your current results?
-
Can someone else reproduce your current results without your help?
-
Can you reproduce your own results from, say, 2 years ago?
-
Can someone else reproduce your own results from, say, 2 years ago?
-
Can you fix a mistake and update your own results from, say, 2 years ago?
Tip
The biggest challenge can often be remembering what you did and how you did it.
Making small changes to your practices can greatly improve reproducibility!
How will these meetings run?¶
-
Aim to hold these meetings on a (roughly) monthly basis.
-
Prior to each meeting, we will invite community members to propose a topic or discussion point to be the focus of the meeting. This may be an open question or challenge, an example of good research practices, a useful software tool, etc.
-
Schedule each meeting to best suit the availability of community members who are particularly interested in the chosen topic.
-
Each meeting should be hosted by one or more community members, with online participation available to those who cannot attend in person.
-
At the end of each meeting, we will ask attendees how useful/effective they found the meeting, so that we can better cater to the needs of the community. For example:
-
What do you think of the session?
-
What did we do well?
-
What could we do better in the next session?
-
We will summarise the key observations, findings, and outputs of each meeting in our online materials, and use them to improve and grow our training materials.
Preferred communication channels?¶
Info
To function effectively as a community, we need to support asynchronous discussions in addition to scheduled meetings.
One option is a dedicated mailing list. Other options were suggested:
-
A Slack workspace (Dave);
-
A Discord channel (TK);
-
A Teams channel (Gerry); and
-
A private GitHub repository, using the issue tracker (Alex).
Using a GitHub issue tracker might also serve as a gentle introduction to GitHub?
Supporting activities and resources?¶
Are there other activities that we could organise to help support the community?
-
We have online training materials, Git is my lab book, which should be useful for people who are not familiar with version control.
-
We also have a SPECTRUM/SPARK peer review team, where people can make their code available for peer review.
Topics for future meetings?¶
We asked each participant to suggest topics that they would like to see covered in future meetings and/or activities. A number of common themes emerged.
Version control: from theory to practice¶
A number of people mentioned not being sure how to get started, or starting with good intentions but ending up with a mess.
-
Dave: how can I transition from principle to practice?
-
Ollie: similar to David, I often start well but end up with a mess.
-
Ruarai: what have others found useful and applied in this space, and what options are out there?
-
Michael: I'm a complete novice, Git command lines are a foreign language to me! I'm looking for tips for someone who uses code a lot and is experienced at coding, but much less so with version control and the use of repositories. What are the first steps to incorporate it into my workflow?
-
Angus: I'm also relatively new to Git and have been using GitHub Desktop (a GUI for Windows and Mac). I'm not averse to command line stuff but I need to remember fewer arcane commands!
-
Samik: I use TortoiseGit — a Windows Shell Interface to Git.
-
Gray: I resonate with Michael, I do most of my research on my own and describe it in papers. It isn't particularly Git-friendly, I'm keen to learn.
-
Lauren: everything that everyone has said so far! I've found some good guidelines for how to write reproducible code, starting from the basics all the way to niche topics. Can we use this as a way to share materials that we've found useful? The British Ecological Society have published guidelines. We could assemble good materials that start from basics.
-
David: The Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE) also have good materials.
-
Gerry: I like the idea of reproducibility and I've done a terrible job of it in the past, my repository ends up with thousands of versions of different files. Can you help me solve it?
-
Josh: Along the same lines of what's been stated. How best to share knowledge of Git and best practices with others in a new research team? How to adjust to their methods of conducting reproducible research, version control, etc?
-
Punya: not much to add. I have a basic understanding of version control and would really like to know more: what's the standard way of using it, and of handling reproducibility and documentation?
-
Rachel: I strongly support this idea of code reproducibility. Best practice frameworks can be disseminated to modellers in modelling consortia, and they can be very helpful when auditing.
-
Ella: we're migrating models from Excel to R.
-
J'Belle: I work for a tiny, very remote health service at the Australian and Papua New Guinea border. We have 17 sources of clinical data, which presents massive challenges in reproducibility and quality assurance. I'm looking to tap into your expertise. How do we manage so many sources of clinical data?
Working with less technically-experienced collaborators¶
How can we make best use of existing tools and practices, while working with collaborators who have less technical expertise/experience?
-
Alex: if I start a project with collaborators who may be less technically literate, how can they contribute without messing up reproducibility? Options like Docker are a little too complicated. How can I motivate people, is there a simple solution?
-
Angus: in theory you may have reproducible code. But if you need to write a paper with less technical collaborators, running the code and generating reports can be hard. How do we collaborate on the writing side? RMarkdown and equivalents make a lot of sense, but most colleagues will only look at Word documents. There are some workarounds, such as pandoc.
Reproducibility: best practices and limitations¶
How far can/should we go in validating and demonstrating that our models and analyses are reproducible? How can we automate this? How is this impacted when we cannot share the input data, or when our models are extremely large and complex?
-
Cam: there are unique issues in the type of research we do. Working with code makes it easy in some ways, as opposed to experimental conditions in real-world experiments. Our capacity for reproducibility is great, but so then is our burden. We should be exploring the limitations! Some challenges in our area come down to implementation of stochastic models with lots of random processes. How can we do that well and make it part of what we practice? What are the limitations to reproducibility, and how do we perceive the goals when we are working with data that cannot be shared?
-
Samik: similar to Cam, I'm interested in how people have produced reproducible research where the data cannot be shared. Perhaps we can provide default examples as test cases?
-
Michael: I second Cam's points, particularly about reproducibility with confidential data. That's an issue I've hit multiple times. We usually have a side folder with the real dataset, and pipe through condensed or aggregated versions of the data that we can release.
-
Jiahao: I'm interested in how to build a platform for using agent based models. I've looked at lots of other models, but how can we bring them together so that it is easier to add a new variable or extend a model?
-
Eamon: I'm a Git fanatic, and I want to understand the development of code that I work with. I get annoyed when people share a repository as a single commit, or don't use tags in their Git repositories to identify the version of the code they used in, e.g., a paper! How do you start running the code? What file formats does it expect to process?
-
Dion: I'm interested in seeing what people are doing that looks like good practice. Making sure that code and results are reproducible, in the sense that your code may be version controlled, but you've since made changes to code, parameters, input data, etc. How do you do a good job of shoehorning all of that into Git? Maybe use Git for development and simultaneously use a separate repository for production stuff? We need to be able to look back and identify from the commit the code, data, and intermediate files used along the way.
-
Palang: I've looked at papers with supplementary code, but running the code myself produces very different results from what's in the paper.
-
May: most people have said what I wanted to say. I faced similar problems with running other people's code. It may not print any error message, but you get very different results from what's published in the paper. You don't know who or what is wrong!
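Two of the points above (Cam's stochastic models, and Dion's wish to trace results back to a commit and its inputs) can be illustrated with a minimal sketch. It assumes a toy random-walk model; the names `simulate`, `current_commit`, and `run_record`, and all parameter values, are hypothetical and chosen only for illustration. An explicit seed makes the stochastic run repeatable, and the record stores the seed, a checksum of the input data, and the Git commit (if one can be found) alongside the result.

```python
import hashlib
import random
import subprocess


def simulate(seed, n_steps=100, p_up=0.5):
    """Hypothetical toy model: a random walk driven by a seeded generator."""
    rng = random.Random(seed)  # local generator, independent of global random state
    position = 0
    path = [position]
    for _ in range(n_steps):
        # Step up with probability p_up, otherwise step down.
        position += 1 if rng.random() < p_up else -1
        path.append(position)
    return path


def current_commit():
    """Return the current Git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or we are not inside a repository.
        return None
    return result.stdout.strip()


def run_record(seed, input_bytes):
    """Collect the details needed to identify and repeat this run later."""
    return {
        "seed": seed,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "commit": current_commit(),
        "final_position": simulate(seed)[-1],
    }


# Placeholder input; in practice this would be the contents of a data file.
record = run_record(seed=2023, input_bytes=b"example input data")
```

Running `simulate` twice with the same seed returns the same path, so a saved record like this is enough to repeat the run, provided the code at that commit and the input data are still available. It does not address cases where the data cannot be shared at all.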
Testing and documentation¶
How can we develop confidence in our own code, and in other people's code?
-
TK: I want to learn/see different conventions for writing code documentation. I've never managed to get doxygen working to my satisfaction.
-
Angus: how do we design good tests? How to test, when to test, what to test for? Should we have coverage targets? Are there ways to automate testing?
-
Rahmat: I often find it very hard to learn how to use other people's code. The code needs to be easy to understand. Otherwise, I will just write the code myself! Sometimes when I run the code, I have difficulties in generating results, many errors come up and it's not clear why. Perhaps not all of the necessary data have been shared with the code? We need to include the data, but if the data cannot be provided, you need to provide similar data so that others can run the code. It also helps to use a language that others are familiar with.
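As one possible starting point for the question of what to test, here is a small sketch that checks a known value, the edge cases, and invalid input. The function `attack_rate` is hypothetical and stands in for any small piece of analysis code. The tests use plain `assert` statements so they run without a test framework; a runner such as pytest would also collect functions named `test_*` automatically.

```python
def attack_rate(cases, population):
    """Hypothetical example function: proportion of a population infected."""
    if population <= 0:
        raise ValueError("population must be positive")
    if cases < 0 or cases > population:
        raise ValueError("cases must be between 0 and population")
    return cases / population


def test_known_value():
    # A case small enough to verify by hand.
    assert attack_rate(25, 100) == 0.25


def test_edge_cases():
    # The boundaries of the valid input range.
    assert attack_rate(0, 10) == 0.0
    assert attack_rate(10, 10) == 1.0


def test_rejects_invalid_input():
    # Invalid inputs should fail loudly rather than return a wrong answer.
    for cases, population in [(-1, 10), (11, 10), (5, 0)]:
        try:
            attack_rate(cases, population)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {(cases, population)}")


if __name__ == "__main__":
    test_known_value()
    test_edge_cases()
    test_rejects_invalid_input()
    print("all tests passed")
```

Tests like these are cheap to write and can be run automatically on every commit, which goes some way towards answering "when to test".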
Code reuse¶
-
Pan: I am not sure about the term reproducibility in the context of coders. I know lab people really do reuse published protocols. But do coders actually reuse other people's code to do their work?
-
Gerry: People often make their code into packages which others reuse. This could be a good topic for future meetings.
Using ChatGPT to write/check code¶
-
Pan: I recently joined a meeting where people have used ChatGPT to check their code. Does this group have any thoughts on how we might make good use of ChatGPT?
-
Cam: ChatGPT is not reproducible itself, so it seems questionable to use it to check reproducibility.
-
Alex: I don't entirely agree, it can be very useful for improving the implementation of a function. In terms of generating reliable code, it's wonderful. It's a nightmare for evaluating existing code.
-
Pan: people are using ChatGPT to generate initial templates.
-
Eamon: If you encounter code that has poor documentation, ChatGPT is surprisingly good at telling you how to use it.
-
Matt: I don't have anything to add to the above, I'm happy to be along for the ride.