3  Programming

The main goal of adopting good programming practices is to make your code readable, maintainable and reproducible. Additionally, good programming practices are crucial in collaborative projects to work efficiently and seamlessly with others.

Tip: The mindset of good practice programming

While writing code, imagine how someone else (or future you) will see the project for the first time. Will they be able to understand and use it? The goal of writing reproducible code is to ensure that the answer to this question is “yes.” You can practically test this by sending your project to a colleague and asking them to try to understand and run the code.

There are many general and language-specific guidelines and tips to write readable, maintainable and reproducible code. We list the most essential ones below.

3.1 Structured workflow and readable code

3.1.1 Have a consistent programming style

Consistency improves readability and maintainability. It helps others to quickly understand your project’s logic and workflows.

Follow a style guide

Most programming languages have one or more widely used style guides covering things like indentation, spacing and naming conventions for variables and functions. Check the style guide for the languages you use to get an overview.

There are tools to help you enforce a consistent style across a codebase:

  • Formatters: Can auto-format your script to eliminate inconsistencies.
  • Linters: Can analyse your code for errors, defects and stylistic issues and list areas for improvement.
Tip: Using formatters and linters

The details of this depend on your combination of IDE and programming language. Just search for the right formatters and linters for your setup.

R and RStudio

  • Auto-format: Open the command palette (Ctrl/Cmd + Shift + P) and search for “format”. Use “Reformat Current Document” to auto-format your code. Toggle “Reformat documents on save” for convenience. You can also choose your own “Code formatter”, but the default option “Styler”, which applies the tidyverse style guide, is already a very good option.
  • Lint: Install the lintr package and search for “lint” options in the command palette. Use “Lint current file” or “Lint current package” to list style problems.
  • Air: A new tool called Air is currently in development [1] as a code formatter and language server for R.

Python and VS Code

  • Auto-format: Open the command palette (Ctrl/Cmd + Shift + P) and search for “Format Document”. In the VS Code settings, you can also enable “Format on Save” and select a default formatter.
  • Lint: Install a linter like pylint or flake8 and configure it in VS Code settings.

Julia and VS Code

  • Auto-format: Open the command palette (Ctrl/Cmd + Shift + P) and search for “Format Document”. In the VS Code settings, you can also enable “Format on Save” and select a default formatter. JuliaFormatter is packaged within the Julia extension and has “sane” defaults, but you can also specify your own style configuration.
  • Lint: The Julia extension statically lints your code by default; you can modify this behaviour in your workspace settings.

Have a naming convention

Just like when naming your files (see Chapter 2), use clear and descriptive names for files, variables, functions and modules. The goal of a good naming convention is that it is immediately clear to the reader what is behind any file or object. For objects in your codebase, you can follow these tips:

  • Concise and descriptive: Variable names are usually nouns and function names are verbs (see also Section 3.2.1).
  • Avoid conflicts: Don’t use names of existing variables or functions unless you are intentionally extending or overriding them (e.g. when developing an R package or a new method in Julia).
  • Use consistent capitalisation rules: Each language community has a preferred naming style (e.g., snake_case for Python and R, lowerCamelCase for JavaScript).
  • Develop rules and document them: You can develop your own conventions where it is useful. This can include things like when and how to use abbreviations or rules on how you name your functions (e.g. you might want to prefix all your helper functions with zzz_ to mark them as helper functions).
Tip: Examples of good and bad object names

Bad names do not reveal what is behind the variable/function and could be misleading:

temp, data1, data_function, my_function

Good names are human-readable and tell the user what is behind the objects:

temperature_readings, user_data, read_data, run_binomial_model

It is good practice to establish and follow a naming convention throughout a project. It also helps to document your conventions and naming logic in the README file of your project. This way, it is easy for collaborators to read and understand your code but also to contribute using the same style.

3.1.2 Comment your code

An easy win for making code more readable and reproducible is the liberal and effective use of comments. Comments provide context for human readers and are ignored by the computer when the code is executed. A good principle is to comment on the ‘why’ rather than the ‘what’: the code itself tells the reader what is being done, so it is far more important to document the reasoning behind a particular section of code.

You can use inline comments for short explanations or block comments that span multiple lines to summarise sections of code or provide detailed explanations. Although languages differ in how they denote comments, Julia, Python and R all use a # to mark the rest of a line as a comment rather than code.

Don’t forget to update your comments as your code evolves to avoid outdated or misleading information.
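As a minimal sketch of the ‘why’ versus ‘what’ principle, compare the two comments in the Python function below. The function name, the error code and the scenario are hypothetical and only serve as an illustration.

```python
# Hypothetical scenario: temperature readings from a field logger.
# Assumption: the logger writes -999.0 whenever the sensor fails.
SENSOR_ERROR_CODE = -999.0


def clean_readings(readings):
    # 'What' comment (adds little): remove values equal to -999.
    # 'Why' comment (useful): -999 marks a sensor failure, so these values
    # are not real temperatures and would bias any summary statistic.
    return [value for value in readings if value != SENSOR_ERROR_CODE]


print(clean_readings([12.5, -999.0, 13.1]))  # prints [12.5, 13.1]
```

The second comment stays useful even if the implementation changes, because it records the reasoning rather than restating the code.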

3.1.3 Structure your scripts

Structure your scripts in a consistent and logical way so that readers can orient themselves easily in your codebase.

Here are some things to consider:

  • Split long scripts: Make scripts do just one thing. If needed, you can import multiple scripts that you need for your analysis (e.g. using source in R, import in Python, include in Julia).
  • Use a standardised header: Include essential information like the purpose, authors, contact, license, etc. of the script.
  • Initialise at the top: Load all libraries, define global variables and paths, and read all data in one block at the top instead of throughout the script.
  • Use section headers: Guide readers through your scripts with section headers. In many IDEs you can navigate these sections using a script outline or collapse different sections.
  • Create a script template: Create new scripts from a pre-structured template in which you fill out the relevant information. In RStudio, you can use code snippets to insert template code and script sections into your scripts [2].
Tip: Example structure of an R script for data analysis
# Purpose: Analyze climate data
# Authors: Jane Doe, John Doe
# License: GPL-3.0-or-later
# Contact: jane.doe@email.com
# Date: 2025-12-16

# Load libraries -------------------------------------------------
library(tidyverse)

# Define global variables ----------------------------------------
temp_data_path <- "data/temperature_readings.csv"
rain_data_path <- "data/rainfall_readings.csv"

# Load data ------------------------------------------------------
temperature_data <- read_csv(temp_data_path)
rainfall_data <- read_csv(rain_data_path)

# Data processing ------------------------------------------------

# Analysis -------------------------------------------------------

# Output results -------------------------------------------------

3.2 Modular and functional code

One of the core principles in software development is DRY (Don’t Repeat Yourself): reduce repetitive patterns and duplicates in your code in favour of modular, reusable code. Although it may seem simpler to copy and paste the same code when performing repetitive tasks, every change (or bug fix) then has to be made in every place the code was copied to. Functions are a simple way to avoid this: they let you break your code into modules and repeat the same task in a standardised (and documented) manner. Because each function is designed to execute one specific task, it is also easier to check that a data analysis is correct. See Section 3.3 for more information.
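The difference can be sketched in a few lines of Python. The variable names and temperature values are made up for illustration; the point is only that the conversion formula appears three times in the first version and once in the second.

```python
# Repetitive version: the same formula is copied for every site, so a
# correction to the formula would have to be made in three places.
site_a_celsius = (68.0 - 32) * 5 / 9
site_b_celsius = (75.2 - 32) * 5 / 9
site_c_celsius = (59.0 - 32) * 5 / 9


# DRY version: the formula lives in one documented function.
def fahrenheit_to_celsius(temp_fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_fahrenheit - 32) * 5 / 9


# Hypothetical input data, keyed by site name.
site_temps_fahrenheit = {"site_a": 68.0, "site_b": 75.2, "site_c": 59.0}
site_temps_celsius = {
    site: fahrenheit_to_celsius(temp)
    for site, temp in site_temps_fahrenheit.items()
}
```

If the formula ever needs to change, only the body of `fahrenheit_to_celsius` has to be edited.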

3.2.1 Writing your first function

Documentation

When documenting your code, never assume that the reader knows the basics of what is going on; strive to explain things to a layperson. Document how a function works, what it does and how to use it. It is useful to think about creating two ‘levels’ of documentation. The first level is documentation that allows developers/collaborators to understand what the code does (this you can do with comments inside the function; see Section 3.1.2). The second level is documentation for those who will use the function and need to know how to use it. Python and Julia allow you to add ‘docstrings’ directly to a function but external tools such as doxygen or roxygen2 can create more complex documentation. Typically function descriptions are ‘exported’ with a function, act as metadata and are searchable; comments however are not and have to be viewed by looking at the original source code. Additionally, it may be useful to provide higher-level documentation as to how the functions integrate and work together (e.g. using a README file).
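The two levels of documentation might look as follows in Python. This is a sketch only: the function `count_species` is hypothetical, and the docstring layout is one common convention rather than a requirement.

```python
def count_species(observations):
    """Count how many times each species occurs in a list of observations.

    Parameters
    ----------
    observations : list of str
        One species name per observation.

    Returns
    -------
    dict
        Species names mapped to their number of occurrences.
    """
    counts = {}
    for species in observations:
        # Developer-level comment: dict.get avoids a KeyError the first
        # time a species is seen.
        counts[species] = counts.get(species, 0) + 1
    return counts


# The docstring travels with the function, so users can look it up
# without opening the source file.
print(count_species.__doc__)
```

The docstring addresses users of the function, while the inline comment addresses whoever maintains the function body.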

Keeping things modular

A function should perform a single action, not rely on objects from outside the function and not change objects outside the function. So, if a function does too many jobs, split it. This is useful to keep in mind in instances when you have similar analyses that share 90% of the same code. Here, it makes sense to write a function that does the 90% and keep the 10% difference external to the function.
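A small Python sketch of this idea, using hypothetical names: the first function depends on a global variable and changes its input, while the second receives everything it needs as arguments and returns a new object.

```python
scaling_factor = 10  # global state that the first function silently relies on


def scale_in_place(values):
    # Harder to reuse and test: reads a global and modifies its input.
    for index in range(len(values)):
        values[index] = values[index] * scaling_factor


def scale(values, factor):
    # Self-contained: inputs are explicit and the original list is untouched.
    return [value * factor for value in values]


measurements = [1, 2, 3]
scaled = scale(measurements, factor=10)
print(scaled)        # prints [10, 20, 30]
print(measurements)  # prints [1, 2, 3]
```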

Have a consistent naming convention

Consistency is key to making your work easier for others to understand and follow. When introducing functions into your workflow, make sure that you are consistent with how you name and describe them. Because functions perform an action on an object, having a combination of verb + object in the name makes sense. Having a consistent design pattern (terms and word order) makes your code easier to understand and serves as a template for further development.

Tip: Example

Let’s say I have two functions; one initialises an object as a number and the other as a character. Bad practice would be to name them as follows:

  • int_char()
  • numInit()

Instead, opt for consistency in representing the ‘initialise’ action and the verb-object order:

  • init_num()
  • init_char()

Note that there is no right and wrong in terms of the words, order, or case used; just make sure that it is consistent and that it is clearly documented.

3.3 Defensive programming

Defensive programming is all about anticipating errors and writing robust code. It rests on the idea that we should expect our code to fail, and it aims to ensure that when it does, it fails with well-defined errors. By applying a defensive programming philosophy (and adding checks and tests to the code), you can find unexpected behaviour sooner. Although this initially means more work, it will make debugging the code a lot easier later.

3.3.1 Checking the behaviour of a function

One aspect of a function you can check is that it is behaving as expected. This can include ensuring the data types are correct (e.g., is the input a number and not a character?), or testing the boundaries of the data (e.g., asserting that two dataframes are the same dimensions or contain the same variables). Since a function only does what is specified, it is important to specify what it should not do. It is also useful to include a defined error message should those checks fail, as it makes it easier to try and correct the error.

Tip: Writing checks and good error messages

Knowing that a check fails is already a good starting point, but an error message that explains how the check is failing is even better. Additional information in the message often points to the cause of the unexpected behaviour.

For example, let’s say a function is designed to count items by summing a vector. The input data should be a vector of integers because it is a count; however, it is also possible to sum a vector of floats. This means that if you were to input a vector of floats, the function would still be able to sum the numbers; however, this is not the specified job of the function. Adding a check that asserts that the input vector contains only integers is a way to prevent unintended misuse of the function.

The logical check would be to test whether the type of input_vector is integer (in R, for example, typeof(input_vector) == "integer") and to throw an error if this condition is not met. Although an error message of “input data is not of type integer” is already informative, it is useful to add some additional information, such as the actual type of the input data, e.g. “input data is not of type integer but rather type float.”
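The check described above could be sketched in Python as follows. The function `count_items` and the exact wording of the message are hypothetical; the sketch only illustrates reporting the offending value and its type.

```python
def count_items(item_counts):
    """Sum a list of integer counts (hypothetical example function)."""
    for value in item_counts:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(
                "input data is not of type integer: "
                f"found {value!r} of type {type(value).__name__}"
            )
    return sum(item_counts)


print(count_items([2, 3, 5]))  # prints 10

try:
    count_items([2, 3.5, 5])
except TypeError as error:
    # prints: input data is not of type integer: found 3.5 of type float
    print(error)
```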

3.3.2 Testing the output of a function

By testing your code, you can catch edge cases and ensure that functions are working as intended and expected, even when users are using functions in unexpected ways. While the idea of developing tests may feel excessive when starting out with programming, it is valuable to be aware of these principles as they provide a conceptual basis from which you can develop code that meets the expectations associated with conducting ‘good’ science.

Unit tests

Unit testing focuses on testing individual functions of your codebase to ensure that a function is doing what it should be doing and meets the specified requirements in a formalised and automated manner. At a high level, the aim of unit tests is to make sure that the underlying maths/logic of your function is correct. This can be done by inputting a value into the function for which you know what the output is and testing if the output that the function gives you is the same.

Tip: Writing and running unit tests

Most programming languages have packages that will help with executing a test run. Usually, this involves creating a separate directory where you can write your tests as well as where the testing workflows are hosted. Tests are typically run in a new language process, where the package itself and any test-specific dependencies are made available.

R

The testthat package [3] is a commonly used testing framework that reports a pass, fail, or error for each of your tests. It integrates easily into your existing workflow, allowing for informal testing or the building of more ‘complex’ test suites.

Python

It is possible to write basic tests using assert to test if a statement evaluates to true. For writing more complex tests the unittest module provides the flexibility to write more nuanced tests (assertions).

Julia

Julia’s Test standard library allows you to write basic tests using the @test macro, which checks that an expression evaluates to true, and to group tests with @testset. The package manager (Pkg.jl) runs the test suite of a project or package when you call Pkg.test().
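As a minimal sketch of a unit test with Python’s unittest module, the example below checks a small conversion function against two inputs whose outputs are known in advance. The function and test names are hypothetical.

```python
import unittest


def fahrenheit_to_celsius(temp_fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_fahrenheit - 32) * 5 / 9


class TestFahrenheitToCelsius(unittest.TestCase):
    def test_freezing_point(self):
        # Known value: water freezes at 32 °F, which is 0 °C.
        self.assertAlmostEqual(fahrenheit_to_celsius(32), 0.0)

    def test_boiling_point(self):
        # Known value: water boils at 212 °F, which is 100 °C.
        self.assertAlmostEqual(fahrenheit_to_celsius(212), 100.0)


# Build and run the suite explicitly so the sketch also works inside a
# notebook or interactive session.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFahrenheitToCelsius)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

In a real project the test class would usually live in a separate file in a tests directory and be run from the command line, for example with `python -m unittest`.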

Integration tests

Integration tests are more about ensuring that the parts fit into the whole. In a data analysis, for example, you want to make sure that the output from your data cleaning function can seamlessly act as the input for your data analysis function. Alternatively, you might want to run integration tests when you are introducing new features (functions) to your project and need to ensure that these do not break or alter the behaviour of your existing workflow.
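A very small integration check might look like this in Python. Both functions are hypothetical stand-ins for a cleaning step and an analysis step; the test passes the output of one directly into the other.

```python
def clean_data(raw_readings):
    """Drop missing values (None) and convert the rest to float."""
    return [float(value) for value in raw_readings if value is not None]


def analyse_data(clean_readings):
    """Return the mean of the cleaned readings."""
    if not clean_readings:
        raise ValueError("no readings left after cleaning")
    return sum(clean_readings) / len(clean_readings)


# Integration check: the cleaning output is used as the analysis input.
raw_readings = [10, None, "12.5", 14]
mean_reading = analyse_data(clean_data(raw_readings))
assert round(mean_reading, 2) == 12.17
```

Each function could pass its own unit tests and the pipeline could still fail, for example if the cleaning step returned strings that the analysis step cannot sum; that is the kind of mismatch an integration test is meant to catch.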

Test-driven development

Test-driven development (TDD) is an approach to software development whereby tests are written before the code to identify the desired behaviour of the system. You write a small test that defines the desired functionality, write the minimum code necessary to pass that test and then refactor the code to improve structure and performance. This improves the reliability of your code because the expected inputs and outputs are defined before you even start writing the code itself.

3.3.3 Debugging and logging

Debugging is the process of finding and resolving errors in code. Code that has well-thought-out checks and error messages should be easy to debug, as problems are already identified and isolated. Creating logs is a comprehensive way to document the behaviour of the entire workflow. Logs are usually created by an automated workflow that runs through and records events or messages (that you have specified) as it goes. This allows you to diagnose and troubleshoot issues. Log messages can also give information on the state of the workflow. Unless you are developing extremely complex workflows or packages, it might make sense to only log errors to aid in the debugging process.
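Error-only logging can be sketched with Python’s logging module as shown below. The logger name, the message format and the function `safe_ratio` are hypothetical, and an in-memory stream stands in for a log file so that the sketch is self-contained.

```python
import io
import logging

# In-memory stream used in place of a log file for this sketch.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("analysis")  # hypothetical logger name
logger.setLevel(logging.ERROR)          # record errors only
logger.addHandler(handler)
logger.propagate = False                # keep messages out of the root logger


def safe_ratio(numerator, denominator):
    """Divide two numbers and log an error instead of crashing."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        logger.error("division by zero for numerator=%s", numerator)
        return None


logger.info("starting analysis")  # below the ERROR level, so not recorded
safe_ratio(1, 0)
print(log_stream.getvalue())  # prints "ERROR: division by zero for numerator=1"
```

To write to an actual file, the stream handler could be swapped for `logging.FileHandler` with a path inside the project.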

Tip: Debugging with an IDE

Generally speaking, the IDE you choose (e.g. RStudio [4] or VS Code [5]) will have some form of ‘debug mode’ that allows you to run the code up to a specified breakpoint (the point where you suspect the problem arises) and then inspect or step through the code line by line from that point.

3.4 Reproducible code

To ensure your code is reproducible, you should document the exact versions of all packages, libraries, software and potentially your operating system and hardware, alongside the code and data. Below are some basic tips to ensure others can run your code and obtain the same results.

Tip: Test reproducibility

If you are unsure whether your project is reproducible, send it to a colleague or test it on a different machine.

3.4.1 Write portable code

To improve the portability of your code, avoid absolute paths and use relative paths instead, ensuring that the script is run from the project root folder.

# Absolute path: Exists only on your machine
absolute_path <- "C:/Users/my_name/project_folder/data/species_dat.csv"

# Relative path: Exists within the project
relative_path <- "data/species_dat.csv"

The problem with absolute paths is immediately visible: they exist on only one machine, while relative paths work within the project no matter how the machine’s folders are organised.

Tip: Avoid setwd() in R

Use RStudio projects to automatically set the working directory to the project directory. Use the here package [6] to construct paths relative to the project root:

# A path built from the project root with the here package
project_path <- here::here("data", "species_dat.csv")

So, if your project’s root folder is called “fish-jaws” and it lives in your Documents folder, the output would be a full path ending in Documents/fish-jaws/data/species_dat.csv; the leading part depends on your operating system.
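The same idea carries over to other languages. As a sketch in Python, the standard pathlib module can build paths from a project root; the helper `build_data_path` is hypothetical, and the example assumes the script is run from the project root folder.

```python
from pathlib import Path


def build_data_path(project_root, *parts):
    """Join path parts onto the project root (hypothetical helper)."""
    return Path(project_root).joinpath(*parts)


# Assumption: the current working directory is the project root.
project_root = Path.cwd()
species_path = build_data_path(project_root, "data", "species_dat.csv")

# The part relative to the project root is the same on every machine.
print(species_path.relative_to(project_root))
```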

3.4.2 Dependency management

Documenting and managing dependencies is essential for reproducibility because software changes over time. If, for example, you wrote your code with a recent version of an R package and gave it to someone who has not upgraded recently, they may not be able to run your code, or they might get different results.

Dependency management can be done in a lot of ways. Below, you will find three levels of complexity to document the dependencies for your projects.

Show packages that you used

The simplest approach is to document all your dependencies in a file that you add to your project.

Tip: Find dependencies of your project
  • R: use devtools::session_info() to get a nicely printed table of all dependencies. Add this information to your project (e.g. in a README file).
  • Python: you can use pip freeze to list all installed packages and their versions. Save this information to a requirements.txt file.
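  • Julia: you can use Pkg.status() to list the packages in the active environment and their versions. The package manager records direct dependencies in a Project.toml file and the exact versions in a Manifest.toml file; share both with your project.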

Use a project local library

Create a local library with the packages used in the project. This way, users don’t have to rely on their globally installed software, which might have a different version; they can use the local project library instead.

Tip: Create project local libraries

In R, you can use the renv package [7] to manage dependencies. Initialise renv in your project:

install.packages("renv")
renv::init()

In Python, you can use pip and venv to manage dependencies. Specifically, venv (a standard module shipped with Python 3 [8]) supports lightweight ‘virtual environments’, each of which hosts its own set of independent packages. Here are the commands for creating a virtual environment and installing packages within that environment with pip (the packages are listed in a file named requirements.txt):

python -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`
pip install -r requirements.txt

In Julia, you can use the built-in package manager. Activate the project environment in the current folder and install the dependencies recorded in its Project.toml file:

using Pkg
Pkg.activate(".")
Pkg.instantiate()

Use a container

A more advanced approach is to use containers, such as Docker or Podman, to encapsulate your entire environment. This ensures that your code runs in the exact same environment, regardless of the host system. Containers take more steps to set up but are especially useful for reproducing results when the analyses behind them require software packages that can be difficult to install.

For R, the Rocker project provides container images with popular R software and optimised libraries pre-installed.

3.4.3 Namespace conflicts

Using multiple packages can result in namespace conflicts, where different packages have functions with the same name. This can lead to unexpected behaviour in your code. It is good practice to prefix functions with the package name to avoid this and make the dependency explicit.

Tip: Example of avoiding namespace conflicts

In R, both dplyr and plyr have a function called summarise. You can make R use the dplyr version either with the dplyr:: prefix or with the use function (available from R 4.5.0):

dplyr::summarise(data, mean_value = mean(value))

or

use("dplyr", c("summarise"))

In Python, pandas and numpy both provide a mean. Importing each module under its own name keeps the calls unambiguous:

import pandas as pd
import numpy as np

mean_value = pd.DataFrame.mean(data)
mean_array = np.mean(array)

In Julia, if both DataFrames and Statistics have a function called mean, you can qualify the call with the module name:

using DataFrames
using Statistics

mean_value = Statistics.mean(data)

3.4.4 Other considerations

Set a seed: A random seed is a number used to initialise a random number generator. Explicitly setting a random seed ensures reproducibility because the random number generation will start at the same point every time. This is important whenever your code involves random number generation, which is common in statistical modelling. In R you can use set.seed(123) or the withr package, in Python you can use random.seed(123) and in Julia you can use Random.seed!(123).
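A short Python sketch shows the effect: resetting the seed to the same value reproduces the same sequence of random numbers. The seed value 123 is arbitrary.

```python
import random

random.seed(123)
first_draw = [random.random() for _ in range(3)]

random.seed(123)  # resetting the seed restarts the sequence
second_draw = [random.random() for _ in range(3)]

print(first_draw == second_draw)  # prints True
```

Note that random.seed only affects Python’s built-in random module; other libraries that generate random numbers have their own seeding mechanisms.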


  1. https://posit-dev.github.io/air/ accessed 15th August 2025

  2. https://support.posit.co/hc/en-us/articles/204463668-Code-Snippets-in-the-RStudio-IDE accessed 15th August 2025

  3. https://testthat.r-lib.org/ accessed 15th August 2025

  4. https://docs.posit.co/ide/user/ide/guide/code/debugging.html accessed 15th August 2025

  5. https://code.visualstudio.com/docs/debugtest/debugging accessed 15th August 2025

  6. https://here.r-lib.org/ accessed 15th August 2025

  7. https://rstudio.github.io/renv/articles/renv.html accessed 15th August 2025

  8. https://docs.python.org/3/library/venv.html accessed 15th August 2025