6 Writing Functions

Functions are fundamental building blocks in R programming. They allow you to encapsulate code logic into reusable, organised units. Writing well-structured functions makes code easier to read, maintain, and scale, and it reduces redundancy. In this chapter, we’ll explore best practices for writing functions in R, including function structure, parameter handling, return values, and documentation.

6.1 The Purpose of Functions

6.1.1 Why Write Functions?

Functions serve several key purposes in programming:

  • Modularity: Functions break your code into smaller, manageable pieces, making it easier to understand and maintain.
  • Reusability: Functions can be reused across different scripts and projects, reducing code duplication and errors.
  • Abstraction: Functions abstract away complex logic, allowing you to focus on higher-level processes.
  • Testing: Functions make it easier to isolate and test specific parts of your code.

6.1.2 When to Write a Function

Consider writing a function when:

  • Code is Repeated: If you find yourself copying and pasting the same code in multiple places, it’s a good candidate for a function.
  • Complexity Increases: As the complexity of your code grows, encapsulating logic in functions helps manage that complexity.
  • Task Automation: When you need to perform a task repeatedly with different inputs, functions make this process more efficient.
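
As a minimal sketch of the first point, suppose the same rescaling expression has been pasted once per column. The helper name rescale_01 and the sample data frame are hypothetical and serve only as an illustration.

```r
# Hypothetical data used only for this illustration
df <- data.frame(height = c(150, 160, 170), weight = c(50, 65, 80))

# Before: the same expression is copied for each column
# df$height <- (df$height - min(df$height)) / (max(df$height) - min(df$height))
# df$weight <- (df$weight - min(df$weight)) / (max(df$weight) - min(df$weight))

# After: the logic lives in one place
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  return((x - rng[1]) / (rng[2] - rng[1]))
}

df$height <- rescale_01(df$height)
df$weight <- rescale_01(df$weight)
```

A fix to the rescaling logic now has to be made only once, inside rescale_01.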

6.2 Structure of a Function

6.2.1 Basic Function Structure

An R function typically consists of four parts:

  • Function Name: Descriptive name that indicates the purpose of the function.

  • Arguments: Input parameters that the function will use.

  • Body: The code that performs the function’s operations.

  • Return Value: The output of the function, often explicitly stated using return().

  • Example:

    calculate_mean <- function(x) {
      # Remove NA values and calculate the mean
      mean_value <- mean(x, na.rm = TRUE)
      return(mean_value)
    }
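
The short sketch below calls the example function and inspects its parts with the base R functions formals() and body(). The input values are made up for illustration.

```r
calculate_mean <- function(x) {
  # Remove NA values and calculate the mean
  mean_value <- mean(x, na.rm = TRUE)
  return(mean_value)
}

# Calling the function: the NA is dropped before averaging
calculate_mean(c(2, 4, NA, 6))  # 4

# The parts of a function can be inspected directly
names(formals(calculate_mean))  # "x"
body(calculate_mean)            # the code between the braces
```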

6.2.2 Naming Conventions

  • Descriptive Names: Use descriptive names for functions that clearly indicate what the function does. For example, calculate_mean is clearer than an abbreviation such as calcMean or cm.
  • Verb-Noun Format: A common convention is to use a verb-noun format, such as plot_data, get_summary, or calculate_average.
  • Consistency: Be consistent with naming conventions throughout your codebase.

6.2.3 Function Arguments

  • Use Default Values: Provide default values for arguments when appropriate. This makes the function more flexible and easier to use.

    calculate_mean <- function(x, na.rm = TRUE) {
      mean_value <- mean(x, na.rm = na.rm)
      return(mean_value)
    }
  • Argument Order: Place arguments in a logical order, usually starting with the most essential arguments and ending with optional ones.

  • Dot Arguments (...): Use the ... argument to allow for additional arguments to be passed to other functions within your function.

    plot_with_options <- function(x, y, ...) {
      plot(x, y, ...)
    }
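
The three points above can be combined in one function. The following is a sketch only: rounded_mean and its digits argument are hypothetical names, and the ... argument is simply forwarded to mean().

```r
# Hypothetical helper: `x` is essential and comes first, `digits` is optional,
# and `...` is forwarded unchanged to mean()
rounded_mean <- function(x, digits = 2, ...) {
  mean_value <- mean(x, ...)
  return(round(mean_value, digits))
}

values <- c(1.234, 2.345, NA, 4.567)

rounded_mean(values, na.rm = TRUE)              # uses the default digits = 2
rounded_mean(values, digits = 1, na.rm = TRUE)  # overrides the default
rounded_mean(values)                            # NA, because na.rm was not passed on
```

Because na.rm travels through ..., the caller controls how mean() treats missing values without rounded_mean having to declare that argument itself.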

6.2.4 Function Body

  • Keep It Simple: The body of a function should be as simple and concise as possible. If a function becomes too long or complex, consider breaking it into smaller functions.

  • Use Temporary Variables: Use temporary variables within the function to store intermediate results. This makes the code more readable and easier to debug.

    calculate_summary <- function(x) {
      mean_value <- mean(x, na.rm = TRUE)
      sd_value <- sd(x, na.rm = TRUE)
      return(list(mean = mean_value, sd = sd_value))
    }
  • Avoid Side Effects: Functions should avoid altering global variables or states outside of their scope. This makes them more predictable and easier to debug.
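
One way to keep a function body short is to move each step into a small helper. The sketch below assumes hypothetical helper names (remove_missing, standard_error, summarise_sample). None of them reads or modifies anything outside its own arguments.

```r
# Hypothetical helpers, each doing one small step
remove_missing <- function(x) {
  return(x[!is.na(x)])
}

standard_error <- function(x) {
  return(sd(x) / sqrt(length(x)))
}

# The main function reads as a sequence of named steps
summarise_sample <- function(x) {
  clean_x <- remove_missing(x)
  return(list(
    n = length(clean_x),
    mean = mean(clean_x),
    se = standard_error(clean_x)
  ))
}

result <- summarise_sample(c(2, 4, NA, 6))
```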

6.2.5 Return Values

  • Explicit Returns: Use the return() function to explicitly specify the output of your function. Without it, R returns the value of the last evaluated expression, so an explicit return() makes the intended output unambiguous.

    calculate_sum <- function(a, b) {
      result <- a + b
      return(result)
    }
  • Return Multiple Values: To return multiple values, use a list. This allows you to package several outputs into a single return object.

    calculate_stats <- function(x) {
      mean_value <- mean(x, na.rm = TRUE)
      sd_value <- sd(x, na.rm = TRUE)
      return(list(mean = mean_value, sd = sd_value))
    }
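
Once several values are packaged in a named list, the caller can pull out individual elements by name. A brief usage sketch, with input values chosen only for illustration:

```r
calculate_stats <- function(x) {
  mean_value <- mean(x, na.rm = TRUE)
  sd_value <- sd(x, na.rm = TRUE)
  return(list(mean = mean_value, sd = sd_value))
}

stats <- calculate_stats(c(2, 4, 4, 4, 5, 5, 7, 9))

stats$mean     # access an element by name with $
stats[["sd"]]  # or with [[ ]]
names(stats)   # "mean" "sd"
```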

6.3 Documentation and Commenting

6.3.1 Documenting Functions with Roxygen2

Roxygen2 is a popular tool for documenting R functions. It allows you to write documentation comments directly above your function, which can then be converted into formal documentation.

  • Basic Roxygen2 Template:

    #' Calculate the mean of a numeric vector
    #'
    #' This function calculates the mean of a numeric vector, excluding NA values by default.
    #'
    #' @param x A numeric vector.
    #' @param na.rm A logical value indicating whether NA values should be removed.
    #' @return The mean of the vector.
    #' @export
    calculate_mean <- function(x, na.rm = TRUE) {
      mean_value <- mean(x, na.rm = na.rm)
      return(mean_value)
    }
  • Key Roxygen2 Tags:

    • @param: Describes each function parameter.
    • @return: Describes the return value.
    • @export: Indicates that the function should be made available to users of your package.
    • @examples: Provides examples of how to use the function.
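
The template above does not use the @examples tag. A possible extension is sketched below. The example calls are illustrative, and in a package the comments would typically be turned into help files with roxygen2::roxygenise() or devtools::document().

```r
#' Calculate the mean of a numeric vector
#'
#' @param x A numeric vector.
#' @param na.rm A logical value indicating whether NA values should be removed.
#' @return The mean of the vector.
#' @examples
#' calculate_mean(c(1, 2, 3))
#' calculate_mean(c(1, NA, 3), na.rm = FALSE)
#' @export
calculate_mean <- function(x, na.rm = TRUE) {
  mean_value <- mean(x, na.rm = na.rm)
  return(mean_value)
}
```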

6.3.2 Commenting Inside Functions

  • Explain Complex Logic: Use comments inside your function to explain complex logic or non-obvious decisions. This helps others (and your future self) understand the code.

    calculate_variance <- function(x) {
      # Drop NA values first so that n counts only the values actually used
      x <- x[!is.na(x)]
      n <- length(x)
      mean_x <- mean(x)
      # Subtract mean and square the result
      squared_diffs <- (x - mean_x)^2
      # Divide by n - 1 for the sample variance
      variance <- sum(squared_diffs) / (n - 1)
      return(variance)
    }
  • Comment Sections, Not Every Line: Avoid over-commenting by focusing on sections of code rather than individual lines, unless a line is particularly complex or non-intuitive.

6.4 Testing Functions

6.4.1 Why Test Functions?

Testing functions is crucial to ensure they behave as expected. Writing tests helps catch errors early, simplifies debugging, and provides confidence that changes in the code don’t introduce new bugs.

6.4.2 Writing Simple Tests

  • Test Different Scenarios: Write tests that cover a range of input scenarios, including typical cases, edge cases, and error cases.

    # Test for a standard numeric vector
    stopifnot(calculate_mean(c(1, 2, 3, 4, 5)) == 3)
    
    # Test for a vector with NA values
    stopifnot(calculate_mean(c(1, 2, NA, 4, 5)) == 3)
    
    # Test for an empty numeric vector (the mean of numeric(0) is NaN, and is.na(NaN) is TRUE)
    stopifnot(is.na(calculate_mean(numeric(0))))
  • Use stopifnot(): The stopifnot() function is a simple way to assert that a condition is true. If the condition is false, it will stop execution and print an error message.
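
The examples above cover typical and edge cases. A sketch of two further checks follows: a floating-point comparison using all.equal(), and an error case. The strict_mean function is hypothetical and exists only so that there is an error to test for.

```r
calculate_mean <- function(x, na.rm = TRUE) {
  mean_value <- mean(x, na.rm = na.rm)
  return(mean_value)
}

# Floating-point results are safer to compare with all.equal() than with ==
stopifnot(isTRUE(all.equal(calculate_mean(c(0.1, 0.2, 0.3)), 0.2)))

# Error case: a hypothetical function that rejects non-numeric input
strict_mean <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.")
  }
  return(calculate_mean(x))
}

# tryCatch() converts the error into a value that stopifnot() can check
result <- tryCatch(strict_mean("a"), error = function(e) "error raised")
stopifnot(identical(result, "error raised"))
```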

6.4.3 Automated Testing with testthat

For larger projects, consider using the testthat package to automate testing. testthat provides a framework for writing and running tests, making it easier to manage and scale your tests.

  • Basic testthat Example:

    library(testthat)
    
    test_that("calculate_mean works correctly", {
      expect_equal(calculate_mean(c(1, 2, 3, 4, 5)), 3)
      expect_equal(calculate_mean(c(1, 2, NA, 4, 5)), 3)
      expect_true(is.na(calculate_mean(numeric(0))))
    })

6.5 Advanced Function Techniques

6.5.1 Function Factories

Function factories are functions that create and return other functions. This is useful when you need to generate customised functions dynamically.

  • Example:

    power_function_factory <- function(power) {
      function(x) {
        x^power
      }
    }
    
    square <- power_function_factory(2)
    cube <- power_function_factory(3)
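
The generated functions are called like any other function. The sketch below also adds a call to force(), a common precaution in function factories. It is an addition to the example above, not part of it.

```r
power_function_factory <- function(power) {
  # force() evaluates `power` immediately, so later changes to the variable
  # passed in cannot affect the function that is returned
  force(power)
  function(x) {
    x^power
  }
}

square <- power_function_factory(2)
cube <- power_function_factory(3)

square(4)  # 16
cube(2)    # 8
```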

6.5.2 Closures

A closure is a function that captures the environment in which it was created. Closures are useful for creating functions with memory: the function retains access to the variables of its creation context between calls.

  • Example:

    make_counter <- function() {
      count <- 0
      function() {
        count <<- count + 1
        return(count)
      }
    }
    
    counter <- make_counter()
    counter()  # Returns 1
    counter()  # Returns 2
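
Each call to make_counter() creates a fresh environment, so separate counters do not interfere with each other. A short sketch, using the hypothetical names counter_a and counter_b:

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    return(count)
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()  # 1
counter_a()  # 2
counter_b()  # 1, because counter_b has its own `count`

# The captured variable lives in the closure's environment
environment(counter_a)$count  # 2
```

Here <<- modifies count in the enclosing environment created by make_counter(), not a global variable.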

6.5.3 Vectorised Functions

Vectorised functions operate on entire vectors or arrays of data at once, rather than using loops to iterate over elements. In R, many built-in functions are vectorised, which makes them not only more concise but also significantly faster due to underlying optimisations.

  • Example of a Vectorised Function:

    # Squaring each element in a vector
    x <- c(1, 2, 3, 4, 5)
    squared_x <- x^2
  • Advantages:

    • Performance: Vectorised operations are typically faster because the looping happens in compiled C code rather than in interpreted R code.
    • Conciseness: Vectorised code is often more concise and easier to read.
    • Avoiding Loops: While loops can be necessary in some cases, vectorisation often removes the need for explicit loops.
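
For comparison, the sketch below writes the same squaring operation as an explicit loop and as a vectorised expression. The variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5)

# Loop version: pre-allocate the result, then fill it element by element
squared_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squared_loop[i] <- x[i]^2
}

# Vectorised version: one expression over the whole vector
squared_vec <- x^2

identical(squared_loop, squared_vec)  # TRUE
```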

6.5.4 Anonymous Functions

Anonymous functions, also known as lambda functions, are functions that are defined without being named. These are particularly useful for quick, one-off operations that don’t require a full function definition.

  • Using Anonymous Functions with sapply():

    # Applying an anonymous function to each element of a vector
    result <- sapply(1:5, function(x) x^2)
  • When to Use:

    • Inline Operations: Use anonymous functions when the operation is simple and will only be used in one place.
    • Temporary Use: They are ideal for temporary, throwaway functionality that doesn’t require reuse.
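
A few further uses are sketched below. The \(x) shorthand assumes R 4.1.0 or later. On older versions, function(x) is needed instead.

```r
values <- 1:5

# vapply() is like sapply() but checks the type and length of each result
squares <- vapply(values, function(x) x^2, numeric(1))

# R 4.1.0 and later also accept the shorthand \(x) for function(x)
cubes <- vapply(values, \(x) x^3, numeric(1))

# Anonymous functions also work with Filter() to keep matching elements
evens <- Filter(function(x) x %% 2 == 0, values)
```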

6.6 Avoiding Common Pitfalls in Function Writing

6.6.1 Overloading Functions

Avoid writing overly complex functions that try to do too much. (Here "overloading" means overburdening a single function with unrelated tasks, not method overloading as found in some other languages.) Such functions are difficult to understand, maintain, and test. Instead, follow the Single Responsibility Principle (SRP), which suggests that a function should do one thing and do it well.

  • Example of Overloaded Function:

    process_data <- function(x) {
      if (is.numeric(x)) {
        return(mean(x))
      } else if (is.character(x)) {
        return(tolower(x))
      } else {
        stop("Unsupported data type")
      }
    }
  • Solution: Split this function into two separate functions—one for numeric data and one for character data.
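
One possible split is sketched below. The names summarise_numeric and normalise_text are hypothetical, and each function validates only the input type it is responsible for.

```r
# Hypothetical replacements for process_data(), one per responsibility
summarise_numeric <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.")
  }
  return(mean(x))
}

normalise_text <- function(x) {
  if (!is.character(x)) {
    stop("`x` must be a character vector.")
  }
  return(tolower(x))
}

summarise_numeric(c(1, 2, 3))    # 2
normalise_text(c("Hello", "R"))  # "hello" "r"
```

Each function now always returns the same type, which makes calling code and tests easier to reason about.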

6.6.2 Handling Errors Gracefully

Functions should handle errors gracefully and provide informative error messages to help users diagnose problems. Use the stop(), warning(), and message() functions to signal errors, warnings, and informational messages, respectively. We will discuss error handling in greater depth in the next chapter (Chapter 7).

  • Example of Error Handling:

    divide <- function(a, b) {
      if (b == 0) {
        stop("Division by zero is not allowed.")
      }
      return(a / b)
    }
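
The sketch below uses all three signalling functions in one place. The function mean_with_checks and its verbose argument are hypothetical. Setting call. = FALSE leaves the function call out of the message, which is a style choice, not a requirement.

```r
# Hypothetical function showing stop(), warning(), and message() together
mean_with_checks <- function(x, verbose = FALSE) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.", call. = FALSE)
  }
  n_missing <- sum(is.na(x))
  if (n_missing > 0) {
    warning(n_missing, " missing value(s) removed.", call. = FALSE)
  }
  if (verbose) {
    message("Averaging ", length(x) - n_missing, " value(s).")
  }
  return(mean(x, na.rm = TRUE))
}

mean_with_checks(c(1, 2, 3), verbose = TRUE)  # prints a message, returns 2

# The caller can recover from the error instead of stopping
outcome <- tryCatch(mean_with_checks("a"), error = function(e) conditionMessage(e))
outcome  # "`x` must be numeric."
```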

6.6.3 Avoiding Side Effects

Functions should be as “pure” as possible, meaning they should not alter global variables or states outside of their scope. Side effects can make functions unpredictable and harder to debug.

  • Example of a Function with a Side Effect:

    increment_global <- function() {
      global_var <<- global_var + 1
    }
  • Solution: Instead, return the new value and let the caller handle any changes to global variables.

    increment <- function(x) {
      return(x + 1)
    }
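
With the pure version, the caller decides whether and where the new value is stored. A brief usage sketch, where the variable name counter is illustrative:

```r
increment <- function(x) {
  return(x + 1)
}

counter <- 0
counter <- increment(counter)  # the caller chooses to update `counter`
counter <- increment(counter)
counter  # 2

# The same input always gives the same output, which keeps tests simple
stopifnot(increment(41) == 42)
```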

6.7 Summary

Writing well-structured functions is a cornerstone of good programming practice in R. Functions allow you to create modular, reusable, and testable code, making your scripts more organised and efficient. By following best practices—such as using clear naming conventions, handling errors gracefully, and avoiding side effects—you can write functions that are both powerful and easy to understand.

In this chapter, we covered the essentials of writing functions, including function structure, argument handling, return values, and documentation. We also explored advanced techniques such as vectorisation, closures, and anonymous functions, as well as common pitfalls to avoid.

In the next chapter, we will discuss best practices for error handling in your code, ensuring that your R projects run as smoothly as possible as they grow in size and complexity.