6 Writing Functions
Functions are fundamental building blocks in R programming. They allow you to encapsulate code logic into reusable, organised units. Writing well-structured functions enhances code readability, improves maintainability and scalability, and reduces redundancy. In this chapter, we’ll explore best practices for writing functions in R, including function structure, parameter handling, return values, and documentation.
6.1 The Purpose of Functions
6.1.1 Why Write Functions?
Functions serve several key purposes in programming:
- Modularity: Functions break your code into smaller, manageable pieces, making it easier to understand and maintain.
- Reusability: Functions can be reused across different scripts and projects, reducing code duplication and errors.
- Abstraction: Functions abstract away complex logic, allowing you to focus on higher-level processes.
- Testing: Functions make it easier to isolate and test specific parts of your code.
6.1.2 When to Write a Function
Consider writing a function when:
- Code is Repeated: If you find yourself copying and pasting the same code in multiple places, it’s a good candidate for a function.
- Complexity Increases: As the complexity of your code grows, encapsulating logic in functions helps manage that complexity.
- Task Automation: When you need to perform a task repeatedly with different inputs, functions make this process more efficient.
6.2 Structure of a Function
6.2.1 Basic Function Structure
An R function typically consists of four parts:
Function Name: A descriptive name that indicates the purpose of the function.
Arguments: Input parameters that the function will use.
Body: The code that performs the function’s operations.
Return Value: The output of the function, often stated explicitly using return().
Example:
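A minimal sketch of these parts, using a hypothetical function `calculate_area()` whose name and arguments are chosen only for illustration:

```r
# Function name: calculate_area (hypothetical, for illustration only)
# Arguments: width and height
calculate_area <- function(width, height) {
  # Body: the code that does the work
  area <- width * height

  # Return value: stated explicitly with return()
  return(area)
}

calculate_area(3, 4)  # 12
```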
6.2.2 Naming Conventions
- Descriptive Names: Use descriptive names for functions that clearly indicate what the function does. For example, calculate_mean is more informative than calcMean.
- Verb-Noun Format: A common convention is to use a verb-noun format, such as plot_data, get_summary, or calculate_average.
- Consistency: Be consistent with naming conventions throughout your codebase.
6.2.3 Function Arguments
Use Default Values: Provide default values for arguments when appropriate. This makes the function more flexible and easier to use.
Argument Order: Place arguments in a logical order, usually starting with the most essential arguments and ending with optional ones.
Dot Arguments (...): Use the ... argument to allow additional arguments to be passed on to other functions called within your function.
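The three points above can be sketched together in one small function. The name `summarise_values()` and its `digits` argument are hypothetical, chosen for illustration; only `mean()` and `round()` are real base R functions.

```r
# summarise_values() is a hypothetical helper used only for illustration.
# - x comes first because it is the essential input
# - digits has a default value, so callers can omit it
# - ... is forwarded to mean(), e.g. na.rm = TRUE or trim = 0.1
summarise_values <- function(x, digits = 2, ...) {
  avg <- mean(x, ...)
  return(round(avg, digits))
}

# na.rm is not an argument of summarise_values(); it travels through ...
summarise_values(c(1.234, 2.345, NA), na.rm = TRUE)

# Override the default number of digits
summarise_values(c(1.234, 2.345, 3.456), digits = 1)
```

Placing `...` last keeps the essential and optional arguments easy to find in the signature.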
6.2.4 Function Body
Keep It Simple: The body of a function should be as simple and concise as possible. If a function becomes too long or complex, consider breaking it into smaller functions.
Use Temporary Variables: Use temporary variables within the function to store intermediate results. This makes the code more readable and easier to debug.
Avoid Side Effects: Functions should avoid altering global variables or states outside of their scope. This makes them more predictable and easier to debug.
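As a rough illustration of these guidelines, the sketch below uses a hypothetical `standardise()` function. It stores intermediate results in temporary variables and changes nothing outside its own scope.

```r
# standardise() is a hypothetical example, not a function from any package.
standardise <- function(x) {
  # Temporary variables hold intermediate results, which keeps each step
  # readable and easy to inspect while debugging
  centre <- mean(x, na.rm = TRUE)
  spread <- sd(x, na.rm = TRUE)
  z_scores <- (x - centre) / spread

  # No global variables are read or modified; the result is simply returned
  return(z_scores)
}

standardise(c(1, 2, 3))
```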
6.2.5 Return Values
Explicit Returns: Use the return() function to explicitly specify the output of your function. Without it, R returns the value of the last evaluated expression, so an explicit return() clarifies what the function returns and can prevent unintended behaviour.
Return Multiple Values: To return multiple values, use a list. This allows you to package several outputs into a single return object.
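A short sketch of returning several values in a named list. The function name `describe_vector()` is hypothetical and chosen for illustration.

```r
# describe_vector() is a hypothetical name used for illustration.
describe_vector <- function(x) {
  # Package several outputs into one named list
  result <- list(
    mean = mean(x),
    sd = sd(x),
    n = length(x)
  )
  return(result)
}

stats <- describe_vector(c(2, 4, 6))
stats$mean  # 4
stats$n     # 3
```

Naming the list elements lets callers extract each value with `$` rather than relying on position.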
6.3 Documentation and Commenting
6.3.1 Documenting Functions with Roxygen2
Roxygen2 is a popular tool for documenting R functions. It allows you to write documentation comments directly above your function, which can then be converted into formal documentation.
Basic Roxygen2 Template:
#' Calculate the mean of a numeric vector
#'
#' This function calculates the mean of a numeric vector, excluding any NA values.
#'
#' @param x A numeric vector.
#' @param na.rm A logical value indicating whether NA values should be removed.
#' @return The mean of the vector.
#' @export
calculate_mean <- function(x, na.rm = TRUE) {
  mean_value <- mean(x, na.rm = na.rm)
  return(mean_value)
}
Key Roxygen2 Tags:
@param: Describes each function parameter.
@return: Describes the return value.
@export: Indicates that the function should be made available to users of your package.
@examples: Provides examples of how to use the function.
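As a sketch of how the `@examples` tag fits alongside the others, here is a documented function that uses all four tags. The function `range_width()` is hypothetical and exists only to illustrate the layout.

```r
#' Calculate the width of the range of a numeric vector
#'
#' range_width() is a hypothetical function used to illustrate the tags.
#'
#' @param x A numeric vector.
#' @param na.rm A logical value indicating whether NA values should be removed.
#' @return A single number: the maximum minus the minimum.
#' @examples
#' range_width(c(3, 10, 7))
#' range_width(c(1, NA, 5), na.rm = TRUE)
#' @export
range_width <- function(x, na.rm = TRUE) {
  limits <- range(x, na.rm = na.rm)
  return(limits[2] - limits[1])
}
```

Inside a package, running `roxygen2::roxygenise()` or `devtools::document()` converts these comments into help files.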
6.3.2 Commenting Inside Functions
Explain Complex Logic: Use comments inside your function to explain complex logic or non-obvious decisions. This helps others (and your future self) understand the code.
Comment Sections, Not Every Line: Avoid over-commenting by focusing on sections of code rather than individual lines, unless a line is particularly complex or non-intuitive.
6.4 Testing Functions
6.4.1 Why Test Functions?
Testing functions is crucial to ensure they behave as expected. Writing tests helps catch errors early, simplifies debugging, and provides confidence that changes in the code don’t introduce new bugs.
6.4.2 Writing Simple Tests
Test Different Scenarios: Write tests that cover a range of input scenarios, including typical cases, edge cases, and error cases.
Use stopifnot(): The stopifnot() function is a simple way to assert that one or more conditions are true. If any condition is not true, it stops execution and prints an error message.
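The sketch below shows what such tests might look like for the `calculate_mean()` function documented earlier in this chapter. The function is repeated here so the block runs on its own. The choice of cases is illustrative, not exhaustive.

```r
# calculate_mean() mirrors the function documented earlier in this chapter.
calculate_mean <- function(x, na.rm = TRUE) {
  mean_value <- mean(x, na.rm = na.rm)
  return(mean_value)
}

# Typical case
stopifnot(calculate_mean(c(1, 2, 3)) == 2)

# Edge case: NA values are dropped by default
stopifnot(calculate_mean(c(1, NA, 3)) == 2)

# Edge case: with na.rm = FALSE the result is NA
stopifnot(is.na(calculate_mean(c(1, NA, 3), na.rm = FALSE)))

# Error case: tryCatch() confirms that an error is raised when the
# required argument x is missing
error_raised <- tryCatch(
  {
    calculate_mean()
    FALSE
  },
  error = function(e) TRUE
)
stopifnot(error_raised)
```

If every assertion holds, the script runs silently. For larger projects, a dedicated framework such as the testthat package is a common next step.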
6.5 Advanced Function Techniques
6.5.1 Function Factories
Function factories are functions that create and return other functions. This is useful when you need to generate customised functions dynamically.
Example:
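A minimal sketch of a function factory, assuming a hypothetical `make_power()` that builds functions raising their input to a fixed exponent:

```r
# make_power() is a hypothetical factory: it returns a new function that
# raises its input to the exponent supplied when the factory was called.
make_power <- function(exponent) {
  # force() evaluates the argument immediately, which avoids surprises
  # from R's lazy evaluation when factories are called in a loop
  force(exponent)
  function(x) {
    x^exponent
  }
}

square <- make_power(2)
cube <- make_power(3)

square(4)  # 16
cube(2)    # 8
```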
6.5.2 Closures
A closure is a function that captures the environment in which it was created. Closures are powerful in creating functions with memory, where the function retains access to variables from its creation context.
Example:
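One common way to illustrate a closure with memory is a counter. The sketch below assumes a hypothetical `make_counter()`. The returned function keeps access to `count` from the environment in which it was created.

```r
# make_counter() is a hypothetical example of a closure with memory.
make_counter <- function() {
  count <- 0
  function() {
    # <<- updates count in the enclosing environment (the one created by
    # the call to make_counter()), so the value persists between calls
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```

Each call to `make_counter()` creates a fresh environment, so separate counters do not share state.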
6.5.3 Vectorised Functions
Vectorised functions operate on entire vectors or arrays of data at once, rather than using loops to iterate over elements. In R, many built-in functions are vectorised, which makes them not only more concise but also significantly faster due to underlying optimisations.
Example of a Vectorised Function:
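As a sketch, the following compares an explicit loop with the equivalent vectorised expression for squaring each element of a vector. The variable names are illustrative.

```r
values <- c(1, 2, 3, 4, 5)

# Loop version: processes one element at a time
squared_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squared_loop[i] <- values[i]^2
}

# Vectorised version: the ^ operator works on the whole vector at once
squared_vectorised <- values^2

# Both approaches give the same result
identical(squared_loop, squared_vectorised)  # TRUE
```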
Advantages:
- Performance: Vectorised operations are typically faster because they are optimised in R’s underlying C code.
- Conciseness: Vectorised code is often more concise and easier to read.
- Avoiding Loops: While loops can be necessary in some cases, vectorisation often removes the need for explicit loops.
6.5.4 Anonymous Functions
Anonymous functions, also known as lambda functions, are functions that are defined without being named. These are particularly useful for quick, one-off operations that don’t require a full function definition.
Using Anonymous Functions with apply():
When to Use:
- Inline Operations: Use anonymous functions when the operation is simple and will only be used in one place.
- Temporary Use: They are ideal for temporary, throwaway functionality that doesn’t require reuse.
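To illustrate the apply() usage described in this section, the sketch below passes anonymous functions to `apply()` and `sapply()`. The data and variable names are made up for illustration.

```r
scores <- matrix(1:6, nrow = 2)

# The anonymous function is defined inline and never given a name;
# MARGIN = 1 applies it to each row of the matrix
row_ranges <- apply(scores, 1, function(row) max(row) - min(row))

# sapply() accepts an anonymous function in the same way
squares <- sapply(1:3, function(x) x^2)

row_ranges
squares
```

From R 4.1.0 onwards, `\(x) x^2` is accepted as a shorthand for `function(x) x^2`.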
6.6 Avoiding Common Pitfalls in Function Writing
6.6.1 Overloading Functions
Avoid writing overly complex functions that try to do too much. This can lead to functions that are difficult to understand, maintain, and test. Instead, follow the Single Responsibility Principle (SRP), which suggests that a function should do one thing and do it well.
Example of Overloaded Function:
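The sketch below assumes a hypothetical `process_data()` that branches on its input type. The specific operations (averaging numbers, upper-casing strings) are illustrative assumptions.

```r
# process_data() is a hypothetical function that tries to do too much:
# it handles numeric and character input in a single body.
process_data <- function(x) {
  if (is.numeric(x)) {
    # Numeric branch: summarise the values
    return(mean(x, na.rm = TRUE))
  } else if (is.character(x)) {
    # Character branch: clean up the text
    return(toupper(x))
  } else {
    stop("Unsupported input type.")
  }
}
```

The return type depends on the input type, which makes the function harder to document, test, and reason about.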
Solution: Split this function into two separate functions—one for numeric data and one for character data.
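A minimal sketch of such a split, assuming the combined function averaged numeric input and upper-cased character input. Both function names are hypothetical.

```r
# Each function now has a single responsibility and a predictable
# return type. Both names are hypothetical.
summarise_numeric <- function(x) {
  stopifnot(is.numeric(x))
  return(mean(x, na.rm = TRUE))
}

clean_character <- function(x) {
  stopifnot(is.character(x))
  return(toupper(x))
}

summarise_numeric(c(1, 2, 3))  # 2
clean_character("abc")         # "ABC"
```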
6.6.2 Handling Errors Gracefully
Functions should handle errors gracefully and provide informative error messages to help users diagnose problems. Use the stop(), warning(), and message() functions to manage errors, warnings, and informational messages, respectively. We will discuss error handling in greater depth in the next chapter (Chapter 7).
Example of Error Handling:
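A sketch of how the three functions might be combined, using a hypothetical `safe_log()` wrapper around `log()`. The checks and messages are illustrative assumptions.

```r
# safe_log() is a hypothetical wrapper around log() used for illustration.
safe_log <- function(x) {
  # stop(): the input cannot be processed at all
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not ", class(x)[1], ".", call. = FALSE)
  }

  # warning(): the function can continue, but the user should know
  if (any(x <= 0, na.rm = TRUE)) {
    warning("Non-positive values found; returning NA for them.", call. = FALSE)
    x[which(x <= 0)] <- NA
  }

  # message(): purely informational output
  message("Computing log of ", length(x), " value(s).")
  return(log(x))
}
```

Including the offending class in the error message tells the user what was received as well as what was expected.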
6.6.3 Avoiding Side Effects
Functions should be as “pure” as possible, meaning they should not alter global variables or states outside of their scope. Side effects can make functions unpredictable and harder to debug.
Example of a Function with a Side Effect:
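The sketch below uses a hypothetical `add_to_total()` that silently changes a variable defined outside the function:

```r
total <- 0

# add_to_total() is hypothetical; the <<- operator modifies `total`
# outside the function, which is the side effect
add_to_total <- function(value) {
  total <<- total + value
}

add_to_total(5)
total  # 5, changed even though nothing was assigned by the caller
```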
Solution: Instead, return the new value and let the caller handle any changes to global variables.
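A minimal sketch of this pure alternative, with a hypothetical `add_value()` that takes the current total as an argument and returns the new one:

```r
# add_value() is hypothetical; it reads only its arguments and changes
# nothing outside its own scope
add_value <- function(total, value) {
  return(total + value)
}

total <- 0
# The caller decides explicitly what to do with the result
total <- add_value(total, 5)
total  # 5
```

Because the result depends only on the inputs, the same call always gives the same answer, which makes the function straightforward to test.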
6.7 Summary
Writing well-structured functions is a cornerstone of good programming practice in R. Functions allow you to create modular, reusable, and testable code, making your scripts more organised and efficient. By following best practices—such as using clear naming conventions, handling errors gracefully, and avoiding side effects—you can write functions that are both powerful and easy to understand.
In this chapter, we covered the essentials of writing functions, including function structure, argument handling, return values, and documentation. We also explored advanced techniques such as vectorisation, closures, and anonymous functions, as well as common pitfalls to avoid.
In the next chapter, we will discuss best practices for error handling in your code, ensuring that your R projects run as smoothly as possible as they grow in size and complexity.