3 Organising Scripts

A well-organised project structure is essential for maintaining clarity and efficiency in your R code. Proper organisation helps you navigate your project, makes it easier for others to understand your work, and enables smoother collaboration. In this chapter, we will explore best practices for structuring your scripts, organising files and directories, and managing your project’s workflow.

A full script header and template is provided at the end of this chapter.

3.1 Structuring Scripts

3.1.1 Logical Script Structure

Organising your code within scripts is just as important as organising your project as a whole. A consistent structure within each script makes your code easier to read, debug, and extend. Here is a general structure you can follow:

Load Libraries: At the top of your script, load all the necessary libraries. This allows anyone reading your script to immediately see which external dependencies are required.
```
# Load libraries
library(dplyr)
library(ggplot2)
```
Define Constants and Parameters: If your script uses any constants or parameters that control the script’s behaviour, define them near the top. This makes it easier to modify these values without digging through the code.
```
# Define constants and parameters
input_file <- "data/raw_data.csv"
output_file <- "output/cleaned_data.csv"
```

Load and Prepare Data: This section should handle data import and any necessary cleaning or preprocessing.

```
# Load and prepare data
data <- read.csv(input_file)
clean_data <- data %>%
                filter(!is.na(value))
```

Main Analysis or Processing: The core logic of your script should be placed here. This could involve running models, performing calculations, or generating plots.
```
# Perform analysis
summary_stats <- clean_data %>%
                     summarise(mean_value = mean(value))
```
Save Results: If your script generates output, save it towards the end of the script. This includes saving processed data, analysis results, or visualisations.
```
# Save results
write.csv(clean_data, output_file, row.names = FALSE)  # omit the row-number column
```
Clean Up (Optional): If needed, include a clean-up section where you remove temporary variables or objects from the environment.
```
# Clean up
rm(temp_var)
```

3.1.2 Using Functions for Modularity

To avoid having large blocks of code, encapsulate repetitive or complex tasks within functions. This makes your script more modular, easier to read, and allows for code reuse. This book will discuss writing functions in more detail in Chapter 6.

Example: Instead of repeating the same data cleaning steps throughout your script, create a clean_data() function and call it wherever needed.

```
clean_data <- function(data) {
    data %>%
      filter(!is.na(value))
}

# Use the function
data_cleaned <- clean_data(data)
```
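
As a minimal sketch of the reuse this enables, the function below takes the column to check as an argument, so one definition can serve several datasets. The name `drop_missing()`, its `column` parameter, and the sample data frames are hypothetical; base R subsetting is used in place of `filter()` so the example runs without extra packages.

```r
# Hypothetical helper: drop rows where the chosen column is NA.
drop_missing <- function(data, column = "value") {
  stopifnot(is.data.frame(data), column %in% names(data))
  data[!is.na(data[[column]]), , drop = FALSE]
}

# Made-up example data
survey_a <- data.frame(id = 1:4, value = c(10, NA, 30, 40))
survey_b <- data.frame(id = 1:3, score = c(NA, 2.5, NA))

# The same function cleans both datasets
survey_a_clean <- drop_missing(survey_a)
survey_b_clean <- drop_missing(survey_b, column = "score")
```

If the cleaning rule changes later, only the function body needs editing, rather than every place the steps were copied.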

3.1.3 Script Length

While there is no strict rule on the ideal length of an R script, aim to keep scripts focused and concise. If a script becomes too long, consider breaking it up into smaller, more manageable scripts that perform specific tasks.

Guideline: A script should ideally perform one primary function (e.g., data cleaning, analysis, or plotting). This separation of concerns helps maintain clarity.
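
One way to tie several single-purpose scripts together is a short driver script that runs them in order. The sketch below is only an illustration: the file name `run_all.R`, the helper `run_pipeline()`, and the three script paths are assumptions, and it presumes each script passes its results to the next through files rather than through objects in memory.

```r
# Hypothetical driver script, e.g. saved as run_all.R in the project root.
# Each entry is a single-purpose script; they run in the order listed.
pipeline <- c(
  "scripts/data_cleaning.R",
  "scripts/analysis.R",
  "scripts/plotting.R"
)

run_pipeline <- function(scripts) {
  for (script in scripts) {
    if (!file.exists(script)) {
      stop("Script not found: ", script)
    }
    message("Running ", script)
    # A fresh environment per script stops objects leaking between scripts
    source(script, local = new.env())
  }
  invisible(scripts)
}

# Uncomment once the scripts exist:
# run_pipeline(pipeline)
```

Running each script in its own environment is a design choice, not a requirement; dropping the `local` argument makes every script share the global environment instead.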

3.2 Organising Files and Directories

3.2.1 Directory Structure

A well-organised directory structure is critical for keeping your project manageable, especially as it grows. Here is a common directory structure for an R project:

```
project/
├── data/
│   ├── raw/              # Raw data files (input)
│   └── processed/        # Processed data files (output)
├── scripts/
│   ├── data_cleaning.R   # Script for cleaning data
│   ├── analysis.R        # Script for analysis
│   └── plotting.R        # Script for generating plots
├── output/
│   ├── figures/          # Generated figures and plots
│   └── results/          # Analysis results, tables, etc.
└── README.md             # Project overview and instructions
```
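
The folders in the layout above can also be created from R, which is handy when starting several projects with the same structure. This is a sketch under stated assumptions: the helper name `create_project_dirs()` and the `root` argument are hypothetical, and the demo writes to a temporary folder so nothing in your working directory changes.

```r
# Hypothetical helper: create the folders from the layout above under `root`.
create_project_dirs <- function(root = "project") {
  dirs <- file.path(root, c(
    "data/raw",
    "data/processed",
    "scripts",
    "output/figures",
    "output/results"
  ))
  for (d in dirs) {
    # recursive = TRUE also creates missing parent folders;
    # showWarnings = FALSE keeps re-runs quiet when a folder already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Try it in a temporary location
demo_root <- file.path(tempdir(), "project")
created <- create_project_dirs(demo_root)
```

Because existing folders are left alone, the function is safe to call again at the top of a script to guarantee the output folders exist.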

3.2.2 Data Files

Raw Data: Store raw data files in a data/raw/ directory. These files should remain unmodified, serving as the original source for your analyses.
Processed Data: Any data files that are generated or cleaned should be saved in data/processed/. This keeps raw and processed data separate, making it easier to track changes.
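
The separation between raw and processed data might look like this in practice. The file names, the sample data, and the temporary project root are all made up for illustration; the point is only that the script reads from `data/raw/` and writes to `data/processed/`, never back to the raw file.

```r
# Placeholder project root in a temporary folder so the example is safe to run
root <- file.path(tempdir(), "data_demo")
raw_dir <- file.path(root, "data", "raw")
processed_dir <- file.path(root, "data", "processed")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a raw file you would normally receive, not create
raw_file <- file.path(raw_dir, "measurements.csv")
write.csv(data.frame(id = 1:3, value = c(1.5, NA, 3.2)), raw_file,
          row.names = FALSE)

# Read from data/raw/, write to data/processed/; the raw file is never edited
raw <- read.csv(raw_file)
processed <- raw[!is.na(raw$value), ]
processed_file <- file.path(processed_dir, "measurements_clean.csv")
write.csv(processed, processed_file, row.names = FALSE)
```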

3.2.3 Scripts

Scripts Directory: Place all your R scripts in a scripts/ directory. Organise them by task, such as data_cleaning.R, analysis.R, and plotting.R.
Modular Scripts: If a single script becomes too large or complex, split it into multiple scripts that are responsible for different parts of the workflow.

3.2.4 Output Files

Output Directory: Use an output/ directory to store results such as figures, tables, and reports. Further organise this directory by creating subdirectories like figures/ and results/.
Avoid Overwriting: Name your output files descriptively to avoid overwriting them. Include details like the date or parameters used in the analysis (e.g., summary_stats_2024-08-21.csv).
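
A small helper can build dated file names like the one above so they stay consistent across scripts. The function name `dated_filename()` and its arguments are hypothetical; this is one possible sketch, not a prescribed convention.

```r
# Hypothetical helper: build "<stem>_<YYYY-MM-DD>.<ext>" inside a folder.
dated_filename <- function(stem, ext = "csv", dir = "output/results",
                           date = Sys.Date()) {
  file.path(dir, paste0(stem, "_", format(date, "%Y-%m-%d"), ".", ext))
}

# With a fixed date, for illustration
dated_filename("summary_stats", date = as.Date("2024-08-21"))
# "output/results/summary_stats_2024-08-21.csv"

# In a script you would normally rely on the default, today's date
todays_file <- dated_filename("summary_stats")
```

Two runs on the same day still produce the same name, so add a parameter value or a time stamp to the stem if you need to keep both.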

3.3 Documentation

3.3.1 README Files

Every project should include a README.md file in the root directory. This file serves as an introduction and guide to your project.

Example Contents of a README:
- Project overview
- Instructions for setting up the environment
- Description of scripts and their functions
- How to run the analysis
- Dependencies (e.g., required R packages)

3.3.2 Inline Documentation

Use comments and docstrings to document your scripts and functions. This inline documentation helps users and collaborators understand what your code does and will be discussed further in Chapter 4.
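
R has no built-in docstring syntax; a common convention is to place roxygen2-style `#'` comments directly above a function. The sketch below shows that convention alongside an ordinary inline comment. The function name `remove_missing_values()` is hypothetical, and the code runs as plain R whether or not roxygen2 is installed, because `#'` lines are still ordinary comments.

```r
#' Drop rows with a missing measurement
#'
#' @param data A data frame containing a `value` column.
#' @return `data` without the rows where `value` is NA.
remove_missing_values <- function(data) {
  # Keep only rows with a recorded value
  data[!is.na(data$value), , drop = FALSE]
}

example_data <- data.frame(id = 1:3, value = c(2.1, NA, 4.8))
example_clean <- remove_missing_values(example_data)
```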

3.4 Summary

In this chapter, we’ve covered best practices for organising your R scripts and project directories. A well-structured project makes it easier to manage, understand, and share your work. By following these guidelines, you can ensure that your codebase remains clean, modular, and maintainable.

```
## {Project Title}

## {Author Name} - {YYYY-MM-DD}
## GH: {GH_user_id} [OR ALTERNATIVE CONTACT DETAILS]

## {This is a description of the project and this code script.}
## {This is a description of the project and this code script.}
## {This is a description of the project and this code script.}

###############################################################################
############################### {SCRIPT TITLE} ################################
###############################################################################

# CHANGELOG -------------------------------------------------------------------

# v1.0.0.0 - {Change Overview}
#          - {YYYY-MM-DD}
#          - {Change Author}


# NOTES -----------------------------------------------------------------------

# 1. Important Note for the Script User.
# 2. Another Important Note for the Script User.

# TODO: A To-Do Item for the Script Developer


# REQUIRED LIBRARIES ----------------------------------------------------------

# ggplot2: For plotting
library(ggplot2)


# SCRIPT CONSTANTS ------------------------------------------------------------
```

In the next chapter, we’ll explore how to use comments effectively to document your code and make it more understandable.