6 Survey Analysis in R
Survey data is a powerful tool in crime analysis, providing insights into public perceptions, victimisation rates, and more. In this chapter, we will explore how to manage and analyse survey data using R. For those familiar with SPSS, you’ll find that while the syntax and interface differ, R offers extensive capabilities for survey analysis, often with greater flexibility and control.
6.1 Introduction to Survey Data
Survey data often come with complex structures, including weighting, stratification, and clustering, which need to be handled appropriately to ensure valid analysis. In SPSS, you may have used procedures like FREQUENCIES, CROSSTABS, and WEIGHT to analyse survey data. In R, you can accomplish these tasks with several packages. The survey package is the standard tool for design-based analysis of survey data, and srvyr offers a dplyr-style interface to it. The haven package is particularly useful for importing SPSS .sav files directly into R.
Install the packages if you haven’t already:
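All three packages are available on CRAN. The sketch below installs only the packages that are missing. It also includes dplyr, which the chapter uses later for data manipulation.

```r
# Packages used in this chapter
pkgs <- c("haven", "survey", "srvyr", "dplyr")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```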
6.1.1 Key Concepts in Survey Analysis
- Weighting: Adjusts the data to represent the population more accurately, compensating for oversampling or undersampling.
- Stratification: Divides the population into subgroups (strata) before sampling to ensure representation from each subgroup.
- Clustering: Groups the population into clusters (for example, neighbourhoods), randomly selects a sample of those clusters, and then surveys respondents within the selected clusters.
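The small hypothetical sample below shows what weighting does. Urban residents were oversampled, so each urban respondent represents one person in the population and each rural respondent represents three. Base R's weighted.mean() gives the weighted point estimate. Standard errors that reflect stratification and clustering need the survey package.

```r
# Hypothetical sample of 8 respondents: 4 urban (oversampled), 4 rural
victim <- c(1, 0, 0, 1, 0, 0, 1, 0)  # 1 = reported being a victim of crime
weight <- c(1, 1, 1, 1, 3, 3, 3, 3)  # number of people each respondent represents

mean(victim)                   # unweighted: 3 / 8  = 0.375
weighted.mean(victim, weight)  # weighted:   5 / 16 = 0.3125
```

Half of the urban respondents report victimisation, but only a quarter of the rural ones do. Giving rural respondents their full weight therefore pulls the estimate down.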
6.1.2 Understanding Survey Data Structures
Survey data typically consist of individuals' responses to a set of questions, often together with demographic information. In SPSS, survey data are usually stored in .sav files, with each row representing a respondent and each column representing a variable (a question, a demographic characteristic, etc.). In R, survey data are typically held in a data frame with the same structure: one row per respondent and one column per variable.
Survey datasets might also include:
- Categorical variables (e.g., gender, education level)
- Numerical variables (e.g., age, income)
- Weighting variables (to adjust for survey sampling)
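When haven reads a .sav file, it keeps the SPSS value labels. A categorical variable therefore arrives as a labelled numeric vector rather than an R factor. The sketch below builds such a vector by hand, using hypothetical gender codes and labels, and converts it with as_factor(). This conversion is usually worth doing before tabulating categorical variables.

```r
library(haven)

# Hypothetical SPSS-style variable: numeric codes with value labels
gender <- labelled(
  c(1, 2, 2, 1),
  labels = c(Male = 1, Female = 2),
  label  = "Respondent gender"
)

attr(gender, "label")          # the SPSS variable label
gender_f <- as_factor(gender)  # codes replaced by their value labels
levels(gender_f)               # "Male" "Female"
```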
6.2 Importing and Preparing Survey Data
Just like SPSS, R can import survey data from various file formats, including SPSS files.
Here’s how you can import SPSS survey data into R:
# Load haven package
library(haven)
# Import SPSS data
survey_data <- read_sav("data/survey_data.sav")
# View the first few rows of the data
head(survey_data)
# Produce a summary of the data
summary(survey_data)
Once imported, survey data in R can be manipulated like any other data frame. You can use functions from base R or packages such as dplyr to filter, select, and mutate your data, much as you would use SELECT IF, COMPUTE, or RECODE in SPSS.
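The sketch below applies the three dplyr verbs to a small hypothetical data frame; the columns age, gender, and satisfaction are invented for the example. filter() keeps rows, select() keeps columns, and mutate() adds a derived variable.

```r
library(dplyr)

# Hypothetical respondent-level data standing in for an imported .sav file
survey_data <- data.frame(
  id           = 1:6,
  age          = c(19, 34, 52, 67, 28, 45),
  gender       = c("Female", "Male", "Female", "Male", "Female", "Male"),
  satisfaction = c(4, 2, 5, 3, 1, 4)
)

over_25 <- survey_data %>%
  filter(age > 25) %>%                 # keep respondents older than 25
  select(id, gender, satisfaction) %>% # keep only these columns
  mutate(sat_level = if_else(satisfaction > 3, "High", "Low"))  # derived variable

over_25
```

One caution applies to weighted surveys. Filtering the raw data before the survey design is defined discards information that the variance estimates need. To analyse a subgroup, define the design on the full data first, then restrict it by calling subset() on the design object.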
6.2.1 Converting Data for Survey Analysis
Before we can analyse the data, we need to define the survey design. This step involves specifying the survey weights, strata, and clusters if applicable.
The svydesign() command creates a survey design object that tells R how your data were sampled, which is crucial for accurate analysis.
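Here is a minimal sketch on a hypothetical dataset of eight respondents. It assumes the data hold a stratum variable (area), a primary sampling unit identifier (psu), and a design weight (weight); substitute the names used in your own file. The object is named survey_design to match the code in the rest of the chapter.

```r
library(survey)

# Hypothetical extract: 8 respondents in 2 strata (area), 2 PSUs per stratum
survey_df <- data.frame(
  area   = rep(c("Urban", "Rural"), each = 4),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = rep(c(1, 3), each = 4),
  victim = c(1, 0, 0, 1, 0, 0, 1, 0)
)

survey_design <- svydesign(
  ids     = ~psu,     # cluster (PSU) identifiers; use ~1 if there is no clustering
  strata  = ~area,    # stratification variable; omit if the sample was not stratified
  weights = ~weight,  # design weight: how many people each respondent represents
  nest    = TRUE,     # treat PSU ids as nested within strata
  data    = survey_df
)

summary(survey_design)
```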
6.3 Descriptive Analysis of Survey Data
With the survey design object created, you can now perform various descriptive analyses. Descriptive statistics are often the first step in analysing survey data. In SPSS, you might use the Frequencies, Descriptives, or Crosstabs commands. In R, unweighted summaries can be produced with base R or packages like dplyr or janitor. However, estimates that respect the survey design (weights, strata, and clusters) require functions from the survey package.
6.3.1 Calculating Means and Totals
To calculate means, totals, or other statistics, you can use functions from the survey package.
# Load the survey package
library(survey)
# Mean of a variable
mean_variable_name <- svymean(~variable_name, design = survey_design)
mean_variable_name
# Total population estimate
total_variable_name <- svytotal(~variable_name, design = survey_design)
total_variable_name
These commands return the weighted mean and the estimated population total, respectively. Each comes with a standard error that takes the survey design (weights, strata, and clusters) into account.
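The sketch below makes the output concrete. It uses a hypothetical eight-respondent design (strata area, clusters psu, weights weight) and compares the raw proportion of victims with the survey-weighted estimates.

```r
library(survey)

# Hypothetical design: 2 strata, 2 PSUs per stratum, rural respondents weighted 3
survey_df <- data.frame(
  area   = rep(c("Urban", "Rural"), each = 4),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = rep(c(1, 3), each = 4),
  victim = c(1, 0, 0, 1, 0, 0, 1, 0)
)
survey_design <- svydesign(ids = ~psu, strata = ~area, weights = ~weight,
                           nest = TRUE, data = survey_df)

mean(survey_df$victim)  # unweighted proportion of victims: 0.375

victim_mean  <- svymean(~victim, design = survey_design)   # weighted mean: 0.3125
victim_total <- svytotal(~victim, design = survey_design)  # estimated victims: 5
victim_mean
victim_total
confint(victim_mean)  # 95% confidence interval based on the design-based SE
```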
6.3.2 Frequencies and Cross-tabulations
Frequencies and cross-tabulations are commonly used in survey analysis to summarise categorical variables. These tables are weighted according to the survey design, providing a more accurate reflection of the population.
Frequencies
To calculate the weighted frequency of responses to a survey question (a categorical variable), you can use the svytable() function. It returns weighted counts, that is, estimated numbers of people in the population. These are comparable to the counts in SPSS's Frequencies output when Weight Cases is switched on.
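The sketch below uses a hypothetical design with a gender factor. svytable() gives the weighted counts. Base R's table() appears only to show the contrast with unweighted counts of respondents.

```r
library(survey)

survey_df <- data.frame(
  area   = rep(c("Urban", "Rural"), each = 4),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = rep(c(1, 3), each = 4),
  gender = factor(c("Male", "Female", "Female", "Male",
                    "Female", "Female", "Female", "Male"))
)
survey_design <- svydesign(ids = ~psu, strata = ~area, weights = ~weight,
                           nest = TRUE, data = survey_df)

# Weighted frequencies: estimated number of people in each category
gender_freq <- svytable(~gender, design = survey_design)
gender_freq              # Female 11, Male 5

# Unweighted counts of respondents, for comparison only
table(survey_df$gender)  # Female 5, Male 3
```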
Cross-tabulations
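Cross-tabulations are used to examine the relationship between two categorical variables. In SPSS, this is done using the Crosstabs command. In R, base R's table() function and the janitor package can also create cross-tabulations, but they count raw rows and ignore the survey weights. For a weighted cross-tabulation, use svytable() with the survey design: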
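# Cross-tabulation of two categorical variables
svytable(~variable_name_1 + variable_name_2, design = survey_design)
For proportions and totals, base R's prop.table() and addmargins() can be applied to the table that svytable() returns. The janitor package (tabyl() with its adorn_*() helpers) produces neatly formatted tables with proportions and totals. However, janitor works on the raw data frame, so its figures are unweighted.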
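The sketch below builds a weighted cross-tabulation on hypothetical data: gender by whether the respondent reported being a victim. Base R's addmargins() adds the margins and prop.table() gives row percentages; both work on the table that svytable() returns.

```r
library(survey)

survey_df <- data.frame(
  area   = rep(c("Urban", "Rural"), each = 4),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = rep(c(1, 3), each = 4),
  gender = factor(c("Male", "Female", "Female", "Male",
                    "Female", "Female", "Female", "Male")),
  victim = factor(c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No"),
                  levels = c("No", "Yes"))
)
survey_design <- svydesign(ids = ~psu, strata = ~area, weights = ~weight,
                           nest = TRUE, data = survey_df)

# Weighted counts for each gender-by-victim combination
gender_by_victim <- svytable(~gender + victim, design = survey_design)

# Row and column totals, like the margins in SPSS Crosstabs
addmargins(gender_by_victim)

# Row percentages: share of each gender reporting victimisation
row_pct <- prop.table(gender_by_victim, margin = 1)
round(100 * row_pct, 1)
```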
6.3.3 Comparing Results with SPSS Survey Functions
In SPSS, descriptive analysis is often performed through menu-driven commands with various output options. In R, you have more flexibility and control over the analysis and its output, although you will usually need to write code. For example, SPSS offers a dialog for cross-tabulations with tick boxes for row and column percentages. In R, you compute exactly the percentages you need, and once you are comfortable with the syntax this is often just as quick. Most importantly, because every step is written down as code, an R analysis can easily be rerun and reproduced.
Exercise!
You have a dataset from a national survey on perceptions of crime. It covers crime types, victim demographics, police responses, community engagement, and socioeconomic factors, which together describe respondents' experiences and perceptions of crime. Download the Crime Survey Data (SPSS Format) and perform a descriptive analysis of the data. The steps listed below will help you.
Hints:
- Load the dataset
- Create a survey design object
- Create frequency distributions of the crime_type and gender variables.
- Cross-tabulate crime_type by region and by gender (separately and together) to see the distribution across different regions and genders.
- Create a new variable to categorise satisfaction into high and low satisfaction (above 3 = High, 3 or below = Low).
- Update the survey design with the new variable.
- Calculate the mean number of community programs by satisfaction level and gender
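The recode-and-update steps in these hints can be sketched on a small hypothetical design rather than the exercise dataset; the satisfaction scores here are invented for illustration. update() adds a variable to the data stored inside the design object, so later survey functions can use it without the design being redefined.

```r
library(survey)

survey_df <- data.frame(
  area         = rep(c("Urban", "Rural"), each = 4),
  psu          = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight       = rep(c(1, 3), each = 4),
  satisfaction = c(4, 2, 5, 3, 1, 4, 2, 3)  # hypothetical 1-5 rating
)
survey_design <- svydesign(ids = ~psu, strata = ~area, weights = ~weight,
                           nest = TRUE, data = survey_df)

# Add a two-level version of satisfaction inside the design object
survey_design <- update(
  survey_design,
  sat_level = factor(ifelse(satisfaction > 3, "High", "Low"),
                     levels = c("Low", "High"))
)

# The new variable is now available to all survey functions
sat_freq <- svytable(~sat_level, design = survey_design)
sat_freq  # Low 11, High 5
```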
6.4 Weighting Survey Data
Weights are crucial in survey analysis to correct for biases introduced by the sampling design.
6.4.1 Applying Weights
Survey data often includes a weight variable to adjust for the sampling design. This is particularly common in complex surveys where the probability of selection differs among respondents. In SPSS, weights are applied using the Weight Cases function. In R, you can apply weights directly in the survey design phase, as shown earlier. Here’s a reminder:
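For a survey that supplies only a weight variable, with no strata or clusters, the design can be declared with ids = ~1. The sketch below uses hypothetical data. One difference from SPSS is worth knowing. Weight Cases treats the weights as frequency weights, effectively replicating cases. svydesign() instead treats them as sampling weights, so standard errors reflect the actual number of respondents rather than the sum of the weights.

```r
library(survey)

# Hypothetical data: a weight column but no strata or clusters
survey_df <- data.frame(
  victim = c(1, 0, 0, 1, 0, 0, 1, 0),
  weight = c(1, 1, 1, 1, 3, 3, 3, 3)
)

# ids = ~1: respondents were sampled individually, not in clusters
weighted_design <- svydesign(ids = ~1, weights = ~weight, data = survey_df)

victim_mean <- svymean(~victim, design = weighted_design)
victim_mean  # weighted proportion of victims: 0.3125
```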
6.4.2 Analysing Weighted Survey Data
Once the weights are part of the survey design object, every analysis run through the survey package automatically accounts for them. This includes means, totals, and regressions fitted with svyglm(), so your results are representative of the population.
For example, to calculate weighted means, proportions, or cross-tabulations:
Weighted Means
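Beyond an overall mean, svyby() estimates a weighted mean within each level of a grouping variable. The sketch below uses hypothetical data. With two grouping variables, as in the exercise above, the by argument takes a formula such as ~sat_level + gender.

```r
library(survey)

survey_df <- data.frame(
  area   = rep(c("Urban", "Rural"), each = 4),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = rep(c(1, 3), each = 4),
  gender = factor(c("Male", "Female", "Female", "Male",
                    "Female", "Female", "Female", "Male")),
  victim = c(1, 0, 0, 1, 0, 0, 1, 0)
)
survey_design <- svydesign(ids = ~psu, strata = ~area, weights = ~weight,
                           nest = TRUE, data = survey_df)

# Weighted mean of victim (the victimisation rate) for each gender
victim_by_gender <- svyby(~victim, by = ~gender, design = survey_design,
                          FUN = svymean)
victim_by_gender  # Female about 0.27, Male 0.40
```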
Weighted Proportions
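For a categorical variable, svymean() returns the weighted proportion in each category along with standard errors. svyciprop() gives a confidence interval for a single proportion. The sketch below uses hypothetical data.

```r
library(survey)

survey_df <- data.frame(
  area   = rep(c("Urban", "Rural"), each = 4),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = rep(c(1, 3), each = 4),
  gender = factor(c("Male", "Female", "Female", "Male",
                    "Female", "Female", "Female", "Male"))
)
survey_design <- svydesign(ids = ~psu, strata = ~area, weights = ~weight,
                           nest = TRUE, data = survey_df)

# Weighted proportion of respondents in each gender category
gender_prop <- svymean(~gender, design = survey_design)
gender_prop  # genderFemale 0.6875, genderMale 0.3125

# Confidence interval for one proportion (logit method)
svyciprop(~I(gender == "Female"), design = survey_design, method = "logit")
```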
Weighted Cross-tabulations
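svychisq() tests whether two categorical variables are associated. It applies a design-based (Rao-Scott) adjustment to the chi-square test, making it the survey-aware counterpart of the chi-square statistic in SPSS Crosstabs. The sketch uses the apistrat example data that ship with the survey package: California schools, stratified by school type. This real dataset is used because a design-based test needs more than a handful of cases.

```r
library(survey)

# Example data bundled with the survey package
data(api)

# Stratified design: strata = school type, pw = sampling weight,
# fpc = population size of each stratum
dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Weighted cross-tabulation: school-wide target met, by school type
svytable(~sch.wide + stype, design = dstrat)

# Rao-Scott adjusted test of association
chisq_result <- svychisq(~sch.wide + stype, design = dstrat)
chisq_result
```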
These functions ensure that your survey analysis correctly reflects the survey design and weights, providing more accurate estimates and inferences.
6.5 Conclusion
In this chapter, we explored the basics of survey analysis in R, covering the import of SPSS survey data, applying weights, and performing basic descriptive statistics. The tools and techniques introduced here are powerful, enabling you to transition smoothly from SPSS to R while expanding your analytic capabilities.
For practice, try importing your own survey data, defining the survey design, and performing some of the analyses shown in this chapter. As you become more comfortable with these processes in R, you’ll find it offers greater flexibility and control over your survey analyses than SPSS.