library(rio)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(pander)
library(ggplot2)

1 Introduction

In this class we will look at statistical testing: performing t-tests to compare means, both directly and in a regression context.

We will continue to make use of data from the Irish National Election Study. We only look at the most recent election in the data set, 2007, which is already a bit dated. The data file is a ZIP archive containing a Stata file, which can be opened in R via a temporary file as follows:

tempFile <- tempfile(fileext = ".zip")
download.file("https://www.ucd.ie/issda/t4media/INESLong_Beta.zip", tempFile)
ines <- import(tempFile, haven = FALSE) %>% 
  filter(ines == 2007)
unlink(tempFile)

This uses the long file in Stata format on the INES archive website. If the above is too slow when using “Knit”, you can instead go to the INES website, download the file, unzip it by clicking on it, and then open the extracted file in the usual way:

ines <- import("INESLong_Beta.dta", haven = FALSE) %>% 
  filter(ines == 2007)

Check out the codebook for a description of the relevant variables.

2 t-tests in R

There are three types of t-tests for the mean:

* Testing the mean against some reference value (one-sample t-test).
* Testing the means of two variables on the same units (paired-sample t-test).
* Testing the means of the same variable on two different groups (two-sample t-test).

This used to be core material of the course, but now the course focuses on t- and F-tests in regression only. It is still useful to have a basic understanding of tests for comparing means, on which the t-test in regression is based.

First, we will prepare some variables to have proper variable names, which makes interpreting the output easier:

ines <- ines %>% mutate(union = recode(v0936, "Yes", "No"),
                        labour = v0190,
                        progDems = v0191,
                        turnout = recode(v0072, "Voted", "Did not vote", "Did not vote", "Did not vote"))

pander(table(ines$v0936, ines$union))

|   | No | Yes |
|---|---|---|
| 1 | 0 | 431 |
| 2 | 718 | 0 |

pander(table(ines$v0190, ines$labour))
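
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 243 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 125 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 111 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 213 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 143 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 141 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 105 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 71 | 0 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 130 |
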
pander(table(ines$v0191, ines$progDems))
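
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 441 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 165 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 153 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 89 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 176 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 93 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 87 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 67 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 50 | 0 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59 |
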
pander(table(ines$v0072, ines$turnout))

|   | Did not vote | Voted |
|---|---|---|
| 1 | 0 | 1262 |
| 2 | 112 | 0 |
| 3 | 11 | 0 |
| 4 | 41 | 0 |
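
The recoding above relies on positional replacement: when `recode()` from dplyr receives a numeric vector and unnamed replacements, the value 1 maps to the first label, 2 to the second, and so on. A minimal sketch on a made-up vector (`codes` and `member` are hypothetical names, not INES variables) shows the mapping and the cross-table check used above:

```r
library(dplyr)

# Toy numeric codes standing in for a survey item (hypothetical, not INES data)
codes <- c(1, 2, 2, 1, 2)

# Unnamed replacements are matched by position: 1 -> "Yes", 2 -> "No"
member <- recode(codes, "Yes", "No")

# Cross-tabulate original against recoded values to confirm the mapping
table(codes, member)
```

Every original code should land in exactly one column of the cross-table; a row spread over several columns would signal a recoding mistake.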

2.1 One-sample t-test

In a one-sample t-test, you compare the mean of one variable against a fixed value. For example, to test whether the mean of the variable labour differs from 5:

pander(t.test(ines$labour, mu = 5))
One Sample t-test: ines$labour

| Test statistic | df | P value | Alternative hypothesis | mean of x |
|---|---|---|---|---|
| 0.3833 | 1381 | 0.7016 | two.sided | 5.03 |
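
To see where the test statistic comes from, it can help to compute it by hand. The one-sample statistic is \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\) with \(n - 1\) degrees of freedom. The following is a small sketch on simulated data (the vector `x` and its parameters are made up for illustration, not taken from INES):

```r
set.seed(1)

# Simulated scores; the mean and sd used here are arbitrary assumptions
x <- rnorm(50, mean = 5.2, sd = 2)
mu0 <- 5
n <- length(x)

# Test statistic: distance of the sample mean from mu0 in standard errors
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# The built-in test should give the same numbers
res <- t.test(x, mu = mu0)
c(manual = t_manual, builtin = unname(res$statistic))
c(manual = p_manual, builtin = res$p.value)
```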

2.2 Paired-sample t-test

A paired-sample t-test compares the means of two variables measured on the same individuals (e.g. a test score before and after a class). For example, to test whether the means of two variables, labour and progDems, differ for the same individuals:

pander(t.test(ines$labour, ines$progDems, paired = TRUE))
Paired t-test: ines$labour and ines$progDems

| Test statistic | df | P value | Alternative hypothesis | mean of the differences |
|---|---|---|---|---|
| 13.14 | 1371 | 3.221e-37 *** | two.sided | 1.253 |
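
A paired t-test is equivalent to a one-sample t-test on the within-individual differences, tested against zero. A short sketch with simulated before and after scores (`before` and `after` are hypothetical, chosen only to illustrate the equivalence):

```r
set.seed(2)

# Simulated paired measurements on the same 40 units (made-up values)
before <- rnorm(40, mean = 5, sd = 2)
after <- before + rnorm(40, mean = 0.5, sd = 1)

# Paired test on the two variables
paired <- t.test(after, before, paired = TRUE)

# One-sample test on the differences against zero
diffs <- t.test(after - before, mu = 0)

# Both approaches yield the same statistic and p-value
c(paired = unname(paired$statistic), differences = unname(diffs$statistic))
c(paired = paired$p.value, differences = diffs$p.value)
```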

2.3 Two-sample t-test

The two-sample or independent-samples t-test compares the mean of the same variable across two different groups. By default, t.test() performs the Welch version, which does not assume equal variances in the two groups. For example, does support for the Labour Party depend on whether a respondent is a member of a trade union?

pander(t.test(ines$labour ~ ines$union))
Welch Two Sample t-test: ines$labour by ines$union

| Test statistic | df | P value | Alternative hypothesis | mean in group No | mean in group Yes |
|---|---|---|---|---|---|
| -3.21 | 865.5 | 0.001377 ** | two.sided | 4.886 | 5.449 |
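
Since the course focuses on t-tests in regression, it may be useful to see how the two are linked. The sketch below uses simulated data (the data frame `d` and its group means are assumptions for illustration). It contrasts the default Welch test with the pooled-variance test (`var.equal = TRUE`), and shows that the pooled test matches the t-test on the coefficient of a binary predictor in `lm()`:

```r
set.seed(3)

# Simulated scores for two groups of unequal size (made-up values)
d <- data.frame(
  group = rep(c("No", "Yes"), times = c(60, 40)),
  score = c(rnorm(60, mean = 4.9, sd = 2.5),
            rnorm(40, mean = 5.4, sd = 2.5))
)

# Default: Welch test, no equal-variance assumption
welch <- t.test(score ~ group, data = d)

# Classic pooled-variance test
pooled <- t.test(score ~ group, data = d, var.equal = TRUE)

# Regression of score on the group dummy
fit <- lm(score ~ group, data = d)
coefs <- coef(summary(fit))

# Same magnitude and p-value; the sign flips because t.test() reports
# No minus Yes, while the coefficient is Yes minus No
c(pooled = unname(pooled$statistic), lm = coefs["groupYes", "t value"])
c(pooled = pooled$p.value, lm = coefs["groupYes", "Pr(>|t|)"])
```

The Welch result will generally differ slightly from both, because it estimates the two group variances separately and adjusts the degrees of freedom.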

3 Chi-squared test

When producing a cross-table of two categorical variables, we can use the \(\chi^2\)-test to test whether the two variables are independent of each other or not. For example, to see if union members are more likely to participate in elections:

pander(chisq.test(table(ines$union, ines$turnout)))
Pearson’s Chi-squared test with Yates’ continuity correction: table(ines$union, ines$turnout)

| Test statistic | df | P value |
|---|---|---|
| 0.8159 | 1 | 0.3664 |
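
The statistic compares the observed counts with the counts expected under independence, \(\chi^2 = \sum (O - E)^2 / E\), where each expected count is the row total times the column total divided by the grand total. A sketch on a made-up 2x2 table (the counts in `tab` are invented, not INES results) reproduces this by hand. Note that `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, so `correct = FALSE` is needed to match the uncorrected formula:

```r
# Made-up 2x2 table of counts (hypothetical, for illustration only)
tab <- matrix(c(30, 20, 25, 45), nrow = 2,
              dimnames = list(union = c("No", "Yes"),
                              turnout = c("Did not vote", "Voted")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-squared statistic and p-value with (2 - 1) * (2 - 1) = 1 df
chi2_manual <- sum((tab - expected)^2 / expected)
p_manual <- pchisq(chi2_manual, df = 1, lower.tail = FALSE)

# Built-in test without the continuity correction
res <- chisq.test(tab, correct = FALSE)
c(manual = chi2_manual, builtin = unname(res$statistic))
c(manual = p_manual, builtin = res$p.value)
```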

4 Exercises

Lab 5 has example code on recoding and testing whether the recode worked. Use that here as well for questions where necessary. Often it might be helpful to generate properly named and labelled variables first, then run the regression or analysis.

Create a new RMarkdown file for this lab and fill out the details in the header. Use it for the remainder of the questions.

First we look at attitudes towards abortion among younger voters.

For the following statements, formulate the null hypothesis and the alternative hypothesis, perform the appropriate t-test, and formulate the conclusion from the test:

To see whether individuals with higher political efficacy are more likely to participate in elections: