Opening data

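You will first have to download the replication data (ReplicationdataRossVoeten.dta) from the Harvard Dataverse; the code below reads it from the working directory.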

suppressMessages({
  library(rio)
  library(stargazer)
  library(ggplot2)
  library(MASS)
  library(knitr)
  library(pROC)
})

ross <- import("ReplicationdataRossVoeten.dta")
icc <- import("http://www.joselkink.net/files/data/icc.dta")

# Flag every country-year at or after the year of signature / ratification
ross$iccSigned <- 0
ross$iccRatified <- 0
for (i in seq_len(nrow(icc))) {
  ross$iccSigned[ross$ccode == icc$ccode[i] & ross$Year >= icc$signed[i]] <- 1
  ross$iccRatified[ross$ccode == icc$ccode[i] & ross$Year >= icc$ratified[i]] <- 1
}
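
The flagging logic in the loop can be checked on a toy example. This is only a sketch: the data frames panel and treaty and their values are invented for illustration, and it assumes that the signing year is NA for a country that never signed, which is an assumption about how icc.dta is coded.

```r
# Toy data: column names mirror the lab's, the values are invented.
panel <- data.frame(ccode = rep(c(1, 2), each = 3),
                    Year  = rep(2000:2002, times = 2))
# Assumption: a country that never signed has a missing signing year.
treaty <- data.frame(ccode = c(1, 2), signed = c(2001, NA))

panel$signedFlag <- 0
for (i in seq_len(nrow(treaty))) {
  hit <- panel$ccode == treaty$ccode[i] & panel$Year >= treaty$signed[i]
  # which() keeps only the TRUE positions, so NA comparisons are ignored
  panel$signedFlag[which(hit)] <- 1
}
panel
```

Country 1 is flagged from 2001 onwards and country 2 is never flagged, which is the behaviour the loop relies on.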

summary(ross$Year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1960    1973    1987    1987    2001    2014
length(unique(ross$countryname))
## [1] 171
ross2001 <- subset(ross, Year == 2001)

So the data cover 171 countries over the period 1960 to 2014, although of course not all variables are available for all countries and all years.

The above code also downloads a small data file on membership of the International Criminal Court and uses it to construct two binary variables: iccSigned, which indicates whether a country had signed the treaty by a given year, and iccRatified, which indicates whether it had ratified it. For 2001 the distribution is as follows:

tbl <- table(ross2001$iccSigned, ross2001$iccRatified)
colnames(tbl) <- c("Not ratified", "Ratified")
rownames(tbl) <- c("Not signed", "Signed")
kable(addmargins(tbl))
|            | Not ratified | Ratified | Sum |
|:-----------|-------------:|---------:|----:|
| Not signed |           88 |        1 |  89 |
| Signed     |           60 |       45 | 105 |
| Sum        |          148 |       46 | 194 |
kable(addmargins(floor(prop.table(tbl, 1) * 100), 2))
|            | Not ratified | Ratified | Sum |
|:-----------|-------------:|---------:|----:|
| Not signed |           98 |        1 |  99 |
| Signed     |           57 |       42 |  99 |

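So by 2001, roughly 42% of the countries that had signed the treaty had also ratified it (45 of 105). The row percentages sum to 99 rather than 100 because floor() rounds each one down.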
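
The row percentages can be reproduced directly from the counts in the first table. The sketch below re-enters those counts by hand (the object names counts and rowPct are introduced here for illustration) and compares floor() with ordinary rounding.

```r
# Counts copied from the 2001 cross-table above (filled column by column)
counts <- matrix(c(88, 60, 1, 45), nrow = 2,
                 dimnames = list(c("Not signed", "Signed"),
                                 c("Not ratified", "Ratified")))

# margin = 1 divides each cell by its row total
rowPct <- prop.table(counts, margin = 1) * 100

floor(rowPct)            # what the lab code displays
round(rowPct, 1)         # the same percentages to one decimal place
rowSums(floor(rowPct))   # rounding down loses a percentage point per row
```

Using round() instead of floor() in the lab code would give rows that sum to 100.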

To avoid complications that are common in panel data, for the remainder of this lab we just use data from 2001.

Linear regression

Let's first try linear regression with a binary dependent variable. The dependent variable is iccRatified, which equals 1 for a country that has ratified the Rome Statute and 0 for one that has not. Our key independent variable is democracy (polity2), so we could say that we are evaluating whether democracy is an important explanation of ICC membership, but most of the analysis is focused on prediction rather than causal inference.

stargazer(lm(iccRatified ~ polity2, data = ross2001),
          m1 <- lm(iccRatified ~ polity2 + logoil + lngdp + tradegdp + lnpop, data = ross2001),
          type = "html")
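Dependent variable: iccRatified

|                     | (1)                     | (2)                    |
|:--------------------|:------------------------|:-----------------------|
| polity2             | 0.026***                | 0.024***               |
|                     | (0.005)                 | (0.006)                |
| logoil              |                         | 0.013                  |
|                     |                         | (0.014)                |
| lngdp               |                         | 0.054**                |
|                     |                         | (0.026)                |
| tradegdp            |                         | -0.0004                |
|                     |                         | (0.001)                |
| lnpop               |                         | -0.012                 |
|                     |                         | (0.023)                |
| Constant            | 0.155***                | -0.068                 |
|                     | (0.035)                 | (0.445)                |
| Observations        | 158                     | 149                    |
| R2                  | 0.160                   | 0.224                  |
| Adjusted R2         | 0.154                   | 0.197                  |
| Residual Std. Error | 0.394 (df = 156)        | 0.392 (df = 143)       |
| F Statistic         | 29.642*** (df = 1; 156) | 8.256*** (df = 5; 143) |

Note: * p<0.1; ** p<0.05; *** p<0.01
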
designMatrix <- m1$model

olsPredicted <- ifelse(predict(m1) > 0.5, "Ratified", "Not ratified")

Note that designMatrix is extracted from the fitted model (m1$model), so it contains only the observations actually used in the regression, with all cases that have missing values removed. If you change a model specification below, remember to extract the design matrix from the new model as well before you create, e.g., a cross-table of predicted vs actual outcomes or a plot with predicted values.
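
The effect of missing values on the extracted data can be seen in a small simulated example. The data frame toy and the model fit are invented for illustration; the sketch relies on R's default na.action (na.omit), which drops incomplete rows before fitting.

```r
set.seed(1)
toy <- data.frame(y = rbinom(20, 1, 0.5), x = rnorm(20))
toy$x[c(3, 7)] <- NA           # introduce two missing values

fit <- lm(y ~ x, data = toy)   # default na.action drops incomplete rows

nrow(toy)                      # rows in the original data
nrow(fit$model)                # rows actually used in the regression
length(predict(fit))           # one prediction per row of fit$model
```

Because the predictions line up with fit$model and not with toy, a cross-table or plot has to take the observed outcome from the extracted data as well.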

Create a cross-table of iccRatified by olsPredicted (see code for iccSigned by iccRatified above). Due to missing data, you will need to use the designMatrix instead of ross2001 as the data set. How well do you think the model does in predicting the outcome?

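# Colour only the points by predicted class; putting color in the global aes()
# would make geom_smooth() fit a separate line for each predicted class
ggplot(designMatrix, aes(x = polity2, y = iccRatified)) +
  geom_jitter(aes(color = olsPredicted)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Democracy", y = "ICC Treaty ratified", title = "OLS: Democracy and ICC membership")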

Note that the plotted curve is for the simple regression, not the multiple regression, but the predicted classification is based on the multiple regression.
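
A cross-table of actual against predicted outcomes for a linear model can be sketched on simulated data, so that it does not use the ICC data. Everything here (x, y, lpm, the simulated coefficients) is hypothetical; only the 0.5 cut-off and the class labels follow the lab code.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))   # simulated binary outcome

lpm <- lm(y ~ x)                            # linear probability model

# Classify with the same 0.5 cut-off as the lab; fixing the factor levels
# keeps both columns in the table even if one class is never predicted
pred <- factor(ifelse(fitted(lpm) > 0.5, "Ratified", "Not ratified"),
               levels = c("Not ratified", "Ratified"))

confusion <- table(Actual = y, Predicted = pred)
addmargins(confusion)

accuracy <- sum(diag(confusion)) / sum(confusion)   # share classified correctly
accuracy
range(fitted(lpm))   # fitted values of a linear model may fall outside [0, 1]
```

The diagonal of the table holds the correctly classified cases, so the accuracy is their share of all observations.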

Logistic regression

Let's next try a logistic regression for the same model.

stargazer(glm(iccRatified ~ polity2, data = ross2001, family = binomial(link = "logit")),
          m2 <- glm(iccRatified ~ polity2 + logoil + lngdp + tradegdp + lnpop, data = ross2001, family = binomial(link = "logit")),
          type = "html")
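Dependent variable: iccRatified

|                   | (1)       | (2)      |
|:------------------|:----------|:---------|
| polity2           | 0.243***  | 0.212*** |
|                   | (0.059)   | (0.062)  |
| logoil            |           | 0.139    |
|                   |           | (0.096)  |
| lngdp             |           | 0.221    |
|                   |           | (0.170)  |
| tradegdp          |           | -0.001   |
|                   |           | (0.007)  |
| lnpop             |           | -0.135   |
|                   |           | (0.160)  |
| Constant          | -2.517*** | -2.154   |
|                   | (0.479)   | (2.986)  |
| Observations      | 158       | 149      |
| Log Likelihood    | -70.749   | -65.221  |
| Akaike Inf. Crit. | 145.498   | 142.442  |

Note: * p<0.1; ** p<0.05; *** p<0.01
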
logitPredicted <- ifelse(predict(m2, type = "response") > 0.5, "Ratified", "Not ratified")

Create a cross-table of iccRatified by logitPredicted (see code for iccSigned by iccRatified above). Due to missing data, you will need to use the designMatrix instead of ross2001 as the data set. How well do you think the model does in predicting the outcome?

ggplot(designMatrix, aes(x = polity2, y = iccRatified)) +
  geom_jitter(aes(color = logitPredicted)) +
  geom_smooth(method = "glm", se = TRUE, method.args = list(family = "binomial")) +
  labs(x = "Democracy", y = "ICC Treaty ratified", title = "GLM: Democracy and ICC membership")
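
The type = "response" argument used for logitPredicted matters because predict() on a glm returns the linear predictor (log-odds) by default. The sketch below illustrates this on simulated data; x, y and logit are hypothetical names, not objects from the lab.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))   # simulated binary outcome

logit <- glm(y ~ x, family = binomial(link = "logit"))

eta <- predict(logit)                       # default: log-odds scale
p   <- predict(logit, type = "response")    # predicted probabilities

# The inverse logit (plogis) maps the log-odds to probabilities
all.equal(unname(p), unname(plogis(eta)))

# A 0.5 cut-off on the probability is a cut-off of 0 on the log-odds
all((p > 0.5) == (eta > 0))

range(p)   # unlike OLS fitted values, these stay strictly between 0 and 1
```

Forgetting type = "response" and comparing the log-odds with 0.5 would apply a different, stricter threshold than intended.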