You will have to download the data from the Harvard Dataverse.
suppressMessages({
library(rio)
library(stargazer)
library(ggplot2)
library(MASS)
library(knitr)
library(pROC)
})
ross <- import("ReplicationdataRossVoeten.dta")
icc <- import("http://www.joselkink.net/files/data/icc.dta")
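# Start from 0 and switch each indicator to 1 from the signing
# (or ratification) year onwards, one country at a time
ross$iccSigned <- 0
ross$iccRatified <- 0
for (i in seq_len(nrow(icc))) {
ross$iccSigned[ross$ccode == icc$ccode[i] & ross$Year >= icc$signed[i]] <- 1
ross$iccRatified[ross$ccode == icc$ccode[i] & ross$Year >= icc$ratified[i]] <- 1
}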
summary(ross$Year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1960 1973 1987 1987 2001 2014
length(unique(ross$countryname))
## [1] 171
ross2001 <- subset(ross, Year == 2001)
So there are 171 countries over the period from 1960 to 2014, although of course not all variables are available for all countries and all years.
The above code also downloads a small data file on membership of the International Criminal Court and uses it to create two binary variables: iccSigned, which indicates whether a country had signed the treaty by a given year, and iccRatified, which indicates whether it had ratified it. For 2001 the distribution is as follows:
tbl <- table(ross2001$iccSigned, ross2001$iccRatified)
colnames(tbl) <- c("Not ratified", "Ratified")
rownames(tbl) <- c("Not signed", "Signed")
kable(addmargins(tbl))
| | Not ratified | Ratified | Sum |
|---|---|---|---|
| Not signed | 88 | 1 | 89 |
| Signed | 60 | 45 | 105 |
| Sum | 148 | 46 | 194 |
kable(addmargins(floor(prop.table(tbl, 1) * 100), 2))
| | Not ratified | Ratified | Sum |
|---|---|---|---|
| Not signed | 98 | 1 | 99 |
| Signed | 57 | 42 | 99 |
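So about 42% of the countries that had signed the treaty by 2001 had also ratified it (45 out of 105). The row sums are 99 rather than 100 because floor() rounds every percentage down.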
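The row percentages can be checked by hand. The sketch below rebuilds the table from the counts printed above, so it runs without the data file; the object names rowPct and signedRatified are only illustrative.

```r
# Counts copied from the 2001 cross-table above
tbl <- matrix(c(88, 1,
                60, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Not signed", "Signed"),
                              c("Not ratified", "Ratified")))

# Row percentages: each row is divided by its own total
rowPct <- prop.table(tbl, 1) * 100

# Share of signatories that had ratified: 45 / (60 + 45)
signedRatified <- rowPct["Signed", "Ratified"]

floor(signedRatified)     # rounds down, as in the table above
round(signedRatified, 1)  # rounds to the nearest tenth instead
```

Using round() instead of floor() in the kable() call would make the rows sum to 100, up to rounding error.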
To avoid complications that are common in panel data, for the remainder of this lab we just use data from 2001.
Let's first try linear regression with a binary dependent variable. The dependent variable is iccRatified, which is 1 for a country that has ratified the Rome Statute and 0 for one that has not. Our key independent variable is democracy, so we could say that we are evaluating whether democracy is an important explanation of ICC membership, but most of the analysis focuses on prediction rather than causal inference.
# Fit and store the multiple regression first, then tabulate both models
m1 <- lm(iccRatified ~ polity2 + logoil + lngdp + tradegdp + lnpop, data = ross2001)
stargazer(lm(iccRatified ~ polity2, data = ross2001), m1,
type = "html")
Dependent variable: iccRatified

| | (1) | (2) |
|---|---|---|
| polity2 | 0.026^{***} | 0.024^{***} |
| | (0.005) | (0.006) |
| logoil | | 0.013 |
| | | (0.014) |
| lngdp | | 0.054^{**} |
| | | (0.026) |
| tradegdp | | -0.0004 |
| | | (0.001) |
| lnpop | | -0.012 |
| | | (0.023) |
| Constant | 0.155^{***} | -0.068 |
| | (0.035) | (0.445) |
| Observations | 158 | 149 |
| R^{2} | 0.160 | 0.224 |
| Adjusted R^{2} | 0.154 | 0.197 |
| Residual Std. Error | 0.394 (df = 156) | 0.392 (df = 143) |
| F Statistic | 29.642^{***} (df = 1; 156) | 8.256^{***} (df = 5; 143) |

Note: ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01
designMatrix <- m1$model
olsPredicted <- ifelse(predict(m1) > 0.5, "Ratified", "Not ratified")
Note that designMatrix holds the data as used in the regression, so all cases with missing values on any model variable have been removed. If you change a model specification below, you may also have to change the model from which the design matrix is extracted before you can, for example, create a cross-table of predicted against actual values or plot the predicted values.
Create a cross-table of iccRatified by olsPredicted (see the code for iccSigned by iccRatified above). Because of missing data, you will need to use designMatrix rather than ross2001 as the data set. How well do you think the model predicts the outcome?
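To see why the model's own data frame is needed, here is a minimal sketch on simulated data, not on the ICC data. The variables x and y, the sample size and the number of missing values are all invented for illustration. It shows that the rows stored in the fitted model line up with the predictions, while the original data frame does not.

```r
set.seed(1)  # simulated data, for illustration only

n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * sim$x))
sim$x[sample(n, 20)] <- NA  # introduce 20 missing values

fit <- lm(y ~ x, data = sim)

# The stored model frame keeps only the rows used in the fit
used <- fit$model
nrow(sim)   # all rows, including incomplete ones
nrow(used)  # complete cases only

# Classify at 0.5 and cross-tabulate actual against predicted
predicted <- ifelse(predict(fit) > 0.5, "Predicted 1", "Predicted 0")
confusion <- table(Actual = used$y, Predicted = predicted)
addmargins(confusion)

# Share of cases classified correctly
accuracy <- mean((predict(fit) > 0.5) == (used$y == 1))
accuracy
```

Tabulating against sim$y instead of used$y would fail here, because the two vectors have different lengths.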
# Colour only the points, so that geom_smooth() fits one line to all cases
ggplot(designMatrix, aes(x = polity2, y = iccRatified)) +
geom_jitter(aes(color = olsPredicted)) +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Democracy", y = "ICC Treaty ratified", title = "OLS: Democracy and ICC membership")
Note that the plotted curve is for the simple regression, not the multiple regression, but the predicted classification is based on the multiple regression.
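One consequence of fitting a straight line to a binary outcome can be read directly from the simple regression reported above. The sketch below plugs the printed coefficients from column (1) into the regression equation. It assumes that polity2 runs from -10 to 10, which is the usual Polity scale but is not checked against the data here.

```r
# Coefficients copied from column (1) of the OLS table above
intercept <- 0.155
slope     <- 0.026

# Assumed range of polity2: -10 (autocracy) to 10 (democracy)
polity <- c(-10, 0, 10)
fitted <- intercept + slope * polity

data.frame(polity2 = polity, fitted = fitted)

# At polity2 = -10 the fitted value is 0.155 - 0.26 = -0.105,
# which cannot be interpreted as a probability
any(fitted < 0 | fitted > 1)
```

Nothing in the linear model keeps fitted values inside the unit interval, which is one motivation for the logistic regression that follows.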
Let's next fit a logistic regression with the same specification.
# Fit and store the multiple logit first, then tabulate both models
m2 <- glm(iccRatified ~ polity2 + logoil + lngdp + tradegdp + lnpop, data = ross2001, family = binomial(link = "logit"))
stargazer(glm(iccRatified ~ polity2, data = ross2001, family = binomial(link = "logit")), m2,
type = "html")
Dependent variable: iccRatified

| | (1) | (2) |
|---|---|---|
| polity2 | 0.243^{***} | 0.212^{***} |
| | (0.059) | (0.062) |
| logoil | | 0.139 |
| | | (0.096) |
| lngdp | | 0.221 |
| | | (0.170) |
| tradegdp | | -0.001 |
| | | (0.007) |
| lnpop | | -0.135 |
| | | (0.160) |
| Constant | -2.517^{***} | -2.154 |
| | (0.479) | (2.986) |
| Observations | 158 | 149 |
| Log Likelihood | -70.749 | -65.221 |
| Akaike Inf. Crit. | 145.498 | 142.442 |

Note: ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01
logitPredicted <- ifelse(predict(m2, type = "response") > 0.5, "Ratified", "Not ratified")
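The argument type = "response" matters here: without it, predict() on a glm object returns values on the log-odds scale, not probabilities. The following sketch uses simulated data (x, y and the coefficients are invented) to show how the two scales relate.

```r
set.seed(2)  # simulated data, for illustration only

n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * sim$x))

fit <- glm(y ~ x, data = sim, family = binomial(link = "logit"))

linearPredictor <- predict(fit, type = "link")      # log-odds scale (the default)
probability     <- predict(fit, type = "response")  # probability scale

# The response is the inverse logit of the linear predictor
all.equal(plogis(linearPredictor), probability)

# A 0.5 cut-off on the probability corresponds to a 0 cut-off on the log-odds
all((probability > 0.5) == (linearPredictor > 0))
```

Applying the 0.5 threshold to the default log-odds predictions would therefore classify far too few cases as positive.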
Create a cross-table of iccRatified by logitPredicted (see the code for iccSigned by iccRatified above). Because of missing data, you will need to use designMatrix rather than ross2001 as the data set; it was extracted from m1, but m2 uses the same variables and therefore the same 149 cases. How well do you think the model predicts the outcome?
ggplot(designMatrix, aes(x = polity2, y = iccRatified)) +
geom_jitter(aes(color = logitPredicted)) +
geom_smooth(method = "glm", se = TRUE, method.args = list(family = "binomial")) +
labs(x = "Democracy", y = "ICC Treaty ratified", title = "GLM: Democracy and ICC membership")
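Logit coefficients are on the log-odds scale, so they are easier to read once converted into probabilities. As a rough sketch, the code below plugs the printed coefficients of the simple logit model, column (1) of the table above, into the inverse logit function. As before, the range of -10 to 10 for polity2 is an assumption.

```r
# Coefficients copied from column (1) of the logit table above
intercept <- -2.517
slope     <- 0.243

polity <- c(-10, 0, 10)          # assumed range of polity2
logOdds <- intercept + slope * polity
probability <- plogis(logOdds)   # inverse logit: 1 / (1 + exp(-logOdds))

round(data.frame(polity2 = polity, logOdds, probability), 3)

# exp(slope) is the multiplicative change in the odds of ratification
# for a one-unit increase in polity2
exp(slope)
```

Unlike the linear fit, every predicted probability stays strictly between 0 and 1, and the effect of a one-unit change in democracy depends on where on the curve a country sits.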