Opening data

For Project 2 we start with an entirely different data set. We will use the replication data of a published paper: Michael L. Ross and Erik Voeten (2016), “Oil and international cooperation”, International Studies Quarterly 60(1):85–97. These are so-called panel data, which means that a set of units is observed over a range of time periods, in this case countries over a number of years.
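To make the panel structure concrete, here is a minimal sketch with a made-up toy panel. The column names countryname, Year and oilexp mirror those used later in this lab, but the three countries and all values are invented for illustration.

```r
# Hypothetical toy panel: 3 countries observed in 2 years (all values made up)
panel <- data.frame(
  countryname = rep(c("A", "B", "C"), each = 2),
  Year        = rep(c(2000, 2001), times = 3),
  oilexp      = c(0.00, 0.01, 0.30, 0.35, NA, 0.05)
)

# One row per country-year combination
nrow(panel)

# Number of distinct units and the range of time periods
length(unique(panel$countryname))
range(panel$Year)

# Observations per country; equal counts indicate a balanced panel
table(panel$countryname)

# A single cross-section: all countries in one year
cross_section <- subset(panel, Year == 2001)
cross_section
```

Taking one year out of a panel, as in the last step, leaves ordinary cross-sectional data with one row per country.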

Ross and Voeten refer to the concept of “structured international organisations”: international organisations with a reasonable level of organisational structure that are not full-blown supranational organisations, i.e. more structured than a trade agreement but less structured than the World Bank.

You will have to download the data from the Harvard Dataverse and save the file in your working directory, so that import() can find it.

suppressMessages({
  library(rio)
  library(stargazer)
  library(ggplot2)
  library(car)
  library(leaps)
  library(MASS)
})

ross <- import("ReplicationdataRossVoeten.dta")

summary(ross$Year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1960    1973    1987    1987    2001    2014
length(unique(ross$countryname))
## [1] 171
ross2001 <- subset(ross, Year == 2001)

So there are 171 countries over a time period from 1960 to 2014, although of course not all variables will be available for all countries and all years.

To avoid complications that are common in panel data, for the remainder of this lab we just use data from 2001.

Based on the previous lab, produce a table with summary statistics for oil exports (“oilexp”) and membership in structured international organisations (“strucint”).
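As a reminder of what such a table contains, here is a minimal base R sketch on simulated data. The data frame toy and its values are invented; only the variable names match the exercise. Whichever function the previous lab used will work just as well on ross2001.

```r
# Simulated stand-in data (NOT the Ross and Voeten data), with some missing values
set.seed(1)
toy <- data.frame(
  oilexp   = c(runif(20, 0, 0.5), NA, NA),
  strucint = c(rpois(21, 38), NA)
)

vars <- c("oilexp", "strucint")

# One row per variable; missing values are excluded from each statistic
summ <- t(sapply(toy[vars], function(x) c(
  N    = sum(!is.na(x)),
  Mean = mean(x, na.rm = TRUE),
  SD   = sd(x, na.rm = TRUE),
  Min  = min(x, na.rm = TRUE),
  Max  = max(x, na.rm = TRUE)
)))

round(summ, 3)
```

Reporting N per variable is useful here, because the two variables need not be observed for the same number of countries.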

Simple regression

Let's first look at a simple regression, i.e. a regression with only one independent variable.

stargazer(lm(strucint ~ oilexp, data = ross2001),
          type = "html")
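
===============================================
                        Dependent variable:
                    ---------------------------
                             strucint
-----------------------------------------------
oilexp                       -4.780**
                              (2.257)

Constant                     39.240***
                              (0.720)

-----------------------------------------------
Observations                    161
R2                             0.027
Adjusted R2                    0.021
Residual Std. Error      8.282 (df = 159)
F Statistic           4.485** (df = 1; 159)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
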
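
The numbers in the regression table can be checked by hand. The sketch below copies the coefficients, standard error and degrees of freedom from the table and recomputes the predicted values, the t statistic and its p-value. The two values of oilexp used for prediction (0 and 0.5) are chosen purely for illustration, since the scale of oilexp is not stated here.

```r
# Values copied from the regression table
intercept <- 39.240
slope     <- -4.780
se_slope  <- 2.257
df_resid  <- 159

# Predicted memberships at two illustrative values of oilexp
oil_values <- c(0, 0.5)
predicted  <- intercept + slope * oil_values
predicted

# t statistic and two-sided p-value for the slope
t_stat  <- slope / se_slope
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
t_stat
p_value

# With a single regressor, the F statistic equals the squared t statistic
t_stat^2
```

The p-value falls between 0.01 and 0.05, which is what the two stars next to the oilexp coefficient indicate, and the squared t statistic reproduces the reported F statistic up to rounding.
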
ggplot(ross2001, aes(x = oilexp, y = strucint)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  geom_smooth(color = "red", se = FALSE) +
  labs(x = "Oil exports", y = "Memberships in structured IOs", title = "Oil exports and IO membership")
## Warning: Removed 33 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 33 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.0049131
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 0.025916
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 1.0517e-15
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 0.00044113
## Warning: Removed 33 rows containing missing values (geom_point).

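This produces a scatter plot of the raw data, a blue regression line with its 95% confidence interval, and a red smooth (loess) curve. You can ignore the many warnings. The "Removed 33 rows" warnings report observations with missing values on either variable, and the simpleLoess warnings are due to the awkward distribution of the data here, most likely because many observations have nearly identical values of oilexp.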
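To see where a "Removed ... rows" warning comes from, we can count the incomplete rows ourselves. This is a sketch on a small invented data frame. The name toy and its values are hypothetical, but the same lines should work on ross2001.

```r
# Hypothetical toy data with missing values and several zeros in oilexp
toy <- data.frame(
  oilexp   = c(0, 0, 0, 0.2, NA, 0.4, NA),
  strucint = c(40, 38, NA, 35, 41, 33, NA)
)

# A row can only be plotted or used in the regression if both values are present
usable <- complete.cases(toy[, c("oilexp", "strucint")])

sum(!usable)  # rows that would be dropped
sum(usable)   # rows that remain

# lm() drops the same incomplete rows by default
n_used <- nobs(lm(strucint ~ oilexp, data = toy))
n_used

# Share of usable observations with oilexp exactly zero
share_zero <- mean(toy$oilexp[usable] == 0)
share_zero
```

A large share of identical x values is the kind of distribution that makes the local fits behind a loess curve nearly singular.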

Verify that the regression table output matches how you read the regression plot.
What do you conclude about the relationship between oil exports and membership of structured international organisations?

Visual controls

Visually, we can easily fit and plot regressions for different groups, for example to compare democracies with non-democracies.

ross2001$democracyf <- recode(ross2001$democracy, 
                              "1='Democracy'; 0='Non-democracy'; else=NA", 
                              as.factor.result = TRUE)
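
Other packages, such as dplyr, also export a function called recode(), so the call above can behave differently depending on which packages are loaded. As an alternative sketch, base R's factor() produces the same labels without relying on car. The vector democracy below is a made-up example, not the real variable.

```r
# Hypothetical 0/1 indicator with one missing value
democracy <- c(1, 0, NA, 1, 0)

# Values listed in 'levels' get the matching label; anything else becomes NA
democracyf <- factor(democracy,
                     levels = c(1, 0),
                     labels = c("Democracy", "Non-democracy"))

levels(democracyf)
table(democracyf, useNA = "ifany")
```

The order of levels determines the order of the facets and of the legend entries in the plots.
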

ggplot(ross2001, aes(x = oilexp, y = strucint)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Oil exports", y = "Memberships in structured IOs", title = "Oil exports and IO membership") +
  facet_grid(. ~ democracyf)
## Warning: Removed 33 rows containing non-finite values (stat_smooth).
## Warning: Removed 33 rows containing missing values (geom_point).

ggplot(ross2001, aes(x = oilexp, y = strucint, color = democracyf)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Oil exports", y = "Memberships in structured IOs", title = "Oil exports and IO membership")
## Warning: Removed 33 rows containing non-finite values (stat_smooth).

## Warning: Removed 33 rows containing missing values (geom_point).
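
The separate lines in these plots correspond to separate regressions per group. A minimal sketch of the numerical counterpart, on simulated data (the data frame toy and the coefficients used to generate it are invented), fits one regression per group and compares the slopes with those from a single model with an interaction term.

```r
# Simulated stand-in data (NOT the Ross and Voeten data)
set.seed(42)
n <- 100
toy <- data.frame(
  oilexp     = runif(n, 0, 0.5),
  democracyf = factor(rep(c("Democracy", "Non-democracy"), each = n / 2))
)
toy$strucint <- ifelse(toy$democracyf == "Democracy",
                       42 - 2 * toy$oilexp,
                       36 - 8 * toy$oilexp) + rnorm(n, sd = 3)

# One simple regression per group, as in the faceted plot
fits   <- lapply(split(toy, toy$democracyf),
                 function(d) lm(strucint ~ oilexp, data = d))
slopes <- sapply(fits, function(m) unname(coef(m)["oilexp"]))
slopes

# A single model with an interaction term implies the same two slopes
inter <- lm(strucint ~ oilexp * democracyf, data = toy)
b     <- coef(inter)
slope_dem    <- unname(b["oilexp"])
slope_nondem <- unname(b["oilexp"] + b["oilexp:democracyfNon-democracy"])
c(Democracy = slope_dem, `Non-democracy` = slope_nondem)
```

The interaction model has the advantage that the difference between the two slopes gets its own coefficient and standard error, so it can be tested directly.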