library(rio)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
In this class we will look at multiple regression, using replication data from Ross (2004). While Ross uses panel data (multiple countries over multiple years), we will select only one year to avoid complications with time-series data. This data set has already been prepared:
ross <- import("http://www.joselkink.net/wp-content/uploads/2013/01/ross_1997.dta")
Check out the codebook for a description of the relevant variables.
Estimating a multiple regression model - once you already know how to estimate a simple regression - is a straightforward extension. You simply add the variables to the regression equation. For example, regressing corruption on democracy would be as follows:
lm(corruption ~ democracy, ross)
##
## Call:
## lm(formula = corruption ~ democracy, data = ross)
##
## Coefficients:
## (Intercept) democracy
## 2.0259 0.2004
If we wanted to control for the level of economic performance, we might add GDP per capita to the model:
lm(corruption ~ democracy + gdppc, ross)
##
## Call:
## lm(formula = corruption ~ democracy + gdppc, data = ross)
##
## Coefficients:
## (Intercept) democracy gdppc
## 1.990e+00 1.145e-01 7.747e-05
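The step from a simple to a multiple regression can also be written with `update()`, which re-fits an existing model with a modified formula. The sketch below uses a small simulated data frame (`sim`, with made-up values that only borrow the variable names from the Ross data), so the numbers it prints are not comparable to the output above.

```r
# Simulated stand-in for the Ross data (hypothetical values, not the real data set)
set.seed(1)
n <- 100
sim <- data.frame(democracy = runif(n, 0, 10),
                  gdppc     = rlnorm(n, meanlog = 8, sdlog = 1))
sim$corruption <- 1 + 0.1 * sim$democracy + 0.4 * log(sim$gdppc) + rnorm(n)

# Simple regression first
m1 <- lm(corruption ~ democracy, data = sim)

# Add one predictor to the existing model;
# equivalent to lm(corruption ~ democracy + gdppc, data = sim)
m2 <- update(m1, . ~ . + gdppc)

formula(m2)  # shows the extended formula
coef(m2)     # one coefficient per term, plus the intercept
```

Writing the full formula out, as in the text, is just as valid; `update()` is mainly convenient when you build up a sequence of nested models.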
As a side note, variables that measure money, like GDP per capita, or size, like population, typically have a very skewed distribution. Their relationship with the outcome is then likely to be non-linear, and a linear regression usually fits better when the variable is log-transformed. For example:
lm(corruption ~ democracy + log(gdppc), ross)
##
## Call:
## lm(formula = corruption ~ democracy + log(gdppc), data = ross)
##
## Coefficients:
## (Intercept) democracy log(gdppc)
## -1.3510 0.1189 0.4674
(As it happens, the data set already contains a variable called loggdppc, but I wanted to include an example that can be used when this is not available already.)
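To make the point about the two approaches concrete, the sketch below checks that transforming inside the formula and storing the transformed variable first give the same estimates. It again uses simulated data (the data frame `sim` and its values are made up for illustration). It also shows one way to read a coefficient on a logged predictor: multiplying it by `log(2)` gives the predicted change in the outcome when the predictor doubles. With the coefficient of 0.4674 reported above, that is about 0.4674 × log(2) ≈ 0.32.

```r
# Simulated stand-in for the Ross data (hypothetical values, not the real data set)
set.seed(2)
n <- 100
sim <- data.frame(democracy = runif(n, 0, 10),
                  gdppc     = rlnorm(n, meanlog = 8, sdlog = 1))
sim$corruption <- 1 + 0.1 * sim$democracy + 0.4 * log(sim$gdppc) + rnorm(n)

# Option 1: transform inside the formula
fit_inline <- lm(corruption ~ democracy + log(gdppc), data = sim)

# Option 2: store the transformed variable first, then use it
sim$loggdppc <- log(sim$gdppc)
fit_stored <- lm(corruption ~ democracy + loggdppc, data = sim)

# Same estimates; only the coefficient label differs
all.equal(unname(coef(fit_inline)), unname(coef(fit_stored)))

# Predicted change in the outcome when GDP per capita doubles
b_log <- coef(fit_inline)["log(gdppc)"]
unname(b_log * log(2))
```

Note that `log()` in R is the natural logarithm, which is what the doubling calculation assumes.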
Typically, rather than looking at this output directly, we would save the output as an R object and then use a package that presents the results better:
regOutput <- lm(corruption ~ democracy + log(gdppc), ross)
stargazer(regOutput, type = "html", style = "ajps")
| | corruption |
|---|---|
| democracy | 0.119^{***} |
| | (0.037) |
| log(gdppc) | 0.467^{***} |
| | (0.102) |
| Constant | -1.351^{*} |
| | (0.754) |
| N | 100 |
| R-squared | 0.416 |
| Adj. R-squared | 0.404 |
| Residual Std. Error | 0.968 (df = 97) |
| F Statistic | 34.610^{***} (df = 2; 97) |

^{***}p < .01; ^{**}p < .05; ^{*}p < .1
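stargazer also accepts several fitted models at once and places them in adjacent columns, which makes it easier to compare specifications. The following is a minimal sketch on simulated data (the data frame `sim` and its values are hypothetical). It uses `type = "text"` so the table prints readably in the console, whereas the call above uses `type = "html"` for the rendered page. It assumes the stargazer package is installed.

```r
library(stargazer)

# Simulated stand-in for the Ross data (hypothetical values, not the real data set)
set.seed(4)
n <- 100
sim <- data.frame(democracy = runif(n, 0, 10),
                  gdppc     = rlnorm(n, meanlog = 8, sdlog = 1))
sim$corruption <- 1 + 0.1 * sim$democracy + 0.4 * log(sim$gdppc) + rnorm(n)

# Two specifications: without and with the control variable
m1 <- lm(corruption ~ democracy, data = sim)
m2 <- lm(corruption ~ democracy + log(gdppc), data = sim)

# One column per model; plain-text output for the console
stargazer(m1, m2, type = "text", style = "ajps")
```

Each model gets its own column, and a coefficient cell is left blank when a variable does not appear in that model.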
We see that once we control for the log of GDP per capita, the estimated coefficient on democracy falls from about 0.20 in the simple regression to about 0.12, a reduction of roughly 40 percent.
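One common reason for a coefficient to shrink like this is that the control variable is correlated with both the predictor and the outcome. The simulation below is only an illustration of that mechanism, not a claim about the Ross data. All values, including the assumed true democracy coefficient of 0.1 and the assumed link between wealth and democracy, are made up.

```r
# Purely illustrative simulation (hypothetical data-generating process)
set.seed(3)
n <- 1000
loggdppc   <- rnorm(n, mean = 8, sd = 1)
# Assumption: richer countries tend to score higher on democracy
democracy  <- 0.8 * (loggdppc - 8) + rnorm(n)
# Assumption: the true coefficient on democracy is 0.1
corruption <- 1 + 0.1 * democracy + 0.5 * loggdppc + rnorm(n)

short <- lm(corruption ~ democracy)             # control omitted
long  <- lm(corruption ~ democracy + loggdppc)  # control included

coef(short)["democracy"]  # absorbs part of the effect of loggdppc
coef(long)["democracy"]   # should be close to the assumed 0.1
```

In this setup the coefficient from the short model is larger because democracy partly stands in for the omitted wealth variable.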