Johan A. Dornschneider-Elkink

Data Analytics for the Social Sciences

POL 30660 Data Analytics for the Social Sciences

Fridays, 9-11 am, B110 Newman Building (note that the final session is in B109).

This module is co-taught by Yoo Sun Jung (first half) and Jos Elkink (second half).


Install the necessary software on your computer before the first class. All of it is free. R can be downloaded from CRAN (the Comprehensive R Archive Network).

After installing R, you should also install RStudio.

A handy overview of R regression commands is the reference card for regression; a more general one for R is the short reference card. The Google R Style Guide provides suggestions for writing clear code.

Week | Date | Topic | Materials | Lectures and resources
1 | 21/1 | Introduction | slides, lab, output | lecture: introduction, live session; videos: markdown, importing Excel, using packages; markdown cheat sheet
2 | 28/1 | Distributions and descriptive statistics | slides, lab, output | lecture: levels of measurement, missing data, graphs, descriptive statistics
3 | 4/2 | Comparing through visualisation | slides, lab, output | lecture: multiway graphs, control variables, project 1; Top 50 ggplot examples
  | 11/2 | Group presentations | |
4 | 18/2 | Linear regression | slides, lab, output | lecture: linear regression, multiple regression, model selection; videos: recoding and merging (live session), merging (demonstration)
5 | 25/2 | Logistic regression | slides, lab, output | lecture: linear probability model, linear discriminant analysis, logistic regression, model fit
6 | 4/3 | Trees and forests | slides, lab, output | lecture: trees, forests, project 2 (live)
7 | 25/3 | Cluster analysis | slides, lab, output | lecture: intro, k-means, hierarchical, dissimilarity, speeches
8 | 1/4 | Dimension reduction | slides, lab, output | lecture: intro, MDS, PCA, factor analysis
9 | 8/4 | Wordscores | slides, lab, output | lecture: intro, wordscores, wordfish
  | 15/4 | Good Friday (no class) | |
10 | 22/4 | Topic models | slides, lab, output | lecture: topic models, Putin, afterthoughts, project 3