Johan A. Dornschneider-Elkink

POL 30660 Data Analytics for the Social Sciences

Fridays, 9-11 am, B110 Newman Building (note that the final session is in B109).

This module is co-taught by Yoo Sun Jung (first half) and Jos Elkink (second half).

Syllabus

You should install the necessary software on your computer prior to the first class. All software is free. R can be downloaded here.

After installing R, you should also install RStudio.

A handy overview of R regression commands can be found here: reference card for regression. A more general one for R is here: short reference card. Google's R Style Guide provides suggestions for writing clear code.

1 21/1 Introduction
    materials: slides | lab | output
    lecture: introduction | live session
    videos: markdown | importing Excel | using packages
    markdown cheat sheet
2 28/1 Distributions and descriptive statistics
    materials: slides | lab | output
    lecture: levels of measurement | missing data | graphs | descriptive statistics
3 4/2 Comparing through visualisation
    materials: slides | lab | output
    lecture: multiway graphs | control variables | project 1
    Top 50 ggplot examples
11/2 Group presentations
4 18/2 Linear regression
    materials: slides | lab | output
    lecture: linear regression | multiple regression | model selection
    videos: recoding and merging (live session) | merging (demonstration)
5 25/2 Logistic regression
    materials: slides | lab | output
    lecture: linear probability model | linear discriminant analysis | logistic regression | model fit
6 4/3 Trees and forests
    materials: slides | lab | output
    lecture: trees | forests | project 2 (live)
7 25/3 Cluster analysis
    materials: slides | lab | output
    lecture: intro | kmeans | hierarchical | dissimilarity | speeches
8 1/4 Dimension reduction
    materials: slides | lab | output
    lecture: intro | mds | pca | factor analysis
9 8/4 Wordscores
    materials: slides | lab | output
    lecture: intro | wordscores | wordfish
15/4 Good Friday
10 22/4 Topic models
    materials: slides | lab | output
    lecture: topic models | putin | afterthoughts | project 3