ACLED collects real-time data on the locations, dates, actors, fatalities, and types of all reported political violence and protest events across Africa, South Asia, Southeast Asia, the Middle East, Central Asia and the Caucasus, Latin America and the Caribbean, and Southeastern and Eastern Europe and the Balkans.
AidData is a collaborative initiative to provide products and services that promote the dissemination, analysis, and understanding of development finance information.
Shiro Kuriwaki
This memo outlines one way to organize a data analysis-oriented project in policy research or social sciences – an area that is increasingly data-driven, code-driven, and collaborative. Many of the principles described are taken verbatim from Gentzkow and Shapiro's guide referenced below.
The CEP is an interdisciplinary research centre at the LSE Research Laboratory. The CEP studies the determinants of economic performance at the level of the company, the nation, and the global economy.
Matthew Gentzkow and Jesse M. Shapiro
This handbook is about translating insights from experts in code and data into practical terms for empirical social scientists. It suggests principles that students and research professionals should adopt to create and manage reproducible research projects. Principles include automation, version control, working with directories and keys, documentation, and management of projects.
This tutorial is written to help people understand some of the basics of shell script programming (aka shell scripting), and hopefully to introduce some of the possibilities of simple but powerful programming available under the Bourne shell. As such, it has been written as a basis for one-on-one or group tutorials and exercises, and as a reference for subsequent use.
The Constituency-Level Elections Archive (CLEA) is a repository of detailed election results at the constituency level for lower chamber and upper chamber legislative elections from around the world.
This 10-week training program is designed to prepare incoming pre-doctoral research fellows at the Princeton Empirical Studies of Conflict (ESOC) lab with the skills needed to support faculty research projects within ESOC, SPIA, and associated departments. The program draws from online courses for core data processing and visualization skills, research training materials from partnering organizations, and materials prepared by the team at ESOC.
The goal is to expose fellows to all aspects of the data-driven research process. We touch upon topics such as best practices for data management, research methodologies used in the social sciences, and production-related skills like optimizing code for publication and working with LaTeX. Ultimately, these skills provide fellows with a strong analytical foundation for careers in public service and future PhD study in political science, economics, and related fields.
Tutorials and courses on R, Python, and other data science topics.
National Oceanic and Atmospheric Administration nighttime lights time series. The files are cloud-free composites made using all the available archived DMSP-OLS smooth resolution data for calendar years.
Jacob N. Shapiro
This guide is intended to provide a common set of research practices for the Empirical Studies of Conflict group at Princeton, as well as colleagues elsewhere. Practically speaking, following this guide should ensure that (a) team projects can be managed with minimum friction and (b) any research affiliate will be able to get up-to-speed on any given project rapidly and efficiently. The conventions detailed in the guide should allow new colleagues to begin work and collaboration quickly.
The Facebook Population Density Maps help nonprofit and multilateral agencies plan vaccination campaigns, respond to natural disasters, and evaluate rural electrification plans. These maps help researchers assess the ways in which climate change and urbanization impact where people live.
GADM provides maps and spatial data for all countries and their sub-divisions.
Global Barometer Surveys (GBS) is the first comprehensive effort to measure, at a mass level, the current social, political, and economic atmosphere around the world. It provides an independent, non-partisan, multidisciplinary view of public opinion on a range of policy-relevant issues.
Global Forest Watch offers the latest data, technology and tools that empower people everywhere to better protect forests.
Miriam A. Golden
This document lays out guidelines for collaborative arrangements and provides a systematic literature review of authorship guidelines from different disciplines.
Learn modern tech skills with the latest courses and labs in AWS, Azure and Google Cloud.
There are many options for running projects on cloud platforms; here are tutorials for three we like:
Using the BIMSB (soon to be called MAX) cluster environment is similar to using Unix/Linux environments for your job submission (e.g., running your scripts or other software). The difference is that you need to specify the required resources beforehand. The cluster is controlled by SGE (Sun Grid Engine), software that organizes the queues and resources. This sort of scheduling system is necessary when limited computational resources are shared by many users.
Website for "Efficient R Programming" covering general concepts and R programming techniques about code optimisation, before describing idiomatic programming structures.
Tutorial from research computing on Princeton's computing clusters.
The Humanitarian Data Exchange (HDX) is an open platform for sharing data across crises and organisations. Launched in July 2014, the goal of HDX is to make humanitarian data easy to find and use for analysis. Their growing collection of datasets has been accessed by users in over 200 countries and territories.
NASA Landsat satellite imagery and data. Data is available to download on the USGS website.
Guide on using LaTeX and Overleaf.
lintr is an R package offering static code analysis. It checks adherence to a given style, syntax errors, and possible semantic issues.
The mapsapi package provides an interface to the Google Maps APIs, currently four of them: Google Maps Direction API, Google Maps Distance Matrix API, Google Maps Geocode API, Maps Static API.
Stephen L. Morgan
In the second edition of Counterfactuals and Causal Inference, the essential features of the counterfactual approach to observational data analysis are presented with examples from the social, demographic, and health sciences.
James G. MacKinnon
Conventional methods for inference using clustered standard errors work very well when the model is correct and the data satisfy certain conditions, but they can produce very misleading results in other cases. This paper discusses some of the issues that users of these methods need to be aware of.
Online book and companion to "Mostly Harmless Econometrics"
A self-guided tour to help you find and analyze data using Stata, R, Excel, and SPSS. The goal is to provide basic learning tools for classes, research, and/or professional development.
Joseph L. Dieleman and Tara Templin
2014 PLOS ONE paper: The random- and fixed-effects estimators (RE and FE, respectively) are two competing methods for analyzing clustered data. The authors conduct a simulation study to compare RE, FE, and within-between (WB) estimation across 16,200 scenarios.
The Minerva Research Initiative supports social science research aimed at improving our basic understanding of security, broadly defined. All supported projects are university-based and unclassified, with the intention that all work be shared widely to support thriving, stable, and safe communities. The goal is to improve DoD’s basic understanding of the social, cultural, behavioral, and political forces that shape regions of the world of strategic importance to the U.S.
NASA Earth Observatory Normalized Difference Vegetation Index (NDVI) data.
OpenStreetMap is a map of the world, created by people like you and free to use under an open license.
The PRIO-Grid data set is a spatio-temporal grid structure constructed to aid the compilation, management and analysis of spatial data within a time-consistent framework. It consists of quadratic grid cells that jointly cover all terrestrial areas of the world.
Documentation and guidance for the Python Client for Google Maps Services.
JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner.
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation.
Documentation and guidelines for NumPy.
pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
Tutorial on Python multithreading and multiprocessing.
An open source machine learning framework that accelerates the path from research prototyping to production deployment.
Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.
Python 101 starts off with the fundamentals of Python and then builds from there. The audience of this book is primarily people who have programmed in the past but want to learn Python. This book covers a fair amount of intermediate level material in addition to the beginner material.
This is the online home of Geocomputation with R, a book on geographic data analysis, visualization and modeling.
Collection of common ggplot2 data visualizations in R.
Introductory guide on reproducible geospatial analysis using Raster and Vector data in R.
Guidelines on parallel computing in R.
This introduction to the plm package is a slightly modified version of Croissant and Millo (2008), published in the Journal of Statistical Software.
osmdata is an R package for downloading and using data from OpenStreetMap (OSM).
This is the book site for “R packages”.