This tutorial is written to help people understand some of the basics of shell script programming (aka shell scripting), and hopefully to introduce some of the possibilities of simple but powerful programming available under the Bourne shell. As such, it has been written as a basis for one-on-one or group tutorials and exercises, and as a reference for subsequent use.
Tutorials and courses on R, Python, and other data science topics.
Learn modern tech skills with the latest courses and labs in AWS, Azure and Google Cloud.
There are many options for running projects on cloud platforms; here are tutorials for three we like:
Using the BIMSB (soon to be called MAX) cluster environment is similar to using Unix/Linux environments for job submission (e.g., running your scripts or other software). The difference is that you need to specify the resources you need beforehand. The cluster is controlled by SGE (Sun Grid Engine), which organizes the queues and resources. This sort of scheduling system is necessary when limited computational resources are shared by many users.
Website for "Efficient R Programming", which covers general concepts and R programming techniques for code optimisation before describing idiomatic programming structures.
Tutorial from research computing on Princeton's computing clusters.
lintr is an R package offering static code analysis. It checks adherence to a given style, syntax errors and possible semantic issues.
The mapsapi package provides an interface to the Google Maps APIs, currently four of them: Google Maps Direction API, Google Maps Distance Matrix API, Google Maps Geocode API, Maps Static API.
Online book and companion to "Mostly Harmless Econometrics".
A self-guided tour to help you find and analyze data using Stata, R, Excel and SPSS. The goal is to provide basic learning tools for classes, research and/or professional development.
Joseph L. Dieleman and Tara Templin
2014 PLOS ONE paper: The random- and fixed-effects estimators (RE and FE, respectively) are two competing methods that address these problems. We conduct a simulation study to compare RE, FE, and within-between (WB) estimation across 16,200 scenarios.
Documentation and guidance for the Python Client for Google Maps Services.
JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner.
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation.
Documentation and guidelines for NumPy.
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Tutorial on Python multithreading and multiprocessing.
An open source machine learning framework that accelerates the path from research prototyping to production deployment.
Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.
Python 101 starts off with the fundamentals of Python and then builds from there. The audience of this book is primarily people who have programmed in the past but want to learn Python. This book covers a fair amount of intermediate level material in addition to the beginner material.
This is the online home of Geocomputation with R, a book on geographic data analysis, visualization and modeling.
Collection of common R ggplot2 data visualizations.
Guidelines on parallel computing in R.
This introduction to the plm package is a slightly modified version of Croissant and Millo (2008), published in the Journal of Statistical Software.
This is the book site for “R Packages”.
Collection of R cheatsheets for popular packages.
The materials presented here teach spatial data analysis and modeling with R. R is a widely used programming language and software environment for data science. R also provides unparalleled opportunities for analyzing and modeling spatial data.
Documentation and guidelines for scikit-learn.
The assertr package supplies a suite of functions designed to verify assumptions about data early in an analysis pipeline so that data errors are spotted early and can be addressed quickly.