Introduction

About Me

  • 2005-2009 Economist, East Asia (University of Tuebingen)
  • 2010 OECD Statistics Directorate, Trade and Business Statistics (SQL)
  • 2011-2015 OECD Directorate for Science, Technology and Innovation (SAS, R)
  • 2016 FAO Statistics Directorate, Methodological Innovations
  • Website: rdata.work
  • GitHub: r4io
  • Email: r4io@rdata.work

Short Course Information

Instructor

  • Bo Werth

Time & Location

  • Time: 9:30 - 17:00
  • Location: OECD IT Training MB MZ289

Website

What are we doing?

  • R Programming literacy
  • Data visualization

Requirements

  • The training accounts can access the OECD R server
  • The hands-on scripts are worked through by executing single lines or selected regions of code

Short Course Description & Objectives

This course provides an intensive, hands-on introduction to the R programming language. It equips students with the fundamental programming skills required to start their journey to becoming a modern-day data analyst.


Objectives

Upon successfully completing this course, students will:

  • Be up and running with R
  • Understand the different types of data R can work with
  • Understand the different structures in which R holds data
  • Be able to import data into R
  • Perform basic data wrangling activities with R
  • Compute basic descriptive statistics with R
  • Visualize their data with base R and ggplot graphics

Short Course Schedule & Material

R Programming I: Overview

  • Getting started with R
  • Importing data into R
  • Understanding data structures
  • Understanding data types
  • Shaping and transforming your data

Tomorrow

  • Base R graphics
  • ggplot graphics library

All required classroom material can be downloaded from the course website:

http://boot.rdata.work/r_bootcamp

Analytics & Programming

Why Program?

Flexibility

  • Frees us from point-n-click analysis software
  • Allows us to customize our analyses
  • Allows us to build analytic applications

Slows us down

  • Forces us to think about our analytic processes

Speeds the analysis up

  • Many statistical programming languages now leverage C++ and Java to reduce computation time

Reproducibility

  • Provides reproducibility that spreadsheet analysis cannot
  • Literate statistical programming is on the rise

Why R?

Built for Analytics!



  • .csv, .txt, .xls, etc. files
  • web scraping: XML text nodes, HTML tables (rvest)
  • databases: Microsoft SQL Server, MySQL, Oracle, PostgreSQL, MongoDB, etc.
  • SPSS, Stata, SAS

Why R?

Built for Analytics!



  • easy to create "tidy" data
  • works well with numerics, characters, dates, missing values
  • robust regex capabilities

Why R?

Built for Analytics!



  • joining disparate data sets
  • selecting, filtering, summarizing
  • great "pipeline" process with the %>% operator

Why R?

Built for Analytics!



  • R is known for its visualization capabilities
  • ggplot2 implements the grammar of graphics
  • interactive plotting: easily leverage D3.js libraries using htmlwidgets

Why R?

Built for Analytics!



  • built for statistical analyses
  • thousands of libraries provide many statistical capabilities
  • easy to build your own algorithms

Why R?

Built for Analytics!



  • R Markdown (produce slides, HTML web pages, PDF, Word documents)
  • Shiny allows rapid prototyping of web applications (HTML / CSS / JS)
  • Reproducibility (communicate to your future self!)

Why R?

Great Community!

Why R?

Create Cool Stuff!

Getting Started


“Programming is like kicking yourself in the face, sooner or later your nose will bleed.”

  • Kyle Woodbury

Installation

Install R:

  1. Go to https://cran.r-project.org/
  2. Click "Download R for Mac/Windows"
  3. Download the appropriate file:
    • Windows users click base and download the installer for the latest R version
    • Mac users select the file R-3.X.X.pkg that aligns with your OS version
  4. Follow the installer's instructions.

Install RStudio:

  1. Go to the RStudio Desktop download page: https://www.rstudio.com/products/rstudio/download/
  2. Select the install file for your OS
  3. Follow the installer's instructions.

Note: There are other editors and IDEs for R (e.g., Emacs, Notepad++), as well as alternative R distributions such as Microsoft R Open.

Understanding the Console

Getting Help

Numerous help options are available both within and outside of R. Within R, you can get help by:

# opens the documentation page for a specific function
help(functionname)

# shortcut for help(functionname)
?functionname

# runs the examples from a function's help page
example(functionname)
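When you don't know the exact function name, base R also ships search helpers. A minimal sketch using `apropos()`, `find()` and `help.search()` from the utils package; the search terms "mean" and "linear model" are only illustrations.

```r
# list objects on the search path whose names contain "mean"
apropos("mean")

# which attached package provides a given function?
find("mean")

# full-text search of the installed documentation (?? is a shortcut for this)
help.search("linear model")
```
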

External to R:

  • Google: just add "with R" at the end of any search.
  • Stack Overflow: a searchable Q&A site oriented toward programming issues. 75% of my answers typically come from Stack Overflow.
  • Cross Validated: a searchable Q&A site oriented toward statistical analysis.
  • R-bloggers: a central hub of content from over 500 bloggers who provide news and tutorials about R.

Set Your Directory

  • Keeping your files organized is critical
  • Get and set your working directory with the following:
# get your current working directory
getwd()
## [1] "/home/xps13/Dropbox/Programming/R/Intro-to-R-Bootcamp"

# set your working directory
setwd("/home/xps13/Dropbox/Programming/R")

getwd()
## [1] "/home/xps13/Dropbox/Programming/R"
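Because setwd() invisibly returns the directory you were in before, you can change directories temporarily and then undo the change. A small sketch using the session's temporary directory; the file name "sales.csv" is hypothetical and only shows how file.path() builds paths without hard-coding separators.

```r
# remember where we started
start_dir <- getwd()

# setwd() invisibly returns the previous working directory
old_dir <- setwd(tempdir())
getwd()

# build a path relative to the working directory (hypothetical file name)
file.path("data", "sales.csv")

# return to the original working directory
setwd(old_dir)
```
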

Your Turn


Set your working directory to the "R-Bootcamp" folder you downloaded for this course.

R as a Calculator

R can be used as a simple calculator

# Uses PEMDAS convention for order of operations
4 + 3 / 10 ^ 2
## [1] 4.03
4 + (3 / 10 ^ 2)
## [1] 4.03
(4 + 3) / 10 ^ 2
## [1] 0.07

# large/small numbers will be displayed in scientific notation
1 / 17 ^ 7
## [1] 2.437011e-09

# Undefined calculations result in Inf or NaN
1 / 0
## [1] Inf
Inf - Inf
## [1] NaN
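Beyond the basic operators, a few more arithmetic operators and math functions come up constantly. A short sketch; all of these are base R.

```r
# integer division and remainder (modulo)
7 %/% 2
## [1] 3
7 %% 2
## [1] 1

# common math functions
sqrt(16)
## [1] 4
log10(1000)
## [1] 3
exp(1)
## [1] 2.718282

# 0 / 0 is undefined as well and returns NaN ("not a number")
0 / 0
## [1] NaN
```
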

Simple Objects

Assign values to objects (aka variables) with "<-"

x <- 3                  # assign 3 to x
x                       # evaluate x
## [1] 3

x <- x + 1              # we can increment (build onto) existing objects
x
## [1] 4

Note that there are multiple ways to assign variables, but most R style guides (e.g., the tidyverse style guide) recommend using "<-"

x = 3                   # BAD
x <- 3                  # GOOD
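For reference, these are the assignment forms you may encounter, plus one practical reason to prefer "<-": inside a function call, "=" matches an argument name instead of creating an object. A short sketch in base R.

```r
x <- 3              # preferred assignment operator
x = 3               # also assigns at the top level
3 -> x              # rightward assignment, rarely used
assign("x", 3)      # assignment by name (a character string)
x
## [1] 3

# inside a function call, "=" names an argument; it does not create an object
rm(x)
mean(x = c(1, 2, 3))
## [1] 2
exists("x", inherits = FALSE)
## [1] FALSE
```
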

Variable names are case sensitive:

x <- 3

X
Error: object 'X' not found

Your Turn

Economic Order Quantity Model: \[Q = \sqrt{\frac{2DK}{h}}\]


Calculate Q where:

  • D = 1000
  • K = 5
  • h = 0.25

hint: sqrt(x) \(= \sqrt x\)

Solution

D <- 1000
K <- 5
h <- .25

Q <- sqrt((2 * D * K) / h)
Q
## [1] 200

Workspace Environment

  • You probably have four objects in your global environment (Q, D, K, h)
  • The History tab (in RStudio) shows your recently executed code
  • You can list and remove objects from your global environment
# list all objects
ls()
## [1] "D" "h" "K" "Q"

# remove defined object from the environment
rm(D)

# removes everything in the working environment -- use with caution!
rm(list = ls())
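ls() and rm() also accept a name pattern and a vector of names, which is safer than wiping the whole environment. A minimal sketch that recreates the EOQ objects first; note that the stats package has a function called D(), so exists() is restricted to the current environment with inherits = FALSE.

```r
# recreate the EOQ objects
D <- 1000
K <- 5
h <- 0.25
Q <- 200

# list only objects whose names match a regular expression
ls(pattern = "^[DK]$")
## [1] "D" "K"

# remove several objects at once by name
rm(list = c("D", "K"))

# check the current environment only
exists("D", inherits = FALSE)
## [1] FALSE
exists("Q", inherits = FALSE)
## [1] TRUE
```
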

Vectors

The fundamental object in R

Vector: a sequence of data elements of the same basic type

# the ":" operator can be used to create sequential vectors
1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
-3:5
## [1] -3 -2 -1  0  1  2  3  4  5

# store a vector to variable x
x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

# the c() function combines individual elements into a vector
y <- c(2, 5, -1)
y
## [1]  2  5 -1
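Two other common vector builders are seq() and rep(). The last line illustrates the "same basic type" rule: mixing types in c() silently coerces every element to the most flexible type, here character. A short base R sketch.

```r
# seq() builds sequences with any step size
seq(from = 0, to = 1, by = 0.25)
## [1] 0.00 0.25 0.50 0.75 1.00

# rep() repeats values
rep(c(1, 2), times = 3)
## [1] 1 2 1 2 1 2

# length() counts the elements
length(1:10)
## [1] 10

# one type per vector: numbers and logicals are coerced to character here
c(1, "a", TRUE)
## [1] "1"    "a"    "TRUE"
```
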

Note: We'll discuss vectors more later, but for now you need to understand that…

Vectorization

A key difference between R and many other languages is the idea of vectorization.

In other languages you might have to run a loop to add two vectors together.

# two vectors to add
x <- c(1, 3, 4) 
y <- c(1, 2, 4)

# empty vector
z <- as.vector(NULL)

# `for` loop to add corresponding elements in each vector
for (i in seq_along(x)) {
        z[i] <- x[i] + y[i]
        print(z)
}
## [1] 2
## [1] 2 5
## [1] 2 5 8

Vectorization

In R, arithmetic operators such as +, -, and * are vectorized: they operate on entire vectors at once, with the element-by-element looping done in underlying C code.

This significantly reduces the need to write for loops.

x + y
## [1] 2 5 8
x * y
## [1]  1  6 16
x > y
## [1] FALSE  TRUE FALSE
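Vectorization is not limited to arithmetic operators: many math and helper functions also work element by element, while summary functions collapse a vector to one value. A short sketch reusing x and y from above; all functions are base R.

```r
x <- c(1, 3, 4)
y <- c(1, 2, 4)

# math functions return one result per element
sqrt(x * y)

# summary functions return a single value (here 1 + 6 + 16)
sum(x * y)
## [1] 23

# ifelse() applies a condition element by element
ifelse(x > y, "x larger", "not larger")
```
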

Beware of recycling: if two vectors differ in length, R repeats (recycles) the shorter one to match the longer one

long <- 1:10
short <- 1:5

long
##  [1]  1  2  3  4  5  6  7  8  9 10
short
## [1] 1 2 3 4 5

long + short
##  [1]  2  4  6  8 10  7  9 11 13 15
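The example above recycles silently because 10 is a multiple of 5. When it is not, R still recycles but also issues a warning. A small sketch showing how to check the lengths up front and how to capture the warning with base R's tryCatch().

```r
long <- 1:10
short <- 1:3

# 3 does not divide 10
length(long) %% length(short) == 0
## [1] FALSE

# capture the warning message instead of letting it print
msg <- tryCatch(long + short, warning = function(w) conditionMessage(w))
msg

# the recycled result: 1+1, 2+2, 3+3, 4+1, 5+2, ...
suppressWarnings(long + short)
##  [1]  2  4  6  5  7  9  8 10 12 11
```
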

Your Turn

Back to our EOQ Model: \(Q = \sqrt{\frac{2DK}{h}}\)


Calculate Q where:

  • D = 1000
  • K = 5
  • h = vector of values 0.25, 0.50, 0.75



hint: sqrt(x) \(= \sqrt x\)

hint 2: The c() function may be handy here

Solution

D <- 1000
K <- 5
h <- c(.25, .50, .75)

Q <- sqrt((2 * D * K) / h)
Q
## [1] 200.0000 141.4214 115.4701
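If both D and h vary, simply passing two vectors would pair them element by element (with recycling). To evaluate every combination instead, one option is base R's outer(). A sketch; the demand values 500 and 1000 are hypothetical.

```r
K <- 5
D <- c(500, 1000)           # hypothetical demand levels
h <- c(.25, .50, .75)

# evaluate the EOQ formula for every (D, h) pair: rows = D, columns = h
Q <- outer(D, h, function(D, h) sqrt((2 * D * K) / h))
dimnames(Q) <- list(D = D, h = h)
Q
```

Because the formula is already vectorized, outer() only has to supply the expanded grid of inputs.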

Working with Packages

The fundamental unit of shareable code is the package.


So how do we install these packages?

# install packages from CRAN
install.packages("packagename")

# install packages from Bioconductor
install.packages("BiocManager")                         # only required the first time
BiocManager::install("packagename")

# install packages from GitHub
install.packages("devtools")                            # only required the first time
devtools::install_github("username/packagename")

Your Turn

Install these packages from CRAN:

  • dplyr
  • tidyr
  • ggplot2
  • stringr
  • lubridate

Solution

install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("stringr")
install.packages("lubridate")

# alternative
install.packages(c("dplyr", "tidyr", "ggplot2", "stringr", "lubridate"))
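To avoid reinstalling packages you already have, you can first check which ones are missing. A minimal sketch; missing_packages() is a hypothetical helper built on base R's installed.packages() and setdiff().

```r
# hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

course_pkgs <- c("dplyr", "tidyr", "ggplot2", "stringr", "lubridate")
missing_packages(course_pkgs)
```

Usage: to_install <- missing_packages(course_pkgs); if (length(to_install) > 0) install.packages(to_install)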

For a full list of useful packages see this guide

Using Packages

Loading packages

Once a package is installed on your computer, you can access the functions and resources it provides in two ways:

# load the package to use in the current R session
library(packagename)

# use a particular function within a package without loading the package
packagename::functionname
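A concrete example of both approaches using the tools package, which ships with R but is not attached by default; the file name is only an illustration.

```r
# call a single function without attaching the package
tools::file_ext("sales_2016.csv")
## [1] "csv"

# attach the package; its exported functions can now be called directly
library(tools)
file_ext("sales_2016.csv")
## [1] "csv"
```
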

Getting help on packages

# provides details regarding contents of a package
help(package = "packagename")

# list vignettes available for a specific package
vignette(package = "packagename") 

# view specific vignette
vignette("vignettename")
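A few more base R calls help you inspect an installed package. A short sketch using the stats package, which is installed and attached by default.

```r
# version of an installed package
packageVersion("stats")

# number of objects exported by the attached stats package
length(ls("package:stats"))

# check whether a package can be loaded without attaching it
requireNamespace("stats", quietly = TRUE)
```
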

Key Takeaways

Operator/Function   Description             Operator/Function    Description
help()              get help                ls()                 list objects in working session
?                   get help                rm()                 remove objects in current session
getwd()             get working directory   :, c()               create vector
setwd()             set working directory   install.packages()   install package from CRAN
+, -, *, /, ^       arithmetic              library()            load package
<-                  assignment              vignette()           view/list package vignettes

Break

5 minutes!