Getting the data

OECD.Stat

Import Data

Interactive

  • click "Import Dataset" button in RStudio
  • make sure header rows are identified properly
  • import the data


readr package

  • use read_csv function
read_csv("data/TEC7_REV4_26092017092200473.csv")
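A minimal sketch of the same call on a small temporary file, so it runs without the OECD download. The file contents and the object names `tmp` and `toy` are made up for illustration; only `read_csv()` comes from the slide.

```r
library(readr)

# Write a tiny CSV to a temporary file (toy data, not the OECD extract)
tmp <- tempfile(fileext = ".csv")
writeLines(c("REPORTER,FLOW,Value",
             "FRA,1,100",
             "DEU,2,250"), tmp)

# read_csv() treats the first row as the header and guesses column types
toy <- read_csv(tmp)
toy
```

Assigning the result to a name, as done here, keeps the imported data available for the later steps.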

Selecting and Filtering

dplyr data preparation

  • dataset has over 200k observations and 17 columns
  • use select function from dplyr to keep only the variables needed for the plot
  • use filter function to reduce the dataset to the rows matching:


  • REPORTER: FRA, ITA, DEU
  • TOWNERSHIP: F
  • SECTOR: TOTAL
  • PARTNER: TOTAL
  • Year: 2011
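The filter conditions above can be sketched as follows. The data frame `tec` is a toy stand-in with invented values, and the set of columns kept by `select()` is an assumption based on what the plot uses; the real import has 17 columns.

```r
library(dplyr)

# Toy stand-in for the imported data (values are invented)
tec <- data.frame(
  REPORTER   = c("FRA", "ITA", "DEU", "USA", "FRA"),
  TOWNERSHIP = c("F", "F", "F", "F", "D"),
  SECTOR     = "TOTAL",
  PARTNER    = "TOTAL",
  Year       = 2011,
  FLOW       = c(1, 2, 1, 1, 2),
  Indicator  = "Trade value",
  Value      = c(10, 20, 30, 40, 50)
)

plot_data <- tec %>%
  # keep only the rows matching every condition
  filter(REPORTER %in% c("FRA", "ITA", "DEU"),
         TOWNERSHIP == "F",
         SECTOR == "TOTAL",
         PARTNER == "TOTAL",
         Year == 2011) %>%
  # then keep only the columns the plot needs (assumed set)
  select(REPORTER, FLOW, Indicator, Value)

plot_data
```

In this sketch `filter()` runs before `select()` because the filter refers to columns that `select()` then drops.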

Plotting

ggplot2

  • show REPORTER on x-axis, Value on y-axis
  • map FLOW to fill
  • create (dodged) column chart using geom_bar with stat="identity" and position="dodge"
  • facet into rows by Indicator with free y-scales, using facet_grid with scales="free_y"
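One way the plotting steps above might look, using invented numbers. The indicator names and values are placeholders, and wrapping FLOW in `factor()` assumes it is stored as a number in the data.

```r
library(ggplot2)

# Toy data: 3 reporters x 2 flows x 2 indicators (values are invented)
plot_data <- data.frame(
  REPORTER  = rep(c("FRA", "ITA", "DEU"), times = 4),
  FLOW      = rep(rep(c(1, 2), each = 3), times = 2),
  Indicator = rep(c("Number of enterprises", "Trade value"), each = 6),
  Value     = c(5, 4, 8, 6, 5, 9, 300, 250, 600, 320, 270, 700)
)

p <- ggplot(plot_data, aes(x = REPORTER, y = Value, fill = factor(FLOW))) +
  # stat = "identity" plots Value as given instead of counting rows
  geom_bar(stat = "identity", position = "dodge") +
  # one row of panels per Indicator, each with its own y-axis range
  facet_grid(Indicator ~ ., scales = "free_y")

p
```

Free y-scales matter here because the two indicators differ by orders of magnitude; on a shared axis the smaller one would be barely visible.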

Modify factor labels

join

  • the legend for fill shows the two values of FLOW
  • 1 corresponds to imports, 2 to exports
  • create a data frame with the new labels using data.frame
  • join the labels onto the data using left_join
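A small sketch of the label join, on invented data. The column name `FLOW_LABEL` and the object names are hypothetical; the mapping of 1 to imports and 2 to exports is the one given above.

```r
library(dplyr)

# Toy data with the numeric FLOW codes (values are invented)
plot_data <- data.frame(
  REPORTER = c("FRA", "FRA", "DEU"),
  FLOW     = c(1, 2, 1),
  Value    = c(10, 12, 30)
)

# Lookup table: one row per FLOW code with a readable label
flow_labels <- data.frame(
  FLOW       = c(1, 2),
  FLOW_LABEL = c("Imports", "Exports")
)

# left_join keeps every row of plot_data and adds the matching label
labelled <- left_join(plot_data, flow_labels, by = "FLOW")
labelled
```

Mapping `FLOW_LABEL` to fill instead of FLOW would then put the readable labels in the legend.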

Key Things to Remember

Remember These Functions!

Operator/Function   Description
%>%                 pipe operator: passes the result on its left to the function on its right (provided by the magrittr package)
+                   adds layers and other components to a ggplot2 plot
read_csv()          reads .csv files (provided by the readr package)
filter()            reduces the number of observations (rows)
select()            reduces the number of columns
ggplot()            initiates a plot
geom_bar()          adds a bar/column chart
facet_grid()        adds faceting in ggplot2

Solutions in script 20-case-study-1.R