August 6, 2016

because…


"Data sets contain more information than they display."


- Garrett Grolemund                     

Key Things to Remember

What to Remember from this Section

dplyr is an R package for transforming and manipulating data frames


select(): keep the variables (columns) of interest


filter(): keep the rows (observations) that meet logical conditions


group_by(): group data by categorical levels


summarise(): collapse rows into summary values, changing the unit of analysis


arrange(): reorder rows by the values of one or more variables
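As a rough sketch of how these five verbs fit together, the pipeline below runs them in sequence on the iris data set that ships with base R. The column choices and the 1.5 cutoff are arbitrary, picked only for illustration.

```r
library(dplyr)

# Mean petal length per species, computed only over flowers with petals longer than 1.5
iris_summary <- iris %>%
  select(Species, Petal.Length) %>%          # select(): keep only the columns we need
  filter(Petal.Length > 1.5) %>%             # filter(): keep rows meeting a condition
  group_by(Species) %>%                      # group_by(): one group per species
  summarise(mean_petal = mean(Petal.Length), # summarise(): one row per group
            n = n()) %>%                     # n(): number of rows in each group
  arrange(desc(mean_petal))                  # arrange(): largest mean first

print(iris_summary)
```

The unit of analysis changes at summarise(): the input has one row per flower, the output one row per species.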



mutate(): create new variables


left_join(), inner_join(), and the other *_join() functions: combine separate data sets by matching key variables
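A small illustration of mutate() and a join. dplyr has no function literally named join(); the joins are a family (left_join(), inner_join(), full_join(), and others). The two tables below, `people` and `scores`, are made-up data for this example only.

```r
library(dplyr)

# Hypothetical tables that share a key column `id`
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Ben", "Cal"))
scores <- data.frame(id = c(1, 2, 4), score = c(90, 75, 60))

# mutate(): add a new column computed from an existing one
scores <- scores %>% mutate(passed = score >= 70)

# left_join(): keep every row of `people` and attach matching scores by `id`;
# id 3 has no match in `scores`, so its new columns are NA,
# and id 4 (only in `scores`) is dropped
combined <- left_join(people, scores, by = "id")
print(combined)
```

Swapping left_join() for inner_join() would keep only ids 1 and 2, the rows present in both tables.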

GR&A

GR&A stands for ground rules and assumptions.

R package used…




install.packages("dplyr")


library(dplyr)

Data used…




# EDAWR is not on CRAN; install it from GitHub (requires the devtools package)
devtools::install_github("rstudio/EDAWR")


library(EDAWR)


Data sets: storms, tb, pollution, iris, a, b

%>% operator…

learn it, love it, leverage it



filter(data, variable == numeric_value)

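or, equivalently, with the pipe, which feeds the left-hand side into the function as its first argument: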

data %>% filter(variable == numeric_value)
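A quick check on the built-in iris data that the two forms return the same thing: `x %>% f(y)` is evaluated as `f(x, y)`.

```r
library(dplyr)

# Without the pipe: the data frame is passed as the first argument
setosa_1 <- filter(iris, Species == "setosa")

# With the pipe: iris on the left becomes filter()'s first argument
setosa_2 <- iris %>% filter(Species == "setosa")

identical(setosa_1, setosa_2)  # TRUE
```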


Nested calls:

arrange(
        summarise(
                filter(data, variable == value),
                Total = sum(variable)
        ),
        desc(Total)
)

Intermediate objects:

step1 <- filter(data, variable == value)
step2 <- summarise(step1, Total = sum(variable))
step3 <- arrange(step2, desc(Total))

Pipe:

data %>%
        filter(variable == value) %>%
        summarise(Total = sum(variable)) %>%
        arrange(desc(Total))

All three give the same result, but the %>% version is the easiest to read: the steps appear top to bottom in the order they run, with no nesting and no throwaway intermediate objects.
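The three styles can be compared on real data. This sketch uses iris with an arbitrary `Sepal.Width > 3` filter, and adds a group_by() step so that arrange() has more than one row to sort; those choices are illustrative and not part of the original example.

```r
library(dplyr)

# Nested calls: read from the inside out
nested <- arrange(
  summarise(
    group_by(filter(iris, Sepal.Width > 3), Species),
    Total = sum(Petal.Length)
  ),
  desc(Total)
)

# Intermediate objects: one named step per line
step1 <- filter(iris, Sepal.Width > 3)
step2 <- group_by(step1, Species)
step3 <- summarise(step2, Total = sum(Petal.Length))
stepwise <- arrange(step3, desc(Total))

# Pipe: read top to bottom, in the order the steps run
piped <- iris %>%
  filter(Sepal.Width > 3) %>%
  group_by(Species) %>%
  summarise(Total = sum(Petal.Length)) %>%
  arrange(desc(Total))

# All three objects hold the same summary table
identical(nested, piped) && identical(stepwise, piped)
```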

select( )

Select the variables (columns) of interest
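A few common ways to pick columns with select(), shown on iris. The sketch assumes only base R data and dplyr, which re-exports the tidyselect helper starts_with().

```r
library(dplyr)

# By name: keep only these columns, in this order
sel_names <- select(iris, Species, Sepal.Length)

# By range: every column from Sepal.Length through Petal.Length
sel_range <- select(iris, Sepal.Length:Petal.Length)

# By exclusion: everything except Species
sel_drop <- select(iris, -Species)

# With a helper: columns whose names start with "Petal"
sel_helper <- select(iris, starts_with("Petal"))

names(sel_helper)  # "Petal.Length" "Petal.Width"
```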
