Split-Apply-Combine

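A common analytical pattern is to:

  1. Split the data into groups;
  2. Apply a function to each group;
  3. Combine the results back into a single data structure.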

In general, avoid writing explicit loops for Split-Apply-Combine tasks. Consider these alternatives instead:

  1. Entry level: dplyr::group_by()
  2. General approach: nesting
  3. *apply functions and the plyr package (non-tidyverse solution)
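
As a minimal sketch of the first and third alternatives, the example below computes a per-group mean two ways. It uses the built-in mtcars data as a stand-in (an assumption for illustration, not the lesson's dataset), and the object names by_cyl and mean_mpg_base are hypothetical.

```r
library(dplyr)

# Entry level: group_by() splits, summarise() applies and combines
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Non-tidyverse equivalent: split() the vector by group,
# then sapply() applies mean() and combines into a named vector
mean_mpg_base <- sapply(split(mtcars$mpg, mtcars$cyl), mean)

print(by_cyl)
print(mean_mpg_base)
```

Both versions return one value per group without an explicit loop; the dplyr version keeps the result as a data frame, which is usually easier to carry into later steps.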

Lesson

Exercise

  1. Fit linear regression models of the daily bike counts on precipitation and minimum and maximum temperature, first for all bridges together and then for each bridge separately using the split-apply-combine pattern;
  2. Extract the results from the models fitted in the step above:
    1. Compare the R-squared values of the bridge-specific models. Which bridge's bike traffic is most strongly correlated with precipitation and minimum and maximum temperature?
    2. Which model has the largest precipitation coefficient? The largest temperature coefficient?
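
The nesting approach to this kind of exercise can be sketched as follows. This is not the solution for the bike count data: it uses the built-in mtcars data as a stand-in, with cyl playing the role of the bridge, mpg the response, and wt and hp the predictors. All of these substitutions, and the names models, fit, r_squared and wt_coef, are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

models <- mtcars %>%
  # Split: one row per group, with that group's rows in a list-column
  nest(data = -cyl) %>%
  mutate(
    # Apply: fit one linear model per group
    fit = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
    # Combine: pull one number per model back into ordinary columns
    r_squared = map_dbl(fit, ~ summary(.x)$r.squared),
    wt_coef = map_dbl(fit, ~ coef(.x)[["wt"]])
  )

# Rank the group-specific models by R-squared
models %>%
  select(cyl, r_squared, wt_coef) %>%
  arrange(desc(r_squared)) %>%
  print()
```

Keeping the fitted models in a list-column means further quantities (other coefficients, residuals) can be extracted later with additional map calls, without refitting.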

Sample code

Resources:

  1. purrr package
  2. purrr tutorial
  3. Software Carpentry lesson on Split-Apply-Combine