Split-Apply-Combine

A common analytical pattern is to:

  • split data into pieces,
  • apply some function to each piece,
  • combine the results back together again.
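The three steps above can be sketched in base R. This is a minimal illustration only: it uses R's built-in mtcars data, and grouping by cyl and averaging mpg are assumptions chosen for the example, not part of the exercise data.

```r
# Split: one data frame per distinct value of cyl (illustrative grouping variable)
pieces <- split(mtcars, mtcars$cyl)

# Apply: compute the mean mpg within each piece
means <- lapply(pieces, function(d) mean(d$mpg))

# Combine: collect the per-piece results into a single data frame
result <- data.frame(
  cyl      = names(means),
  mean_mpg = unlist(means),
  row.names = NULL
)

print(result)
```

Each step maps to one function call here: split() for the split, lapply() for the apply, and data.frame() for the combine.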

Generally avoid loops when you need to do Split-Apply-Combine; consider these alternatives instead:

  1. Entry level: dplyr::group_by()
  2. General approach: nesting
  3. *apply functions and the plyr package (non-tidyverse solutions)
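As a rough sketch of how these alternatives compare, the code below applies each one to R's built-in mtcars data. The dataset, the grouping variable cyl, and the model mpg ~ wt are assumptions made for illustration. The third option is shown with base R's sapply() only; plyr is left out to avoid name clashes with dplyr when both are loaded.

```r
library(dplyr)
library(tidyr)
library(purrr)

# 1. Entry level: group_by() followed by summarise()
by_group <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# 2. General approach: nest the data per group, fit a model to each piece,
#    then extract a summary statistic from each model
by_nest <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    model     = map(data, ~ lm(mpg ~ wt, data = .x)),
    r_squared = map_dbl(model, ~ summary(.x)$r.squared)
  ) %>%
  ungroup()

# 3. Non-tidyverse: split() plus sapply() from the *apply family
by_apply <- sapply(
  split(mtcars, mtcars$cyl),
  function(d) summary(lm(mpg ~ wt, data = d))$r.squared
)

print(by_group)
print(select(by_nest, cyl, r_squared))
print(by_apply)
```

group_by() with summarise() works well when each group reduces to a few numbers. Nesting keeps whole objects such as fitted models in a list column, which is why it suits per-group model fitting.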

Exercise

  1. Fit linear regression models of the daily bike counts on precipitation, min and max temperature, first for all bridges together and then for each bridge separately using the split-apply-combine pattern.
  2. Extract the results from the models in the step above:
    1. Compare the R-squared values of the bridge-specific models. For which bridge is bike traffic best explained by precipitation, min and max temperature?
    2. Which model has the largest precipitation coefficient? The largest temperature coefficient?

Sample code