Tidy Data
There are three interrelated rules which make a dataset tidy:
- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
That interrelationship leads to an even simpler set of practical instructions:
- Put each dataset in a tibble.
- Put each variable in a column.
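As a minimal sketch of these two instructions, the example below starts from a small untidy table and reshapes it with `pivot_longer()` from tidyr. The table, its column names (`site`, `count`, `year`) and its values are invented for illustration and are not part of the lesson data.

```r
library(tibble)
library(tidyr)

# Hypothetical untidy table: the variable "year" is hidden in the column names,
# so each row holds two observations instead of one.
untidy <- tribble(
  ~site, ~`1999`, ~`2000`,
  "A",   10,      12,
  "B",   20,      25
)

# Tidy version: one column per variable (site, year, count) and
# one row per observation (one site in one year).
tidy <- pivot_longer(
  untidy,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "count"
)

print(tidy)
```

The dataset lives in a tibble and every variable has its own column, which is enough for the three rules above to hold for this toy table.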
Lesson
Download the script that generates the tables for the lesson here
- Tidy Data
Exercise
- Are the bike counts data tidy?
- If not, why not? And how can we tidy them?
- Convert the total bike counts data to a wide format, with each row representing a day and one column per bridge holding the total bike counts for each of the three bridges.
- Convert the wide data frame from the previous step back to long format.
- [Challenge] After tidying the bike counts, using functions in the tidyr package, create tables summarizing the average bike counts by bridge and day of week in two different formats:
Bike Counts by Day of Week and Bridge (1st Format)
Hawthorne | | | | | | | |
Tilikum | | | | | | | |
Bike Counts by Day of Week and Bridge (2nd Format)
Fri | | |
Mon | | |
Sat | | |
Sun | | |
Thur | | |
Tue | | |
Wed | | |
Sample code: tidy_counts.R
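
The reshaping steps in the exercise can be sketched on toy data as follows. This is not the content of tidy_counts.R: the data frame `long`, its column names (`date`, `bridge`, `total`) and all values are assumptions made up for illustration, and only two bridges are included to keep it short. It uses `pivot_wider()` and `pivot_longer()` from tidyr and `group_by()`/`summarise()` from dplyr.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical long-format counts (invented values, not the lesson data).
long <- tribble(
  ~date,        ~bridge,     ~total,
  "2024-01-01", "Hawthorne", 100,
  "2024-01-01", "Tilikum",   80,
  "2024-01-02", "Hawthorne", 120,
  "2024-01-02", "Tilikum",   90,
  "2024-01-08", "Hawthorne", 140,
  "2024-01-08", "Tilikum",   100
) %>%
  mutate(date = as.Date(date))

# Long to wide: one row per day, one column per bridge.
wide <- pivot_wider(long, names_from = bridge, values_from = total)

# Wide back to long: every column except `date` becomes a (bridge, total) pair.
long_again <- pivot_longer(wide, cols = -date,
                           names_to = "bridge", values_to = "total")

# Day-of-week labels built from the numeric weekday (0 = Sunday),
# which avoids depending on the system locale.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thur", "Fri", "Sat")

avg_by_day <- long %>%
  mutate(day = day_labels[as.POSIXlt(date)$wday + 1]) %>%
  group_by(bridge, day) %>%
  summarise(avg = mean(total), .groups = "drop")

# 1st format: one row per bridge, one column per day of week.
format1 <- pivot_wider(avg_by_day, names_from = day, values_from = avg)

# 2nd format: one row per day of week, one column per bridge.
format2 <- pivot_wider(avg_by_day, names_from = bridge, values_from = avg)

print(wide)
print(format1)
print(format2)
```

Both summary tables come from the same tidy `avg_by_day` table; only the column passed to `names_from` changes, which is one reason to tidy the data before summarizing.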