Working with time-series data can be a challenge for new and experienced R users alike. When working with raw data, you will often have to format the date, time, and timezone yourself. R does not automatically recognize date-time strings, and there are many formats for representing date-times (e.g. yyyy-mm-dd, mm-dd-yy, mm/dd/yyyy hh:mm:ss).
lubridate is a handy package that is installed as part of the tidyverse installation. In tidyverse versions before 2.0.0, however, it does not load automatically when you call library(tidyverse), so you have to load it explicitly with library(lubridate) when you need it.
Download the sample data set from here (right-click & Save link as…), and move it to the data folder of your working directory.
Load the libraries and the sample dataset.
library(lubridate)
library(dplyr)
leesferry <- read.csv("data/leesferry.csv", stringsAsFactors = FALSE)
Check the structure of your data frame and you'll notice that the date.time column is a character string. You'll also notice that the timezone is MST.
str(leesferry)
## 'data.frame': 43391 obs. of 7 variables:
## $ date.time : chr "1926-01-11 00:00:00" "1926-02-25 00:00:00" "1926-03-26 00:00:00" "1926-05-24 00:00:00" ...
## $ timezone : chr "MST" "MST" "MST" "MST" ...
## $ site.name : chr "Lees" "Lees" "Lees" "Lees" ...
## $ parameter : chr "p71851" "p71851" "p71851" "p71851" ...
## $ description: chr "Nitrate, water, filtered, milligrams per liter as nitrate" "Nitrate, water, filtered, milligrams per liter as nitrate" "Nitrate, water, filtered, milligrams per liter as nitrate" "Nitrate, water, filtered, milligrams per liter as nitrate" ...
## $ measurement: chr "15" "8.4" "0.8" "0" ...
## $ units : chr "mg/l asNO3" "mg/l asNO3" "mg/l asNO3" "mg/l asNO3" ...
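If you can't download the file, a small stand-in built from the first four rows printed by str() above is enough to follow along. It keeps only the columns this walkthrough touches, and the object name leesferry_sample is hypothetical, not part of the original tutorial.

```r
# Hypothetical stand-in: the first four rows shown in the str() output above,
# keeping only the columns used in this walkthrough
leesferry_sample <- data.frame(
  date.time   = c("1926-01-11 00:00:00", "1926-02-25 00:00:00",
                  "1926-03-26 00:00:00", "1926-05-24 00:00:00"),
  timezone    = "MST",          # recycled to all four rows
  parameter   = "p71851",
  measurement = c("15", "8.4", "0.8", "0"),
  stringsAsFactors = FALSE
)

# As in the full file, date.time is still a plain character string
str(leesferry_sample)
```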
When assigning a timezone to a date-time column, use its Olson name. OlsonNames() lists all of them, and there are over 600. You can also look at a list here.
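Rather than scrolling through the full list, you can search it. The sketch below uses only base R; the exact count and the names returned depend on the timezone database installed on your system, and the search pattern is just an illustration for Mountain-time candidates.

```r
# OlsonNames() is base R; its contents come from the system's tz database
tz_names <- OlsonNames()
length(tz_names)

# Narrow the list down to a few US Mountain-time candidates
grep("Denver|Phoenix|US/Mountain|MST", tz_names, value = TRUE)
```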
To convert the date.time column to a date-time object with the appropriate timezone, use lubridate's ymd_hms() function:
leesferry$date.time <- ymd_hms(leesferry$date.time, tz = "US/Mountain")
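After the conversion, the column should be POSIXct and carry the timezone you assigned. One caveat is worth checking against your own data: "US/Mountain" is a legacy alias of "America/Denver", which observes daylight saving time, while a fixed MST label suggests UTC-7 year-round. Lees Ferry is in Arizona, which does not observe daylight saving time, so "America/Phoenix" may be the closer match; confirm how your data source defines MST before choosing. The sketch compares the two zones on an arbitrary summer timestamp (not taken from the data set).

```r
library(lubridate)

# The same summer clock time read in two Olson timezones.
# "America/Denver" is the canonical name behind the legacy alias "US/Mountain".
denver  <- ymd_hms("2020-07-01 12:00:00", tz = "America/Denver")   # observes DST
phoenix <- ymd_hms("2020-07-01 12:00:00", tz = "America/Phoenix")  # fixed UTC-7

# ymd_hms() returns POSIXct values that carry the assigned timezone
class(denver)          # "POSIXct" "POSIXt"
tz(phoenix)            # "America/Phoenix"

# In summer the abbreviations differ
format(denver, "%Z")   # "MDT"
format(phoenix, "%Z")  # "MST"

# Noon in Phoenix happens one hour after noon in Denver on this date
difftime(phoenix, denver, units = "hours")
```

Applied to the tutorial's own column, tz(leesferry$date.time) should return whichever timezone you passed to ymd_hms().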
To extract the year, month, or day of the week from a date-time:
# Extract the year into a new column
leesferry$year <- year(leesferry$date.time)
# Extract the month and an abbreviated day-of-week label into new columns
leesferry_dow <- leesferry %>%
mutate(month = month(date.time),
dow = wday(date.time, label = TRUE, abbr = TRUE))
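To sanity-check the extraction functions, the sketch below runs the same steps on two dates copied from the str() output. It uses "America/Denver", the canonical name behind "US/Mountain", and the object name demo is hypothetical. The expected result is year 1926, months 1 and 2, and weekday numbers 2 and 5 (Monday and Thursday, counting Sunday as day 1). The text labels in dow depend on your system locale; in an English locale they read Mon and Thu.

```r
library(lubridate)
library(dplyr)

# Two dates from the str() output, parsed and split as in the tutorial
demo <- data.frame(date.time = c("1926-01-11 00:00:00", "1926-02-25 00:00:00"),
                   stringsAsFactors = FALSE) %>%
  mutate(date.time = ymd_hms(date.time, tz = "America/Denver"),
         year      = year(date.time),
         month     = month(date.time),
         # week_start = 7 counts Sunday as day 1 (lubridate's usual default)
         dow_num   = wday(date.time, week_start = 7),
         dow       = wday(date.time, label = TRUE, abbr = TRUE))

demo
```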
lubridate has many other functions that make working with time-series data easier. Check out the vignette for these additional functions.
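As a small illustration of the kind of helpers the vignette covers, here are three lubridate functions that come up often with time series: floor_date() for binning timestamps, and with_tz() versus force_tz() for changing timezones. The selection is illustrative, not the vignette's own list, and the dates are the four shown in the str() output.

```r
library(lubridate)

x <- ymd_hms(c("1926-01-11 00:00:00", "1926-02-25 00:00:00",
               "1926-03-26 00:00:00", "1926-05-24 00:00:00"),
             tz = "America/Denver")

# Round each timestamp down to the first day of its month,
# handy for grouping observations into monthly bins
floor_date(x, unit = "month")

# Same instant shown in another timezone: the clock time changes (07:00 UTC)
with_tz(x[1], tzone = "UTC")

# Same clock time reinterpreted in another timezone: the instant changes (00:00 UTC)
force_tz(x[1], tzone = "UTC")
```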