More Datetime Howto

Jeff Newmiller

September 9, 2018

Desirable features for handling date/times

Cleaned Residential Data

Sample residential electric load data from London, England [1] (pre-trimmed to one house, with odd records removed)

dta <- read.csv( "../data/MAC000002clean.csv", as.is = TRUE )
str(dta)
## 'data.frame':    24140 obs. of  3 variables:
##  $ LCLid   : chr  "MAC000002" "MAC000002" "MAC000002" "MAC000002" ...
##  $ DateTime: chr  "2012-10-12 00:30:00" "2012-10-12 01:00:00" "2012-10-12 01:30:00" "2012-10-12 02:00:00" ...
##  $ KWH     : num  0 0 0 0 0 0 0 0 0 0 ...

Base R Timestamp

Tell R to assume the timezone is Greenwich Mean Time (or Coordinated Universal Time = UTC)

Sys.setenv( TZ = "GMT" ) # when you don't know how the data was coded, use GMT to get started

Make a Dtm column using base R. The “%Y-%m-%d %H:%M:%OS” format works by default, but many common formats need to be specified with a format argument (?strptime).

dta_b <- dta  # make a copy so we can compare approaches later
dta_b$Dtm <- as.POSIXct( dta_b$DateTime ) # assumes TZ is set
str( dta_b$Dtm ) # confirming new column type
##  POSIXct[1:24140], format: "2012-10-12 00:30:00" "2012-10-12 01:00:00" "2012-10-12 01:30:00" ...
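As a minimal sketch of the format argument: the timestamps below are hypothetical (day/month/year order, not taken from the London file) and need an explicit format string. Note that a format that does not match the text quietly yields NA rather than an error.

```r
# Hypothetical day/month/year timestamps; these are not from the London file
raw <- c( "12/10/2012 00:30", "12/10/2012 01:00" )
dtm_fmt <- as.POSIXct( raw, format = "%d/%m/%Y %H:%M", tz = "GMT" )
dtm_fmt

# A format that does not match the text gives NA instead of an error
bad <- as.POSIXct( "2012-10-12 00:30", format = "%d/%m/%Y %H:%M", tz = "GMT" )
is.na( bad )
```

Checking for NA values right after conversion is a cheap way to catch a wrong format string.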

POSIXct is a time representation that borrows heavily from the Portable Operating System Interface (POSIX) standard, which uses integer values to indicate time as the number of seconds since the start of January 1, 1970 in Greenwich Mean Time. R uses a double-precision floating point number instead of an integer, but is otherwise very similar to the original standard.
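The seconds-since-1970 idea can be checked directly. This is a small sketch using only base R; the variable names are chosen here for illustration.

```r
# The origin instant is stored as zero seconds, in a double
epoch <- as.POSIXct( "1970-01-01 00:00:00", tz = "GMT" )
as.numeric( epoch )
typeof( epoch )

# Rebuild a timestamp from a raw count of seconds since the origin
secs <- 1457830800
from_secs <- as.POSIXct( secs, origin = "1970-01-01", tz = "GMT" )
from_secs   # 2016-03-13 01:00:00 GMT
```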

Aside: Timezones

Excel basically acts as if all time is in GMT all the time: if you compute '2012-03-26' - '2012-03-25' in Excel you get 1 day (24 hours), even though in London March 25, 2012 was the beginning of daylight saving time, so that day was only 23 hours long. For many uses this is fine, but R timestamps always keep timezones in mind. If you want to use simplified time in R the way you can in Excel, set the timezone to GMT before you do time calculations.
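The 23-hour day can be reproduced in base R. This sketch assumes the "Europe/London" zone is available in the local timezone database.

```r
# Midnight to midnight across the 2012 start of British Summer Time
lon1 <- as.POSIXct( "2012-03-25 00:00:00", tz = "Europe/London" )
lon2 <- as.POSIXct( "2012-03-26 00:00:00", tz = "Europe/London" )
difftime( lon2, lon1, units = "hours" )   # 23 hours of elapsed time

# The same clock readings interpreted as GMT behave like Excel
gmt1 <- as.POSIXct( "2012-03-25 00:00:00", tz = "GMT" )
gmt2 <- as.POSIXct( "2012-03-26 00:00:00", tz = "GMT" )
difftime( gmt2, gmt1, units = "hours" )   # 24 hours
```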

You can set a default timezone for time calculations in a particular R session with Sys.setenv( TZ = ... ) as shown earlier, or you can set a "tzone" attribute on each time variable. Timezones affect how character strings are converted to POSIXct and back to character. They also affect how POSIXct <-> POSIXlt conversions behave.

dta_b$DtmGMT <- as.POSIXct( dta_b$DateTime, tz = "GMT" )
attr( dta_b$Dtm, "tzone" )
## [1] ""
attr( dta_b$DtmGMT, "tzone" )
## [1] "GMT"

It is not possible to set a separate timezone on individual POSIXct elements within a vector.

Aside 2: POSIXlt (List or Long Time)

POSIXlt is the base R tool for manipulating the parts of a timestamp:

Sys.setenv( TZ = "UTC" )
# see ?as.POSIXlt
dtm2 <- as.POSIXlt( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm2
## [1] "2016-03-13 01:00:00 UTC" "2016-03-13 03:00:00 UTC"
dtm2[ 1 ] < dtm2[ 2 ]
## [1] TRUE
diff( dtm2 )
## Time difference of 2 hours

Aside 2 continued: POSIXlt Internals

See ?DateTimeClasses.

year is based from 1900, mon represents January as 0, wday starts at 0 for Sunday, yday starts at 0 for January 1.

str( unclass( dtm2 ) )
## List of 9
##  $ sec  : num [1:2] 0 0
##  $ min  : int [1:2] 0 0
##  $ hour : int [1:2] 1 3
##  $ mday : int [1:2] 13 13
##  $ mon  : int [1:2] 2 2
##  $ year : int [1:2] 116 116
##  $ wday : int [1:2] 0 0
##  $ yday : int [1:2] 72 72
##  $ isdst: int [1:2] 0 0
##  - attr(*, "tzone")= chr "UTC"
dtm2$year + 1900
## [1] 2016 2016
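The offsets described above are easy to trip over, so here is a short sketch of turning the raw POSIXlt fields into calendar values. The variable names are illustrative only.

```r
lt <- as.POSIXlt( "2016-03-13 01:00:00", tz = "UTC" )
lt$year + 1900   # calendar year: 2016
lt$mon + 1       # calendar month: 3
lt$yday + 1      # day of year: 73
lt$wday          # 0 means Sunday

# Fields can be edited; converting to POSIXct normalizes any overflow
shifted <- lt
shifted$mday <- shifted$mday + 30L   # "March 43rd"
as.POSIXct( shifted )                # 2016-04-12 01:00:00 UTC
```

Editing a field and converting back is the usual base R route for calendar-style arithmetic such as "same time, 30 days later".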

Date-Time Classes

These are introduced in the help page ?DateTimeClasses and were discussed in R News 4/1 (June 2004) [2].

Note that base R does not support working with time-of-day only, since the length of a day can be different in different timezones and/or on different calendar days.

Date (No Time)

Sys.setenv( TZ = "UTC" )
dt1a <- as.Date( "2013-03-13" ) # see ?as.Date
dt1b <- as.Date( "3/21/2013", format="%m/%d/%Y" ) # see ?strptime
dt1b
## [1] "2013-03-21"
as.numeric( dt1b )
## [1] 15785
dt1a < dt1b
## [1] TRUE
dt1b - dt1a
## Time difference of 8 days
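Because a Date is stored as a count of days since 1970-01-01, it converts cleanly from numbers and supports regular sequences. A brief sketch using base R only:

```r
# Rebuild a Date from its day count
from_days <- as.Date( 15785, origin = "1970-01-01" )
from_days   # 2013-03-21

# Three dates spaced one week apart
wk <- seq( as.Date( "2013-03-13" ), by = "week", length.out = 3 )
wk
as.numeric( diff( wk ), units = "days" )   # 7 7
```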

POSIXct (Continuous or Compact Time)

Most flexible for computing with instants of time. Can represent times to a precision finer than one second, but results may be unreliable due to floating point rounding.

Sys.setenv( TZ = "UTC" )
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm1
## [1] "2016-03-13 01:00:00 UTC" "2016-03-13 03:00:00 UTC"
as.numeric( dtm1 )
## [1] 1457830800 1457838000
dtm1[ 1 ] < dtm1[ 2 ]
## [1] TRUE
diff( dtm1 )
## Time difference of 2 hours

difftime for Durations

The amount of time between two points in time is treated differently from the points in time themselves. You cannot add two POSIXct values, but you can add as many difftime values to a POSIXct value as desired.

diftm1 <- as.difftime( 30, units="mins" ) # see ?as.difftime
dtm1[ 1 ] + diftm1 
## [1] "2016-03-13 01:30:00 UTC"
dtm1[ 1 ] + as.difftime( 2, units="weeks" )
## [1] "2016-03-27 01:00:00 UTC"

difftime Numeric Equivalent

If you need to know the value of a difftime you must remember to specify the units or you may get whatever “convenient” units R wants to use:

as.numeric( diftm1 ) # not recommended
## [1] 30
as.numeric( diftm1, units="mins" )
## [1] 30
as.numeric( diftm1, units="secs" )
## [1] 1800
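The risk of "convenient" units is clearest when a difftime comes from subtracting two timestamps, because R then picks the units based on the size of the gap. A small sketch:

```r
t0 <- as.POSIXct( "2016-03-13 01:00:00", tz = "UTC" )
short_gap <- ( t0 + 30 ) - t0     # 30 seconds apart
long_gap  <- ( t0 + 7200 ) - t0   # 2 hours apart

units( short_gap )   # "secs"
units( long_gap )    # "hours"

# Without units the two numbers are not comparable
as.numeric( short_gap )   # 30
as.numeric( long_gap )    # 2

# With units they are
as.numeric( long_gap, units = "secs" )   # 7200
```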

Timezones (1)

Time zones are identified using string labels that are technically OS-dependent, but on Windows, Mac, and Linux the Olson database is used, so these names are fairly widely applicable [5].

on <- OlsonNames()
tail( on ) # a few examples
## [1] "US/Pacific-New" "US/Samoa"       "UTC"            "W-SU"          
## [5] "WET"            "Zulu"
grep( "Los_Angeles", on, value=TRUE )
## [1] "America/Los_Angeles"

Note that even though sometimes R will use a 3-letter timezone abbreviation when displaying a datetime value, such shorthand is usually not acceptable for specifying the timezone.
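One defensive habit is to check a timezone name against OlsonNames() before using it. The helper below is a hypothetical convenience function written for this sketch, not part of base R.

```r
# Hypothetical helper: TRUE when the name is a known Olson zone
is_known_tz <- function( tz ) {
  tz %in% OlsonNames()
}

is_known_tz( "America/Los_Angeles" )   # a full zone name
is_known_tz( "PST" )                   # a display abbreviation, not a zone name
```

An unrecognized name passed as tz is generally not an error in base R, so an explicit check like this can save debugging time.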

Timezones (2)

No matter what timezone you use, the underlying numeric value of a POSIXct will be assumed to count from the origin instant in GMT.

If you don’t have any reason to be concerned with timezones in your data, you can make life “easy” for yourself by setting your working timezone to be “GMT” or “UTC”.

Converting Date to POSIXct always treats the date as beginning of the day in GMT, so if you use any other timezone for other values then you will want to “force” the timezone to be compatible with any other POSIXct values you may be working with.
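A sketch of the Date conversion issue, assuming the "America/Los_Angeles" zone is available. Going through the character form is one way (an approach chosen for this example) to get local midnight instead of GMT midnight.

```r
d <- as.Date( "2016-03-13" )

# Direct conversion lands on midnight GMT
utc_midnight <- as.POSIXct( d )
as.numeric( utc_midnight )   # 1457827200

# Viewed in Pacific time, that instant is the previous afternoon
format( utc_midnight, tz = "America/Los_Angeles", usetz = TRUE )

# Local midnight, built from the date's character form
local_midnight <- as.POSIXct( format( d ), tz = "America/Los_Angeles" )
difftime( local_midnight, utc_midnight, units = "hours" )   # 8 hours
```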

Note that each POSIXct vector can have its own timezone, but some functions drop that timezone or create time values internally using the default (TZ) timezone. It is simplest to change TZ as needed while reading input, then settle on a single timezone of your choosing for calculations and output.

lubridate package (1)

The lubridate [6] package provides many “helper” functions for working with POSIXct and Date values.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
mdy( "3/14/2013" ) == as.Date( "3/14/2013", format="%m/%d/%Y" )
## [1] TRUE
dmy_hms( "14/3/13 1:15:45" ) == as.POSIXct( "14/3/13 1:15:45", format = "%d/%m/%y %H:%M:%S")
## [1] TRUE

lubridate package (2)

You can repair a time value that was converted to POSIXct with the wrong timezone:

dtm1[ 1 ]
## [1] "2016-03-13 01:00:00 UTC"
force_tz( dtm1, "US/Pacific" ) # this is a different point in time
## [1] "2016-03-13 01:00:00 PST" "2016-03-13 03:00:00 PDT"

Or you can display a given instant of time using a different timezone:

with_tz( dtm1, "US/Pacific" )
## [1] "2016-03-12 17:00:00 PST" "2016-03-12 19:00:00 PST"
# which is easier to remember than
# attr( dtm1, "tzone" ) <- "US/Pacific"

lubridate package (3)

Three additional ways beyond difftime to represent time intervals are also provided:

interval( dtm1[ 1 ], dtm1[ 2 ] ) # a very specific interval of time
## [1] 2016-03-13 01:00:00 UTC--2016-03-13 03:00:00 UTC
dtm1PT <- force_tz( dtm1[ 1 ], "US/Pacific" )
dtm1PT + days( 1 ) # add a 1 day period (acts like a calendar)
## [1] "2016-03-14 01:00:00 PDT"
dtm1PT + ddays( 1 ) # add a 1 day duration (much like as.difftime(1,units="days"))
## [1] "2016-03-14 02:00:00 PDT"
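For comparison, the same calendar-versus-elapsed-time distinction can be sketched in base R without lubridate. This is an analogy rather than a description of how lubridate works internally, and it assumes the "America/Los_Angeles" zone is available.

```r
t_pt <- as.POSIXct( "2016-03-13 01:00:00", tz = "America/Los_Angeles" )

# Calendar-style step: same clock time on the next day
next_clock <- seq( t_pt, by = "DSTday", length.out = 2 )[ 2 ]
next_clock   # 01:00 on 2016-03-14

# Elapsed-time step: exactly 24 hours later
next_24h <- t_pt + as.difftime( 1, units = "days" )
next_24h     # 02:00 on 2016-03-14, because the day was only 23 hours long

difftime( next_24h, next_clock, units = "hours" )   # 1 hour
```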

There is a cheat sheet summary of lubridate functions [7].

Other approaches to handling time

Some people find the POSIXt approach too rigid and have developed their own ways of handling time, such as the chron and zoo packages.

Detour: Floating Point Error

R FAQ 7.31 [8] warns against depending on exact results when using floating point fractions:

x <- 0.3     # floating point is always approximate
0.6 == 2 * x # works
## [1] TRUE
0.9 == 3 * x # but you cannot rely on it
## [1] FALSE

Detour: Floating Point Error (continued)

Why not equal?

0.9 - 3 * x
## [1] 1.110223e-16

The representation error in 0.3 is tripled by the multiplication, while the error in the literal 0.9 is about the same size as the original error in 0.3, so the two sides no longer match.

This imprecision is not unique to R… this applies to all software that uses floating point numbers.
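When a comparison of floating point results is unavoidable, the usual remedy is to compare within a tolerance. A minimal sketch (the 1e-9 tolerance is an arbitrary choice for illustration):

```r
x <- 0.3

# all.equal() allows a small relative difference; wrap it in isTRUE()
same_enough <- isTRUE( all.equal( 0.9, 3 * x ) )
same_enough

# An explicit tolerance works too
within_tol <- abs( 0.9 - 3 * x ) < 1e-9
within_tol
```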

Detour: Floating Point Error (continued)

It is best to use a date/time representation that uses non-fractional values for your application…

If you need resolution finer than one second, POSIXt may introduce rounding errors, so it is best to minimize the number of calculations performed with such timestamps.
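A sketch of the sub-second problem: a timestamp with a tenth of a second does not store exactly 0.1 as its fractional part. Rounding to whole milliseconds, as done at the end, is one possible workaround assumed here for illustration.

```r
t_frac <- as.POSIXct( "2016-03-13 01:00:00.1", tz = "UTC" )

# Fractional seconds actually stored, after removing the whole seconds
frac <- as.numeric( t_frac ) - 1457830800
frac == 0.1                # not exactly one tenth
abs( frac - 0.1 ) < 1e-6   # but very close

# Whole milliseconds are exact once rounded
ms <- round( frac * 1000 )
ms
```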

chron

library(chron)
## NOTE: The default cutoff when expanding a 2-digit year
## to a 4-digit year will change from 30 to 69 by Aug 2020
## (as for Date and POSIXct in base R.)
## 
## Attaching package: 'chron'
## The following objects are masked from 'package:lubridate':
## 
##     days, hours, minutes, seconds, years
dtm1 <- chron( dates. = c( "3/13/2016", "3/13/2016" )
             , times. = c( "01:00:00", "03:00:00" )
             )
dtm1  # automatically formatted for display
## [1] (03/13/16 01:00:00) (03/13/16 03:00:00)

chron Internal Representation

See what R is storing without the automatic formatting:

unclass( dtm1 )
## [1] 16873.04 16873.12
## attr(,"format")
##   dates   times 
## "m/d/y" "h:m:s" 
## attr(,"origin")
## month   day  year 
##     1     1  1970

chron Comparison

dtm1[ 1 ] < dtm1[ 2 ]
## [1] TRUE
diff( dtm1 )
## [1] 02:00:00

chron Spring Forward

library(ggplot2)
dtms1 <- seq( dtm1[ 1 ], dtm1[ 2 ], times( "00:30:00" ) ); dtms1
## [1] (03/13/16 01:00:00) (03/13/16 01:30:00) (03/13/16 02:00:00)
## [4] (03/13/16 02:30:00) (03/13/16 03:00:00)

chron Spring Forward (continued)

Sys.setenv( TZ = "GMT" )
qplot( seq_along( dtms1 ), dtms1 ) +
  chron::scale_y_chron( format="%m/%d/%y %H:%M" )

chron Sequence Rounding

dtm2a <- chron( "02/20/13", "00:00:00" )
dtm2b <- chron( "07/03/18", "15:30:00" ) # stop at 3:30pm
dtm2 <- seq( from=dtm2a, to=dtm2b, by=times( "00:15:00" ) )
tail( dtm2 ) # stops one value too soon
## [1] (07/03/18 14:00:00) (07/03/18 14:15:00) (07/03/18 14:30:00)
## [4] (07/03/18 14:45:00) (07/03/18 15:00:00) (07/03/18 15:15:00)
length( dtm2 )
## [1] 188126

POSIXct Sequence Rounding

Sys.setenv( TZ="GMT" ) # emulate chron behavior
dtm3a <- as.POSIXct( "02/20/13 00:00:00"
                   , format = "%m/%d/%y %H:%M:%S"
                   )
dtm3b <- as.POSIXct( "07/03/18 15:30:00"
                   , format = "%m/%d/%y %H:%M:%S" 
                   )
dtm3 <- seq( from = dtm3a
           , to = dtm3b
           , by = as.difftime( 15, units="mins" )
           )
tail( dtm3 )   # does include final value
## [1] "2018-07-03 14:15:00 GMT" "2018-07-03 14:30:00 GMT"
## [3] "2018-07-03 14:45:00 GMT" "2018-07-03 15:00:00 GMT"
## [5] "2018-07-03 15:15:00 GMT" "2018-07-03 15:30:00 GMT"
length( dtm3 ) # one more than chron example
## [1] 188127
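One way to sidestep endpoint rounding altogether is to count the steps as whole numbers and multiply, rather than repeatedly adding a fractional step. The sketch below does this with POSIXct and whole seconds; the variable names are illustrative.

```r
from <- as.POSIXct( "2013-02-20 00:00:00", tz = "GMT" )
to   <- as.POSIXct( "2018-07-03 15:30:00", tz = "GMT" )
step_secs <- 15 * 60

# Whole number of 15-minute steps between the endpoints
n_steps <- round( as.numeric( difftime( to, from, units = "secs" ) ) / step_secs )

# Each element is the start plus an exact multiple of the step
dtm_count <- from + step_secs * seq( 0, n_steps )
length( dtm_count )                       # 188127
dtm_count[ length( dtm_count ) ] == to    # final value is included
```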

zoo Package offers yearmon/yearqtr alternatives

library(zoo)
dt1 <- as.yearmon( c( "2016-03", "2016-04" ) )
dt1  # automatically formatted for display
## [1] "Mar 2016" "Apr 2016"

zoo Internal Representation

See what R is storing without the automatic formatting:

unclass( dt1 )
## [1] 2016.167 2016.250

zoo Comparison

dt1[ 1 ] < dt1[ 2 ]
## [1] TRUE
diff( dt1 ) # displayed nonsensically
## [1] 0.08333333
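The difference is in fractions of a year, so multiplying by 12 and rounding recovers a count of months. The sketch below uses plain numbers in the year + (month - 1)/12 form seen in the unclass() output above, so that it runs without zoo.

```r
# Mar 2016 and Apr 2016 in year + (month - 1)/12 form
ym <- c( 2016 + 2/12, 2016 + 3/12 )

diff( ym )                                   # a fraction of a year
months_apart <- round( 12 * diff( ym ) )     # whole months
months_apart
```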

zoo Sequences

It is typical to build floating point sequences, then convert them to the yearmon type:

n <- 1416
f2a <- seq( 1900
          , 1900 + n/12
          , by = 1/12 # unsafe practice
          )
d2a <- as.yearmon( f2a ) # rounded when converted
tail( d2a ) #  internal round-to-month is very robust
## [1] "Aug 2017" "Sep 2017" "Oct 2017" "Nov 2017" "Dec 2017" "Jan 2018"
f2b <- 1900 + seq( 0, n )/12 # safer way to handle fractions
d2b <- as.yearmon( f2b )
tail( d2b ) # no difference
## [1] "Aug 2017" "Sep 2017" "Oct 2017" "Nov 2017" "Dec 2017" "Jan 2018"
as.numeric( f2a[ length( f2a ) ] ) - as.numeric( f2b[ length( f2b ) ] )
## [1] 0
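Another option is to avoid fractions entirely by counting whole months and deriving the year and month with integer arithmetic. This is a base R sketch of that idea, not something zoo requires; the "YYYY-MM" labels are just one convenient output form.

```r
n <- 1416
k <- 0:n                    # whole months elapsed since January 1900
yr <- 1900L + k %/% 12L     # integer year
mo <- k %% 12L + 1L         # integer month, 1 to 12
labels <- sprintf( "%04d-%02d", yr, mo )
tail( labels, 3 )           # "2017-11" "2017-12" "2018-01"
```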

  1. https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households

  2. G. Grothendieck and T. Petzoldt, “R Help Desk: Date and Time Classes in R,” R News, vol. 4, no. 1, pp. 29–32, Jun-2004 [Online]. Available: https://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf.

  3. M. J. Crawley, Statistics: an introduction using R, 1st ed. Chichester, West Sussex, England: J. Wiley, 2005.

  4. M. J. Crawley, Statistics: an introduction using R, 1st ed. Chichester, West Sussex, England: J. Wiley, 2005.

  5. https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

  6. G. Grolemund and H. Wickham, “Dates and Times Made Easy with lubridate,” Journal of Statistical Software, vol. 40, no. 3, pp. 1–25, 2011 [Online]. Available: http://www.jstatsoft.org/v40/i03/

  7. “Dates and times with lubridate :: CHEAT SHEET.” RStudio, Dec-2017 [Online]. Available: https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf

  8. https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f