Jeff Newmiller
September 9, 2018
Sample residential electric load data from London, England [1] (pre-trimmed to one house, with odd records removed)
## 'data.frame': 24140 obs. of 3 variables:
## $ LCLid : chr "MAC000002" "MAC000002" "MAC000002" "MAC000002" ...
## $ DateTime: chr "2012-10-12 00:30:00" "2012-10-12 01:00:00" "2012-10-12 01:30:00" "2012-10-12 02:00:00" ...
## $ KWH : num 0 0 0 0 0 0 0 0 0 0 ...
Tell R to assume the timezone is Greenwich Mean Time (GMT), which for these purposes is equivalent to Coordinated Universal Time (UTC), e.g. with Sys.setenv( TZ = "GMT" ).
Make a Dtm column using base R. The “%Y-%m-%d %H:%M:%OS” format is recognized by default, but many other common formats must be specified with the format argument (see ?strptime).
dta_b <- dta # make a copy so we can compare approaches later
dta_b$Dtm <- as.POSIXct( dta_b$DateTime ) # assumes TZ is set
str( dta_b$Dtm ) # confirming new column type
## POSIXct[1:24140], format: "2012-10-12 00:30:00" "2012-10-12 01:00:00" "2012-10-12 01:30:00" ...
POSIXct is a time representation that borrows heavily from the Portable Operating System Interface (POSIX) standard, which represents time as an integer number of seconds since January 1, 1970, 00:00:00 Greenwich Mean Time. R stores this count as a double-precision floating point number rather than an integer, but is otherwise very similar to the original standard.
Excel effectively treats all times as GMT: if you compute '2012-03-26' - '2012-03-25' in Excel you get 1 day (24 hours), even though in London March 25, 2012 was the first day of daylight saving time (British Summer Time), so that day was only 23 hours long. For many uses this is fine, but R timestamps always take timezones into account, so if you want Excel-style simplified time in R you need to set the timezone to GMT before doing time calculations.
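To see the 23-hour versus 24-hour difference described above, here is a minimal sketch that measures the same pair of calendar dates in two timezones. It assumes your R installation's timezone database includes "Europe/London"; the variable names are only for illustration.

```r
# Midnight-to-midnight across the 2012 UK spring-forward date, in London time
lon_start <- as.POSIXct( "2012-03-25", tz = "Europe/London" )
lon_end   <- as.POSIXct( "2012-03-26", tz = "Europe/London" )
difftime( lon_end, lon_start, units = "hours" )  # Time difference of 23 hours

# The same calendar dates interpreted as GMT, which is how Excel behaves
gmt_start <- as.POSIXct( "2012-03-25", tz = "GMT" )
gmt_end   <- as.POSIXct( "2012-03-26", tz = "GMT" )
difftime( gmt_end, gmt_start, units = "hours" )  # Time difference of 24 hours
```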
You can set a default timezone for time calculations in an R session with Sys.setenv( TZ = ... ), or you can set the "tzone" attribute on each POSIXct vector. Timezones affect how character strings are converted to POSIXct and back to character, and also how POSIXct <-> POSIXlt conversions behave.
## [1] ""
## [1] "GMT"
It is not possible to set a separate timezone on individual POSIXct elements within a vector.
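A minimal sketch of the "tzone" attribute at work (the variable names are hypothetical): changing the attribute changes only how the instant is displayed, not the underlying count of seconds.

```r
x <- as.POSIXct( "2016-03-13 12:00:00", tz = "UTC" )
attr( x, "tzone" )                      # "UTC"

y <- x
attr( y, "tzone" ) <- "America/Los_Angeles"
format( x )                             # "2016-03-13 12:00:00"
format( y )                             # "2016-03-13 05:00:00" (PDT, UTC-7)
as.numeric( x ) == as.numeric( y )      # TRUE: still the same instant
```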
POSIXlt (List or Long Time)
POSIXlt is the base R tool for manipulating the parts of a timestamp:
Sys.setenv( TZ = "UTC" )
# see ?as.POSIXlt
dtm2 <- as.POSIXlt( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm2
## [1] "2016-03-13 01:00:00 UTC" "2016-03-13 03:00:00 UTC"
## [1] TRUE
## Time difference of 2 hours
POSIXlt Internals
See ?DateTimeClasses. year counts from 1900, mon represents January as 0, wday starts at 0 for Sunday, and yday starts at 0 for January 1.
## List of 9
## $ sec : num [1:2] 0 0
## $ min : int [1:2] 0 0
## $ hour : int [1:2] 1 3
## $ mday : int [1:2] 13 13
## $ mon : int [1:2] 2 2
## $ year : int [1:2] 116 116
## $ wday : int [1:2] 0 0
## $ yday : int [1:2] 72 72
## $ isdst: int [1:2] 0 0
## - attr(*, "tzone")= chr "UTC"
## [1] 2016 2016
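Because of these offsets, calendar values need small adjustments when read out of a POSIXlt. A sketch re-creating the dtm2 values from the example above (with an explicit tz instead of relying on Sys.setenv):

```r
dtm2 <- as.POSIXlt( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ), tz = "UTC" )
dtm2$year + 1900  # 2016 2016 (calendar year)
dtm2$mon + 1      # 3 3 (March, counting January as 1)
dtm2$yday + 1     # 73 73 (day of year, counting January 1 as 1)
dtm2$wday         # 0 0 (Sunday)
dtm2$hour         # 1 3
```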
Base R provides three date/time classes, introduced in the help page ?DateTimeClasses and discussed in R News vol. 4, no. 1 (2004) [2]:
Date
POSIXct (Continuous Time) [3]
POSIXlt (List Time) [4]
Note that base R does not support working with time-of-day alone, since the length of a day can differ between timezones and between calendar days.
Date (No Time)
Sys.setenv( TZ = "UTC" )
dt1a <- as.Date( "2013-03-13" ) # see ?as.Date
dt1b <- as.Date( "3/21/2013", format="%m/%d/%Y" ) # see ?strptime
dt1b
## [1] "2013-03-21"
## [1] 15785
## [1] TRUE
## Time difference of 8 days
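Since a Date is stored as a whole number of days since 1970-01-01, ordinary integer arithmetic works directly on it. A short sketch using the same two dates:

```r
dt1a <- as.Date( "2013-03-13" )
dt1b <- as.Date( "3/21/2013", format = "%m/%d/%Y" )
as.numeric( dt1b )    # 15785 days since 1970-01-01
dt1b - dt1a           # Time difference of 8 days
dt1a + 30             # "2013-04-12": adding a plain number adds days
seq( dt1a, by = "month", length.out = 3 )  # "2013-03-13" "2013-04-13" "2013-05-13"
```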
POSIXct (Continuous or Compact Time)
Most flexible for computing with instants of time. It can represent sub-second precision, but such results may be unreliable due to floating point rounding.
Sys.setenv( TZ = "UTC" )
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm1
## [1] "2016-03-13 01:00:00 UTC" "2016-03-13 03:00:00 UTC"
## [1] 1457830800 1457838000
## [1] TRUE
## Time difference of 2 hours
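To put a number on "unreliable": a POSIXct is a double, so its resolution depends on its magnitude. This sketch computes the spacing between adjacent doubles near the dtm1 values above (ulp is just a local variable name for "unit in the last place"):

```r
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ), tz = "UTC" )
typeof( unclass( dtm1 ) )       # "double"
secs <- as.numeric( dtm1 )      # 1457830800 1457838000
# Doubles carry 53 significant bits, so for values between 2^30 and 2^31
# seconds the gap between adjacent representable values is 2^(30 - 52)
ulp <- 2^( floor( log2( secs[ 1 ] ) ) - 52 )
ulp                              # 2.384186e-07, about a quarter of a microsecond
secs[ 1 ] + ulp / 4 == secs[ 1 ] # TRUE: the added fraction of a second is lost
```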
difftime for Durations
The amount of time between two points in time is treated differently from the points in time themselves. You cannot add two POSIXct values, but you can add as many difftime values to a POSIXct as desired.
## [1] "2016-03-13 01:30:00 UTC"
## [1] "2016-03-27 01:00:00 UTC"
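The code for the two results above is not shown; here is one way to produce them, re-creating dtm1 from the POSIXct slide. The attempt to add two POSIXct values is wrapped in tryCatch so the sketch keeps running.

```r
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ), tz = "UTC" )
dtm1[ 1 ] + as.difftime( 30, units = "mins" )   # "2016-03-13 01:30:00 UTC"
dtm1[ 1 ] + as.difftime( 2, units = "weeks" )   # "2016-03-27 01:00:00 UTC"
# Several difftime values can be chained onto one POSIXct
dtm1[ 1 ] + as.difftime( 1, units = "days" ) + as.difftime( 6, units = "hours" )  # "2016-03-14 07:00:00 UTC"
# Adding two POSIXct values is an error; capture the message instead of stopping
msg <- tryCatch( dtm1[ 1 ] + dtm1[ 2 ], error = function( e ) conditionMessage( e ) )
msg
```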
difftime Numeric Equivalent
If you need the numeric value of a difftime, remember to specify the units, or you may get whatever “convenient” units R chooses:
## [1] 30
## [1] 30
## [1] 1800
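A sketch of the pitfall with hypothetical values: subtracting two POSIXct values lets R pick the units, so as.numeric() without a units argument can silently change scale when the gap grows.

```r
x <- as.POSIXct( "2016-03-13 01:00:00", tz = "UTC" )
y <- as.POSIXct( "2016-03-14 03:00:00", tz = "UTC" )
d <- y - x
units( d )                         # "days", chosen automatically
as.numeric( d )                    # about 1.083, silently in days
as.numeric( d, units = "hours" )   # 26
as.numeric( d, units = "secs" )    # 93600
difftime( y, x, units = "mins" )   # Time difference of 1560 mins
```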
Time zones are identified using string labels that are technically OS-dependent, but R on Windows, Mac, and Linux uses the Olson (IANA tz) database, so these labels are widely applicable [5].
## [1] "US/Pacific-New" "US/Samoa" "UTC" "W-SU"
## [5] "WET" "Zulu"
## [1] "America/Los_Angeles"
Note that even though sometimes R will use a 3-letter timezone abbreviation when displaying a datetime value, such shorthand is usually not acceptable for specifying the timezone.
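A quick way to check whether a label is acceptable is to look it up in OlsonNames(), which lists the zone names R knows about. A sketch, assuming a standard tz database is installed:

```r
tz_names <- OlsonNames()
"America/Los_Angeles" %in% tz_names  # TRUE: a full Olson zone name
"Europe/London" %in% tz_names        # TRUE
"PST" %in% tz_names                  # FALSE: a display abbreviation, not a zone name
```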
No matter which timezone you use, the underlying numeric value of a POSIXct always counts seconds from the origin instant, 1970-01-01 00:00:00 GMT.
If you don’t have any reason to be concerned with timezones in your data, you can make life “easy” for yourself by setting your working timezone to be “GMT” or “UTC”.
Converting a Date to POSIXct treats the date as the beginning of the day in GMT by default, so if you use any other timezone for other values you will want to “force” the timezone to be compatible with the other POSIXct values you are working with.
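A base R sketch of that behavior. Going through a character string is one way (not the only one) to get local midnight instead of GMT midnight; the Los Angeles zone is just an example.

```r
d <- as.Date( "2016-03-13" )
# Default conversion: midnight GMT
p_gmt <- as.POSIXct( d )
as.numeric( p_gmt )                        # 1457827200
# Midnight in a local zone: format the date, then parse it with an explicit tz
p_la <- as.POSIXct( format( d ), tz = "America/Los_Angeles" )
as.numeric( p_la ) - as.numeric( p_gmt )   # 28800 seconds: 8 hours later (PST is UTC-8)
```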
Note that each POSIXct vector can have its own timezone, but some functions can lose that timezone, or create time values internally using the default (TZ) timezone. It is therefore simplest to change TZ as needed while reading input, then use a single timezone of your choosing for calculations and output.
lubridate package (1)
The lubridate package [6] provides many “helper” functions for working with POSIXct and Date values.
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## [1] TRUE
## [1] TRUE
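A few of those helpers in a short sketch; it assumes lubridate is installed, and the sample timestamp is reused from earlier slides.

```r
library( lubridate )
x <- ymd_hms( "2016-03-13 01:00:00", tz = "UTC" )  # parse without writing a format string
year( x )                 # 2016
month( x )                # 3
mday( x )                 # 13
wday( x )                 # 1: Sunday, with lubridate's default week start
floor_date( x, "day" )    # "2016-03-13 UTC"
```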
lubridate package (2)
You can repair a time value that was converted to POSIXct with the wrong timezone:
## [1] "2016-03-13 01:00:00 UTC"
## [1] "2016-03-13 01:00:00 PST" "2016-03-13 03:00:00 PDT"
Or you can display a given instant of time using a different timezone:
## [1] "2016-03-12 17:00:00 PST" "2016-03-12 19:00:00 PST"
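The difference between the two operations shows up in the underlying seconds: force_tz() keeps the clock reading and moves the instant, while with_tz() keeps the instant and changes only the display. A sketch assuming lubridate is installed, using "America/Los_Angeles", the canonical zone that "US/Pacific" links to:

```r
library( lubridate )
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ), tz = "UTC" )
forced <- force_tz( dtm1, "America/Los_Angeles" )  # same clock times, new instants
shown  <- with_tz( dtm1, "America/Los_Angeles" )   # same instants, new display
as.numeric( shown ) - as.numeric( dtm1 )   # 0 0
as.numeric( forced ) - as.numeric( dtm1 )  # 28800 25200: 8 h offset in PST, 7 h in PDT
```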
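lubridate package (3)
Three additional ways beyond difftime to represent time spans are also provided: durations (exact numbers of seconds), periods (calendar units such as days or months), and intervals (spans anchored to specific start and end instants):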
## [1] 2016-03-13 01:00:00 UTC--2016-03-13 03:00:00 UTC
dtm1PT <- force_tz( dtm1[ 1 ], "US/Pacific" )
dtm1PT + days( 1 ) # add a 1 day period (acts like a calendar)
## [1] "2016-03-14 01:00:00 PDT"
## [1] "2016-03-14 02:00:00 PDT"
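The period and duration results differ by an hour because the US spring-forward day is only 23 hours long. A sketch that reproduces both and measures the corresponding interval directly; it assumes lubridate is installed and uses the same zone as the previous sketch.

```r
library( lubridate )
start <- force_tz( as.POSIXct( "2016-03-13 01:00:00", tz = "UTC" ), "America/Los_Angeles" )
start + days( 1 )    # period: same clock time next day, "2016-03-14 01:00:00 PDT"
start + ddays( 1 )   # duration: exactly 86400 s later, "2016-03-14 02:00:00 PDT"
span <- interval( start, start + days( 1 ) )
int_length( span ) / 3600   # 23 hours actually elapse in that calendar day
```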
A cheat sheet summarizing lubridate functions is also available [7].
Some people find the POSIXt approach too rigid and have tried out their own ideas for handling time:
chron::chron
zoo::yearmon
R FAQ 7.31 [8] warns against depending on exact results when using floating point fractions:
## [1] TRUE
## [1] FALSE
Why not equal?
## [1] 1.110223e-16
The representation error in 0.3 is roughly tripled by the multiplication, but the representation error in 0.9 is about the same size as that in 0.3, so the two values no longer match.
This imprecision is not unique to R; it applies to all software that uses binary floating point numbers.
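A sketch of a comparison consistent with the FALSE and 1.110223e-16 results above. The exact expression on the original slide is not shown, so 0.3 * 3 versus 0.9 is an assumption. all.equal() compares with a tolerance instead of demanding exact equality:

```r
x <- 0.3 * 3
x == 0.9                          # FALSE
0.9 - x                           # 1.110223e-16 (one unit in the last place near 0.9)
isTRUE( all.equal( x, 0.9 ) )     # TRUE: equal within a small tolerance
sprintf( "%.20f", c( 0.3, 0.9, x ) )  # the stored values, to 20 decimal places
```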
It is best to use a date/time representation that uses non-fractional values for your application:
Date if you never use time-of-day, or
POSIXct if your smallest precision is one second.
If your smallest precision is less than one second, POSIXct may introduce rounding errors, so it is best to minimize the number of calculations performed with such timestamps.
chron
## NOTE: The default cutoff when expanding a 2-digit year
## to a 4-digit year will change from 30 to 69 by Aug 2020
## (as for Date and POSIXct in base R.)
##
## Attaching package: 'chron'
## The following objects are masked from 'package:lubridate':
##
## days, hours, minutes, seconds, years
dtm1 <- chron( dates. = c( "3/13/2016", "3/13/2016" )
, times. = c( "01:00:00", "03:00:00" )
)
dtm1 # automatically formatted for display
## [1] (03/13/16 01:00:00) (03/13/16 03:00:00)
chron Internal Representation
See what R is storing without the automatic formatting:
## [1] 16873.04 16873.12
## attr(,"format")
## dates times
## "m/d/y" "h:m:s"
## attr(,"origin")
## month day year
## 1 1 1970
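Because chron stores fractional days since 1970-01-01, one hour is 1/24 of a day, which is not exactly representable in binary. If you need to move such values into POSIXct, one approach (a base R sketch, not a chron function; the numbers are typed in to mirror the output above) is to scale to seconds and round:

```r
chron_days <- c( 16873 + 1/24, 16873 + 3/24 )  # 2016-03-13 01:00 and 03:00 as fractional days
secs <- round( chron_days * 86400 )             # whole seconds; rounding discards float noise
dtm <- as.POSIXct( secs, origin = "1970-01-01", tz = "GMT" )
format( dtm )   # "2016-03-13 01:00:00" "2016-03-13 03:00:00"
```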
chron Spring Forward
GMT timezone (to display correctly, set TZ="GMT" when working with chron)
## [1] (03/13/16 01:00:00) (03/13/16 01:30:00) (03/13/16 02:00:00)
## [4] (03/13/16 02:30:00) (03/13/16 03:00:00)
chron Spring Forward
Sys.setenv( TZ = "GMT" )
library( ggplot2 ) # provides qplot()
qplot( seq_along( dtms1 ), dtms1 ) +
  chron::scale_y_chron( format="%m/%d/%y %H:%M" )
chron Sequence Rounding
dtm2a <- chron( "02/20/13", "00:00:00" )
dtm2b <- chron( "07/03/18", "15:30:00" ) # stop at 3:30pm
dtm2 <- seq( from=dtm2a, to=dtm2b, by=times( "00:15:00" ) )
tail( dtm2 ) # stops one value too soon
## [1] (07/03/18 14:00:00) (07/03/18 14:15:00) (07/03/18 14:30:00)
## [4] (07/03/18 14:45:00) (07/03/18 15:00:00) (07/03/18 15:15:00)
## [1] 188126
POSIXct Sequence Rounding
Sys.setenv( TZ="GMT" ) # emulate chron behavior
dtm3a <- as.POSIXct( "02/20/13 00:00:00"
, format = "%m/%d/%y %H:%M:%S"
)
dtm3b <- as.POSIXct( "07/03/18 15:30:00"
, format = "%m/%d/%y %H:%M:%S"
)
dtm3 <- seq( from = dtm3a
, to = dtm3b
, by = as.difftime( 15, units="mins" )
)
tail( dtm3 ) # does include final value
## [1] "2018-07-03 14:15:00 GMT" "2018-07-03 14:30:00 GMT"
## [3] "2018-07-03 14:45:00 GMT" "2018-07-03 15:00:00 GMT"
## [5] "2018-07-03 15:15:00 GMT" "2018-07-03 15:30:00 GMT"
## [1] 188127
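One plausible explanation for the missing value in the chron sequence (our reading, not something stated in the chron documentation) is that counting the steps is exact in whole seconds but not in fractional days. A base R sketch of both calculations for the same endpoints:

```r
# Endpoints as whole seconds since 1970-01-01 GMT (what POSIXct stores)
from_secs <- as.numeric( as.POSIXct( "2013-02-20 00:00:00", tz = "GMT" ) )
to_secs   <- as.numeric( as.POSIXct( "2018-07-03 15:30:00", tz = "GMT" ) )
( to_secs - from_secs ) / 900   # exactly 188126 fifteen-minute steps

# The same endpoints as fractional days (what chron stores)
from_days  <- from_secs / 86400
to_days    <- to_secs / 86400
steps_days <- ( to_days - from_days ) / ( 15 / 1440 )
print( steps_days, digits = 17 )  # may not be exactly 188126

length( seq( from_secs, to_secs, by = 900 ) )  # 188127 values, both endpoints included
```

If the fractional-day count lands even slightly below 188126, a sequence generator that truncates it would stop one step early.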
zoo Package offers yearmon/yearqtr alternatives
library(zoo)
dt1 <- as.yearmon( c( "2016-03", "2016-04" ) )
dt1 # automatically formatted for display
## [1] "Mar 2016" "Apr 2016"
zoo Internal Representation
See what R is storing without the automatic formatting:
## [1] 2016.167 2016.250
zoo Comparison
## [1] TRUE
## [1] 0.08333333
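A short sketch of yearmon arithmetic, assuming zoo is installed: one month is 1/12 of a year, and conversion to Date gives the first day of the month.

```r
library( zoo )
dt1 <- as.yearmon( c( "2016-03", "2016-04" ) )
as.numeric( dt1 )                  # 2016.167 2016.250, i.e. year + (month - 1)/12
as.numeric( dt1[ 2 ] - dt1[ 1 ] )  # 0.08333333, one month
as.Date( dt1 )                     # "2016-03-01" "2016-04-01"
as.Date( dt1 + 1/12 )              # "2016-04-01" "2016-05-01": one month later
```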
zoo Sequences
It is typical to build floating point sequences, then convert them to the yearmon type:
n <- 1416
f2a <- seq( 1900
, 1900 + n/12
, by = 1/12 # unsafe practice
)
d2a <- as.yearmon( f2a ) # rounded when converted
tail( d2a ) # internal round-to-month is very robust
## [1] "Aug 2017" "Sep 2017" "Oct 2017" "Nov 2017" "Dec 2017" "Jan 2018"
f2b <- 1900 + seq( 0, n )/12 # safer way to handle fractions
d2b <- as.yearmon( f2b )
tail( d2b ) # no difference
## [1] "Aug 2017" "Sep 2017" "Oct 2017" "Nov 2017" "Dec 2017" "Jan 2018"
## [1] 0
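If you would rather avoid fractional arithmetic entirely, one alternative (a sketch, not the approach on the slide) is to let seq() step through calendar months on Date values and convert the result. It assumes zoo is installed and uses the same n.

```r
library( zoo )
n <- 1416
d3 <- as.yearmon( seq( as.Date( "1900-01-01" ), by = "month", length.out = n + 1 ) )
as.Date( tail( d3, 1 ) )   # "2018-01-01", i.e. Jan 2018
# Same months as the fraction-based version built from 1900 + seq( 0, n )/12
all( as.numeric( d3 ) == as.numeric( as.yearmon( 1900 + seq( 0, n ) / 12 ) ) )  # TRUE
```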
[1] https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households
[2] G. Grothendieck and T. Petzoldt, “R Help Desk: Date and Time Classes in R,” R News, vol. 4, no. 1, pp. 29–32, Jun. 2004. [Online]. Available: https://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf
[3] M. J. Crawley, Statistics: an introduction using R, 1st ed. Chichester, West Sussex, England: J. Wiley, 2005.
[4] M. J. Crawley, Statistics: an introduction using R, 1st ed. Chichester, West Sussex, England: J. Wiley, 2005.
[5] https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
[6] G. Grolemund and H. Wickham, “Dates and Times Made Easy with lubridate,” Journal of Statistical Software, vol. 40, no. 3, pp. 1–25, 2011. [Online]. Available: http://www.jstatsoft.org/v40/i03/
[7] “Dates and times with lubridate :: CHEAT SHEET.” RStudio, Dec. 2017. [Online]. Available: https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf
[8] https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f