Jeff Newmiller
September 9, 2018
Sample residential electric load data from London, England[1] (pre-trimmed to one house, with odd records removed):
## 'data.frame': 24140 obs. of 3 variables:
## $ LCLid : chr "MAC000002" "MAC000002" "MAC000002" "MAC000002" ...
## $ DateTime: chr "2012-10-12 00:30:00" "2012-10-12 01:00:00" "2012-10-12 01:30:00" "2012-10-12 02:00:00" ...
## $ KWH : num 0 0 0 0 0 0 0 0 0 0 ...
Tell R to assume the timezone is Greenwich Mean Time (GMT, or Coordinated Universal Time, UTC).
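The command for this step does not appear in this copy of the slides. Here is a minimal sketch. It assumes the session-wide TZ environment variable is what is meant, which is the same mechanism used on later slides:

```r
# Set the default timezone for this R session to GMT
Sys.setenv( TZ = "GMT" )
Sys.getenv( "TZ" )  # confirm the setting: "GMT"
```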
Make a Dtm column using base R. The “%Y-%m-%d %H:%M:%OS” format works by default, but many other common formats must be specified with the format argument (see ?strptime).
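For example, a day-first string such as "12/10/2012 00:30" is not one of the formats as.POSIXct() tries by default, so it needs an explicit format. This input string is hypothetical and does not come from the data set:

```r
# Day/month/year with hours and minutes, interpreted in GMT
x <- as.POSIXct( "12/10/2012 00:30", format = "%d/%m/%Y %H:%M", tz = "GMT" )
format( x, "%Y-%m-%d %H:%M:%S" )  # "2012-10-12 00:30:00"
# Without format, R only tries year-first layouts, so this string
# would not be read as 12 October 2012
```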
dta_b <- dta # make a copy so we can compare approaches later
dta_b$Dtm <- as.POSIXct( dta_b$DateTime ) # assumes TZ is set
str( dta_b$Dtm ) # confirming new column type
## POSIXct[1:24140], format: "2012-10-12 00:30:00" "2012-10-12 01:00:00" "2012-10-12 01:30:00" ...
POSIXct is a time representation that borrows heavily from the Portable Operating System Interface (POSIX) standard, which represents time as an integer count of seconds since 00:00:00 on January 1, 1970, Greenwich Mean Time. R uses a double-precision floating point number instead of an integer, but is otherwise very similar to the original standard.
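A small sketch of that numeric representation, using made-up instants near the 1970 origin:

```r
# A POSIXct value is a double counting seconds since 1970-01-01 00:00:00 GMT
x <- as.POSIXct( "1970-01-01 00:00:10", tz = "GMT" )
as.numeric( x )  # 10
typeof( x )      # "double"
# Going the other way: 3600 seconds after the origin
y <- as.POSIXct( 3600, origin = "1970-01-01", tz = "GMT" )
format( y, "%Y-%m-%d %H:%M:%S" )  # "1970-01-01 01:00:00"
```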
Excel basically acts as though all time is in GMT all the time. If you compute '2012-03-26' - '2012-03-25' in Excel you get 1 day (24 hours), even though in London March 25, 2012 was the start of daylight saving time (British Summer Time), so that day was only 23 hours long. For many uses this is fine. However, R timestamps always take timezones into account, so to get Excel-style simplified time arithmetic in R you need to set the timezone to GMT before doing time calculations.
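The 23-hour London day can be checked directly. This sketch assumes the "Europe/London" timezone label is available on the system:

```r
# Midnight to midnight across the 2012 spring-forward change in London
d1 <- as.POSIXct( "2012-03-25", tz = "Europe/London" )
d2 <- as.POSIXct( "2012-03-26", tz = "Europe/London" )
difftime( d2, d1, units = "hours" )  # Time difference of 23 hours
# The same calendar dates interpreted in GMT give the Excel-like 24 hours
g1 <- as.POSIXct( "2012-03-25", tz = "GMT" )
g2 <- as.POSIXct( "2012-03-26", tz = "GMT" )
difftime( g2, g1, units = "hours" )  # Time difference of 24 hours
```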
You can set a default timezone for time calculations in a particular R session as described earlier (by setting the TZ environment variable), or you can set the "tzone" attribute on each POSIXct variable. Timezones affect how character strings are converted to POSIXct and back to character. They also affect how POSIXct <-> POSIXlt conversions behave.
## [1] ""
## [1] "GMT"
It is not possible to set a separate timezone on individual POSIXct elements within a vector.
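A sketch of the "tzone" attribute in action, using made-up instants. The attribute applies to the whole vector and changes only how the values are displayed, not the underlying numbers:

```r
# Two instants stored with a GMT "tzone" attribute
x <- as.POSIXct( c( "2016-03-13 09:00:00", "2016-03-13 11:00:00" ), tz = "GMT" )
attr( x, "tzone" )                  # "GMT"
before <- as.numeric( x )
# One attribute for the whole vector: both elements switch zones together
attr( x, "tzone" ) <- "America/Los_Angeles"
format( x, "%H:%M" )                # "01:00" "03:00" (Pacific clock times)
identical( as.numeric( x ), before ) # TRUE: the instants themselves are unchanged
```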
POSIXlt (List or Long Time)
POSIXlt is the base R tool for manipulating the parts of a timestamp:
Sys.setenv( TZ = "UTC" )
# see ?as.POSIXlt
dtm2 <- as.POSIXlt( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm2
## [1] "2016-03-13 01:00:00 UTC" "2016-03-13 03:00:00 UTC"
## [1] TRUE
## Time difference of 2 hours
POSIXlt Internals
See ?DateTimeClasses. The year element counts years since 1900, mon represents January as 0, wday starts at 0 for Sunday, and yday starts at 0 for January 1.
## List of 9
## $ sec : num [1:2] 0 0
## $ min : int [1:2] 0 0
## $ hour : int [1:2] 1 3
## $ mday : int [1:2] 13 13
## $ mon : int [1:2] 2 2
## $ year : int [1:2] 116 116
## $ wday : int [1:2] 0 0
## $ yday : int [1:2] 72 72
## $ isdst: int [1:2] 0 0
## - attr(*, "tzone")= chr "UTC"
## [1] 2016 2016
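Below is a short sketch of reading and changing those parts, reusing the dtm2 example above. Shifting mday is only for illustration. as.POSIXct() rebuilds the instant from the date and time fields, so the stale wday and yday values do not matter:

```r
Sys.setenv( TZ = "UTC" )
dtm2 <- as.POSIXlt( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm2$year + 1900  # calendar year: 2016 2016
dtm2$mon + 1      # month number: 3 3
dtm2$wday         # 0 0 (Sunday)
# Change a part (move both to the 1st of the month), then convert back
dtm2$mday <- dtm2$mday - 12L
format( as.POSIXct( dtm2 ), "%Y-%m-%d %H:%M" )  # "2016-03-01 01:00" "2016-03-01 03:00"
```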
These classes are introduced in the help page ?DateTimeClasses and were discussed in R News, vol. 4, no. 1 (2004)[2]:
Date
POSIXct (Continuous Time[3])
POSIXlt (List Time[4])
Note that base R does not support working with time-of-day only, since the length of a day can be different in different timezones and/or on different calendar days.
Date (No Time)
Sys.setenv( TZ = "UTC" )
dt1a <- as.Date( "2013-03-13" ) # see ?as.Date
dt1b <- as.Date( "3/21/2013", format="%m/%d/%Y" ) # see ?strptime
dt1b
## [1] "2013-03-21"
## [1] 15785
## [1] TRUE
## Time difference of 8 days
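A few more Date operations, sketched with the same dt1a value. Because a Date is a whole number of days, adding integers moves by whole days:

```r
dt1a <- as.Date( "2013-03-13" )
as.numeric( dt1a )                          # 15777 days since 1970-01-01
dt1a + 0:2                                  # "2013-03-13" "2013-03-14" "2013-03-15"
seq( dt1a, by = "month", length.out = 3 )   # "2013-03-13" "2013-04-13" "2013-05-13"
format( dt1a, "%d %B %Y" )                  # month name depends on the locale
```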
POSIXct (Continuous or Compact Time)
The most flexible class for computing with instants of time. It can represent precision finer than one second, but such results may be unreliable due to floating point rounding.
Sys.setenv( TZ = "UTC" )
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm1
## [1] "2016-03-13 01:00:00 UTC" "2016-03-13 03:00:00 UTC"
## [1] 1457830800 1457838000
## [1] TRUE
## Time difference of 2 hours
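This sketch illustrates the sub-second caveat. It assumes the default parsing formats, which read fractional seconds through %OS. A quarter second is exactly representable, but 0.3 s is not once it sits next to a count of about 1.46 billion seconds:

```r
Sys.setenv( TZ = "UTC" )
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm1 + 3600                 # plain numbers added to POSIXct count as seconds
# Fractional seconds are stored but hidden by default printing
x <- as.POSIXct( "2016-03-13 01:00:00.25" )
format( x, "%H:%M:%OS3" )   # "01:00:00.250"
# 0.3 s cannot be stored exactly alongside a large count of seconds
y <- as.POSIXct( "2016-03-13 01:00:00.3" )
as.numeric( y ) %% 1        # close to, but not exactly, 0.3
```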
difftime for Durations
The amount of time between two points in time is treated differently from the points in time themselves. You cannot add two POSIXct values, but you can add as many difftime values to a POSIXct as desired.
## [1] "2016-03-13 01:30:00 UTC"
## [1] "2016-03-27 01:00:00 UTC"
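The code behind the two results above is not shown. Here is one plausible way to produce them; the original slide may have used different durations, such as 14 days rather than 2 weeks:

```r
Sys.setenv( TZ = "UTC" )
dtm1 <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ) )
dtm1[ 1 ] + as.difftime( 30, units = "mins" )   # "2016-03-13 01:30:00 UTC"
dtm1[ 1 ] + as.difftime( 2, units = "weeks" )   # "2016-03-27 01:00:00 UTC"
# Several durations can be chained onto one instant
dtm1[ 1 ] + as.difftime( 1, units = "hours" ) + as.difftime( 30, units = "mins" )
# Adding two instants is not defined and stops with an error
try( dtm1[ 1 ] + dtm1[ 2 ] )
```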
difftime Numeric Equivalent
If you need the numeric value of a difftime, remember to specify the units, or you may get whatever “convenient” units R chooses:
## [1] 30
## [1] 30
## [1] 1800
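The three values above are consistent with a 30-minute difftime. Here is a sketch that makes the unit choice visible, with t1 and t2 as illustrative names:

```r
t1 <- as.POSIXct( "2016-03-13 01:00:00", tz = "UTC" )
t2 <- as.POSIXct( "2016-03-13 01:30:00", tz = "UTC" )
dt <- t2 - t1
units( dt )                        # "mins", chosen automatically
as.numeric( dt )                   # 30, in whatever units R picked
as.numeric( dt, units = "secs" )   # 1800
as.numeric( dt, units = "hours" )  # 0.5
```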
Time zones are identified using string labels that are technically OS-dependent, but on Windows, Mac, and Linux R uses the Olson (tz) database, so these labels are widely applicable[5].
## [1] "US/Pacific-New" "US/Samoa" "UTC" "W-SU"
## [5] "WET" "Zulu"
## [1] "America/Los_Angeles"
Note that even though sometimes R will use a 3-letter timezone abbreviation when displaying a datetime value, such shorthand is usually not acceptable for specifying the timezone.
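The listing above looks like the tail of OlsonNames(), and the last line like the result of Sys.timezone(). Here is a sketch of searching that list. The variable name tzs is illustrative:

```r
tzs <- OlsonNames()                   # timezone labels R knows about
"Europe/London" %in% tzs              # TRUE where the tz database is available
head( grep( "^America/", tzs, value = TRUE ) )
Sys.timezone()                        # this computer's timezone (may be NA)
# Prefer full labels such as "America/Los_Angeles" to abbreviations such as "PST"
"PST" %in% tzs
```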
No matter what timezone you use, the underlying numeric value of a POSIXct always counts seconds from the origin instant, 1970-01-01 00:00:00 GMT.
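For example, the same instant written as two different local clock readings has a single underlying number:

```r
a <- as.POSIXct( "2016-03-13 01:00:00", tz = "UTC" )
b <- as.POSIXct( "2016-03-12 17:00:00", tz = "America/Los_Angeles" )
as.numeric( a ) == as.numeric( b )   # TRUE: one instant, two clock readings
a == b                               # TRUE: comparisons use the numeric value
```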
If you don’t have any reason to be concerned with timezones in your data, you can make life “easy” for yourself by setting your working timezone to be “GMT” or “UTC”.
Converting a Date to POSIXct always treats the date as the beginning of the day in GMT, so if you use any other timezone for other values, you will want to “force” the timezone to be compatible with the other POSIXct values you are working with.
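Here is a sketch of the problem and one base R workaround. The workaround re-parses the date's text in the target zone. It is an alternative to lubridate's force_tz(), not the method used in these slides:

```r
d <- as.Date( "2016-03-13" )
x <- as.POSIXct( d )                                        # midnight GMT/UTC
format( x, "%Y-%m-%d %H:%M", tz = "UTC" )                   # "2016-03-13 00:00"
# Viewed in Pacific time, that instant is still the previous afternoon
format( x, "%Y-%m-%d %H:%M", tz = "America/Los_Angeles" )   # "2016-03-12 16:00"
# Force local midnight instead: parse the date string in the target zone
xp <- as.POSIXct( format( d ), tz = "America/Los_Angeles" )
format( xp, "%Y-%m-%d %H:%M" )                              # "2016-03-13 00:00"
```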
Note that each POSIXct vector can have its own timezone. However, some functions can cause that timezone to get lost, or create time values internally using the default (TZ) timezone. It is therefore simplest to change the TZ as needed while doing input, then use a single timezone of your choosing while doing calculations and generating output.
lubridate package (1)
The lubridate package[6] provides many “helper” functions for working with POSIXct and Date values.
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## [1] TRUE
## [1] TRUE
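Here is a sketch of a few of those helpers. It assumes lubridate is installed and uses a timestamp from the sample data:

```r
library( lubridate )
x <- ymd_hms( "2012-10-12 00:30:00", tz = "GMT" )  # parse without writing a format string
year( x )               # 2012
month( x )              # 10
hour( x )               # 0
minute( x )             # 30
is.POSIXct( x )         # TRUE
is.Date( as_date( x ) ) # TRUE
```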
lubridate package (2)
You can repair a time value that was converted to POSIXct with the wrong timezone:
## [1] "2016-03-13 01:00:00 UTC"
## [1] "2016-03-13 01:00:00 PST" "2016-03-13 03:00:00 PDT"
Or you can display a given instant of time using a different timezone:
## [1] "2016-03-12 17:00:00 PST" "2016-03-12 19:00:00 PST"
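The difference between the two operations is clearest in the underlying seconds. This sketch assumes lubridate is installed. It uses "America/Los_Angeles", which names the same zone as "US/Pacific":

```r
library( lubridate )
x  <- ymd_hms( "2016-03-13 01:00:00", tz = "UTC" )
fx <- force_tz( x, "America/Los_Angeles" )  # same clock time, new zone: a different instant
wx <- with_tz( x, "America/Los_Angeles" )   # same instant, shown in a new zone
as.numeric( fx ) - as.numeric( x )          # 28800 (8 hours; PST is UTC-8)
as.numeric( wx ) - as.numeric( x )          # 0
format( wx, "%Y-%m-%d %H:%M" )              # "2016-03-12 17:00"
```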
lubridate package (3)
Three additional ways beyond difftime to represent time intervals are also provided (intervals, periods, and durations):
## [1] 2016-03-13 01:00:00 UTC--2016-03-13 03:00:00 UTC
dtm1PT <- force_tz( dtm1[ 1 ], "US/Pacific" )
dtm1PT + days( 1 ) # add a 1 day period (acts like a calendar)
## [1] "2016-03-14 01:00:00 PDT"
## [1] "2016-03-14 02:00:00 PDT"
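The two results above are consistent with adding a period (days) and a duration (ddays) across the US spring-forward change. Here is a sketch of all three kinds side by side. It assumes lubridate is installed; start and iv are illustrative names:

```r
library( lubridate )
start <- ymd_hms( "2016-03-13 01:00:00", tz = "America/Los_Angeles" )
start + days( 1 )    # period: same clock time next day, "2016-03-14 01:00:00 PDT"
start + ddays( 1 )   # duration: exactly 86400 s later, "2016-03-14 02:00:00 PDT"
iv <- interval( start, start + days( 1 ) )   # interval: anchored at both ends
time_length( iv, "hour" )                    # 23, because the clocks sprang forward
```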
A cheat sheet summarizing lubridate functions is also available[7].
Some people think the POSIXt approach is too rigid, and have tried out their own ideas for handling time:
chron::chron
zoo::yearmon
R FAQ 7.31[8] warns against depending on exact results when using floating point fractions:
## [1] TRUE
## [1] FALSE
Why not equal?
## [1] 1.110223e-16
The representation error in 0.3 becomes three times larger when it is multiplied by 3, but the error in 0.9 is about the same size as the error in 0.3.
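A sketch of the comparison and of the tolerance-based check that the R FAQ recommends instead of ==:

```r
0.3 * 3 == 0.9                               # FALSE
0.9 - 0.3 * 3                                # about 1.1e-16
sprintf( "%.20f", c( 0.3, 0.3 * 3, 0.9 ) )   # the stored binary approximations
isTRUE( all.equal( 0.3 * 3, 0.9 ) )          # TRUE: compare with a tolerance instead
```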
This imprecision is not unique to R; it applies to all software that uses binary floating point numbers.
It is best to use a date/time representation that uses whole-number values for your application:
Date if you never use time-of-day, or
POSIXct if your smallest precision is one second.
If your smallest precision is less than one second, POSIXt may introduce rounding errors, so it is best to minimize the number of calculations performed with such timestamps.
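The contrast can be sketched by adding small steps repeatedly to a present-day timestamp. The loop is only an illustration of accumulated rounding; real code would compute offsets directly:

```r
Sys.setenv( TZ = "UTC" )
t0 <- as.POSIXct( "2016-03-13 00:00:00" )
# Whole seconds: repeated addition stays exact
s1 <- t0
for ( i in 1:10 ) s1 <- s1 + 1
as.numeric( s1 ) - as.numeric( t0 )          # exactly 10
# Tenths of a second: each addition rounds, and the error accumulates
s2 <- t0
for ( i in 1:10 ) s2 <- s2 + 0.1
as.numeric( s2 ) - as.numeric( t0 ) == 1     # FALSE
as.numeric( s2 ) - as.numeric( t0 )          # slightly less than 1
```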
chron
## NOTE: The default cutoff when expanding a 2-digit year
## to a 4-digit year will change from 30 to 69 by Aug 2020
## (as for Date and POSIXct in base R.)
##
## Attaching package: 'chron'
## The following objects are masked from 'package:lubridate':
##
## days, hours, minutes, seconds, years
dtm1 <- chron( dates. = c( "3/13/2016", "3/13/2016" )
, times. = c( "01:00:00", "03:00:00" )
)
dtm1 # automatically formatted for display
## [1] (03/13/16 01:00:00) (03/13/16 03:00:00)
chron Internal Representation
See what R is storing without the automatic formatting:
## [1] 16873.04 16873.12
## attr(,"format")
## dates times
## "m/d/y" "h:m:s"
## attr(,"origin")
## month day year
## 1 1 1970
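As the attributes show, chron counts days from 1970-01-01, with a fractional part for the time of day. Assuming that origin, the same numbers can be reproduced in base R by dividing POSIXct seconds by 86400. The name days_since_1970 is illustrative:

```r
x <- as.POSIXct( c( "2016-03-13 01:00:00", "2016-03-13 03:00:00" ), tz = "GMT" )
days_since_1970 <- as.numeric( x ) / 86400
days_since_1970              # 16873 plus 1/24 and 3/24 of a day
days_since_1970 - 16873      # the time-of-day fractions
```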
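chron Spring Forward
GMT timezone (to display correctly, set TZ="GMT" when working with chron):
## [1] (03/13/16 01:00:00) (03/13/16 01:30:00) (03/13/16 02:00:00)
## [4] (03/13/16 02:30:00) (03/13/16 03:00:00)
Because chron has no notion of timezones or daylight saving time, this sequence includes 02:00 and 02:30, clock times that did not occur on this date in US timezones that observe daylight saving time.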
chron Sequence Rounding
dtm2a <- chron( "02/20/13", "00:00:00" )
dtm2b <- chron( "07/03/18", "15:30:00" ) # stop at 3:30pm
dtm2 <- seq( from=dtm2a, to=dtm2b, by=times( "00:15:00" ) )
tail( dtm2 ) # stops one value too soon
## [1] (07/03/18 14:00:00) (07/03/18 14:15:00) (07/03/18 14:30:00)
## [4] (07/03/18 14:45:00) (07/03/18 15:00:00) (07/03/18 15:15:00)
## [1] 188126
POSIXct Sequence Rounding
Sys.setenv( TZ="GMT" ) # emulate chron behavior
dtm3a <- as.POSIXct( "02/20/13 00:00:00"
, format = "%m/%d/%y %H:%M:%S"
)
dtm3b <- as.POSIXct( "07/03/18 15:30:00"
, format = "%m/%d/%y %H:%M:%S"
)
dtm3 <- seq( from = dtm3a
, to = dtm3b
, by = as.difftime( 15, units="mins" )
)
tail( dtm3 ) # does include final value
## [1] "2018-07-03 14:15:00 GMT" "2018-07-03 14:30:00 GMT"
## [3] "2018-07-03 14:45:00 GMT" "2018-07-03 15:00:00 GMT"
## [5] "2018-07-03 15:15:00 GMT" "2018-07-03 15:30:00 GMT"
## [1] 188127
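seq() for POSIXct also accepts the step as a "number unit" string. This sketch uses that form and checks that the end point is kept:

```r
Sys.setenv( TZ = "GMT" )
dtm3a <- as.POSIXct( "02/20/13 00:00:00", format = "%m/%d/%y %H:%M:%S" )
dtm3b <- as.POSIXct( "07/03/18 15:30:00", format = "%m/%d/%y %H:%M:%S" )
dtm3 <- seq( dtm3a, dtm3b, by = "15 min" )   # string form of the 15-minute step
length( dtm3 )                    # 188127, matching the difftime version above
dtm3[ length( dtm3 ) ] == dtm3b   # TRUE: the end point is included
```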
zoo Package Offers yearmon/yearqtr Alternatives
library(zoo)
dt1 <- as.yearmon( c( "2016-03", "2016-04" ) )
dt1 # automatically formatted for display
## [1] "Mar 2016" "Apr 2016"
zoo Internal Representation
See what R is storing without the automatic formatting:
## [1] 2016.167 2016.250
zoo Comparison
## [1] TRUE
## [1] 0.08333333
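Here is a sketch of comparisons and conversions with yearmon, assuming zoo is installed. The month difference is taken on the plain numbers to show the 1/12-year step:

```r
library( zoo )
dt1 <- as.yearmon( c( "2016-03", "2016-04" ) )
dt1[ 1 ] < dt1[ 2 ]                               # TRUE
as.numeric( dt1[ 2 ] ) - as.numeric( dt1[ 1 ] )   # 1/12 of a year
as.Date( dt1 )                                    # first day of each month
as.Date( dt1, frac = 1 )                          # last day of each month
```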
zoo Sequences
It is typical to build floating point sequences, then convert them to the yearmon type:
n <- 1416
f2a <- seq( 1900
, 1900 + n/12
, by = 1/12 # unsafe practice
)
d2a <- as.yearmon( f2a ) # rounded when converted
tail( d2a ) # internal round-to-month is very robust
## [1] "Aug 2017" "Sep 2017" "Oct 2017" "Nov 2017" "Dec 2017" "Jan 2018"
f2b <- 1900 + seq( 0, n )/12 # safer way to handle fractions
d2b <- as.yearmon( f2b )
tail( d2b ) # no difference
## [1] "Aug 2017" "Sep 2017" "Oct 2017" "Nov 2017" "Dec 2017" "Jan 2018"
## [1] 0
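Another option avoids fractional steps entirely. It steps through calendar months with Date, then converts to yearmon. This is a sketch assuming zoo is installed; d2c is an illustrative name:

```r
library( zoo )
n <- 1416
# Whole calendar-month steps, converted to yearmon afterwards
d2c <- as.yearmon( seq( as.Date( "1900-01-01" ), by = "month", length.out = n + 1 ) )
length( d2c )    # 1417, the same count as seq( 0, n )
tail( d2c, 1 )   # "Jan 2018"
```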
[1] https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households
[2] G. Grothendieck and T. Petzoldt, “R Help Desk: Date and Time Classes in R,” R News, vol. 4, no. 1, pp. 29–32, Jun. 2004. [Online]. Available: https://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf
[3] M. J. Crawley, Statistics: An Introduction Using R, 1st ed. Chichester, West Sussex, England: J. Wiley, 2005.
[4] M. J. Crawley, Statistics: An Introduction Using R, 1st ed. Chichester, West Sussex, England: J. Wiley, 2005.
[5] https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
[6] G. Grolemund and H. Wickham, “Dates and Times Made Easy with lubridate,” Journal of Statistical Software, vol. 40, no. 3, pp. 1–25, 2011. [Online]. Available: http://www.jstatsoft.org/v40/i03/
[7] “Dates and times with lubridate :: CHEAT SHEET.” RStudio, Dec. 2017. [Online]. Available: https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf
[8] R FAQ 7.31, “Why doesn't R think these numbers are equal?” [Online]. Available: https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f