A Brief Date-Time Howto

Jeff Newmiller

October 2, 2018 (updated May 10, 2020)

Date/Time FAQs

Will show examples of fixing common time-related problems:

Example Dataset

Sample residential electric load data from London, England1. Peak energy usage usually occurs around 8pm, depending somewhat on day of week.

Q: Date/Time values make messy plots?

Why do the following simple commands make such messy x-axis labeling? And it is slow, too!

dta <- read.csv( "../data/MAC000002clean.csv", stringsAsFactors = TRUE )
plot( KWH ~ DateTime, data=dta, pch = "." )

A: Because the x-axis is a factor

In versions of R before 4.0.0 the read.csv function by default looked for floating point, integer, or logical types, and anything else is treated like a factor. In R 4.0.0 and after, the default is to import unrecognized columns of data as character strings. Plotting character data directly does not work, so an error “NAs introduced by coercion” would appear.

Factors are labels, like “Male” or “Female”. The above plot assigns each record its own label, and then “skips” labels to avoid writing them on top of each other.

A: Use Date or POSIXct for date/time values

Timestamp values may look similar to character or factor representations, but they are much more flexible.

dta <- read.csv( "../data/MAC000002clean.csv"
#               , as.is=TRUE # avoid auto-convert to factor (unnecessary in R4.0..0)
               )
# Simulate YYYY-mm-ddTHH:MM:SS ISO time format
dta$DateTime <- sub( " ", "T", dta$DateTime )
Sys.setenv( TZ = "US/Pacific" ) # If you forget TZ, this is the likely default for SFBay Area
dta$Dtm <- as.POSIXct( dta$DateTime ) # New column
str( dta )
## 'data.frame':    24140 obs. of  4 variables:
##  $ LCLid   : chr  "MAC000002" "MAC000002" "MAC000002" "MAC000002" ...
##  $ DateTime: chr  "2012-10-12T00:30:00" "2012-10-12T01:00:00" "2012-10-12T01:30:00" "2012-10-12T02:00:00" ...
##  $ KWH     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Dtm     : POSIXct, format: "2012-10-12" "2012-10-12" ...

I prefer to make a new column to help troubleshoot the time conversion process.

Note there are ways to find out how to refer to timezones on Wikipedia2 and in R3.

A: Use Date or POSIXct for date/time values

The x-axis breaks are more sensible now:

plot( KWH ~ Dtm, data = dta, pch = "." )

Useful Date/Time Data Types in R

Item Date POSIXct POSIXlt
from chr as.Date( "2013-03-31" ) as.POSIXct( "2013-03-31 03:30:00" ) as.POSIXlt( "2013-03-31 03:30:00" )
to chr as.character( Dt ) as.character( Dtm ) as.character( Dtm )
implementation [1] 15795 [1] 1364700600
attr(,"tzone")
[1] ""
List of 9
$ sec : num 0
$ min : int 30
$ hour : int 3
$ mday : int 31
$ mon : int 2
$ year : int 113
$ wday : int 0
$ yday : int 89
$ isdst: int 0
- attr(*, "tzone")= chr "GMT"
units days since 1970-01-01 GMT sec since 1970-01-01 00:00:00 GMT separate units for each of 9 elements

Good Practices

Converting formatted strings to POSIXct:

Q: Only part of timestamp converted? (Table)

The converted Dtm column does not seem to have hours and minutes in it:

head(dta)
##       LCLid            DateTime KWH        Dtm
## 1 MAC000002 2012-10-12T00:30:00   0 2012-10-12
## 2 MAC000002 2012-10-12T01:00:00   0 2012-10-12
## 3 MAC000002 2012-10-12T01:30:00   0 2012-10-12
## 4 MAC000002 2012-10-12T02:00:00   0 2012-10-12
## 5 MAC000002 2012-10-12T02:30:00   0 2012-10-12
## 6 MAC000002 2012-10-12T03:00:00   0 2012-10-12

Q: Only part of timestamp converted? (Plot)

Just to confirm this problem, we can plot a few days of Dtm versus row position:

idx <- 1:(48*4) # about 4 days of 1/2 hour data 
qplot( idx, dta$Dtm[ idx ] )

R is not correctly guessing what format the data are stored in.

A?: Only part of timestamp converted? Force Format

Experiment: remove the choice to guess which format to use:

dta$Dtm <- as.POSIXct( dta$DateTime
                     , format = "%Y-%m-%dT%H:%M:%S"
                     )
which( is.na( dta$Dtm ) ) # this may not work on Linux/Mac
## integer(0)
dta[ 7105:7106, ]
##          LCLid            DateTime   KWH                 Dtm
## 7105 MAC000002 2013-03-10T02:00:00 0.245 2013-03-10 01:00:00
## 7106 MAC000002 2013-03-10T02:30:00 0.169 2013-03-10 01:30:00

These timestamps correspond to the 2013 “spring-forward” daylight savings time change… technically, they are invalid in this timezone. R chose the “date-only” "%Y-%m-%d" format that made no attempt to convert the time information because the “date-only” format suceeded for all values in the input.

A?: Only part of timestamp converted? Force Format (Tabular)

What does the time look like around the spring-forward records using the forced format?

dta[ 7103:7109, ]
##          LCLid            DateTime   KWH                 Dtm
## 7103 MAC000002 2013-03-10T01:00:00 0.247 2013-03-10 01:00:00
## 7104 MAC000002 2013-03-10T01:30:00 0.235 2013-03-10 01:30:00
## 7105 MAC000002 2013-03-10T02:00:00 0.245 2013-03-10 01:00:00
## 7106 MAC000002 2013-03-10T02:30:00 0.169 2013-03-10 01:30:00
## 7107 MAC000002 2013-03-10T03:00:00 0.104 2013-03-10 03:00:00
## 7108 MAC000002 2013-03-10T03:30:00 0.120 2013-03-10 03:30:00
## 7109 MAC000002 2013-03-10T04:00:00 0.089 2013-03-10 04:00:00

In US/Pacific timezone, 3am is one hour after 1am… what if we plot this?

A?: Only part of timestamp converted? Force Format (Plot)

idx <- 7101:7109
qplot( idx, dta$Dtm[ idx ] )

But if this is actually how the data were recorded, why the bad data?

A: Use the TZ Environment Variable

Solution is to tell R what the correct timezone is before converting from character. This data was from London, and it did not appear to have a “spring-forward” so it doesn’t have daylight savings:

Sys.setenv( TZ = "GMT" )
dta$Dtm <- as.POSIXct( dta$DateTime, format = "%Y-%m-%dT%H:%M:%S" )
idx <- 7101:7109
qplot( idx, dta$Dtm[ idx ] )

No NA values, time increments uniformly! Looking better!

A: Only part of timestamp converted? (Summary)

Q: Why does the data look shifted during summer?

Look at February and April 2013:

# find beginning of day for each record
dta$DBegin <- as.POSIXct( trunc( dta$Dtm, units="days" ) )
# find hours from midnight
dta$DTime <- as.numeric( dta$Dtm - dta$DBegin, units="hours" )
# find timestamp at beginning of month for each record
dta$MBegin <- as.POSIXct( paste0( substr( dta$DateTime, 1, 7 )
                                , "-01" ) )
dta2mo <- subset( dta
                , dta$MBegin
                   %in% as.POSIXct( c( "2013-02-01"
                                     , "2013-04-01" ) ) )

Q: Why does the data look shifted during summer? (Plot)

A: Because the people are following civil time

Even though the data are recorded in GMT, the people using the electricity react to the civil time clock. After converting from character, we can change TZ so the same timestamps will display differently when plotted or printed:

Sys.setenv( TZ = "Europe/London" )
# find beginning of day for each record
dta$DBegin <- as.POSIXct( trunc( dta$Dtm, units="days" ) )
# find hours from midnight
dta$DTime <- as.numeric( dta$Dtm - dta$DBegin, units="hours" )
# find timestamp at beginning of month for each record
dta$MBegin <- as.POSIXct( paste0( substr( dta$DateTime, 1, 7 )
                                , "-01" ) )
dta2mo <- subset( dta
                , dta$MBegin
                   %in% as.POSIXct( c( "2013-02-01"
                                     , "2013-04-01" ) ) )

A: Because the people are following civil time (Plot)

Q: How to handle multi-column Date/Times?

An old solar weather file for San Francisco Airport 4:

These files were recorded in “local standard time” (LST; no daylight savings). Could set TZ=“GMT” if desired, but here we illustrate what an LST data set looks like, and show how to convert multiple numeric columns into a timestamp using ISOdatetime:

# dta3 import not shown
str( dta3 )
## 'data.frame':    8760 obs. of  15 variables:
##  $ Yr     : int  90 90 90 90 90 90 90 90 90 90 ...
##  $ Mo     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Dy     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Hr     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ G_etdir: int  0 0 0 0 0 0 0 69 246 446 ...
##  $ G_etdni: int  0 0 0 0 0 0 0 731 1415 1415 ...
##  $ G_g    : int  0 0 0 0 0 0 0 7 64 87 ...
##  $ G_g_src: chr  " ?" " ?" " ?" " ?" ...
##  $ G_g_unc: int  0 0 0 0 0 0 0 5 5 4 ...
##  $ G_n    : int  0 0 0 0 0 0 0 0 54 13 ...
##  $ G_n_src: chr  " ?" " ?" " ?" " ?" ...
##  $ G_n_unc: int  0 0 0 0 0 0 0 4 4 4 ...
##  $ G_d    : int  0 0 0 0 0 0 0 7 55 83 ...
##  $ G_d_src: chr  " ?" " ?" " ?" " ?" ...
##  $ G_d_unc: int  0 0 0 0 0 0 0 5 5 5 ...

A: How to handle multi-column Date/Times? ISOdatetime

Sys.setenv( TZ = "Etc/GMT+8" ) # yes, that is plus for West of GMT
dta3$Dtm <- with( dta3, ISOdatetime( Yr + 1900, Mo, Dy, Hr, 0, 0 ) )

Q: Separate Dates and Times

Yet another old source of solar data 5:

# dta4 import not shown
str( dta4 )
## 'data.frame':    8760 obs. of  43 variables:
##  $ YYYY-MM-DD               : chr  "2010-01-01" "2010-01-01" "2010-01-01" "2010-01-01" ...
##  $ HH:MM (LST)              : chr  "1:00" "2:00" "3:00" "4:00" ...
##  $ Zenith (deg)             : num  99 99 99 99 99 99 99 87.4 80.1 71.7 ...
##  $ Azimuth (deg)            : num  -99 -99 -99 -99 -99 ...
##  $ ETR (Wh/m^2)             : int  0 0 0 0 0 0 0 36 243 443 ...
##  $ ETRN (Wh/m^2)            : int  0 0 0 0 0 0 0 790 1415 1415 ...
##  $ SUNY Glo (Wh/m^2)        : logi  NA NA NA NA NA NA ...
##  $ SUNY Glo Flg             : int  99 99 99 99 99 99 99 99 99 99 ...
##  $ SUNY Glo Unc (%)         : logi  NA NA NA NA NA NA ...
##  $ SUNY Dir (Wh/m^2)        : logi  NA NA NA NA NA NA ...
##  $ SUNY Dir Flg             : int  99 99 99 99 99 99 99 99 99 99 ...
##  $ SUNY Dir Unc (%)         : logi  NA NA NA NA NA NA ...
##  $ SUNY Dif (Wh/m^2)        : logi  NA NA NA NA NA NA ...
##  $ SUNY Dif Flg             : int  99 99 99 99 99 99 99 99 99 99 ...
##  $ SUNY Dif Unc (%)         : logi  NA NA NA NA NA NA ...
##  $ METSTAT Glo (Wh/m^2)     : int  0 0 0 0 0 0 0 5 42 183 ...
##  $ METSTAT Glo Unc (%)      : int  0 0 0 0 0 0 0 24 24 24 ...
##  $ METSTAT Dir (Wh/m^2)     : int  0 0 0 0 0 0 0 0 0 131 ...
##  $ METSTAT Dir Unc (%)      : int  0 0 0 0 0 0 0 27 27 27 ...
##  $ METSTAT Dif (Wh/m^2)     : int  0 0 0 0 0 0 0 5 42 142 ...
##  $ METSTAT Dif Unc (%)      : int  0 0 0 0 0 0 0 24 24 24 ...
##  $ METSTAT CSKY Glo (Wh/m^2): int  0 0 0 0 0 0 0 12 126 275 ...
##  $ METSTAT CSKY Glo Unc (%) : int  0 0 0 0 0 0 0 10 10 10 ...
##  $ METSTAT CSKY Dir (Wh/m^2): int  0 0 0 0 0 0 0 130 544 732 ...
##  $ METSTAT CSKY Dir Unc (%) : int  0 0 0 0 0 0 0 16 16 16 ...
##  $ METSTAT CSKY Dif (Wh/m^2): int  0 0 0 0 0 0 0 6 32 46 ...
##  $ METSTAT CSKY Dif Unc (%) : int  0 0 0 0 0 0 0 10 10 10 ...
##  $ Meas Glo (Wh/m^2)        : logi  NA NA NA NA NA NA ...
##  $ Meas Glo Flg             : int  99 99 99 99 99 99 99 99 99 99 ...
##  $ Meas Dir (Wh/m^2)        : logi  NA NA NA NA NA NA ...
##  $ Meas Dir Flg             : int  99 99 99 99 99 99 99 99 99 99 ...
##  $ Meas Dif (Wh/m^2)        : logi  NA NA NA NA NA NA ...
##  $ Meas Dif Flg             : int  99 99 99 99 99 99 99 99 99 99 ...
##  $ Precip Wat (cm)          : num  2.6 2.6 2.5 2.5 2.5 2.6 2.6 2.6 2.6 2.6 ...
##  $ Precip Wat Flg           : int  3 51 51 3 51 51 3 51 51 3 ...
##  $ AOD (unitless)           : num  0.108 0.108 0.108 0.108 0.108 0.108 0.108 0.108 0.108 0.108 ...
##  $ AOD Flg                  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ AOD RAN (unitless)       : num  0.108 0.108 0.108 0.108 0.108 0.108 0.108 0.052 0.052 0.052 ...
##  $ AOD RAN Flg              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Ozone (cm)               : num  0.303 0.303 0.303 0.303 0.302 0.302 0.302 0.302 0.302 0.302 ...
##  $ Ozone Flg                : int  2 51 51 51 51 51 51 51 51 51 ...
##  $ Albedo (unitless)        : num  0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 ...
##  $ Albedo Flg               : int  0 0 0 0 0 0 0 0 0 0 ...

A: Separate Dates and Times

The Date type is convenient if you don’t need to worry about time, but beware of comparing or calculating between Date and POSIXct:

Sys.setenv(TZ = "Etc/GMT+8" ) # local standard time only
dta4$Dt <- as.Date( dta4$`YYYY-MM-DD` )
dta4$Dtm <- as.POSIXct( paste( dta4$`YYYY-MM-DD`, dta4$`HH:MM (LST)` ) )
head( dta4[ , c( "YYYY-MM-DD", "HH:MM (LST)", "Dt", "Dtm" ) ] )
##   YYYY-MM-DD HH:MM (LST)         Dt                 Dtm
## 1 2010-01-01        1:00 2010-01-01 2010-01-01 01:00:00
## 2 2010-01-01        2:00 2010-01-01 2010-01-01 02:00:00
## 3 2010-01-01        3:00 2010-01-01 2010-01-01 03:00:00
## 4 2010-01-01        4:00 2010-01-01 2010-01-01 04:00:00
## 5 2010-01-01        5:00 2010-01-01 2010-01-01 05:00:00
## 6 2010-01-01        6:00 2010-01-01 2010-01-01 06:00:00
dta4$Dtm[ 1 ] - as.POSIXct( dta4$Dt[ 1 ] )
## Time difference of 9 hours

When a Date value is converted to POSIXct, it is always treated as if it was in GMT timezone! If you need a midnight-date that can be compared/subtracted with POSIXct intuitively then use as.POSIXct( trunc( Dtm ) ). Also avoid mixing variables created with ISOdate and ISOdatetime for the same reason.

Conclusion


  1. https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households

  2. https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

  3. ?OlsonNames

  4. NSRDB 1961-1990: Hourly Data Files, https://rredc.nrel.gov/solar/old_data/nsrdb/1961-1990/

  5. National Solar Radiation Data Base 1991- 2010 Update, https://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/