# Working with Dates
```{r echo=FALSE}
source("libs/Common.R")
```
```{r echo = FALSE}
pkg_ver(c("lubridate", "stringr"))
```
Date values can be stored in tables as numbers or characters. But for R to interpret them as dates, they must be converted to a `Date` object class or a `POSIXct`/`POSIXt` object class. R provides many facilities to convert and manipulate dates and times, but a package called `lubridate` makes working with dates/times much easier.
## Creating date/time objects
### From complete date strings
You can convert many representations of date and time to date objects. For example, let's create a vector of dates represented as month/day/year character strings:
```{r}
x <- c("06/23/2013", "06/30/2013", "07/12/2014")
class(x)
```
At this point, R treats the vector `x` as characters. To force R to interpret these as dates, use `lubridate`'s `mdy` function. `mdy` will convert date strings where the date elements are ordered as month, day and year.
```{r}
library(lubridate)
x.date <- mdy(x)
class(x.date)
```
Note that using the `mode` or `typeof` functions will not help us determine if the object is an R *date* object. This is because a date is stored as a `numeric` (double) internally. Use the `class` function instead as shown in the above code chunk.
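As a quick check of the point above, the following sketch compares what `typeof`, `mode` and `class` report for a date. It builds the date with base R's `as.Date()` only to keep the example self-contained; the variable `d` is illustrative.

```{r}
# A single date built with base R (illustrative value)
d <- as.Date("2013-06-23")

typeof(d)    # storage type of the underlying value
mode(d)      # storage mode of the underlying value
class(d)     # the class is what identifies it as a date

# The underlying number is the count of days since 1970-01-01
unclass(d)

# inherits() gives a TRUE/FALSE test for the class
inherits(d, "Date")
```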
If you need to specify the time zone, add the parameter `tz=` (the result is then a `POSIXct` date-time object rather than a `Date` object). For example, to specify *Eastern Standard Time*, type:
```{r}
x.date <- mdy(x, tz="EST")
x.date
```
The `mdy` function can read in date formats that use different delimiters so that `mdy("06/23/2013")`, `mdy("06-23-2013")` and `mdy("06.23.2013")` are parsed exactly the same so long as the order remains month/day/year.
For different month/day/year arrangements, other `lubridate` functions need to be used:
Functions | Date Format
-----------|--------------
`dmy()` | day/month/year
`ymd()` | year/month/day
`ydm()` | year/day/month
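The sketch below applies each of the four functions to the same calendar date written in a different element order, with a different delimiter each time. The variable names are illustrative.

```{r}
library(lubridate)

# One calendar date, 23 June 2013, written in four different orders
d_mdy <- mdy("06/23/2013")
d_dmy <- dmy("23-06-2013")
d_ymd <- ymd("2013.06.23")
d_ydm <- ydm("2013 23 06")

# All four parse to the same Date value
c(d_mdy, d_dmy, d_ymd, d_ydm)
```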
If your data contains both date *and* time in a "month/day/year hour:minutes:seconds" format use the `mdy_hms` function.
```{r}
x <- c("06/23/2013 03:45:23", "07/30/2013 14:23:00", "08/12/2014 18:01:59")
x.date <- mdy_hms(x, tz="EST")
x.date
```
The characters `_h`, `_hm` or `_hms` can be appended to any of the four date function names described earlier to accommodate time formats. A few examples follow:
```{r}
mdy_h("6/23/2013 3", tz="EST")
dmy_hm("23/6/2013 3:15", tz="EST")
ymd_hms("2013/06/23 3:15:7", tz="EST")
```
Note that adding a time element to the date object will create `POSIXct` and `POSIXt` object classes instead of `Date` object classes.
```{r}
class(x.date)
```
Also, if a timezone is not explicitly defined for a time-based date, the function assigns `UTC` (Coordinated Universal Time).
```{r}
dmy_hm("23/6/2013 3:15")
```
### Setting and modifying timezones
R does not maintain its own list of timezone names. Instead, it relies on the operating system's naming convention. To list the supported timezone names for your particular R environment, type:
```{r eval=FALSE}
OlsonNames()
```
For example, to use US Eastern time with automatic Daylight Saving Time adjustments, set `tz = "EST5EDT"`.
```{r}
x.date <- mdy_hms(x, tz="EST5EDT")
x.date
```
```{r}
class(x.date)
```
If you need to convert the date/time to another timezone, use `lubridate`'s `with_tz()` function. For example, to convert `x.date` from its current `EST5EDT` timezone to the `US/Alaska` time zone, type:
```{r}
with_tz(x.date, tzone = "US/Alaska")
```
Note that the `with_tz` function will change the timestamp to reflect the new time zone. If you simply want to change the time zone definition and leave the timestamp as is, assign the new zone with the `tz()` function.
```{r}
tz(x.date) <- "US/Alaska"
x.date
```
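The difference between the two approaches is easiest to see by comparing the underlying instants. The sketch below uses `lubridate`'s `force_tz()`, which is not used in the text above and is assumed here to do the same thing as the `tz()` assignment. The zone names `America/New_York` and `America/Anchorage` are also assumptions and must appear in your `OlsonNames()` list.

```{r}
library(lubridate)

# Illustrative timestamp; the zone names are assumed to be available on your system
t1 <- ymd_hms("2013-06-23 03:45:23", tz = "America/New_York")

# Same instant, displayed on an Alaskan clock
t_instant <- with_tz(t1, tzone = "America/Anchorage")

# Same clock reading, re-labelled as Alaskan time (a different instant)
t_clock <- force_tz(t1, tzone = "America/Anchorage")

t_instant == t1                          # is the instant unchanged?
difftime(t_clock, t1, units = "hours")   # how far has the instant shifted?
```

If the two zones are set up as expected, the first comparison shows that `with_tz()` preserves the moment in time, while the second shows that re-labelling the zone moves it.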
### From separate date elements
If your data table splits the date elements into separate vector objects or columns, use the `paste` function to combine the elements into a single date string before passing it to one of the `lubridate` functions. Let's look at an example:
```{r}
dat1 <- read.csv("http://mgimond.github.io/ES218/Data/CO2.csv")
head(dat1)
```
The CO2 dataset has the date split across two columns: `Year` and `Month` (both stored as integers). You can combine the columns into a character string using the `paste` function. For example, if we want to create a "Year-Month" string as in `1959-10`, we could type:
```{r eval=FALSE}
paste(dat1$Year,dat1$Month, sep="-")
```
The above example uses three arguments: the two objects that are pasted together (i.e. `Year` and `Month`) and the `sep="-"` parameter which fills the gap between both objects with a dash `-` (by default, `paste` would have added spaces thus creating strings in the form of `1959 10`).
Older versions of `lubridate` did not have a function along the lines of `ym` to convert just the year-month strings (more recent releases do include `ym()`). An approach that works regardless of the version is to add an artificial day of the month to the string. We'll choose to add the 15^th^ day of the month as in:
```{r eval=FALSE}
paste(dat1$Year, dat1$Month, "15", sep="-")
```
We are now ready to add a new column called `Date` to the `dat1` object and fill that column with a *real* date object:
```{r}
dat1$Date <- ymd( paste(dat1$Year, dat1$Month, "15", sep="-") )
head(dat1)
```
The `sep="-"` option is not needed with the `lubridate` function since `lubridate` will recognize a space as a valid delimiter, so the last piece of code could have been written as:
```{r}
dat1$Date <- ymd( paste(dat1$Year, dat1$Month, "15") )
```
To confirm that the `Date` column is indeed stored as a date object, type:
```{r}
str(dat1)
```
or you could type:
```{r}
class(dat1$Date)
```
Since we did not add a timezone or a time component to the date object, the `Date` column was assigned a `Date` class as opposed to the `POSIX...` class.
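The paste-then-parse steps above can be tried without downloading the CO2 file. The sketch below uses a small hypothetical data frame `df` with the same `Year` and `Month` layout. It also shows `lubridate`'s `make_date()`, which is not covered in the text above, as one possible way to build the date from its numeric parts without creating an intermediate string.

```{r}
library(lubridate)

# Hypothetical table with the date split across two integer columns
df <- data.frame(Year = c(1959L, 1959L, 1960L), Month = c(10L, 11L, 1L))

# Approach from the text: paste into a string, then parse
df$Date <- ymd(paste(df$Year, df$Month, "15", sep = "-"))

# Alternative: build the date directly from its numeric parts
df$Date2 <- make_date(year = df$Year, month = df$Month, day = 15L)

class(df$Date)
all(df$Date == df$Date2)
```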
### Padding time values
The `lubridate` functions may expect the time values to consist of a specific number of characters if a delimiter such as `:` is not present to split the time elements. For example, the following will not generate a valid date/time object:
```{r}
hrmin <- 712 # Time 7:12
date <- "2018/03/17" # Date
ymd_hm(paste(date, hrmin))
```
One solution is to *pad* the time element with 0's to complete a four-character string (or a six-character string if seconds are part of the time element). We can use the `str_pad` function from the `stringr` package to pad the time object (the `stringr` package is covered [in chapter 6](strings.html)).
```{r}
library(stringr)
ymd_hm(paste(date, str_pad(hrmin, width=4, pad="0")))
```
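If you would rather not load `stringr` for this step, base R's `sprintf()` can do the same zero-padding. The following is a minimal sketch that assumes the time values are whole numbers; the vector of times is illustrative.

```{r}
library(lubridate)

hrmin <- c(712, 45, 1530)   # illustrative times: 7:12, 0:45 and 15:30
date  <- "2018/03/17"

# "%04d" left-pads each integer with zeros to a width of four characters
padded <- sprintf("%04d", as.integer(hrmin))
padded

dt <- ymd_hm(paste(date, padded))
dt
```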
## Extracting date information
If you want to extract the day of the week from a date vector, use the `wday` function.
```{r}
wday(x.date)
```
If you want the day of the week displayed as its three letter designation, add the `label=TRUE` parameter.
```{r}
wday(x.date, label=TRUE)
```
You'll note that the function returns a `factor` object with seven levels--one for each day of the week (`r levels(lubridate::wday(x.date, label = TRUE))`)--as well as the level hierarchy which will dictate the order in which values will be displayed if grouped by this factor. The levels are not necessarily reflected in the vector elements (only `r unique(lubridate::wday(x.date, label=TRUE))` are present), but the levels are there if we were ever to add additional day elements to this vector.
The following table lists `lubridate` functions that extract different elements of a date object.
Functions | Extracted element
-----------------|--------------
`hour()` | Hour of the day
`minute()` | Minute of the hour
`day()` | Day of the month
`yday()` | Day of the year
`decimal_date()` | Decimal year
`month()` | Month of the year
`year()` | Year
`tz()` | Time zone
The `month()` and `wday()` functions have a `label` option that outputs the values as text instead of numbers. For example:
```{r}
month(x.date, label=TRUE)
```
Note that the names are stored as an ordered factor. This may prove useful in that the names will be sorted in chronological order, following the factor's levels.
To get the full name, set the abbreviation parameter, `abbr`, to `FALSE`.
```{r}
month(x.date, label=TRUE, abbr = FALSE)
```
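The benefit of the factor levels shows up when sorting. This sketch uses a few illustrative dates that are deliberately out of order and compares sorting the labelled months with sorting their plain character equivalents.

```{r}
library(lubridate)

# Illustrative dates, deliberately out of chronological order
d <- ymd(c("2014-08-12", "2013-06-23", "2013-07-30"))
m <- month(d, label = TRUE)

is.ordered(m)            # are the month labels an ordered factor?
sort(m)                  # sorted by factor level, i.e. calendar order
sort(as.character(m))    # plain strings sort alphabetically instead
```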
## Operating on dates
You can apply certain operations to dates as you would to numbers. For example, to list the number of days between the first and third elements of the vector `x.date` type the following:
```{r}
(x.date[3] - x.date[1]) / ddays()
```
To get the number of weeks between both dates:
```{r}
(x.date[3] - x.date[1]) / dweeks()
```
Likewise, you can get the number of minutes between dates by dividing by `dminutes()` and the number of years by dividing by `dyears()`.
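As an alternative sketch of the same idea, the example below computes elapsed time between two illustrative dates in two ways: with base R's `difftime()`, which takes the unit as an argument, and by dividing a `lubridate` `interval()` by a duration. Neither `difftime()` nor `interval()` is used elsewhere in this chapter.

```{r}
library(lubridate)

# Illustrative start and end dates
d1 <- ymd("2013-06-23")
d2 <- ymd("2014-08-12")

# Base R: ask difftime() for the unit directly
n_days <- as.numeric(difftime(d2, d1, units = "days"))
n_days

# lubridate: divide the interval by a duration of the desired length
n_weeks <- interval(d1, d2) / dweeks(1)
n_weeks
```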
You can also apply Boolean operations on dates. For example, to find which date elements in `x.date` fall strictly between the 11^th^ and 24^th^ day of their month (i.e. days 12 through 23), type:
```{r}
(mday(x.date) > 11) & (mday(x.date) < 24)
```
If you want the command to return just the dates that satisfy this query, pass the Boolean operation as an index to the `x.date` vector:
```{r}
x.date[ (mday(x.date) > 11) & (mday(x.date) < 24) ]
```
## Formatting date objects
You can create a character vector from a date object. This is useful if you want to annotate plots with dates or include date values in reports. For example, to convert the date object `x.date` to a "<Month name> Year" character format, type the following:
```{r}
format(x.date, "%B %Y")
```
The `format` function accepts many different date/time format codes listed in the following table (note the cases!).
Format codes  | Description              | Example
--------------|--------------------------|------------------------------------------
`%a` | Abbreviated weekday name | `r format(x.date, "%a")`
`%A` | Full weekday name | `r format(x.date, "%A")`
`%m` | Month as decimal number | `r format(x.date, "%m")`
`%b` | Abbreviated month name | `r format(x.date, "%b")`
`%B` | Full month name | `r format(x.date, "%B")`
`%c` | Date and time, locale-specific | `r format(x.date, "%c")`
`%d` | Day of the month as decimal number | `r format(x.date, "%d")`
`%H` | Hours as decimal number (00 to 23) | `r format(x.date, "%H")`
`%I` | Hours as decimal number (01 to 12) | `r format(x.date, "%I")`
`%p` | AM/PM indicator in the locale | `r format(x.date, "%p")`
`%j` | Day of year as decimal number | `r format(x.date, "%j")`
`%M` | Minute as decimal number (00 to 59) | `r format(x.date, "%M")`
`%S` | Second as decimal number | `r format(x.date, "%S")`
`%U` | Week of the year starting on the first Sunday | `r format(x.date, "%U")`
`%W` | Week of the year starting on the first Monday | `r format(x.date, "%W")`
`%w` | Weekday as decimal number (Sunday = 0) | `r format(x.date, "%w")`
`%x` | Date (locale-specific) | `r format(x.date, "%x")`
`%X` | Time (locale-specific) | `r format(x.date, "%X")`
`%Y` | 4-digit year | `r format(x.date, "%Y")`
`%y` | 2-digit year | `r format(x.date, "%y")`
`%Z` | Time zone abbreviation | `r format(x.date, "%Z")`
`%z` | Signed offset from UTC in hours and minutes | `r format(x.date, "%z")`
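Format codes can be combined freely within a single format string. The sketch below builds an illustrative date-time with base R's `as.POSIXct()` (to stay self-contained) and combines a few of the codes from the table, including a label of the kind you might place in a plot title. The wording of the label is hypothetical.

```{r}
# Illustrative date-time, built with base R to keep the example self-contained
dt <- as.POSIXct("2013-06-23 15:45:23", tz = "UTC")

format(dt, "%Y-%m-%d")      # ISO-style date
format(dt, "%H:%M")         # 24-hour clock
format(dt, "%I:%M %p")      # 12-hour clock with the locale's AM/PM indicator
format(dt, "%j")            # day of the year

# Paste a formatted date into a label, e.g. for a plot title
label <- paste("Sampled on", format(dt, "%d/%m/%Y"))
label
```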