Manipulating date times with rap and map

The time on the clock

Yesterday Ollie, a student here at the Master in Applied Data Science, had a very good question about local times.

The very short answer is you don’t simply mix local times in R. Not sure about the political message of this 😼.

Ollie is working on a dataframe with a lot of datetimes, all expressed in UTC. The datetimes are from different places around the world. Given the nature of the problem he is solving (something to do with bee keeping!), he wants to know what time it was on the wall clock in those places.

There’s a direct solution — thanks to lubridate — but the path we take to solve the task offers an interesting view on datetime vectors (namely, POSIXct vectors). And let me play with purrr and rap.

The roads goes ever on

Let’s walk. We don’t even get to the door without our walking sticks.

library(tidyverse) # for data wrangling
library(rap) # an alternative to purrr
library(vctrs) # we'll see way later
library(lubridate) # for dates and stuff

The road goes ever on, and it is a time of lies and intrigues: we build a test dataset based on the lakers dataset from lubridate.

lakers contains two columns, date and time, and we need to unite() them to obtain a proper datetime column. We pretend all those datetimes are UTC time.

example_df <- lakers %>%
  tbl_df() %>%
  unite(datetime, date, time, sep = " ") %>%
  mutate(utc_time = datetime %>% ymd_hm(tz = "UTC"),
         timez = sample(OlsonNames(),
                        size = nrow(.),
                        replace = TRUE)) %>%
  select(utc_time, timez)

The result is a suitable narrow tibble with a UTC time and a (random) timezone.

example_df
#> # A tibble: 34,624 x 2
#>    utc_time            timez
#>    <dttm>              <chr>
#>  1 2008-10-28 12:00:00 America/Moncton
#>  2 2008-10-28 11:39:00 America/Nipigon
#>  3 2008-10-28 11:37:00 Antarctica/Syowa
#>  4 2008-10-28 11:25:00 MET
#>  5 2008-10-28 11:23:00 Brazil/DeNoronha
#>  6 2008-10-28 11:22:00 Pacific/Wallis
#>  7 2008-10-28 11:22:00 Etc/Greenwich
#>  8 2008-10-28 11:22:00 US/Eastern
#>  9 2008-10-28 11:00:00 Factory
#> 10 2008-10-28 10:53:00 Etc/GMT-5
#> # … with 34,614 more rows

what-the-heck o’clocl

Looking into its tools bag, Ollie found with_tz(): the lubridate function to convert datetime from one time zone to another. That looks exactly like what we need, doesn’t it? So, let’s try to use it!

😠 take 1:

Let’s see if we can just use it like it is. We use mutate to create the new variable, and cross our fingers.

example_df %>%
  mutate(local_time = with_tz(time = utc_time, tzone =  timez))

#> Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value

What the banana! Invalid time zone?? That can’t be! We used the time zones straight out of OlsonNames(), so there’s no way they can be wrong. So what? mmm, maybe it’s not vectorized?

😠 take 2:

Let’s try to use purrr::map()— or a suitable version — to work on every row and get what we need.

example_df %>%
  mutate(local_time = map2(utc_time,timez,with_tz))
#> # A tibble: 34,624 x 3
#>    utc_time            timez               local_time
#>    <dttm>              <chr>               <list>
#>  1 2008-10-28 12:00:00 America/El_Salvador <dttm [1]>
#>  2 2008-10-28 11:39:00 Asia/Beirut         <dttm [1]>
#>  3 2008-10-28 11:37:00 Greenwich           <dttm [1]>
#>  4 2008-10-28 11:25:00 Pacific/Ponape      <dttm [1]>
#>  5 2008-10-28 11:23:00 Libya               <dttm [1]>
#>  6 2008-10-28 11:22:00 Asia/Ulan_Bator     <dttm [1]>
#>  7 2008-10-28 11:22:00 GMT+0               <dttm [1]>
#>  8 2008-10-28 11:22:00 America/Belize      <dttm [1]>
#>  9 2008-10-28 11:00:00 Asia/Gaza           <dttm [1]>
#> 10 2008-10-28 10:53:00 Indian/Mayotte      <dttm [1]>
#> # … with 34,614 more rows

Looks promising, but we got back a list instead of just a datetime — and it’s not purrr error: we did not specify any output format, so it probably just throw everything in a big list. But this are all datetimes. Alas!, there’s not map_...() function specific to datetimes.

No biggie, we can try to unnlist()

example_df %>%
  mutate(local_time = map2(utc_time,timez,with_tz)) %>%
  unnest()
#> # A tibble: 34,624 x 3
#>    utc_time            timez               local_time
#>    <dttm>              <chr>               <dttm>
#>  1 2008-10-28 12:00:00 America/El_Salvador 2008-10-28 12:00:00
#>  2 2008-10-28 11:39:00 Asia/Beirut         2008-10-28 11:39:00
#>  3 2008-10-28 11:37:00 Greenwich           2008-10-28 11:37:00
#>  4 2008-10-28 11:25:00 Pacific/Ponape      2008-10-28 11:25:00
#>  5 2008-10-28 11:23:00 Libya               2008-10-28 11:23:00
#>  6 2008-10-28 11:22:00 Asia/Ulan_Bator     2008-10-28 11:22:00
#>  7 2008-10-28 11:22:00 GMT+0               2008-10-28 11:22:00
#>  8 2008-10-28 11:22:00 America/Belize      2008-10-28 11:22:00
#>  9 2008-10-28 11:00:00 Asia/Gaza           2008-10-28 11:00:00
#> 10 2008-10-28 10:53:00 Indian/Mayotte      2008-10-28 10:53:00
#> # … with 34,614 more rows

What the kerfuffle! That’s just the UTC time back again. That’s NOT what we want. Silly purrr.

😠 take 3:

Let’s try something different, something new and wonderful! Romain Francois just twitted about rap, a way of doing mutate + map in one call, and you can specify the format of the outputs using vctrs. vctrs has the appropriate constructor for the kind of ouput we want:

Using the notation of rap (here’s there’s a nice introduction) we can do this:

example_df %>%
  rap(local_time = new_datetime()  ~ with_tz(utc_time, timez))
#> # A tibble: 34,624 x 3
#>    utc_time            timez                      local_time
#>    <dttm>              <chr>                      <dttm>
#>  1 2008-10-28 12:00:00 Brazil/DeNoronha           2008-10-29 01:00:00
#>  2 2008-10-28 11:39:00 America/Lima               2008-10-29 00:39:00
#>  3 2008-10-28 11:37:00 Africa/Ouagadougou         2008-10-29 00:37:00
#>  4 2008-10-28 11:25:00 Pacific/Kosrae             2008-10-29 00:25:00
#>  5 2008-10-28 11:23:00 Australia/Melbourne        2008-10-29 00:23:00
#>  6 2008-10-28 11:22:00 America/Porto_Velho        2008-10-29 00:22:00
#>  7 2008-10-28 11:22:00 America/Argentina/Tucuman  2008-10-29 00:22:00
#>  8 2008-10-28 11:22:00 Antarctica/South_Pole      2008-10-29 00:22:00
#>  9 2008-10-28 11:00:00 America/Argentina/La_Rioja 2008-10-29 00:00:00
#> 10 2008-10-28 10:53:00 Etc/GMT-10                 2008-10-28 23:53:00
#> # … with 34,614 more rows

Oh, that’s not UTC time anymore.

The left-handside of the formula (the stuff at the left of ~) is the kind of output we want. The right-handside of the formula (the stuff at the right of ~) is something very similar to map(), but we can use directly the variable names and not those pesky purrr names (.x, .y, ..1, ..2, …: purrr uses this shorthand names to define its lambda functions arguments).

But! Wait! Take a second look! It’s not UTC time, but it’s not even the local time: the difference between utc_time and time is always the same, no matter what timez says. That’s not good 💔.

Got the number 13 tatooed on my neck

The issue here is with the POSIXct, and thus datetimes, encoding. Datetimes are stored as vectors and the timezone is an attribute of that vector. That means that we can’t mix datetimes with different timezones: we either get an error, or they are promoted to same timezone. So, no luck here.

The issue was not purrr nor rap, but the way datetimes are treated. A compromise solution was to accept a list column back: that is, skipping the unnest() step or not specifying anything in the left-handside of the formula for rap() (rap(local_time = ~ with_tz(utc_time, timez))). We don’t get to see the datetime printed nicely to screen, but we can still operate on those (more or less comfortably).

The happy encoding

Don’t shed no tear: this story has an happy ending.

Oh, you were here for the short story and the direct solution? Well, in lubridate there exists a function, local_time(), which returns the time on the clock at a specified time zone. And it turns out that local_time is vectorized, so you don’t need to use map() nor rap():

example_df %>%
  mutate(local_time = local_time(utc_time, timez, units = "hours"))

#> # A tibble: 34,624 x 3
#>    utc_time            timez                 local_time
#>    <dttm>              <chr>                 <time>
#>  1 2008-10-28 12:00:00 Atlantic/St_Helena    12.000000 hours
#>  2 2008-10-28 11:39:00 America/Grand_Turk     7.650000 hours
#>  3 2008-10-28 11:37:00 Asia/Kolkata          17.116667 hours
#>  4 2008-10-28 11:25:00 Pacific/Efate         22.416667 hours
#>  5 2008-10-28 11:23:00 Europe/Stockholm      12.383333 hours
#>  6 2008-10-28 11:22:00 Antarctica/Vostok     17.366667 hours
#>  7 2008-10-28 11:22:00 America/Lower_Princes  7.366667 hours
#>  8 2008-10-28 11:22:00 Australia/West        20.366667 hours
#>  9 2008-10-28 11:00:00 America/Edmonton       5.000000 hours
#> 10 2008-10-28 10:53:00 Pacific/Port_Moresby  20.883333 hours
#> # … with 34,614 more rows

Here you are Ollie! 😄


gvdr rambles · gvdr teaches · gvdr codes · who is gvdr? · home