-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
period.apply on xts object takes about 50x longer than running it on it's core data and recasting. #278
Comments
n <- 1e6L
ep <- c(0L, seq(1L, n, 3L))
x <- .xts(runif(n, 1, n), 1:n)
system.time(test1 <- xts:::period_apply(x, ep, sum))
# user system elapsed
# 2.104 0.044 2.149
system.time(test2 <- period.apply(coredata(x), ep, sum))
# user system elapsed
# 0.705 0.019 0.725
identical(coredata(test1), test2)
# [1] TRUE I may be able to use some ideas from |
Cool. Thx Josh!
Chris
Sent from my iPhone
… On Nov 3, 2018, at 07:12, Joshua Ulrich ***@***.***> wrote:
period.apply() is essentially a convenience wrapper around sapply(), so it's not particularly performant. There's a faster unexported C-based version that needs some testing before I'm comfortable exporting it and/or replacing period.apply() with it (see 7c29359).
n <- 1e6L
ep <- c(0L, seq(1L, n, 3L))
x <- .xts(runif(n, 1, n), 1:n)
system.time(test1 <- xts:::period_apply(x, ep, sum))
# user system elapsed
# 2.104 0.044 2.149
system.time(test2 <- period.apply(coredata(x), ep, sum))
# user system elapsed
# 0.705 0.019 0.725
identical(coredata(test1), test2)
# [1] TRUE
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I've been able to go faster using data.table and converting the results back to an xts:
|
I appreciate your comment and desire to make xts even faster. However, your code is over-optimized to this specific example. Here are the issues that are immediately clear to me:
I'm also very unlikely to change the existing I also just checked my suspicion that data.table optimizes the call to |
I see show improvement for other functions as well. It seems to be faster in any instance where FUN operates on an argument x that is the coredata of each interval. It could be useful as an alternative implementation of period.sum, period.min/max. I also included an example comparing to period.sum, which does not allow x to have more than one column. This example would not work if the user needed the x argument to FUN to be an xts object, with an index attribute. It is not documented in period.apply if FUN can assume it gets an xts object, or even that x is an xts. You have an example using letters where x is not an xts object . > library(xts)
> n <- 1e6L
> x <- .xts(runif(n, 1, n), 1:n)
> e <- endpoints(x, "seconds", 5)
> # not exported
> period_apply <- xts:::period_apply
> # period.apply using datatable
> period.apply.DT <- function(x, INDEX, FUN, ...)
+ {
+ dt <- data.table::as.data.table(x)
+ dt$period <- as.numeric(dt$index[INDEX][findInterval(dt$index, dt$index[INDEX], left.open = T)+1])
+ dt <- as.matrix(dt[, lapply(.SD, FUN, ...), by='period', .SDcols = !'index'])
+ setNames(xts(dt[,-1], structure(dt[,1], class = c("POSIXct", "POSIXt"), tz='')), colnames(x))
+ }
> # period.sum using datatable
> period.sum.DT <- function(x, INDEX, na.rm=F)
+ {
+ dt <- data.table::as.data.table(x)
+ dt$period <- as.numeric(dt$index[INDEX][findInterval(dt$index, dt$index[INDEX], left.open = T)+1])
+ dt <- as.matrix(dt[, lapply(.SD, sum, na.rm=na.rm), by='period', .SDcols = !'index'])
+ setNames(xts(dt[,-1], structure(dt[,1], class = c("POSIXct", "POSIXt"), tz='')), colnames(x))
+ }
> #
> my_check <- function(values) { all(sapply(values[-1], function(x) all.equal(values[[1]], x))) }
> # period.apply with FUN=sum
> microbenchmark::microbenchmark(period.apply(x, e, sum),
+ period_apply(x, e, sum),
+ period.apply.DT(x, e, sum),
+ times=3, check=my_check, unit = 's')
Unit: seconds
expr min lq mean median uq max neval
period.apply(x, e, sum) 4.2666698 4.2768328 4.2851441 4.2869959 4.2943812 4.3017665 3
period_apply(x, e, sum) 0.7217397 0.7342956 0.7661842 0.7468515 0.7884065 0.8299614 3
period.apply.DT(x, e, sum) 0.1849304 0.1883792 0.1928879 0.1918279 0.1968667 0.2019055 3
> # period.sum
> microbenchmark::microbenchmark(period.sum(x, e),
+ period.sum.DT(x, e),
+ times=3, check=my_check, unit = 's')
Unit: seconds
expr min lq mean median uq max neval
period.sum(x, e) 4.2866287 4.4034498 4.4782497 4.520271 4.5740602 4.6278495 3
period.sum.DT(x, e) 0.1248247 0.1333578 0.1644498 0.141891 0.1842624 0.2266339 3
> # period.apply with FUN=min
> microbenchmark::microbenchmark(period.apply(x, e, min),
+ period_apply(x, e, min),
+ period.apply.DT(x, e, min),
+ times=3, check=my_check, unit = 's')
Unit: seconds
expr min lq mean median uq max neval
period.apply(x, e, min) 4.2764956 4.3273872 4.3512772 4.3782788 4.3886679 4.3990571 3
period_apply(x, e, min) 0.5536805 0.5877314 0.6010733 0.6217822 0.6247697 0.6277571 3
period.apply.DT(x, e, min) 0.1632318 0.1833229 0.1949875 0.2034139 0.2108654 0.2183168 3
> # period.apply with arbitrary function
> microbenchmark::microbenchmark(period.apply(x, e, function(x) { sqrt(prod(x)) + min(x) }),
+ period_apply(x, e, function(x) { sqrt(prod(x)) + min(x) }),
+ period.apply.DT(x, e, function(x) { sqrt(prod(x)) + min(x) }),
+ times=3, check=my_check, unit = 's')
Unit: seconds
expr min lq mean median uq max neval
period.apply(x, e, function(x) { sqrt(prod(x)) + min(x) }) 4.5865064 4.6597227 4.7162952 4.7329391 4.7811896 4.8294402 3
period_apply(x, e, function(x) { sqrt(prod(x)) + min(x) }) 1.1638887 1.1998162 1.2164511 1.2357438 1.2427324 1.2497210 3
period.apply.DT(x, e, function(x) { sqrt(prod(x)) + min(x) }) 0.2706527 0.2719362 0.2981492 0.2732197 0.3118975 0.3505753 3 |
Thanks for the notes about > microbenchmark::microbenchmark(period.sum(x, e),
+ period.sum.DT(x, e),
+ times=3, check=my_check)
Unit: milliseconds
expr min lq mean median uq max neval
period.sum(x, e) 11.41976 11.47986 17.24279 11.53996 20.15431 28.76866 3
period.sum.DT(x, e) 59.08953 61.27357 64.23368 63.45761 66.80575 70.15389 3
You're correct that it isn't documented that I'm very surprised at the data.table performance for the arbitrary function. It is faster than simply calling the function 200,000 times (the length of endpoints). I don't feel good about that. d <- x[1:4]
f <- function(x) { sqrt(prod(x)) + min(x) }
system.time(for(i in seq_len(length(e))) f(d))
# user system elapsed
# 0.576 0.000 0.577
system.time(period.apply.DT(x, e, f))
# user system elapsed
# 0.34 0.00 0.34 |
Will the new version have that as.matrix call removed?
…On Wed, Dec 19, 2018 at 7:22 AM Joshua Ulrich ***@***.***> wrote:
Thanks for the notes about period.sum() and others. That slowdown was due
to an unnecessary as.matrix() call. The timings are more in xts' favor
once those are removed:
> microbenchmark::microbenchmark(period.sum(x, e),+ period.sum.DT(x, e),+ times=3, check=my_check)Unit: milliseconds
expr min lq mean median uq max neval
period.sum(x, e) 11.41976 11.47986 17.24279 11.53996 20.15431 28.76866 3
period.sum.DT(x, e) 59.08953 61.27357 64.23368 63.45761 66.80575 70.15389 3
This example would not work if the user needed the x argument to FUN to be
an xts object, with an index attribute. It is not documented in
period.apply if FUN can assume it gets an xts object, or even that x is an
xts. You have an example using letters where x is not an xts object.
You're correct that it isn't *documented* that FUN may expect x to be an
xts object. But period.apply() calls try.xts() which *attempts* to
convert x to xts, if possible. Practically, it is very likely that FUN
expects an object similar to the object passed to period.apply().
Unfortunately, in your faster function that is only the case when the
object passed to period.apply() is a data.table.
------------------------------
I'm very surprised at the data.table performance for the arbitrary
function. It is faster than simply calling the function 200,000 times (the
length of endpoints). I don't feel good about that.
f <- function(x) { sqrt(prod(x)) + min(x) }R> system.time(for(i in seq_len(length(e))) f(d))
user system elapsed
0.576 0.000 0.577 R> system.time(period.apply.DT(x, e, f))
user system elapsed
0.34 0.00 0.34
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#278 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHKwqvDDTcdtL9s7Gw6sfAT_6mn8ZMYFks5u6j2PgaJpZM4YMQ5d>
.
|
The period.XXX() functions (sum, prod, min, max) only work on univariate series, so the as.matrix() call is not necessary. It makes the functions considerably slower, because it converts the xts index into row names for the matrix. See #278.
I can not repeat this test. What is 'd?' system.time(for(i in seq_len(length(e))) f(d)) |
@AndreMikulec d <- x[1:4]
system.time(for(i in seq_len(length(e))) f(d))
# user system elapsed
# 0.543 0.000 0.544
system.time(period.apply.DT(x, e, f))
# user system elapsed
# 0.31 0.00 0.31 |
I do not know.I am wild guessing. From Contents
From Contents
Benchmarks here are interesting. |
Description
on large xts objects +1MM lines, period.apply is much faster using core data and recasting
Expected behavior
about equal runtimes with and without a numeric index
Minimal, reproducible example
Session Info
The text was updated successfully, but these errors were encountered: