Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some speedup #168

Merged
merged 4 commits into from
Jul 14, 2024
Merged

Some speedup #168

merged 4 commits into from
Jul 14, 2024

Conversation

etiennebacher
Copy link
Contributor

@etiennebacher etiennebacher commented Sep 27, 2023

Hi @gergness @bschneidr, following #164, I tried to explore how the performance of srvyr can be improved. Indeed, most of the time is spent on survey functions, but a few dplyr functions (mostly select()) could be replaced with their base R equivalent (select() is not that slow, but it has some overhead that is repeated thousands of times). I didn't check if there were more instances where select() can be replaced, I only looked at the functions called in the example in #164.

Here's a benchmark comparing main with this PR. The package cross is not on CRAN, it can be found here: https://github.com/davisVaughan/cross. It allows to compare the same code with both branches separately.

library(bench) 
library(dplyr)

out <- cross::run(
  pkgs = c("gergness/srvyr", "etiennebacher/srvyr@speedup"), 
  ~{
    library(dplyr)
    library(srvyr)
    N <- 20000
    set.seed(123)
    
    test <- data.frame(
      grp1 = sample(letters, N, TRUE),
      grp2 = sample(LETTERS, N, TRUE),
      grp3 = sample(1:10, N, TRUE),
      weight = sample(seq(0, 1, 0.01), N, TRUE)
    ) |> 
      arrange(grp1, grp2, grp3)
    
    test_sv <- as_survey_design(test, weights = weight)
    
    bench::mark(
      test_sv |> 
        group_by(grp1, grp2, grp3) |> 
        survey_count(),
      iterations = 15
    )
    
  }
)

tidyr::unnest(out, result) |> 
  select(pkg, median, mem_alloc)
#> # A tibble: 2 × 3
#>   pkg                           median mem_alloc
#>   <chr>                       <bch:tm> <bch:byt>
#> 1 gergness/srvyr                 32.7s      16GB
#> 2 etiennebacher/srvyr@speedup    19.9s    15.9GB

@etiennebacher
Copy link
Contributor Author

Hello, is this something that you would consider implementing at some point? I have many groups in some project and srvyr is very slow in this setting so any speedup would be appreciated :)

@ashleyasmus
Copy link

I, too would love to see this implemented! @gergness can you pull this into Main?

@gergness gergness merged commit f0c5b5e into gergness:main Jul 14, 2024
6 checks passed
@gergness
Copy link
Owner

Awesome, thanks so much, and so sorry to sit on this for so long - srvyr unfortunately fell into my procrastination pit

@etiennebacher etiennebacher deleted the speedup branch July 14, 2024 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants