Issues with run times when pulling data #345
👋 @aylapear. Great to hear that you are using bcdata! This is a bit difficult to diagnose, and it is possibly a Shiny issue rather than a bcdata one. Would you be able to construct a reprex that is self-contained (i.e. not using Shiny) and time it? One approach could be to run a batch of bcdata commands in parallel, then sequentially, and compare the timings.
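A minimal sketch of that sequential-versus-parallel comparison in R, using only the base parallel package. The function fetch_all() is a hypothetical stand-in that just sleeps, because the original reprex and the seven record IDs the app uses are not reproduced in this thread; replacing its body with the app's seven bcdata::bcdc_get_data() calls would time the real workload.

```r
# Sketch: time n runs one after another, then n runs at once.
library(parallel)

# Hypothetical stand-in for one app start-up. The real reprex would call
# bcdata::bcdc_get_data() once for each of the seven records the app uses;
# here it only sleeps so the harness runs offline.
fetch_all <- function(delay = 0.2) {
  Sys.sleep(delay)
  invisible(TRUE)
}

# Wall-clock seconds taken by one call to f().
elapsed_of <- function(f) {
  system.time(f())[["elapsed"]]
}

# Run `fun` n times sequentially, then n times simultaneously in separate
# worker processes (roughly like n RStudio sessions running the script).
time_runs <- function(n = 4, fun = fetch_all) {
  sequential <- vapply(seq_len(n), function(i) elapsed_of(fun), numeric(1))

  cl <- makeCluster(n)
  on.exit(stopCluster(cl), add = TRUE)
  # Each worker receives one copy of `fun` and reports its own elapsed time.
  in_parallel <- unlist(parLapply(cl, rep(list(fun), n), elapsed_of))

  data.frame(run = seq_len(n), sequential = sequential, parallel = in_parallel)
}
```

Calling time_runs(n = 10) returns one row per run. With the simulated delay there is no shared bottleneck, so both columns should stay close; if the parallel column grows when the real bcdata calls are swapped in, that would point to contention outside R (network or API) rather than to Shiny.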
I strongly suspect this is an I/O issue (heavy traffic to the BC Data Catalogue API and/or network speed when transferring the CSV data). BC Stats has a few apps with the same challenge. We improved one by breaking up the data calls: the app uses tabs, and data is loaded only when a user opens the relevant tab. It might be worth storing the open-licensed data somewhere else for the app and seeing whether that makes the speed more reliable. Even better, that could open up options for changing the data store format to something much more efficient, e.g. parquet files, which means less data to move.
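One way to act on the "store the data somewhere else" suggestion is to keep a local copy and only re-download it when it is stale. This is a base-R sketch; cached_get() is a hypothetical helper, and download is whatever function the app already uses to fetch a data set. The .rds format keeps the example dependency-free; for data frames, arrow::write_parquet() and arrow::read_parquet() could replace saveRDS() and readRDS() if the arrow package is installed.

```r
# Hypothetical helper: return a cached copy of a data set if the cached file
# is younger than `max_age_hours`; otherwise call download() and refresh it.
# Note: tempdir() is private to one R process and deleted when it exits, so
# an app would pass a persistent directory to share the cache across sessions.
cached_get <- function(name, download, cache_dir = tempdir(), max_age_hours = 24) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    age <- difftime(Sys.time(), file.info(path)$mtime, units = "hours")
    if (as.numeric(age) < max_age_hours) {
      return(readRDS(path))  # fresh enough: skip the network entirely
    }
  }
  data <- download()         # e.g. a wrapper around bcdata::bcdc_get_data()
  saveRDS(data, path)
  data
}
```

For example, cached_get("wq_data", function() bcdata::bcdc_get_data("<record-id>"), cache_dir = "cache") would hit the catalogue only when the cached copy is missing or older than a day; the name, directory, and record ID placeholder are illustrative.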
A single bucket for all open data would be pretty handy.
This is the reprex I used to convince myself that the Shiny part was not causing the issue. I opened several R sessions and ran the script in each. As I opened more sessions the run times started to increase, though it took about 10 sessions before I noticed the issue.
This is a new issue, and the app has grown since its initial design. The bcdata package is great and worked for this app without issues for a long time. If this problem is not going to go away, or may get worse with more users, then the app will need to be refactored. I wanted to open the dialogue to see whether others are having this issue and whether something within bcdata could be changed instead of refactoring the app. I'm glad people have responded that they have also had this issue and have found creative workarounds.
I am not an expert in APIs, but I think one way to test whether the performance reduction is …
@aylapear can you elaborate on what you were doing here?
Did you open up different R processes for this? Also can you post the times that you saw when you ran the above script? On my network I get:
That variation does not seem too bad to me?
@boshek Yes, multiple R processes were opened: I opened several instances of RStudio and ran the script once in each. With only one or two RStudio sessions running the script, run times were around 8 seconds, but once 10 sessions were running it, run times increased to 22 seconds.
The app does not make 150 calls x 7 times to the catalogue; it makes 7 calls each time a user opens it. I made 150 calls in the script for two reasons: users reported that run times varied by time of day, and a long loop made it easy to open multiple R processes one after another while making sure the first ones were still running when the last one started. The number 150 was arbitrary. The BC government employees who manage the app have reported start-up times of over 90 seconds. I noted this when I first started investigating, but at that time I had no system for tracking it because I was unsure what was causing the problem. When the app starts it shows status messages, so we can see that it launches; the next step is downloading the data from the catalogue with the bcdata package, and this is where it gets stuck. It does not happen every time, but it has happened often enough that users have reported it and been frustrated by it. @aazizish, do you have anything to add to this discussion?
I ran things again this morning. I started by opening a single RStudio session and running the script. It had this run time:
Then I opened 14 more RStudio sessions and ran the script in each one, starting the script in one session before opening the next, until all 15 sessions were running the script at the same time. Run times increased in every session, with the highest being 34 seconds. Here are the run times from each session:
The shinywqg Shiny app uses the bcdata::bcdc_get_data() function to pull seven data sets from the BC Data Catalogue when the app launches. Normally the pull takes about 14 seconds, but users have reported that it sometimes takes 90 seconds or longer, often during periods of expected high use, such as Monday mornings. The issue has been difficult to diagnose because it is intermittent. I was able to simulate it by running the function in parallel: the more copies running at the same time, the longer the pull times for the seven data sets became. A single run took 14 seconds to pull the seven data sets, and with 10 runs going simultaneously the pull times increased to 35 seconds.
App users have been reporting these issues, so we are looking for ways to decrease this run time when multiple users access the app at once.