NAs causing unexpected special cause flags #156
Comments
Thanks for raising this issue, and excellent catch.

We'll need to decide what the behaviour should be here. My preference would be to tolerate the NA, dropping the point from the x-axis (the real world is messy and does occasionally contain unavoidable holes with missing data).

Reproducing the case with no NAs: [output not captured]

Reproducing the case with an NA: [output not captured]

The point_type column is calculated in the code below, which looks at both the special_cause_flag and relative_to_mean columns. These are both NA, so line 29 is executed.

NHSRplotthedots/R/ptd_calculate_point_type.R Lines 25 to 30 in 19424d3
The values for the special_cause_flag and relative_to_mean columns are set here, so this is likely where the code to tolerate NAs will need to be added:

NHSRplotthedots/R/ptd_spc_standard.R Lines 91 to 97 in 19424d3
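To see why a single missing value can leave both columns as NA, it helps to recall how R treats NA in summaries and comparisons. The following is a minimal base R sketch with made-up numbers. It is not the package's code, only an illustration of NA propagation, and the variable names are hypothetical.

```r
# Illustrative values only; the third observation is missing
x <- c(10, 12, NA, 11)

# A summary that does not drop NAs becomes NA itself ...
mean(x)                          # NA
# ... so every comparison against it is NA as well
x > mean(x)                      # NA NA NA NA

# Dropping NAs for the summary still leaves the missing point's
# own comparison as NA
centre <- mean(x, na.rm = TRUE)  # 11
above <- x > centre              # FALSE TRUE NA FALSE

# A moving range spans two neighbouring points, so one NA
# affects two consecutive ranges
moving_range <- abs(diff(x))     # 2 NA NA
```

Any column derived from comparisons like these will carry NA for the affected rows unless the code handles them explicitly.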
The next stage is probably for someone to write a failing test. It would be worth also checking for other bugs that might be created by the other NAs that appear in that row.
Just a note that, as a workaround, you can pre-filter the incoming data frame to contain only dates with data. This gives a result with gaps in the x-axis, which makes the missing data more obvious while not affecting the SPC logic.

    library(NHSRplotthedots)
    library(NHSRdatasets)
    library(dplyr)

    data("ae_attendances")
    stable_set <- ae_attendances %>%
      filter(org_code == "RRK",
             type == 1,
             period < as.Date("2018-04-01"))
    # remove some data
    stable_set$breaches[stable_set$period == as.Date("2018-03-01")] <- NA
    # remove more data in the middle of the plot, as an illustration
    stable_set$breaches[stable_set$period == as.Date("2017-06-01")] <- NA
    # filter to remove any dates with NAs
    filtered_set <- stable_set %>% filter(!is.na(breaches))
    ptd_spc(filtered_set, value_field = breaches, date_field = period, improvement_direction = "decrease")
I am leaning towards not implementing any changes to tolerate or work around NAs. Perhaps we should throw a warning to prompt the user to look more closely at their data? The user is in control of the data being passed in. Open to thoughts from others...
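A warning of the kind suggested here could be sketched as follows. This is only an illustration in base R: `warn_if_na()` is a hypothetical helper, not a function in NHSRplotthedots, and it assumes the value and date columns are passed as strings.

```r
# Hypothetical helper, not part of NHSRplotthedots.
# Warns when the value column contains NAs and lists the affected dates.
warn_if_na <- function(data, value_col, date_col) {
  missing_rows <- is.na(data[[value_col]])
  if (any(missing_rows)) {
    warning(
      sum(missing_rows), " NA value(s) in '", value_col, "' at: ",
      paste(format(data[[date_col]][missing_rows]), collapse = ", "),
      ". Special cause flags near these points may be unreliable.",
      call. = FALSE
    )
  }
  # Return the data unchanged so the check can sit at the start of a pipeline
  invisible(data)
}

# Made-up data for illustration
example <- data.frame(
  period = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01")),
  breaches = c(120, 135, NA)
)
checked <- warn_if_na(example, "breaches", "period")
```

The data passes through untouched, so the decision about what to do with the missing points stays with the user.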
If I recall correctly, the documentation clearly states not to include NAs. I had read the documentation but still accidentally included them, so I like the idea of a warning when NAs are present to prompt a check of the data. When NAs are included, there is a risk that users without experience of SPC may interpret the false special cause flags as real. If an error message will mitigate that risk, then all good.
I think the best solution here would be to have a check along the lines of [code snippet not captured]. I don't like the idea of implicitly dropping values: the logic would be a mess (if there are no NAs, do nothing; if there are, drop the NAs but give a warning). I think the cleanest approach is to raise an error that tells the user what to do to fix the issue.
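As a rough sketch of the "raise an error and say how to fix it" approach, something like the following could work. `stop_if_na()` is a hypothetical name used only for illustration, and this is not the check that was proposed or implemented in the package.

```r
# Hypothetical helper, not part of NHSRplotthedots.
# Stops with an actionable message when the value column contains NAs.
stop_if_na <- function(data, value_col) {
  n_missing <- sum(is.na(data[[value_col]]))
  if (n_missing > 0) {
    stop(
      "'", value_col, "' contains ", n_missing, " NA value(s). ",
      "Remove them before plotting, e.g. with ",
      "dplyr::filter(data, !is.na(", value_col, ")).",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Made-up data for illustration; the error is caught so the message can be inspected
example <- data.frame(breaches = c(120, 135, NA))
result <- tryCatch(
  stop_if_na(example, "breaches"),
  error = function(e) conditionMessage(e)
)
```

Failing early keeps the function's behaviour the same whether or not NAs are present, and leaves the choice of how to handle the gaps explicit in the user's own code.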
Hi All
I'm new to GitHub and NHSRplotthedots, fairly new to the NHSR Community and not that experienced in R either, so please be forgiving if this is not appropriate or has been dealt with elsewhere - I came across an apparent issue today which I thought I would share.
I've been playing around with the NHSRplotthedots package and accidentally left an NA value in my dataset, which caused some special cause flags where there should have been common cause variation.
Here is an example of the issue:
`library(NHSRplotthedots)
library(NHSRdatasets)
library(dplyr)
library(ggplot2)
library(scales)
data("ae_attendances")
stable_set <- ae_attendances %>%
  filter(org_code == "RRK",
         type == 1,
         period < as.Date("2018-04-01"))
ptd_spc(stable_set, value_field = breaches, date_field = period, improvement_direction = "decrease")
# Note: the last 6 data points show common cause variation`
`# Now set the last data point to NA and rerun SPC
stable_set$breaches[stable_set$period == as.Date("2018-03-01")] <- NA
ptd_spc(stable_set, value_field = breaches, date_field = period, improvement_direction = "decrease")
# Now the last 5 points show special cause variation`
Thanks for all the great work you are doing and looking forward to future developments and more packages from the community!
Gary