Wrong data type of NA timestamps depending on order of data sources #269
Comments
I have already implemented a fix for that, see the linked patch. However, we should also test it properly to make sure that the patch works in various situations and does not break anything.
Your fix does not break any of the tests in our suite. From your last message, I was not sure whether you want me to write new tests that specifically target the previously faulty behavior, since testing it would be a bit hacky given that it only occurs when issue data is empty. Please let me know :)
Sounds good.
Please write a test / tests to test the previously faulty behavior. It should be easy to achieve: Use the setter for issue data to set empty issue data. Then lock issues. This way, reading the actual issue data again should be prevented, if I am not mistaken.
Small issue here. I tried to reproduce the error you are describing, as it seems very reasonable to me and I want to figure out the best way to test against it. Unfortunately, I was not able to reproduce it at all. The timestamps used in
Hm, I am pretty sure that this issue resulted from non-existent (a.k.a. empty) data. However, the order of data sources makes a difference in how
Let's discuss that in our next meeting. I can show you the context in which I encountered the issue, and then we can discuss how we can reproduce this in our tests.
Up until now, `get.data.cut.to.same.date(data.sources = c("issues", "mails", "commits"))` failed if the first data source was empty, but not if the second one was empty. The reason was that `NA` values introduced by empty data sources at the beginning of the data frame turned the data frame into a data frame of numeric objects instead of POSIXct objects. If there were already POSIXct objects in the data frame, this did not happen. To prevent the timestamps from being interpreted as numeric values, make sure that the `NA` values are always POSIXct objects. This fixes se-sic#269. Signed-off-by: Thomas Bock <[email protected]>
This test fails without the previous fix by Thomas Bock but does not fail when the fix is in place. This works towards fixing se-sic#269. Signed-off-by: Maximilian Löffler <[email protected]>
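The order dependence that such a test guards against can be sketched outside of coronet. The helper `collect_start_dates` below is hypothetical and is not coronet's `extract.timestamps`; it only assumes that one timestamp per data source is appended to a vector in the given order, with a placeholder for empty sources.

```r
# Hypothetical stand-in (NOT coronet code): collect the earliest timestamp of
# each data source in order, using `na_value` for sources without any data.
collect_start_dates <- function(sources, na_value) {
    result <- NULL
    for (dates in sources) {
        start <- if (length(dates) == 0) na_value else min(dates)
        result <- if (is.null(result)) start else c(result, start)
    }
    result
}

# Illustrative data; the concrete dates are arbitrary.
commits <- as.POSIXct(c("2020-01-01", "2020-02-01"), tz = "UTC")
mails   <- as.POSIXct(c("2020-01-15", "2020-03-01"), tz = "UTC")
empty   <- commits[0]  # zero-length POSIXct vector, i.e., an empty data source

# Untyped NA: the result type depends on where the empty source appears.
empty_first_untyped  <- collect_start_dates(list(empty, mails, commits), NA)
empty_second_untyped <- collect_start_dates(list(mails, empty, commits), NA)
print(class(empty_first_untyped))
print(class(empty_second_untyped))

# Typed NA: the result stays POSIXct regardless of the order.
empty_first_typed  <- collect_start_dates(list(empty, mails, commits), as.POSIXct(NA))
empty_second_typed <- collect_start_dates(list(mails, empty, commits), as.POSIXct(NA))
print(class(empty_first_typed))
print(class(empty_second_typed))
```

A regression test in this spirit would assert that the result inherits from `POSIXct` for every ordering of the data sources, which only holds when the placeholder is a typed `NA`.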
Description
`project.data$get.data.cut.to.same.date(data.sources = c("issues", "mails", "commits"))` sometimes fails if some of the data sources are empty, ending up in the following error message:
This error occurs if the issue data is the empty data source, but not if the mail data is the empty data source, which is unexpected behavior.
After some debugging, I noticed that the error is caused by wrong data types when converting strings to POSIXct dates: Usually, if the date is not a POSIXct object, it is converted to POSIXct when splitting the data:
coronet/util-split.R
Lines 69 to 70 in 24005e4
However, this fails if the bins (i.e., the timestamps used for cutting) are not strings but numerics (such as UNIX timestamps). Therefore, we need to look at where the timestamps for cutting come from: They are extracted in the function `extract.timestamps`:
coronet/util-data.R
Lines 724 to 758 in 24005e4
If no data is available for the specified data source, the corresponding timestamps are set to `NA`. For whatever reason, this converts the data frame used for cutting into a numeric data frame of UNIX timestamps if the first row contains `NA` values. Otherwise, if there are already POSIXct objects in there, the inserted `NA` values are interpreted to be POSIXct objects.
Suggested Fix
Instead of setting the timestamps to `NA` in case of missing data sources, set the timestamps to `as.POSIXct(NA)`. This way, we ensure that the data type is POSIXct even if the `NA` values are the very first ones that are inserted into the timestamps data frame.
Versions
This affects, at least, coronet version 4.4 in combination with R version 4.4.1. Earlier coronet versions or R versions may also be affected by this bug.
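The type change described in this report can be reproduced in plain R without coronet. The snippet below is a minimal sketch of one plausible mechanism, namely that `c()` dispatches on its first argument; whether coronet's timestamp data frame is assembled in exactly this way is an assumption, as the report does not say.

```r
# Illustrative timestamp; the concrete value is arbitrary.
ts <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")

# Case 1: an untyped NA comes first. c() dispatches on its first argument,
# a logical NA, so the default method runs and the POSIXct class is dropped.
untyped_first <- c(NA, ts)
print(class(untyped_first))

# Case 2: a POSIXct value comes first. The POSIXct method of c() runs and
# converts the trailing NA into a POSIXct NA.
posix_first <- c(ts, NA)
print(class(posix_first))

# Case 3 (the suggested fix): a typed NA keeps the POSIXct class even when
# it is the very first value.
typed_first <- c(as.POSIXct(NA), ts)
print(class(typed_first))
```

In case 1 the timestamp survives only as a bare number of seconds since the UNIX epoch, which matches the numeric UNIX timestamps observed in the report.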