-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splitting behavior: What to do with elements that occurred before the predefined splitting bins? #267
Comments
We need to figure out in which situations the elements before the first bin are actually needed. This is the key question to this problem, and I don't have an answer to this question. If there are no such situations, we could potentially remove them. Otherwise we need to find a way to keep them. But first we need to figure out whether there is any use case in which the elements before the first bin are relevant. Originally posted by @bockthom in #263 (comment) |
This change is in-line with the current treatment of the elements that occur after the last bin. Further, this fixes a minor "bug" where bin indecees are used to determine bin labels by indexing into a list of labels. Elements before the first bin received bin index 0, however, R is 1-indexed. Note: This "bug" did not cause problems because the bin with the incorrect label will be ignored later anyways. Works towards se-sic#267. Signed-off-by: Maximilian Löffler <[email protected]>
This change is in-line with the current treatment of the elements that occur after the last bin. Further, this fixes a minor "bug" where bin indecees are used to determine bin labels by indexing into a list of labels. Elements before the first bin received bin index 0, however, R is 1-indexed. Note: This "bug" did not cause problems because the bin with the incorrect label will be ignored later anyways. Works towards se-sic#267. Signed-off-by: Maximilian Löffler <[email protected]>
This change is in-line with the current treatment of the elements that occur after the last bin. Further, this fixes a minor "bug" where bin indecees are used to determine bin labels by indexing into a list of labels. Elements before the first bin received bin index 0, however, R is 1-indexed. Note: This "bug" did not cause problems because the bin with the incorrect label will be ignored later anyways. Works towards se-sic#267. Signed-off-by: Maximilian Löffler <[email protected]>
This change is in-line with the current treatment of the elements that occur after the last bin. Further, this fixes a minor "bug" where bin indecees are used to determine bin labels by indexing into a list of labels. Elements before the first bin received bin index 0, however, R is 1-indexed. Note: This "bug" did not cause problems because the bin with the incorrect label will be ignored later anyways. Works towards se-sic#267. Signed-off-by: Maximilian Löffler <[email protected]>
This change is in-line with the current treatment of the elements that occur after the last bin. Further, this fixes a minor "bug" where bin indecees are used to determine bin labels by indexing into a list of labels. Elements before the first bin received bin index 0, however, R is 1-indexed. Note: This "bug" did not cause problems because the bin with the incorrect label will be ignored later anyways. Works towards se-sic#267. Signed-off-by: Maximilian Löffler <[email protected]>
This change is in-line with the current treatment of the elements that occur after the last bin. Further, this fixes a minor "bug" where bin indecees are used to determine bin labels by indexing into a list of labels. Elements before the first bin received bin index 0, however, R is 1-indexed. Note: This "bug" did not cause problems because the bin with the incorrect label will be ignored later anyways. Works towards se-sic#267. Signed-off-by: Maximilian Löffler <[email protected]>
This issue has been detected by @MaLoefUDS in PR #263. In the following, I quote two comments that @MaLoefUDS had posted in PR #263:
Original posts by @MaLoefUDS:
[...] I have some nice insight on the issue regarding the splitting in
showcase.R
that "breaks" which @Leo-Send found here:First of all, this does have nothing to do with the selected data source or if commit interactions are present (as Leo already mentioned). Interestingly this broken application has been in
showcase.R
for a really long time. I guess nobody really cared because the only use of the split data is plotting it once, which does not break directly.The reason for the break is that for some data sources, some of their elements are before the predefined bins by which we split. When we split by given bins, we use the
base::findInterval
method to separate all data elements into the corresponding bins they belong, i.e., it returns a vector of integers, of which each integer describes the index of the bin in which we should place an item (e.g.,1 1 1 2 2 3 3 3 3 3 6
). Then we split according to this vector and rename all splits according to the datestring of the bin.However, the
findInterval
function returns0
for datapoints that do not lie in any bin (i.e., before the first bin). Consequently, when trying to find the name for said split, we index into the vector of bin-labels with 0 resulting incharacter(0)
. This in the end leads to the breakage observed. This might actually require a fix from our side. I believe, spitting should not break like this when some elements are not in any bin.Originally posted by @MaLoefUDS in #263 (comment)
Regarding fixing this problem of broken splitting which may or may not be the sole cause of Leo's problems, I have two possible fixes in mind.
But first of: Currently we do not have any problems with data elements that lie beyond the last bin. The bins we provide are always just the first date bound. The next following bin is then the last date bound, while also being the first date of its own new bin. In the current implementation, the last bin is then holds all elements from this point on until infinity. Obviously, we cannot simply include all elements before the first bin into the first bin as well, as this would lead to unintuitive behavior (i.e., what would it mean to provide only one bin when splitting?).
Here are my ideas for fixes:
0
s.character(0)
towhatever
.I don't have any preference on this, since I am not sure which idiomatic values of coronet are at stake 😅
Originally posted by @MaLoefUDS in #263 (comment)
The text was updated successfully, but these errors were encountered: