cpt.meanvar return cost value? #49

Open
tdhock opened this issue Oct 16, 2020 · 3 comments

@tdhock

tdhock commented Oct 16, 2020

Hey again @rkillick, I'm using cpt.meanvar in class next week and I noticed that it can sometimes return a segment variance of zero:

```r
> changepoint::cpt.meanvar(c(0,0,4,5), penalty="Manual", method="PELT", pen.value=0)@param.est
$mean
[1] 0.0 4.5

$variance
[1] 0.00 0.25
```

I assume you are minimizing the negative log-likelihood; is that correct? In that case the cost of this model should be -Inf, right? Would it be possible to return the cost value, please? It would be helpful.

In this case the variance of the first segment is estimated as zero because that segment consists of two consecutive data points with the same value.
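As a minimal sketch of the -Inf claim, assume the per-segment cost is twice the Gaussian negative log-likelihood evaluated at the maximum-likelihood mean and variance (the variance divides by n, which matches the 0.25 reported above for the second segment). `seg_cost_normal` is a hypothetical helper for illustration, not a changepoint function.

```r
# Hypothetical helper (not part of changepoint): twice the Gaussian negative
# log-likelihood of one segment, evaluated at the MLE mean and variance.
# At the MLE the quadratic term reduces to n, giving n * (log(2*pi*sigma2) + 1).
seg_cost_normal <- function(x) {
  n <- length(x)
  sigma2 <- mean((x - mean(x))^2)  # MLE variance: divides by n, not n - 1
  n * (log(2 * pi) + log(sigma2) + 1)
}

x <- c(0, 0, 4, 5)
costs <- c(seg_cost_normal(x[1:2]), seg_cost_normal(x[3:4]))
print(costs)       # -Inf for the constant segment c(0, 0), about 2.90 for c(4, 5)
print(sum(costs))  # so the two-segment model has total cost -Inf
```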
I notice that you enforce minseglen=2. Is this an effort to avoid segments of zero variance, i.e. to allow only models that are "well-defined" in the sense of having a finite log-likelihood?
If so, you might consider an adaptive approach: either run-length encode the data (using the run lengths as weights) before running the algorithm, OR disallow segments of zero variance during the search.
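Both suggestions can be sketched in base R. `rle()` is a real base R function returning each run's value and length (the lengths could serve as weights). `seg_cost_guarded` is a hypothetical cost, assuming the segment cost is twice the Gaussian negative log-likelihood at the MLE, that gives any segment with a single distinct value an infinite cost so a minimizing search never selects it. Neither piece is taken from the changepoint package.

```r
# Option 1: run-length encode the data; each run's length could act as a weight.
runs <- rle(c(0, 0, 4, 5))
print(runs$values)   # 0 4 5
print(runs$lengths)  # 2 1 1

# Option 2 (hypothetical guard, not changepoint's code): a segment whose values
# are all equal has zero MLE variance, so return Inf instead of -Inf.
seg_cost_guarded <- function(x) {
  if (length(unique(x)) < 2) return(Inf)  # all values equal: rule segment out
  n <- length(x)
  sigma2 <- mean((x - mean(x))^2)  # MLE variance
  n * (log(2 * pi) + log(sigma2) + 1)
}

print(seg_cost_guarded(c(0, 0)))  # Inf
print(seg_cost_guarded(c(4, 5)))  # finite
```

Note that run-length encoding alone does not prevent a zero-variance segment, since a single run of weight 2 is still constant; it is the guard (or an equivalent rule requiring at least two distinct values per segment) that excludes it.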

FYI, I used PELT above, but the problem seems to affect SegNeigh as well.

@rkillick
Owner

rkillick commented Nov 2, 2020

To avoid computing variances on segments that do not vary, there is a catch in the code: if a negative variance is estimated, the returned variance is replaced with a value very close to 0 (just within machine precision). This avoids a division by 0 in the likelihood, as you point out.
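A variance floor of this kind can be sketched as follows, assuming the segment cost is twice the Gaussian negative log-likelihood at the MLE. `seg_cost_floored` is a hypothetical helper; the floor value `.Machine$double.eps` and the `<= 0` test are assumptions for illustration, not the package's actual constants.

```r
# Hypothetical helper (not changepoint's code): twice the Gaussian negative
# log-likelihood at the MLE, with the variance floored at a tiny positive value.
# Using .Machine$double.eps (about 2.2e-16) as the floor is an assumption.
seg_cost_floored <- function(x, var_floor = .Machine$double.eps) {
  n <- length(x)
  sigma2 <- mean((x - mean(x))^2)        # MLE variance
  if (sigma2 <= 0) sigma2 <- var_floor   # replace a non-positive estimate
  n * (log(2 * pi) + log(sigma2) + 1)
}

print(seg_cost_floored(c(0, 0)))  # finite, about -66.41
print(seg_cost_floored(c(4, 5)))  # about 2.90
```

Flooring keeps the cost finite, but the floored cost is still far below that of an ordinary segment, so with `pen.value=0` a constant segment remains strongly favored.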

The minseglen=2 minimum exists because we estimate two parameters per segment, the mean and the variance, so we need at least two data points per segment.
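Assuming the variance is the maximum-likelihood estimate (dividing by $n$), a segment of length one always has zero estimated variance, which is why at least two points are needed. Two or more points still give zero variance whenever they are all equal:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 .
$$

For $n = 1$, $x_1 = \bar{x}$, so $\hat{\sigma}^2 = 0$. For $n \ge 2$, $\hat{\sigma}^2 = 0$ exactly when $x_1 = \dots = x_n$. In the example from the first comment, the segment $(0, 0)$ gives $\bar{x} = 0$ and $\hat{\sigma}^2 = 0$, while $(4, 5)$ gives $\bar{x} = 4.5$ and $\hat{\sigma}^2 = 0.25$.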

The cost function is the same for all search methods, so this affects them all.

@rkillick
Owner

rkillick commented Mar 8, 2022

Thinking about this more, there are legitimate reasons to have a segment with a variance of 0, where all values are equal. Such a run could be short, in which case you likely don't want it identified as a changepoint, but it could also be long (I'm looking at some data that is rounded, so we do get runs of the same value). I personally think this should be handled by the user rather than the algorithm, so a warning from the changepoint methods when sequential observations have the same value should suffice.
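The proposed warning could be sketched as a pre-check on the input series. `warn_if_ties` is a hypothetical function, not part of the changepoint API; it only flags consecutive equal observations and leaves the decision of what to do to the user, in line with the suggestion above.

```r
# Hypothetical pre-check (not part of changepoint): warn when consecutive
# observations are equal, since a segment made only of such a run has zero
# estimated variance.
warn_if_ties <- function(data) {
  tied <- which(diff(data) == 0)  # index i means data[i] == data[i + 1]
  if (length(tied) > 0) {
    warning(sprintf(
      "%d pair(s) of consecutive observations are equal (first at index %d)",
      length(tied), tied[1]
    ))
  }
  invisible(tied)
}

warn_if_ties(c(0, 0, 4, 5))  # warns: 1 pair(s) ... (first at index 1)
```

The check uses exact equality, which suits the rounded-data case; values that differ only by floating-point noise would not be flagged.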

@tdhock
Author

tdhock commented Mar 8, 2022

I agree that a warning would be a step in the right direction.
