Weird results with trueLL #5

piklprado · 2014-07-16T00:58:40Z

trueLL is supposed to provide a fair comparison between likelihoods of discrete and continuous distributions. For Fisher's moth data, however, I got a weird result with trueLL=TRUE: a truncated lognormal model has a lower AIC than a Poisson-lognormal model, but diagnostic plots suggest that poilog fits better:

moths.ls <- fitsad(moths, "ls")
moths.pln <- fitsad(moths, "poilog")
moths.ln1 <- fitsad(moths, "lnorm", trunc=0.5)
moths.ln2 <- fitsad(moths, "lnorm", trunc=0.5, trueLL=FALSE)
AICctab(moths.ls, moths.pln, moths.ln1, moths.ln2,
        nobs=length(moths), base=TRUE, weights=TRUE)

The model selection table:

          AICc   dAICc  df weight
moths.ln1 2175.0    0.0 2  0.44  
moths.pln 2176.2    1.2 2  0.24  
moths.ln2 2176.8    1.8 2  0.18  
moths.ls  2177.4    2.5 1  0.13

and the plot:

plot(octav(moths))
lines(octavpred(moths.pln), col="blue")
lines(octavpred(moths.ln1), col="red")
legend("topright", c("Poilog", "Lognormal"), 
       lty=1, pch=1, col=c("blue", "red"))

piklprado · 2015-03-27T10:46:20Z

I conservatively removed the trueLL argument from the fitting functions. It will be available in the AIC methods, but not as the default until this issue is solved.

andrechalom · 2015-04-19T23:16:20Z

Not exactly related to this issue, but the man page of the trueLL function has a marker for a merge conflict:

<<<<<<< HEAD
> trueLL(x, "lnorm", coef=list(meanlog=mean(log(x)), sdlog=sd(log(x))),
> dec.places=1, )
> Data in classes
> xoc <- octav(x)
> xc <- as.numeric(as.character(xoc$octave))
> xb <- 2^(c(min(xc)-1, xc))
> xh <- hist(x, breaks=xb, plot=FALSE)
> xll <- trueLL(x, dens="lnorm", breaks = xb, counts = xoc$Freq,
>    coef = list(meanlog=mean(log(x)), sd=sd(log(x))))
> xp <- diff(plnorm(xh$breaks, mean(log(x)), sd(log(x))))
> xll2 <- sum( rep(log(xp), xh$counts))
> all.equal(xll, xll2) # should be TRUE
> =======
> trueLL(x, "lnorm", coef=list(meanlog=mean(log(x)), sdlog=sd(log(x))), dec.places=1)
> >>>>>>> provisorio

Also: the examples in fitsad.Rd are using the "dec.places" argument in the current version, I'm commenting them out in the development branch until dec.places is supported.

piklprado · 2015-05-07T04:44:46Z

Conflict resolved, and I have removed the commented-out lines for now.

piklprado · 2015-05-21T21:35:03Z

Before implementing the method in AIC, we need to understand the weird result itself.

andrechalom · 2015-05-22T20:10:59Z

The decision of where to cut the underlying distribution is very problematic. I don't see any theoretical reason to choose among three alternatives:

1- at the midpoints between the integers (so that "5" individuals, for example, are accounted for in the interval [4.5, 5.5])
2- at the integers, from the left ("5" is accounted for in [5, 6])
3- at the integers, from the right ("5" is accounted for in [4, 5])

The first seems a bit more natural, but the choice is entirely arbitrary. However, the impact of choosing each alternative is huge:

> l <- fitlnorm(moths, trunc=0.5)
> trueLL(moths, "lnorm", as.list(coef(l)), trunc=0.5) # alternative 1
[1] -1085.468
> trueLL(moths+0.5, "lnorm", as.list(coef(l)), trunc=0.5) # alternative 2
[1] -1102.392
> trueLL(moths-0.5, "lnorm", as.list(coef(l)), trunc=0.5) # alternative 3
[1] -1095.544

In this light, I don't think it's advisable to use trueLL with count data unless we can find a firmer theoretical grounding.
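The three binning conventions listed above can be made explicit with base R alone. The following is a minimal sketch, not the sads implementation: `interval_ll` is a hypothetical helper, the data are simulated rather than the moth data, and clamping each interval at the truncation point is an assumption made here so that all three conventions return a finite value.

```r
# Hypothetical helper (not part of sads): log-likelihood of integer counts x
# under a lognormal truncated at `trunc`, integrating the density over the
# interval [x + lo, x + hi] assigned to each count.
interval_ll <- function(x, meanlog, sdlog, trunc = 0, lo = -0.5, hi = 0.5) {
  # Probability mass above the truncation point, used to renormalise
  norm <- plnorm(trunc, meanlog, sdlog, lower.tail = FALSE)
  # Assumption: intervals are clamped so they never extend below `trunc`
  lower <- pmax(x + lo, trunc)
  upper <- x + hi
  p <- (plnorm(upper, meanlog, sdlog) - plnorm(lower, meanlog, sdlog)) / norm
  sum(log(p))
}

set.seed(42)
x <- ceiling(rlnorm(200, meanlog = 2, sdlog = 1.5))  # simulated counts
ml <- mean(log(x))  # crude parameter values, for illustration only
sl <- sd(log(x))

lls <- c(
  midpoint = interval_ll(x, ml, sl, trunc = 0.5, lo = -0.5, hi = 0.5),  # [x-0.5, x+0.5]
  left     = interval_ll(x, ml, sl, trunc = 0.5, lo = 0,    hi = 1),    # [x, x+1]
  right    = interval_ll(x, ml, sl, trunc = 0.5, lo = -1,   hi = 0)     # [x-1, x]
)
print(lls)
```

With a truncation at 0.5, the midpoint intervals tile the whole support, whereas the left-hand intervals start at 1 and so their probabilities sum to less than one. This is one way to see that the conventions are not interchangeable.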

andrechalom · 2015-05-22T22:23:36Z

I rewrote the trueLL code, along with the man pages, on branch https://github.com/andrechalom/sads/tree/trueLL. I believe the weird results are caused by the extreme sensitivity of trueLL to the break points used. Sometimes the breaks get "lucky" and end up giving a smaller nLL, but most of the time they give a very large increase in nLL.

In my opinion, what we should do is:

Keep trueLL as it is, with a large warning in the man page and maybe in the vignette;
Do not include it in AIC, AICtab or fitsad methods, even as non-default;
Maybe write an AICt function to give AIC based on trueLL.

piklprado · 2015-05-29T02:11:05Z

I agree that we need firmer theoretical grounding to use trueLL. So I'd rather be more conservative: move trueLL and its man page to a branch off dev, and remove this issue from the milestone.

andrechalom · 2015-05-29T15:32:49Z

So, trueLL should be kept just on a branch? For the released package, should we remove trueLL methods?

piklprado · 2015-05-29T15:41:44Z

Yes. Thinking that a package encapsulates an analytical workflow, I can't
see how trueLL fits in the package in the current state. What do you think?

andrechalom · 2015-12-11T17:46:11Z

I have updated the branch trueLL to incorporate all the changes done so far in the package.

andrechalom · 2016-01-04T20:16:17Z

A couple of updates:

1- The weird behaviour of trueLL might be related to truncated continuous distributions. Compare the weird results above with this fit with no truncation:

> fit <- fitlnorm(moths)
> logLik(fit)
'log Lik.' -1097.723 (df=2)
> trueLL(fit)
[1] -1097.779

The difference here is around 0.05; in contrast, the version truncated at 0.5 diverges by almost 1 point. Other data sets show the same behavior.

Also, the "third alternative" mentioned in the 22 May 2015 comment is actually meaningless for these data, as it involves integrating the probability density below the truncation point: since D/2 = 0.5 for dec.places=0 (the default), moths-0.5 contains some data points equal to 0.5. The trueLL for these points is the integral from 0 to 1, but the distribution is truncated at 0.5, which is larger than the lower limit of 0. The following graph shows how bizarre the trueLL is for truncated moth fits in which the truncation point is larger than D/2=0.5:

I am adding a check on trueLL to guarantee that the smallest x-D/2 is larger than the truncation point.

2- The results for trueLL, irrespective of truncation, are extremely sensitive to dec.places. The values of logLik for fitting moths range from around -1085 for sensible distributions to -1150 for fitgamma/fitweibull. The value of trueLL drops astonishingly fast as dec.places increases, reaching -2200 for dec.places=2 (or D/2 = 5e-3). Remarkably, this drop forms a straight line when plotted against dec.places:
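One possible explanation for the straight line, offered as a sketch and not checked against the trueLL code: assume, as the D/2 values quoted above suggest, that trueLL integrates the fitted density f over [x_i - D/2, x_i + D/2] with D = 10^(-dec.places). For small D each integral is then approximately f(x_i) D. The symbols n (number of observations) and \ell (the ordinary log-likelihood) are introduced only for this sketch.

```latex
\begin{align*}
\mathrm{trueLL}(D)
  &= \sum_{i=1}^{n} \log \int_{x_i - D/2}^{x_i + D/2} f(u)\,du \\
  &\approx \sum_{i=1}^{n} \log\bigl(f(x_i)\,D\bigr)
   = \ell + n \log D \\
  &= \ell - n \ln(10)\,\mathtt{dec.places},
  \qquad D = 10^{-\mathtt{dec.places}}
\end{align*}
```

Under this approximation each extra decimal place lowers trueLL by n ln 10, about 2.303 n. Going from roughly -1098 at dec.places=0 to roughly -2200 at dec.places=2 gives a slope of about -551 per decimal place, which would correspond to n of about 240 observations; that sample size is inferred here, not stated in the thread. The n log D term is the same for every continuous model fitted to the same data, so it would cancel when continuous models are compared with each other, but not when they are compared with a discrete model.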

piklprado added the question label Jul 16, 2014

andrechalom added a commit that referenced this issue Apr 20, 2015

Commented out some example code, see issue #5

da4bf79

andrechalom added this to the sads 0.2.0 milestone May 22, 2015

andrechalom mentioned this issue May 26, 2015

True LL #51

Merged

andrechalom removed this from the sads 0.2.0 milestone May 29, 2015

andrechalom added this to the sads 0.3.0 milestone Jun 13, 2015

andrechalom added the help wanted label Oct 9, 2015

piklprado modified the milestones: sads 1.0.0, sads 0.3.0 Nov 3, 2015

piklprado mentioned this issue Dec 30, 2015

Distributions to include #7

Open
