Fit of continuous data to discrete distributions should return an error #120

piklprado · 2016-02-09T23:56:09Z

Most of the functions that fit discrete sads proceed with the fit even when there are non-integer values in the data. I got fits to data with continuous values from fitls, fitpower and fitmzsm. Fits from the other discrete sads did not converge in the tests I've done, but they return an error from mle2, which shows that they do proceed to fitting. Only fitpoilog stops when there is any continuous value in the data, and this comes from an error check in the poilog package:

> distr("poilog")
[1] "discrete"
> x1 <- c(rpoilog(1000, 1.5, 1), 1.1)
> x1 <- x1[x1>0]
> fitpoilog(x1) ## error: "all n must be integers"
Error in dpoilog(un, z[1], exp(z[2])) (from fitpoilog.R#7) : all n must be integers

This makes sense to me.

piklprado · 2016-02-10T00:08:46Z

Some details about the tests I ran (script here):

fitls and fitpower

The density functions for these sads models correctly output zero for non-integer values, which makes the log-likelihood -Inf. Still, the Brent method used in these two functions does return a fit:

> x1 <- c(rls(100, N=1000, 10), 1.18)
> fitls(x1) ## fit with LogLik=-Inf and issues warnings about non-integer values
Maximum likelihood estimation
Type: discrete  species abundance distribution
Species: 101 individuals: 2368.18 

Call:
mle2(minuslogl = function (N, alpha) 
-sum(dls(x, N, alpha, log = TRUE)), start = list(alpha = 21.4238153469672), 
    method = "Brent", fixed = list(N = 2368.18), data = list(
        x = list(1, 75, 3, 4, 1, "etc")), lower = 0, upper = 101L)

Coefficients:
      N   alpha 
2368.18  101.00 

Log-likelihood: -Inf 
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()[1]
Warning message:
In dls(x, N, alpha, log = TRUE) : non integer values in x
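The -Inf follows directly from the arithmetic: the log-likelihood is a sum of log-densities, so a single observation with density zero contributes log(0) = -Inf to the whole sum. A minimal base-R sketch, using `dpois()` as a stand-in for `dls()` (an assumption made only to avoid depending on sads; `dpois()` likewise returns 0, with a warning, for a non-integer x):

```r
# dpois() is a base-R stand-in for a discrete sad density such as dls():
# it also returns 0 (with a warning) for a non-integer count.
x <- c(3, 1, 4, 1.18)
dens <- suppressWarnings(dpois(x, lambda = 2))
print(dens[4])   # 0: density of the non-integer value
loglik <- sum(log(dens))
print(loglik)    # -Inf: one zero density makes the whole sum -Inf
```

Since the density of 1.18 is zero for every parameter value, the minus log-likelihood is +Inf everywhere, so the estimate the optimizer hands back (here alpha sitting at the upper bound, 101) carries no information about the data.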

fitmzsm

The density function incorrectly returns non-zero values for non-integer inputs (#119), so fitting data with continuous values returns a finite log-likelihood:

> x1 <- c(rmzsm(999, 1000, 20), 1.18)
> fitmzsm(x1) ## fit with LogLik!=-Inf and issues warnings about non-integer values.
Maximum likelihood estimation
Type: discrete  species abundance distribution
Species: 1000 individuals: 14280.18 

Call:
mle2(minuslogl = function (J, theta) 
-sum(dmzsm(x, J = J, theta = theta, log = TRUE)), start = list(
    theta = 1000L), method = "Brent", fixed = list(J = 14280.18), 
    data = list(x = list(24, 2, 10, 2, 35, "etc")), lower = 0.001, 
    upper = 1000L)

Coefficients:
         J      theta 
14280.1800   242.8042 

Log-likelihood: -3380.22 
> warnings()[1]
Warning message:
In dls(x, N, alpha, log = TRUE) : non integer values in x

fitgeom, fitpowbend, fitnbinom, fitvolkov

At least in my tests, these did not fit because of convergence problems.

andrechalom · 2016-02-11T16:46:10Z

OK, what we need to decide is how to deal with this in a coherent fashion. I believe that all fitting procedures should return an error if invalid data is entered, but the problem is: what is invalid data? Non-integer numbers are invalid for discrete fits, that's clear, but negative numbers are also invalid for all distributions, and they still fit (with ll = -Inf):

> fitls(x = c(-1, moths))
(...)
Coefficients:
    N alpha 
15608   241 

Log-likelihood: -Inf

This is particularly troubling for rad fits: because the data are converted to ranks, no checking at all is done and the fit looks valid:

> fitzipf(x = c(-1, moths))
(...)
Coefficients:
         N          s 
240.000000   1.034841 

Log-likelihood: -65008.43
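A minimal base-R sketch of why ranking hides the problem, assuming the rad fits work on decreasing abundance ranks (a simplification for illustration; the actual sads internals may differ): ranks are always positive integers, so nothing computed from them looks invalid, even when the abundances contain a negative value.

```r
# Abundances including an invalid negative value
x <- c(-1, 12, 5, 5, 1)
# Rank 1 = most abundant; ties broken by order of appearance
r <- rank(-x, ties.method = "first")
print(r)                              # 5 1 2 3 4
# Every rank is a positive integer, so the negative value goes unnoticed
print(all(r >= 1 & r == round(r)))    # TRUE
```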

We can add a check to all fitting functions to make sure x is positive, and also integer if the distribution is discrete. Are we overlooking some other case of invalid data?
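One possible shape for that check, as a minimal sketch: `check_sad_data()` is a hypothetical helper name (not part of the sads API), and the `discrete` flag would be set per model, for example from the "discrete"/"continuous" value that `distr()` reports. The NA guard is an extra assumption, added so the comparisons below never see a missing value.

```r
# Hypothetical helper, not part of sads: validate abundances before the
# call to mle2(), stopping with an error instead of returning a bogus fit.
check_sad_data <- function(x, discrete = TRUE) {
  if (!is.numeric(x))
    stop("x must be numeric")
  # Guard so that the comparisons below never evaluate to NA
  if (anyNA(x))
    stop("x must not contain NA values")
  # Abundances must be strictly positive for every model
  if (any(x <= 0))
    stop("all x must be positive")
  # Discrete models also need integer counts (e.g. 1.18 is rejected)
  if (discrete && any(x != round(x)))
    stop("all x must be integers for a discrete distribution")
  invisible(TRUE)
}

check_sad_data(c(24, 2, 10, 2, 35))             # passes silently
check_sad_data(c(1.5, 2.2), discrete = FALSE)   # passes: continuous model
try(check_sad_data(c(1, 75, 3, 1.18)))          # stops: "all x must be integers ..."
try(check_sad_data(c(-1, 3, 7)))                # stops: "all x must be positive"
```

Note that zeros fail the x > 0 test; discarding them automatically instead is the separate question raised in #101 and #18.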

andrechalom · 2016-02-11T17:53:57Z

[Related: #101 and #18: can we find a way to automatically discard zeros? How do zero counts relate to parametric diversity indexes?]

piklprado · 2016-05-01T02:13:14Z

Checking that x > 0 for all models, and that x is integer for discrete models, seems enough for version 0.3. We leave the zero issue (#101 and #18) for version 1.0.0.

piklprado added the enhancement label Feb 9, 2016

piklprado added this to the sads 0.3.0 milestone Feb 9, 2016

andrechalom added a commit that referenced this issue Feb 11, 2016

Checks for positive integers (fixed #120)

02f09a6

piklprado closed this as completed May 1, 2016