larger training set deviance for smaller values of reg_lambda: bug in convergence criterion? #65
The last plot just looks like a zoomed version of the first plot to me. Did you try other metrics like RMSE or so?
pavanramkumar changed the title from "convergence bug?" to "larger training set deviance for smaller values of reg_lambda: bug?" on May 11, 2016
pavanramkumar changed the title from "larger training set deviance for smaller values of reg_lambda: bug?" to "larger training set deviance for smaller values of reg_lambda: bug in convergence criterion?" on May 11, 2016
renaming a few issue titles for clarity
I think this is related to #226. You just need to use a smaller
@pavanramkumar @hugoguh this seems to be fixed now with latest code. Just run this:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM, simulate_glm

np.random.seed(42)

# regularization path
reg_lambda = np.exp(np.linspace(-10, -3, num=100))
n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)[0]
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())[:, 0]

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = simulate_glm('poisson', beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = simulate_glm('poisson', beta0, beta, Xt)

# fit a Generalized Linear Model for each reg_lambda and record the training deviance
scaler = StandardScaler().fit(Xr)
dev_t = list()
for rl in reg_lambda:
    glm = GLM(distr='poisson', verbose=True, alpha=0.95, reg_lambda=rl)
    glm.fit(scaler.transform(Xr), yr)
    dev_t.append(glm.score(scaler.transform(Xr), yr))

# plot training deviance against log(reg_lambda)
import matplotlib.pyplot as plt
upto = 60
plt.plot(np.log(reg_lambda[0:upto]), dev_t[0:upto], '-o', c='k')
plt.xlabel('log(Lambda)')
plt.ylabel('Poisson Deviance')
plt.show()

I get: [plot]

can you close if you agree?
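As a quick follow-up to the snippet above (and to the RMSE question in the first comment), here is a small continuation sketch. It reuses reg_lambda, dev_t, glm, scaler, Xr, and yr from that snippet; the monotonicity check and the RMSE computation are illustrative additions, not part of the original report, and they assume glm.predict returns the predicted conditional mean:

# illustrative continuation of the snippet above
dev_t = np.asarray(dev_t)

# reg_lambda is sorted in increasing order, so the training deviance should be
# non-decreasing along dev_t if the fits behave as expected
print('deviance non-decreasing in reg_lambda:', np.all(np.diff(dev_t) >= -1e-8))

# RMSE of the last fitted model (largest reg_lambda) on the training data,
# as an alternative metric to the deviance
yr_hat = glm.predict(scaler.transform(Xr))
print('training RMSE:', np.sqrt(np.mean((yr - yr_hat) ** 2)))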
I noticed it while working on cvpyglmnet.
Notice that there is a weird behavior in the training performance: the training set deviance is supposed to always go down as reg_lambda approaches zero (or as log(Lambda) becomes more negative). Here's some code for you to replicate it:
It doesn't always happen; for instance, try using np.random.seed(0). The code does the following (a short sketch of the average-deviance point follows this list):
- fit and compute the deviance
- use the average deviance: just a scalar difference, but a better measure when cross-validating because different folds might have different numbers of elements (use the commented line for the total deviance)
- now plot
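A minimal sketch of that average-vs-total deviance point, using the standard Poisson deviance formula. The y and mu_hat values below are made up for illustration and are not outputs of the code in this issue:

import numpy as np

def poisson_deviance(y, mu):
    # elementwise y * log(y / mu), with the convention 0 * log(0) = 0
    with np.errstate(divide='ignore', invalid='ignore'):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

# made-up observations and predicted Poisson means
y = np.array([0., 1., 3., 2.])
mu_hat = np.array([0.5, 1.2, 2.5, 2.0])

total_dev = poisson_deviance(y, mu_hat)
avg_dev = total_dev / len(y)  # same quantity up to a scalar, but comparable across folds of different sizes
print(total_dev, avg_dev)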
Here's the output: [plot]
Play around with the range of reg_lambdas and also try with other random.seed values (different simulated Xr and yr). Any idea where it is coming from? At first I thought it was a warm-start effect, but @pavanramkumar says it starts by fitting the larger reg_lambdas. This might give a hint: it seems not to depend on the actual value of reg_lambda. When it happens, it seems to always happen towards the end; see what happens if I use a slightly different range of reg_lambdas when instantiating the model: [plot]
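To make the warm-start point above concrete, here is a minimal NumPy sketch of a warm-started regularization path for an L2-penalized Poisson GLM, fit from the largest penalty to the smallest. This is only a conceptual illustration, not pyglmnet's solver; the gradient-descent step size, tolerance, and update-based stopping rule are assumptions made for the sketch:

import numpy as np

def fit_poisson_path(X, y, reg_lambdas, lr=1e-3, max_iter=1000, tol=1e-6):
    # fit an L2-penalized Poisson GLM for each penalty, warm-starting each fit
    # from the previous solution, going from the largest penalty to the smallest
    n, p = X.shape
    beta0, beta = 0.0, np.zeros(p)  # carried across penalties (the warm start)
    path = []
    for rl in sorted(reg_lambdas, reverse=True):
        for _ in range(max_iter):
            mu = np.exp(beta0 + X @ beta)          # Poisson mean with log link
            g0 = np.mean(mu - y)                   # gradient w.r.t. the intercept
            g = X.T @ (mu - y) / n + rl * beta     # gradient w.r.t. the weights
            beta0 -= lr * g0
            beta -= lr * g
            # simple convergence criterion: stop once the update becomes tiny
            if lr * max(abs(g0), np.max(np.abs(g))) < tol:
                break
        path.append((rl, beta0, beta.copy()))
    return path

With warm starts like this, the fits for the smallest penalties begin from an already nearly converged solution, so an update-size stopping rule can trigger almost immediately; whether that interacts with the reported deviance is the kind of question the issue title raises.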