
TypeError: __init__() got an unexpected keyword argument 'normalize' #182

Closed
Y-Isaac opened this issue Feb 29, 2024 · 6 comments · Fixed by #183

Comments

Y-Isaac commented Feb 29, 2024

Hi,

When I use polyfun.py to re-estimate per-SNP heritabilities via S-LDSC, I get the following error:

[INFO] Reading summary statistics from /public/home/P202306/polyfun_test/summary/pheno1_munged.parquet ...
[INFO] Read summary statistics for 13087844 SNPs.
[INFO] Reading reference panel LD Score from /public/home/P202306/polyfun_test/output/pheno1/pheno1.[1-22] ...
[INFO] Read reference panel LD Scores for 13156184 SNPs.
[INFO] Reading regression weight LD Score from /public/home/P202306/polyfun_test/ldscore/weight/[1-22] ...
[INFO] Read regression weight LD Scores for 13156184 SNPs.
[INFO] After merging with reference panel LD, 13087844 SNPs remain.
[INFO] After merging with regression SNP LD, 13087844 SNPs remain.
[INFO] Removed 183 SNPs with chi^2 > 431.334 (13087661 SNPs remain)
Traceback (most recent call last):
  File "/public/home/P202306/software/polyfun/polyfun.py", line 849, in <module>
    polyfun_obj.polyfun_main(args)
  File "/public/home/P202306/software/polyfun/polyfun.py", line 780, in polyfun_main
    self.compute_h2_bins(args, constrain_range=True)
  File "/public/home/P202306/software/polyfun/polyfun.py", line 757, in compute_h2_bins
    self.run_ldsc(args, use_ridge=False, nn=True, evenodd_split=True, keep_large=False)
  File "/public/home/P202306/software/polyfun/polyfun.py", line 217, in run_ldsc
    hsqhat = regressions.Hsq(chisq,
  File "/public/home/P202306/software/polyfun/ldsc_polyfun/regressions.py", line 401, in __init__
    LD_Score_Regression.__init__(self, y, x, w, N, M, n_blocks, intercept=intercept,
  File "/public/home/P202306/software/polyfun/ldsc_polyfun/regressions.py", line 243, in __init__
    jknife = jk.LstsqJackknifeSlow(x, y, is_large_chi2, n_blocks, evenodd_split=evenodd_split, nn=True, chr_num=chr_num, nnls_exact=nnls_exact)
  File "/public/home/P202306/software/polyfun/ldsc_polyfun/jackknife.py", line 267, in __init__
    lasso = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx, positive=True, max_iter=10000, random_state=0)
TypeError: __init__() got an unexpected keyword argument 'normalize'

And this is my command, in case it's helpful:

python ~/software/polyfun/polyfun.py \
    --compute-h2-bins \
    --output-prefix /public/home/P202306/polyfun_test/output/pheno1/pheno1 \
    --sumstats /public/home/P202306/polyfun_test/summary/pheno1_munged.parquet \
    --w-ld-chr /public/home/P202306/polyfun_test/ldscore/weight/

jdblischak (Contributor) commented:

lasso = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx, positive=True, max_iter=10000, random_state=0)
TypeError: __init__() got an unexpected keyword argument 'normalize'

It looks like the Lasso function no longer has the argument normalize. This is likely a version difference. Could you please try running your code in the locked conda env, polyfun.yml.lock, which is known to work with the polyfun scripts:

mamba create --name polyfun --file polyfun.yml.lock
conda activate polyfun

Y-Isaac (Author) commented Mar 1, 2024

@jdblischak Hi,

Following your advice, I configured the environment based on the file polyfun.yml.lock. Since mamba is not available on my server, I used the following command to create the environment, hoping it would serve the same purpose:

conda create --name polyfun-lock --file polyfun.yml.lock

Unfortunately, this doesn't seem to have worked, as I encountered the same error message again. I checked the version of the sklearn package in both environments: in polyfun-lock it is 1.2.2, while in the polyfun environment it is 1.3.2, yet both produced the same error.
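(In case it helps, this is the check I mean, run inside each activated environment:)

import sklearn
print(sklearn.__version__)  # prints 1.2.2 in polyfun-lock, 1.3.2 in polyfun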

All in all, thank you very much for your help!

Y-Isaac (Author) commented Mar 1, 2024

I attempted to remove the normalize=False parameter from lines 267 and 295 of the jackknife.py script (I'm not certain this was a sound change; it was only a trial), and this time it worked. I reviewed the prior-probability results for chromosome 22, which range from 7.19e-7 down to 9.09e-9. Compared to the example file, this result seems normal.
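Concretely, the change on both lines was just to drop that one keyword argument (shown here as a sketch, not necessarily the right long-term fix):

# before (fails on scikit-learn >= 1.2, where `normalize` was removed):
lasso = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx,
              positive=True, max_iter=10000, random_state=0)

# after (my trial edit: simply dropping the argument):
lasso = Lasso(alpha=1e-100, fit_intercept=False, precompute=xtx,
              positive=True, max_iter=10000, random_state=0)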

I eagerly look forward to your guidance on what I should do next. I hope you have a pleasant day!

jdblischak (Contributor) commented:

I investigated the argument normalize. Turns out it was deprecated in scikit-learn 1.0.0!

API Change: The parameter normalize of linear_model.LinearRegression is deprecated and will be removed in 1.2. Motivation for this deprecation: normalize parameter did not take any effect if fit_intercept was set to False and therefore was deemed confusing. The behavior of the deprecated LinearModel(normalize=True) can be reproduced with a Pipeline with LinearModel (where LinearModel is LinearRegression, Ridge, RidgeClassifier, RidgeCV or RidgeClassifierCV) as follows: make_pipeline(StandardScaler(with_mean=False), LinearModel()). The normalize parameter in LinearRegression was deprecated in #17743 by Maria Telenczuk and Alexandre Gramfort. Same for Ridge, RidgeClassifier, RidgeCV, and RidgeClassifierCV, in: #17772 by Maria Telenczuk and Alexandre Gramfort. Same for BayesianRidge, ARDRegression in: #17746 by Maria Telenczuk. Same for Lasso, LassoCV, ElasticNet, ElasticNetCV, MultiTaskLasso, MultiTaskLassoCV, MultiTaskElasticNet, MultiTaskElasticNetCV, in: #17785 by Maria Telenczuk and Alexandre Gramfort.
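For completeness, the changelog's suggested replacement for the old normalize=True behavior looks roughly like this (not actually needed here, since polyfun passes normalize=False together with fit_intercept=False, where the argument had no effect):

from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# old (removed in scikit-learn 1.2): Lasso(normalize=True, ...)
# new: scale the features without centering, then fit the model
model = make_pipeline(StandardScaler(with_mean=False), Lasso(alpha=1e-100, fit_intercept=False))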

So now I am confused about why this wasn't caught before. We do test polyfun.py --compute-h2-bins:

polyfun/test_polyfun.py, lines 144 to 146 in 00afe71:

#polyfun stage 4
cmd = '%s %s --compute-h2-bins --output-prefix %s --sumstats %s --w-ld-chr %s --nnls-exact'% \
(python3_exe, script_exe, output_prefix, sumstats_file, w_ld_prefix)

Ah, it's because of the flag --nnls-exact used in the test. That bypasses the call to Lasso():

#estimate taus
if nn: # non-negative least squares
    if nnls_exact:
        self.est = np.atleast_2d(nnls(x, np.array(y).T[0])[0])
    else:
        xtx = x.T.dot(x)
        lasso = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx, positive=True, max_iter=10000, random_state=0)
        self.est = lasso.fit(x, y[:,0]).coef_.reshape((1, x.shape[1]))
else:
    self.est = np.atleast_2d(np.linalg.lstsq(x, np.array(y).T[0])[0])

if nnls_exact:
    jk_est = np.atleast_2d(nnls(x_noblock, y_noblock[:,0])[0])
else:
    x_block = x[s[i] : s[i+1]]
    xtx_noblock = xtx - x_block.T.dot(x_block)
    lasso_noblock = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx_noblock, positive=True, max_iter=10000, random_state=0)
    jk_est = lasso_noblock.fit(x_noblock, y_noblock[:,0]).coef_.reshape((1, x.shape[1]))
    ###z = nnls(x_noblock, y_noblock[:,0])[0]
    ###assert np.allclose(z, jk_est[0])
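One way to keep this code working across scikit-learn versions (just a sketch of one possible approach, not a claim about what the eventual fix should be) would be to pass normalize only when the installed version still accepts it:

from sklearn.linear_model import Lasso

# xtx, x and y stand for the same objects as in the jackknife.py excerpt above
kwargs = dict(alpha=1e-100, fit_intercept=False, precompute=xtx,
              positive=True, max_iter=10000, random_state=0)
try:
    # scikit-learn < 1.2 still accepts `normalize` (deprecated since 1.0);
    # with fit_intercept=False it never had any effect anyway
    lasso = Lasso(normalize=False, **kwargs)
except TypeError:
    # scikit-learn >= 1.2 removed the argument entirely
    lasso = Lasso(**kwargs)
est = lasso.fit(x, y[:,0]).coef_.reshape((1, x.shape[1]))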

Y-Isaac (Author) commented Mar 2, 2024

@jdblischak ohh, I get it, thanks for your help! Now I'm going to close this issue.

Y-Isaac closed this as completed Mar 2, 2024
omerwe (Owner) commented Mar 3, 2024

@Y-Isaac thanks for flagging this! I've accepted pull request #183, so the problem should be fixed for everyone now (thanks @jdblischak!)
