[Chapter 7: Convergence Rate] Call for formulas needing derivation or explanation + Q&A thread #8

Open
youngfish42 opened this issue Oct 27, 2020 · 2 comments

Comments

@youngfish42
Member

No description provided.

@pppooo332

Hello, I'd like to ask: why does the gradient descent algorithm in the book output the average of the iterates over the $T$ rounds? In practice, isn't $\omega_T$ taken as the final result?
[screenshot: QQ截图20211012105919]

@zhimin-z
Collaborator

zhimin-z commented May 28, 2023

> Hello, I'd like to ask: why does the gradient descent algorithm in the book output the average of the iterates over the $T$ rounds? In practice, isn't $\omega_T$ taken as the final result? [screenshot: QQ截图20211012105919]

Thanks for the question @pppooo332. The reason is that, for gradient descent on a (general) convex function, the step size $\eta$ is chosen heuristically, so the iterate $\omega'$ produced at each step cannot be guaranteed to be a local optimum. By the conclusion of Theorem 7.1, the average of $\omega$ over the $T$ iterations enjoys a sublinear convergence rate, whereas we cannot prove that the last iterate $\omega_T$ attains a comparable rate. In short, returning the average of the iterates may add a little computational cost, but it guarantees a stable convergence rate. The same idea also appears in the gradient descent algorithms of Sections 7.3.1 and 7.3.2.
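To make this concrete, here is a minimal Python sketch of the averaged-iterate variant discussed above. It is only an illustration: the function name `gd_convex_averaged`, the gradient callback `grad_f`, the fixed step size `eta`, and the quadratic example are my own assumptions, not from the book.

```python
import numpy as np

def gd_convex_averaged(grad_f, w0, eta, T):
    """Gradient descent on a convex objective with a heuristically chosen,
    fixed step size `eta`; returns the average of the T iterates, which is
    the quantity covered by the sublinear rate of Theorem 7.1."""
    w = np.asarray(w0, dtype=float)
    w_sum = np.zeros_like(w)
    for _ in range(T):
        w = w - eta * grad_f(w)   # heuristic step; w need not be a local optimum
        w_sum += w
    return w_sum / T              # output the averaged iterate, not w_T

# Illustrative example: f(w) = ||w - 3||^2, so grad_f(w) = 2 * (w - 3)
w_bar = gd_convex_averaged(lambda w: 2 * (w - 3.0), w0=[0.0], eta=0.1, T=100)
```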

By contrast, in the gradient descent algorithm for strongly convex functions in Section 7.2.2, only the last iterate $\omega_T$ is output. This is because, under strong convexity, the gradient update at each iteration has a closed form: $\omega_{t+1}=\omega_t-\frac{1}{\gamma}\nabla f(\omega_t)$. Each iteration reaches the optimum within its neighborhood without any heuristic tuning, which is also why this algorithm achieves a faster (linear) convergence rate. Hence there is no need to return the average of the past iterates.
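Correspondingly, a minimal sketch of the strongly convex case, where only the last iterate is returned. Again this is an assumed illustration: `gamma` stands in for the parameter $\gamma$ in the closed-form update above, and the helper name is hypothetical.

```python
import numpy as np

def gd_strongly_convex_last(grad_f, w0, gamma, T):
    """Gradient descent for a strongly convex objective, using the
    closed-form update w_{t+1} = w_t - (1/gamma) * grad f(w_t);
    the last iterate w_T already converges at a linear rate, so no
    averaging of past iterates is needed."""
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        w = w - grad_f(w) / gamma   # no heuristic step-size tuning required
    return w                        # return w_T only
```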
