Hello, I have a question: why does the gradient descent algorithm in the book output the average of the $T$ iterates? In practice, isn't the last iterate $\omega_T$ used as the final result?
Thanks for the question @pppooo332. For gradient descent on a (general) convex function, the step size $\eta$ we choose is heuristic, so the iterate $\omega'$ produced at each step is not guaranteed to be a locally optimal solution. By the conclusion of Theorem 7.1, the average of the $T$ iterates $\omega$ enjoys a sublinear convergence rate, whereas we cannot prove a comparable rate for the last iterate $\omega_T$ alone. In short, returning the average of the iterates may add some computational cost, but it guarantees a stable convergence rate. The same idea also appears in the gradient descent algorithms of Sections 7.3.1 and 7.3.2.
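To make this concrete, here is a minimal Python sketch of the averaged-iterate scheme. The function name `gd_convex_average`, the fixed step size `eta`, and the $\ell_1$ objective in the usage line are illustrative assumptions of mine, not the book's exact algorithm:

```python
import numpy as np

def gd_convex_average(grad, w0, eta, T):
    """Gradient descent on a convex f with a heuristic fixed step size.

    Returns the average (1/T) * sum_{t=1}^{T} w_t, which is the quantity
    that the sublinear rate of Theorem 7.1 certifies; no comparable
    guarantee is proved for the last iterate w_T alone.
    """
    w = np.asarray(w0, dtype=float)
    avg = np.zeros_like(w)
    for _ in range(T):
        w = w - eta * grad(w)  # eta is heuristic, e.g. on the order of 1/sqrt(T)
        avg += w
    return avg / T

# Usage: minimize the convex, non-smooth f(w) = ||w||_1 via its subgradient.
# The last iterate oscillates around the optimum, but the average settles down.
w_bar = gd_convex_average(grad=np.sign, w0=np.array([3.0, -2.0]),
                          eta=0.1, T=1000)
```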
By contrast, the gradient descent algorithm for strongly convex functions in Section 7.2.2 outputs only the last iterate $\omega_T$. This is because, under strong convexity, each gradient update has the closed form $\omega_{t+1}=\omega_t-\frac{1}{\gamma}\nabla f(\omega_t)$: every iteration reaches the global optimum within its neighborhood without any heuristics, which is also why this algorithm attains the faster (linear) convergence rate. Hence there is no need to return the average of the past iterates $\omega$.
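For comparison, a sketch of the strongly convex case with the fixed step $\frac{1}{\gamma}$ from the update above. The name `gd_strongly_convex` and the quadratic test function are again hypothetical choices for illustration:

```python
import numpy as np

def gd_strongly_convex(grad, w0, gamma, T):
    """Gradient descent with the closed-form step 1/gamma.

    Under strong convexity the convergence rate is linear, so only
    the last iterate w_T is returned; averaging is unnecessary.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        w = w - grad(w) / gamma  # closed-form update, no heuristic step size
    return w

# Usage: f(w) = ||w||^2 has gradient 2w and smoothness constant gamma = 2;
# here the update w - (2w)/2 = 0 even reaches the optimum w* = 0 in one step.
w_T = gd_strongly_convex(grad=lambda w: 2.0 * w,
                         w0=np.array([3.0, -2.0]), gamma=2.0, T=50)
```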