The ML@B Deep Learning Reading Group is organized by William Guss and Professor {TBD sponsoring professor}, with the aim of educating and informing its members about the state of the art in deep learning. The group meets for two hours a week in {TBD} and is currently limited to {TBD num members} members. The schedule and suggested readings below are open to the public, as we hope to act as an aggregator of deep learning research for the UC Berkeley community.
Below are the scheduled meetings for this semester and a list of suggested readings. If you plan on presenting (highly encouraged), please select a paper from the list and email wguss [at] berkeley [dot] edu, or submit a pull request adding yourself to the table!
(Prior to presenting, please coordinate with Gianna (gianna [at] berkeley [dot] edu) at least 24 hours in advance to confirm the pastry order. Pick it up from Brewed Awakening after 10:30am (it's a catering order under Gianna's name) and bring it to the meeting.)
Date | Presenter | Topic
--- | --- | ---
Jan 27 | all | 5-10 minutes on your research (past or present) or your interests
Feb 3 | Yasin | A Fast and Reliable Policy Improvement Algorithm. Yasin Abbasi-Yadkori, Peter L. Bartlett, and Stephen Wright. Artificial Intelligence and Statistics (AISTATS), 2016.
Feb 10 | Billy | O. Dekel, R. Eldan, T. Koren. [Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff.](http://tx.technion.ac.il/~tomerk/papers/bco58.pdf)
Feb 17 | Will | Yann Dauphin, Razvan Pascanu, Çaglar Gülçehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. [Identifying and attacking the saddle point problem in high-dimensional non-convex optimization.](http://arxiv.org/abs/1406.2572)
Feb 24 | Ben Rubinstein | (12-1pm) Private Bayesian Inference. Abstract: Differential privacy is a leading framework for guaranteeing privacy of data when releasing aggregate statistics or models fit to data. While much is known about privatising many common learning algorithms, and frameworks such as regularised ERM, little work has focused on inference in the Bayesian setting. In that setting, the defender wishes to release a posterior on sensitive data while the untrusted third party is modelled as an adversary wishing to uncover information about the private data given query access to the defender’s release mechanism, full knowledge of the likelihood family, prior, and unbounded computation. I’ll present a natural response mechanism that simply samples from the (non-private) posterior. If either of two assumptions are met, then this mechanism is both robust and differentially private: uniformly-Lipschitz likelihoods, or a prior that concentrates on smooth likelihoods. A selection of results will be presented taken from: bounds on utility, privacy; necessary conditions; examples of common distributions; and specialisation to graphical models and alternate mechanisms which demonstrate the influence of graph structure on privacy. This is joint work with Christos Dimitrakakis, Zuhe Zhang, Katerina Mitrokotsa, Blaine Nelson; papers at ALT’14 (longer version with corrections in submission to JMLR) and AAAI’16.
Mar 2 | Aldo | Learning Polynomials with Neural Networks. Alexandr Andoni, Rina Panigrahy, Gregory Valiant, Li Zhang. Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32.
Mar 9 | Alan | (1 hour: 11-12) COLT paper: Minimax Linear Regression
Mar 16 | **-- no meeting --** | 
Mar 23 | **-- no meeting --** | 
Mar 30 | Sören | Deep Online Convex Optimization by Putting Forecaster to Sleep. D. Balduzzi. [https://dl.dropboxusercontent.com/u/5874168/doco.pdf](https://dl.dropboxusercontent.com/u/5874168/doco.pdf)
Apr 6 | Xiang | The loss surface of multilayer networks. Anna Choromanska, Mikael Henaff, Michaël Mathieu, Gérard Ben Arous, and Yann LeCun.
Apr 13 | Thomas | (1 hour: 11-12) Sparse and spurious: dictionary learning with noise and outliers. Rémi Gribonval, Rodolphe Jenatton, Francis Bach. 2014. [https://hal.inria.fr/hal-01025503v3](https://hal.inria.fr/hal-01025503v3)
Apr 20 | Arturo | [Proximal Algorithms](http://stanford.edu/~boyd/papers/pdf/prox_algs.pdf). N. Parikh and S. Boyd. Foundations and Trends in Optimization, 1(3):123-231, 2014.
Apr 27 | Niladri | Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization. Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon. 2013. [http://arxiv.org/abs/1310.7991](http://arxiv.org/abs/1310.7991)
May 4 | **-- no meeting --** | 
May 11 | **-- no meeting --** | 
May 18 | Walid | On a Natural Dynamics for Linear Programming. Damian Straszak, Nisheeth K. Vishnoi. [http://arxiv.org/abs/1511.07020](http://arxiv.org/abs/1511.07020)
May 25 | | 
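As a toy illustration of the posterior-sampling release mechanism described in the Feb 24 abstract (the defender answers a query by drawing a single sample from the non-private posterior rather than publishing the posterior itself), here is a minimal sketch for a Beta-Bernoulli model. The model choice, function names, and prior parameters are our own illustrative assumptions, not taken from the paper, and the sketch makes no claim about the privacy guarantees proven there.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_sample(x, a=1.0, b=1.0):
    """Release mechanism sketch: draw ONE sample from the Beta(a + s, b + n - s)
    posterior over the Bernoulli parameter, instead of releasing the posterior.
    `x` is the sensitive dataset of 0/1 records; (a, b) is the Beta prior."""
    x = np.asarray(x)
    n, s = x.size, int(x.sum())
    return rng.beta(a + s, b + n - s)

# Sensitive bits held by the defender (illustrative data).
data = np.array([1, 0, 1, 1, 0, 1, 0, 1])

# Each query to the mechanism returns a fresh posterior draw.
theta_hat = posterior_sample(data)
print(theta_hat)
```

An adversary with query access sees only these draws; the intuition in the abstract is that, under smoothness assumptions on the likelihood or prior, neighbouring datasets induce posteriors close enough that single samples leak little about any one record.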
Suggested readings:
- Feedback Stabilization Using Two-Hidden-Layer Nets. Eduardo D. Sontag. IEEE Transactions on Neural Networks, vol. 3, no. 6, 981-, 1992.
- Learning Polynomials with Neural Networks. Alexandr Andoni, Rina Panigrahy, Gregory Valiant, Li Zhang. Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32.
- Deep Online Convex Optimization by Putting Forecaster to Sleep. D. Balduzzi. https://dl.dropboxusercontent.com/u/5874168/doco.pdf
- Provable bounds for learning some deep representations. Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. CoRR, abs/1310.6343, 2013.
- Large-Scale Convex Minimization with a Low-Rank Constraint. Shai Shalev-Shwartz, Alon Gonen, Ohad Shamir. Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011.
- Yann Dauphin, Razvan Pascanu, Çaglar Gülçehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. CoRR, abs/1406.2572, 2014.
- Anna Choromanska, Mikael Henaff, Michaël Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surface of multilayer networks. CoRR, abs/1412.0233, 2014. See also:
  - Complexity of random smooth functions on the high-dimensional sphere. A. Auffinger and G. Ben Arous. ArXiv e-prints, October 2011. arXiv math.PR 1110.5872.
  - Random Matrices and complexity of Spin Glasses. A. Auffinger, G. Ben Arous, and J. Cerny. ArXiv e-prints, March 2010. arXiv math.PR 1003.1129.
- Moritz Hardt, Benjamin Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. CoRR, abs/1509.01240, 2015.
- Matus Telgarsky. Representation benefits of deep feedforward networks. CoRR, abs/1509.08101, 2015.
- Pierre Baldi and Peter J. Sadowski. Understanding dropout. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2814–2822. Curran Associates, Inc., 2013.
- Stefan Wager, Sida Wang, and Percy Liang. Dropout training as adaptive regularization. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 351–359. 2013.
- J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11(1):19–60, 2010. www.di.ens.fr/~fbach/mairal10a.pdf See also:
  - Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon. Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization. 2013. http://arxiv.org/abs/1310.7991
  - Sanjeev Arora, Rong Ge, and Ankur Moitra. New Algorithms for Learning Incoherent and Overcomplete Dictionaries. 2013. http://arxiv.org/abs/1308.6273
  - Alekh Agarwal, Animashree Anandkumar, and Praneeth Netrapalli. Exact Recovery of Sparsely Used Overcomplete Dictionaries. 2013. http://arxiv.org/abs/1309.1952v2
  - Dictionary Learning Algorithms for Sparse Representation. Kenneth Kreutz-Delgado, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, Te-Won Lee, and Terrence J. Sejnowski. Neural Comput. 2003 Feb; 15(2): 349–396. doi: 10.1162/089976603762552951
ADMM and Proximal Algorithms:
- Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Foundations and Trends in Machine Learning, 3(1):1–122, 2011. http://stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
- Proximal Algorithms. N. Parikh and S. Boyd. Foundations and Trends in Optimization, 1(3):123-231, 2014. http://stanford.edu/~boyd/papers/pdf/prox_algs.pdf
- Computational Implications of Reducing Data to Sufficient Statistics. Andrea Montanari. http://arxiv.org/abs/1409.3821
- The power of localization for efficiently learning linear separators with malicious noise. Pranjal Awasthi, Maria-Florina Balcan, and Philip M. Long. CoRR, abs/1307.8371, 2013.
- Cortical prediction markets. D. Balduzzi. 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). https://dl.dropboxusercontent.com/u/5874168/nmarkets.pdf