Deep Learning Resources
Online Courses
- Andrew Ng’s Machine-Learning Class on Coursera
- Geoff Hinton’s Neural Networks Class on Coursera (2012)
- U. Toronto: Introduction to Neural Networks (2015)
- Yann LeCun’s NYU Course
- Ng’s Lecture Notes for Stanford’s CS229 Machine Learning
- Nando de Freitas’s Deep Learning Class at Oxford (2015)
- Andrej Karpathy’s Convolutional Neural Networks Class at Stanford
- Patrick Winston’s Introduction to Artificial Intelligence
- Richard Socher’s Deep Learning for NLP course
- Machine Learning and Probabilistic Graphical Models
- Bhiksha Raj’s “Deep Learning” @CMU
- Sebastian Thrun’s “Artificial Intelligence and Robotics”
- Caltech’s Learning From Data ML Course
- Deep Learning Course at Udacity; Vincent Vanhoucke
Deep- and Machine-Learning Fora
- Google+ Deep Learning Group
- KDNuggets: Data Science Hub
- Datatau: Hacker News for Data Science
- r/MachineLearning
- Deeplearning.net: A Portal for Theano/Pylearn2
- Gitter Channel for Deeplearning4j
Get Started With Deeplearning4j
Reinforcement Learning
Academic Papers and Other Writings
Deep Learning Book; Ian Goodfellow, Yoshua Bengio and Aaron Courville; MIT Press
Understanding LSTMs; Christopher Olah
Semantic Compositionality through Recursive Matrix-Vector Spaces; Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng; Computer Science Department, Stanford University
Deep learning of the tissue-regulated splicing code; Michael K. K. Leung, Hui Yuan Xiong, Leo J. Lee and Brendan J. Frey
The human splicing code reveals new insights into the genetic determinants of disease; Hui Y. Xiong et al.
Notes on AdaGrad; Chris Dyer; School of Computer Science, Carnegie Mellon University
Adaptive Step-Size for Online Temporal Difference Learning; William Dabney and Andrew G. Barto; University of Massachusetts Amherst
Practical Recommendations for Gradient-Based Training of Deep Architectures; Yoshua Bengio; 2012
Greedy Layer-Wise Training of Deep Networks; Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle; Université de Montréal
Notes on Convolutional Neural Networks; Jake Bouvrie; Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
Natural Language Processing (Almost) from Scratch; Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu and Pavel Kuksa; NEC Laboratories America
Unsupervised Feature Learning Via Sparse Hierarchical Representations; Honglak Lee; Stanford University; August 2010
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations; Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng; Computer Science Department, Stanford University, Stanford
Deep Belief Networks for phone recognition; Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton; Department of Computer Science, University of Toronto
Reducing the Dimensionality of Data with Neural Networks; G. E. Hinton and R. R. Salakhutdinov; Science, vol. 313, 28 July 2006
Using Very Deep Autoencoders for Content-Based Image Retrieval; Alex Krizhevsky and Geoffrey E. Hinton; University of Toronto, Dept of Computer Science
Analysis of Recurrent Neural Networks with Application to Speaker Independent Phoneme Recognition; Esko O. Dijk; University of Twente, Department of Electrical Engineering
A fast learning algorithm for deep belief nets; Geoffrey E. Hinton and Simon Osindero, Department of Computer Science, University of Toronto; Yee-Whye Teh, Department of Computer Science, National University of Singapore
Learning Deep Architectures for AI; Yoshua Bengio; Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009)
An Analysis of Gaussian-Binary Restricted Boltzmann Machines for Natural Images; Nan Wang, Jan Melchior and Laurenz Wiskott; Institut für Neuroinformatik and International Graduate School of Neuroscience
IPAM Summer School 2012 Tutorial on: Deep Learning; Geoffrey Hinton; Canadian Institute for Advanced Research & Department of Computer Science, University of Toronto
A Practical Guide to Training Restricted Boltzmann Machines; Geoffrey Hinton; Department of Computer Science, University of Toronto
Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent; Feng Niu, Benjamin Recht, Christopher Re and Stephen J. Wright; Computer Sciences Department, University of Wisconsin-Madison
Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines; KyungHyun Cho, Alexander Ilin, and Tapani Raiko; Department of Information and Computer Science, Aalto University School of Science, Finland
Rectified Linear Units Improve Restricted Boltzmann Machines; Vinod Nair and Geoffrey E. Hinton; Department of Computer Science, University of Toronto
Iris Data Analysis Using Back Propagation Neural Networks; Sean Van Osselaer; Murdoch University, Western Australia
Distributed Training Strategies for the Structured Perceptron; Ryan McDonald, Keith Hall and Gideon Mann; Google
Large Scale Distributed Deep Networks; Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang and Andrew Y. Ng; Google
Learning meanings for sentences; Charles Elkan; University of California San Diego; 2013
Lecture 17: Linear Gaussian Models; Kevin Murphy; University of British Columbia; 17 November 2004
Efficient Backprop; Yann LeCun, Léon Bottou, Genevieve B. Orr and Klaus-Robert Müller; various institutions
Deep Learning for NLP (without magic); Richard Socher and Christopher Manning; Stanford University
Deep Neural Networks for Object Detection; Christian Szegedy, Alexander Toshev and Dumitru Erhan; Google
Deep Learning: Methods And Applications; Li Deng and Dong Yu; Microsoft Research
Numerical Optimization; Jorge Nocedal and Stephen J. Wright; Springer
Neural Networks for Named-Entity Recognition; Richard Socher; Programming Assignment 4, CS 224N; Dec. 5th, 2012
Large Scale Deep Learning; Quoc V. Le; Google & Carnegie Mellon University; MLconf 2013
Deep Learning Made Easier by Linear Transformations in Perceptrons; Tapani Raiko, Harri Valpola and Yann LeCun; Aalto University and New York University
Training Restricted Boltzmann Machines on Word Observations; George E. Dahl, Ryan P. Adams and Hugo Larochelle; University of Toronto, Harvard University and Université de Sherbrooke
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks; Nicolas Le Roux and Yoshua Bengio; Université de Montréal
Robust Boltzmann Machines for Recognition and Denoising; Yichuan Tang, Ruslan Salakhutdinov and Geoffrey Hinton; University of Toronto
Semantic hashing; Ruslan Salakhutdinov and Geoffrey Hinton; Department of Computer Science, University of Toronto
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank; Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts; Stanford University
Opinion Mining and Sentiment Analysis; Bo Pang and Lillian Lee; Yahoo Research; Foundations and Trends in Information Retrieval
Sparse autoencoder: CS294A Lecture notes; Andrew Ng; Stanford University
Deep Sparse Rectifier Neural Networks; Xavier Glorot, Antoine Bordes and Yoshua Bengio; University of Montreal
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks; Matthew D. Zeiler and Rob Fergus; Courant Institute, New York University
Symmetry breaking in non-monotonic neural nets; G. Boffetta, R. Monasson and R. Zecchina; Journal of Physics A: Mathematical and General
Phone Recognition Using Restricted Boltzmann Machines; Abdel-rahman Mohamed and Geoffrey Hinton; University of Toronto
Why Does Unsupervised Pre-training Help Deep Learning?; Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent and Samy Bengio; Université de Montréal and Google Research
Visually Debugging Restricted Boltzmann Machine Training with a 3D Example; Jason Yosinski and Hod Lipson; Cornell University
Efficient Estimation of Word Representations in Vector Space; Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean; Google
Exploiting Similarities among Languages for Machine Translation; Tomas Mikolov, Quoc V. Le, Ilya Sutskever; Google
word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method; Yoav Goldberg and Omer Levy (a minimal SGNS update step is sketched after this list)
A Few Useful Things to Know about Machine Learning; Pedro Domingos, University of Washington
A Neural Conversational Model; Oriol Vinyals and Quoc Le, Google
On Chomsky and the Two Cultures of Statistical Learning; Peter Norvig
Geometry of the restricted Boltzmann machine; Maria Angelica Cueto, Jason Morton, Bernd Sturmfels
Untersuchungen zu dynamischen neuronalen Netzen (Studies on Dynamic Neural Networks); Josef “Sepp” Hochreiter, advised by Jürgen Schmidhuber
Notes on Contrastive Divergence
Transition-Based Dependency Parsing with Stack Long Short-Term Memory; Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith
How transferable are features in deep neural networks?; Jason Yosinski, Jeff Clune, Yoshua Bengio and Hod Lipson
Learning Internal Representations by Error Propagation; Rumelhart, Hinton and Williams
Backpropagation Through Time: What It Does and How to Do It; Paul Werbos
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation; Cho et al.
Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises; James L. McClelland
Memory Networks & QA Systems; Jason Weston, Sumit Chopra & Antoine Bordes (2014)
Understanding Machine Learning: From Theory to Algorithms; Shai Shalev-Shwartz and Shai Ben-David
Reinforcement Learning: An Introduction; Richard Sutton and Andrew Barto
Algorithms for Reinforcement Learning; Csaba Szepesvári
Playing Atari with Deep Reinforcement Learning; Volodymyr Mnih et al.
The Markov Chain Monte Carlo Revolution; Persi Diaconis
An Introduction to MCMC for Machine Learning; Christophe Andrieu, Nando de Freitas, Arnaud Doucet and Michael I. Jordan
Continuous control with deep reinforcement learning; Timothy P. Lillicrap et al.; DeepMind
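The word2vec entries above (and the Goldberg & Levy derivation in particular) center on skip-gram with negative sampling. Below is a minimal sketch of one SGNS update; the vocabulary size, dimensionality, learning rate and uniform noise sampler are illustrative assumptions, not word2vec’s actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K, lr = 1000, 50, 5, 0.025      # vocab size, dimension, negatives, step size: all illustrative
W_in  = rng.normal(0, 0.01, (V, D))   # center-word ("input") vectors
W_out = rng.normal(0, 0.01, (V, D))   # context-word ("output") vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_step(center, context):
    """One stochastic gradient step for a single (center, context) pair."""
    v = W_in[center]
    grad_v = np.zeros(D)
    # word2vec draws negatives from a smoothed unigram distribution and skips
    # collisions with the true context; uniform sampling is used here for brevity.
    negatives = rng.integers(0, V, size=K)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(v @ u) - label    # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[word] -= lr * g * v
    W_in[center] -= lr * grad_v       # push v toward the context, away from the noise

sgns_step(center=3, context=17)
```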
Using Neural Networks for Modeling and Representing Natural Languages
Thought Vectors
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank; Socher et al. 2013. Introduces the Recursive Neural Tensor Network. Uses a parse tree.
Distributed Representations of Sentences and Documents; Le & Mikolov 2014. Introduces Paragraph Vector, also known as paragraph2vec. Concatenates and averages pretrained, fixed word vectors to create vectors for sentences, paragraphs and documents. Doesn’t use a parse tree. (A minimal averaging sketch follows this list.)
Deep Recursive Neural Networks for Compositionality in Language; Irsoy & Cardie 2014. Uses Deep Recursive Neural Networks. Uses a parse tree.
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks; Tai et al. 2015. Introduces the Tree LSTM. Uses a parse tree.
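To make the Paragraph Vector entry’s averaging scheme concrete, here is a minimal sketch that builds a document vector from pretrained, fixed word vectors. The `embeddings` dict and the 300-dimension default are hypothetical, and the real paragraph2vec trains its paragraph vectors jointly rather than only averaging:

```python
import numpy as np

def doc_vector(tokens, embeddings, dim=300):
    """Average the fixed word vectors of all in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)          # no known words: fall back to the zero vector
    return np.mean(vecs, axis=0)      # element-wise mean over the word vectors

def cosine(a, b):
    """Cosine similarity, e.g. for ranking documents against a query vector."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```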
Dialog
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses; Sordoni et al. 2015. Generates responses to tweets. Uses the Recurrent Neural Network Language Model (RLM) architecture of Mikolov et al. (2010).
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks; Weston et al. 2015. Classifies QA tasks. Expands on Memory Networks.
A Neural Conversational Model; Vinyals & Le 2015. Uses LSTM RNNs to generate conversational responses in the seq2seq framework. (See the sketch after this list.)
A Tutorial on Support Vector Machines for Pattern Recognition; Christopher J. C. Burges
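The Vinyals & Le entry above rests on the seq2seq pattern: encode the input utterance into a hidden state, then decode a reply token by token from that state. The sketch below shows only that control flow, with a plain RNN cell, random untrained weights and greedy decoding standing in for the paper’s trained LSTMs and beam search; the vocabulary and sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 12, 16                         # toy vocabulary size and hidden size
E   = rng.normal(0, 0.1, (V, H))      # token embeddings
Wxh = rng.normal(0, 0.1, (H, H))      # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))      # hidden-to-hidden weights
Why = rng.normal(0, 0.1, (H, V))      # hidden-to-vocabulary projection
EOS = 0                               # end-of-sequence token, also used as the GO symbol

def step(h, tok):
    """One plain-RNN step; the paper uses an LSTM cell here instead."""
    return np.tanh(E[tok] @ Wxh + h @ Whh)

def respond(src_tokens, max_len=20):
    h = np.zeros(H)
    for t in src_tokens:              # encode: fold the whole input into h
        h = step(h, t)
    out, tok = [], EOS                # decode: feed each output back in as the next input
    for _ in range(max_len):
        h = step(h, tok)
        tok = int(np.argmax(h @ Why)) # greedy decoding; beam search is the usual upgrade
        if tok == EOS:
            break
        out.append(tok)
    return out

print(respond([3, 5, 7]))             # prints some token ids (weights are untrained)
```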
Advanced Memory Architectures
Neural Turing Machines; Graves et al. 2014.
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets; Joulin & Mikolov 2015. Stack RNN source code is available. (A soft-stack update is sketched below.)
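The key idea in the stack-augmented RNN paper is a stack made differentiable by blending the results of PUSH, POP and NO-OP under soft action weights emitted by the controller. Here is a minimal sketch of that soft update; the depth, scalar cells and hand-picked logits are illustrative and not taken from the released Stack RNN code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def stack_update(stack, action_logits, push_val):
    """Soft-update a stack of scalar cells: mix PUSH, POP and NO-OP outcomes."""
    a_push, a_pop, a_noop = softmax(action_logits)     # controller's action weights
    pushed = np.concatenate(([push_val], stack[:-1]))  # shift down, write the new top
    popped = np.concatenate((stack[1:], [0.0]))        # shift up, pad the bottom
    return a_push * pushed + a_pop * popped + a_noop * stack

stack = np.zeros(8)
stack = stack_update(stack, np.array([2.0, -1.0, 0.0]), push_val=1.0)
print(stack[0])   # the top is now mostly the pushed value
```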
Researchers’ Personal Websites
Linear Algebra Resources
- Andrew Ng’s 6-Part Review of Linear Algebra
- Linear Algebra for Machine Learning; Patrick van der Smagt
- Khan Academy’s Linear Algebra Course
- CMU’s Linear Algebra Review
- The Matrix Cookbook
- Old and New Matrix Algebra Useful for Statistics
- Math for Machine Learning
- Immersive Linear Algebra
Other Resources
- Open Data for Deep Learning
- Machine Learning: Generative and Discriminative Models (PowerPoint); Sargur N. Srihari
- Neural Networks Demystified (A seven-video series)
- A Neural Network in 11 Lines of Python
- A Step-by-Step Backpropagation Example (a minimal worked example follows this list)
- Generative Learning algorithms; Notes by Andrew Ng
- Calculus on Computational Graphs: Backpropagation
- Understanding LSTM Networks
- Probability Cheatsheet
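In the spirit of the backpropagation walkthroughs linked above, here is a minimal two-layer network trained on XOR with hand-derived gradients; the layer sizes, learning rate and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)     # input -> hidden layer
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)     # hidden -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)          # forward pass: hidden activations
    p = sigmoid(h @ W2 + b2)          # forward pass: predictions
    dp = (p - y) * p * (1 - p)        # backward pass: output delta (squared-error loss)
    dh = (dp @ W2.T) * h * (1 - h)    # backward pass: hidden delta via the chain rule
    W2 -= 0.5 * h.T @ dp;  b2 -= 0.5 * dp.sum(axis=0)   # gradient-descent updates
    W1 -= 0.5 * X.T @ dh;  b1 -= 0.5 * dh.sum(axis=0)

print(np.round(p, 2))                 # should approach [0, 1, 1, 0]
```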