Deep Learning Resources
Online Courses
- Andrew Ng’s Machine-Learning Class on Coursera
- Geoff Hinton’s Neural Networks Class on Coursera (2012)
- U. Toronto: Introduction to Neural Networks (2015)
- Yann LeCun’s NYU Course
- Ng’s Lecture Notes for Stanford’s CS229 Machine Learning
- Nando de Freitas’s Deep Learning Class at Oxford (2015)
- Andrej Karpathy’s Convolutional Neural Networks Class at Stanford
- Patrick Winston’s Introduction to Artificial Intelligence
- Richard Socher’s Deep Learning for NLP course
- Machine Learning and Probabilistic Graphical Models
- Bhiksha Raj’s “Deep Learning” @CMU
- Sebastian Thrun’s “Artificial Intelligence and Robotics”
- Caltech’s Learning From Data ML Course
- Deep Learning Course at Udacity; Vincent Vanhoucke
Deep- and Machine-Learning Fora
- Google+ Deep Learning Group
- KDNuggets: Data Science Hub
- Datatau: Hacker News for Data Science
- r/MachineLearning
- Deeplearning.net: A Portal for Theano/Pylearn2
- Gitter Channel for Deeplearning4j
Get Started With Deeplearning4j
Reinforcement Learning
Academic Papers and Other Writings
Deep Learning Book; Ian Goodfellow, Yoshua Bengio and Aaron Courville; MIT Press
Understanding LSTMs; Christopher Olah
Semantic Compositionality through Recursive Matrix-Vector Spaces; Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng; Computer Science Department, Stanford University
Deep learning of the tissue-regulated splicing code; Michael K. K. Leung, Hui Yuan Xiong, Leo J. Lee and Brendan J. Frey
The human splicing code reveals new insights into the genetic determinants of disease; Hui Y. Xiong et al.
Notes on AdaGrad; Chris Dyer; School of Computer Science, Carnegie Mellon University
Adaptive Step-Size for Online Temporal Difference Learning; William Dabney and Andrew G. Barto; University of Massachusetts Amherst
Practical Recommendations for Gradient-Based Training of Deep Architectures; Yoshua Bengio; 2012
Greedy Layer-Wise Training of Deep Networks; Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle; Université de Montréal
Notes on Convolutional Neural Networks; Jake Bouvrie; Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
Natural Language Processing (Almost) from Scratch; Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu and Pavel Kuksa; NEC Laboratories America
Unsupervised Feature Learning Via Sparse Hierarchical Representations; Honglak Lee; Stanford University; August 2010
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations; Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng; Computer Science Department, Stanford University, Stanford
Deep Belief Networks for phone recognition; Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton; Department of Computer Science, University of Toronto
Reducing the Dimensionality of Data with Neural Networks; G. E. Hinton and R. R. Salakhutdinov; Science, vol. 313, 28 July 2006
Using Very Deep Autoencoders for Content-Based Image Retrieval; Alex Krizhevsky and Geoffrey E. Hinton; University of Toronto, Dept of Computer Science
Analysis of Recurrent Neural Networks with Application to Speaker Independent Phoneme Recognition; Esko O. Dijk; University of Twente, Department of Electrical Engineering
A fast learning algorithm for deep belief nets; Geoffrey E. Hinton and Simon Osindero, Department of Computer Science, University of Toronto; Yee-Whye Teh, Department of Computer Science, National University of Singapore
Learning Deep Architectures for AI; Yoshua Bengio; Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009)
An Analysis of Gaussian-Binary Restricted Boltzmann Machines for Natural Images; Nan Wang, Jan Melchior and Laurenz Wiskott; Institut für Neuroinformatik and International Graduate School of Neuroscience
IPAM Summer School 2012 Tutorial on: Deep Learning; Geoffrey Hinton; Canadian Institute for Advanced Research & Department of Computer Science, University of Toronto
A Practical Guide to Training Restricted Boltzmann Machines; Geoffrey Hinton; Department of Computer Science, University of Toronto
Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent; Feng Niu, Benjamin Recht, Christopher Re and Stephen J. Wright; Computer Sciences Department, University of Wisconsin-Madison
Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines; KyungHyun Cho, Alexander Ilin, and Tapani Raiko; Department of Information and Computer Science, Aalto University School of Science, Finland
Rectified Linear Units Improve Restricted Boltzmann Machines; Vinod Nair and Geoffrey E. Hinton; Department of Computer Science, University of Toronto
Iris Data Analysis Using Back Propagation Neural Networks; Sean Van Osselaer; Murdoch University, Western Australia
Distributed Training Strategies for the Structured Perceptron; Ryan McDonald, Keith Hall and Gideon Mann; Google
Large Scale Distributed Deep Networks; Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang and Andrew Y. Ng; Google
Learning meanings for sentences; Charles Elkan; University of California San Diego; 2013
Lecture 17: Linear Gaussian Models; Kevin Murphy; University of British Columbia; 17 November 2004
Efficient Backprop; Yann LeCun, Léon Bottou, Genevieve B. Orr and Klaus-Robert Müller; various institutions
Deep Learning for NLP (without magic); Richard Socher and Christopher Manning; Stanford University
Deep Neural Networks for Object Detection; Christian Szegedy, Alexander Toshev and Dumitru Erhan; Google
Deep Learning: Methods And Applications; Li Deng and Dong Yu; Microsoft Research
Numerical Optimization; Jorge Nocedal and Stephen J. Wright; Springer
Neural Networks for Named-Entity Recognition; Richard Socher; Programming Assignment 4, CS 224N; Dec. 5th, 2012
Large Scale Deep Learning; Quoc V. Le; Google & Carnegie Mellon University; MLconf 2013
Deep Learning Made Easier by Linear Transformations in Perceptrons; Tapani Raiko, Harri Valpola and Yann LeCun; Aalto University and New York University
Training Restricted Boltzmann Machines on Word Observations; George E. Dahl, Ryan P. Adams and Hugo Larochelle; University of Toronto, Harvard University and Université de Sherbrooke
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks; Nicolas Le Roux and Yoshua Bengio; Université de Montréal
Robust Boltzmann Machines for Recognition and Denoising; Yichuan Tang, Ruslan Salakhutdinov and Geoffrey Hinton; University of Toronto
Semantic hashing; Ruslan Salakhutdinov and Geoffrey Hinton; Department of Computer Science, University of Toronto
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank; Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts; Stanford University
Opinion Mining and Sentiment Analysis; Bo Pang and Lillian Lee; Yahoo Research; Foundations and Trends in Information Retrieval
Sparse autoencoder: CS294A Lecture notes; Andrew Ng; Stanford University
Deep Sparse Rectifier Neural Networks; Xavier Glorot, Antoine Bordes and Yoshua Bengio; University of Montreal
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks; Matthew D. Zeiler and Rob Fergus; Courant Institute, New York University
Symmetry breaking in non-monotonic neural nets; G. Boffetta, R. Monasson and R. Zecchina; Journal of Physics A: Mathematical and General
Phone Recognition Using Restricted Boltzmann Machines; Abdel-rahman Mohamed and Geoffrey Hinton; University of Toronto
Why Does Unsupervised Pre-training Help Deep Learning?; Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent and Samy Bengio; Université de Montréal and Google Research
Visually Debugging Restricted Boltzmann Machine Training with a 3D Example; Jason Yosinski and Hod Lipson; Cornell University
Efficient Estimation of Word Representations in Vector Space; Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean; Google
Exploiting Similarities among Languages for Machine Translation; Tomas Mikolov, Quoc V. Le, Ilya Sutskever; Google
word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method; Yoav Goldberg and Omer Levy (a minimal SGNS update step is sketched after this list)
A Few Useful Things to Know about Machine Learning; Pedro Domingos, University of Washington
A Neural Conversational Model; Oriol Vinyals and Quoc Le, Google
On Chomsky and the Two Cultures of Statistical Learning; Peter Norvig
Geometry of the restricted Boltzmann machine; Maria Angelica Cueto, Jason Morton, Bernd Sturmfels
Untersuchungen zu dynamischen neuronalen Netzen (Studies on Dynamic Neural Networks); Josef “Sepp” Hochreiter, advised by Jürgen Schmidhuber
Notes on Contrastive Divergence
Transition-Based Dependency Parsing with Stack Long Short-Term Memory; Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith
How transferable are features in deep neural networks?; Jason Yosinski, Jeff Clune, Yoshua Bengio and Hod Lipson
Learning Internal Representations by Error Propagation; Rumelhart, Hinton and Williams
Backpropagation Through Time: What It Does and How to Do It; Paul Werbos
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation; Cho et al.
Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises; James L. McClelland
Memory Networks & QA Systems; Jason Weston, Sumit Chopra & Antoine Bordes (2014)
Understanding Machine Learning: From Theory to Algorithms; Shai Shalev-Shwartz and Shai Ben-David
Reinforcement Learning: An Introduction; Richard Sutton and Andrew Barto
Algorithms for Reinforcement Learning; Csaba Szepesvári
Playing Atari with Deep Reinforcement Learning; Volodymyr Mnih et al.
The Markov Chain Monte Carlo Revolution; Persi Diaconis
An Introduction to MCMC for Machine Learning; Christophe Andrieu, Nando de Freitas, Arnaud Doucet and Michael I. Jordan
Continuous control with deep reinforcement learning; Timothy P. Lillicrap et al.; DeepMind
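The word2vec entries above (and the Goldberg & Levy derivation in particular) center on skip-gram with negative sampling. Below is a minimal sketch of one SGNS update; the vocabulary size, dimensionality, learning rate and uniform noise sampler are illustrative assumptions, not word2vec’s actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K, lr = 1000, 50, 5, 0.025      # vocab size, dimension, negatives, step size: all illustrative
W_in  = rng.normal(0, 0.01, (V, D))   # center-word ("input") vectors
W_out = rng.normal(0, 0.01, (V, D))   # context-word ("output") vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_step(center, context):
    """One stochastic gradient step for a single (center, context) pair."""
    v = W_in[center]
    grad_v = np.zeros(D)
    # word2vec draws negatives from a smoothed unigram distribution and skips
    # collisions with the true context; uniform sampling is used here for brevity.
    negatives = rng.integers(0, V, size=K)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(v @ u) - label    # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[word] -= lr * g * v
    W_in[center] -= lr * grad_v       # push v toward the context, away from the noise

sgns_step(center=3, context=17)
```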
Using Neural Networks for Modeling and Representing Natural Languages
Thought Vectors
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank; Socher et al. 2013. Introduces the Recursive Neural Tensor Network. Uses a parse tree.
Distributed Representations of Sentences and Documents; Le & Mikolov 2014. Introduces Paragraph Vector, also known as paragraph2vec. Concatenates and averages pretrained, fixed word vectors to create vectors for sentences, paragraphs and documents. Doesn’t use a parse tree. (A minimal averaging sketch follows this list.)
Deep Recursive Neural Networks for Compositionality in Language; Irsoy & Cardie 2014. Uses Deep Recursive Neural Networks. Uses a parse tree.
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks; Tai et al. 2015. Introduces the Tree LSTM. Uses a parse tree.
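To make the Paragraph Vector entry’s averaging scheme concrete, here is a minimal sketch that builds a document vector from pretrained, fixed word vectors. The `embeddings` dict and the 300-dimension default are hypothetical, and the real paragraph2vec trains its paragraph vectors jointly rather than only averaging:

```python
import numpy as np

def doc_vector(tokens, embeddings, dim=300):
    """Average the fixed word vectors of all in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)          # no known words: fall back to the zero vector
    return np.mean(vecs, axis=0)      # element-wise mean over the word vectors

def cosine(a, b):
    """Cosine similarity, e.g. for ranking documents against a query vector."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```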
Dialog
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses; Sordoni et al. 2015. Generates responses to tweets. Uses the Recurrent Neural Network Language Model (RLM) architecture of Mikolov et al. (2010).
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks; Weston et al. 2015. Classifies QA tasks. Expands on Memory Networks.
A Neural Conversational Model; Vinyals & Le 2015. Uses LSTM RNNs to generate conversational responses in the seq2seq framework. (See the sketch after this list.)
A Tutorial on Support Vector Machines for Pattern Recognition; Christopher J. C. Burges
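The Vinyals & Le entry above rests on the seq2seq pattern: encode the input utterance into a hidden state, then decode a reply token by token from that state. The sketch below shows only that control flow, with a plain RNN cell, random untrained weights and greedy decoding standing in for the paper’s trained LSTMs and beam search; the vocabulary and sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 12, 16                         # toy vocabulary size and hidden size
E   = rng.normal(0, 0.1, (V, H))      # token embeddings
Wxh = rng.normal(0, 0.1, (H, H))      # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))      # hidden-to-hidden weights
Why = rng.normal(0, 0.1, (H, V))      # hidden-to-vocabulary projection
EOS = 0                               # end-of-sequence token, also used as the GO symbol

def step(h, tok):
    """One plain-RNN step; the paper uses an LSTM cell here instead."""
    return np.tanh(E[tok] @ Wxh + h @ Whh)

def respond(src_tokens, max_len=20):
    h = np.zeros(H)
    for t in src_tokens:              # encode: fold the whole input into h
        h = step(h, t)
    out, tok = [], EOS                # decode: feed each output back in as the next input
    for _ in range(max_len):
        h = step(h, tok)
        tok = int(np.argmax(h @ Why)) # greedy decoding; beam search is the usual upgrade
        if tok == EOS:
            break
        out.append(tok)
    return out

print(respond([3, 5, 7]))             # prints some token ids (weights are untrained)
```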
Advanced Memory Architectures
Neural Turing Machines; Graves et al. 2014.
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets; Joulin & Mikolov 2015. Stack RNN source code is available. (A soft-stack update is sketched below.)
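The key idea in the stack-augmented RNN paper is a stack made differentiable by blending the results of PUSH, POP and NO-OP under soft action weights emitted by the controller. Here is a minimal sketch of that soft update; the depth, scalar cells and hand-picked logits are illustrative and not taken from the released Stack RNN code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def stack_update(stack, action_logits, push_val):
    """Soft-update a stack of scalar cells: mix PUSH, POP and NO-OP outcomes."""
    a_push, a_pop, a_noop = softmax(action_logits)     # controller's action weights
    pushed = np.concatenate(([push_val], stack[:-1]))  # shift down, write the new top
    popped = np.concatenate((stack[1:], [0.0]))        # shift up, pad the bottom
    return a_push * pushed + a_pop * popped + a_noop * stack

stack = np.zeros(8)
stack = stack_update(stack, np.array([2.0, -1.0, 0.0]), push_val=1.0)
print(stack[0])   # the top is now mostly the pushed value
```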
Researchers’ Personal Websites
Linear Algebra Resources
- Andrew Ng’s 6-Part Review of Linear Algebra
- Linear Algebra for Machine Learning; Patrick van der Smagt
- Khan Academy’s Linear Algebra Course
- CMU’s Linear Algebra Review
- The Matrix Cookbook
- Old and New Matrix Algebra Useful for Statistics
- Math for Machine Learning
- Immersive Linear Algebra
Other Resources
- Open Data for Deep Learning
- Machine Learning: Generative and Discriminative Models (PowerPoint); Sargur N. Srihari
- Neural Networks Demystified (A seven-video series)
- A Neural Network in 11 Lines of Python
- A Step-by-Step Backpropagation Example (a minimal worked example follows this list)
- Generative Learning algorithms; Notes by Andrew Ng
- Calculus on Computational Graphs: Backpropagation
- Understanding LSTM Networks
- Probability Cheatsheet
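In the spirit of the backpropagation walkthroughs linked above, here is a minimal two-layer network trained on XOR with hand-derived gradients; the layer sizes, learning rate and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)     # input -> hidden layer
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)     # hidden -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)          # forward pass: hidden activations
    p = sigmoid(h @ W2 + b2)          # forward pass: predictions
    dp = (p - y) * p * (1 - p)        # backward pass: output delta (squared-error loss)
    dh = (dp @ W2.T) * h * (1 - h)    # backward pass: hidden delta via the chain rule
    W2 -= 0.5 * h.T @ dp;  b2 -= 0.5 * dp.sum(axis=0)   # gradient-descent updates
    W1 -= 0.5 * X.T @ dh;  b1 -= 0.5 * dh.sum(axis=0)

print(np.round(p, 2))                 # should approach [0, 1, 1, 0]
```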