In this graduate-level seminar we will read and discuss papers on reinforcement learning. The first 3-4 weeks will be lectures based on Sutton and Barto's book (see below). Subsequent lectures will be given by students in the class. Each student is expected to teach two lectures.
Days: TuTh Time: 2pm - 3:20pm Room: EBU3B 4140 / On 3/8/2012 - Room 4217
- Neuro-Dynamic Programming Bertsekas and Tsitsiklis. In spite of the strange name, this is probably the most rigorous book on reinforcement learning.
- Reinforcement Learning: An Introduction Sutton & Barto, Pointer to online version.
- Algorithms for Reinforcement Learning Csaba Szepesvári (Narrower than Sutton and Barto, focused on mathematical analysis and proofs). Web page on book (not an online version)
- Reinforcement Learning and Dynamic Programming Using Function Approximators (Automation and Control Engineering) Robert Babuska, Lucian Busoniu, and Bart de Schutter.
- Approximate dynamic programming: solving the curses of dimensionality Warren Powell.
Repositories / Conference Proceedings
- Reinforcement learning repository University of Massachusetts, Amherst
- EWRL European Workshop On Reinforcement Learning.
- ADPRL 2011 Adaptive Dynamic Programming And Reinforcement Learning
- On the complexity of solving Markov decision problems Littman, Michael, Thomas L. Dean and Leslie Pack Kaelbling, 1995
- A survey of POMDP solution techniques Kevin Murphy, 2000
- Tony Cassandra's POMDP research papers (check out his PhD thesis)
- An analysis of temporal-difference learning with function approximation Tsitsiklis and Van Roy. 1997
- Variable resolution discretization for high-accuracy solutions of optimal control problems Munos, Remi and Andrew Moore. 1999
- Near-optimal regret bounds for reinforcement learning Peter Auer, Thomas Jaksch and Ronald Ortner, 2010
- Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds Istvan Szita and Csaba Szepesvári. 2010
- Dynamic Programming and Optimal Control / Approximate Dynamic Programming (Chapter 6) Dimitri P. Bertsekas, 2010
- Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax Michel Tokic and Günther Palm, 2011.
- Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms Satinder Singh, Tommi Jaakkola, Michael L. Littman and Csaba Szepesvári (2000)
- Reinforcement Learning in Finite MDPs: PAC Analysis Alexander L. Strehl, Lihong Li, Michael L. Littman; November 2 2009.
- Knows what it knows: A framework for self-aware learning Lihong Li, Michael L. Littman, Thomas J. Walsh, and Alexander L. Strehl, 2011.
- Reinforcement Learning for Humanoid Robotics Jan Peters, Sethu Vijayakumar and Stefan Schaal. 2003
- Module-Based Reinforcement Learning: Experiments with a Real Robot Zsolt Kalmár, Csaba Szepesvári and András Lörincz. 1998
- Batch mode reinforcement learning
- A Generalization Error for Q-Learning Susan Murphy 2005
- Tree-based batch mode reinforcement learning Ernst, D., Geurts, P., and Wehenkel, L. (2005).
- Fast gradient-descent methods for temporal-difference learning with linear function approximation Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora. (2009)