# Reinforcement Learning


## Reinforcement learning

In this graduate-level seminar we will read and discuss papers on reinforcement learning. The first 3-4 weeks will consist of lectures based on Sutton and Barto's book (see below). The remaining lectures will be given by students in the class. Each student is expected to teach two lectures.

## Coordinates

- **Days:** Tuesday and Thursday
- **Time:** 2:00pm to 3:20pm
- **Room:** EBU3B 4140 (on 3/8/2012 the class meets in Room 4217)

## Books

- *Neuro-Dynamic Programming*, Bertsekas and Tsitsiklis. In spite of the strange name, this is probably the most rigorous book on reinforcement learning.
- *Reinforcement Learning: An Introduction*, Sutton & Barto. An online version is available.
- *Algorithms for Reinforcement Learning*, Csaba Szepesvári. Narrower than Sutton and Barto and focused on mathematical analysis and proofs. The book has a web page (not an online version).
- *Reinforcement Learning and Dynamic Programming Using Function Approximators* (Automation and Control Engineering series), Robert Babuska, Lucian Busoniu, and Bart de Schutter.
- *Approximate Dynamic Programming: Solving the Curses of Dimensionality*, Warren Powell.

## Repositories / Conference Proceedings

- Reinforcement Learning Repository, University of Massachusetts, Amherst.
- EWRL: European Workshop on Reinforcement Learning.
- ADPRL 2011: Adaptive Dynamic Programming and Reinforcement Learning.

## Recommended Papers

### Theory

- *On the complexity of solving Markov decision problems*, Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995.
- *A survey of POMDP solution techniques*, Kevin Murphy, 2000.
- Tony Cassandra's POMDP research papers (check out his PhD thesis).
- *An analysis of temporal-difference learning with function approximation*, Tsitsiklis and Van Roy, 1997.
- *Variable resolution discretization for high-accuracy solutions of optimal control problems*, Rémi Munos and Andrew Moore, 1999.
- *Near-optimal regret bounds for reinforcement learning*, Peter Auer, Thomas Jaksch, and Ronald Ortner, 2010.
- *Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds*, István Szita and Csaba Szepesvári, 2010.
- *Dynamic Programming and Optimal Control*, Chapter 6: Approximate Dynamic Programming, Dimitri P. Bertsekas, 2010.
- *Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax*, Michel Tokic and Günther Palm, 2011.
- *Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms*, Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000.
- *Reinforcement Learning in Finite MDPs: PAC Analysis*, Alexander L. Strehl, Lihong Li, and Michael L. Littman, November 2, 2009.
- *Knows what it knows: A framework for self-aware learning*, Lihong Li, Michael L. Littman, Thomas J. Walsh, and Alexander L. Strehl, 2011.

### Practice

- *Reinforcement Learning for Humanoid Robotics*, Jan Peters, Sethu Vijayakumar, and Stefan Schaal, 2003.
- *Module-Based Reinforcement Learning: Experiments with a Real Robot*, Zsolt Kalmár, Csaba Szepesvári, and András Lőrincz, 1998.

### Batch-mode reinforcement learning

- *A Generalization Error for Q-Learning*, Susan Murphy, 2005.
- *Tree-based batch mode reinforcement learning*, Damien Ernst, Pierre Geurts, and Louis Wehenkel, 2005.
- *Fast gradient-descent methods for temporal-difference learning with linear function approximation*, Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora, 2009.