
Reinforcement learning

In this graduate-level seminar we will read and discuss papers on reinforcement learning. The first 3-4 weeks will be lectures based on Sutton and Barto's book (see below). Subsequent lectures will be given by students in the class. Each student is expected to teach two lectures.


Days: TuTh. Time: 2pm - 3:20pm. Room: EBU3B 4140 (on 3/8/2012: Room 4217)


Repositories / Conference Proceedings

Recommended Papers

  • Theory
  1. On the complexity of solving Markov decision problems Littman, Michael, Thomas L. Dean and Leslie Pack Kaelbling, 1995
  2. A survey of POMDP solution techniques Kevin Murphy, 2000
  3. Tony Cassandra's POMDP research papers (check out his PhD thesis)
  4. An analysis of temporal-difference learning with function approximation Tsitsiklis and Van Roy. 1997
  5. Variable resolution discretization for high-accuracy solutions of optimal control problems Munos, Remi and Andrew Moore. 1999
  6. Near-optimal regret bounds for reinforcement learning Peter Auer, Thomas Jaksch and Ronald Ortner, 2010
  7. Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds Istvan Szita and Csaba Szepesvari. 2010
  8. Dynamic Programming and Optimal Control / Approximate Dynamic Programming (Chapter 6) Dimitri P. Bertsekas, 2010
  9. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax Michel Tokic and Günther Palm, 2011.
  10. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms Satinder Singh, Tommi Jaakkola, Michael L. Littman and Csaba Szepesvári (2000)
  11. Reinforcement Learning in Finite MDPs: PAC Analysis Alexander L. Strehl, Lihong Li, Michael L. Littman; November 2 2009.
  12. Knows what it knows: A framework for self-aware learning Lihong Li, Michael L. Littman, Thomas J. Walsh, and Alexander L. Strehl, 2011.

  • Practice
  1. Reinforcement Learning for Humanoid Robotics Jan Peters, Sethu Vijayakumar and Stefan Schaal. 2003
  2. Module-Based Reinforcement Learning: Experiments with a Real Robot Zsolt Kalmár, Csaba Szepesvári and András Lörincz. 1998
  • Batch mode reinforcement learning
  1. A Generalization Error for Q-Learning Susan Murphy 2005
  2. Tree-based batch mode reinforcement learning. Ernst, D., Geurts, P., and Wehenkel, L. (2005).
  3. Fast gradient-descent methods for temporal-difference learning with linear function approximation Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora. (2009)