PhD thesis available on reinforcement learning

Sham Kakade sham at gatsby.ucl.ac.uk
Tue May 6 22:42:22 EDT 2003


Hi all,

My thesis, "On the Sample Complexity of Reinforcement Learning", is now
available at:

http://www.gatsby.ucl.ac.uk/~sham/publications.html

Below is the abstract and table of contents.

cheers 
-Sham

=================================================================
Abstract:

This thesis is a detailed investigation into the following question: how
much data must an agent collect in order to perform "reinforcement
learning" successfully? This question is analogous to the classical issue
of the sample complexity in supervised learning, but is harder because of
the increased realism of the reinforcement learning setting. This thesis
summarizes recent sample complexity results in the reinforcement learning
literature and builds on these results to provide novel algorithms with
strong performance guarantees.

We focus on a variety of reasonable performance criteria and sampling
models by which agents may access the environment. For instance, in a
policy search setting, we consider the problem of how much simulated
experience is required to reliably choose a "good" policy among a
restricted class of policies \Pi (as in Kearns, Mansour, and Ng [2000]).
In a more online setting, we consider the case in which an agent is placed
in an environment and must follow one unbroken chain of experience with no
access to "offline" simulation (as in Kearns and Singh [1998]).
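To make these two sampling models concrete, here is a minimal Python sketch
(my own illustration, not code from the thesis); the `mdp.step(state, action)
-> (next_state, reward)` interface and the class names are assumptions:

# A minimal sketch (mine, not from the thesis) of the two sampling models.
# `mdp.step(state, action) -> (next_state, reward)` is an assumed interface.

class GenerativeModel:
    """Offline simulation: sample a transition from ANY state-action pair."""
    def __init__(self, mdp):
        self.mdp = mdp

    def sample(self, state, action):
        return self.mdp.step(state, action)  # (next_state, reward)

class OnlineEnvironment:
    """Online access: one unbroken chain of experience, no resets allowed."""
    def __init__(self, mdp, start_state):
        self.mdp = mdp
        self.state = start_state

    def act(self, action):
        self.state, reward = self.mdp.step(self.state, action)
        return self.state, reward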

We build on the sample-based algorithms suggested by Kearns, Mansour, and
Ng [2000]. Their sample complexity bounds have no dependence on the size
of the state space, an exponential dependence on the planning horizon
time, and a linear dependence on the complexity of \Pi. We suggest novel
algorithms with more restricted guarantees whose sample complexities are
again independent of the size of the state space and depend linearly on
the complexity of the policy class \Pi, but which have only a polynomial
dependence on the horizon time. We pay particular attention to the
tradeoffs made by such algorithms.
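As a rough illustration of where the exponential horizon dependence comes
from in the trajectory tree method of Kearns, Mansour, and Ng [2000], the
sketch below (my own, using the assumed generative-model interface above)
builds a tree that branches over every action at every depth, so a single
tree costs on the order of |A|^H calls to the generative model. The same
tree can then be reused to estimate the value of every policy in \Pi, which
is why the number of trees needed scales with the complexity of \Pi rather
than with the size of the state space.

# A rough sketch (my illustration, not the thesis's code) of a trajectory
# tree: one sampled next state per action at every node, down to depth H.

def build_trajectory_tree(gen_model, state, actions, horizon):
    if horizon == 0:
        return {"state": state, "children": {}}
    children = {}
    for a in actions:
        next_state, reward = gen_model.sample(state, a)
        children[a] = (reward, build_trajectory_tree(gen_model, next_state,
                                                     actions, horizon - 1))
    return {"state": state, "children": children}

def evaluate_policy_on_tree(tree, policy, gamma=1.0):
    # The same tree can be reused for every deterministic policy in \Pi;
    # averaging over many independent trees gives the value estimates.
    if not tree["children"]:
        return 0.0
    a = policy(tree["state"])
    reward, subtree = tree["children"][a]
    return reward + gamma * evaluate_policy_on_tree(subtree, policy, gamma)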

=================================================================
Table of Contents:

Chapter 1 Introduction
    1.1 Studying the Sample Complexity
    1.2 Why do we care about the sample complexity?
    1.3 Overview
    1.4 Agnostic Reinforcement Learning
Chapter 2 Fundamentals of Markov Decision Processes
    2.1 MDP Formulation
    2.2 Optimality Criteria
    2.3 Exact Methods
    2.4 Sampling Models and Sample Complexity
    2.5 Near-Optimal, Sample Based Planning
Chapter 3 Greedy Value Function Methods
    3.1 Approximating the Optimal Value Function
    3.2 Discounted Approximate Iterative Methods
    3.3 Approximate Linear Programming
Chapter 4 Policy Gradient Methods
    4.1 Introduction
    4.2 Sample Complexity of Estimation
    4.3 The Variance Trap
Chapter 5 The Mismeasure of Reinforcement Learning
    5.1 Advantages and the Bellman Error
    5.2 Performance Differences
    5.3 Non-stationary Approximate Policy Iteration
    5.4 Remarks
Chapter 6 \mu-Learnability
    6.1 The Trajectory Tree Method
    6.2 Using a Measure \mu
    6.3 \mu-PolicySearch
    6.4 Remarks
Chapter 7 Conservative Policy Iteration
    7.1 Preliminaries
    7.2 A Conservative Update Rule
    7.3 Conservative Policy Iteration
    7.4 Remarks
Chapter 8 On the Sample Complexity of Exploration
    8.1 Preliminaries
    8.2 Optimality Criteria
    8.3 Main Theorems
    8.4 The Modified R_{max} Algorithm
    8.5 The Analysis
    8.6 Lower Bounds
Chapter 9 Model Building and Exploration
    9.1 The Parallel Sampler
    9.2 Revisiting Exploration 
Chapter 10 Discussion
    10.1 N, A, and T
    10.2 From Supervised to Reinforcement Learning
    10.3 POMDPs
    10.4 The Complexity of Reinforcement Learning
