Optimal Ordered Problem Solver

Mon Aug 5 09:23:18 EDT 2002

Optimal Ordered Problem Solver       Juergen Schmidhuber, IDSIA

TR IDSIA-12-02;     arXiv:cs.AI/0207097 v1;        31 July 2002
36 pages,  submitted to JMRL,  condensed 8 page version to NIPS
http://www.idsia.ch/~juergen/oops.html
ftp://ftp.idsia.ch/pub/juergen/oops.ps.gz

We extend principles of non-incremental universal search to
build a novel, optimally fast, incremental learner that is able
to improve itself through experience. The Optimal Ordered Problem
Solver (OOPS) searches for a universal algorithm that solves each
task in a sequence of tasks. It continually organizes and exploits
previously found solutions to earlier tasks, efficiently searching
not only the space of domain-specific algorithms, but also the
space of search algorithms.

The initial bias is embodied by a task-dependent probability
distribution on possible program prefixes (pieces of code that
may continue). Prefixes are self-delimiting and executed in online
fashion while being generated. They compute the probabilities of
their own possible continuations. Let p^n denote a found prefix
solving the first n tasks. It may exploit previous solutions
p^i (i<n) stored in non-modifiable memory by calling them as
subprograms, or by copying them into modifiable memory and editing
the copies before executing them. We provide equal resources for
two searches that run in parallel until p^{n+1} is discovered and
stored. The first search is exhaustive; it systematically tests
all possible prefixes on all tasks up to n+1. The second search is
much more focused; it only searches for prefixes that start with
p^n, and only tests them on task n+1, which is safe, because we
already know that such prefixes solve all tasks up to n.  Both
searches are depth-first and bias-optimal: the branches of the
search trees are program prefixes, and backtracking is triggered
once the sum of the runtimes of the current prefix on all current
tasks exceeds the prefix probability multiplied by the total
search time so far.

In illustrative experiments, our self-improver becomes the first
general system that learns to solve all n disk Towers of Hanoi
tasks (solution size 2^n-1) for n up to 30, profiting from
previously solved, simpler tasks involving samples of a simple
context free language.

Keywords:  OOPS, bias-optimality, incremental optimal universal
search, metasearching, metalearning, self-improvement

---------------------------------------------------------------

I welcome comments for a revised version, including pointers to
additional relevant references.

Juergen Schmidhuber                http://www.idsia.ch/~juergen