A neat idea from L. Breiman
Bryan Thompson
bryan at ad.cise.nsf.gov
Mon Aug 24 10:57:37 EDT 1992
This sounds very much like an attempt to approximate the results of
dynamic programming. In dynamic programming each state (decision
point) or state-action pair is associated with the cumulative
expected future utility of that action. Decisions are then made
locally by selecting the option that has the highest cumulative
expected future utility. Standard dynamic programming works
backwards from a terminus, but there are approximations to dynamic
programming that operate in forward time (e.g. heuristic dynamic
programming or Q-learning) and can be applied within neural networks.
In practice these approximations often develop non-optimal decision
policies, and a balance must be struck between performance (when you
always choose the action that is perceived to be optimal) and
exploration (when you choose actions that are believed _not_ to be
optimal in order to efficiently explore the decision space, avoid
local minima in the decision policy, detect a changing environment,
etc.).
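
To make the Q-learning variant concrete, here is a minimal sketch of
tabular Q-learning with epsilon-greedy exploration, in Python. The
five-state chain environment, the constants, and every name in it are
illustrative assumptions of mine, not part of any standard:

    import random

    N_STATES = 5        # states 0..4; state 4 is the terminus
    ACTIONS  = (0, 1)   # 0 = move left, 1 = move right
    ALPHA    = 0.1      # learning rate
    GAMMA    = 0.9      # discount on future utility
    EPSILON  = 0.1      # probability of taking an exploratory action

    # Q maps each state-action pair to its estimated cumulative
    # expected future utility.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def step(state, action):
        # Hypothetical chain world: entering the terminus (state 4)
        # from state 3 pays +1; every other move pays 0.
        if action == 1:
            next_state = min(state + 1, N_STATES - 1)
        else:
            next_state = max(state - 1, 0)
        reward = 1.0 if (state, next_state) == (3, 4) else 0.0
        return next_state, reward

    def choose_action(state):
        # Exploration: with probability EPSILON take a random
        # action; otherwise exploit the locally best estimate.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    for episode in range(500):
        state = 0
        for _ in range(1000):           # cap episode length
            action = choose_action(state)
            next_state, reward = step(state, action)
            # Q-learning update: move Q(s,a) toward the one-step
            # sample of cumulative expected future utility.
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
            if state == N_STATES - 1:   # reached the terminus
                break

    print(sorted(Q.items()))

Raising EPSILON favors exploration; setting it to zero gives the pure
performance policy described above, at the risk of never discovering
the rewarding transition at all.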
Bryan Thompson
National Science Foundation