A neat idea from L. Breiman

Bryan Thompson bryan at ad.cise.nsf.gov
Mon Aug 24 10:57:37 EDT 1992


This sounds very much like an attempt to approximate the results of
dynamic programming.  In dynamic programming, each state (decision
point) or state-action pair is associated with its cumulative
expected future utility.  Decisions are then made locally by
selecting the option that has the highest cumulative expected future
utility.  Standard dynamic programming works backwards from a
terminus, but there are approximations to dynamic programming that
operate in forward time (e.g. heuristic dynamic programming or
Q-learning) and can be applied within neural networks.
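
For concreteness, here is a minimal tabular Q-learning sketch in
Python; the learning rate alpha, discount gamma, and the update
function's signature are illustrative assumptions, not something
specified in the discussion above:

    from collections import defaultdict

    # Q[(state, action)] approximates the cumulative expected
    # future utility of taking `action` in `state`.
    Q = defaultdict(float)

    alpha = 0.1   # learning rate (assumed value)
    gamma = 0.9   # discount on future utility (assumed value)

    def update(state, action, reward, next_state, actions):
        # Forward-time operation: after observing the reward and the
        # next state, move Q(s, a) toward the reward plus the
        # discounted best utility available from the next state.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (
            reward + gamma * best_next - Q[(state, action)])

Unlike standard dynamic programming, nothing here sweeps backwards
from a terminus; estimates improve incrementally as transitions are
experienced.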

In practice these approximations often develop non-optimal decision
policies, and a balance must be struck between performance (always
choosing the action that is perceived to be optimal) and exploration
(deliberately choosing actions that are believed _not_ to be optimal
in order to explore the decision space efficiently, avoid local
minima in the decision policy, detect a changing environment, etc.).
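
One common way to strike that balance is an epsilon-greedy rule; the
sketch below assumes the Q table from the previous sketch, and the
parameter epsilon is an assumed tunable, not part of the original
discussion:

    import random
    from collections import defaultdict

    Q = defaultdict(float)  # value table as in the sketch above
    epsilon = 0.1           # fraction of exploratory choices (assumed)

    def choose_action(state, actions):
        # Exploration: with probability epsilon, pick an action that
        # may be non-optimal, to probe the decision space, escape
        # local minima in the policy, and detect a changing
        # environment.
        if random.random() < epsilon:
            return random.choice(actions)
        # Performance: otherwise act greedily on the estimated
        # cumulative expected future utility.
        return max(actions, key=lambda a: Q[(state, a)])

Decaying epsilon over time shifts the balance from exploration toward
performance as the value estimates become more trustworthy.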

Bryan Thompson
National Science Foundation
