Composite Networks

singh@envy.cs.umass.edu
Wed Mar 4 10:33:27 EST 1992



Hi! 

	***This is in response to P. J. Hampson's message about composite
networks.***

  >From STAY8026 at iruccvax.ucc.ie Mon Mar  2 09:45:00 1992
  >Subject: Composite networks
  >Hi,
  >
  >I am interested in modelling tasks in which invariant information from
  >previous input-output pairs is brought to bear on the acquisition of current
  >input-output pairs.  Thus I want to use previously extracted regularity to
  >influence current processing.  Does anyone think this is feasible??
  >...


	I have studied learning agents that have to learn to solve MULTIPLE
sequential decision tasks (SDTs) in the same external environment.
Specifically, I have looked at reinforcement learning agents that have to
solve a set of compositionally-structured sequential decision tasks.


For example, consider a navigation environment (a robot in a room):

Task 1: Go to location A optimally.
Task 2: Go to location B optimally.
Task 3: Go to location A and then to B optimally.

Tasks 1 and 2 are 'elemental' SDTs and Task 3 is a 'composite' SDT.
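
To make the compositional structure concrete, here is a tiny sketch of how
such a task set could be encoded (the coordinates and names below are purely
illustrative, not taken from my experiments):

# Illustrative encoding of the task set above.
A = (0, 4)   # location A (made-up grid coordinates)
B = (4, 4)   # location B

# An elemental task has a single goal; a composite task is an ordered
# list of elemental goals that must be achieved in sequence.
TASKS = {
    "task1": [A],       # go to A optimally
    "task2": [B],       # go to B optimally
    "task3": [A, B],    # go to A and then to B optimally
}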

I have studied two different ways of achieving the obvious kind of TRANSFER
OF LEARNING across such a set of tasks. I am going to (try to) be brief;
anyone interested in further discussion or my papers can contact me
individually.

Method 1:
*********


I have used a modified Jacobs-Jordan-Nowlan-Hinton ``mixture of expert
modules'' network with Watkins' Q-learning algorithm to construct a
mixture of ``adaptive critics'' that learns the elemental tasks in separate
modules; the gating module then learns to sequence the correct elemental
modules to solve the composite tasks. Note that the representation of the
tasks is not ``linguistic'', so the agent cannot simply ``parse''
the composite task representation to determine which elemental modules to
sequence. The decomposition has to be discovered by trial-and-error.

	Transfer of learning is achieved by sharing the solution of
previously acquired elemental tasks across multiple composite tasks.

	Sequential decision tasks are particularly difficult to learn to
solve because there is no supervised target information, only a
success/failure response at the end of the task.
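
For readers who have not seen Q-learning, here is a minimal tabular sketch of
the kind of update each elemental module is built on (Watkins, 1989). The
environment interface (reset/step/actions) and the parameter values are
assumptions made for illustration; this is only the elemental learner, not
the modular architecture described above.

import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.95, epsilon=0.1):
    # env is assumed to provide reset(), step(a) -> (next_state, reward, done),
    # and a list env.actions; Q maps (state, action) pairs to value estimates.
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection from the current value estimates
        if random.random() < epsilon:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda act: Q[(s, act)])
        s_next, r, done = env.step(a)
        # Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = 0.0 if done else max(Q[(s_next, act)] for act in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

Q = defaultdict(float)   # value table, initialized to zero
# Typical use: run q_learning_episode(env, Q) for many episodes.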

Ref:
---

@InProceedings{Singh-NIPS4,
  author = 	"Singh, S. P.",
  title = 	"On the efficient learning of multiple sequential tasks",
  booktitle = 	"Advances in Neural Information Processing Systems 4",
  year = 	"1992",
  editor = 	"J. E. Moody and S. J. Hanson and R. P. Lippmann",
  publisher = 	"Morgan Kaufmann",
  address = 	"San Mateo, CA",
  note = 	"Oral"}

@Article{Singh-MLjournal,
  author = 	"Singh, S. P.",
  title = 	"Transfer of Learning by Composing Solutions for Elemental
		 Sequential Tasks",
  journal = 	"Machine Learning",
  year = 	"1992",
  note = 	"To appear"}

@PhdThesis{Watkins-thesis,
  author = 	"C. J. C. H. Watkins",
  title = 	"Learning from Delayed Rewards",
  school = 	"Cambridge University",
  address = 	"Cambridge, England",
  year = 	"1989"}

@Article{Jacobs-Jordan-Nowlan-Hinton,
  author = 	"R. A. Jacobs and M. I. Jordan and S. J. Nowlan and G. E. Hinton",
  title = 	"Adaptive Mixtures of Local Experts",
  journal = 	"Neural Computation",
  year = 	"1991",
  volume = 	"3",
  number = 	"1"}

Method 2:
*********


Method 1 did not learn models of the environment. For learning to solve a
single SDT, it is not always clear that the considerable expense of doing
system identification is warranted (Barto and Singh; Gullapalli); however,
if an agent is going to solve multiple tasks in the same environment, it
is almost certainly going to be useful.

I consider a hierarchy of world models, where the ``actions/operators'' for
upper-level models are the policies for tasks lower in the hierarchy. I
prove that for compositionally-structured tasks, doing Dynamic Programming
in such upper-level models leads to the same solutions as doing it in the
real world, only much faster, since the actions of the upper-level world
models ignore much TEMPORAL DETAIL.
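
As a rough illustration of this second idea (the interface below is an
assumption made for the sketch, not the algorithm from the papers): dynamic
programming at the upper level treats each elemental policy as a single
``action'', so one backup stands in for what would be many primitive time
steps in the real world.

def abstract_value_iteration(states, policies, model, goal, sweeps=100):
    # model(s, pi) is assumed to return (next_state, total_cost): the state
    # reached by running elemental policy pi from state s to termination, and
    # the cost accumulated along the way.  Planning is undiscounted
    # cost-to-goal; because each backup spans a whole elemental policy, the
    # temporal detail of the primitive time steps is ignored.
    V = {s: (0.0 if s == goal else float("inf")) for s in states}
    for _ in range(sweeps):
        for s in states:
            if s == goal:
                continue
            V[s] = min(cost + V[s_next]
                       for s_next, cost in (model(s, pi) for pi in policies))
    return V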

Ref:
****

@InProceedings{Singh-AAAI92,
  author = 	"Singh, S. P.",
  title = 	"Reinforcement learning with a hierarchy of abstract models",
  booktitle = 	"Proceedings of the Tenth National Conference on Artificial
		 Intelligence",
  year = 	"1992",
  address = 	"San Jose, CA",
  note = 	"Forthcoming"}

@InProceedings{Singh-ML92,
  author = 	"Singh, S. P.",
  title = 	"Scaling reinforcement learning algorithms by learning
		 variable temporal resolution models",
  booktitle = 	"Proceedings of the Machine Learning Conference, 1992",
  year = 	"1992",
  note = 	"To appear"}

@InProceedings{Barto-Singh,
  author = 	"Barto, A. G. and Singh, S. P.",
  title = 	"On the Computational Economics of Reinforcement Learning",
  booktitle = 	"Proceedings of the 1990 Connectionist Models Summer School",
  year = 	"1990",
  editor = 	"Touretzky, D. S. and Elman, J. L. and Sejnowski, T. J. and
		 Hinton, G. E.",
  address = 	"San Mateo, CA",
  publisher = 	"Morgan Kaufmann"}


@InProceedings{Gullapalli,
  author = 	"V. Gullapalli",
  title = 	"A Comparison of Supervised and Reinforcement Learning Methods
		 on a Reinforcement Learning Task",
  booktitle = 	"Proceedings of the 1991 {IEEE} Symposium on Intelligent Control",
  address = 	"Arlington, VA",
  year = 	"1991"}


satinder.
satinder at cs.umass.edu


