Data Allocation

Thanasis Kehagias kehagias at egnatia.ee.auth.gr
Fri Oct 23 00:24:47 EDT 1998


INTRODUCTION: With respect to my query regarding the DATA ALLOCATION
problem, I got several interesting replies and bibliographic
pointers. However, I feel that my original question has NOT yet been
answered.

I want to keep this message relatively brief. I first present a summary of the
original question, then my own results to date, then the responses from other
people and some of my subsequent thoughts. At the end of this message you can
find a short BIBLIOGRAPHY of related references. A more complete presentation
and a larger BIBTEX file can be found at

http://skiron.control.ee.auth.gr/~kehagias/thn/thn030.htm

=========================================================================

DATA ALLOCATION PROBLEM: The full description was given in a previous message
on this list and can also be found at
http://skiron.control.ee.auth.gr/~kehagias/thn/thn030.htm . In short, the
problem is this: a time series is generated by the alternating activation of
two sources. The time series is observed, but the source process is not. The
task is to recover the source process.
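
To make the setting concrete, here is a minimal Python sketch of one possible
data-generating process. The particular choices (two AR(1) sources, Markov
switching between them) are purely illustrative assumptions of mine; the
problem statement does not fix any particular source model.

import random

def generate(T=2000, p_switch=0.02, a=(0.9, -0.9), noise=0.1):
    # y: the observed time series; s: the hidden source labels (0 or 1)
    y, s = [0.0], [0]
    for t in range(1, T):
        # the hidden source process: occasionally switch the active source
        s.append(1 - s[-1] if random.random() < p_switch else s[-1])
        # the currently active source generates the next observation
        y.append(a[s[-1]] * y[-1] + random.gauss(0.0, noise))
    return y, s

y, s = generate()  # the data allocation problem: recover s given only y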

MY QUESTION: Suppose SOME online filter is used to separate the data into two
classes, using incoming data to retrain the filter (and thus sharpen the
separation criterion). What can be said about the CONVERGENCE of this filter
to correct data allocation (alias separation or segmentation), using only VERY
GENERAL assumptions about the filtering algorithm and the nature of the
sources?
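
For concreteness, here is a sketch of the kind of online scheme the question
refers to. The specific choices are again my own illustrative assumptions, not
part of the question: each "filter" is a scalar AR(1) predictor trained by an
LMS step, the allocation criterion assigns each sample to the predictor with
the smaller prediction error, and only the winner is retrained.

def allocate_online(y, lr=0.05):
    a = [0.5, -0.5]  # initial coefficients of the two AR(1) filters
    labels = [0]     # allocation decision for each incoming sample
    for t in range(1, len(y)):
        e = [y[t] - a[k] * y[t - 1] for k in (0, 1)]  # prediction errors
        k = 0 if abs(e[0]) < abs(e[1]) else 1         # allocation criterion
        labels.append(k)
        a[k] += lr * e[k] * y[t - 1]  # retrain only the winning filter
    return labels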


MAIN RESULT: Here is the main result I have been able to obtain so far, stated
VERY INFORMALLY:

"THEOREM": Say that at time t the filter has accepted into its data pool
 M(t) samples from source1 and N(t) samples from source2.
 IF: the COMBINATION of: source1, source2, the data allocation criterion
 and the filter retraining algorithm  satisfy certain SEPARABILITY
 conditions,
 THEN:  as t goes to infinity,  the ratio M(t)/N(t) goes either to zero or to
 one, with probability one.

REMARK: This means that asymptotically, the filter will be trained on a data
pool which contains predominantly either source1 data or source2 data.
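
Combining the two sketches above, this specialization can be watched
empirically: for the data pool of (say) filter 0, count how many accepted
samples truly came from each source. This is only an illustration of the
informal statement under my toy assumptions, not of the actual theorems.

y, s = generate()
labels = allocate_online(y)
M = N = 0  # source1 / source2 samples accepted into filter 0's pool
for t in range(1, len(y)):
    if labels[t] == 0:  # filter 0 accepted sample t into its pool
        if s[t] == 0:
            M += 1
        else:
            N += 1
    if t % 500 == 0 and N > 0:
        print(t, M / N)  # under separability, M(t)/N(t) -> 0 or infinity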

A bunch of (real) theorems along the above lines are stated and proved in Part
III of "PREDICTIVE MODULAR NEURAL NETWORKS"
(http://skiron.control.ee.auth.gr/~kehagias/thn/thn02b01.htm). Note that the
conditions used in the above "Theorem" (as well as in the true theorems) are
quite general: the separability conditions do not refer to any particular
algorithm.

It follows that D. Ringach's remark (see below) is right to the point: some
kind of SEPARABILITY assumption is necessary.

SOME REMARKS BY OTHER PEOPLE: Some people responded by citing papers which use
specific algorithms to solve the above problem. For instance, J. Kohlmorgen
cited his own work; R. Rohwer mentioned the mixture model problem; A. Storkey
cited his own work and also papers by (a) Z. Ghahramani and G. Hinton and
(b) A. Weigend et al. S. Waterhouse mentioned a range of possibilities (mixture
models, mixtures of experts, HMMs, HM Trees) and proposed some possible
algorithmic approaches; he also referred me to his WEB page
(http://www.ultimode.com/stevew/) for related references. Finally, D. Ringach
made a very significant observation, which I reproduce here:

>I don't see how you can do anything unless you assume something about
>the sources...  Otherwise, how could you rule out the null hypothesis
>that it is a single source with the measured distribution of y(i)?

MY RESPONSE: Most of the above people proposed specific ALGORITHMS to solve
the problem. However, my original question was about:

 the data allocation CONVERGENCE problem, in an UNSUPERVISED ONLINE
 setting and at a GENERAL level (not related to specific algorithms).

Most of the pointers I got contained no convergence analysis. At any rate, I
am familiar with a number of similar algorithms for which ALGORITHM-SPECIFIC
convergence analyses ARE available (e.g. for the LVQ algorithm). I have
attempted to give a GENERAL analysis in the PAPERS and BOOK referred to in my
previous messages and on my home page
(http://skiron.control.ee.auth.gr/~kehagias/thn/thn.htm). I also present some
relevant thoughts at
http://skiron.control.ee.auth.gr/~kehagias/thn/thn030.htm .

BIBLIOGRAPHY: Here are a FEW references about the data allocation problem
(alias segmentation or data separation problem). You can find a more complete
BIBTEX file at http://skiron.control.ee.auth.gr/~kehagias/thn/thn0301.bib .
These references are mostly about algorithms to effect data allocation, either
in an offline or online fashion; usually (but not always) the question of
convergence is not treated theoretically.

I should stress that the list below, as well as the BIBTEX file at my home
site, is by no means exhaustive bibliographic coverage. These are just lists
of some papers I have read and enjoyed.

----------------------------------------------------------------------------
1. Ghahramani, Z. and Hinton, G.E. (1998). Switching State-Space Models.
Submitted for publication.

2. Jordan, M.I. and Jacobs, R.A. (1994). Hierarchical mixtures of experts and
the EM algorithm. Neural Computation, 6, 181-214.

3. Jordan, M.I. and Xu, L. (1995). Convergence results for the EM approach to
mixtures of experts architectures. Neural Networks, 8, 1409-1431.

4. Kohlmorgen, J., Müller, K.-R. and Pawelzik, K. (1998). Analysis of Drifting
Dynamics with Neural Network Hidden Markov Models. In NIPS '97: Advances in
Neural Information Processing Systems 10, MIT Press, to appear.

5. Levin, E. (1993). Hidden control neural architecture modeling of nonlinear
time varying systems and its applications. IEEE Transactions on Neural
Networks, 4, 109-116.

6. Pawelzik, K., Kohlmorgen, J. and Müller, K.-R. (1996). Annealed Competition
of Experts for a Segmentation and Classification of Switching Dynamics. Neural
Computation, 8, 340-356.

7. Storkey, A.J. (1998). Gaussian Processes for switching regimes. ICANN98.

8. Waterhouse, S. and Robinson, T. (1994). Classification Using Hierarchical
Mixtures of Experts. Presented at the IEEE Conference on Neural Networks and
Signal Processing.

9. Waterhouse, S. and Robinson, T. (1995). Constructive Methods for Mixtures
of Experts (UK version). Presented at NIPS: Neural Information Processing
Systems.

10. Weigend, A.S., Mangeas, M. and Srivastava, A.N. (1995). Nonlinear gated
experts for time series: discovering regimes and avoiding overfitting.

11. Xu, L. and Jordan, M.I. (1996). On convergence properties of the EM
Algorithm for Gaussian mixtures. Neural Computation, 8, 129-151.
___________________________________________________________________
Ath. Kehagias
--Assistant Prof. of Mathematics, American College of Thessaloniki
--Research Ass., Dept. of Electrical and Computer Eng. Aristotle Univ.,
Thessaloniki, GR54006, GREECE

--email:	 kehagias at egnatia.ee.auth.gr, kehagias at ac.anatolia.edu.gr
--web:	http://skiron.control.ee.auth.gr/~kehagias/index.htm

