Connectionists: how the brain works? (UNCLASSIFIED)

Mon Apr 14 06:51:46 EDT 2014

Dear all,

   It has been very interesting to follow the discussion on the 
functioning of ART, the stability-plasticity dilemma, and related issues. 
In that context, I would like to point to an exciting property of 
practopoietic theory, which enables us to understand what is needed for 
a general solution to problems similar to the stability-plasticity 
dilemma.

The stability-plasticity dilemma can be described as the problem 
of deciding when a new category of stimulus needs to be created and 
the system adjusted, as opposed to treating the 
stimulus as old and familiar and thus not needing any adjustment. 
Practopoietic theory helps us understand how a general solution can be 
implemented for deciding whether to use old types of behavior or to come 
up with new ones. This is possible in a so-called "T_3 system" in which 
a process called "anapoiesis" takes place. When a system is organized in 
such a T_3 way, every stimulus, old or new, is treated in the same 
fashion, i.e., as new. The system always adjusts--to everything(!)--even 
to stimuli that have been seen thousands of times. There is never a 
simple direct categorization (or pattern recognition) in which a 
mathematical mapping would take place from input vectors to output 
vectors, as traditionally implemented in multi-layer neural networks.
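
For concreteness, the dilemma as framed at the start of this paragraph can be sketched in a few lines of Python. This is only a minimal sketch of the traditional threshold-based decision (in the spirit of the vigilance test discussed in this thread, but not an implementation of ART), and the function name, the similarity measure, and the `vigilance` value are illustrative assumptions, not anything taken from the thread.

```python
import math


def assign_category(stimulus, prototypes, vigilance=0.8):
    """Decide whether a stimulus is "old" or "new" (hypothetical helper).

    Returns (category_index, created). `prototypes` is a list of stored
    feature vectors and is extended in place when a new category is made.
    """
    best_index, best_similarity = None, -1.0
    for i, proto in enumerate(prototypes):
        # Assumed similarity measure: 1 for identical vectors, falling
        # toward 0 as the Euclidean distance grows.
        similarity = 1.0 / (1.0 + math.dist(stimulus, proto))
        if similarity > best_similarity:
            best_index, best_similarity = i, similarity

    if best_index is not None and best_similarity >= vigilance:
        # Stability: the stimulus is treated as old; nothing is adjusted.
        return best_index, False

    # Plasticity: the stimulus is treated as new; a category is created.
    prototypes.append(list(stimulus))
    return len(prototypes) - 1, True


prototypes = []
print(assign_category([0.0, 0.0], prototypes))  # first stimulus is always new
print(assign_category([0.1, 0.0], prototypes))  # close to the stored prototype
print(assign_category([1.0, 1.0], prototypes))  # far from the stored prototype
```

The single fixed threshold is exactly the kind of hard old-versus-new decision that the T_3 organization described here is meant to avoid.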

Rather, the system readjusts itself continuously to prepare for 
interactions with the surrounding world. The only simple input-output 
mappings that take place are the sensory-motor loops that execute the 
actual behavior. The internal processes corresponding to perception, 
recognition, categorization etc. are implemented by the mechanisms of 
internal system adjustments (based on anapoiesis). These mechanisms 
create new sensory-motor loops, which are then most similar to the 
traditional mapping operations. The difference between old and new 
stimuli (i.e., familiar and unfamiliar) is detectable in the behavior of 
the system only because the system adjusts more quickly to older than to 
newer stimuli.
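
As a toy illustration of this last point, the sketch below runs the identical adjustment loop for every stimulus, so that familiar and unfamiliar inputs differ only in how many steps the loop takes. It is not the mechanism of practopoietic theory, which this message does not specify; the function `adjust`, the list-based `memory`, the relaxation rule, and all parameter values are assumptions made purely for illustration.

```python
def adjust(stimulus, memory, rate=0.5, tolerance=1e-3):
    """Adjust an internal state to a stimulus and return the steps taken.

    Hypothetical toy model: the same loop runs for every stimulus, and
    `memory` (earlier adjusted states) only sets the starting point.
    """
    if memory:
        # Start from the remembered state closest to the stimulus.
        state = list(min(
            memory,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(m, stimulus)),
        ))
    else:
        state = [0.0] * len(stimulus)

    steps = 0
    while True:
        # The system always adjusts at least once, even to known stimuli.
        state = [s + rate * (x - s) for s, x in zip(state, stimulus)]
        steps += 1
        if max(abs(s - x) for s, x in zip(state, stimulus)) <= tolerance:
            break

    memory.append(state)  # keep the adjusted state for later encounters
    return steps


memory = []
first = adjust([1.0, 2.0], memory)   # never seen before
repeat = adjust([1.0, 2.0], memory)  # same stimulus, seen once
print(first, repeat)                 # the repeat needs fewer steps
```

Nothing in the loop branches on "old" versus "new"; familiarity shows up only as a shorter adjustment.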

The claimed advantage of such a T_3 practopoietic system is that only 
such a system can become generally intelligent as a whole and behave 
adaptively and consciously, with an understanding of what is going on 
around it. The system forms a general "adjustment machine" that can become 
aware of its surroundings and is capable of interpreting the 
situation appropriately to decide on the next action. Thus, the 
perceptual dilemma of stability vs. plasticity is converted into a 
general understanding of the current situation and the needs of the 
system. If the current goals of the system require treating a slightly 
novel stimulus as new, it will be treated as "new". However, if a slight 
change in the stimulus features does not make a difference for the 
current goals and the situation, then the stimulus will be treated as "old".

Importantly, practopoietic theory is not formulated in terms of neurons 
(inhibition, excitation, connections, changes of synaptic weights, 
etc.). Instead, the theory is formulated much more elegantly--in terms 
of interactions between cybernetic control mechanisms organized into a 
specific type of hierarchy (poietic hierarchy). This abstract 
formulation is extremely helpful for two reasons. First, it enables one 
to focus on the most important functional aspects and thus to 
understand the underlying principles of system operation much more easily. 
Second, it tells us what is needed to create intelligent behavior using 
any type of implementation, neuronal or non-neuronal.

I hope this will be motivating enough to give practopoiesis a read.

With best regards,

Danko

Link:
http://www.danko-nikolic.com/practopoiesis/

On 4/11/14 2:42 AM, Tsvi Achler wrote:
> I can't comment on most of this, but I am not sure if all models of 
> sparsity and sparse coding fall into the connectionist realm either 
> because some make statistical assumptions.
> -Tsvi
>
>
> On Tue, Apr 8, 2014 at 9:19 PM, Juyang Weng <weng at cse.msu.edu> wrote:
>
>     Tsvi:
>
>     Let me explain in a little more detail:
>
>     There are two large categories of biological neurons, excitatory
>     and inhibitory.   Both are developed mainly through signal
>     statistics, not specified primarily by the genome.   Not all
>     people agree with me on this point, but please tolerate this
>     view for now.
>     I gave a more detailed discussion of this view in my NAI book.
>
>     The main effect of inhibitory connections is to reduce the number
>     of firing neurons (David Field called it sparse coding), with the
>     help of excitatory connections.  This sparse coding is important
>     because the neurons that do not fire are the long-term memory of
>     the area at this point in time.
>     My view differs from David Field's.  He wrote that sparse
>     coding is for the current representations.  I think sparse coding is
>     necessary for long-term memory. Not all people agree with me on this
>     point, but please tolerate this view for now.
>
>     However, this reduction requires very fast parallel neuronal
>     updates to avoid uncontrollable large-magnitude oscillations.
>     Even with the fast biological parallel neuronal updates, we still
>     see slow but small-magnitude oscillations such as the
>     well-known theta waves and alpha waves.   My view is that such
>     slow but small-magnitude oscillations are side effects of
>     excitatory and inhibitory connections that form many loops, not
>     something really desirable for the brain operation (sorry,
>     Paul Werbos).  Not all people agree with me on this point, but please
>     tolerate this view for now.
>
>     Therefore, as far as I understand, all computer simulations for
>     spiking neurons are not showing major brain functions
>     because they have to deal with the slow oscillations that are very
>     different from the brain's, e.g., as Dr. Henry Markram reported
>     (40Hz?).
>
>     The above discussion again shows the power and necessity of an
>     overarching brain theory like that in my NAI book.
>     Those who only simulate biological neurons using superficial
>     biological phenomena are not going to demonstrate
>     any major brain functions.  They can talk about signal statistics
>     from their simulations, but signal statistics are far from brain
>     functions.
>
>     -John
>
>
>     On 4/8/14 1:30 AM, Tsvi Achler wrote:
>>     Hi John,
>>     ART evaluates distance between the contending representation and
>>     the current input through vigilance.  If they are too far apart,
>>     a poor vigilance signal will be triggered.
>>     The best resonance will be achieved when they have the least
>>     amount of distance.
>>     If in your model, K-nearest neighbors is used without a neural
>>     equivalent, then your model is not quite in the spirit of a
>>     connectionist model.
>>     For example, Bayesian networks do a great job of emulating brain
>>     behavior, modeling the integration of priors, and have been
>>     invaluable for modeling cognitive studies.  However, they assume a
>>     statistical configuration of connections and distributions that
>>     it is not yet known how to emulate with neurons.  Thus pure
>>     Bayesian models are also questionable in terms of connectionist
>>     modeling.  But some connectionist models can emulate some
>>     statistical models for example see section 2.4  in Thomas &
>>     McClelland's chapter in Sun's 2008 book
>>     (http://www.psyc.bbk.ac.uk/people/academic/thomas_m/TM_Cambridge_sub.pdf).
>>     I am not suggesting Hodgkin-Huxley level detailed neuron models,
>>     however connectionist models should have their connections
>>     explicitly defined.
>>     Sincerely,
>>     -Tsvi
>>
>>
>>
>>     On Mon, Apr 7, 2014 at 10:58 AM, Juyang Weng <weng at cse.msu.edu> wrote:
>>
>>         Tsvi,
>>
>>         Note that ART uses a vigilance value to pick up the first
>>         "acceptable" match in its sequential bottom-up and top-down
>>         search.
>>         I believe that was what Steve meant when he mentioned vigilance.
>>
>>         Why do you think "ART as a neural way to implement a
>>         K-nearest neighbor algorithm"?
>>         If not all the neighbors have sequentially participated,
>>         how can ART find the nearest neighbor, let alone K-nearest
>>         neighbor?
>>
>>         Our DN uses an explicit k-nearest mechanism to find the
>>         k-nearest neighbors in every network update,
>>         to avoid the problems of slow resonance in existing models of
>>         spiking neuronal networks.
>>         The explicit k-nearest mechanism itself is not meant to be
>>         biologically plausible,
>>         but it gives a computational advantage for software
>>         simulation of large networks
>>         at a speed slower than 1000 network updates per second.
>>
>>         I guess that more detailed molecular simulations of
>>         individual neuronal spikes (such as using the Hodgkin-Huxley
>>         model of
>>         a neuron, using the NEURON software,
>>         <http://www.neuron.yale.edu/neuron/> or like the Blue Brain
>>         project <http://bluebrain.epfl.ch/> directed by respected Dr.
>>         Henry Markram)
>>         are very useful for showing some detailed molecular,
>>         synaptic, and neuronal properties.
>>         However, they miss necessary brain-system-level mechanisms so
>>         much that it is difficult for them
>>         to show major brain-scale functions
>>         (such as learning to recognize objects and detection of
>>         natural objects directly from natural cluttered scenes).
>>
>>         According to my understanding, if one uses a detailed
>>         neuronal model for each of a variety of neuronal types and
>>         connects those simulated neurons of different types according
>>         to a diagram of Brodmann areas,
>>         his simulation is NOT going to lead to any major brain function.
>>         He still needs brain-system-level knowledge such as that
>>         taught in the BMI 871 course.
>>
>>         -John
>>
>>         On 4/7/14 8:07 AM, Tsvi Achler wrote:
>>>         Dear Steve, John
>>>         I think such discussions are great for sparking interest in
>>>         models with feedback (output back to input), which I feel
>>>         should be given much more attention.
>>>         In this vein it may be better to discuss more of the details
>>>         here than to point readers to a reference.
>>>
>>>         Basically I see ART as a neural way to implement a K-nearest
>>>         neighbor algorithm.  Clearly the way ART overcomes the
>>>         neural hurdles is immense especially in figuring out how to
>>>         coordinate neurons.  However it is also important to
>>>         summarize such methods in algorithmic terms  which I attempt
>>>         to do here (and please comment/correct).
>>>         Instar learning is used to find the best weights for quick
>>>         feedforward recognition without too much resonance
>>>         (otherwise more resonance will be needed).  Outstar learning
>>>         is used to find the expectation of the patterns.  The
>>>         resonance mechanism evaluates distances between the
>>>         "neighbors" evaluating how close differing outputs are to
>>>         the input pattern (using the expectation).  By choosing one
>>>         winner the network is equivalent to a 1-nearest neighbor
>>>         model.  If you open it up to more winners (e.g., k winners) as
>>>         you suggest, then it becomes a k-nearest neighbor mechanism.
>>>
>>>         Clearly I focused here on the main ART modules and did not
>>>         discuss other additions.  But I want to just focus on the
>>>         main idea at this point.
>>>         Sincerely,
>>>         -Tsvi
>>>
>>>
>>>         On Sun, Apr 6, 2014 at 1:30 PM, Stephen Grossberg
>>>         <steve at cns.bu.edu> wrote:
>>>
>>>             Dear John,
>>>
>>>             Thanks for your questions. I reply below.
>>>
>>>             On Apr 5, 2014, at 10:51 AM, Juyang Weng wrote:
>>>
>>>>             Dear Steve,
>>>>
>>>>             These are long-standing questions of mine that I did not
>>>>             have a chance to ask you when I met you many times before.
>>>>             But they may be useful for some people on this list.
>>>>             Please accept my apology if my questions give any
>>>>             false impression that I did not intend.
>>>>
>>>>             (1) Your statement below seems to have confirmed my
>>>>             understanding:
>>>>             Your top-down process in ART in the late 1990's is
>>>>             basically for finding an acceptable match
>>>>             between the input feature vector and the stored feature
>>>>             vectors represented by neurons (not meant for the
>>>>             nearest match).
>>>
>>>             ART has developed a lot since the 1990s. A non-technical
>>>             but fairly comprehensive review article was published in
>>>             2012 in /Neural Networks/ and can be found at
>>>             http://cns.bu.edu/~steve/ART.pdf.
>>>
>>>             I do not think about the top-down process in ART in
>>>             quite the way that you state above. My reason for this
>>>             is summarized by the acronym CLEARS for the processes of
>>>             Consciousness, Learning, Expectation, Attention,
>>>             Resonance, and Synchrony. All the CLEARS processes come
>>>             into this story, and ART top-down mechanisms contribute
>>>             to all of them. For me, the most fundamental issues
>>>             concern how ART dynamically self-stabilizes the memories
>>>             that are learned within the model's bottom-up adaptive
>>>             filters and top-down expectations.
>>>
>>>             In particular, during learning, a big enough mismatch
>>>             can lead to hypothesis testing and search for a new, or
>>>             previously learned, category that leads to an acceptable
>>>             match. The criterion for what is "big enough mismatch"
>>>             or "acceptable match" is regulated by a vigilance
>>>             parameter that can itself vary in a state-dependent way.
>>>
>>>             After learning occurs, a bottom-up input pattern
>>>             typically directly selects the best-matching category,
>>>             without any hypothesis testing or search. And even if
>>>             there is a reset due to a large initial mismatch with a
>>>             previously active category, a single reset event may
>>>             lead directly to a matching category that can directly
>>>             resonate with the data.
>>>
>>>             I should note that all of the foundational predictions
>>>             of ART now have substantial bodies of psychological and
>>>             neurobiological data to support them. See the review
>>>             article if you would like to read about them.
>>>
>>>>             The currently active neuron is the one being examined
>>>>             by the top down process
>>>
>>>             I'm not sure what you mean by "being examined", but
>>>             perhaps my comment above may deal with it.
>>>
>>>             I should comment, though, about your use of the word
>>>             "currently active neuron". I assume that you mean at the
>>>             category level.
>>>
>>>             In this regard, there are two ART's. The first aspect of
>>>             ART is as a cognitive and neural theory whose scope,
>>>             which includes perceptual, cognitive, and adaptively
>>>             timed cognitive-emotional dynamics, among other
>>>             processes, is illustrated by the above referenced 2012
>>>             review article in /Neural Networks/. In the biological
>>>             theory, there is no general commitment to just one
>>>             "currently active neuron". One always considers the
>>>             neuronal population, or populations, that represent a
>>>             learned category. Sometimes, but not always, a
>>>             winner-take-all category is chosen.
>>>
>>>             The 2012 review article illustrates some of the large
>>>             data bases of psychological and neurobiological data
>>>             that have been explained in a principled way,
>>>             quantitatively simulated, and successfully predicted by
>>>             ART over a period of decades. ART-like processing is,
>>>             however, certainly not the only kind of computation that
>>>             may be needed to understand how the brain works. The
>>>             paradigm called Complementary Computing that I
>>>             introduced a while ago makes precise the sense in which
>>>             ART may be just one kind of dynamics supported by
>>>             advanced brains. This is also summarized in the review
>>>             article.
>>>
>>>             The second aspect of ART is as a series of algorithms
>>>             that mathematically characterize key ART design
>>>             principles and mechanisms in a focused setting, and
>>>             provide algorithms for large-scale applications in
>>>             engineering and technology. ARTMAP, fuzzy ARTMAP, and
>>>             distributed ARTMAP are among these, all of them
>>>             developed with Gail Carpenter. Some of these algorithms
>>>             use winner-take-all categories to enable the proof of
>>>             mathematical theorems that characterize how underlying
>>>             design principles work. In contrast, the distributed
>>>             ARTMAP family of algorithms, developed by Gail Carpenter
>>>             and her colleagues, allows for distributed category
>>>             representations without losing the benefits of fast,
>>>             incremental, self-stabilizing learning and prediction in
>>>             response to large non-stationary databases that can
>>>             include many unexpected events.
>>>
>>>             See, e.g.,
>>>             http://techlab.bu.edu/members/gail/articles/115_dART_NN_1997_.pdf
>>>             and
>>>             http://techlab.bu.edu/members/gail/articles/155_Fusion2008_CarpenterRavindran.pdf.
>>>
>>>             I should note that FAST learning is a technical concept:
>>>             it means that each adaptive weight can converge to its
>>>             new equilibrium value on EACH learning trial. That is
>>>             why ART algorithms can often successfully carry out
>>>             one-trial incremental learning of a data base. This is
>>>             not true of many other algorithms, such as back
>>>             propagation, simulated annealing, and the like, which
>>>             all experience catastrophic forgetting if they try to do
>>>             fast learning. Almost all other learning algorithms need
>>>             to be run using slow learning, that allows only a small
>>>             increment in the values of adaptive weights on each
>>>             learning trial, to avoid massive memory instabilities,
>>>             and work best in response to stationary data. Such
>>>             algorithms often fail to detect important rare cases,
>>>             among other limitations. ART can provably learn in
>>>             either the fast or slow mode in response to
>>>             non-stationary data.
>>>
>>>>             in a sequential fashion: one neuron after another,
>>>>             until an acceptable neuron is found.
>>>>
>>>>             (2) The input to the ART in the late 1990's is for a
>>>>             single feature vector as a monolithic input.
>>>>             By monolithic, I mean that all neurons take the entire
>>>>             input feature vector as input.
>>>>             I raise this point here because neurons in ART in the
>>>>             late 1990's do not have an explicit local sensory
>>>>             receptive field (SRF),
>>>>             i.e., they are fully connected from all components of the
>>>>             input vector.   A local SRF means that each neuron is
>>>>             only connected to a small region
>>>>             in an input image.
>>>
>>>             Various ART algorithms for technology do use fully
>>>             connected networks. They represent a single-channel
>>>             case, which is often sufficient in applications and
>>>             which simplifies mathematical proofs. However, the
>>>             single-channel case is, as its name suggests, not a
>>>             necessary constraint on ART design.
>>>
>>>             In addition, many ART biological models do not restrict
>>>             themselves to the single-channel case, and do have
>>>             receptive fields. These include the LAMINART family of
>>>             models that predict functional roles for many identified
>>>             cell types in the laminar circuits of cerebral cortex.
>>>             These models illustrate how variations of a shared
>>>             laminar circuit design can carry out very different
>>>             intelligent functions, such as 3D vision (e.g., 3D
>>>             LAMINART), speech and language (e.g., cARTWORD), and
>>>             cognitive information processing (e.g., LIST PARSE).
>>>             They are all summarized in the 2012 review article, with
>>>             the archival articles themselves on my web page
>>>             http://cns.bu.edu/~steve.
>>>
>>>             The existence of these laminar variations-on-a-theme
>>>             provides an existence proof for the exciting goal of
>>>             designing a family of chips whose specializations can
>>>             realize all aspects of higher intelligence, and which
>>>             can be consistently connected because they all share a
>>>             similar underlying design. Work on achieving this goal
>>>             can productively occupy lots of creative modelers and
>>>             technologists for many years to come.
>>>
>>>             I hope that the above replies provide some relevant
>>>             information, as well as pointers for finding more.
>>>
>>>             Best,
>>>
>>>             Steve
>>>
>>>
>>>
>>>>
>>>>             My apology again if my understanding above has errors,
>>>>             although I have examined the above two points carefully
>>>>             through several of your papers.
>>>>
>>>>             Best regards,
>>>>
>>>>             -John
>>>>
>>>>             Juyang (John) Weng, Professor
>>>>             Department of Computer Science and Engineering
>>>>             MSU Cognitive Science Program and MSU Neuroscience Program
>>>>             428 S Shaw Ln Rm 3115
>>>>             Michigan State University
>>>>             East Lansing, MI 48824 USA
>>>>             Tel:517-353-4388  <tel:517-353-4388>
>>>>             Fax:517-432-1061  <tel:517-432-1061>
>>>>             Email:weng at cse.msu.edu  <mailto:weng at cse.msu.edu>
>>>>             URL:http://www.cse.msu.edu/~weng/  <http://www.cse.msu.edu/%7Eweng/>
>>>>             ----------------------------------------------
>>>>
>>>
>>>             Stephen Grossberg
>>>             Wang Professor of Cognitive and Neural Systems
>>>             Professor of Mathematics, Psychology, and Biomedical
>>>             Engineering
>>>             Director, Center for Adaptive Systems
>>>             http://www.cns.bu.edu/about/cas.html
>>>             http://cns.bu.edu/~steve <http://cns.bu.edu/%7Esteve>
>>>             steve at bu.edu <mailto:steve at bu.edu>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>

-- 
Prof. Dr. Danko Nikolic'

Web:
http://www.danko-nikolic.com

Mail address 1:
Department of Neurophysiology
Max Planck Institut for Brain Research
Deutschordenstr. 46
60528 Frankfurt am Main
GERMANY

Mail address 2:
Frankfurt Institute for Advanced Studies
Wolfgang Goethe University
Ruth-Moufang-Str. 1
60433 Frankfurt am Main
GERMANY

----------------------------
Office: (..49-69) 96769-736
Lab: (..49-69) 96769-209
Fax: (..49-69) 96769-327
danko.nikolic at gmail.com
----------------------------
