Connectionists: three articles on invariant object learning and recognition

Stephen Grossberg steve at cns.bu.edu
Tue May 3 23:23:13 EDT 2011


The following articles are now available at http://cns.bu.edu/~steve:

On the road to invariant recognition: How cortical area V2 transforms absolute into relative disparity during 3D vision
Grossberg, S., Srinivasan, K., and Yazdanbakhsh, A.

Abstract:
Invariant recognition of objects depends on a hierarchy of cortical stages that build invariance gradually. Binocular disparity computations are a key part of this transformation. Cortical area V1 computes absolute disparity, which is the horizontal difference in the retinal locations of an image in the left and right foveas. Many cells in cortical area V2 compute relative disparity, which is the difference between the absolute disparities of two visible features. Relative, but not absolute, disparity is invariant under both a disparity change across a scene and vergence eye movements. A neural network model is introduced that predicts that shunting lateral inhibition of disparity-sensitive layer 4 cells in V2 causes a peak shift in cell responses that transforms the absolute disparity computed in V1 into the relative disparity computed in V2. This inhibitory circuit has previously been implicated in contrast gain control, divisive normalization, selection of perceptual groupings, and attentional focusing. The model hereby links relative disparity to other visual functions and suggests new ways to test its mechanistic basis. Other brain circuits are reviewed wherein lateral inhibition causes a peak shift that influences behavioral responses.
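
For readers who want a concrete feel for the peak-shift mechanism, the following is a minimal numerical sketch, not the model's published equations: it assumes a one-dimensional array of disparity-tuned cells obeying a generic shunting on-center off-surround equation, and every tuning width, gain, and feature position below is an arbitrary choice made only for illustration.

import numpy as np

# Hypothetical illustration only: disparity-tuned layer 4 cells obeying a
# generic shunting equation  dx_i/dt = -A*x_i + (B - x_i)*E_i - x_i*S_i,
# whose equilibrium is  x_i = B*E_i / (A + E_i + S_i).

disparities = np.linspace(-1.0, 1.0, 201)   # preferred absolute disparities (arbitrary units)
A, B = 1.0, 1.0                             # passive decay rate and upper bound

def gaussian(center, sigma):
    return np.exp(-(disparities - center) ** 2 / (2.0 * sigma ** 2))

def equilibrium_response(target_disp, flank_disp):
    E = gaussian(target_disp, sigma=0.25)        # excitation from the attended feature
    S = 3.0 * gaussian(flank_disp, sigma=0.30)   # broad off-surround driven by a second feature
    return B * E / (A + E + S)                   # shunting (divisive) steady state

x_alone = equilibrium_response(0.0, flank_disp=10.0)   # flank far outside the array: no effect
x_flank = equilibrium_response(0.0, flank_disp=-0.3)   # nearby flank on one side

print("peak without flank:", disparities[np.argmax(x_alone)])
print("peak with flank:   ", disparities[np.argmax(x_flank)])
# The nearby flank suppresses one side of the tuning curve more than the other,
# so the population peak shifts away from it: the kind of inhibition-driven peak
# shift the abstract invokes to convert absolute into relative disparity.
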
**********************************************************************************************************************************************
On the road to invariant recognition: Explaining tradeoff and morph properties of cells in inferotemporal cortex using multiple-scale task-sensitive attentive learning
Grossberg, S., Markowitz, J., and Cao, Y.

Abstract:
Visual object recognition is an essential accomplishment of advanced brains. Object recognition needs to be tolerant, or invariant, with respect to changes in object position, size, and view. In monkeys and humans, a key area for recognition is the anterior inferotemporal cortex (ITa). Recent neurophysiological data show that ITa cells with high object selectivity often have low position tolerance. We propose a neural model whose cells learn to simulate this tradeoff, as well as ITa responses to image morphs, while explaining how invariant recognition properties may arise in stages due to processes across multiple cortical areas. These processes include the cortical magnification factor, multiple receptive field sizes, and top-down attentive matching and learning properties that may be tuned by task requirements to attend to either concrete or abstract visual features with different levels of vigilance. The model predicts that data from the tradeoff and image morph tasks emerge from different levels of vigilance in the animals performing them. This result illustrates how different vigilance requirements of a task may change the course of category learning, notably the critical features that are attended and incorporated into learned category prototypes. The model outlines a path for developing an animal model of how defective vigilance control can lead to symptoms of various mental disorders, such as autism and amnesia.
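
The role of vigilance can be made concrete with a schematic, Fuzzy-ART-style sketch; this is a generic ART matching cycle on arbitrary random data, not the multiple-scale model of the paper, but it shows how raising vigilance yields more numerous, more concrete categories, while lowering it yields broader, more abstract ones.

import numpy as np

# Schematic Fuzzy-ART-style matching cycle; data and parameters are invented
# purely for illustration.

def art_learn(inputs, vigilance, beta=1.0):
    """Unsupervised categorization of vectors in [0,1]; higher vigilance forces
    a closer top-down match and hence more specific categories."""
    categories = []                                        # learned prototypes
    labels = []
    for I in inputs:
        # order committed categories by the usual choice value |I ^ w| / |w|
        order = sorted(range(len(categories)),
                       key=lambda j: -np.minimum(I, categories[j]).sum()
                                      / (1e-9 + categories[j].sum()))
        chosen = None
        for j in order:
            w = categories[j]
            match = np.minimum(I, w).sum() / I.sum()       # top-down match score
            if match >= vigilance:                         # resonance: learn
                categories[j] = beta * np.minimum(I, w) + (1.0 - beta) * w
                chosen = j
                break                                      # else: reset, try the next node
        if chosen is None:                                 # nothing matched: new category
            categories.append(I.copy())
            chosen = len(categories) - 1
        labels.append(chosen)
    return categories, labels

rng = np.random.default_rng(0)
data = rng.random((50, 8))
print("categories learned at vigilance 0.9:", len(art_learn(data, 0.9)[0]))
print("categories learned at vigilance 0.5:", len(art_learn(data, 0.5)[0]))
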
**********************************************************************************************************************************************
How does the brain rapidly learn and reorganize view- and positionally-invariant object representations in inferior temporal cortex?
Cao, Y., Grossberg, S., and Markowitz, J.

Abstract:
All primates depend for their survival on being able to rapidly learn about and recognize objects. Objects may be visually detected at multiple positions, sizes, and viewpoints. How does the brain rapidly learn and recognize objects while scanning a scene with eye movements, without causing a combinatorial explosion in the number of cells that are needed? How does the brain avoid the problem of erroneously classifying parts of different objects together at the same or different positions in a visual scene? In monkeys and humans, a key area for such invariant object category learning and recognition is the inferotemporal cortex (IT). A neural model is proposed to explain how spatial and object attention coordinate the ability of IT to learn invariant category representations of objects that are seen at multiple positions, sizes, and viewpoints. The model clarifies how interactions within a hierarchy of processing stages in the visual brain accomplish this. These stages include retina, lateral geniculate nucleus, and cortical areas V1, V2, V4, and IT in the brain's What cortical stream, as they interact with spatial attention processes within the parietal cortex of the Where cortical stream. The model builds upon the ARTSCAN model, which proposed how view-invariant object representations are generated. The pARTSCAN model proposes how the following additional processes in the What cortical processing stream also enable positionally-invariant object representations to be learned:  IT cells with persistent activity, and a combination of normalizing object category competition and a view-to-object learning law which together ensure that unambiguous views have a larger effect on object recognition than ambiguous views. The model explains how such invariant learning can be fooled when monkeys, or other primates, are presented with an object that is swapped with another object during eye movements to foveate the original object. The swapping procedure is predicted to prevent the reset of spatial attention, which would otherwise keep the representations of multiple objects from being combined by learning. Li & DiCarlo (2008) have presented neurophysiological data from monkeys showing how unsupervised natural experience in a target swapping experiment can rapidly alter object representations in IT. The model quantitatively simulates the swapping data by showing how the swapping procedure fools the spatial attention mechanism. More generally, the model provides a unifying framework, and testable predictions in both monkeys and humans, for understanding object learning data using neurophysiological methods in monkeys and spatial attention, episodic learning, and memory retrieval data using functional imaging methods in humans.
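
A deliberately simplified sketch of the swap prediction follows, with hypothetical dimensions and a winner-take-all stand-in for the model's normalizing object-category competition, so it is not the published pARTSCAN equations: persistent object-category activity keeps integrating successive view categories until spatial attention resets it, so omitting the reset at the moment of the swap merges views of two different objects into one "invariant" category.

import numpy as np

n_views, n_objects = 6, 3                 # hypothetical sizes, for illustration only

def simulate(views, resets, lr=0.8):
    """views: sequence of view-category indices seen across saccades;
       resets: True where the attentional shroud collapses and resets IT activity."""
    W = np.zeros((n_objects, n_views))    # view-to-object adaptive weights
    y = np.zeros(n_objects)               # persistent object-category activity in IT
    winners = []
    for view, reset in zip(views, resets):
        if reset:
            y[:] = 0.0                    # reset wipes the persistent activity
        v = np.zeros(n_views); v[view] = 1.0
        s = W @ v + y                     # bottom-up view evidence plus persistence
        if s.max() > 0.0:                 # winner-take-all stand-in for normalizing competition
            k = int(np.argmax(s))
        else:                             # nothing active: recruit an uncommitted category
            k = next(j for j in range(n_objects) if not W[j].any())
        y = np.zeros(n_objects); y[k] = 1.0
        W[k] += lr * (v - W[k])           # simple view-to-object learning law
        winners.append(k)
    return winners

views = [0, 1, 2, 3, 4, 5]                # views 0-2 show one object, views 3-5 another
print("reset at the object change:", simulate(views, [True, False, False, True,  False, False]))
print("no reset (swap condition): ", simulate(views, [True, False, False, False, False, False]))
# -> [0, 0, 0, 1, 1, 1] versus [0, 0, 0, 0, 0, 0]: without an attentional reset,
#    the views of both objects are bound to a single category.
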

