Connectionists: ConvNets' parameter sharing is fundamentally incompatible with processor-memory co-localization

Rod Rinkus rod.rinkus at gmail.com
Mon Feb 8 18:30:13 EST 2016


Dear Connectionists,

Currently, the leading, and very successful, paradigm of machine
intelligence is Convolutional Networks (ConvNets), introduced by Yann LeCun.
As Dr. LeCun states in the Deep Learning Tutorial
<http://research.microsoft.com/apps/video/default.aspx?id=259574&r=1>
(Hinton, Bengio & LeCun) given about two months ago at NIPS, “Everyone uses
ConvNets” (at ~minute 49 of the talk).


At the same time, within the hardware community, it has long been
understood that most of the computational time and power used in executing
algorithms on the essentially ubiquitous von Neumann computer model is
expended in moving data between memory and processor.  For this reason,
there is a tremendous imperative to build new architectures in which the
processor is physically co-localized with memory.  Clearly, whatever
algorithm(s) underlie natural intelligence, they run on brains which are
networks of neurons, each one of which is both processor and memory.  More
specifically, each synapse is essentially both processor and memory: it’s
memory because it retains (stably over potentially very long periods)
knowledge of the history of signals that it has mediated; it’s processor
because it effectively multiplies the signal being mediated by the
instantaneous weight.  A Memristor, for example, perfectly co-localizes
processor and memory.
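
(To make the contrast concrete, here is a minimal Python sketch of the
point; the numbers and variable names are purely illustrative, not taken
from any particular system.)

    # Von Neumann view of a "neuron": weights live in memory, the multiply
    # happens in the processor, so every use of a weight is a memory read
    # (and, during learning, a write back).
    weights = [0.2, -0.5, 0.8]    # "synapses", stored in memory
    signals = [1.0, 0.0, 1.0]     # incoming signals

    total = 0.0
    for w, x in zip(weights, signals):
        total += w * x            # one memory fetch + one multiply per synapse

    # In a memristor implementation, each weight is the state of the very
    # device that performs the multiply, so the fetch disappears: processor
    # and memory are the same physical element.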


The above two conditions compel me to submit for discussion what I believe
is a fundamental problem regarding the scalability of ConvNets.  Specifically,
the technique of “shared parameters”, upon which, I think most would
concur, ConvNets depend in order to scale to massive problems, is
*fundamentally* incompatible with processor-memory co-localization (PMC).


In the shared parameters technique, a single filter (kernel) is learned for
each feature map.  The sharing occurs essentially through *averaging the
gradients* computed for all of the, say, U×V units comprising that map.  The
justification for learning a single filter for an entire input surface is
that, for natural input domains, the local statistics in any filter-scale
2D patch are highly similar across the entire input surface: e.g.,
virtually all patches of a visual input can be approximately decomposed
into some small number of low-order visual features, e.g., oriented Gabors,
edges, etc.  Sharing parameters has two major benefits: a) it greatly
boosts the number of samples informing the learning of the filter, yielding
a better model; and b) it drastically reduces the number of parameters that
need to be learned (typically via stochastic gradient descent), which in
turn drastically reduces learning time.
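
To make the mechanics explicit, here is a minimal NumPy sketch of
shared-filter learning for one feature map; the shapes (32×32 input, 5×5
kernel, hence a 28×28 map) and the learning rate are illustrative
assumptions of mine, and whether the per-position gradients are summed or
averaged is just a scaling convention.

    import numpy as np

    H, W, X, Y = 32, 32, 5, 5          # input surface and kernel (illustrative)
    U, V = H - X + 1, W - Y + 1        # feature-map positions sharing the kernel

    image = np.random.randn(H, W)
    kernel = np.random.randn(X, Y)     # the single shared filter
    upstream = np.random.randn(U, V)   # gradient arriving from the layer above

    # Each of the U*V map positions contributes its own X-by-Y gradient,
    # computed from the input patch it sees; the shared update pools them all.
    grad_sum = np.zeros((X, Y))
    for u in range(U):
        for v in range(V):
            grad_sum += upstream[u, v] * image[u:u+X, v:v+Y]

    shared_grad = grad_sum / (U * V)   # average over all map positions
    kernel -= 0.01 * shared_grad       # one SGD step on the shared kernel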


However, averaging the gradients implies aggregating, at a central
location, information originating from multiple spatial locales (i.e., U×V
locales) on the input surface.  Suppose the kernel itself is an X×Y array.
The X×Y values collected for the (0,0)th position of the map are physically
distinct from those collected for the (U-1,V-1)th position (modulo
overlap).  Even if we were to grant that the X×Y array positions (i.e.,
“synaptic” weights) for the (0,0)th position of the map were represented by
Memristors, they cannot be the same Memristors that represent the weights
for the (U-1,V-1)th position, nor, in general, any of the other U×V-1 map
positions.  Thus, in order to do the gradient averaging, there must be
macroscopic movement of large amounts of data, not simply between the
processor and memory of any single “node”, but *between nodes* (or, from
all U×V nodes to a central point).  There appears to
be no way around the fact that the shared (tied) parameters technique of
ConvNets entails massive movement of data.  Note that the situation
described here applies to every feature map learned at every level of a
network.
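
A back-of-the-envelope estimate of that traffic, under the purely
hypothetical assumption that each map position is a separate
processor-memory node holding its own copy of the X×Y weights:

    # Rough per-update gradient traffic for ONE feature map, if each of the
    # U*V positions were a separate node (all numbers are illustrative).
    U, V = 28, 28          # map positions / nodes
    X, Y = 5, 5            # kernel size
    bytes_per_value = 4    # float32

    upload = U * V * X * Y * bytes_per_value     # each node ships its gradient in
    download = U * V * X * Y * bytes_per_value   # averaged kernel broadcast back
    print((upload + download) / 1024, "KiB per map per update")

Small in isolation, but it is incurred for every map, at every level, on
every update, and it is traffic that a truly co-localized design would not
generate at all.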


Thus we have the situation that: a) in order to scale to massive problem
sizes, ConvNets require parameter sharing; b) to achieve huge reductions in
the amount of time and power expended in computation, we need PMC; and c)
parameter sharing is incompatible with PMC.



In fact, Dr. LeCun acknowledges that sharing parameters entails a large
amount of data movement (at ~minute 46 of the talk, Slide 66 “Distributed
Learning”).  On that same slide, he references efforts to address this
issue, e.g., asynchronous stochastic gradient descent, but indicates that
substantial challenges remain.  There has been some exploration of “locally
connected” ConvNets (Gregor, Szlam & LeCun, 2011), i.e., ConvNets which do
not use parameter sharing.  However, the fact remains that without
parameter sharing, the scalability of ConvNets to massive problems has to be
considered an open question.  For example, what would be the training time
on a larger benchmark like ImageNet, without parameter sharing?
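
For a rough sense of what giving up sharing costs in parameter count alone
(these numbers are illustrative; they are not taken from the Gregor, Szlam
& LeCun paper):

    # One layer, shared vs. locally connected (untied) weights.
    U, V = 28, 28       # feature-map positions
    X, Y = 5, 5         # receptive-field size
    n_maps = 64         # feature maps in the layer

    shared = n_maps * X * Y              # one kernel per map
    untied = n_maps * U * V * X * Y      # a separate kernel at every position
    print(shared, untied, untied // shared)   # 1600 vs 1254400: factor of U*V = 784

Every one of those untied parameters must still be learned from data, which
is why training time without sharing remains an open question.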


The brain itself is, of course, locally connected.  That would seem to
make it overwhelmingly likely that the actual algorithm of intelligence
requires local connectivity.  The brain, as an existence proof, therefore
constitutes another serious challenge to ConvNets.  It is clear
that ConvNets have risen to the forefront of machine intelligence over the
past decade.  But that preeminent stature only makes the point I raise that
much more important.  If, as I’ve claimed, there is fundamentally no way to
reconcile parameter sharing with PMC, there could be massive economic
implications, e.g., regarding future hardware.


I hope this post stimulates thought on this matter and leads to a lively
discussion.


Sincerely,
Rod Rinkus
-- 
Gerard (Rod) Rinkus, PhD
President,
rod at neurithmicsystems dot com
Neurithmic Systems LLC <http://sparsey.com>
275 Grove Street, Suite 2-400
Newton, MA 02466
617-997-6272

Visiting Scientist, Lisman Lab
Volen Center for Complex Systems
Brandeis University, Waltham, MA
grinkus at brandeis dot edu
http://people.brandeis.edu/~grinkus/