<div dir="ltr"><div><br class="gmail-Apple-interchange-newline">Salience is a much more fundamental phenomenon within recognition than the spotlight-style attention map suggested by Itti et al. and Treisman & Gelade 1980 (the equivalent reference in cognitive psychology).</div><div>It is also integrated into non-spatial modalities and occurs even when the display is too brief for an attention map to form, as in fast-masking experiments (e.g. Francis & Cho 2008).</div><div>It arises bottom-up, through input interactions, well before there is a chance to select a spatial region to focus on, and it is a source of "pop-out". Salience is associated with a signal-to-noise ratio during processing, which can be measured by the speed of processing for different inputs.</div><div>These effects of salience can be measured both in spatial processing and in human reaction times and errors with fast stimuli.  Salience kicks in immediately while information is being processed, so it is an integral part of processing, not an after-effect of a spatial attention filter as hypothesized in the older cognitive literature and in the current neural network literature, which has not been updated much.</div><div><br></div><div>Pop-out and difficulty with similarity (Duncan & Humphreys 1989; Wolfe 2001), which are analogous signal-to-noise effects (Rosenholtz 2001), are also observed in non-visual modalities with poor spatial resolution, such as olfaction (e.g. Rinberg et al. 2006).</div><div><div>Salience seems to be generated “on-the-fly” as an inseparable part of recognition mechanisms, and thus my opinion (supported by my computational findings) is that top-down connections back to inputs are very important for all recognition (not an after-effect of spatial attention).</div></div><div><br></div><div>Here are some references I have accumulated over my studies:</div><div>Francis, G. & Cho, Y. (2008). Effects of temporal integration on the shape of visual backward masking functions. 
Journal of Experimental Psychology: Human Perception & Performance, 34, 1116-1128.<br>Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97-136.<br>Macknik, S. L., & Martinez-Conde, S. (2007). The role of feedback in visual masking and visual processing. Advances in Cognitive Psychology, 3, 125-152.<br>Enns, J. T., & Di Lollo, V. (1997). Object substitution: A new form of visual masking in unattended visual locations. Psychological Science, 8, 135-139.<br>Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433-458.<br>Breitmeyer, B. G., & Öğmen, H. (2006). Visual Masking: Time Slices Through Conscious and Unconscious Vision. Oxford: Oxford University Press.<br>Bichot, N. P., Rossi, A. F., et al. (2005). Parallel and serial neural mechanisms for visual search in macaque area V4. Science, 308(5721), 529-534.<br>Wolfe, J. M. (2001). Asymmetries in visual search: An introduction. Perception & Psychophysics, 63(3), 381-389.<br></div><div>Rinberg, D., Koulakov, A., & Gelperin, A. (2006). Speed accuracy tradeoff in olfaction. Neuron, 51(3), 351-358.<br>Rosenholtz, R. (2001). Search asymmetries? What search asymmetries? Perception & Psychophysics, 63(3), 476-489.<br></div><div><br></div><div>P.S. 
In order not to offend as much (but don't worry, I believe every field deserves criticism), I have put my opinion about the state of the field here, after the references.</div><div>I find that the neural network community is stuck with 1950s feedforward neurons and 1980s attention mechanisms, and its associated computer science community is stuck using data sets and paradigms that promote feedforward methods but are unrealistic for real-life environments.</div><div>The computational neuroscience community is also generally bogged down with a large number of parameters, and additionally with statistical models (not really connectionist) that have predominantly feedforward and lateral-inhibition structures.</div><div>The cognitive community sits on the most interesting data but is also stuck with either overparameterized rate models or abstract non-computational models.  </div><div>The cognitive community is more open to feedback to the inputs, but trying to publish or get funding for work that covers all three communities gets bogged down by the sometimes conflicting requirements, nomenclature and politics of each one.  
In my opinion and experience, this is why there is little progress even when there are new ideas.</div><div>Thus brain science progress suffers a lot because of these separations.</div><div> -Tsvi</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 16, 2022 at 9:39 AM Brad Wyble <<a href="mailto:bwyble@gmail.com">bwyble@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Hi Balazs, <br></div></div></div></blockquote><div> </div><div>You wrote:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>That is a very interesting question and I would love to know more about the reconciliation of the two views. From what I understand, saliency in cognitive science is dependent on both 1) the scene represented by pixels (or other sensors) and 2) the state of mind of the perceiver (focus, goal, memory, etc.). The current paradigm in computer vision, by contrast, seems to me to be that perception is bottom-up: the "true" salience of various image parts is a function of the image, and the goal is to learn it from examples. Furthermore, it seems to me that there is a consensus that salience detection is pre-inferential, so it cannot be learned in the classical supervised way: to select and label the data to learn salience, one would need to have the very faculty that determines salience, leading to a loop.</div><div><br></div><div>I'm very cautious about all this since it's far from my main expertise, so my aim is to ask for information rather than to state anything with certainty. 
I'm reading all these discussions with a lot of interest; I find that this channel fills a space between Twitter and formal scientific papers.<br></div><div><br></div></div></div></blockquote><div><br></div><div>Very good point, and it's absolutely true that computational approaches to salience are a shallow version of how humans compute salience.  A great example I like to use is that if you show someone a picture with a sun in it, no one looks at the sun, regardless of how salient it is according to Itti et al. 1998.  We incorporate meaning into our assessment of what is important, and this controls even the very first eye movements in response to viewing a new visual scene. </div><div><br></div><div>However, my point was that using NNs to compute salience is a very active area of research with a wide variety of approaches being used, including, more recently, the involvement of meaning.  Recent work is starting to tease apart what current approaches to salience are missing, e.g.</div><div><br></div><div><a href="https://www.nature.com/articles/s41598-021-97879-z#:~:text=Deep%20saliency%20models%20represent%20the,look%20in%20real%2Dworld%20scenes.&text=We%20found%20that%20all%20three,feature%20weightings%20and%20interaction%20patterns" target="_blank">https://www.nature.com/articles/s41598-021-97879-z#:~:text=Deep%20saliency%20models%20represent%20the,look%20in%20real%2Dworld%20scenes.&text=We%20found%20that%20all%20three,feature%20weightings%20and%20interaction%20patterns</a>.<br></div><div><br></div><div>So while these approaches are still far from getting it right (just like the rest of AI), I just wanted to highlight that there is a lot of work actively in progress.</div><div><br></div><div>Thanks!</div><div>-Brad</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div></div><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Brad Wyble<br>Associate 
Professor<span style="font-size:12.8px"> </span><br>Psychology Department<br>Penn State University<div><br></div><div><a href="http://wyblelab.com" target="_blank">http://wyblelab.com</a></div></div></div></div></div></div></div></div></div></div>

</blockquote></div>