<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Hi Balazs, <br></div></div></div></blockquote><div> </div><div>You wrote:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>That is a very interesting question and I would love to know more about the reconciliation of the two views. From what I understand, saliency in cognitive science is dependent on both 1) the scene represented by pixels (or other sensors) and 2) the state of mind of the perceiver (focus, goal, memory, etc.). Whereas the current paradigm in computer vision seems to me that perception is bottom up, the "true" salience of various image parts are a function of the image, and the goal is to learn it from examples. Furthermore, it seems to me that there is a consensus that salience detection is pre-inferential, so it cannot be learned in the classical supervised way: to select and label the data to learn salience, one would need to have the very faculty that determines salience, leading to a loop.</div><div><br></div><div>I'm very cautious on all this since it's far from my main expertise, so my aim is to ask for information rather than to state anything with certainty. I'm reading all these discussions with a lot of interest, I find that this channel has a space between twitter and formal scientific papers.<br></div><div><br></div></div></div></blockquote><div><br></div><div>Very good point, and it's absolutely true that computational approaches to salience are a shallow version of how humans compute salience.  A great example I like to use is that if you show someone a picture with the sun in it, no one looks at the sun, regardless of how salient it is according to Itti et al. (1998).  
We incorporate meaning into our assessment of what is important, and this controls even the very first eye movements in response to viewing a new visual scene. </div><div><br></div><div>However, my point was that using neural networks to compute salience is a very active area of research with a wide variety of approaches being used, including, more recently, the involvement of meaning.  Recent work is starting to tease apart what current approaches to salience are missing, e.g.</div><div><br></div><div><a href="https://www.nature.com/articles/s41598-021-97879-z#:~:text=Deep%20saliency%20models%20represent%20the,look%20in%20real%2Dworld%20scenes.&text=We%20found%20that%20all%20three,feature%20weightings%20and%20interaction%20patterns">https://www.nature.com/articles/s41598-021-97879-z#:~:text=Deep%20saliency%20models%20represent%20the,look%20in%20real%2Dworld%20scenes.&text=We%20found%20that%20all%20three,feature%20weightings%20and%20interaction%20patterns</a>.<br></div><div><br></div><div>So while these approaches are still far from getting it right (just like the rest of AI), I just wanted to highlight that a lot of work is actively in progress.</div><div><br></div><div>Thanks!</div><div>-Brad</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div></div><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Brad Wyble<br>Associate Professor<span style="font-size:12.8px"> </span><br>Psychology Department<br>Penn State University<div><br></div><div><a href="http://wyblelab.com" target="_blank">http://wyblelab.com</a></div></div></div></div></div></div></div></div></div></div>