<div dir="ltr"><div style="font-size:12.8px">Dear faculty and students,</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">We look forward to seeing you next Tuesday, Nov 14, at noon in NSH 3305 for the AI Seminar, sponsored by Apple. To learn more about the seminar series, please visit the AI Seminar <a href="http://www.cs.cmu.edu/~aiseminar/" target="_blank">webpage</a>.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">On Tuesday, <a href="https://www.cs.cmu.edu/~nhaghtal/">Nika Haghtalab</a><span style="font-size:12.8px"> will give the following talk: </span></div><div style="font-size:12.8px"><br></div><div><div><span style="font-size:12.8px">Title: Algorithms for Generalized Topic Modeling</span></div><div><span style="font-size:12.8px"><br></span></div><div style="font-size:12.8px"><span style="font-size:12.8px">Abstract:</span></div><div style="font-size:12.8px"><span style="font-size:12.8px"><br></span></div><div style="font-size:12.8px"><div style="font-size:12.8px"><div style="font-size:12.8px"><div style="font-size:12.8px">Topic modeling is an area with significant recent work at the intersection of algorithms and machine learning. In standard topic models, a topic (such as sports, business, or politics) is viewed as a probability distribution \vec a_i over words, and a document is generated by first selecting a mixture \vec w over topics, and then generating words i.i.d. from the associated mixture \vec w^T A. Given a large collection of such documents, the goal is to recover the topic vectors and then to correctly classify new documents according to their topic mixture.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">In this work, we consider a broad generalization of this framework in which words are no longer assumed to be drawn i.i.d. and instead a topic is a complex distribution over sequences of paragraphs. 
Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a document classifier. That is, we aim to learn a predictor that, given a new document, accurately predicts its topic mixture without learning the distributions explicitly. We present several natural conditions under which one can do this efficiently, and discuss issues such as noise tolerance and sample complexity in this model. More generally, our model can be viewed as a generalization of the multi-view or co-training setting in machine learning.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">This talk is based on joint work with Avrim Blum, to appear in AAAI 2018.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div></div></div></div></div></div>