<div dir="ltr"><div class="gmail_default" style="color:#0b5394"><span style="color:rgb(34,34,34)">Dear faculty and students:</span><br style="color:rgb(34,34,34)"><br style="color:rgb(34,34,34)"><span style="color:rgb(34,34,34)">We look forward to seeing you next Tuesday, Oct. 9th, at noon in GHC 6115 for the </span><span class="gmail-il" style="color:rgb(34,34,34)">AI</span><span style="color:rgb(34,34,34)"> </span><span class="gmail-il" style="color:rgb(34,34,34)">Seminar</span><span style="color:rgb(34,34,34)"> sponsored by Apple. To learn more about the </span><span class="gmail-il" style="color:rgb(34,34,34)">seminar</span><span style="color:rgb(34,34,34)"> series, please visit the website. </span><br style="color:rgb(34,34,34)"><span style="color:rgb(34,34,34)">On Tuesday, Qizhe Xie will give the following talk:</span><br style="color:rgb(34,34,34)"><br style="color:rgb(34,34,34)"><span style="color:rgb(34,34,34)">Title: From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction </span><br style="color:rgb(34,34,34)"><br style="color:rgb(34,34,34)"></div><div class="gmail_default" style="color:#0b5394"><span style="color:rgb(34,34,34)">Background:</span><br style="color:rgb(34,34,34)"><span style="color:rgb(34,34,34)">Modeling and predicting discrete sequences is central to many natural language processing tasks. Although different tasks use distinct evaluation metrics, the standard training algorithm for language generation has been maximum likelihood estimation (MLE). However, MLE has two well-known weaknesses: (1) MLE training ignores the task-specific metric; (2) MLE can suffer from exposure bias, meaning that the model is never exposed to its own failures during training. 
The recently proposed reward-augmented maximum likelihood (RAML) tackles these problems by constructing a target distribution that depends on the task metric and training the model to match this task-specific target instead of the empirical data distribution. </span><br style="color:rgb(34,34,34)"><br style="color:rgb(34,34,34)"><span style="color:rgb(34,34,34)">Abstract: </span><br style="color:rgb(34,34,34)"><span style="color:rgb(34,34,34)">In this talk, we study the credit assignment problem in reward-augmented maximum likelihood (RAML) and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms: one extends RAML with fine-grained credit assignment, and the other improves Actor-Critic with systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic, respectively.</span><br></div><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><span style="font-size:13px;border-collapse:collapse;color:rgb(136,136,136)"><b>Han Zhao<br>Machine Learning Department</b></span></div><div><span style="font-size:13px;border-collapse:collapse;color:rgb(136,136,136)"><b>School of Computer Science<br>Carnegie Mellon University<br>Mobile: +1-</b></span><b style="color:rgb(136,136,136);font-size:13px">412-652-4404</b></div></div></div></div></div></div>