Connectionists: Talk by Hinrich Schütze on LLM

Barbara Hammer bhammer at techfak.uni-bielefeld.de
Mon Jun 26 05:17:03 EDT 2023


Dear colleagues,

I would like to draw your attention to a talk in the lecture series of the large-scale project SAIL (www.sail.nrw <http://www.sail.nrw/>):

When: July 6, 16:00-17:30 CEST
Who: Dr Hinrich Schütze, LMU (Homepage of Hinrich Schütze's lab <https://schuetze.cis.lmu.de/>) 
Where: Zoom link <https://uni-bielefeld.zoom.us/j/64775735478?pwd=TFpEUVFPME5EQXFKMHZHY1ZsM2Y4Zz09>
Title: "Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages"

Abstract: Large language models (LLMs) are currently the most active area of research in NLP. Most work has focused on what we call "vertical" scaling: making LLMs even better for a relatively small number of high-resource languages. We address "horizontal" scaling instead: extending LLMs to a large subset of the world's languages, focusing on low-resource languages. Our Glot500-m model is trained on 500 languages, many of which are not covered by any other language model. I will talk about the major challenges we faced in creating Glot500: (i) finding, validating and cleaning training data for that many languages; (ii) evaluating performance of Glot500-m on languages for which native speakers and labeled datasets were not available to us; and (iii) determining the factors that ultimately make training on a language successful. We find that trying to reduce such factors to the so-called curse of multilinguality is naive and there is in fact also a "boon of multilinguality". We are in the process of making Glot500-c, our training corpus covering 500 languages, publicly available.

Best wishes

Barbara Hammer


-- 
Prof. Dr. Barbara Hammer
Machine Learning Group, CITEC
Bielefeld University
D-33594 Bielefeld
Phone: +49 521 / 106 12115


