<html><body><div style="font-family: arial, helvetica, sans-serif; font-size: 12pt; color: #000000"><!--StartFragment--><div style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt; color: #000000;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt; color: #000000;"><div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;">Dear All, <br></span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;"><br data-mce-bogus="1"></span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;">Below are the details of a call for a Master 2 internship. <br data-mce-bogus="1"></span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"><span style="font-size: 14pt;" data-mce-style="font-size: 14pt;"><strong><br></strong></span></span></span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"><span style="font-size: 14pt;" data-mce-style="font-size: 14pt;"><strong>Title:</strong></span></span><span style="font-size: 14pt;" data-mce-style="font-size: 14pt;"><strong> </strong></span><strong><span style="font-size: 14pt; font-family: sans-serif;" data-mce-style="font-size: 14pt; font-family: 
sans-serif;"> Multi-task learning for hate speech classification</span></strong></span><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><strong><span style="font-size: 14pt; font-family: sans-serif;" data-mce-style="font-size: 14pt; font-family: sans-serif;"></span></strong></span><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><strong><span style="font-size: 14pt; font-family: sans-serif;" data-mce-style="font-size: 14pt; font-family: sans-serif;"> </span></strong></span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><strong><span style="font-size: 14pt; font-family: sans-serif;" data-mce-style="font-size: 14pt; font-family: sans-serif;"><br></span></strong></span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><strong><span style="font-size: 14pt; font-family: sans-serif;" data-mce-style="font-size: 14pt; font-family: sans-serif;"><span style="font-size: 12pt;" data-mce-style="font-size: 12pt;"><span style="font-size: 14pt;" data-mce-style="font-size: 14pt;"><span style="font-family: sans-serif;" data-mce-style="font-family: sans-serif;">Research Lab</span><span style="font-family: sans-serif;" data-mce-style="font-family: sans-serif;">: </span></span></span></span></strong><span style="font-size: 14pt; font-family: sans-serif;" data-mce-style="font-size: 14pt; font-family: sans-serif;"><span style="font-size: 12pt;" data-mce-style="font-size: 12pt;"><span style="font-family: sans-serif;" data-mce-style="font-family: sans-serif;">MULTISPEECH team, LORIA-INRIA, Nancy, France</span></span></span></span><p style="margin: 0px;" data-mce-style="margin: 0px;"><span 
style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong>Supervisors:</strong></span></p><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;"><strong>Irina Illina</strong>, Associate Professor, HDR (<span lang="zxx">illina@loria.fr)<br><strong>Ashwin Geet D’Sa</strong>, PhD student (<span lang="fr-FR">ashwin-geet.dsa@loria.fr)</span><br><span lang="fr-FR"><strong>Dominique Fohr</strong>, CNRS Research Scientist </span>(dominique.fohr@loria.fr)</span></span></div><div class="entry-content clearfix"><p style="margin: 0px;" data-mce-style="margin: 0px;"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong><br data-mce-bogus="1"></strong></span></p><p style="margin: 0px;" data-mce-style="margin: 0px;"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong>Motivation and context:</strong></span></p><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">In recent years, online communication through social media has grown dramatically. Although most people use social media for constructive purposes, some misuse these platforms to spread hate speech. Hate speech is a form of anti-social communicative behavior that targets minority groups in society on the basis of religion, gender, race, etc. (Delgado and Stefancic, 2014). It often leads to threats, fear, and violence against an individual or a group. Online hate speech is punishable by law in many countries, and platform owners can be held accountable for hate speech posted by their users. 
Manual moderation of hate speech by humans is expensive and time-consuming. Therefore, automatic classification techniques have been developed to detect hate speech.</span><br><br><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">Recently, deep learning techniques have outperformed classical machine learning techniques and have become the state-of-the-art approach to hate speech classification (Badjatiya et al., 2017). These methods require a large quantity of annotated training data, and annotation is very expensive. To train a powerful classifier based on deep neural networks, several corpora can therefore be combined.</span><br><br><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">Multi-task learning is a deep learning methodology that has been shown to outperform single-task deep learning models, especially in low-resource settings (Worsham and Kalita, 2020; Liu et al., 2019). It trains a single model jointly on multiple tasks, such as classification and question answering, so that the information learned for one task can benefit the others and improve performance across tasks. A representative example is the Multi-Task Deep Neural Network (MT-DNN) of Liu et al. (2019).</span><br><br><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">Existing hate speech corpora are collected from various sources, such as Wikipedia and Twitter, and their labeling schemes vary greatly. Some corpus creators merge various forms of hate, such as ‘abuse’ and ‘threat’, and annotate the samples collectively as ‘hate speech’ or ‘toxic speech’, whereas others create more challenging corpora with fine-grained labels such as ‘hate speech’, ‘abusive speech’, and ‘racism’ (Davidson et al., 2017). 
Furthermore, there is no agreed-upon definition of hate speech, and corpus creators may use different definitions. Consequently, a model trained on one corpus cannot easily be used to classify comments from another corpus. To take advantage of the various available hate speech corpora and to improve hate speech classification performance, we would like to explore multi-corpus learning using multi-task learning.</span><p style="margin: 0px;" data-mce-style="margin: 0px;"><br></p><p style="margin: 0px;" data-mce-style="margin: 0px;"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong>Objectives:</strong></span></p><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">The objective of this work is to improve the team's existing deep learning hate speech classifier by developing a multi-task learning system that uses several hate speech corpora during training. In the MT-DNN model (Liu et al., 2019), a set of task-specific layers is stacked on top of shared layers. The shared layers form the bottom of the model and are trained jointly on several corpora, while each task-specific layer is trained on a single task. We want to explore this setup for hate speech detection: the shared layers will be trained jointly on several hate speech corpora, and each task-specific layer will learn the classification task of one particular corpus. For example, one task-specific layer can learn to classify comments from the Wikipedia Detox corpus (Wulczyn et al., 2017) as very toxic, toxic, neither, healthy, or very healthy. Another task-specific layer can learn to classify tweets from the Founta corpus (Founta et al., 2018) as hateful, abusive, or normal. 
Since the shared layers are trained jointly on multiple corpora, we expect this to improve the performance of the task-specific layers, especially for tasks with little training data.</span><br><br><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">The work plan for the internship is as follows. First, the intern will conduct a literature survey on hate speech classification using deep neural networks. Then, using the baseline hate speech classification system (CNN-based or Bi-LSTM-based) already available in our team, the intern will evaluate its performance on several available hate speech corpora. Next, the intern will develop a new methodology based on the MT-DNN model for efficient multi-corpus learning; a pre-trained BERT model (Devlin et al., 2019) can be used to initialize the shared layers of the MT-DNN. Finally, the performance of the proposed MT-DNN model will be evaluated and compared with single-corpus learning and with multi-corpus learning (pooling all corpora together).</span></div><div class="entry-content clearfix"><p style="margin: 0px;" data-mce-style="margin: 0px;" align="justify"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong><br data-mce-bogus="1"></strong></span></p><p style="margin: 0px;" data-mce-style="margin: 0px;" align="justify"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong>Required Skills:</strong></span></p><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;">Background in statistics and natural language processing, and programming skills (Python).</span><br><span style="font-family: 'arial' , 
'helvetica' , sans-serif;">Candidates should email a detailed CV along with their diploma.</span></div><div class="entry-content clearfix"><span style="font-family: 'arial' , 'helvetica' , sans-serif;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif;"><br data-mce-bogus="1"></span></div><div class="entry-content clearfix"><p style="margin: 0px;" data-mce-style="margin: 0px;"><span style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;" data-mce-style="font-family: 'arial' , 'helvetica' , sans-serif; font-size: 14pt;"><strong>Bibliography:</strong></span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Badjatiya P., Gupta S., Gupta M., and Varma V. “Deep learning for hate speech detection in tweets.” In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759-760, 2017.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Davidson T., Warmsley D., Macy M., and Weber I. “Automated hate speech detection and the problem of offensive language.” arXiv preprint arXiv:1703.04009, 2017.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Delgado R., and Stefancic J. “Hate speech in cyberspace.” Wake Forest L. Rev. 49: 319, 2014.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Devlin J., Chang M., Lee K., and Toutanova K. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, 
2019.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Founta A.M., Djouvas C., Chatzakou D., Leontiadis I., Blackburn J., Stringhini G., Vakali A., Sirivianos M., and Kourtellis N. “Large scale crowdsourcing and characterization of Twitter abusive behavior.” In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2018.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Liu X., He P., Chen W., and Gao J. “Multi-Task Deep Neural Networks for Natural Language Understanding.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4487-4496, 2019.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Worsham J., and Kalita J. “Multi-task learning for natural language processing in the 2020s: where are we going?” Pattern Recognition Letters, 2020.</span></p><p style="margin:0px" data-mce-style="margin: 0px;" align="justify"><span style="font-size: 10pt;" data-mce-style="font-size: 10pt;">Wulczyn E., Thain N., and Dixon L. “Ex machina: Personal attacks seen at scale.” In Proceedings of the 26th International Conference on World Wide Web, 2017.</span></p><br></div></div><br><div>Best regards,<br>Ashwin Geet D'Sa</div></div><!--EndFragment--><div><br data-mce-bogus="1"></div></div></body></html>