Giuseppe Carenini

  • Professor
  • University of British Columbia

Unlimited discourse structures in the era of distant supervision, pre-trained language models and autoencoders

 

Abstract: Historically, discourse processing has relied on human-annotated corpora that are very small and lack diversity, often leading to overfitting, poor performance in domain transfer, and minimal success of modern deep-learning solutions. So, wouldn’t it be great if we could generate an unlimited number of discourse structures for both monologues and dialogues, across genres, without involving human annotation? In this talk, I will present some preliminary results on possible strategies to achieve this goal: leveraging natural text annotations (like sentiment and summaries), extracting discourse information from pre-trained and fine-tuned language models, or inducing discourse trees from task-agnostic autoencoding learning objectives. Besides the many remaining challenges and open issues, I will discuss the potential of these novel approaches not only to boost the performance of discourse parsers (NLU) and text planners (NLG), but also to lead to more explanatory and useful data-driven theories of discourse.

Bio: Giuseppe Carenini is a Professor in Computer Science and Director of the Master in Data Science at UBC (Vancouver, Canada). His work on natural language processing and information visualization to support decision making has been published in over 140 peer-reviewed papers (including best paper at UMAP-14 and ACM-TiiS-14). Dr. Carenini has served as area chair for many conferences, most recently for ACL’21 in “Natural Language Generation”, and as Senior Area Chair for NAACL’21 in “Discourse and Pragmatics”. He was also the Program Co-Chair for IUI 2015 and for SigDial 2016. In 2011, he published a co-authored book, “Methods for Mining and Summarizing Text Conversations”. Dr. Carenini has also collaborated extensively with industrial partners, including Microsoft and IBM. He was awarded a Google Research Award in 2007 and a Yahoo Faculty Research Award in 2016.

Sessions