2019 The 5th Workshop on Noisy User-generated Text (W-NUT)

Nov 4, 2019, Hong Kong (at EMNLP 2019)

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. The workshop hashtag is #wnut.

As in previous years, best paper award(s) are sponsored by Google.

NEW! We received 89 long and short paper submissions this year.

NEW! Best paper awards:

Workshop Organizers

Invited Speakers


Monday, November 4, 2019
9:00 - 9:05    Opening
9:05 - 9:50    Invited Talk: Isabelle Augenstein -- Tracking False Information Online
Digital media enables fast sharing of information and discussions among users. While this comes with many benefits to today’s society, such as broadening information access, the manner in which information is disseminated also has obvious downsides. Since many users expect fast access to information and news outlets are often under financial pressure, speed often comes at the expense of accuracy, which leads to misinformation. Moreover, digital media can be misused by campaigns to intentionally spread false information, i.e., disinformation, about events, individuals or governments. In this talk, I will present different ways in which false information is spread online, including misinformation and disinformation. I will then report findings from our recent and ongoing work on automatic fact checking, stance detection and framing attitudes.
9:50 - 10:35    Oral Session I
9:50 - 10:05    Weakly Supervised Attention Networks for Fine-Grained Opinion Mining and Public Health
Giannis Karamanolakis, Daniel Hsu, Luis Gravano
Columbia University
10:05 - 10:20    Formality Style Transfer for Noisy, User-generated Conversations: Extracting Labeled, Parallel Data from Unlabeled Corpora
Isak Czeresnia Etinger and Alan W Black
Carnegie Mellon University
10:20 - 10:35    Multilingual Whispers: Generating Paraphrases with Translation
Christian Federmann1, Oussama Elachqar2, Chris Quirk2
1Microsoft, 2Microsoft Research AI
10:35 - 11:00    Coffee Break
11:00 - 12:30    Oral Session II
11:00 - 11:15    Exploiting BERT for End-to-End Aspect-based Sentiment Analysis
Xin Li1, Lidong Bing2, Wenxuan Zhang3, Wai Lam3
1Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, 2Alibaba DAMO Academy, 3The Chinese University of Hong Kong
11:15 - 11:30    Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
Vladimir Karpukhin1, Omer Levy2, Jacob Eisenstein3, Marjan Ghazvininejad2
1Facebook Artificial Intelligence Research, 2Facebook AI Research, 3Georgia Institute of Technology
11:30 - 11:45    Character-Based Models for Adversarial Phone Extraction: Preventing Human Sex Trafficking
Nathanael Chambers, Timothy Forman, Catherine Griswold, Kevin Lu, Yogaish Khastgir, Stephen Steckler
US Naval Academy
11:45 - 12:00    Personalizing Grammatical Error Correction: Adaptation to Proficiency Level and L1
Maria Nadejde1 and Joel Tetreault2
1Grammarly Inc, 2Grammarly
12:00 - 12:15    Tkol, Httt, and r/radiohead: High Affinity Terms in Reddit Communities
Abhinav Bhandari and Caitrin Armstrong
McGill University
12:15 - 2:00    Lunch
2:00 - 3:00    Lightning Talks
   Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning
Daniele Bonadiman2, Anjishnu Kumar1, Arpit Mittal3
1Amazon Alexa, 2University of Trento, 3Amazon
   Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers
Hanh Nguyen and Dirk Hovy
Bocconi University
   Predicting Algorithm Classes for Programming Word Problems
Vinayak Athavale1, Aayush Naik2, Rajas Vanjape3, Manish Shrivastava4
1International Institute of Information Technology, Hyderabad, 2Independent, 3IIIT Hyderabad, 4International Institute of Information Technology Hyderabad
   Automatic identification of writers’ intentions: Comparing different methods for predicting relationship goals in online dating profile texts
Chris van der Lee, Tess van der Zanden, Emiel Krahmer, Maria Mos, Alexander Schouten
Tilburg University
   Contextualized Word Representations from Distant Supervision with and for NER
Abbas Ghaddar and Philippe Langlais
Université de Montréal
   Extract, Transform and Filling: A Pipeline Model for Question Paraphrasing based on Template
Yunfan Gu1, Yang Yuqiao2, Zhongyu Wei2
1Fudan University, 2School of Data Science, Fudan University
   An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media
Rob van der Goot
University of Groningen
   Who wrote this book? A challenge for e-commerce
Béranger Dumont, Simona Maggio, Ghiles Sidi Said, Quoc-Tien Au
Rakuten Institute of Technology
   Mining Tweets that refer to TV programs with Deep Neural Networks
Takeshi Kobayakawa, Taro Miyazaki, Hiroki Okamoto, Simon Clippingdale
   Normalising Non-standardised Orthography in Algerian Code-switched User-generated Data
Wafia Adouane1, Jean-Philippe Bernardy2, Simon Dobnik2
1Department of Philosophy, Linguistics and Theory of Science, Gothenburg University, 2University of Gothenburg
   Dialect Text Normalization to Normative Standard Finnish
Niko Partanen, Mika Hämäläinen, Khalid Alnajjar
University of Helsinki
   A Cross-Topic Method for Supervised Relevance Classification
Jiawei Yong
   Exploring Multilingual Syntactic Sentence Representations
Chen Liu, Anderson De Andrade, Muhammad Osama
   FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm
Yuzhong Hong, Xianguo Yu, Neng He, Nan Liu, Junhui Liu
iQIYI, Inc.
   Latent semantic network induction in the context of linked example senses
Hunter Heidenreich and Jake Williams
Drexel University
   SmokEng: Towards Fine-grained Classification of Tobacco-related Social Media Text
Kartikey Pant1, Venkata Himakar Yanamandra2, Alok Debnath3, Radhika Mamidi2
1IIIT, Hyderabad, 2IIIT Hyderabad, 3International Institute of Information Technology: Hyderabad
   Modelling Uncertainty in Collaborative Document Quality Assessment
Aili Shen1, Daniel Beck2, Bahar Salehi1, Jianzhong Qi2, Timothy Baldwin1
1The University of Melbourne, 2University of Melbourne
   Conceptualisation and Annotation of Drug Nonadherence Information for Knowledge Extraction from Patient-Generated Texts
Anja Belz1, Richard Hoile2, Elizabeth Ford3, Azam Mullick1
1University of Brighton, 2Sussex Partnership NHS Foundation Trust, 3Brighton and Sussex Medical School
   What A Sunny Day ☂️: Toward Emoji-Sensitive Irony Detection
Shirley Anugrah Hayati, Aditi Chaudhary, Naoki Otani, Alan W Black
Carnegie Mellon University
   Geolocation with Attention-Based Multitask Learning Models
Tommaso Fornaciari and Dirk Hovy
Bocconi University
   Dense Node Representation for Geolocation
Tommaso Fornaciari and Dirk Hovy
Bocconi University
   Identifying Linguistic Areas for Geolocation
Tommaso Fornaciari and Dirk Hovy
Bocconi University
   Robustness to Capitalization Errors in Named Entity Recognition
Sravan Bodapati1, Hyokun Yun1, Yaser Al-Onaizan2
1Amazon, 2IBM T.J. Watson Research Center
   Extending Event Detection to New Types with Learning from Keywords
Viet Dac Lai and Thien Nguyen
University of Oregon
   Distant Supervised Relation Extraction with Separate Head-Tail CNN
Rui Xing and Jie Luo
Beihang University
   Discovering the Functions of Language in Online Forums
Youmna Ismaeil, Oana Balalau, Paramita Mirza
Max Planck Institute for Informatics
   Incremental processing of noisy user utterances in the spoken language understanding task
Stefan Constantin1, Jan Niehues2, Alex Waibel1
1Karlsruhe Institute of Technology, 2Maastricht University
   Benefits of Data Augmentation for NMT-based Text Normalization of User-Generated Content
Claudia Matos Veliz1, Orphee De Clercq2, Veronique Hoste1
1Ghent University, 2LT3, Language and Translation Technology Team, Ghent University
   Contextual Text Denoising with Masked Language Model
Yifu Sun1 and Haoming Jiang2
1Tencent, 2Georgia Institute of Technology
   Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets
Riya Pal1 and Dipti Sharma2
1International Institute of Information Technology, Hyderabad, 2IIIT, Hyderabad
   Enhancing BERT for Lexical Normalization
Benjamin Muller1, Benoit Sagot1, Djamé Seddah2
1INRIA, 2Université Paris Sorbonne (Paris IV)
   No, you’re not alone: A better way to find people with similar experiences on Reddit
Zhilin Wang, Elena Rastorgueva, Weizhe Lin, Xiaodong Wu
University of Cambridge
   Improving Multi-label Emotion Classification by Integrating both General and Domain-specific Knowledge
Wenhao Ying1, Rong Xiang1, Qin Lu2
1The Hong Kong Polytechnic University, 2The Hong Kong Polytechnic University
   Adapting Deep Learning Methods for Mental Health Prediction on Social Media
Ivan Sekulic and Michael Strube
Heidelberg Institute for Theoretical Studies
   Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back-Translation
Zhenhao Li and Lucia Specia
Imperial College London
   An Ensemble of Humour, Sarcasm, and Hate Speech for Sentiment Classification in Online Reviews
Rohan Badlani, Nishit Asnani, Manan Rai
Stanford University
   Grammatical Error Correction in Low-Resource Scenarios
Jakub Náplava1 and Milan Straka2
1Charles University, Institute of Formal and Applied Linguistics, 2Charles University
   Minimally-Augmented Grammatical Error Correction
Roman Grundkiewicz1 and Marcin Junczys-Dowmunt2
1School of Informatics, University of Edinburgh, 2Microsoft
   A Social Opinion Gold Standard for the Malta Government Budget 2018
Keith Cortis and Brian Davis
Maynooth University
   The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media
Jiyoung Han1, Youngin Lee1, Junbum Lee2, Meeyoung Cha3
1Korea Advanced Institute of Science and Technology (KAIST), 2Seoul National Univ. of Education, 3Institute for Basic Science (IBS)
   Y’all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts
Gabriel Stanovsky1 and Ronen Tamari2
1University of Washington, Allen Institute for Artificial Intelligence, 2Hebrew University of Jerusalem
   An Edit-centric Approach for Wikipedia Article Quality Assessment
Edison Marrese-Taylor1, Pablo Loyola2, Yutaka Matsuo3
1The University of Tokyo, 2IBM Research, 3University of Tokyo
   Additive Compositionality of Word Vectors
Yeon Seonwoo1, Sungjoon Park2, Dongkwan Kim3, Alice Oh3
1KAIST, Korea Advanced Institute of Science and Technology, 2Korea Advanced Institute of Science and Technology, 3KAIST
   Contextualized context2vec
Kazuki Ashihara1, Tomoyuki Kajiwara1, Yuki Arase1, Satoru Uchida2
1Osaka University, 2Kyushu University
   Phonetic Normalization for Machine Translation of User Generated Content
José Carlos Rosales Núñez1, Djamé Seddah2, Guillaume Wisniewski3
1LIMSI-CNRS / Inria Paris, 2Université Paris Sorbonne (Paris IV), 3Université Paris Sud and LIMSI
   Normalization of Indonesian-English Code-Mixed Twitter Data
Anab Maulana Barik, Rahmad Mahendra, Mirna Adriani
Universitas Indonesia
   Unsupervised Neologism Normalization Using Embedding Space Mapping
Nasser Zalmout1, Kapil Thadani2, Aasish Pappu3
1NYU Abu Dhabi, 2Yahoo Research, 3Spotify Research
   Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power
Jekaterina Novikova1, Aparna Balagopalan2, Ksenia Shkaruta3, Frank Rudzicz2
1Heriot Watt University, 2University of Toronto, 3Georgia Tech
   Simple Discovery of Aliases from User Comments [1-page abstract]
Abram Handler1 and Brian Clifton2
1University of Massachusetts Amherst, 2BuzzFeed
   Towards Actual (Not Operational) Textual Style Transfer Auto-Evaluation [1-page abstract]
Richard Yuanzhe Pang
New York University
   CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums [1-page abstract]
Ella Rabinovich, Masih Sultani, Suzanne Stevenson
University of Toronto
3:00 - 4:30    Poster Session (all papers above)
4:30 - 4:55    Coffee Break
5:00 - 5:45    Invited Talk: Jing Jiang -- Multimodal Sentiment Analysis from User-Generated Content
In recent years we have observed that multimedia user-generated content has come close to dominating social media on platforms such as Facebook, Instagram, Twitter and Snapchat. The analysis of noisy user-generated content now has to take into consideration not only text but also other modalities of data such as images and videos. In this talk, I will share some recent progress we have made on entity-level multimodal sentiment classification. I will present two pieces of work we have done and discuss some future directions at the end.
5:45 - 6:00    Closing and Best Paper Awards

Call for Papers

We seek submissions of long and short papers on original and unpublished work (same page limits as the EMNLP main conference). 1-page abstracts on work-in-progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.

Topics of interest include but are not limited to:

All submissions should conform to EMNLP 2019 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and, if applicable, where the work was published, in a footnote on the front page). Please submit your papers via the SoftConf link.

Double Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors of a paper accepted for presentation must notify the workshop organizers by the camera-ready deadline as to whether the paper will be presented or withdrawn. (Exception: 1-page abstracts can be work-in-progress or work published elsewhere.)

