W-NUT 2019: Workshop on Noisy User-generated Text (at EMNLP)

2019 The 5th Workshop on Noisy User-generated Text (W-NUT)

Nov 4, 2019, Hong Kong (at EMNLP 2019)

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. The workshop hashtag is #wnut.

We again have best paper award(s) sponsored by Google this year.

NEW! We received 89 long and short paper submissions this year.

NEW! Best paper awards:

Nathanael Chambers, Timothy Forman, Catherine Griswold, Kevin Lu, Yogaish Khastgir, Stephen Steckler
Character-Based Models for Adversarial Phone Extraction: Preventing Human Sex Trafficking
Abhinav Bhandari and Caitrin Armstrong
Tkol, Httt, and r/radiohead: High Affinity Terms in Reddit Communities

Workshop Organizers

Alan Ritter (Ohio State University)
Wei Xu (Ohio State University)
Tim Baldwin (University of Melbourne)
Afshin Rahimi (University of Melbourne)

Invited Speakers

Isabelle Augenstein (University of Copenhagen)
Jing Jiang (Singapore Management University)

Program

Monday, November 4, 2019

9:00 - 9:05

Opening

9:05 - 9:50

Invited Talk: Isabelle Augenstein -- Tracking False Information Online
Digital media enables fast sharing of information and discussions among users. While this comes with many benefits to today’s society, such as broadening information access, the manner in which information is disseminated also has obvious downsides. Since fast access to information is expected by many users and news outlets are often under financial pressure, speedy access often comes at the expense of accuracy, which leads to misinformation. Moreover, digital media can be misused by campaigns to intentionally spread false information, i.e. disinformation, about events, individuals or governments. In this talk, I will present on different ways false information is spread online, including misinformation and disinformation. I will then report findings from our recent and ongoing work on automatic fact checking, stance detection and framing attitudes.

9:50 - 10:35

Oral Session I

9:50 - 10:05

Weakly Supervised Attention Networks for Fine-Grained Opinion Mining and Public Health
Giannis Karamanolakis, Daniel Hsu, Luis Gravano
Columbia University

10:05 - 10:20

Formality Style Transfer for Noisy, User-generated Conversations: Extracting Labeled, Parallel Data from Unlabeled Corpora
Isak Czeresnia Etinger and Alan W Black
Carnegie Mellon University

10:20 - 10:35

Multilingual Whispers: Generating Paraphrases with Translation
Christian Federmann¹, Oussama Elachqar², Chris Quirk²
¹Microsoft, ²Microsoft Research AI

10:35 - 11:00

Coffee Break

11:00 - 12:30

Oral Session II

11:00 - 11:15

Exploiting BERT for End-to-End Aspect-based Sentiment Analysis
Xin Li¹, Lidong Bing², Wenxuan Zhang³, Wai Lam³
¹Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, ²Alibaba DAMO Academy, ³The Chinese University of Hong Kong

11:15 - 11:30

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
Vladimir Karpukhin¹, Omer Levy², Jacob Eisenstein³, Marjan Ghazvininejad²
¹Facebook Artificial Intelligence Research, ²Facebook AI Research, ³Georgia Institute of Technology

11:30 - 11:45

Character-Based Models for Adversarial Phone Extraction: Preventing Human Sex Trafficking
Nathanael Chambers, Timothy Forman, Catherine Griswold, Kevin Lu, Yogaish Khastgir, Stephen Steckler
US Naval Academy

11:45 - 12:00

Personalizing Grammatical Error Correction: Adaptation to Proficiency Level and L1
Maria Nadejde¹ and Joel Tetreault²
¹Grammarly Inc, ²Grammarly

12:00 - 12:15

Tkol, Httt, and r/radiohead: High Affinity Terms in Reddit Communities
Abhinav Bhandari and Caitrin Armstrong
McGill University

12:15 - 2:00

Lunch

2:00 - 3:00

Lightning Talks

Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning
Daniele Bonadiman², Anjishnu Kumar¹, Arpit Mittal³
¹Amazon Alexa, ²University of Trento, ³Amazon

Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers
Hanh Nguyen and Dirk Hovy
Bocconi University

Predicting Algorithm Classes for Programming Word Problems
Vinayak Athavale¹, Aayush Naik², Rajas Vanjape³, Manish Shrivastava⁴
¹International Institute of Information Technology, Hyderabad, ²Independent, ³IIIT Hyderabad, ⁴International Institute of Information Technology Hyderabad

Automatic identification of writers’ intentions: Comparing different methods for predicting relationship goals in online dating profile texts
Chris van der Lee, Tess van der Zanden, Emiel Krahmer, Maria Mos, Alexander Schouten
Tilburg University

Contextualized Word Representations from Distant Supervision with and for NER
Abbas Ghaddar and Philippe Langlais
Université de Montréal

Extract, Transform and Filling: A Pipeline Model for Question Paraphrasing based on Template
Yunfan Gu¹, Yang Yuqiao², Zhongyu Wei²
¹Fudan University, ²School of Data Science, Fudan University

An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media
Rob van der Goot
University of Groningen

Who wrote this book? A challenge for e-commerce
Béranger Dumont, Simona Maggio, Ghiles Sidi Said, Quoc-Tien Au
Rakuten Institute of Technology

Mining Tweets that refer to TV programs with Deep Neural Networks
Takeshi Kobayakawa, Taro Miyazaki, Hiroki Okamoto, Simon Clippingdale
NHK

Normalising Non-standardised Orthography in Algerian Code-switched User-generated Data
Wafia Adouane¹, Jean-Philippe Bernardy², Simon Dobnik²
¹Department of Philosophy, Linguistics and Theory of Science, Gothenburg University, ²University of Gothenburg

Dialect Text Normalization to Normative Standard Finnish
Niko Partanen, Mika Hämäläinen, Khalid Alnajjar
University of Helsinki

A Cross-Topic Method for Supervised Relevance Classification
Jiawei Yong
RICOH COMPANY, LTD

Exploring Multilingual Syntactic Sentence Representations
Chen Liu, Anderson De Andrade, Muhammad Osama
Wattpad

FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm
Yuzhong Hong, Xianguo Yu, Neng He, Nan Liu, Junhui Liu
iQIYI, Inc.

Latent semantic network induction in the context of linked example senses
Hunter Heidenreich and Jake Williams
Drexel University

SmokEng: Towards Fine-grained Classification of Tobacco-related Social Media Text
Kartikey Pant¹, Venkata Himakar Yanamandra², Alok Debnath³, Radhika Mamidi²
¹IIIT, Hyderabad, ²IIIT Hyderabad, ³International Institute of Information Technology: Hyderabad

Modelling Uncertainty in Collaborative Document Quality Assessment
Aili Shen¹, Daniel Beck², Bahar Salehi¹, Jianzhong Qi², Timothy Baldwin¹
¹The University of Melbourne, ²University of Melbourne

Conceptualisation and Annotation of Drug Nonadherence Information for Knowledge Extraction from Patient-Generated Texts
Anja Belz¹, Richard Hoile², Elizabeth Ford³, Azam Mullick¹
¹University of Brighton, ²Sussex Partnership NHS Foundation Trust, ³Brighton and Sussex Medical School

What A Sunny Day ☂️: Toward Emoji-Sensitive Irony Detection
Shirley Anugrah Hayati, Aditi Chaudhary, Naoki Otani, Alan W Black
Carnegie Mellon University

Geolocation with Attention-Based Multitask Learning Models
Tommaso Fornaciari and Dirk Hovy
Bocconi University

Dense Node Representation for Geolocation
Tommaso Fornaciari and Dirk Hovy
Bocconi University

Identifying Linguistic Areas for Geolocation
Tommaso Fornaciari and Dirk Hovy
Bocconi University

Robustness to Capitalization Errors in Named Entity Recognition
Sravan Bodapati¹, Hyokun Yun¹, Yaser Al-Onaizan²
¹Amazon, ²IBM T.J. Watson Research Center

Extending Event Detection to New Types with Learning from Keywords
Viet Dac Lai and Thien Nguyen
University of Oregon

Distant Supervised Relation Extraction with Separate Head-Tail CNN
Rui Xing and Jie Luo
Beihang University

Discovering the Functions of Language in Online Forums
Youmna Ismaeil, Oana Balalau, Paramita Mirza
Max Planck Institute for Informatics

Incremental processing of noisy user utterances in the spoken language understanding task
Stefan Constantin¹, Jan Niehues², Alex Waibel¹
¹Karlsruhe Institute of Technology, ²Maastricht University

Benefits of Data Augmentation for NMT-based Text Normalization of User-Generated Content
Claudia Matos Veliz¹, Orphee De Clercq², Veronique Hoste¹
¹Ghent University, ²LT3, Language and Translation Technology Team, Ghent University

Contextual Text Denoising with Masked Language Model
Yifu Sun¹ and Haoming Jiang²
¹Tencent, ²Georgia Institute of Technology

Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets
Riya Pal¹ and Dipti Sharma²
¹International Institute of Information Technology, Hyderabad, ²IIIT, Hyderabad

Enhancing BERT for Lexical Normalization
Benjamin Muller¹, Benoit Sagot¹, Djamé Seddah²
¹INRIA, ²Université Paris Sorbonne (Paris IV)

No, you’re not alone: A better way to find people with similar experiences on Reddit
Zhilin Wang, Elena Rastorgueva, Weizhe Lin, Xiaodong Wu
University of Cambridge

Improving Multi-label Emotion Classification by Integrating both General and Domain-specific Knowledge
Wenhao Ying¹, Rong Xiang¹, Qin Lu²
¹The Hong Kong Polytechnic University, ²The Hong Kong Polytechnic University

Adapting Deep Learning Methods for Mental Health Prediction on Social Media
Ivan Sekulic and Michael Strube
Heidelberg Institute for Theoretical Studies

Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back-Translation
Zhenhao Li and Lucia Specia
Imperial College London

An Ensemble of Humour, Sarcasm, and Hate Speech for Sentiment Classification in Online Reviews
Rohan Badlani, Nishit Asnani, Manan Rai
Stanford University

Grammatical Error Correction in Low-Resource Scenarios
Jakub Náplava¹ and Milan Straka²
¹Charles University, Institute of Formal and Applied Linguistics, ²Charles University

Minimally-Augmented Grammatical Error Correction
Roman Grundkiewicz¹ and Marcin Junczys-Dowmunt²
¹School of Informatics, University of Edinburgh, ²Microsoft

A Social Opinion Gold Standard for the Malta Government Budget 2018
Keith Cortis and Brian Davis
Maynooth University

The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media
Jiyoung Han¹, Youngin Lee¹, Junbum Lee², Meeyoung Cha³
¹Korea Advanced Institute of Science and Technology (KAIST), ²Seoul National Univ. of Education, ³Institute for Basic Science (IBS)

Y’all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts
Gabriel Stanovsky¹ and Ronen Tamari²
¹University of Washington, Allen Institute for Artificial Intelligence, ²Hebrew University of Jerusalem

An Edit-centric Approach for Wikipedia Article Quality Assessment
Edison Marrese-Taylor¹, Pablo Loyola², Yutaka Matsuo³
¹The University of Tokyo, ²IBM Research, ³University of Tokyo

Additive Compositionality of Word Vectors
Yeon Seonwoo¹, Sungjoon Park², Dongkwan Kim³, Alice Oh³
¹KAIST, Korea Advanced Institute of Science and Technology, ²Korea Advanced Institute of Science and Technology, ³KAIST

Contextualized context2vec
Kazuki Ashihara¹, Tomoyuki Kajiwara¹, Yuki Arase¹, Satoru Uchida²
¹Osaka University, ²Kyushu University

Phonetic Normalization for Machine Translation of User Generated Content
José Carlos Rosales Núñez¹, Djamé Seddah², Guillaume Wisniewski³
¹LIMSI-CNRS / Inria Paris, ²Université Paris Sorbonne (Paris IV), ³Université Paris Sud and LIMSI

Normalization of Indonesian-English Code-Mixed Twitter Data
Anab Maulana Barik, Rahmad Mahendra, Mirna Adriani
Universitas Indonesia

Unsupervised Neologism Normalization Using Embedding Space Mapping
Nasser Zalmout¹, Kapil Thadani², Aasish Pappu³
¹NYU Abu Dhabi, ²Yahoo Research, ³Spotify Research

Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power
Jekaterina Novikova¹, Aparna Balagopalan², Ksenia Shkaruta³, Frank Rudzicz²
¹Heriot Watt University, ²University of Toronto, ³Georgia Tech

Simple Discovery of Aliases from User Comments [1-page abstract]
Abram Handler¹ and Brian Clifton²
¹University of Massachusetts Amherst, ²BuzzFeed

Towards Actual (Not Operational) Textual Style Transfer Auto-Evaluation [1-page abstract]
Richard Yuanzhe Pang
New York University

CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums [1-page abstract]
Ella Rabinovich, Masih Sultani, Suzanne Stevenson
University of Toronto

3:00 - 4:30

Poster Session (all papers above)

4:30 - 4:55

Coffee Break

5:00 - 5:45

Invited Talk: Jing Jiang -- Multimodal Sentiment Analysis from User-Generated Content
In recent years we have observed that multimedia user-generated content is almost dominating social media on platforms such as Facebook, Instagram, Twitter and Snapchat. The analysis of noisy user-generated content now has to take into consideration not only text but also other modalities of data such as images and videos. In this talk, I will share some recent progress we have made on entity-level multimodal sentiment classification. I will present two pieces of work we have done and also discuss some future directions at the end.

5:45 - 6:00

Closing and Best Paper Awards

Important Dates

Submission Deadline: ~~Monday, August 19~~ Wednesday, August 21 (anywhere on earth)
Reviews Due: Monday, September 9
Acceptance Notification: Monday, September 16
Camera-Ready: Monday, September 30
Workshop day: November 4

Call for Papers

We seek submissions of long and short papers on original and unpublished work (same page limit as the EMNLP main conference). 1-page abstracts on work-in-progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.

Topics of interest include but are not limited to:

NLP Preprocessing of Noisy Text

Part of speech tagging
Named entity tagging, including a wide range of categories, e.g. product names
Chunking of user-generated text
Parsing

Text Normalization and Error Correction

Normalizing noisy text for downstream tasks and for human readability
Error detection and correction

Robustness to Noise, both Natural and Adversarial
Multilingual NLP in noisy text
Machine Translation of Noisy Text
Sentiment analysis
Crowdsourcing of text data
User prediction, e.g. gender, age, etc.
Stylistics, e.g. formality, politeness, etc.
Colloquial language, e.g. code-switching, idiom detection
Bilingual translation of the noisy text
Paraphrase identification and semantic similarity of short text or noisy text
Information extraction from noisy text
Domain adaptation to user-generated text
Geolocation prediction
Global and regional trend detection and event extraction
Detecting rumors, contradictory information, sarcasm and humor on social media
Extracting user demographics, profiles, and major life events
Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)

All submissions should conform to EMNLP 2018 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and where the work was published in a footnote on the front page, if applicable). Please submit your papers at the SoftConf link.

Double Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors of a paper accepted for presentation must notify the workshop organizers by the camera-ready deadline as to whether the paper will be presented or withdrawn. (Exception: 1-page abstracts can be work-in-progress or work published elsewhere.)

Program Committee

Mostafa Abdou (University of Copenhagen)
Muhammad Abdul-Mageed (University of British Columbia)
Željko Agić (Corti)
Gustavo Aguilar (University of Houston)
Hadi Amiri (Harvard University)
Rahul Aralikatte (University of Copenhagen)
Eiji Aramaki (NAIST)
Roy Bar-Haim (IBM)
Francesco Barbieri (UPF Barcelona)
Cosmin Bejan (Vanderbilt University)
Eric Bell (PNNL)
Adrian Benton (JHU)
Eduardo Blanco (University of North Texas)
Su Lin Blodgett (UMass Amherst)
Matko Bošnjak (University College London)
Julian Brooke (University of British Columbia)
Annabelle Carrell (JHU)
Xilun Chen (Cornell University)
Anne Cocos (University of Pennsylvania)
Arman Cohan (AI2)
Nigel Collier (University of Cambridge)
Paul Cook (University of New Brunswick)
Marina Danilevsky (IBM Research)
Leon Derczynski (IT University of Copenhagen)
Seza Doğruöz (Tilburg University)
Jay DeYoung (Northeastern University)
Eduard Dragut (Temple University)
Xinya Du (Cornell University)
Heba Elfardy (Amazon)
Micha Elsner (Ohio State University)
Sindhu Kiranmai Ernala (Georgia Tech)
Manaal Faruqui (Google Research)
Lisheng Fu (New York University)
Yoshinari Fujinuma (University of Colorado, Boulder)
Dan Garrette (Google Research)
Kevin Gimpel (TTIC)
Dan Goldwasser (Purdue University)
Amit Goyal (Criteo)
Nizar Habash (NYU Abu Dhabi)
Masato Hagiwara (Duolingo)
Bo Han (Kaplan)
Abe Handler (University of Massachusetts Amherst)
Shudong Hao (University of Colorado, Boulder)
Devamanyu Hazarika (National University of Singapore)
Jack Hessel (Cornell University)
Dirk Hovy (Bocconi University)
Xiaolei Huang (University of Colorado, Boulder)
Sarthak Jain (Northeastern University)
Kenny Joseph (University at Buffalo)
David Jurgens (University of Michigan)
Nobuhiro Kaji (Yahoo! Research)
Pallika Kanani (Oracle)
Dongyeop Kang (Carnegie Mellon University)
Emre Kiciman (Microsoft Research)
Svetlana Kiritchenko (National Research Council Canada)
Roman Klinger (University of Stuttgart)
Ekaterina Kochmar (University of Cambridge)
Vivek Kulkarni (University of California Santa Barbara)
Jonathan Kummerfeld (University of Michigan)
Ophélie Lacroix (Siteimprove)
Wuwei Lan (Ohio State University)
Chen Li (Tencent)
Jing Li (Tencent AI)
Jessy Junyi Li (University of Texas Austin)
Yitong Li (University of Melbourne)
Nut Limsopatham (University of Glasgow)
Patrick Littell (National Research Council Canada)
Zhiyuan Liu (Tsinghua University)
Fei Liu (University of Melbourne)
Nikola Ljubešić (University of Zagreb)
Wei-Yun Ma (Academia Sinica)
Mounica Maddela (Ohio State University)
Suraj Maharjan (University of Houston)
Aaron Masino (The Children's Hospital of Philadelphia)
Paul Michel (CMU)
Shachar Mirkin (Xerox Research)
Saif M. Mohammad (National Research Council Canada)
Ahmed Mourad (RMIT University)
Günter Neumann (DFKI)
Vincent Ng (University of Texas at Dallas)
Eric Nichols (Honda Research Institute)
Xing Niu (University of Maryland, College Park)
Benjamin Nye (Northeastern University)
Alice Oh (KAIST)
Naoki Otani (CMU)
Patrick Pantel (Microsoft Research)
Umashanthi Pavalanathan (Georgia Tech)
Yuval Pinter (Georgia Tech)
Barbara Plank (IT University of Copenhagen)
Christopher Potts (Stanford University)
Daniel Preoţiuc-Pietro (Bloomberg)
Chris Quirk (Microsoft Research)
Ella Rabinovich (University of Toronto)
Dianna Radpour (University of Colorado Boulder)
Preethi Raghavan (IBM Research)
Revanth Rameshkumar (Microsoft)
Sudha Rao (Microsoft Research)
Marek Rei (University of Cambridge)
Roi Reichart (Technion)
Adithya Renduchintala (JHU)
Carolyn Penstein Rose (CMU)
Alla Rozovskaya (City University of New York)
Koustuv Saha (Georgia Tech)
Keisuke Sakaguchi (Allen Institute for Artificial Intelligence)
Maarten Sap (University of Washington)
Natalie Schluter (IT University of Copenhagen)
Andrew Schwartz (Stony Brook University)
Djamé Seddah (University Paris-Sorbonne)
Amirreza Shirani (University of Houston)
Dan Simonson (BlackBoiler)
Evangelia Spiliopoulou (Carnegie Mellon University)
Jan Šnajder (University of Zagreb)
Gabriel Stanovsky (Allen Institute for Artificial Intelligence)
Ian Stewart (Georgia Tech)
Jeniya Tabassum (Ohio State University)
Joel Tetreault (Grammarly)
Sara Tonelli (FBK)
Rob van der Goot (University of Groningen)
Rob Voigt (Stanford University)
Byron Wallace (Northeastern University)
Xiaojun Wan (Peking University)
Zeerak Waseem (University of Sheffield)
Zhongyu Wei (Fudan University)
Diyi Yang (Georgia Tech)
Yi Yang (ASAPP)
Guido Zarrella (MITRE)
Justine Zhang (Cornell University)
Jason Shuo Zhang (University of Colorado, Boulder)
Shi Zong (Ohio State University)
