W-NUT 2018: Workshop on Noisy User-generated Text (at EMNLP)

The 4th Workshop on Noisy User-generated Text (W-NUT 2018)

Nov 1, 2018, Brussels, Belgium (at EMNLP 2018)

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. The workshop hashtag is #wnut.

We again have best paper award(s) sponsored by Microsoft Research this year.

NEW! WNUT 2019 will be co-located again with EMNLP! (Hong Kong, Nov 2-7)

NEW! We received 44 paper submissions this year.

NEW! Best paper awards:

Philippa Shoemark, James Kirby and Sharon Goldwater
Inducing a lexicon of sociolinguistic variables from code-mixed text
Junchao Zheng, Courtney Napoles and Joel Tetreault
How do you correct run-on sentences it’s not as easy as it seems

Workshop Organizers

Wei Xu (Ohio State University)
Alan Ritter (Ohio State University)
Tim Baldwin (University of Melbourne)
Afshin Rahimi (University of Melbourne)

Invited Speakers

Leon Derczynski (IT-University of Copenhagen)
Daniel Preoţiuc-Pietro (Bloomberg)
Diyi Yang (Carnegie Mellon University)

Program

Thursday, November 1, 2018
9:00–9:05	Opening
9:05–9:50	Invited Talk: Leon Derczynski Dimensions of Variation in User-generated Text
9:50–10:35	Oral Session I
9:50–10:05	Inducing a lexicon of sociolinguistic variables from code-mixed text Philippa Shoemark, James Kirby and Sharon Goldwater
10:05–10:20	Twitter Geolocation using Knowledge-Based Methods Taro Miyazaki, Afshin Rahimi, Trevor Cohn and Timothy Baldwin
10:20–10:35	Content Extraction and Lexical Analysis from Customer-Agent Interactions Sergiu Nisioi, Anca Bucur and Liviu P. Dinu
10:35–11:00	Tea Break
11:00–12:30	Oral Session II
11:00–11:15	Assigning people to tasks identified in email: The EPA dataset for addressee tagging for detected task intent Revanth Rameshkumar, Peter Bailey, Abhishek Jha and Chris Quirk
11:15–11:30	How do you correct run-on sentences it’s not as easy as it seems Junchao Zheng, Courtney Napoles and Joel Tetreault
11:30–11:45	A POS Tagging Model Designed for Learner English Ryo Nagata, Tomoya Mizumoto, Yuta Kikuchi, Yoshifumi Kawasaki and Kotaro Funakoshi
11:45–12:00	Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance Soumil Mandal and Karthick Nanmaran
12:00–12:15	Robust Word Vectors: Context-Informed Embeddings for Noisy Texts Valentin Malykh, Taras Khakhulin and Varvara Logacheva
12:15–12:30	Paraphrase Detection on Noisy Subtitles in Six Languages Eetu Sjöblom, Mathias Creutz and Mikko Aulamo
12:30–14:00	Lunch
14:00–14:45	Invited Talk: Diyi Yang Modeling Members' Social Roles and their Conversational Acts in Online Communities
14:45–15:15	Lightning Talks
	Geocoding Without Geotags: A Text-based Approach for reddit Keith Harrigian
	Distantly Supervised Attribute Detection from Reviews Lisheng Fu and Pablo Barrio
	Using Wikipedia Edits in Low Resource Grammatical Error Correction Adriane Boyd
	Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts Kemal Kurniawan and Samuel Louvan
	Orthogonal Matching Pursuit for Text Classification Konstantinos Skianis, Nikolaos Tziortziotis and Michalis Vazirgiannis
	Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data R. Andrew Kreek and Emilia Apostolova
	Detecting Code-Switching between Turkish-English Language Pair Zeynep Yirmibeşoğlu and Gülşen Eryiğit
	Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture Soumil Mandal and Anil Kumar Singh
	Modeling Student Response Times: Towards Efficient One-on-one Tutoring Dialogues Luciana Benotti, Jayadev Bhaskaran, Sigtryggur Kjartansson and David Lang
	Preferred Answer Selection in Stack Overflow: Better Text Representations ... and Metadata, Metadata, Metadata Steven Xu, Andrew Bennett, Doris Hoogeveen, Jey Han Lau and Timothy Baldwin
	Word-like character n-gram embedding Geewook Kim, Kazuki Fukui and Hidetoshi Shimodaira
	Classification of Tweets about Reported Events using Neural Networks Kiminobu Makino, Yuka Takei, Taro Miyazaki and Jun Goto
	Learning to Define Terms in the Software Domain Vidhisha Balachandran, Dheeraj Rajagopal, Rose Catherine Kanjirathinkal and William Cohen
	FrameIt: Ontology Discovery for Noisy User-Generated Text Dan Iter, Alon Halevy and Wang-Chiew Tan
	Using Author Embeddings to Improve Tweet Stance Classification Adrian Benton and Mark Dredze
	Low-resource named entity recognition via multi-source projection: Not quite there yet? Jan Vium Enghoff, Søren Harrison and Željko Agić
	A Case Study on Learning a Unified Encoder of Relations Lisheng Fu, Bonan Min, Thien Huu Nguyen and Ralph Grishman
	Convolutions Are All You Need (For Classifying Character Sequences) Zach Wood-Doughty, Nicholas Andrews and Mark Dredze
	MTNT: A Testbed for Machine Translation of Noisy Text Paul Michel and Graham Neubig
	A Robust Adversarial Adaptation for Unsupervised Word Translation Kazuma Hashimoto, Ehsan Hosseini-Asl, Caiming Xiong and Richard Socher
	A Comparative Study of Embeddings Methods for Hate Speech Detection from Tweets Shashank Gupta and Zeerak Waseem
	Step or Not: Discriminator for The Real Instructions in User-generated Recipes Shintaro Inuzuka, Takahiko Ito and Jun Harashima
	Named Entity Recognition on Noisy Data using Images and Text Diego Esteves
	Handling Noise in Distributional Semantic Models for Large Scale Text Analytics and Media Monitoring Peter Sumbler, Nina Viereckel, Nazanin Afsarmanesh and Jussi Karlgren
	Combining Human and Machine Transcriptions on the Zooniverse Platform Daniel Hanson and Andrea Simenstad
	Predicting Good Twitter Conversations Zach Wood-Doughty, Prabhanjan Kambadur and Gideon Mann
	Automated opinion detection analysis of online conversations Yuki M Asano, Niccolo Pescetelli and Jonas Haslbeck
	Classification of Written Customer Requests: Dealing with Noisy Text and Labels Viljami Laurmaa and Mostafa Ajallooeian
15:15–16:30	Poster Session
16:30–17:15	Invited Talk: Daniel Preoţiuc-Pietro User Trait Expression and Portrayal through Social Media
17:15–17:30	Closing and Best Paper Awards

Important Dates

Submission Deadline (long & short papers): ~~Wednesday, July 25~~
Submission Deadline (1-page unarchived abstracts): ~~Wednesday, August 15~~
Acceptance Notification: ~~Friday, August 17~~
Camera-Ready: ~~Friday, August 31~~
Workshop day: Thursday, November 1

Call for Papers

We seek submissions of long and short papers on original and unpublished work (same page limits as the EMNLP main conference). 1-page abstracts on work-in-progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.

Topics of interest include but are not limited to:

NLP Preprocessing of Noisy Text

Part of speech tagging
Named entity tagging, including a wide range of categories, e.g. product names
Chunking of user-generated text
Parsing

Text Normalization and Error Correction

Normalizing noisy text for downstream tasks and for human readability
Error detection and correction

Multilingual NLP in noisy text
Sentiment analysis
Crowdsourcing of text data
User prediction, e.g. geolocation, gender, age, etc.
Stylistics, e.g. formality, politeness, etc.
Colloquial language, e.g. code-switching, idiom detection
Bilingual translation of noisy text
Paraphrase identification and semantic similarity of short text or noisy text
Information extraction from noisy text
Domain adaptation to user-generated text
Geolocation prediction
Global and regional trend detection and event extraction
Detecting rumors, contradictory information, sarcasm and humor on social media
Extracting user demographics, profiles, and major life events
Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)

All submissions should conform to EMNLP 2018 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and where the work was published in a footnote on the front page, if applicable). Please submit your papers at the SoftConf link.

Double Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors of a paper accepted for presentation must notify the workshop organizers by the camera-ready deadline as to whether the paper will be presented or withdrawn. (Exception: 1-page abstracts can be work-in-progress or work published elsewhere.)

Program Committee

Muhammad Abdul-Mageed (University of British Columbia)
Nikolaos Aletras (University of Sheffield)
Hadi Amiri (Harvard University)
Anietie Andy (University of Pennsylvania)
Eiji Aramaki (NAIST)
Isabelle Augenstein (University of Copenhagen)
Francesco Barbieri (UPF Barcelona)
Cosmin Bejan (Vanderbilt University)
Eduardo Blanco (University of North Texas)
Su Lin Blodgett (UMass Amherst)
Xilun Chen (Cornell University)
Colin Cherry (Google Translate)
Jackie Chi Kit Cheung (McGill University)
Anne Cocos (University of Pennsylvania)
Arman Cohan (AI2)
Paul Cook (University of New Brunswick)
Marina Danilevsky (IBM Research)
Leon Derczynski (IT-University of Copenhagen)
Seza Doğruöz (Tilburg University)
Xinya Du (Cornell University)
Heba Elfardy (Amazon)
Dan Garrette (Google Research)
Dan Goldwasser (Purdue University)
Masato Hagiwara (Duolingo)
Bo Han (Kaplan)
Hua He (Amazon)
Yulan He (Aston University)
Jack Hessel (Cornell University)
Jing Jiang (Singapore Management University)
Kristen Johnson (Purdue University)
David Jurgens (University of Michigan)
Nobuhiro Kaji (Yahoo! Research)
Arzoo Katiyar (Cornell University)
Emre Kiciman (Microsoft Research)
Svetlana Kiritchenko (National Research Council Canada)
Roman Klinger (University of Stuttgart)
Vivek Kulkarni (University of California Santa Barbara)
Jonathan Kummerfeld (University of Michigan)
Wuwei Lan (Ohio State University)
Piroska Lendvai (University of Göttingen)
Jing Li (Tencent AI)
Jessy Junyi Li (University of Texas Austin)
Maria Liakata (University of Warwick)
Nut Limsopatham (University of Glasgow)
Patrick Littell (National Research Council Canada)
Zhiyuan Liu (Tsinghua University)
Nikola Ljubešić (University of Zagreb)
Wei-Yun Ma (Academia Sinica)
Nitin Madnani (Educational Testing Service)
Héctor Martínez Alonso (INRIA)
Aaron Masino (The Children's Hospital of Philadelphia)
Chandler May (Johns Hopkins University)
Rada Mihalcea (University of Michigan)
Smaranda Muresan (Columbia University)
Preslav Nakov (Qatar Computing Research Institute)
Courtney Napoles (Grammarly)
Vincent Ng (University of Texas at Dallas)
Eric Nichols (Honda Research Institute)
Alice Oh (KAIST)
Naoaki Okazaki (Tohoku University)
Myle Ott (Facebook AI)
Michael Paul (University of Colorado Boulder)
Umashanthi Pavalanathan (Georgia Tech)
Ellie Pavlick (Brown University)
Barbara Plank (University of Groningen)
Daniel Preoţiuc-Pietro (Bloomberg)
Ashequl Qadir (Philips Research)
Preethi Raghavan (IBM Research)
Marek Rei (University of Cambridge)
Roi Reichart (Technion)
Alla Rozovskaya (City University of New York)
Mugizi Rwebangira (Howard University)
Keisuke Sakaguchi (Johns Hopkins University)
Maarten Sap (University of Washington)
Andrew Schwartz (Stony Brook University)
Djamé Seddah (University Paris-Sorbonne)
Satoshi Sekine (New York University)
Hiroyuki Shindo (NAIST)
Jan Šnajder (University of Zagreb)
Thamar Solorio (University of Houston)
Richard Sproat (Google Research)
Gabriel Stanovsky (AI2)
Ian Stewart (Georgia Tech)
Jeniya Tabassum (Ohio State University)
Oren Tsur (Harvard University/Northeastern University)
Rob van der Goot (University of Groningen)
Svitlana Volkova (Pacific Northwest National Laboratory)
Byron Wallace (Northeastern University)
Xiaojun Wan (Peking University)
Zhongyu Wei (Fudan University)
Diyi Yang (Carnegie Mellon University)
Yi Yang (Bloomberg)
Guido Zarrella (MITRE)
Justine Zhang (Cornell University)

Sponsored by

Anti-harassment Policy