W-NUT 2017: Workshop on Noisy User-generated Text (at EMNLP)

2017 The 3rd Workshop on Noisy User-generated Text (W-NUT)

September 7th, Copenhagen (at EMNLP 2017)

NEW! WNUT 2018 will be co-located again with EMNLP! (Brussels, Belgium on Oct 31 or Nov 1, 2018)

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. This year, there will be one shared task on Entity Recognition - details below.

The workshop hashtag is #wnut.

We're excited about our two joint best paper winners! Thank you to Snap Inc. for the prize donation. In alphabetical order:

Francesco Barbieri, Luis Espinosa Anke, Miguel Ballesteros, Juan Soler and Horacio Saggion:
Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes
Su Lin Blodgett, Johnny Wei and Brendan O’Connor:
A Dataset and Classifier for Recognizing Social Media English

Workshop Organizers

Leon Derczynski (The University of Sheffield)
Wei Xu (The Ohio State University)
Alan Ritter (The Ohio State University)
Tim Baldwin (The University of Melbourne)

Invited Speakers

Bill Dolan (Microsoft Research)
Dirk Hovy (University of Copenhagen)
Miles Osborne (Bloomberg)

Program

9:00–9:05	Opening
9:05–9:50	Invited Talk: Common Sense Knowledge as an Emergent Property of Neural Conversational Models (Bill Dolan)
9:50–10:35	Oral Session I
9:50–10:05	Boundary-based MWE segmentation with text partitioning (Jake Williams)
10:05–10:20	Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes (Francesco Barbieri, Luis Espinosa Anke, Miguel Ballesteros, Juan Soler and Horacio Saggion)
10:20–10:35	Churn Identification in Microblogs using Convolutional Neural Networks with Structured Logical Knowledge (Mourad Gridach, Hatem Haddad and Hala Mulki)
10:35–11:00	Coffee Break
11:00–12:30	Oral Session II
11:00–11:15	To normalize, or not to normalize: The impact of normalization on Part-of-Speech tagging (Rob van der Goot, Barbara Plank and Malvina Nissim)
11:15–11:30	Constructing an Alias List for Named Entities during an Event (Anietie Andy, Mark Dredze, Mugizi Rwebangira and Chris Callison-Burch)
11:30–11:45	Incorporating Metadata into Content-Based User Embeddings (Linzi Xing and Michael J. Paul)
11:45–12:00	Simple Queries as Distant Labels for Predicting Gender on Twitter (Chris Emmery, Grzegorz Chrupała and Walter Daelemans)
12:00–12:15	A Dataset and Classifier for Recognizing Social Media English (Su Lin Blodgett, Johnny Wei and Brendan O’Connor)
12:15–12:30	Evaluating hypotheses in geolocation on a very large sample of Twitter (Bahar Salehi and Anders Søgaard)
12:30–14:00	Lunch
14:00–14:45	Invited Talk: Tweets in Finance (Miles Osborne)
14:45–14:55	Lightning Talks
	The Effect of Error Rate in Artificially Generated Data for Automatic Preposition and Determiner Correction (Fraser Bowen, Jon Dehdari and Josef Van Genabith)
	An Entity Resolution Approach to Isolate Instances of Human Trafficking Online (Chirag Nagpal, Kyle Miller, Benedikt Boecking and Artur Dubrawski)
	Noisy Uyghur Text Normalization (Osman Tursun and Ruket Cakici)
	Crowdsourcing Multiple Choice Science Questions (Johannes Welbl, Nelson F. Liu and Matt Gardner)
	A Text Normalisation System for Non-Standard English Words (Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines and Paula Buttery)
	Huntsville, hospitals, and hockey teams: Names can reveal your location (Bahar Salehi, Dirk Hovy, Eduard Hovy and Anders Søgaard)
	Improving Document Clustering by Removing Unnatural Language (Myungha Jang, Jinho D. Choi and James Allan)
	Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media (Preeti Bhargava, Nemanja Spasojevic and Guoning Hu)
14:55–15:30	Shared Task Session
14:55–15:10	Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition (Leon Derczynski, Eric Nichols, Marieke van Erp and Nut Limsopatham)
15:10–15:20	A Multi-task Approach for Named Entity Recognition in Social Media Data (Gustavo Aguilar, Suraj Maharjan, Adrian Pastor López Monroy and Thamar Solorio)
15:20–15:30	Distributed Representation, LDA Topic Modelling and Deep Learning for Emerging Named Entity Recognition from Social Media (Patrick Jansson and Shuhua Liu)
15:30–15:35	Shared Task Lightning Talks
	Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media (Bill Y. Lin, Frank Xu, Zhiyi Luo and Kenny Zhu)
	Transfer Learning and Sentence Level Features for Named Entity Recognition on Tweets (Pius von Däniken and Mark Cieliebak)
	Context-Sensitive Recognition for Emerging and Rare Entities (Jake Williams and Giovanni Santia)
	A Feature-based Ensemble Approach to Recognition of Emerging and Rare Named Entities (Utpal Kumar Sikdar and Björn Gambäck)
15:35–16:30	Poster Session
16:30–17:15	Invited Talk: Modeling Language as a Social Construct (Dirk Hovy)
17:15–17:30	Closing and Best Paper Awards

Important Dates

Submission Deadline: Friday, June 9
Reviews Due: Tuesday, June 30
Notification: Friday, July 2
Camera-Ready: Friday, July 14

Call for Papers

We seek submissions of regular papers on original and unpublished work (same page limit as the EMNLP main conference). 1-page abstracts on work in progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally. Shared task participants are also encouraged (but not required) to submit system description papers and present posters; the top systems will be invited (but not required) to present orally.

Topics of interest include but are not limited to:

NLP Preprocessing of Noisy Text

Part of speech tagging
Named entity tagging, including a wide range of categories, e.g. product names
Chunking of user-generated text
Parsing

Text Normalization and Error Correction

Normalizing noisy text for downstream tasks and for human readability
Error detection and correction

Multilingual NLP in noisy text
Sentiment analysis
Crowdsourcing of text data
User prediction, e.g. geolocation, gender, age
Stylistics, e.g. formality, politeness
Colloquial language, e.g. code-switching, idiom detection
Bilingual translation of noisy text
Paraphrase identification and semantic similarity of short text or noisy text
Information extraction from noisy text
Domain adaptation to user-generated text
Geolocation prediction
Global and regional trend detection and event extraction
Detecting rumors, contradictory information, sarcasm and humor on social media
Extracting user demographics, profiles and major life events
Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)

All submissions should conform to EMNLP 2017 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and, if applicable, a footnote on the front page stating where the work was published). Please submit your papers at the softconf link.

Shared task: Novel and Emerging Entity Recognition

This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarisation), but recall on them is a real problem in noisy text, even among human annotators. This drop in recall tends to be due to novel entities and surface forms. Take for example the tweet “so.. kktny in 30 mins?”: even human experts find the entity “kktny” hard to detect and resolve. This task will evaluate the ability to detect and classify novel, emerging, singleton named entities in noisy text.

Organisers: Leon Derczynski (University of Sheffield), Marieke van Erp (VU University Amsterdam), Nut Limsopatham (University of Cambridge), Eric Nichols (Honda Research Institute, Japan)

Registration form here

Full details, dates, data, etc. are on the Emerging and Rare Entity Recognition task page.

Program Committee

Anietie Andy (Howard University/UPenn)
Su Lin Blodgett (UMass Amherst)
Colin Cherry (National Research Council Canada)
Paul Cook (University of New Brunswick)
Marina Danilevsky (IBM Research)
Seza Doğruöz (Tilburg University)
Heba Elfardy (Columbia University)
Dan Garrette (Google Research)
Weiwei Guo (LinkedIn)
Masato Hagiwara (Duolingo)
Hua He (University of Maryland)
Yulan He (Aston University)
Dirk Hovy (University of Copenhagen)
Jing Jiang (Singapore Management University)
Nobuhiro Kaji (Yahoo! Research)
Piroska Lendvai (University of Göttingen)
Wuwei Lan (Ohio State University)
Jessy Li (UPenn / UT Austin)
Sujian Li (Peking University)
Jiwei Li (Stanford University)
Chen Li (University of Texas at Dallas)
Patrick Littell (Carnegie Mellon University)
Huan Liu (Arizona State University)
Zhiyuan Liu (Tsinghua University)
Wei-Yun Ma (Academia Sinica)
Héctor Martínez Alonso (INRIA)
Chandra May (Johns Hopkins University)
Rada Mihalcea (University of Michigan)
Preslav Nakov (Qatar Computing Research Institute)
Eric Nichols (Honda Research Institute)
Brendan O'Connor (UMass Amherst)
Naoaki Okazaki (Tohoku University)
Siddharth Patwardhan (Apple)
Ellie Pavlick (University of Pennsylvania)
Bryan Perozzi (Google Research)
Barbara Plank (University of Groningen)
Daniel Preoţiuc-Pietro (University of Pennsylvania)
Preethi Raghavan (IBM Research)
Afshin Rahimi (The University of Melbourne)
Roi Reichart (Technion)
Alla Rozovskaya (City University of New York)
Mugizi Rwebangira (Howard University)
Djamé Seddah (University Paris-Sorbonne)
Hiroyuki Shindo (NAIST)
Richard Sproat (Google Research)
Veselin Stoyanov (Facebook)
Jeniya Tabassum (Ohio State University)
Marlies van der Wees (University of Amsterdam)
Svitlana Volkova (Pacific Northwest National Laboratory)
Byron Wallace (Northeastern University)
Diyi Yang (Carnegie Mellon University)
Yi Yang (Georgia Tech)
Guido Zarrella (MITRE)

Best Paper Award Sponsored by

Anti-harassment Policy