2017 The 3rd Workshop on Noisy User-generated Text (W-NUT)

September 7th, Copenhagen (at EMNLP 2017)

NEW! WNUT 2018 will be co-located again with EMNLP! (Brussels, Belgium on Oct 31 or Nov 1, 2018)

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. This year, there will be one shared task on Entity Recognition - details below.

The workshop hashtag is #wnut.

We're excited about our two joint best paper winners! Thank you to Snap Inc. for the prize donation. In alphabetical order:

Workshop Organizers

Invited Speakers


9:05–9:50 Invited Talk: Common Sense Knowledge as an Emergent Property of Neural Conversational Models (Bill Dolan)
9:50–10:35 Oral Session I
9:50–10:05 Boundary-based MWE segmentation with text partitioning
Jake Williams
10:05–10:20 Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes
Francesco Barbieri, Luis Espinosa Anke, Miguel Ballesteros, Juan Soler and Horacio Saggion
10:20–10:35 Churn Identification in Microblogs using Convolutional Neural Networks with Structured Logical Knowledge
Mourad Gridach, Hatem Haddad and Hala Mulki
10:35–11:00 Coffee Break
11:00–12:30 Oral Session II
11:00–11:15 To normalize, or not to normalize: The impact of normalization on Part-of-Speech tagging
Rob van der Goot, Barbara Plank and Malvina Nissim
11:15–11:30 Constructing an Alias List for Named Entities during an Event
Anietie Andy, Mark Dredze, Mugizi Rwebangira and Chris Callison-Burch
11:30–11:45 Incorporating Metadata into Content-Based User Embeddings
Linzi Xing and Michael J. Paul
11:45–12:00 Simple Queries as Distant Labels for Predicting Gender on Twitter
Chris Emmery, Grzegorz Chrupała and Walter Daelemans
12:00–12:15 A Dataset and Classifier for Recognizing Social Media English
Su Lin Blodgett, Johnny Wei and Brendan O’Connor
12:15–12:30 Evaluating hypotheses in geolocation on a very large sample of Twitter
Bahar Salehi and Anders Søgaard
14:00–14:45 Invited Talk: Tweets in Finance (Miles Osborne)
14:45–14:55 Lightning Talks
 The Effect of Error Rate in Artificially Generated Data for Automatic Preposition and Determiner Correction
Fraser Bowen, Jon Dehdari and Josef Van Genabith
 An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
Chirag Nagpal, Kyle Miller, Benedikt Boecking and Artur Dubrawski
 Noisy Uyghur Text Normalization
Osman Tursun and Ruket Cakici
 Crowdsourcing Multiple Choice Science Questions
Johannes Welbl, Nelson F. Liu and Matt Gardner
 A Text Normalisation System for Non-Standard English Words
Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines and Paula Buttery
 Huntsville, hospitals, and hockey teams: Names can reveal your location
Bahar Salehi, Dirk Hovy, Eduard Hovy and Anders Søgaard
 Improving Document Clustering by Removing Unnatural Language
Myungha Jang, Jinho D. Choi and James Allan
 Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media
Preeti Bhargava, Nemanja Spasojevic and Guoning Hu
14:55–15:30 Shared Task Session
14:55–15:10 Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
Leon Derczynski, Eric Nichols, Marieke van Erp and Nut Limsopatham
15:10–15:20 A Multi-task Approach for Named Entity Recognition in Social Media Data
Gustavo Aguilar, Suraj Maharjan, Adrian Pastor López Monroy and Thamar Solorio
15:20–15:30 Distributed Representation, LDA Topic Modelling and Deep Learning for Emerging Named Entity Recognition from Social Media
Patrick Jansson and Shuhua Liu
15:30–15:35 Shared Task Lightning Talks
 Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media
Bill Y. Lin, Frank Xu, Zhiyi Luo and Kenny Zhu
 Transfer Learning and Sentence Level Features for Named Entity Recognition on Tweets
Pius von Däniken and Mark Cieliebak
 Context-Sensitive Recognition for Emerging and Rare Entities
Jake Williams and Giovanni Santia
 A Feature-based Ensemble Approach to Recognition of Emerging and Rare Named Entities
Utpal Kumar Sikdar and Björn Gambäck
15:35–16:30 Poster Session
16:30–17:15 Invited Talk: Modeling Language as a Social Construct (Dirk Hovy)
17:15–17:30 Closing and Best Paper Awards


Important Dates

Call for Papers

We seek submissions of regular papers on original and unpublished work (same page limit as the EMNLP main conference). 1-page abstracts on work in progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally. Shared task participants are also encouraged (but not required) to submit system description papers and present posters; the top systems will be invited (but not required) to present orally.

Topics of interest include but are not limited to:

All submissions should conform to EMNLP 2017 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and, if applicable, a footnote on the front page stating where the work was published). Please submit your papers at the softconf link.

Shared task: Novel and Emerging Entity Recognition

This shared task focuses on identifying unusual, previously unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarisation), but recall on them is a real problem in noisy text - even among annotators. This drop in recall tends to be due to novel entities and surface forms. Take, for example, the tweet “so.. kktny in 30 mins?” - even human experts find the entity kktny hard to detect and resolve. This task will evaluate the ability to detect and classify novel, emerging, singleton named entities in noisy text.

Organisers: Leon Derczynski (University of Sheffield), Marieke van Erp (VU University Amsterdam), Nut Limsopatham (University of Cambridge), Eric Nichols (Honda Research Institute, Japan)

Registration form here

Full details, dates, data, etc. are on the Emerging and Rare Entity Recognition task page.

Program Committee