2017 The 3rd Workshop on Noisy User-generated Text (W-NUT)

Hosted by EMNLP 2017 (last year at COLING)

The W-NUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, web forums, online reviews, clinical records and language learner essays. This year, there will be two shared tasks; details are to be announced.

Workshop Organizers


Invited Speakers


Important Dates


Call for Papers

We seek submissions of regular papers on original and unpublished work (same page limit as the EMNLP main conference). 1-page abstracts on work in progress or work published elsewhere are also welcome, but will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally. Shared-task participants are also encouraged (but not required) to submit system description papers and present posters; the top systems will be invited (but not required) to present orally.

Topics of interest include but are not limited to:

All submissions should conform to the EMNLP 2017 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information and, if the work was published elsewhere, a footnote on the front page stating where. Please submit your papers at the softconf link (TBA).

Shared task #1: Paraphrases and Semantic Similarity in Twitter

In this shared task, we will provide a common ground for the development and comparison of Paraphrase Identification and Semantic Similarity systems for Twitter data. These two tasks are critical to many NLP applications, such as summarization, sentiment analysis, textual entailment and information extraction.

Shared task #2: Novel and Emerging Entity Recognition

This shared task focuses on identifying unusual, previously unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarisation), but recall on them is a real problem in noisy text - even among annotators. This drop in recall tends to be due to novel entities and surface forms. Take for example the tweet “so.. kktny in 30 mins?” - even human experts find the entity kktny hard to detect and resolve. This task will evaluate the ability to detect and classify novel, emerging, singleton named entities in noisy text.

Organisers: Leon Derczynski (University of Sheffield), Marieke van Erp (VU University Amsterdam), Nut Limsopatham (University of Cambridge), Eric Nichols (Honda Research Institute, Japan)


Program Committee (draft)



Workshop and prize sponsors to be announced