2017 The 3rd Workshop on Noisy User-generated Text (W-NUT)
Co-located with EMNLP 2017 (last year's edition was held at COLING 2016)
The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, web forums, online reviews, clinical records and language learner essays. This year there will be two shared tasks; details are to be announced.
Call for Papers
We seek submissions of:
- regular papers on original and unpublished work (same page limits as the EMNLP main conference);
- 1-page abstracts on work in progress or work published elsewhere. These are also welcome but will *not* be included in the conference proceedings.
All accepted submissions will be presented as posters; selected submissions will additionally be presented orally. Shared-task participants are also encouraged (but not required) to submit system description papers and present posters; the top systems will be invited (but not required) to present orally.
Topics of interest include, but are not limited to:
- NLP Preprocessing of Noisy Text
  - Part-of-speech tagging
  - Named entity tagging, including a wide range of categories, e.g. product names
  - Chunking of user-generated text
- Text Normalization and Error Correction
  - Normalizing noisy text for downstream tasks and for human readability
  - Error detection and correction
- Paraphrase identification and semantic similarity of short text or noisy text
- User prediction, e.g. geolocation, gender, age, etc.
- Bilingual translation of noisy text
- Information extraction from noisy text
- Multilingual NLP in noisy text
- Colloquial language, e.g. idiom detection
- Domain adaptation to user-generated text
- Geolocation prediction
- Global and regional trend detection and event extraction
- Extracting user demographics, profiles and major life events
- Detecting rumors, contradictory information, sarcasm and humor on social media
- Sentiment analysis
- Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)
All submissions should conform to the EMNLP 2017 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and, if applicable, a footnote on the front page stating where the work was published). Please submit your papers via the Softconf submission link.
Shared task #1: Paraphrases and Semantic Similarity in Twitter
In this shared task, we will provide a common ground for the development and comparison of Paraphrase Identification and Semantic Similarity systems on Twitter data. These two tasks are critical to many NLP applications, such as summarization, sentiment analysis, textual entailment and information extraction.
Shared task #2: Novel and Emerging Entity Recognition
This shared task focuses on identifying unusual, previously unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarisation), but recall on them is a real problem in noisy text, even among annotators. This drop in recall tends to be caused by novel entities and novel surface forms. Take, for example, the tweet “so.. kktny in 30 mins?”: even human experts find the entity kktny hard to detect and resolve. This task will evaluate the ability to detect and classify novel, emerging, singleton named entities in noisy text.
Organisers: Leon Derczynski (University of Sheffield), Marieke van Erp (VU University Amsterdam), Nut Limsopatham (University of Cambridge), Eric Nichols (Honda Research Institute, Japan)
Program Committee (draft)
Workshop and prize sponsors to be announced