July 31 2015, CNCC 303A, Beijing, China.
This workshop focuses on core Natural Language Processing tasks applied to noisy user-generated text, such as that found in social media, web forums, online reviews and language learner essays. The workshop will host two shared tasks: 1) Named Entity Recognition in Twitter and 2) Normalization of Noisy Text.
We would like to thank the speakers, presenters and attendees for making WNUT-2015 a success (printable poster). See you in 2016.
Best Paper Awards:
Challenges of studying and processing dialects in social media
Anna Jørgensen, Dirk Hovy and Anders Søgaard
Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text
Marlies van der Wees, Arianna Bisazza and Christof Monz
Title: Text Mining of Social Media: Going beyond the Text and Only the Text [slides]
Abstract:
As a field, NLP is inevitably fixated on the analysis of textual content, but
this is all too often to the exclusion of context, whether textual or
otherwise. Text mining over social media sources provides a myriad of
opportunities to integrate text and context analysis, as I will explore in
this talk.
Title: Are Minority Dialects "Noisy Text"?: Implications of Social and Linguistic Diversity for Social Media NLP [slides]
Abstract: Social media, SMS, and other genres of online conversational text look different than newspapers or academic papers. They feature tremendous lexical diversity, alternate spellings, and grammatical constructs not seen in standard English. Is this “noisy” text? Do we see this because users are lazy? Because of different input affordances? Or different speech acts and performative contexts? I argue that a simple look at the data reveals a largely ignored source of linguistic diversity: the massive *sociodemographic* diversity of these texts’ authors, and the fact they use these forums to communicate in ways consistent with these markers of social identity.
Title: Where is language?
Abstract: In NLP, we typically want to show that system A is better than system B for some language L, but in practice, that’s never what we do. This talk questions whether the enterprise makes sense, but also proposes a more robust framework for evaluating NLP systems.
Title: Automated Grammatical Error Correction for Language Learners: Where are we, and where do we go from there?
Abstract: A fast growing area in Natural Language Processing is the use of
automated tools for identifying and correcting grammatical errors made
by language learners.
While there have been many exciting developments in GEC over the last few years,
there is still considerable room for improvement as state-of-the-art
performance in detecting and correcting several important error types
is still inadequate for many real world applications. In this talk, I
will provide an overview of the field of automated grammatical error
correction, including its history, leading methodologies and its
particular set of challenges.
Friday, July 31, 2015 | |
9:00–10:30 | Invited Talks |
9:00–9:45 | Text Mining of Social Media: Going beyond the Text and Only the Text Tim Baldwin |
9:45–10:30 | Where is Language? Anders Søgaard |
10:30–11:00 | Coffee Break |
11:00–12:30 | Long Papers and Abstracts |
11:00–11:15 | Learning finite state word representations for unsupervised Twitter adaptation of POS taggers Julie Wulff and Anders Søgaard |
11:15–11:30 | Towards POS Tagging for Arabic Tweets Fahad Albogamy and Allan Ramsay |
11:30–11:45 | Minority Language Twitter: Part-of-Speech Tagging and Analysis of Irish Tweets Teresa Lynn, Kevin Scannell and Eimear Maguire |
11:45–12:00 | Challenges of studying and processing dialects in social media Anna Jørgensen, Dirk Hovy and Anders Søgaard |
12:00–12:15 | Toward Tweets Normalization Using Maximum Entropy Mohammad Arshi Saloot, Norisma Idris, Liyana Shuib, Ram Gopal Raj and AiTi Aw |
12:15–12:30 | Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text Marlies van der Wees, Arianna Bisazza and Christof Monz |
12:30–14:00 | Poster Session and Lunch |
Learning finite state word representations for unsupervised Twitter adaptation of POS taggers Julie Wulff and Anders Søgaard | |
Towards POS Tagging for Arabic Tweets Fahad Albogamy and Allan Ramsay | |
Minority Language Twitter: Part-of-Speech Tagging and Analysis of Irish Tweets Teresa Lynn, Kevin Scannell and Eimear Maguire | |
Challenges of studying and processing dialects in social media Anna Jørgensen, Dirk Hovy and Anders Søgaard | |
Toward Tweets Normalization Using Maximum Entropy Mohammad Arshi Saloot, Norisma Idris, Liyana Shuib, Ram Gopal Raj and AiTi Aw | |
Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text Marlies van der Wees, Arianna Bisazza and Christof Monz | |
A Normalizer for UGC in Brazilian Portuguese Magali Sanches Duran, Maria das Graças Volpe Nunes and Lucas Avanço | |
USFD: Twitter NER with Drift Compensation and Linked Data Leon Derczynski, Isabelle Augenstein and Kalina Bontcheva | |
Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking Ikuya Yamada, Hideaki Takeda and Yoshiyasu Takefuji | |
Improving Twitter Named Entity Recognition using Word Representations Zhiqiang Toh, Bin Chen and Jian Su | |
NRC: Infused Phrase Vectors for Named Entity Recognition in Twitter Colin Cherry, Hongyu Guo and Chengbi Dai | |
IITP: Multiobjective Differential Evolution based Twitter Named Entity Recognition Md Shad Akhtar, Utpal Kumar Sikdar and Asif Ekbal | |
Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations Fréderic Godin, Baptist Vandersmissen, Wesley De Neve and Rik Van de Walle | |
Data Adaptation for Named Entity Recognition on Tweets with Features-Rich CRF Tian Tian, Marco Dinarelli and Isabelle Tellier | |
Hallym: Named Entity Recognition on Twitter with Word Representation Eun-Suk Yang and Yu-Seop Kim | |
IHS_RD: Lexical Normalization for English Tweets Dmitry Supranovich and Viachaslau Patsepnia | |
Bekli: A Simple Approach to Twitter Text Normalization. Russell Beckley | |
NCSU-SAS-Ning: Candidate Generation and Feature Engineering for Supervised Lexical Normalization Ning Jin | |
DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the Generalised Perceptron Joachim Wagner and Jennifer Foster | |
LYSGROUP: Adapting a Spanish microtext normalization system to English. Yerai Doval Mosquera, Jesús Vilares and Carlos Gómez-Rodríguez | |
IITP: Hybrid Approach for Text Normalization in Twitter Md Shad Akhtar, Utpal Kumar Sikdar and Asif Ekbal | |
NCSU_SAS_WOOKHEE: A Deep Contextual Long-Short Term Memory Model for Text Normalization Wookhee Min and Bradford Mott | |
NCSU_SAS_SAM: Deep Encoding and Reconstruction for Normalization of Noisy Text Samuel Leeman-Munk, James Lester and James Cox | |
USZEGED: Correction Type-sensitive Normalization of English Tweets Using Efficiently Indexed n-gram Statistics Gábor Berend and Ervin Tasnádi | |
14:00–15:30 | Shared Task Session |
14:00–14:30 | Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition Timothy Baldwin, Marie-Catherine de Marneffe, Bo Han, Young-Bum Kim, Alan Ritter and Wei Xu |
14:30–14:45 | Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking Ikuya Yamada, Hideaki Takeda and Yoshiyasu Takefuji |
14:45–15:00 | Improving Twitter Named Entity Recognition using Word Representations Zhiqiang Toh, Bin Chen and Jian Su |
15:00–15:15 | Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations Fréderic Godin, Baptist Vandersmissen, Wesley De Neve and Rik Van de Walle |
15:15–15:30 | NCSU_SAS_SAM: Deep Encoding and Reconstruction for Normalization of Noisy Text Samuel Leeman-Munk, James Lester and James Cox |
15:30–16:00 | Coffee Break |
16:00–17:30 | Invited Talks |
16:00–16:45 | Automated Grammatical Error Correction for Language Learners: Where are we, and where do we go from there? Joel Tetreault |
16:45–17:30 | Are Minority Dialects "Noisy Text"?: Implications of Social and Linguistic Diversity for Social Media NLP Brendan O’Connor |
We seek submissions of long papers on original and unpublished work (up to 8 pages of content plus 2 extra pages for references). Abstracts (2-4 pages including references) on work-in-progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, a small number of selected submissions will be presented orally.
Topics of interest include but are not limited to: