2018 The 4th Workshop on Noisy User-generated Text (W-NUT)

Nov 1, 2018, Brussels, Belgium (at EMNLP 2018)

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. The workshop hashtag is #wnut.

We again have best paper award(s) sponsored by Microsoft Research this year.

NEW! WNUT 2019 will be co-located again with EMNLP! (Hong Kong, Nov 2-7)

NEW! We received 44 paper submissions this year.

NEW! Best paper awards:

Workshop Organizers

Invited Speakers


Thursday, November 1, 2018

9:05–9:50 Invited Talk: Leon Derczynski
Dimensions of Variation in User-generated Text
9:50–10:35 Oral Session I
9:50–10:05 Inducing a lexicon of sociolinguistic variables from code-mixed text
Philippa Shoemark, James Kirby and Sharon Goldwater
10:05–10:20 Twitter Geolocation using Knowledge-Based Methods
Taro Miyazaki, Afshin Rahimi, Trevor Cohn and Timothy Baldwin
10:20–10:35 Content Extraction and Lexical Analysis from Customer-Agent Interactions
Sergiu Nisioi, Anca Bucur and Liviu P. Dinu
10:35–11:00 Tea Break
11:00–12:30 Oral Session II
11:00–11:15 Assigning people to tasks identified in email: The EPA dataset for addressee tagging for detected task intent
Revanth Rameshkumar, Peter Bailey, Abhishek Jha and Chris Quirk
11:15–11:30 How do you correct run-on sentences it’s not as easy as it seems
Junchao Zheng, Courtney Napoles and Joel Tetreault
11:30–11:45 A POS Tagging Model Designed for Learner English
Ryo Nagata, Tomoya Mizumoto, Yuta Kikuchi, Yoshifumi Kawasaki and Kotaro Funakoshi
11:45–12:00 Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance
Soumil Mandal and Karthick Nanmaran
12:00–12:15 Robust Word Vectors: Context-Informed Embeddings for Noisy Texts
Valentin Malykh, Taras Khakhulin and Varvara Logacheva
12:15–12:30 Paraphrase Detection on Noisy Subtitles in Six Languages
Eetu Sjöblom, Mathias Creutz and Mikko Aulamo
14:00–14:45 Invited Talk: Diyi Yang
Modeling Members' Social Roles and their Conversational Acts in Online Communities
14:45–15:15 Lightning Talks
 Geocoding Without Geotags: A Text-based Approach for reddit
Keith Harrigian
 Distantly Supervised Attribute Detection from Reviews
Lisheng Fu and Pablo Barrio
 Using Wikipedia Edits in Low Resource Grammatical Error Correction
Adriane Boyd
 Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts
Kemal Kurniawan and Samuel Louvan
 Orthogonal Matching Pursuit for Text Classification
Konstantinos Skianis, Nikolaos Tziortziotis and Michalis Vazirgiannis
 Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
R. Andrew Kreek and Emilia Apostolova
 Detecting Code-Switching between Turkish-English Language Pair
Zeynep Yirmibeşoğlu and Gülşen Eryiğit
 Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture
Soumil Mandal and Anil Kumar Singh
 Modeling Student Response Times: Towards Efficient One-on-one Tutoring Dialogues
Luciana Benotti, Jayadev Bhaskaran, Sigtryggur Kjartansson and David Lang
 Preferred Answer Selection in Stack Overflow: Better Text Representations ... and Metadata, Metadata, Metadata
Steven Xu, Andrew Bennett, Doris Hoogeveen, Jey Han Lau and Timothy Baldwin
 Word-like character n-gram embedding
Geewook Kim, Kazuki Fukui and Hidetoshi Shimodaira
 Classification of Tweets about Reported Events using Neural Networks
Kiminobu Makino, Yuka Takei, Taro Miyazaki and Jun Goto
 Learning to Define Terms in the Software Domain
Vidhisha Balachandran, Dheeraj Rajagopal, Rose Catherine Kanjirathinkal and William Cohen
 FrameIt: Ontology Discovery for Noisy User-Generated Text
Dan Iter, Alon Halevy and Wang-Chiew Tan
 Using Author Embeddings to Improve Tweet Stance Classification
Adrian Benton and Mark Dredze
 Low-resource named entity recognition via multi-source projection: Not quite there yet?
Jan Vium Enghoff, Søren Harrison and Željko Agić
 A Case Study on Learning a Unified Encoder of Relations
Lisheng Fu, Bonan Min, Thien Huu Nguyen and Ralph Grishman
 Convolutions Are All You Need (For Classifying Character Sequences)
Zach Wood-Doughty, Nicholas Andrews and Mark Dredze
 MTNT: A Testbed for Machine Translation of Noisy Text
Paul Michel and Graham Neubig
 A Robust Adversarial Adaptation for Unsupervised Word Translation
Kazuma Hashimoto, Ehsan Hosseini-Asl, Caiming Xiong and Richard Socher
 A Comparative Study of Embeddings Methods for Hate Speech Detection from Tweets
Shashank Gupta and Zeerak Waseem
 Step or Not: Discriminator for The Real Instructions in User-generated Recipes
Shintaro Inuzuka, Takahiko Ito and Jun Harashima
 Named Entity Recognition on Noisy Data using Images and Text
Diego Esteves
 Handling Noise in Distributional Semantic Models for Large Scale Text Analytics and Media Monitoring
Peter Sumbler, Nina Viereckel, Nazanin Afsarmanesh and Jussi Karlgren
 Combining Human and Machine Transcriptions on the Zooniverse Platform
Daniel Hanson and Andrea Simenstad
 Predicting Good Twitter Conversations
Zach Wood-Doughty, Prabhanjan Kambadur and Gideon Mann
 Automated opinion detection analysis of online conversations
Yuki M Asano, Niccolo Pescetelli and Jonas Haslbeck
 Classification of Written Customer Requests: Dealing with Noisy Text and Labels
Viljami Laurmaa and Mostafa Ajallooeian
15:15–16:30 Poster Session
16:30–17:15 Invited Talk: Daniel Preoţiuc-Pietro
User Trait Expression and Portrayal through Social Media
17:15–17:30 Closing and Best Paper Awards

Important Dates

Call for Papers

We seek submissions of long and short papers on original and unpublished work (same page limits as the EMNLP main conference). 1-page abstracts on work-in-progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.

Topics of interest include but are not limited to:

All submissions should conform to EMNLP 2018 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and where the work was published in a footnote on the front page, if applicable). Please submit your papers at the SoftConf link.

Double Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors of a paper accepted for presentation must notify the workshop organizers by the camera-ready deadline as to whether the paper will be presented or withdrawn. (Exception: 1-page abstracts can be work-in-progress or work published elsewhere.)

Program Committee