Nov 19, 2020 -- WNUT workshop is going virtual together with EMNLP 2020
The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records and language learner essays. The workshop hashtag is #wnut.
News! We will hold our workshop completely live online (registration for EMNLP 2020 is now open) -- 4 live invited talks with Q&A, 1-min or 5-min live talks for 33 regular papers, as well as an interactive social event for two different time zones (4:00-8:00 GMT and 15:00-19:00 GMT -- click for a more detailed schedule). We accepted 33 regular workshop papers and 47 shared-task papers.
We are organizing three shared tasks:
(1) Entity and relation recognition over wet-lab protocols. Data was released on June 8, 2020. Official evaluation period: August 31 ~ September 4 (entity) and September 9 ~ September 15 (relation), 2020.
(2) Identification of informative COVID-19 English Tweets. Data was released on June 21, 2020. Official evaluation period: August 17 ~ 21, 2020.
(3) COVID-19 Event Extraction from Twitter. Data was released on June 22, 2020. Official evaluation period: September 7 ~ 11, 2020.
Congratulations to the winners of the best paper awards, which are sponsored by Twitter this year:
|May I Ask Who’s Calling? Named Entity Recognition on Call Center Transcripts for Privacy Law Compliance|
|"Did you really mean what you said?" : Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings|
Akshita Aggarwal, Anshul Wadhawan, Anshima Chaudhary and Kavita Maurya
|Noisy Text Data: Achilles’ Heel of BERT|
Ankit Kumar, Piyush Makhija and Anuj Gupta
|Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning|
Rachel Gardner, Maya Varma, Clare Zhu and Ranjay Krishna
|Combining BERT with Static Word Embeddings for Categorizing Social Media|
Israa Alghanmi, Luis Espinosa Anke and Steven Schockaert
|Enhanced Sentence Alignment Network for Efficient Short Text Matching|
Zhe Hu, Zuohui Fu, Cheng Peng and Weiwei Wang
|PHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation|
Vivek Srivastava and Mayank Singh
|Cross-lingual sentiment classification in low-resource Bengali language|
|The Non-native Speaker Aspect: Indian English in Social Media|
Rupak Sarkar, Sayantan Mahinder and Ashiqur KhudaBukhsh
|Sentence Boundary Detection on Line Breaks in Japanese|
Yuta Hayashibe and Kensuke Mitsuzawa
|Non-ingredient Detection in User-generated Recipes using the Sequence Tagging Approach|
Yasuhiro Yamaguchi, Shintaro Inuzuka, Makoto Hiramatsu and Jun Harashima
|Generating Fact Checking Summaries for Web Claims|
Rahul Mishra, Dhruv Gupta and Markus Leippold
|Intelligent Analyses on Storytelling for Impact Measurement|
Koen Kicken, Tessa De Maesschalck, Bart Vanrumste, Tom De Keyser and Hee Reen Shim
|An Empirical Analysis of Human-Bot Interaction on Reddit|
Ming-Cheng Ma and John P. Lalor
|Detecting Trending Terms in Cybersecurity Forum Discussions|
Jack Hughes, Seth Aycock, Andrew Caines, Paula Buttery and Alice Hutchings
|Service registration chatbot: collecting and comparing dialogues from AMT workers and service’s users|
Luca Molteni, Mittul Singh, Juho Leinonen, Katri Leino, Mikko Kurimo and Emanuele Della Valle
|Automated Assessment of Noisy Crowdsourced Free-text Answers for Hindi in Low Resource Setting|
Dolly Agarwal, Somya Gupta and Nishant Baghel
|Punctuation Restoration using Transformer Models for Resource-Rich and -Poor Languages|
Tanvirul Alam, Akib Khan and Firoj Alam
|Truecasing German user-generated conversational text|
Yulia Grishina, Thomas Gueudre and Ralf Winkler
|Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations|
Md Mahfuz Ibn Alam and Antonios Anastasopoulos
|Impact of ASR on Alzheimer’s Disease Detection: All Errors are Equal, but Deletions are More Equal than Others|
Aparna Balagopalan, Ksenia Shkaruta and Jekaterina Novikova
|Detecting Entailment in Code-Mixed Hindi-English Conversations|
Sharanya Chakravarthy, Anjana Umapathy and Alan W Black
|Detecting Objectifying Language in Online Professor Reviews|
Angie Waller and Kyle Gorman
|Annotation Efficient Language Identification from Weak Labels|
Shriphani Palakodety and Ashiqur KhudaBukhsh
|Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach|
Ben Eyre, Aparna Balagopalan and Jekaterina Novikova
|Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation|
Omid Kashefi and Rebecca Hwa
|An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data|
Lili Wang, Chongyang Gao, Jason Wei, Weicheng Ma, Ruibo Liu and Soroush Vosoughi
|Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest|
Justin Sech, Alexandra DeLucia, Anna L Buczak and Mark Dredze
|Tweeki: Linking Named Entities on Twitter to a Knowledge Graph|
Bahareh Harandizadeh and Sameer Singh
|Representation learning of writing style|
Julien Hay, Bich-Lien Doan, Fabrice Popineau and Ouassim AIT ELHARA
|"A Little Birdie Told Me ... " - Social Media Rumor Detection|
Karthik Radhakrishnan, Tushar Kanakagiri, Sharanya Chakravarthy and Vidhisha Balachandran
|Paraphrase Generation via Adversarial Penalizations|
Gerson Vizcarra and Jose Ochoa-Luna
|WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols|
Jeniya Tabassum, Wei Xu and Alan Ritter
|WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets|
Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen and Long Doan
We seek submissions of long and short papers on original and unpublished work (same page limits as the EMNLP main conference). All accepted submissions will be presented as pre-recorded talks at the workshop, following the practice of the EMNLP 2020 main conference (more details here).
Topics of interest include but are not limited to:
Lab protocols specify the steps for performing a lab procedure. They are noisy, dense, and domain-specific. Automatic or semi-automatic conversion of protocols into a machine-readable format benefits biological research. In this task, system entries are invited for event recognition and relation extraction over these lab protocols. Note that these protocols are written by researchers and lab technicians worldwide, and some of them may contain non-standard language or spelling errors. Here's a sample of the input data:
Initial data was released on June 8, 2020. Please register here to receive future data for the official evaluation (Aug 31 - Sep 4, 2020).
Details on the shared task are here. Contacts: Jeniya Tabassum, Wei Xu, Alan Ritter.
The goals of this shared task are: (1) To develop a language processing task that potentially impacts research and downstream applications, and (2) To provide the community with a new dataset for identifying informative COVID-19 English Tweets.
For this task, participants are asked to develop systems that automatically identify whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not. Such informative Tweets provide information about recovered, suspected, confirmed and death cases as well as location or travel history of the cases. The dataset and systems developed for this shared task will be beneficial for the development of COVID-19 related monitoring systems.
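To make the labeling criterion above concrete, here is a minimal sketch of a rule-based baseline in Python. It is not the organizers' method or an official baseline: the cue-word list, the "mentions a number" rule, the label strings `INFORMATIVE` / `UNINFORMATIVE`, and the example tweets are all assumptions made for illustration.

```python
import re

# Hypothetical cue words, chosen only to mirror the categories named in the
# task description (recovered, suspected, confirmed and death cases).
CASE_CUES = {
    "recovered", "suspected", "confirmed",
    "death", "deaths", "died", "case", "cases",
}


def label_tweet(text: str) -> str:
    """Toy rule: INFORMATIVE if the tweet has a case-related cue word and a digit.

    The label strings are assumed here, not taken from the task data.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    has_cue = bool(tokens & CASE_CUES)
    has_number = re.search(r"\d", text) is not None
    return "INFORMATIVE" if has_cue and has_number else "UNINFORMATIVE"


if __name__ == "__main__":
    # Invented example tweets, not drawn from the shared-task dataset.
    for tweet in [
        "3 new confirmed cases reported in the city today",
        "Stay home and stay safe everyone",
    ]:
        print(label_tweet(tweet), "|", tweet)
```

A keyword rule like this ignores location and travel-history information and would misfire on many real Tweets; it only illustrates the input/output shape of the task, which systems for this task would typically address with a trained classifier.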
Details on the shared task are here. Contacts: Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi.
People usually share a wide variety of information related to COVID-19 publicly on social media. For example, Twitter users often indicate when they might be at increased risk of COVID-19 due to a coworker or other close contact testing positive for the virus, or when they have symptoms but were denied access to testing. In this shared task, participants are invited to develop systems that automatically extract COVID-19 related events from Twitter using our newly built corpus. Here is an example of our annotated data:
Initial data was released on June 22, 2020. Please register here to receive future data for the official evaluation (Sep 7 - Sep 11, 2020).
Details on the shared task are here. Contacts: Shi Zong, Wei Xu, Alan Ritter.