From Noisy Text to Real-World Applications of LLMs
The WNUT workshop is about language by people. We focus on language as it occurs in the real world, from noisy text to real-world applications of LLMs.
Shared Task
This year, we host MultiLexNorm2026: a shared task on multilingual lexical normalization with a focus on non-Indo-European languages. After the success of our first MultiLexNorm shared task, held in 2021, we have extended our benchmark to more varied languages. More information about MultiLexNorm2026.
Important Dates
| Date | Event |
|---|---|
| 25-Jul | Submission Deadline (anywhere on Earth; dual submission allowed) |
| TBA | ARR Commitment Date |
| 10-Sep | Acceptance Notification |
| TBA | EMNLP 2026 Findings Deadline |
| TBA | Workshop Day |
Call for Papers
We seek submissions of long and short papers on original and unpublished work (same page limit as the EMNLP 2026 main conference). All accepted submissions will be presented as talks and/or posters at the workshop, following the EMNLP 2026 main conference.
We welcome submissions addressing (but not limited to) the following areas:
Classical NLP Tasks on Noisy Text
- NLP of noisy text, e.g., POS and NER tagging, parsing
- Text normalization and error correction
- Paraphrase identification and semantic similarity of short text or noisy text
- Extracting user demographics, profiles, and major life events
- Machine translation and multilingual NLP over noisy text
- Information extraction from noisy text and event extraction
- Colloquial language, e.g., idiom detection
- Domain adaptation to user-generated text
- Detecting rumors, contradictory information, sarcasm, and humor on social media
- Sentiment analysis
- Temporal aspects of user-generated content (resolving time expressions, concept drift, etc.)
- Representing and mining language variation in user-generated content
LLMs and Noisy Text
- Robustness of LLMs to noisy, ungrammatical, or informal input
- Training and fine-tuning of LLMs on user-generated text
- Evaluation methodologies for LLMs on noisy text
- Domain adaptation of LLMs to specific user-generated text domains
- LLM-generated noisy text: detection, characteristics, and implications
- Instruction-following in LLMs when faced with noisy or ambiguous prompts
- Retrieval-augmented generation (RAG) with noisy documents
Real-World LLM Applications
- LLM performance on text from social media, forums, and messaging platforms
- Handling code-switching, slang, and emerging language in LLMs
- LLMs for content moderation and safety in user-generated contexts
- Multilingual and cross-lingual LLM applications on informal text
- LLMs for assisting language learners and processing learner text
- Bias, fairness, and representation in LLMs trained on user-generated text
We particularly encourage submissions that address multilingual challenges, low-resource languages, cross-platform variations, and the unique characteristics of user-generated text across different communities.
Submissions should conform to the ACL style guidelines. Long and short paper submissions must be anonymized. Please submit your papers via:
- Submit via OpenReview
- Commit via ARR
Double Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors of a paper accepted for presentation must notify the workshop organizers by the camera-ready deadline whether the paper will be presented or withdrawn.
Workshop Programme
The workshop programme will be announced after acceptance notifications.