Multilingual Whispers: Generating Paraphrases with Translation

Christian Federmann¹, Oussama Elachqar², Chris Quirk²
¹Microsoft, ²Microsoft Research AI

Abstract

Naturally occurring paraphrase data, such as multiple news stories about the same event, is a useful but rare resource. This paper explores translation-based paraphrase gathering using human, automatic, or hybrid techniques, and compares the results with paraphrases written by monolingual experts and non-experts. We gather translations, paraphrases, and empirical human quality assessments of these approaches. Neural machine translation techniques, especially when pivoting through related languages, provide a relatively robust source of paraphrases with diversity comparable to expert human paraphrases. Surprisingly, human translators do not reliably outperform neural systems. The resulting data release will not only serve as a useful test set but also enable further exploration of translation and paraphrase quality assessment and of the relationships between them.