Automatic universal translation has long been a science-fiction dream. A new paper from artificial intelligence researchers at Facebook’s parent company Meta claims to take a step toward that goal.
The paper shows that machine learning, the technology behind AI, can translate between 204 different languages, twice as many as any previous attempt, and at a higher quality than previously achieved.
That includes more than a hundred rarely spoken languages, such as those of the Acehnese people of Indonesia and the Chokwe people of Central and Southern Africa, which have always been hard for computers to translate because they have very little presence online.
It’s the latest development in artificial intelligence, a controversial area of science recently in the spotlight after a Google engineer was placed on leave for claiming a chatbot could express thoughts and feelings.
“The paper presents impressive work to push production-level translation quality to 200 languages,” said Professor Philipp Koehn of Johns Hopkins University, one of 38 academics and Meta researchers who collaborated on the work.
“There also will be a lot of resources released that allow everybody to use this model and retrain it on their own, fostering research in that area.”
“So yeah, I think it is a pretty big deal.”
Yet although the paper claimed it was “laying the important groundwork towards realising a universal translation system,” computer scientists who weren’t involved in the project stressed that it was a small step on a long and winding path, with no obvious end in sight.
‘An impressive engineering feat’
The paper’s central machine learning technique, a model known by the baroque term Sparsely Gated Mixture of Experts, was not in itself new, said Dr Alexandra Birch-Mayne, Reader in Natural Language Processing at the University of Edinburgh.
Its biggest contribution, she said, was pulling together, cleaning and presenting new data on languages which did not appear widely on the internet, the main source of data for machine translation.
“It’s an impressive engineering feat. It’s not necessarily a breakthrough in terms of the fundamental science,” Dr Birch-Mayne told Sky News.
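The Sparsely Gated Mixture of Experts idea the paper builds on can be sketched in a few lines: a small "gate" network scores a set of expert networks for each token, and only the top-scoring experts actually run. The toy sizes, weights and top-2 routing below are illustrative assumptions, not the paper's configuration, which is vastly larger.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, N_EXPERTS, TOP_K = 8, 16, 4, 2  # toy dimensions, not the paper's

# Each "expert" is a small two-layer feed-forward network.
experts_w1 = rng.normal(size=(N_EXPERTS, D, H)) * 0.1
experts_w2 = rng.normal(size=(N_EXPERTS, H, D)) * 0.1
gate_w = rng.normal(size=(D, N_EXPERTS)) * 0.1  # the gating network

def moe_layer(x):
    """Route each token to its top-k experts and combine their
    outputs, weighted by renormalised gate scores."""
    logits = x @ gate_w                            # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = np.exp(logits[t, top[t]])
        scores /= scores.sum()                     # softmax over chosen experts
        for w, e in zip(scores, top[t]):
            h = np.maximum(0.0, x[t] @ experts_w1[e])  # ReLU hidden layer
            out[t] += w * (h @ experts_w2[e])
    return out

tokens = rng.normal(size=(5, D))   # a batch of 5 token vectors
y = moe_layer(tokens)              # same shape as the input
```

Because only a few experts fire per token, total model capacity can grow without every input paying the full compute cost, which is what makes the approach attractive at the scale of 200-plus languages.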
In addition to translating languages with fewer speakers, the paper also claimed to set a new bar for the quality of translation.
Data and algorithms to be made publicly available
Measuring progress in machine learning is a challenging task, but by a metric known as BLEU, which scores machine output against human reference translations, the Meta paper improved the quality of translation over the previous state-of-the-art by 44%.
“BLEU is an imperfect metric,” said Dr Diptesh Kanojia, Lecturer in Artificial Intelligence for Natural Language Processing at the University of Surrey. “However, it is a standard practice in natural language processing research to quote BLEU scores.”
“If we look at this improvement purely in statistical terms, 44% improvement is quite significant.”
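At its core, BLEU counts how many word sequences (n-grams) a machine translation shares with a human reference, then penalises translations that are too short. The sentence-level sketch below is a simplified illustration of that idea, not the exact corpus-level formula used in the paper; the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each n-gram's count by how often it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(1, sum(cand_counts.values())))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(1, len(cand)))
    return bp * geo_mean

perfect = bleu("the cat sat on the mat", "the cat sat on the mat")  # scores 1.0
partial = bleu("the cat sat on a mat", "the cat sat on the mat")    # between 0 and 1
```

On this scale, a "44% improvement" is relative: a system whose score rose from, say, 0.25 to 0.36 would show a 44% relative gain (illustrative numbers, not the paper's).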
Although the work will be used to improve Facebook’s software, the language data and the algorithms used to translate it will be made publicly available, meaning that for the first time there will be authoritative datasets on languages such as Eastern Yiddish, Northern Kurdish and Cape Verdean Creole for other researchers to use.
Crucially, the Meta researchers found native speakers to check their translations, a time-consuming task which helps safeguard both the quality of the algorithm and the underlying language data.
“What’s admirable is engaging with the community. They’re not necessarily initiating this trend, but they’re following good practice,” said Dr Birch-Mayne, while noting the limitations of the effort, which involved native speakers in Europe and the US rather than in the languages’ home countries.
Some researchers criticised the fact that the paper had been released without peer review, accusing Meta of practising “peer review by media”.
Professor Koehn defended the approach, saying it was “common practice in the field… for better or worse” and helped improve the speed of communication of research results.
Advances in machine learning
The paper is one of a number of recent advances in machine learning, which is improving at a far faster rate than researchers expected. A model released last week by Google solved a third of MIT undergraduate maths problems with 50% accuracy, a dramatic increase in performance.
But although each fresh breakthrough brings speculation about new forms of consciousness, most people in the field believe that AI systems are neither sentient nor intelligent, saying they do little more than mimic the data they’re given. A robot uprising is not on the cards.
The bigger danger of AI systems is that they give humans false confidence in abilities that remain very limited. That is a real prospect given the sensitivity of the tasks translation could be used for at Facebook, which has been criticised in the past for failing to have native-speaking moderators to spot calls for violence on its platform.
Meta chief executive Mark Zuckerberg promised that “the advances here will enable more than 25 billion translations every day across our apps,” something Facebook said could include spotting harmful content, securing elections and curbing online sexual exploitation.
Dr Birch-Mayne, who has just finished a three-year project on 17 languages in Africa and India, working alongside the BBC, cautioned against using machine translation for anything where accuracy really matters.
“You can’t rely on these systems,” she said. “It might be right, but it might not.”