110 new languages on Google Translate, including Southern Franconian

Google's automatic translation service aims to one day support roughly one in seven of the world's languages. Now 110 more languages are being added, the biggest expansion yet.

Section of a computer keyboard; instead of letters, flags of various countries can be seen on the keys

(Image: cybrain / Shutterstock.com)

This article was originally published in German and has been automatically translated.

Google Translate can now handle more languages. The service is almost doubling its coverage, from 133 to 243 languages, initially as source languages only. The data company announced this on Thursday. The range extends from languages with very many speakers, such as Cantonese or Punjabi (in the Shahmukhi script), to less widespread varieties such as Southern Low Franconian or Riograndense Hunsrik, which is spoken in southern Brazil.

"We are now using artificial intelligence to expand the range of supported languages," writes Googler Isaac Caswell on the company blog. "Thanks to our large language model PaLM 2, we're introducing 110 new languages to Google Translate, our biggest expansion yet." Just last year, Google Translate added 33 languages to its repertoire, bringing the total to 131 plus the traditional and simplified variants of written Chinese. This time, Cantonese has been added as a separate language. According to Caswell, it was particularly difficult to train "because Cantonese often overlaps with Mandarin in writing." This makes it hard to automatically identify Cantonese texts and incorporate them into the large language model (LLM).

Romanes, which has been spoken in Germany and Austria for centuries, also posed a challenge for the developers, because it is spoken in numerous dialects across Europe. The result is an LLM that produces a mixture nobody actually speaks in this form: it is based on Southern Vlax but also contains elements of the Northern and Balkan branches.

A quarter of the new languages are of African origin. Portuguese now also has a separate variant for the version spoken in Portugal, which only a minority of all Lusophones use. Several creole languages are included as well, for example from Jamaica, Mauritius, Papua New Guinea and the Seychelles. Politically controversial newcomers include Tibetan, Ossetian and the language of the Crimean Tatars. Google has published a list of the newly supported languages.

The new languages cannot yet be selected as target languages in the user interfaces: texts can be translated from them, but not yet into them. Offline translation packages for them are not yet available in the app either. It remains to be seen how Google will design the interface to accommodate so many languages. The company has set itself the goal of one day translating automatically between 1,000 languages. That would mean supporting roughly one in seven of the world's languages.

Although languages are constantly dying out, more than 7,000 still exist. However, the line between a language and a dialect is hard to draw and often (politically) contested. In Germany, Yiddish, North Frisian, Romanes, Saterland Frisian, Sorbian and South Jutish are particularly endangered.

According to Google, it has finally met the Faroese people's long-standing demand to be supported by Google Translate. In practice, however, language detection does not yet work reliably. Google Translate currently misidentifies the Faroese sentence "Mær gongst væl, takk, og tygum?" ("I'm fine, thank you, and you?") as Icelandic. The result is this Dadaist translation: "I'll please moan and let's chew?"

Strange things happen when Google Translate is given the wrong source language. Google has not yet taught its AI the courage to admit gaps, for example with a "don't understand" error message. If "Mær gongst væl, takk, og tygum?" is translated from "German" into English, the output reads: "We're counting, right, and wondering?" Conversely, when English is wrongly specified as the source language, the system hallucinates this surprising German sentence: "What will become of you, your father and your son?"

(ds)