Machine Translation: the Scale of the Challenge


FIGS was a fairly common acronym in machine translation circles in the 1970s and 80s. It stands for French, Italian, German and Spanish, and it was used to designate the principal target languages that most commercial MT systems sought to cover. (At the time, English was of course the dominant source language.) Why these particular language combinations? For the simple reason that the target market of commercial MT in those years was principally made up of large corporations that did business with Europe’s major economic powers. What was it that made MT so expensive back then that only large corporations could afford it? For one thing, the hardware that systems like Systran and Logos ran on was expensive: either mainframes or minicomputers.(1) For another, the development of those rule-based MT systems was a lengthy, labour-intensive undertaking that required large teams of specially trained linguists and computer scientists. The two factors combined put machine translation well beyond the reach of individual translators and most SMEs.

How things have changed since then! No longer can MT vendors content themselves with an offering that is limited to a few major European languages, adding perhaps Japanese and Chinese to the mix. Even in Europe, all member states of the European Union have the legal right to communicate with the institutions of the Community in their own language, and expect that official documents submitted by any other member state will be translated into their language. Let us take a moment to calculate the magnitude of this European translation challenge. There are currently twenty-three official EU languages.(2) In order to satisfy the aforementioned requirement, the translation service of the European Commission would need to have access to a sufficient number of highly qualified translators for no fewer than 506 language pairs (23 × 22, counting each translation direction separately)! For one thing, for many such pairs, these qualified translators simply do not exist. (Maltese to Finnish anyone? How about Irish to Slovene?) And even where they do, the complex operations of the EU cannot afford to function at the plodding pace of even the most rapid human translators. Machine translation offers the only hope of coping with the EC’s enormous translation workload, as well as the only hope of reining in its ballooning translation costs.(3) The EU has recognized this, which is why it has been funding the ambitious Euromatrix project, the goal of which is to develop MT systems for all combinations of European languages.
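
For readers who want to check the arithmetic, here is a minimal Python sketch of the language-pair count. It assumes that each translation direction counts as a separate pair (Maltese to Finnish is distinct from Finnish to Maltese); the function name and the four-language sample list are purely illustrative.

```python
from itertools import permutations


def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    # Each language can be a source for every language except itself.
    return n_languages * (n_languages - 1)


# Illustrative sample only: four of the twenty-three official languages.
sample = ["Maltese", "Finnish", "Irish", "Slovene"]

# Enumerate every ordered (source, target) combination in the sample.
sample_pairs = list(permutations(sample, 2))

print(directed_pairs(23))   # 506, the figure quoted above
print(len(sample_pairs))    # 12 pairs for just four languages
```

The count grows roughly with the square of the number of languages: by the same formula, twenty official languages would give 380 pairs, so adding just three more languages adds 126 pairs.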

And, of course, what is true for Europe is increasingly true for the entire world. If they are not to be left utterly behind in the new century’s global economy, emerging third-world nations have an urgent need to translate vast quantities of technological, scientific and cultural material so that it can be made available to their local, often monolingual populations. And here, it is not just machine translation in general that offers the only hope, but statistical MT (SMT) in particular. This is because many third-world countries don’t have the linguistic and computational experts required to develop rule-based systems, whereas SMT allows for the automated development of systems using machine-learning methods applied to previously translated corpora.

(1) This was before the appearance of the first PC-based MT systems; and even when the first of those did appear, it was generally understood that they couldn’t offer the same level of quality as the larger mainframe-based systems.

(2) There are also currently twenty-seven member states. Several states share the same language, e.g. German is the official language of both Germany and Austria; Belgium’s national languages include French and Dutch.

(3) Already in 2006, when the EU had just 20 official languages, the annual budget of its translation and interpreting services was estimated to be over a billion euros.