Translation is a computer science problem that CS students have been trained on since the 1960s. At face value it seems like an ideal challenge that should be easy to solve: languages have structure, and we have ample amounts of sample data to work with. However, language translation has proven to be a difficult problem to crack. It is inherently resistant to the purely algorithmic approach, and all efforts now lie with probabilistic methods. Human languages are littered with idioms, exceptions and inventiveness, and they provide a unique challenge to software engineers.

Languages have idioms and subtle contextual differences, which means we still need some form of human assistance to provide reliable translation. In this article I’ll go through some of the typical approaches that technology companies are using to provide translation services in as automated a way as is currently practicable. I’ll explain how machine learning translation works and why we still need a human in the loop in order to provide this kind of service.

The Technology Stack:
Before we get into that, let’s discuss the machine learning technologies currently being used by language translation start-ups.
From a database perspective, NoSQL is usually the database architecture of choice, with MongoDB being the most popular. In terms of programming languages, unsurprisingly, Python is the most popular choice for almost all aspects of the system apart from the front end, which depends on the interface (Android, iOS, ReactJS web). For deep learning there are three main choices: Torch, Theano and Google’s TensorFlow. It’s difficult to get information on which specific deep learning tools are being used by which language companies, but it’s likely that Torch or TensorFlow is the most popular choice.
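To make the deep learning part of that stack slightly more concrete, here is a minimal sketch of what the encoder half of a neural translation model might look like in PyTorch (the modern incarnation of the Torch option above). The vocabulary size, layer dimensions and class names are illustrative assumptions, not any particular company’s configuration.

```python
# Minimal sketch of the encoder half of a neural translation model in PyTorch.
# All sizes and names here are illustrative assumptions, not a production setup.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 256, hid_dim: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_ids: torch.Tensor):
        embedded = self.embedding(src_ids)      # (batch, seq_len, emb_dim)
        outputs, hidden = self.rnn(embedded)    # hidden summarises the source sentence
        return outputs, hidden

# A decoder (not shown) would consume `hidden` and generate target-language tokens;
# the whole model would be trained on aligned sentence pairs.
encoder = Encoder(vocab_size=32000)
dummy_batch = torch.randint(0, 32000, (8, 20))  # 8 sentences of 20 token ids each
outputs, hidden = encoder(dummy_batch)
```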
In terms of platform providers, again it comes down to personal choice, as the competition is so fierce that each of the large players can provide the infrastructure needed. AWS and Google Cloud are likely to be the go-to providers. Companies will often keep their training servers in-house rather than using a cloud provider, because the cloud cost is much too high for processor- and memory-intensive deep learning that requires 12GB or more of RAM on the GPU. It’s likely this situation will change and it will eventually become economically viable to use cloud machines for this work.

What Kinds of Startups Are Doing Language Translation:
It should be noted that Amazon, Google and IBM provide their own translation APIs and are, unsurprisingly, big players in this area. It’s common for language translation startups to use these APIs when their own systems don’t have coverage for specific languages; the startups then aim to outperform these big general translation APIs in whatever domain or niche they are specializing in. But they will avoid falling back on the big APIs whenever possible, because, for legal reasons, that data cannot be used in a feedback loop to improve their own deep learning models.
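To illustrate that routing and the legal restriction, here is a hypothetical sketch. `translate_in_house`, `translate_via_provider_api`, the supported pairs and the MongoDB collection layout are all assumed placeholders rather than anyone’s real code; the only point being made is that externally translated segments get flagged so they never enter the training feedback loop.

```python
# Hypothetical sketch: fall back to a big provider's translation API only when the
# in-house models lack coverage, and flag those results so they are excluded from
# the deep learning feedback loop (the legal restriction mentioned above).
from pymongo import MongoClient

SUPPORTED_PAIRS = {("de", "en"), ("en", "de"), ("fr", "en")}   # assumed in-house coverage

db = MongoClient("mongodb://localhost:27017")["translation"]   # assumes a local MongoDB

def translate_in_house(text: str, src: str, tgt: str) -> str:
    # Stand-in for the startup's own neural model.
    return f"[in-house {src}->{tgt}] {text}"

def translate_via_provider_api(text: str, src: str, tgt: str) -> str:
    # Stand-in for a call to a third-party translation API (Amazon, Google, IBM).
    return f"[provider {src}->{tgt}] {text}"

def translate(text: str, src: str, tgt: str) -> str:
    if (src, tgt) in SUPPORTED_PAIRS:
        output, usable_for_training = translate_in_house(text, src, tgt), True
    else:
        output, usable_for_training = translate_via_provider_api(text, src, tgt), False

    db.segments.insert_one({
        "source": text,
        "target": output,
        "pair": f"{src}-{tgt}",
        # Only in-house output may later be fed back into model training.
        "usable_for_training": usable_for_training,
    })
    return output
```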

We’re Always 10 Years Away!
Computer scientists have been working intensively on language translation since the 1960s. The promise is always that we’re 5 or 10 years away from a practical, fully automated solution, but so far that promise remains unfulfilled. Machine translation was one of the original problems that created the field of Artificial Intelligence (AI). The first approach taken was similar to code breaking, but it proved ineffective. Language requires a great deal of our brain capacity and is the closest expression of our intelligence, so in hindsight it’s not surprising that the solution has remained elusive.
In the 1990s there was a breakthrough with statistical models, and later with neural models, which have led to the current hybrid machine-human method used by most of the startups at the cutting edge of this field. The quality, speed and scale required to produce a good-enough translation black box still demand significant human intervention.

Humans in the Loop:
Typically, the way this black box works is that text comes in and is machine translated; the translation is then quality estimated using a neural model that determines whether the output is good enough or needs further refinement by a human translator. If the translation needs further attention it is forwarded to a human who knows both the source and destination languages. The human translator will also ideally have an interest in the specific domain the inputted text belongs to. For example, if the document for translation is about architecture, then a translator will be chosen who has tagged themselves as being interested in architecture. So the ideal human translator is not just proficient in the source and target languages but is also a domain expert in the topic of the document being translated. In the best black-box models there are even two human translators: one checks the output of the machine translation and enhances and repairs it, and the second checks the output of the first and similarly enhances and repairs their work. Finally, the result is sent back to the customer, the overall system learns from this output, and the original machine translation process is optimised. Of course, the end goal is to reach a level of machine translation quality that allows the removal of one of the human translators, and ultimately both of them. Whether we ever reach this goal (before we reach the infamous singularity) is unknown at this point.
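Sketched in code, that workflow might look something like the following. Every function here (machine_translate, quality_estimate, pick_translator and so on) is a hypothetical placeholder standing in for the real components, and the 0.9 quality threshold is an assumed number.

```python
# Hypothetical sketch of the human-in-the-loop workflow described above.
from dataclasses import dataclass

QE_THRESHOLD = 0.9  # assumed "good enough" quality-estimation score

@dataclass
class Job:
    source_text: str
    src_lang: str
    tgt_lang: str
    domain: str  # e.g. "architecture", used to match domain-expert translators

def machine_translate(job: Job) -> str:
    # Placeholder for the neural translation model.
    return f"<mt {job.src_lang}->{job.tgt_lang}> {job.source_text}"

def quality_estimate(job: Job, draft: str) -> float:
    # Placeholder for the neural quality-estimation model.
    return 0.5

def pick_translator(job: Job, exclude=()):
    # Placeholder: choose a human who knows both languages and has tagged job.domain.
    return f"translator-for-{job.src_lang}-{job.tgt_lang}-{job.domain}"

def human_review(translator: str, job: Job, draft: str) -> str:
    # Placeholder: the translator repairs and enhances the draft here.
    return draft

def record_for_training(job: Job, final: str) -> None:
    # Placeholder: feed the corrected pair back so the machine model improves.
    pass

def process(job: Job) -> str:
    draft = machine_translate(job)
    if quality_estimate(job, draft) >= QE_THRESHOLD:
        final = draft                                  # machine output is good enough on its own
    else:
        first = pick_translator(job)
        draft = human_review(first, job, draft)        # first human pass
        second = pick_translator(job, exclude=(first,))
        draft = human_review(second, job, draft)       # second human checks the first
        final = draft
    record_for_training(job, final)
    return final

print(process(Job("Ein Beispieltext über Architektur.", "de", "en", "architecture")))
```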

Underestimating Language:
Considering the volume of translation training data that exists and that is generated every day, you would assume we would very quickly reach a point where we could rely solely on the machine aspect of language translation. However, it’s easy to underestimate the complexity of our languages and how frequently they change and evolve. If you took a group of 100 people and isolated them in a separate community for 50 years, they would end up speaking a language noticeably different from the one they started with. Teenagers are also a good example of language evolution: language is the primary way that teenagers differentiate themselves from their parents. Each generation produces its own set of phrases and idioms, which machine learning cannot easily adapt to without the quality checks provided by one or more humans injected into the translation feedback process. This is why translation presents a unique challenge to machine learning.

Pairing Models:
Another surprising aspect of computer translation is which models have proven to be the most effective. One would assume that a base substrate language could be chosen as an intermediate into and out of which all other languages are translated, providing an elegant structure around which translation can occur. For example, if English were chosen as the base substrate language, then a translation from Chinese to Spanish would go as follows: Chinese to English, then English to Spanish. Using this model would mean we could have a relatively small set of translation pairs between languages. In practice, however, this has proven ineffective, and instead a pairing model is used between every language and every other language. There are two reasons for this. Firstly, the cost of using an intermediate universal language would be too high, because the biggest cost by far right now is the human aspect of the system: to go from Chinese to Spanish, instead of one translator who knows both languages, you would need two translators, one for Chinese to English and another for English to Spanish. The second reason is the loss of fidelity and context that would result; idioms and nuances are more easily lost when an intermediate universal language is used. For these reasons, a universal language or substrate has not been utilised successfully in machine-based language translation.
There’s an exception to this approach. Sometimes there won’t be a human translator available who knows the specific language pair required. In this case, the translation may have to undergo a double translation via an intermediate language such as English, but this is a compromise due to a lack of machine knowledge or human resources, not an architectural choice.
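A minimal sketch of that routing decision might look like the following, where PAIR_MODELS and translate_direct are purely hypothetical stand-ins for the real pair models and their matched human translators.

```python
# Hypothetical sketch of the pairing model: prefer a direct language-pair model,
# and only pivot through English when no direct pair (or translator) is available.
PAIR_MODELS = {("zh", "es"), ("zh", "en"), ("en", "es"),
               ("en", "de"), ("fr", "en"), ("en", "fr")}   # assumed coverage

def translate_direct(text: str, src: str, tgt: str) -> str:
    # Stand-in for a direct pair model plus its matched human translators.
    return f"[{src}->{tgt}] {text}"

def translate(text: str, src: str, tgt: str) -> str:
    if (src, tgt) in PAIR_MODELS:
        return translate_direct(text, src, tgt)          # normal case: direct pairing
    # Fallback compromise: double translation through an intermediate language.
    # Fidelity suffers, and two human translators are needed instead of one.
    intermediate = translate_direct(text, src, "en")
    return translate_direct(intermediate, "en", tgt)

print(translate("你好，世界", "zh", "de"))   # no zh-de pair above, so this pivots via English
```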

The language translation space is an exciting one and, like many machine learning domains, it’s evolving at a frantic pace. It will be interesting to see whether a purely machine-based approach comes to fruition before the singularity solves all our problems!