People have lots of questions about machine translation (MT). Everything from the simple, “What is it?” to the more complicated, “Why doesn’t it work better?” At some point, just about every company that translates content thinks about adding MT to their localization toolkit. But where, exactly, should MT be used? And where, exactly, should it be avoided at all costs?
With or Without the Human?
Machine translation can be used with or without a person correcting the translations. What do I mean by this? Well, if you aren’t in the localization industry, you might assume that machine translation is exactly what it sounds like: the content goes into the engine on one side, the translations come out the other, and you use them as-is. This is the model of Google Translate or Bing Translator. You type or speak what you want translated, and the tool does its best to give you text or audio in the requested language. No humans are involved.
But that’s not how MT is used commercially. In commercial applications, people (translators) edit the content once it is translated into the target language. This process is called post-editing. In theory, it should take translators less time to post-edit content that has been translated by MT than to translate all of the content from scratch.
As MT gets better, the idea of using MT plus human post-editors is becoming more appealing. In their article “Challenges in Predicting Machine Translation Utility for Human Post Editors,” Michael Denkowski and Alon Lavie of Carnegie Mellon University state:
As MT quality continues to improve, the historically under-explored idea of using automatic translation to assist human translators becomes more attractive. Recent work has explored the possibilities of integrating MT into human translation workflows by providing automatic translation as a starting point for translators to correct, saving time compared to translating source sentences from scratch.
How Good is Machine Translation These Days?
Admittedly, MT is getting better. Google Translate is getting better. The reason for the improvements in Google Translate and Bing Translator has to do with the algorithms used by their statistical machine translation (SMT) engines. The website statmt.org defines statistical MT as:
The translation of text from one human language to another by a computer that learned how to translate from vast amounts of translated text.
The key words in this definition are “learned” and “vast amounts of translated text.” An SMT engine is trained to translate. The training process involves feeding the engine a great deal of content along with its translation. In effect, you teach it how to do its job. Then, when someone sends new content to be translated, the SMT engine uses statistical models built from that training data to choose the most probable translation for the new source content. As you can imagine, it takes a lot of content to train SMT software, but SMT engines do improve over time as they are trained on more data.
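To make “learned from vast amounts of translated text” a little more concrete, here is a deliberately tiny sketch in Python. It is not how any real engine is built: it assumes the source and target phrases have already been paired up (real SMT systems infer those alignments themselves from millions of sentence pairs), and the phrase pairs and the function names train_phrase_table and translate are hypothetical. The idea it shows is the statistical core: count how often each translation appears in the training data, turn the counts into probabilities, and pick the most probable one. A phrase the engine never saw during training gets no translation at all.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: source phrases already paired with
# their translations. Real SMT systems learn these pairings automatically.
aligned_phrases = [
    ("la casa", "the house"),
    ("la casa", "the house"),
    ("la casa", "the home"),
    ("el libro", "the book"),
]


def train_phrase_table(pairs):
    """Estimate P(target | source) by relative frequency of each pairing."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: c / total for t, c in targets.items()}
    return table


def translate(phrase, table):
    """Return the most probable learned translation, or None if never seen."""
    candidates = table.get(phrase)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


table = train_phrase_table(aligned_phrases)
print(translate("la casa", table))   # the house
print(translate("mi perro", table))  # None: never seen in training
```

The second call is the toy version of the training-data problem: if the engine has never seen a phrase (a colloquialism, say), it has nothing to draw on.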
Measuring Machine Translation Quality
MT quality is measured on two scales. The first is adequacy. As defined by Denkowski and Lavie, adequacy is: “the degree to which MT output captures the meaning of a reference translation.” In other words, how close is the translation to the meaning of the source?
The second measure is called fluency. Fluency is, “the degree to which MT output is grammatically correct in the target language.”
To be considered high quality, the translation coming out of the MT engine needs to mean the same thing as the source content, and it needs to be grammatically correct. MT is becoming more adequate and more fluent as time goes on.
Where Should I Use Machine Translation, Then?
Unfortunately, even with gains in quality, current MT engines do not produce translations that are adequate and fluent enough for most commercial applications. Let’s look at some content categories.
Mission-Critical Information
There is no company that I know of today that is willing to risk using MT alone for mission-critical information. We simply cannot trust what is produced when the translation is of extreme importance.
In these cases, some companies may opt for MT plus human post-editing. After all, at the end of the process, there is a person who must sign off on the viability of the translation. That said, most of my customers in the life sciences industry, for example, are not using MT at all and have no plans to do so in the near future.
Repetitive, Non-Critical Information
MT makes the most sense in situations where the content is repetitive, non-critical, and rather “dry.” By this I mean content that is factual and straightforward, not content that is emotional or nuanced in any way. For example, installation instructions for a line of bookcases would be good candidates for MT plus human post-editing. Other examples are recipes and similar instructions.
Specifications are another type of content that is suitable for MT plus human post-editing. Again, the information is factual, and as long as the specifications are not for a mission-critical or life-saving device, using MT plus humans could be a good option.
Emotional and Nuanced Content
Content that seeks to create an emotional response is not a good candidate for MT. Emotions do not always translate. Capturing and translating the meaning of emotional content takes a well-trained translator who is intimately familiar with the cultures of both the source language and the target reader.
Nuanced or colloquial content is another category that does not work well with MT. In the case of SMT, if the engine has not been trained with colloquialisms and jargon, the resulting translations will suffer.
Can I Ever Use Machine Translation without a Human?
So, what type of content is suitable for MT without any human post-editing? At this point in time, the only type of source content that should be run through MT alone is unimportant content, such as a letter from your great-aunt in Italy. Most tweets (since they are barely readable in the source language) can be translated using MT alone, as long as you don’t really care about the accuracy of the translation.
We hosted three Japanese exchange students last year. I communicated with them using the voice feature of Google Translate. We had many good laughs over the resulting translations. Good fun. Nothing important.
My Crystal Ball
My crystal ball is a bit cloudy on the topic of MT without humans in the commercial arena. I fully expect that someday we will have MT engines that are so adequate and fluent that we won’t need human post-editing for accuracy. Hey, we put people on the moon and sent vehicles to Mars. Surely we’ll solve the MT problem one day. I just can’t tell you when. Until then, you are best off saving free MT and MT-only solutions for content that isn’t mission-critical and follows the rules of grammar.