The Ultimate Glossary of Machine Translation Terminology
In recent years, machine translation has become increasingly mainstream due to advances in Artificial Intelligence (AI). Machine translation, often abbreviated as MT, refers to a set of digital tools you can use to translate text from one language to another. A well-known example is Google Translate.
In this blog post, we’ll dive into important terms used in machine translation, such as localization, internationalization, normalization dictionaries, and more.
Translation refers to converting words or text from one language to another. It can be done manually by humans or automatically by computer software; the latter is called machine translation (MT). With the explosion of information that needs to be translated for marketing, training, customer service, collaboration, and more, an increasing number of companies have started turning to MT.
A step beyond translation, localization refers to the whole process of adapting content or products to a specific niche, market, or location.
Localization involves translation, as well as adjusting other aspects of content or products, such as:
• Changing the formatting for phone numbers, addresses, dates, etc.
• Changing design, colors, and graphics so the translated text can be read and seen clearly
• Changing content to suit local sensibilities and preferences
• Converting currencies and measurements to the local units
• Covering local legal regulations and requirements as needed
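To make the formatting points above concrete, here is a minimal Python sketch of showing the same date and price in two markets. The `LOCALE_RULES` table and the function names are hypothetical, invented for illustration; a production system would normally rely on a maintained locale database rather than hand-written rules. Note that the sketch only changes how a value is displayed. It does not convert between currencies.

```python
from datetime import date

# Hypothetical per-market conventions, hand-written for illustration only.
LOCALE_RULES = {
    "en-US": {"date": "{m:02d}/{d:02d}/{y}", "decimal": ".", "thousands": ",",
              "currency": "${amount}"},
    "de-DE": {"date": "{d:02d}.{m:02d}.{y}", "decimal": ",", "thousands": ".",
              "currency": "{amount} €"},
}


def format_date(value: date, locale: str) -> str:
    """Render a date using the day/month order of the given market."""
    pattern = LOCALE_RULES[locale]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)


def format_price(amount: float, locale: str) -> str:
    """Render an amount with local separators and currency placement."""
    rules = LOCALE_RULES[locale]
    # Format with US-style separators first, then swap in the local ones.
    text = f"{amount:,.2f}"
    text = (text.replace(",", "\x00")
                .replace(".", rules["decimal"])
                .replace("\x00", rules["thousands"]))
    return rules["currency"].format(amount=text)


print(format_date(date(2021, 3, 4), "en-US"))  # 03/04/2021
print(format_date(date(2021, 3, 4), "de-DE"))  # 04.03.2021
print(format_price(1234.5, "de-DE"))           # 1.234,50 €
```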
Localizing your content and products is vital to appeal to your target audience because it provides the look and feel they expect, and limits surprise or confusion. According to a 2020 Common Sense Advisory report, 65 percent of consumers said they prefer exclusively buying products with information in their native language. Another Common Sense Advisory finding revealed that 40 percent of consumers will not buy if the information is in another language.
Localization will not only help you appeal to a broader audience, but it will also help you increase sales, build credibility, and increase loyalty among existing customers.
What Materials Should You Localize?
To attract people from different countries to your products and services, you need to localize more than just your website. You should also localize:
• TV, print, radio, social media ads, and other marketing materials
• User interfaces
• Service materials
• Product warranty manuals and materials
• Product manuals
• Quick start guides
• Online help and chat
Translation Management System
Another term you’ll regularly encounter in the realm of machine translation is “translation management system” (TMS). A translation management system, also known as translation management software, is software created to help manage and automate the translation and localization of assets.
You should use this system when you need to translate a large amount of content. The more translations you produce, the more difficult it is to manage all the languages and dialects involved in your project. A translation management system will help you manage all these efficiently, effectively, and collaboratively.
In a translation management system, much of the initial translation occurs automatically with the help of MT and looking for matches in translation memories. After that, human translators will often revise and review the translated content and fix it as needed. Beyond translating text, this system will also help you contextualize the translated content to maintain your brand identity and vision.
A TMS doesn’t only help with translating. It also focuses on the project-management aspects of the process, such as scheduling, coordinating with stakeholders, finding the post-editing human resources (freelance translators and language service providers), estimating costs and paying them based on set criteria, running quality assessments, and working with deadlines.
Sometimes, a computer-aided translation (CAT) tool may be considered a TMS, depending on how thoroughly it supports you and the individual translator, as well as the translation work itself.
Internationalization may sound similar to localization; however, the two concepts are quite distinct. While localization focuses on how end-users consume content in one language at a time, internationalization involves engineering and developing an application or product to support multiple languages, making it ready for “going international.”
Internationalization plays an essential role in successful international marketing, particularly for global enterprises. It requires developers to account for localization into each target language or region within the application’s architecture from the beginning, enabling a smoother process down the road.
Some best practices for internationalization include:
• Support for cultural, local, and regional references
• Presenting and sorting lists
• Handling location names and personal names
• Making sure text can be written in multiple ways, like left to right, right to left, and vertically
• Separating user interface (UI) elements from source code or content
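The last practice in this list is the easiest to show in code. The following is a minimal sketch, assuming a simple in-memory message catalog; the `CATALOG` dictionary, the keys, and the `t` helper are all hypothetical names chosen for illustration, not the API of any particular framework. The point is that the application code refers to message keys only, so adding a language means adding catalog entries rather than editing code.

```python
# Hypothetical message catalog: UI strings live in data, not in the code.
CATALOG = {
    "en": {"greeting": "Hello, {name}!", "cart.empty": "Your cart is empty."},
    "es": {"greeting": "¡Hola, {name}!"},  # "cart.empty" not translated yet
}
DEFAULT_LANG = "en"


def t(key: str, lang: str, **params: str) -> str:
    """Look up a UI string by key, falling back to the default language."""
    messages = CATALOG.get(lang, {})
    template = messages.get(key) or CATALOG[DEFAULT_LANG][key]
    return template.format(**params)


print(t("greeting", "es", name="Ana"))  # ¡Hola, Ana!
print(t("cart.empty", "es"))            # Your cart is empty.
```

Falling back to the default language is one possible design choice. Some teams prefer to fail loudly on a missing key so that untranslated strings are caught before release.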
There are several ambiguities and challenges that come with something seemingly simple, such as a company name that sounds a lot like an edible fruit. Personal names raise the same problem. In chat, emails, audio, and other recorded conversations of sporting events, “Mister Friendly” might be referred to by just his last name: “Friendly.” And if he’s a boxer, he might be known to not be so friendly: Friendly is not friendly. In English titles, it gets more confusing if we capitalize most of it: Friendly Was Not Friendly. When should you translate it as “Amistoso” or “Amicalmente”? When should you keep it as “Friendly”?
The trick, especially in internationalization, is to create better source content in the first place. Get rid of ambiguities, slang, and colloquialisms, or improve the text on the fly during translation with “normalizations,” so it works better for other regions and languages. Get rid of abbreviations and acronyms, provide details and important context, and call him “Mister Friendly.” Then it’s still señor Friendly who won the match.
Globalization is a broader concept than internationalization and localization, although all three terms center around the international exchanges of ideas and commerce.
Specifically, globalization refers to anything that brings cultures, people, and economies of different countries together. This can include a wide range of activities, from marketing to product design to app building.
An example of successful globalization is Netflix. This popular streaming service operates in more than 190 countries and customizes content offerings for individual markets with subtitles and programming in local languages.
Companies engage in globalization whenever they want to expand into different cultures. There are many benefits of globalizing your business, and in 2021, it’s easier than ever to go global. With social media, the internet, and powerful machine translation tools, expanding your reach globally can increase your revenue and provide more opportunities to explore product niches with less competition.
A few years before the neural MT revolution, knowledge base (KB) articles were among the early adopters of MT. If you played a global video game (with versions for Russia, China, France, or Brazil) and had questions for the support team, receiving answers in your native language would be immensely gratifying. The same goes for other products, such as user guides for multimedia software. In the end, whether the phrases were linguistically perfect mattered less than whether the article could solve the issue.
Nowadays, MT powers most online self-help resources. Even the documentation for OpenNMT, the neural MT framework, was machine-translated by OpenNMT itself.
Translation memories (TMs) refer to databases containing paired sentences that have already been translated. It’s a memory of how a sentence was translated in the past, and it serves a specific purpose when recorded in a TM.
Translation software, such as SYSTRAN, creates and uses TMs to suggest similar and identical matches as it translates new documents. By utilizing TMs, the software can skip segments of text previously translated. Using TMs will help you improve the speed, quality, consistency, and efficiency of your translations.
TMs have long been used by CAT tools too. Anything that doesn’t closely match a TM search would go through human translation. Nowadays, MT fills these gaps first, then human validation kicks in for final review and post-editing.
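As a rough illustration of how a TM lets software skip previously translated segments, here is a minimal Python sketch of an exact-match lookup. The `TranslationMemory` class and the `pretranslate` helper are hypothetical and written only for this post; they are not how SYSTRAN or any other product stores its TMs. Real TMs also keep metadata and support similar (not just identical) matches.

```python
from typing import Optional


def normalize(segment: str) -> str:
    """Collapse whitespace and case so trivial differences still match."""
    return " ".join(segment.split()).lower()


class TranslationMemory:
    """Toy TM: maps a normalized source segment to its stored translation."""

    def __init__(self) -> None:
        self._pairs: dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        self._pairs[normalize(source)] = target

    def lookup(self, source: str) -> Optional[str]:
        return self._pairs.get(normalize(source))


def pretranslate(segments: list[str], tm: TranslationMemory):
    """Split segments into those reused from the TM and those still pending."""
    reused, pending = {}, []
    for segment in segments:
        hit = tm.lookup(segment)
        if hit is None:
            pending.append(segment)   # needs MT and/or a human translator
        else:
            reused[segment] = hit     # already translated in the past
    return reused, pending


tm = TranslationMemory()
tm.add("Save your changes.", "Guarde sus cambios.")
reused, pending = pretranslate(["Save  your changes.", "Restart the app."], tm)
print(reused)   # {'Save  your changes.': 'Guarde sus cambios.'}
print(pending)  # ['Restart the app.']
```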
Increasingly, TMs also serve a new purpose — training or specializing the MT engine. Even before neural MT, the SMT (Statistical MT) approach (Moses engine) used translation “engines” that were trained from bilingual aligned content found in a TM of millions of sentences.
Post-editing refers to a human translator revising or editing text that machine translation software already translated.
You may need post-editing if you want to make your machine translations sound more human and accurate or stylistically different. Sometimes, machine translations are too rigid or technical and don’t have the right tone for the target market. In such a case, you may need to invest in a human post-editor to look through the machine translations and make edits as needed. It might also be a matter of choosing an informal or a formal mode, using different capitalization, or handling British English vs. American English and other cultural or region-specific differences.
While human translators used to carry the brunt of translation work, current developments in machine translation, such as Neural Machine Translation (NMT), have enhanced the role of human translators and allowed them to focus on elements that require a human touch.
Example: A properly specialized neural MT engine may be aware that the French expression “Mais ce n’est pas vrai!” could be translated as “But it is not true!” in a number of cases, yet has been taught to prefer “But it isn’t true!” When translating for an audience in the southern United States, however, it helps to have a human post-editor from the region who changes it to “But that ain’t right!”
Exact vs. Fuzzy Matches
Machine translators don’t always find an exact match for the text you want to translate, even with the help of TMs.
Fuzzy matches occur when TMs suggest pre-translated matches that are similar but not identical to the segment you’re translating. In other words, they are “fuzzy” because they aren’t exact matches to the word or phrase.
They are triggered by:
• A few words (around five) in a sentence that differ from exact matches
• Different word orders
• Unique or unexpected punctuation
• Changes in sentence tags, such as new fonts, italics, links, bold text, etc.
Fuzzy matches primarily come in two types:
1. High fuzzy matches: Only a small amount of the text and/or formatting differs, so the match requires minimal editing to fix.
2. Low fuzzy matches: There are several critical differences between the incoming source sentence to be translated and the sentences in the TM. Even the closest match will need changes for the final output to be considered correctly translated.
While fuzzy matches sound inconvenient, they can help your translation team estimate the amount of time they will need to translate your projects manually.
Other Uses of Fuzzy Matches
A new technology recently appeared that combines fuzzy matches found in a TM search with a neural MT engine’s ability to perform non-reminiscent, on-the-fly specialization training. This neural fuzzy adaptation (NFA) technique allows fuzzy matches to make a significant contribution in the way the NMT engine determines the best possible translation. It adds more significant candidates for terminologies and expressions. This adds context that can significantly improve translations. The combination of the imperfect fuzzy matches and the powerful NMT engine makes a perfectly working solution.
Multilingual Text Translation
Multilingual text translation involves translating a single document that contains text in multiple languages, not just one.
To translate such texts, you need software solutions that can detect the different languages within the document and translate each accurately and efficiently.
Examples of texts that require multilingual text translation include:
• An email thread or chat log with messages from people who write or speak different languages
• One document containing several languages that need to all end up in English
• A document that has a mix of foreign and some already-translated content, the latter of which should not be retranslated.
• A speech where some phrases are in Latin, Klingon, Swahili, etc. which should remain untouched.
• Audio recordings, such as depositions in court cases or criminal forensics investigations, with parties speaking different languages along with their interpreters.
Becoming more common in the globalized world of business, multilingual text translation is extremely powerful because it eliminates redundancies associated with running the same document or email thread through a translation service multiple times to address different languages therein.
A user dictionary, also known as a “glossary,” is what companies use to ensure every defined term is used and translated consistently and correctly.
Created in collaboration between the client company and a Language Service Provider (LSP), user dictionaries contain specialized terminology with approved translations in all target languages. They may also include trademark terms, acronyms, and names.
User dictionaries are especially useful for companies in niche or highly technical industries like law or medical device manufacturing. Without user dictionaries, you will find it very difficult to ensure all terms and phrases are translated and used correctly, unless the translation engine was specifically trained (specialized) on such terminology.
While user dictionaries or glossary features are fairly common among MT providers today, they are not universal, especially on free or base-level paid tools. Ask your chosen provider about the specific capabilities of their user dictionary and how it integrates with a translation memory.
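One simple way to picture what a user dictionary does is a consistency check: whenever a defined term appears in the source, its approved translation should appear in the output. The Python sketch below is a minimal illustration under that assumption. The `GLOSSARY` entries and the `missing_terms` function are hypothetical, and real MT providers typically apply the dictionary during translation rather than checking afterwards.

```python
# Hypothetical English-to-German glossary for a medical device manufacturer.
GLOSSARY = {
    "pacemaker": "Herzschrittmacher",
    "catheter": "Katheter",
}


def missing_terms(source: str, translation: str,
                  glossary: dict[str, str]) -> list[tuple[str, str]]:
    """List glossary terms found in the source whose approved translation
    does not appear in the translated text (naive substring matching)."""
    issues = []
    for term, approved in glossary.items():
        in_source = term.lower() in source.lower()
        in_target = approved.lower() in translation.lower()
        if in_source and not in_target:
            issues.append((term, approved))
    return issues


print(missing_terms("The pacemaker is implanted.",
                    "Der Schrittmacher wird implantiert.", GLOSSARY))
# [('pacemaker', 'Herzschrittmacher')]
```

Substring matching keeps the sketch short but ignores inflection and compound words, so treat it as a starting point rather than a complete terminology check.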
Unlike a job-based setup, continuous localization means your software will constantly translate new content added to your product. Continuous localization is a good choice for digital products such as apps, websites, games, and software that update often. Since they are never “finished,” you will always have new content to translate.
Adopting this form of localization will eliminate the manual and repetitive parts of the process and get the product to end-users much faster. It simplifies the project management aspects.
Additionally, continuous localization can help you create quicker releases that happen multiple times a day as it enables you to avoid interruptions.
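At the heart of continuous localization is a step that works out which strings are new or have changed since the last translation run. The following Python sketch shows one way this could be done by fingerprinting source strings. The function names and the idea of storing a hash per string key are assumptions made for illustration, not a description of any specific product.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable hash of a source string, used to detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def strings_to_translate(current: dict[str, str],
                         translated_hashes: dict[str, str]) -> dict[str, str]:
    """Return the strings that are new or whose source text has changed."""
    return {
        key: text
        for key, text in current.items()
        if translated_hashes.get(key) != fingerprint(text)
    }


# Hashes recorded the last time each string was translated.
translated = {"menu.save": fingerprint("Save"), "menu.open": fingerprint("Open")}
# Source strings in the latest build of the product.
current = {"menu.save": "Save", "menu.open": "Open file", "menu.share": "Share"}

print(strings_to_translate(current, translated))
# {'menu.open': 'Open file', 'menu.share': 'Share'}
```

Running a check like this on every update means only the changed strings are sent for translation, which is what removes the manual, repetitive hand-offs.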
Language Service Providers
Language service providers (LSPs) specialize in social coaching, language, localization, interpretation, and translation solutions. They can be agencies, individuals, or companies.
Services they offer include:
• Copy editing
• Training for CAT tools
• Translated segment evaluation
If your project involves multiple languages or specialized content such as technical documentation, patents, or software code, you have the option to hire a full-service LSP to take care of all these needs.
Before you hire an LSP, however, make sure you know whether they’re a good match for you in terms of services offered, target languages, and specialties.
With small firms, you may get more personalized services and specialties, while large LSPs are usually able to take on large volumes of translations in multiple target languages.
In our increasingly interconnected world, many software and tech companies have become integral to LSP ecosystems, especially as MT solutions, as well as CAT and AI tools, become turn-key solutions.
Normalization dictionaries (NDs) are another type of dictionary or glossary that machine translators use to ensure the consistency and quality of translations. They help handle regional differences in punctuation and spelling, and they can expand abbreviations.
They’re also instrumental for fixing common typos and misspellings or other glitches, such as OCR errors from low-quality scanned images with heavy, lossy compression and other forms of noise. They can also add value as a pre-processing step when you translate transcripts of noisy audio recordings.
There are two types of NDs: source normalization and target normalization.
Source normalization is applied to a source file before translation begins. It’s used to:
• Standardize terminology in the source text. For instance, you can define that “neighbour” should be normalized to “neighbor.”
• Expand abbreviations. This can be helpful if you’re working with chats and emails. For example, “r u” can be expanded into “are you” before translation so it’ll be processed correctly by the machine translator.
• Fix typographical errors
• Turn fully capitalized words into lowercase versions if more meaningful
• Adjust OCR text (example: Spanish: avión (airplane) may have been falsely recognized as avi6n)
• Disambiguate an acronym that has multiple meanings, especially if it turns into a different acronym in the target language. For example, UPS may stand for United Parcel Service, but if it was meant to be Uninterruptible Power Supply, translating it into German may not always produce the expected USV instead of UPS.
In essence, this is a form of pre-processing.
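As a minimal sketch of this kind of pre-processing, the Python snippet below applies a few of the source normalizations mentioned above as ordered search-and-replace rules. The `SOURCE_RULES` list and the `normalize_source` function are hypothetical; a real normalization dictionary would be far larger and would usually be maintained through the MT provider’s own tooling.

```python
import re

# Hypothetical source normalization rules, applied in order before translation.
SOURCE_RULES = [
    (r"\bneighbour", "neighbor"),  # standardize regional spelling
    (r"\br u\b", "are you"),       # expand a chat abbreviation
    (r"\bavi6n\b", "avión"),       # repair a common OCR misread
]


def normalize_source(text: str) -> str:
    """Apply each rule in turn; matching is case-insensitive, and the
    replacement is inserted exactly as written in the rule."""
    for pattern, replacement in SOURCE_RULES:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


print(normalize_source("r u my neighbour?"))   # are you my neighbor?
print(normalize_source("El avi6n sale hoy."))  # El avión sale hoy.
```

Because the replacement is inserted as written, a capitalized “Neighbour” would come out lowercase. Preserving the original casing is something a fuller implementation would need to handle.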
Target normalization adapts translation output to user needs for consistency’s sake. It’s also used to replace sequences created by the software with user-defined sequences. This is a form of post-processing, after the translation, before you get to see it.
Some examples include:
• An expression that is too long after translation and should be shortened or abbreviated to better fit into a narrow space on a form.
• If you want to mask (anonymize) specific names and replace them with “xxxxxxx.” This can be applied to several things that you might want to hide.
• Choose an alternate meaning based on target audience. For example, translate a verb like “to help” into Spanish as “ayudar” in general, but as “soportar” in translations of a tech support department’s communications.
• Turn an informal expression into a formal equivalent, like turning “you have” in German from “du hast” to “Sie haben.” Ideally, the trained NMT engine will do this by itself, but when it needs extra help, the target normalization dictionary may be able to help.
Normalization dictionaries are applied before and after translation, so you can use the coding category “sequence” without breaking up the sentence analysis. SYSTRAN does offer normalization dictionaries as a part of our solution, but many translation providers do not include this incredibly helpful pre- or post-processing tool as a part of their standard package.
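Two of these post-processing examples, masking names and switching to a formal register, can be sketched in a few lines of Python. The function names and the `TARGET_RULES` table are hypothetical, and this simple text substitution is only an approximation of what a target normalization dictionary does inside an MT product.

```python
import re


def mask_names(text: str, names: list[str], mask: str = "xxxxxxx") -> str:
    """Replace each listed name with a fixed mask (longest names first)."""
    for name in sorted(names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", mask, text)
    return text


# Hypothetical informal-to-formal replacements for German output.
TARGET_RULES = {"du hast": "Sie haben"}


def normalize_target(text: str, rules: dict[str, str]) -> str:
    """Swap informal sequences in the MT output for user-defined ones."""
    for informal, formal in rules.items():
        text = re.sub(rf"\b{re.escape(informal)}\b", formal, text,
                      flags=re.IGNORECASE)
    return text


print(mask_names("Maria Lopez called Maria.", ["Maria", "Maria Lopez"]))
# xxxxxxx called xxxxxxx.
print(normalize_target("Du hast eine neue Nachricht.", TARGET_RULES))
# Sie haben eine neue Nachricht.
```

A plain replacement like this only works when the rest of the sentence stays grammatical, which is one reason the trained engine should ideally handle formality by itself.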
BLEU Scores vs. Human Evaluation
First recommended as a cost-effective alternative to human evaluations of machine translation in a July 2002 report from IBM, BLEU stands for “Bilingual Evaluation Understudy.”
BLEU measures how closely a machine translation matches one or more human reference translations using a simple algorithm based on overlapping words and word sequences. The algorithm first compares individual segments or sentences before assigning the entire text an overall BLEU score.
The closer the machine translation is to the human translation reference, the higher the score. A scale of zero to one is typically used, with one meaning identical to the human translation and zero signaling the machine translation did not match the human translation.
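To show the idea behind the score, here is a deliberately simplified, sentence-level version in Python. It is a sketch, not the official metric: it assumes a single reference, whitespace tokenization, and word sequences of length one and two only, whereas standard BLEU uses sequences of up to four words and aggregates over a whole test set. The `simple_bleu` name is our own.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all word sequences of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: clipped n-gram precision plus a brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_sum += math.log(overlap / total)
    # Penalize candidates that are shorter than the reference.
    penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * math.exp(log_sum / max_n)


print(simple_bleu("this is great", "this is great"))   # 1.0
print(simple_bleu("that's perfect", "this is great"))  # 0.0
```

The second call scores zero because the two sentences share no words, even though a human reader might consider them equally good translations of the same source.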
Although BLEU can be useful in many situations, it can be a rather shortsighted way to evaluate machine translations because:
• BLEU can’t assess whether an error impacts the meaning of a sentence. It merely compares to a given reference and if it sees a difference, small or big, the score is affected. To us humans, however, it might make no difference if something is translated as “this is great,” or “that’s perfect,” or even “I really like it!”
• A properly translated sentence can still receive a low score.
While you shouldn’t use BLEU scores to determine whether your translations need improvement, you can utilize them for efficient benchmarking and quick evaluations. Don’t use it as an absolute measure. Instead, use it as an indication of a trend or a measure of relative improvement — for example, when working towards customizing a translation system with your glossaries and normalizations or specializing a new trained engine.
BLEU scores, like other scoring metrics for automatic quality assessments, are of value in a relative use case. In other words, BLEU can only answer questions such as, “Is the current system better than the previous one?” Only humans can analyze the details manually.
Machine translation lingo can be complex and intimidating. From localization to globalization to BLEU scores, machine translation requires dedication and patience to understand and implement. However, it’s worth it in the long run. Machine translation software will give you more time and energy to spend on tasks that require human attention by simplifying and automating repetitive tasks.
If you’re looking for powerful machine translation software to integrate into your workflow, check out SYSTRAN. With SYSTRAN, you can translate in more than 55 languages and 140 combinations using ultra-accurate industry-specific translation models. You can also customize your translation by importing translation memories, dictionaries, and more.
Start your free 14-day trial of SYSTRAN Translate PRO today. To find out more about us, contact us here.