This post is part of the Let’s Talk Terminology series.
In my last post on terminology management, I outlined the reasons why an organization should manage terminology. In this post, I’m going to talk about the available methodologies to do it.
Before we begin, I want to remind everyone that I am discussing source language terminology ONLY. Multilingual terminology management is a critical and extensive topic in the world of translation. And there are a number of really excellent tools out there to manage multilingual terminology. But that’s not what we’re going to look at here. Instead, we are going to look at how to manage the terminology that is used when we create content the first time. By this I mean single-language, source terminology, only.
Essentially there are two components in the “how” of terminology management:
- Storage
- Retrieval
Storing Terminology
There are a couple of different ways to store your terminology:
- Flat file
- Database
Flat File Storage
A flat file is a storage system that has no inherent hierarchy. It is simply a file that you can read, add to, and search within. Depending on the tool you use, it can contain cross-references either internally or externally. But, it is not a database.
Over 80% of the companies that I work with use a flat file system for managing their terminology (that is, if they manage their terminology at all). The two most common file types are MS Word and MS Excel. Terminology is almost always stored in one or more tables – I have rarely seen companies use numbered or bulleted lists, which would be too unwieldy for even the smallest term bank.
A flat file table can be very convenient. It is:
- Easy to create and maintain using standard tools.
- Inexpensive, as there is no investment in new software.
- Simple for anyone in the company to use and modify, as they see fit.
It is also, perhaps, the worst method for storing terminology. It is:
- Difficult to find terms if you don’t know exactly what you are looking for.
- Painful to manage and maintain as the number of terms increases.
- Inefficient to use.
- Potentially difficult to control.
Database Storage
The other way to store terminology is in a database: a collection of data that is organized and categorized. It is a much more powerful way to store terminology. Using a database, you can:
- Categorize terms easily and extensively.
- Search for one or more terms based on a variety of criteria.
- Act on search results in a variety of ways.
- Use different applications to work with the terms.
- Have better control over who can access the terms, what each person can do, and so on.
On the other hand, managing terminology using a database has potential disadvantages:
- The software costs money.
- A database is more complicated than a flat file.
- Someone has to administer the database.
- There is a learning curve to use the database.
If you have more than 50 terms in your terminology, using a database really is the best choice. If you have only a few terms to manage, a database is probably overkill for your needs.
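To make the database capabilities listed above a little more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table layout, the column names, and the find_terms helper are all assumptions made for illustration; they do not describe any particular terminology product.

```python
import sqlite3


def build_term_db():
    """Create an in-memory term database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per term, with a status and a category.
    conn.execute(
        """
        CREATE TABLE terms (
            term      TEXT PRIMARY KEY,
            status    TEXT NOT NULL CHECK (status IN ('preferred', 'disallowed')),
            category  TEXT,
            preferred TEXT,
            note      TEXT
        )
        """
    )
    rows = [
        ("verify", "preferred", "procedure verbs", None, "Use for confirmation steps."),
        ("check", "disallowed", "procedure verbs", "verify", "Disallowed as a verb."),
        ("ensure", "disallowed", "procedure verbs", "verify", None),
        ("make sure", "disallowed", "procedure verbs", "verify", None),
    ]
    conn.executemany("INSERT INTO terms VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_terms(conn, status=None, category=None):
    """Return term names matching the optional filters, sorted alphabetically."""
    query = "SELECT term FROM terms WHERE 1=1"
    params = []
    if status is not None:
        query += " AND status = ?"
        params.append(status)
    if category is not None:
        query += " AND category = ?"
        params.append(category)
    query += " ORDER BY term"
    return [row[0] for row in conn.execute(query, params)]


if __name__ == "__main__":
    db = build_term_db()
    print(find_terms(db, status="disallowed"))
```

Searching by more than one criterion is a single query here. In a Word or Excel table, the same question usually means filtering or scanning by hand.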
Retrieving Terminology
There are two different methods for retrieving terms from wherever you store them, whether that is a database or a flat file:
- Pull
- Push
Pulling Terms
A pull method for terminology retrieval is just how it sounds. When you need information about a term, you have to pull the information from either the database or the flat file. In its simplest form, you need to:
- Think about the term you are about to use when you write your sentence.
- Decide whether or not that term is managed.
- If you think it is managed, locate your terminology (either by opening up the flat file or launching your database interface).
- Search for the term.
- If the term exists, look up the information you need about the term – is it permitted? Is it disallowed? Are there special usage rules?
- Go back to your sentence and write it using the information you found.
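The lookup part of the steps above can be sketched in a few lines of Python. This is only an illustration: the TermEntry structure, the TERM_LIST contents, and the advise function are hypothetical names, and the dictionary stands in for whatever flat file or database you actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TermEntry:
    term: str
    status: str                      # "preferred" or "disallowed"
    preferred: Optional[str] = None  # replacement for a disallowed term


# Hypothetical term list; in practice this would be loaded from your term store.
TERM_LIST = {
    "verify": TermEntry("verify", "preferred"),
    "check": TermEntry("check", "disallowed", preferred="verify"),
    "ensure": TermEntry("ensure", "disallowed", preferred="verify"),
}


def look_up(term: str) -> Optional[TermEntry]:
    """Search for the term, ignoring case and surrounding whitespace."""
    return TERM_LIST.get(term.strip().lower())


def advise(term: str) -> str:
    """Answer the writer's questions: is it managed, and is it permitted?"""
    entry = look_up(term)
    if entry is None:
        return f"'{term}' is not a managed term."
    if entry.status == "disallowed":
        return f"'{entry.term}' is disallowed; use '{entry.preferred}'."
    return f"'{entry.term}' is permitted."


if __name__ == "__main__":
    print(advise("Check"))
```

Notice that the code only runs when the writer decides to call it. That is the defining trait of a pull method: every lookup starts with the writer stopping to ask.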
If you have more than 50 terms, this process can be inconvenient at best and a tedious waste of time at worst. In most environments today, we are under tremendous pressure to deliver our content as quickly as possible. We continue to do more with less – less time, fewer people, and so on. How often do we have the time to stop what we are doing, think about a particular word, look it up, and act on it?
The answer is rarely, if ever. While we all strive to deliver the best possible quality in our content, we seldom have the luxury of time to look up words in a terminology list. Using a database is certainly better than using a flat file. Still, as long as we rely on a pull method to retrieve the information, the lookup itself costs time that we usually do not have.
Pushing Terms
Using a push method for retrieving terms is exactly as it sounds, too. The information about the term is automatically pushed to you during the writing or editing process. You don’t have to stop what you are doing, do the lookup dance, and act on it. Instead, that information is delivered to you with the touch of a button, most conveniently from within your authoring tool.
The best push technologies deliver the information that you need right inside your authoring tool, whenever you need it. For example, if you have written a term that is disallowed, the push tool can highlight the term, tell you it is deprecated, and suggest the preferred term (or terms). You don’t have to stop what you are doing. You don’t have to find your term list. You don’t have to leave your application. You don’t have to look anything up. All of it is delivered to you, instantaneously.
More advanced push technologies are based on complicated, highly configurable natural language processors. Some of the tools understand parts of speech. Let’s look at an example.
At your company, you have decided to standardize on the term verify. You have decided to disallow all of the synonyms of the term verify, such as check, ensure, make sure, and so on.
- Verify is the preferred term
- All other synonyms are disallowed
A writer creates the sentence:
Check that the lights flash once per second.
Using push technology, the software automatically flags the term check, says it is disallowed (or deprecated) and suggests the term verify. Simple enough.
Later on in the document, a writer creates the sentence:
Pay for the software update by sending a check to the following address:
In this case, the term check is not a verb. It is a noun. Sophisticated natural language processors know this. Therefore, the term check is not flagged in this sentence and the term verify is not suggested. After all, “Pay for the software update by sending a verify to the following address:” makes absolutely no sense.
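To illustrate the idea, here is a deliberately simplified sketch in Python. It does not use a real natural language processor. Instead, it assumes a toy rule: a disallowed word that directly follows a determiner such as "a" or "the" is treated as a noun and left alone. The names DISALLOWED_VERBS, NOUN_MARKERS, and flag_terms are hypothetical, and a commercial tool would rely on proper part-of-speech analysis rather than this heuristic.

```python
import re

# Disallowed verbs mapped to the preferred term (taken from the example above).
DISALLOWED_VERBS = {"check": "verify", "ensure": "verify", "make sure": "verify"}

# Words that usually introduce a noun. A toy stand-in for part-of-speech tagging.
NOUN_MARKERS = {"a", "an", "the", "your", "my", "our", "their"}


def flag_terms(sentence):
    """Return (term, suggestion) pairs for disallowed verbs found in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    findings = []
    for term, preferred in DISALLOWED_VERBS.items():
        parts = term.split()
        size = len(parts)
        for i in range(len(words) - size + 1):
            if words[i:i + size] != parts:
                continue
            previous = words[i - 1] if i > 0 else ""
            if previous in NOUN_MARKERS:
                continue  # Probably a noun, as in "a check": do not flag it.
            findings.append((term, preferred))
    return findings


if __name__ == "__main__":
    for text in (
        "Check that the lights flash once per second.",
        "Pay for the software update by sending a check to the following address:",
    ):
        print(text, "->", flag_terms(text))
```

With this rule, the first sentence produces a suggestion to use verify and the second produces none. The heuristic will misjudge many real sentences, which is exactly why the more capable tools invest in genuine language analysis.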
Push technology is neither cheap nor simple. However, it is the gold standard when it comes to managing your source terminology. The gains in efficiency, and the time and money saved in both development and translation, add up quickly. Usually a push-based terminology management system pays for itself in about a year. After that, the savings keep accumulating, and the quality of the content keeps improving.
Summing It Up
The combination of database storage and push retrieval is, by far, the gold standard for terminology management. Together, these methods make the job of managing and using terms fast, efficient, simple, and consistent. The writer saves a ton of time. Translation is faster, cheaper and better. And the quality of the content in every language is vastly improved. Customers are happy. Management is happy. You are happy. We all win.
It is really unfortunate that over 80% of my customers continue to use an Excel spreadsheet or MS Word table to manage terms. More time is spent keeping these tables up to date than using them during the content development process. We all know this is true. The challenge that remains is convincing upper management that more sophisticated technology will save money, save time, and improve quality.