Getting Started with Terminology Management

Terminology management requires a certain initial investment of time and resources that will not be immediately recovered. But don’t let that deter you from incorporating this best practice into your localization efforts, and the sooner the better!

Terminology management makes or breaks the success of globalization and localization efforts in terms of both budgets and sales. Identifying and standardizing terminology reduces the need for expensive query management and rework on jobs. Terminology is also a crucial aspect of user experience design. Despite its strategic value, however, many are unaware that terminology is key to producing solid, well-performing products. And once they are aware, many don’t know where to start.

If you find yourself needing a little help in this area as well, then read on for some guidelines for getting started with terminology management. You’ll learn about key concepts, strategies for unpacking how the meaning of terms will be interpreted, and ways to write new terms to increase the likelihood that the audience will adopt the intended messaging.

Important Theory Behind Terminology Work

Theory is intimidating for some, but it doesn’t have to be that way. Learning a few key concepts is a great way to get started in this highly sought-after specialization. It also enables practitioners to better sell terminology strategy in terms that key decision makers will understand (usually in terms of money). At the core of terminology management is an understanding of how people conceptualize objects and ideas and how that can be leveraged to influence audiences and lead them to take action.

To illustrate, let’s turn to the semantic triangle, which explains how terminology impacts people’s mental processes. (See Figure 1.) Each point of the triangle represents an aspect of how meaning is identified in language. In the left corner, we find “objects,” which, according to the International Organization for Standardization (ISO) standard ISO 704 Terminology Work—Principles and Methods1, are “anything perceived or conceived.” In the top corner, we find “concepts,” or, as defined by ISO 704, “mental representations of objects.” In the right corner, we find “designations,” or the labels in language people use to represent the objects they perceive and abstract in their minds into concepts.

Figure 1. The Semantic Triangle

The shared designations for objects used by communities of speakers are what make human communication within societies possible. In English, for example, if we didn’t all understand that the word “triangle” refers to a three-sided, three-cornered, two-dimensional shape, we certainly wouldn’t be able to discuss abstract concepts such as the semantic triangle, much less carry out a basic conversation on shapes. That said, it’s important to remember that just because communities of speakers use common language doesn’t mean that those speakers share the exact same conceptualization of each object that’s designated.

For instance, an often-considered example within terminology management is the conceptualization of a table. The traditional definition of a table that most people think of is a flat surface upon which humans eat and work. It’s an object that stands on four legs. Yet, that definition is not accurate because sometimes tables have three legs, sometimes six, sometimes the flat surface isn’t set upon legs at all but upon a pedestal, and sometimes the table doesn’t touch the ground but is suspended from a ceiling. If asked to draw their conceptualization of a table, no two individual speakers of a language will draw the same one from the mental inventory of those they’ve encountered or imagined in their lives. This is significant because those outside the language services and localization professions assume that meaning is held within the designation, but words themselves are arbitrary and empty. Meaning truly is in the eye or brain of the beholder. As such, we need to be careful about assuming that everyone shares the same understanding of what constitutes an object, especially when the objects being referred to are abstract concepts like quality assurance in translation.

How Terminology Influences Human Perception

The ISO 704 standard is fundamental to terminology work. The standard undergoes regularly scheduled reviews, which is common practice for all ISO standards. This review process is doing exactly what standards are designed to do—spurring passionate discussion among participants from worldwide markets and a competition of ideas. These are great activities that push innovation forward. One piece of advice in ISO 704 that I’ve always particularly enjoyed is found in the discussion of objects:

“In the course of producing a terminology, philosophical discussions on whether an object actually exists in reality are unproductive and should be avoided. Attention should be focused on how one deals with objects for the purposes of communication.”2

This advice may seem funny, but I can say from experience that when getting started with terminology work it can be easy to stray into metaphysical discussions about objects that—while enjoyable—may not necessarily push the terminology work forward. One point I would make explicit from the above is that terminology work is at the forefront of the evolution of human language. That said, the politics behind language must also be acknowledged when conducting terminology work. In certain cases, you’ll want to be prescriptive about terminology use. Prescribing the words speakers should use, rather than describing the words commonly used by speakers, is a strategy used when establishing company slogans, branding, and product names. In other instances, just because a term isn’t acknowledged doesn’t mean that the associated concept doesn’t exist, though the results of that lack of acknowledgement often include inequality and discrimination.

Gender is a great example of the use of prescriptive and descriptive language in the 21st century. Legal recognition of non-binary gender identity is increasing around the world, including in Argentina in 2012, Germany, Taiwan, and U.S. states like California. ATA member Ártemis López, who has been translating and interpreting for queer, trans, and non-binary communities since 2011, addresses this topic in the article “You, Me, Hir, and Non-Binary Language,” published in Intercambios, the newsletter of ATA’s Spanish Language Division.3 López notes that the translation and interpreting industry will certainly need to learn how to respond to the natural shifts in language taking place as people become legally required to use language that promotes inclusivity and as societies become increasingly less accepting of the use of language to exclude and deny people’s existence. (Note: Language practitioners will need to keep up to date with the terminology to accurately handle assignments involving civil status documents.)

Just as women won the right to officially have their marital status disassociated from their form of address with the addition of Ms. to Mrs. and Mr. in English vocabulary, the fight for progress in the terminology surrounding identity has resulted in language that serves the function of describing rather than prescribing people’s gender identities. (See Figure 2.)

Figure 2. Gender Identities: Prescriptive versus Descriptive

How Terminology Leads to Action

Now that we’ve covered some important definitions, let’s re-examine the semantic triangle to better understand how terminology leads to action. By turning the semantic triangle on its side, we can see how the relationships among objects, concepts, and designations help achieve the desired result.

Using the illustration provided in Figure 3, we’ll work with the concept of “coffee to go,” represented by coffee beans and Styrofoam cups. Now let’s think about the idea of “coffee to go” in reverse order. In other words, not as something that someone sees and then attaches a concept and a label to, but as something that someone thinks about without having to see the related object.

Here’s the scenario: You’re out running errands after the children kept you up late. You think, “I need to get a few more things done today, but I could really use a bit of a pick-me-up. I need to keep moving and a coffee to go would sure help.”

Figure 3. Terminology to Action

When I think of coffee to go, I think about Starbucks, and I’m sure I’m not the only one. For me, the idea of “coffee to go” is mapped onto the term “Starbucks,” where Starbucks may be shifting in its lexical function. This shift is also referred to as a coinage.

“Just like new things are constantly invented, so are new words. The process of inventing a completely new word is referred to as coinage, and the same term is applied to the result of that process. Coinages often enter the language as trade names for commercial products, and over time they become general words referring to any version or variation of the original product. Examples of coinages that are used as terms: aspirin, nylon, kleenex, teflon, xerox; to google, to photoshop.”4

Although the brand name Starbucks is not a coinage yet, it’s starting to become easy to see how the company has, through brand recognition, conveniently made me and many others around the world associate the concept of “coffee to go” with the designation Starbucks. This results in many Starbucks products in the hands of customers and many millions of dollars in sales. That’s powerful.

Establishing Specifications for Terminology Workflows

To establish terminology workflows that drive results, the key is customization. Still, certain aspects of your workflow will always be the same, like the need to define the specifications for your project. (See Figure 4.) Here are some steps to help in this process.

Figure 4. Terminology Workflow

Step 1. Define the subject field. The subject field (or subject area, domain, vertical, or whatever you choose to call it) will always need to be defined. For example, the letters ICE can stand for very different things depending on the domain, from engineering (internal combustion engine) to immigration (U.S. Immigration and Customs Enforcement). Associations with a word also impact perception. Thinking of ice cubes in a refreshing soda certainly has a different effect than if “ice” makes you think of the polar ice caps melting.

Step 2. Define the languages and audience. The audience and languages also need to be defined. This may seem obvious, but if we’re preparing terminology for a project intended for an American audience, it will make a big difference if the label “American” is used to refer to someone from the Americas or to someone from the U.S.

For example, estadounidenses (Americans) might seem like a specific enough label to use in marketing terminology geared toward a Spanish-speaking audience, but we also need to be aware that both Mexico and the U.S. consist of united states. Once we consider this, we realize that the label estadounidenses is not specific enough. Likewise, choosing English as the only target language for a project for an “American” audience is inaccurate. For starters, Spanish is an important language in the U.S., as are the hundreds of other languages spoken in this country. The important takeaway is that no matter the language chosen, individual Americans are more likely to purchase products geared specifically to them (i.e., marketed using familiar language they use every day at work and at home).

Step 3. Define the purpose of the project. Different strategies will be followed depending on whether the terminology is being collected for a specific subject field (e.g., technical terminology that a community of specialists use to communicate with one another about their craft) or a specific company. If the purpose of the project is not defined, unhappy results are more likely.

Build a Representative Universe of Texts

Once specifications have been established, corpora of technical texts within a specific domain are collected. It’s important that these texts are written by subject matter experts (SMEs) who are native speakers of each of the project languages.

When identifying texts written by “native” speakers, a major challenge and source of bias is the use of stereotypical identification methods that too narrowly define who gets to be considered “native.” For instance, a person with the last name Smith may or may not speak English as a native language. They may also speak Spanish as a native language, or any other language spoken on the planet. Assuming who a native speaker is based on superficial data like people’s names does not and should not suffice for professional terminology work.

Here are some tips to achieve well-formed corpora free of bias:

Concentrate on the volume of words. To substantiate whether a term is indeed part of the special language used by a community of SMEs, a wide variety of texts is needed. Terminologists should look for an unbiased sample of texts, size being one indication of this. According to Khurshid Ahmad, professor of computer science at Trinity College Dublin, and Margaret Rogers, professor of translation and terminology studies and director of the Centre for Translation Studies at the University of Surrey, empirical studies suggest that the vocabulary used in special-language texts is much smaller than the vocabulary in general-language texts. Because the specialized vocabulary to be covered is comparatively small, starting off with a small corpus in a technical field is a good way to avoid bias.5

Make sure the texts included are not translations. Why is it important not to include translations in your corpora? According to Mona Baker, emeritus professor of translation and intercultural studies and modern languages and cultures at the University of Manchester, universal characteristics of translation include the tendency to lean toward explicit communication, simplified language, and a safe middle between covert and overt translation.6 These characteristics mean that translations do not replicate the way SMEs communicate together in a single language and are therefore not good candidates for corpora intended to produce technical terminology. (When producing client-specific terminology based upon past translations, however, a successful project is not possible unless past translations are consulted.)

Include many authors, and make sure those authors are SMEs writing for other SMEs. Why is this important? Well, if Microsoft documents are the only documents included in a corpus intended to produce IT terminology, the term extraction results will be biased toward Microsoft jargon rather than producing the shared special language used by IT professionals no matter their company affiliation. Additionally, SMEs employ simplified language in texts in which they explain their trade to lay people. When SMEs address other SMEs, they use the special language of their trade naturally, so corpora should be filled with documents produced with this audience in mind to more accurately produce the technical terminology of a trade.

Terminology Extraction: Human Validation Required

Terminology extraction is more successfully conducted when one can critically engage with the types of challenges being navigated, which machines cannot do at this point. To understand these challenges, we’ll start by looking at how extraction is carried out by a machine.

During machine extraction, the large batch of words in the corpus is analyzed for groups of words that follow the patterns of how terms are normally expressed in that language. Most terms are nouns. Those nouns are single words or compounds. So, when teaching a machine to carry out extraction for English, the machine would be taught to identify frequent occurrences of single words or compounds that follow these patterns (among others): noun + noun; noun + noun + noun; noun + of + noun, etc.
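To make the process concrete, pattern matching over part-of-speech tags can be sketched in a few lines of Python. The tiny hand-tagged sample and the two patterns covered here are illustrative assumptions; a production extractor would use a trained tagger, a larger pattern inventory, and statistical filtering over a real corpus.

```python
from collections import Counter

# A minimal sketch of pattern-based extraction, assuming the corpus has
# already been part-of-speech tagged (hand-tagged here for illustration).
tagged = [
    ("translation", "NOUN"), ("memory", "NOUN"), ("stores", "VERB"),
    ("segments", "NOUN"), ("of", "ADP"), ("text", "NOUN"), (".", "PUNCT"),
    ("a", "DET"), ("translation", "NOUN"), ("memory", "NOUN"),
    ("reduces", "VERB"), ("cost", "NOUN"), (".", "PUNCT"),
]

def extract_candidates(tokens):
    """Collect candidates matching the noun+noun and noun+of+noun patterns."""
    candidates = Counter()
    for i, (w1, t1) in enumerate(tokens[:-1]):
        w2, t2 = tokens[i + 1]
        if t1 == "NOUN" and t2 == "NOUN":                  # noun + noun
            candidates[f"{w1} {w2}"] += 1
        if t1 == "NOUN" and w2 == "of" and i + 2 < len(tokens):
            w3, t3 = tokens[i + 2]
            if t3 == "NOUN":                               # noun + of + noun
                candidates[f"{w1} of {w3}"] += 1
    return candidates
```

On this sample, the sketch surfaces “translation memory” (twice) and “segments of text” as candidates, which is exactly the kind of frequency-ranked list a human validator then reviews.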

Despite claims about the ability of artificial intelligence to replicate and even replace humans, it’s important to keep in mind that automatic extraction results are currently far from a point at which they can be used without human validation. For languages in which the patterns used to construct terms are well identified and have been taught successfully to machines, automatic extraction produces many collocations, or groupings of words that frequently appear in a corpus but are not actually terms. Figure 5 shows the results from a very small corpus of computer-assisted translation (CAT)-related documentation. The results include a number of invalid candidates, such as “in the document” and “the number of,” among additional collocations. As you can tell, the configuration that produced the extraction results has simply not yet been refined by a qualified human to produce higher quality results.

Figure 5. Unvalidated Terminology Extraction in English by Sketch Engine

Out-of-the-box term extractors are currently not widely available in all languages. Anecdotally, I’ve observed poorer results from automatic extractions for languages like Korean and Thai, if extraction is available at all. On the bright side, plenty of invigorating linguistic analysis still needs to be done and then taught to the machines to reach greater levels of automation, and that will keep specialists working far into the future.

On small projects, or when carrying out term extraction for the first time, a great way to get started is by manually eliminating any words from your primary content that you know for sure are not special terms. This will help you develop “stop lists,” which you can use to teach your term extraction tool to automatically filter these words out of your results. A stop list is a resource with great potential for robust growth over time.
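A stop-list filter can be sketched as follows. The candidate list and frequencies are invented examples echoing the collocations shown in Figure 5, and the stop list would grow over time as more non-terms are identified.

```python
# Invented frequency-ranked candidates, as an extractor might produce them.
candidates = {
    "translation memory": 14,
    "fuzzy match": 9,
    "in the document": 7,   # collocation, not a term
    "the number of": 6,     # collocation, not a term
    "termbase": 5,
}

# Function words that cannot begin or end a valid term.
stop_words = {"a", "an", "and", "in", "of", "the"}

def filter_candidates(cands, stops):
    """Drop any candidate that starts or ends with a stop word."""
    def keep(term):
        words = term.split()
        return words[0] not in stops and words[-1] not in stops
    return {term: freq for term, freq in cands.items() if keep(term)}
```

Running the filter removes “in the document” and “the number of” while keeping the genuine term candidates, shrinking the list a human validator has to review.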

Figure 6 below shows an example of a selection of text within the domain of localization and the sub-domain of CAT technology that has undergone this process. Repetitions are highlighted using various colors. Synonyms appear in orange (one pair), and potential terms that were discarded are crossed out in red.

This selection of text produces the following list of terms. All terms in the list are presented in singular form and lowercase (unless the term is a proper noun), in keeping with terminological best practices. The relationships and hierarchy among the terms are indicated with bullets.

The list above contains compound nouns, a major challenge in terminology work. According to the Simplified Technical English specifications of the AeroSpace and Defense Industries Association of Europe, compound nouns longer than three words should generally be avoided. This is because they impede understanding as readers have to pause to unpack the modifications taking place.7 In this case, it will be fairly clear to specialists that the five-word compound phrase in the text actually consists of at least two compound nouns: “translation memory” + “net rate scheme.” Then again, it could also consist of three nouns: “translation memory” + “net rate” + “scheme,” where “scheme” is modified by both the terms “translation memory” and “net rate.” This identification is not something that’s easy for a machine to get right. Someone who isn’t qualified might also misinterpret this grouping or translate it inconsistently throughout a document, which will result in costly rework in later stages of production.

Figure 6. “Net Rate Schemes” by Memsource

Determining whether terms refer to unique concepts or are synonyms for other terms is another challenge within terminology work, especially because each language tolerates linguistic variation differently (one of many unconscious language preferences associated with languages). As those of us in the industry are aware, both “fuzzy match” and “75% match” can refer to the same concept. Unqualified humans and machines can very easily misinterpret this and translate each of these terms using words from their language that refer to entirely different concepts for each term in the pair. If different translators don’t correctly identify these two terms as synonyms, inconsistencies in meaning will be introduced in the target content that will ultimately impede user understanding.
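One common safeguard against this kind of inconsistency is to map every surface term to a single concept identifier, so that synonyms like “fuzzy match” and “75% match” always resolve to the same concept before translation. A minimal sketch, with invented concept IDs:

```python
# Map surface terms to canonical concept IDs (IDs invented for illustration).
synonym_map = {
    "fuzzy match": "concept-001",
    "75% match": "concept-001",
    "exact match": "concept-002",
    "100% match": "concept-002",
}

def concept_of(term):
    """Return the concept ID for a term, or None if the term is unknown."""
    return synonym_map.get(term.lower().strip())
```

Because both synonyms resolve to one concept ID, every translator working from the termbase renders them with the same target-language term, preventing the divergence in meaning described above.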

When compiling terminology from client content, capture these important term types first to prevent downstream issues and the cost that results from unplanned rework:

  • Key technical terms and any synonyms for those terms
  • Abbreviations and acronyms
  • Neologisms, such as product names or features
  • Ambiguous words, including homonyms
  • Inconsistent terms (check for and standardize the preferred spelling variation)
  • Company names, slogans, and trademarks

While the process of finding equivalents for terminology in one language is referred to above as translation, it’s important to note that terminology work is an entirely different linguistic process. For terminology work conducted under ideal circumstances, the process of building corpora, extracting terminology, and understanding concept relations would be carried out independently in each language to prevent the structure of any one language from dominating the structure of meaning conceptualization. In the real world, tight deadlines and limited budgets mean that terminology work is often treated more as a process of translation, with English as the lingua franca source language within the technical fields.

To ensure that concepts are adopted, borrowing from languages like English when introducing new technologies in developing markets allows for linguistic voids to be filled quickly. Still, the adoption of terminology over time may not be as successful as it would be if terminology were formed according to the structures of the “target” language. Note that terminology work should also start long before localization efforts are underway.

Compiling Terminology into Databases

When starting to compile collected terminology into databases, a frequent source of rework is the incorrect fusion of terminology work with what lay people sometimes refer to as “dictionary work.” This is also known within specialized communities as lexicography. Lexicography is the practice of compiling all the known concepts included among the meanings of a word into a single entry. Terminography works in the opposite direction: it’s the practice of compiling all the words used to denote a single concept into a single entry. For example, in lexicography, the following concepts would be collected in the entry for key (n.): key for a lock, piano key, key of music, etc. In terminography, the following synonyms would be collected in the concept entry for encryption key (n.): key, cypher, code. In terminography, the concept (not word) entry collects all the different words (terms) used to name that concept.
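The two directions can be sketched as two data shapes, condensed from the “key” examples above:

```python
# Lexicography: one word, many concepts (senses) in a single entry.
lexicographic_entry = {
    "headword": "key",
    "senses": ["key for a lock", "piano key", "key of music"],
}

# Terminography: one concept, many words (terms) in a single entry.
terminographic_entry = {
    "concept": "encryption key",
    "terms": ["key", "cypher", "code"],
}
```

Notice that the word “key” appears in both shapes, but it plays opposite roles: it is the organizing unit of the lexicographic entry and merely one synonym inside the terminographic one.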

The information associated with term entries in terminological databases is organized according to a hierarchy with three information levels: entry (or concept) level, language level, and term level.

  • The entry level contains the information that applies to the entire entry regardless of language. (Remember, we’re working with a single concept in each terminology record.) The subject field is a good example of the type of data stored at the entry level. Since every term in the entry refers to the same concept or meaning, the subject field applies to every term. We store that information once at the entry level, rather than repeating it in lower levels of the hierarchy.
  • The language level contains information that only applies to a specific language. Each term entry may have records in any number of languages. The area for each language is sometimes called a “language block.” A definition in the specific language is a good thing to store at this level.
  • The term level contains the information that applies only to that specific term. A language block may contain many term blocks: at least one for the head term, plus as many as needed for any synonyms. At this level we collect information like the gender and number of the term in question for Romance languages.

Here’s an example of what this hierarchy might look like:

Termbase
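As a rough sketch, the three levels can be modeled as a nested structure. The field names and values below are invented for illustration; the point is where each data point lives in the hierarchy.

```python
# Entry level → language level → term level, as described above.
entry = {
    "id": "C0042",
    "subject_field": "localization",   # entry level: applies to every term
    "languages": {
        "en": {                        # language block
            "definition": "database of terms and associated data",
            "terms": [                 # term blocks: head term plus synonyms
                {"term": "termbase", "usage": "preferred"},
                {"term": "terminology database", "usage": "admitted"},
            ],
        },
        "es": {
            "definition": "base de datos de términos",
            "terms": [
                {"term": "base terminológica", "gender": "f", "number": "sg"},
            ],
        },
    },
}
```

The subject field is stored once at the entry level, definitions sit in each language block, and grammatical data like gender and number attach only to individual terms.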

This hierarchy warrants emphasis given the impact that the proper hierarchical setup can have on the scalability of individual terminology resources and overarching terminology campaigns. When the information in the database follows this structure, with data points captured at the appropriate level and in conformance with the ISO 30042 Termbase Exchange standard8, your termbase will more easily transfer among CAT tools. This means you’ll be able to make your resource more robust over time. Your terminology journey will likely start with a simple Excel spreadsheet. Once you’re ready to start building custom termbases within CAT tools, look for products that have incorporated the ISO 30042 standard, such as Wordbee.
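As a sketch of why the structure matters for exchange, the same hierarchy can be serialized to TBX-style XML using only Python's standard library. The element names (conceptEntry, langSec, termSec, term) reflect my reading of the ISO 30042 core structure and are an assumption; check them against the TBX dialect your CAT tools expect.

```python
import xml.etree.ElementTree as ET

def to_tbx_entry(concept_id, languages):
    """Build one concept entry from {lang_code: [term, ...]} as an XML string."""
    entry = ET.Element("conceptEntry", id=concept_id)
    for lang, terms in languages.items():
        lang_sec = ET.SubElement(entry, "langSec", {"xml:lang": lang})
        for term in terms:
            term_sec = ET.SubElement(lang_sec, "termSec")
            ET.SubElement(term_sec, "term").text = term
    return ET.tostring(entry, encoding="unicode")

xml_out = to_tbx_entry("C0042", {
    "en": ["termbase", "terminology database"],
    "es": ["base terminológica"],
})
```

Serializing through a shared standard like this is what lets a termbase move among CAT tools without losing its entry, language, and term structure.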

Terminology Management: Dispelling Common Misconceptions

Ultimately, terminology management is about education. It’s true that the practice requires a certain initial investment of time and resources that will not be immediately recovered. But don’t let that deter you from incorporating this best practice into your localization efforts, and the sooner the better. Start small and polish your terminological processes and databases until you’re confident that the structures produce the intended results. Then build out robust resources over time.

To conclude, consider this final example: the car name Chevy Nova. “Nova” is a term that can either leave you seeing stars or snickering about a car that “doesn’t go.” (The words “No va” translate to “[It] doesn’t go” in Spanish, so it’s an unfortunate name for a car.) So, make sure your worldwide product and language service launches are Go’s instead of No go’s. Be sure to incorporate terminology management in your localization workflows.

Notes
  1. ISO 704:2009—Terminology Work—Principles and Methods (International Organization for Standardization), http://bit.ly/ISO704.
  2. ISO 704:2009 (Objects), 2, http://bit.ly/ISO704-Objects.
  3. López, Ártemis. “You, Me, Hir, and Non-Binary Language,” Intercambios (ATA Spanish Language Division, October 2020), http://bit.ly/Intercambios-Ártemis.
  4. Jedlicka, Daniel. Odborná terminologie 1, Section 2.3.7 Coinage (European Union, 2019), https://bit.ly/Jedlicka-Coinage.
  5. Ahmad, Khurshid, and Margaret Rogers. “Corpus Linguistics and Terminology Extraction.” In Handbook of Terminology Management (John Benjamins, 2001), 725–760, http://bit.ly/Khurshid-Rogers-terms.
  6. Baker, Mona. “Corpus-Based Translation Studies: The Challenges That Lie Ahead.” In Terminology, LSP, and Translation (John Benjamins, 1996), http://bit.ly/Baker-corpus.
  7. Simplified Technical English, Issue 7 (Aerospace and Defense Industries Association of Europe, 2017), http://bit.ly/simplified-technical-English.
  8. ISO 30042:2019. Management of Terminology Resources—TermBase eXchange (International Organization for Standardization), http://bit.ly/ISO30042.
Additional Resources

Karsch, Barbara, Petra Drewer, Donatella Pulitano, and Klaus-Dirk Schmitz. Terminology Work Best Practices 2.0. Deutscher Terminologie-Tag e.V. (Köln, 2020), http://bit.ly/terminology-work.

Introduction to TermBase eXchange (TBX), www.tbxinfo.net.

Translation Commons, https://translationcommons.org. If you want to learn more about terminology management, take the “Introduction to Terminology Google Classroom” course by Translation Commons. Access the course for free by registering on Translation Commons and visiting the Learning Center.

Wright, Sue Ellen. “TBX Dialects: Making eXchange Work for You.” In The Routledge Handbook of Translation and Technology (Routledge, 2018), http://bit.ly/Wright-TBX.


Alaina Brandt is an assistant professor of professional practice in the Translation and Localization Management program at the Middlebury Institute of International Studies at Monterey. In addition to serving as an ATA director, she is the membership secretary of ASTM International’s Committee F43 on Language Services and Products. She is an expert within the International Organization for Standardization’s Technical Committee 37 on Language and Terminology. abrandt@miis.edu

The ATA Chronicle © 2021 All rights reserved.