Basic terminology

Translation and Localization FAQ

Back translation
The process of translating a document that has already been translated into another language back into the original language, preferably by an independent translator.

Bidirectional (writing system)
A writing system in which text is generally flush right and most characters are written from right to left, but some text (such as numbers and embedded foreign words) is written left to right as well. Arabic and Hebrew are the most widely used bidirectional writing systems in current use.
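
As a minimal sketch of how software can tell which direction a character runs in, the Python standard library exposes the Unicode bidirectional class of each character. The sample characters and the dictionary names below are chosen for illustration only.

```python
import unicodedata

# Sample characters (illustrative choices, not from the glossary).
samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "digit 1": "1",
}

# Map each sample to its Unicode bidirectional class:
# "L" = left-to-right, "R" = right-to-left, "AL" = Arabic letter, "EN" = European number.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bidi_class in bidi_classes.items():
    print(f"{name}: {bidi_class}")
```

A layout engine would combine these classes with the full Unicode bidirectional algorithm to order mixed text; this sketch only inspects individual characters.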

CE marking
The letters CE are the abbreviation of the French phrase "Conformité Européenne", which literally means "European conformity". CE marking on a product is the manufacturer's declaration that the product complies with the essential requirements of the relevant European health, safety and environmental protection legislation.

Character
A symbol standing for the smallest abstract component of a writing system or script, including sounds, syllables, notions or elements, as opposed to glyphs.

Computer-aided translation (CAT)
Computer technology applications that assist in the act of translating text from one language to another.

Content management system (CMS)
A system used to store and subsequently find and retrieve large amounts of data. CMSs were not originally designed to synchronize translation and localization of content, so most of them have been partnered with globalization management systems (GMS).

Controlled languages
Subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Stylistic rules, such as not using certain verb tenses or the passive voice, can also be created, depending upon the group or organization and its language usage goals.

Corpus (plural "corpora")
A large body of natural language text used for accumulating statistics on natural language text. Corpora often include extra information such as a tag for each word indicating its part of speech and perhaps the parse tree of each sentence. Also, a large body of source-language text used for translation.

Creole language
A well-defined and stable language that originated from a non-trivial combination of two or more languages, typically with many distinctive features that are not inherited from either parent.

Dialect
A variety of a language used by people from a particular geographical area. The number of speakers and the area itself can be of arbitrary size. A dialect is a complete system of verbal communication (oral or signed, but not necessarily written) with its own vocabulary and/or grammar.

DITA (Darwin Information Typing Architecture)
An XML-based architecture for authoring, producing and delivering technical information. This architecture consists of a set of design principles for creating "information-typed" modules at a topic level and for using that content in delivery modes such as online help and product support portals on the web.

Encoding scheme
Rules for assigning numeric values (code points) to characters. Encoding is a method by which a character set is turned into computerized form for transmission and preservation.
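
The distinction between a code point and its stored bytes can be sketched in a few lines of Python. The sample character and the two encodings compared here are illustrative choices, not part of the original glossary.

```python
# One character, written as an escape so the example does not depend on file encoding.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The code point is the numeric value assigned to the character.
code_point = ord(text)

# Different encoding schemes turn the same character into different byte sequences.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(f"code point: U+{code_point:04X}")
print(f"UTF-8 bytes:   {utf8_bytes.hex()}")
print(f"Latin-1 bytes: {latin1_bytes.hex()}")

# Decoding with the matching scheme recovers the original character.
round_trip_ok = (
    utf8_bytes.decode("utf-8") == text and latin1_bytes.decode("latin-1") == text
)
```

Decoding bytes with the wrong scheme is a common source of garbled text in localized files, which is why the encoding of each file should be recorded alongside it.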

Glocalization
Derived from the combination of the words "global" and "local". The word refers to the creation or distribution of products or services intended for a global or transregional market, but customized to suit local language, laws and culture.

Globalization (g11n)
In this context, the term refers to the process that addresses business issues associated with launching a product globally, such as integrating localization throughout a company after proper internationalization and product design.

Globalization management system (GMS)
A system that focuses on managing the translation and localization cycles and synchronizing those with source content management. Provides the capability of centralizing linguistic assets in the form of translation databases, leveraging glossaries and branding standards across global content.

Glyph
The shape representation or pictograph of a character.

Hiragana
A flowing phonetic script of the native Japanese writing system. In hiragana, all of the sounds of the Japanese language are represented by about 50 symbols.

HTML (HyperText Markup Language)
A markup language that uses tags to structure text into headings, paragraphs, lists and links, and tells a web browser how to display text and images on a web page.

Information retrieval
The science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents or searching within databases, whether relational standalone databases or hypertext networked databases such as the internet, for text, sound, images or data.

Internationalization (i18n)
Especially in a computing context, the process of generalizing a product so that it can handle multiple languages and cultural conventions (currency, number separators, dates) without the need for redesign.

Kanji
The Chinese characters that are used in the modern Japanese logographic writing system along with hiragana, katakana and the Hindu-Arabic numerals. The Japanese term kanji literally means "Han characters". Despite the existence of some 13,000 kanji characters, these alone do not suffice to write Japanese. Hiragana characters are also required to express grammatical inflections.

Katakana
A Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin alphabet. The word katakana means "fragmentary kana", as the characters are derived from components of more complex kanji. Katakana are characterized by short straight strokes and angular corners and are the simplest of the Japanese scripts. Katakana and hiragana both render the same syllables, but katakana is angular and used largely to spell words borrowed from other languages, while hiragana is cursive and is used more frequently to spell native Japanese words.

Lingua franca
A language that is adopted as a common language between speakers whose native languages are different.

Localization (l10n)
In this context, the process of adapting a product or software to a specific international language or culture so that it seems natural to that particular region. True localization considers language, culture, customs and the characteristics of the target locale. It frequently involves changes to the software's writing system and may change keyboard use and fonts as well as date, time and monetary formats.

Machine translation (MT)
A technology that translates text from one human language to another, using terminology glossaries and advanced grammatical, syntactic and semantic analysis techniques.

Namespace
Namespaces provide a simple method for qualifying element and attribute names used in eXtensible Markup Language (XML) documents by associating them with namespaces identified by URI references. XML namespaces are the solution to the problem of ambiguity and name collisions.
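
A minimal sketch of a name collision resolved by namespaces, using Python's standard XML parser. The document content, prefixes and namespace URIs below are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <title> element.
# The namespace URIs are hypothetical placeholders.
doc = """
<catalog xmlns:bk="http://example.com/ns/book"
         xmlns:job="http://example.com/ns/job">
  <bk:title>Localization Handbook</bk:title>
  <job:title>Senior Translator</job:title>
</catalog>
""".strip()

root = ET.fromstring(doc)

# Map prefixes to namespace URIs so each <title> can be addressed unambiguously.
ns = {"bk": "http://example.com/ns/book", "job": "http://example.com/ns/job"}

book_title = root.find("bk:title", ns).text
job_title = root.find("job:title", ns).text

print(book_title)
print(job_title)
```

The prefixes themselves carry no meaning; the parser compares the URIs they are bound to, so a different document could use other prefixes for the same vocabularies.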

.NET
Microsoft's platform for applications that work over the internet.

Notified bodies
Organizations designated by the national governments of the member states of the European Union as being competent to make independent judgments about whether or not a product complies with the protection requirements (the essential safety requirements) laid down by each CE marking directive.

Offshoring
The relocation of business processes to another country, especially a country overseas. This includes any business process such as production, manufacturing or services.

Open-source software
Any computer software distributed under a license that allows users to change and/or share software freely. End users have the right to modify and redistribute the software, as well as the right to package and sell the software.

Optical character recognition (OCR)
Recognition of printed or written characters by a computer. Involves computer software designed to translate images of typewritten text, usually captured by a scanner, into machine-editable text, or to translate pictures of characters into a standard encoding scheme representing them in ASCII or Unicode.

Outsource
To hire a third-party provider to perform tasks or services often performed in-house. The third-party provider is then referred to as the outsourcer.

Outsourcing
The delegation of non-core operations or jobs from internal production within a business to an external entity, such as a subcontractor, that specializes in that operation. Outsourcing is a business decision that is often made to lower costs or focus on competencies. A related term, offshoring, means transferring work to another country, typically overseas. Offshoring is similar to outsourcing when companies hire overseas subcontractors, but differs when companies transfer work to the same company in another country.

Pay per click (PPC)
An advertising technique used on websites, advertising networks and search engines. With search engines, PPC advertisements are usually text ads placed near search results. When a site visitor clicks on the advertisement, the advertiser is charged a small amount.

Simplified Chinese
Refers to one of two standard Chinese character sets of the printed contemporary Chinese written language, officially simplified by the government of the People's Republic of China in an attempt to promote literacy. Simplified characters are written with fewer strokes per character and are used in mainland China and Singapore.

Search engine
A program designed to help find information stored on a computer system such as the World Wide Web or a personal computer. A search engine allows a user to ask for content meeting specific criteria, typically content containing a given word, phrase or name, and retrieves a list of references that match those criteria.

Search engine optimization (SEO)
A set of methods aimed at improving the ranking of a website in search engine listings. SEO is primarily concerned with advancing the goals of a website by improving the number and position of its organic search results for a wide variety of relevant keywords.

Semantic Web
An extension of the World Wide Web that provides a common framework allowing data to be shared and re-used across application, enterprise and community boundaries. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming.

Source language
A language from which text is to be translated into another language.

Traditional Chinese
A Chinese character set that is consistent with the original Chinese ideographic form that is several thousand years old. Today, traditional characters are used in Taiwan, Hong Kong, Macau and by some overseas Chinese communities, especially those originating from the aforementioned regions/countries or who emigrated before the widespread adoption of simplified characters in the People's Republic of China.

Translation
The process of converting all of the text or words from a source language to a target language. An understanding of the context or meaning of the source language must be established in order to convey the same message in the target language.

Translation memory (TM)
A special database that stores previously translated sentences which can then be re-used on a sentence-by-sentence basis. The database matches source to target language pairs.
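
The sentence-by-sentence re-use described above can be sketched as a lookup that accepts exact and near matches. This is an illustrative toy, not how any particular TM product works: the memory contents, the `lookup` helper and the 0.75 threshold are all assumptions, and Python's `difflib` stands in for a real fuzzy-matching engine.

```python
from difflib import SequenceMatcher

# Hypothetical English-to-French memory: source segment -> stored translation.
memory = {
    "Save the file.": "Enregistrez le fichier.",
    "Close the window.": "Fermez la fen\u00eatre.",
}

def lookup(segment, tm, threshold=0.75):
    """Return (translation, score) for the closest stored segment, or None."""
    best_source, best_score = None, 0.0
    for source in tm:
        # Character-level similarity between the new segment and a stored one.
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return tm[best_source], best_score
    return None

exact = lookup("Save the file.", memory)     # identical segment
fuzzy = lookup("Save the files.", memory)    # near match, to be reviewed by a translator
miss = lookup("Print the report.", memory)   # nothing similar enough

print(exact)
print(fuzzy)
print(miss)
```

In practice a near match is offered to the translator as a suggestion to edit rather than inserted automatically.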

Translation Memory eXchange (TMX)
An open standard, based on XML, which has been designed to simplify and automate the process of converting translation memories (TMs) from one format to another.

Unicode
The Unicode Worldwide Character Standard (Unicode) is a character-encoding standard used to represent text for computer processing. Originally designed to support 65,000 characters, it now has encoding forms to support more than 1,000,000 characters.
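
A short sketch of what the growth beyond 65,000 characters means in practice: a character above U+FFFF no longer fits in a single 16-bit unit. The two sample characters and the `sizes` helper are illustrative choices.

```python
# One character inside the original 16-bit range and one beyond it.
bmp_char = "\u4e2d"         # CJK ideograph U+4E2D
astral_char = "\U0001F600"  # emoji U+1F600, above U+FFFF

def sizes(ch):
    """Return (UTF-8 bytes, UTF-16 code units, UTF-32 code units) for one character."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")) // 2,
        len(ch.encode("utf-32-le")) // 4,
    )

bmp_sizes = sizes(bmp_char)
astral_sizes = sizes(astral_char)

print(f"U+{ord(bmp_char):04X}: {bmp_sizes}")
print(f"U+{ord(astral_char):04X}: {astral_sizes}")
```

Software that assumes one 16-bit unit per character can miscount or split such characters, which matters when truncating or measuring translated strings.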

XLIFF (XML Localization Interchange File Format)
A format specifically designed to support the localization of data. It has features for updating strings, revision control, marking different phases of the localization process, word count calculations and the provision of alternative or suggested language translations, among others. XLIFF is an open standard.

XML (eXtensible Markup Language)
A markup language/specification. XML is a pared-down version of SGML, an international standard for the publication and delivery of electronic information, designed especially for web documents.