Japanese line breaks

Any text document consists of content and layout. The document translation process aims to recreate a document in the target language that is equivalent to the source document in both content and layout. Thus, the document translation process has two main subprocesses: content translation and layout adjustment. Content translation must be performed by native speakers of the target language and, since this is apparent to most people, it generally is.

The situation is different in the case of layout adjustments. Modern translation tools are so good at extracting the translatable text portions from source documents while protecting the non-translatable formatting elements that layout adjustments may not even be needed. This is typically the case for the translation of web formats such as HTML or XML. Since web layout is rather fluid, with a large part of the actual presentation controlled by the web browser, it is generally sufficient to simply replace the source text with the target text. If the goal is to produce translated print documents, however, the translated text often has to be forced into a predetermined, fixed layout. Due to time constraints, cost considerations or other logistical factors, desktop publishers often find themselves confronted with the task of touching up a document of which they are unable to read a single word.

Although one may deplore this situation as a violation of best practice, it is nevertheless common enough to warrant treatment as an integral part of the translation process. As such, it requires support material to help non-readers in their layout adjustment task.

In this issue, we will look at Japanese. The first concern a desktop publisher may have is text directionality. As many people know, Japanese books are traditionally read from right to left, in a top-to-bottom column format, but scientific and technical publications, including user manuals for hardware and software, are always written left to right in the same format as English documents. The web appears to be spreading this format still further. Thus, when English-language technical documentation is translated into Japanese, the source text should simply be replaced with Japanese, and the document layout should stay as is.

When space is tight in print documentation, it is often necessary to adjust the line breaks manually. For a non-reader, Japanese text appears daunting at first glance, since words are often not separated by spaces. However, written Japanese has a number of surface characteristics that can provide useful guidance.

First, Japanese uses punctuation marks to delimit sentences (period), subclauses (comma) and insertions (parentheses). Thus, just as in English, it is always safe to insert a line break after a period, a comma or a closing parenthesis, or before an opening parenthesis. When foreign words are transcribed into Japanese script, spaces are indicated either with the middle dot character (・) or with a one-byte space. Inserting a line break immediately after this dot character or the space is acceptable.
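The punctuation rule above can be checked mechanically. The following is a minimal Python sketch, not a complete line-breaking implementation. The function name `punctuation_break_points` and the character sets are illustrative choices, and the guard that keeps closing punctuation from starting a line is an assumption that the text does not state.

```python
# Characters after which a break is safe: periods, commas, closing
# parentheses, the middle dot, and the one-byte space.
BREAK_AFTER = set("。．、，）)・ ")
# Characters before which a break is safe: opening parentheses.
BREAK_BEFORE = set("（(")
# Assumption (not from the article): closing punctuation never starts a line.
NO_LINE_START = set("。．、，）)")


def punctuation_break_points(text: str) -> list[int]:
    """Return every index i where a line break may go before text[i]."""
    points = []
    for i in range(1, len(text)):
        if text[i] in NO_LINE_START:
            continue  # keep runs of closing punctuation together
        if text[i - 1] in BREAK_AFTER or text[i] in BREAK_BEFORE:
            points.append(i)
    return points


if __name__ == "__main__":
    sample = "これはテスト（例）です。ユーザー・マニュアル"
    for i in punctuation_break_points(sample):
        print(sample[:i], "|", sample[i:])
```

Each printed line shows one candidate break position, which a non-reader can compare visually against the printed page.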

The Japanese writing system uses three different sets of characters, each one for a specific purpose. Chinese characters called kanji are used to convey concepts or word meaning: they are logographic symbols. Thus, kanji carry the main meaning of Japanese texts. Kanji are fairly easy to recognize, since most of these symbols look fairly intricate. Since Japanese uses thousands of kanji, a complete listing is impractical.

Hiragana are symbols of Japanese origin that form a syllabary. This means that, like English letters, each symbol stands for a speech sound rather than a word meaning. However, while English letters generally represent a single sound, hiragana represent a whole syllable.

Hiragana are used to represent grammatical information. That is, they roughly correspond to English prepositions, conjunctions and similar function words. Hiragana are generally attached at the end of a word: they typically form a unit with the preceding kanji.

Katakana are used for transcribing foreign words and names. In some cases, as in product names or other proper names, Japanese also uses Western script, and Arabic numerals are commonly used in Japanese just as in English.

Since these fairly easily distinguishable symbol sets are used for such different purposes, it is possible to make some useful generalizations for basic layout adjustments.

  • Try not to separate adjacent kanji symbols - that is, adjacent kanji should stay together as much as possible. However, when a long series of kanji (three or more) extends beyond the line limit, you may separate them.
  • Fewer than three adjacent hiragana or adjacent katakana should always stay together. Any series of three or more hiragana or katakana characters can generally be separated. The only exception is a character combination that forms a single syllable. These character combinations, however, are also easily identifiable, as the second and third characters are smaller in size than the first one. Furthermore, the long vowel sign (ー) that is attached to hiragana or katakana should never be separated from the preceding character because this sign constitutes a part of the same syllable.
  • Never separate kanji / katakana / Western script from immediately following hiragana.
  • Never separate Arabic numerals from immediately following kanji.
  • You can separate Arabic numerals from immediately following hiragana / katakana / Western script.
  • You can separate hiragana from immediately following kanji / katakana / Western script / Arabic numerals.
  • You can separate katakana from immediately following kanji / Western script / Arabic numerals.
  • You can separate kanji from immediately following katakana / Western script / Arabic numerals.
  • You can separate Western script from immediately following kanji / katakana / Arabic numerals.