Language-localization tips aid overseas sales
Scott Rosenthal
October, 1997
You've just finished a project, and the boss comes by and says, "Great, now how
easy is it to add additional languages? We'd like to sell it abroad." The
knee-jerk response is that it's not terribly difficult: simply translate a few text
strings. Once you start, though, it no longer seems so easy. But with a little forethought
and some understanding of localization issues, an embedded design can easily work with
other languages.
Localization is the act of changing a design so it works properly in the end user's
locale. For embedded software, it involves not just languages but other regional
differences such as time and date formats, currency symbols, and the roles of decimal points
and commas in numbers. In the old days, you could tell customers your product came in any
language, as long as it was English. This attitude won't work today. The international
market is large and tough, and adapting a product to each country's needs might be the
only way to make it marketable.
Hardware issues
The embedded world puts special constraints on localizing a design compared with the PC
world. On a PC, especially one with a graphical interface, the operating system largely takes
care of issues such as character fonts and whether text runs left to right, right to left, or
top to bottom. In the embedded world, with its usually limited display and entry options,
system hardware can dictate the extent and implementation of localization efforts. For
example, a product with 7-segment displays must show a decimal point even where local
practice, as in German, calls for a comma. Also, just because you can spell an English word with letters that
fit in seven segments (for instance, CAL) doesn't mean you can spell out the same word in
French.
Even with a text display using, for example, a 5x7 character cell, another
consideration is the availability of correct fonts. Western European languages require a font
that supports characters with accent marks, umlauts or other diacritics. But if you're trying
to convert a system for use in Asia, a simple character display might be unusable, because
scripts such as Chinese and Japanese need thousands of characters, most of them too complex
to render legibly in a 5x7 cell. In this case, a GUI on a modern OS may be the only
choice.
Even if support for a Western language is all a system needs, additional problems
exist. One such hardware issue revolves around the amount of memory a design can devote to
localizations. For example, a PC offers practically unlimited storage for text
messages. An embedded system, in contrast, normally places tight limits on available
memory. Nonvolatile memory must store these messages, in essence splitting memory between
text and program code. You'll find it absolutely amazing how fast text can chew up memory.
With an average word containing five characters (bytes) plus a sixth byte for a space,
every kilobyte (1,024 bytes) of ROM holds roughly 170 words. So, for example, this column
would require approximately 8k bytes of ROM for storage. That amount might not seem so bad,
but multiply it by perhaps five languages, and all of a sudden text storage climbs to
roughly 40k bytes.
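The storage arithmetic above is easy to keep as a small helper when budgeting ROM for several languages. This is a minimal sketch; the names `text_rom_bytes` and `words_per_rom` are hypothetical, and the six-bytes-per-word figure is the rough average from the text, not a measurement of real messages.

```c
#include <stddef.h>

/* Rough average from the text: five characters plus one space per word. */
#define BYTES_PER_WORD 6u

/* Hypothetical helper: ROM bytes needed to hold `words` words of message
 * text in each of `languages` localizations. */
static size_t text_rom_bytes(size_t words, size_t languages)
{
    return words * BYTES_PER_WORD * languages;
}

/* Hypothetical helper: how many average words fit in `rom_bytes` of ROM. */
static size_t words_per_rom(size_t rom_bytes)
{
    return rom_bytes / BYTES_PER_WORD;
}
```

For instance, `words_per_rom(1024)` gives 170, and `text_rom_bytes(1365, 5)` gives 40,950 bytes for a roughly 8k-byte body of text stored in five languages.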
So, before you can even begin to handle language translations, you must resolve the
following questions when designing an embedded system for localization:
- Will the system need localization?
- Will the system handle more than one localization without changing the program?
- If more than one, how many localizations must a system handle without a program change?
- Is there room in ROM for both the program and all the messages for different
localizations?
- How does the user select the correct language?
- Do constraints exist on the user display or printer that might cause trouble? If so,
what can you do?
- Is there room on the display or printer for message expansion with other languages?
With a little help...
After deciding to localize an embedded system, you face two main issues: incorporating
localization within source code, and getting proper language translations for text
messages.
Programmers can choose from several ways to incorporate localization within source
code. The primary things you must change are date and time formats, decimal points and
commas, currency symbols and text messages.
I'm a firm believer in isolating user-interface routines from the rest of the
program code. As an example, a system might need to display frequencies in hertz,
kilohertz or megahertz. Within a program I keep all frequencies in one base unit, for example hertz.
Any time the program must display a frequency, it calls a function that returns a string
with the value formatted in the appropriate unit, such as megahertz. This isolation
also helps development when it comes to localization. By making software always call a
function for formatting output data, you can use one function to handle a particular
localization issue, such as the date format or substituting commas for decimal points.
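A minimal sketch of such a formatting function follows. It assumes frequencies are stored internally as unsigned long values in hertz and that the locale's decimal separator lives in a single global; `format_frequency` and `g_decimal_sep` are hypothetical names, and the fractional part is truncated rather than rounded.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical locale setting: '.' for English, ',' for German, and so on. */
static char g_decimal_sep = '.';

/* Format a frequency held internally in hertz as Hz, kHz or MHz with two
 * decimal places (truncated) and the locale's decimal separator.
 * Writes at most len bytes into buf and returns buf. */
static char *format_frequency(unsigned long hz, char *buf, size_t len)
{
    const char *unit = "Hz";
    unsigned long divisor = 1UL;

    if (hz >= 1000000UL) {
        unit = "MHz";
        divisor = 1000000UL;
    } else if (hz >= 1000UL) {
        unit = "kHz";
        divisor = 1000UL;
    }

    /* Integer arithmetic avoids needing floating-point printf support. */
    unsigned long whole = hz / divisor;
    unsigned long hundredths = (hz % divisor) * 100UL / divisor;

    snprintf(buf, len, "%lu%c%02lu %s", whole, g_decimal_sep, hundredths, unit);
    return buf;
}
```

With `g_decimal_sep` set to `','`, 1,234,567 Hz formats as `1,23 MHz`; no other display code has to change.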
Another way to handle localization issues is to embed special codes in strings. These
codes could stand for the currency symbol, the decimal-point symbol or the date symbol.
Before displaying such data to the user, the software passes the string through a localize
function that performs substitutions using information for the correct locale.
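A localize function of this kind might look like the sketch below. It assumes single-byte codes (`\x01` for the currency symbol, `\x02` for the decimal point) and a hypothetical `struct locale_info`; neither the codes nor the names come from the original design.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical embedded codes; any bytes that never occur in real text work. */
#define LOC_CURRENCY '\x01'
#define LOC_DECIMAL  '\x02'

/* Hypothetical per-locale substitution data. */
struct locale_info {
    const char *currency; /* e.g. "$" or "DM " */
    char decimal;         /* '.' or ',' */
};

/* Copy src to dst (capacity len), replacing embedded codes with the
 * locale's symbols. The output is always NUL-terminated and is truncated
 * if it would overflow. Returns dst. */
static char *localize(const char *src, const struct locale_info *loc,
                      char *dst, size_t len)
{
    size_t out = 0;

    if (len == 0)
        return dst;

    for (; *src != '\0'; src++) {
        if (*src == LOC_CURRENCY) {
            size_t n = strlen(loc->currency);
            if (out + n >= len)
                break;
            memcpy(dst + out, loc->currency, n);
            out += n;
        } else {
            if (out + 1 >= len)
                break;
            dst[out++] = (*src == LOC_DECIMAL) ? loc->decimal : *src;
        }
    }
    dst[out] = '\0';
    return dst;
}
```

When writing such strings in C, split the literal after each code, as in `"Total: \x01" "12\x02" "50"`, because a hex escape like `\x01` otherwise swallows any hex digits that follow it.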
For localizing the date format, another technique I've used with an alphanumeric
display is to show it as dd-mmm-yyyy where the month appears as a 3-letter abbreviation
specific to each localization. This technique then treats the date format as a text-string
substitution and the format works in most regions of the world.
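The dd-mmm-yyyy approach reduces date localization to a table lookup. In this sketch the month tables and `format_date` are hypothetical; the German table uses "Mrz" for March so that every entry stays exactly three ASCII characters wide.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical month tables, indexed 0-11. Every entry is exactly three
 * ASCII characters so the dd-mmm-yyyy field width never changes. */
static const char *const months_en[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};
static const char *const months_de[12] = {
    "Jan", "Feb", "Mrz", "Apr", "Mai", "Jun",
    "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
};

/* Format day (1-31), month (1-12) and year as dd-mmm-yyyy using the
 * given month table. An out-of-range month produces an empty string. */
static char *format_date(int day, int month, int year,
                         const char *const months[12],
                         char *buf, size_t len)
{
    if (len == 0)
        return buf;
    if (month < 1 || month > 12) {
        buf[0] = '\0';
        return buf;
    }
    snprintf(buf, len, "%02d-%s-%04d", day, months[month - 1], year);
    return buf;
}
```

Switching languages means passing a different month table; the format string itself never changes.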
The text of it all
To most people, localization means only translating strings to another language. Not
quite. You must also address numerous related issues including message expansion,
correct text selection and translation accuracy.
For instance, moving from English to another language generally increases the number of
characters in a message. Short messages tend to grow the most in proportion, whereas longer
ones grow less. I generally try to leave 50% additional space on a display device
for string expansion. Also don't forget to write code that works with variable-length
messages instead of hard-coding the length. For example, if a program includes a column of
numbers with text labels preceding them, right-justify the labels so all the numbers line
up correctly on their decimal points or commas.
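One way to avoid hard-coded label widths is to measure the longest label at run time and pass that width to printf-style formatting through the `*` field width. A sketch with hypothetical function names; it assumes one byte per displayed character, which holds for a single-byte character set but not for multibyte encodings such as UTF-8.

```c
#include <stdio.h>
#include <string.h>

/* Longest label, measured at run time instead of hard-coded, so
 * translated labels of any length still fit. Assumes one byte per
 * displayed character. */
static size_t max_label_len(const char *const labels[], size_t count)
{
    size_t max = 0;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(labels[i]);
        if (n > max)
            max = n;
    }
    return max;
}

/* One row: label right-justified in a field `width` wide, then the value
 * in a fixed 8-character field so the decimal points line up. */
static char *format_row(const char *label, size_t width, double value,
                        char *buf, size_t len)
{
    snprintf(buf, len, "%*s %8.2f", (int)width, label, value);
    return buf;
}
```

Because the width comes from the strings themselves, a translation with longer labels still lines up without touching the layout code.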
One of the more important issues with localization is how to modify text in a program.
Anyone who hard-codes text messages into the body of the program pays dearly when localizing
the design. One technique is to create a text module for each language (such as english.c
or french.c). It holds a list of all the text strings in an array, and each language
variant of the module holds the same string at the same array index. When the software
needs a text string, it refers to it by that index.
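A sketch of the one-module-per-language layout, using a shared enum so that every module agrees on what each index means. The file names appear only as comments, and the message IDs and translations are hypothetical examples. C99 designated initializers make each entry's index explicit, so a missing translation leaves a null entry rather than silently shifting every later string.

```c
/* text_ids.h (hypothetical): one shared list of message indices used by
 * every language module, so the same index means the same message. */
enum text_id {
    TXT_READY,
    TXT_DISPLAY_DIAG,
    TXT_OUT_OF_PAPER,
    TXT_COUNT /* number of messages; must stay last */
};

/* english.c (hypothetical) */
const char *const english[TXT_COUNT] = {
    [TXT_READY]        = "Ready",
    [TXT_DISPLAY_DIAG] = "Display diagnostics.",
    [TXT_OUT_OF_PAPER] = "Out of paper",
};

/* french.c (hypothetical) */
const char *const french[TXT_COUNT] = {
    [TXT_READY]        = "Prêt",
    [TXT_DISPLAY_DIAG] = "Afficher les diagnostics.",
    [TXT_OUT_OF_PAPER] = "Plus de papier",
};
```

Adding a message means adding one enum entry and one line per module; sizing each table with `TXT_COUNT` keeps all modules the same length.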
For a system that must handle several localizations without a program change, one technique
is for software to call a function that returns a pointer to the correct text string. The
following snippet shows an example of this function:
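int language; /* global language selection: 0 = English, 1 = Swedish, ... */

/* One string table per text module (english.c, swedish.c, and so on) */
extern const char *const english[];
extern const char *const swedish[];
extern const char *const spanish[];
extern const char *const german[];
extern const char *const french[];

const char *GetTextPtr(int n)
{
    /* Each table holds pointers, so table[n] is already the string pointer. */
    switch (language) {
    case 1:
        return swedish[n];
    case 2:
        return spanish[n];
    case 3:
        return german[n];
    case 4:
        return french[n];
    case 0:
    default: /* default is English */
        return english[n];
    }
}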
Any function that needs a text string calls GetTextPtr(). Its input argument is an
index number into the string array (such as 4 for "Display diagnostics."). The
function then selects the string table for the current language and returns a pointer to the
requested text string.
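The switch grows by one case for each new language. A table-driven variant, shown here as one alternative rather than the author's code, indexes an array of string tables instead; the two sample tables are hypothetical, and out-of-range selections fall back to English like the switch's default case.

```c
#include <stddef.h>

/* Hypothetical string tables, indexed by message number. */
static const char *const english[] = { "Ready", "Display diagnostics." };
static const char *const german[]  = { "Bereit", "Diagnose anzeigen." };

/* One entry per language, in language-code order: 0 = English, 1 = German. */
static const char *const *const language_tables[] = { english, german };

#define NUM_LANGUAGES (sizeof language_tables / sizeof language_tables[0])

static int language; /* global language selection */

/* Table-driven variant: index the table of tables instead of switching. */
static const char *GetTextPtr(int n)
{
    /* Out-of-range selections fall back to English, like the default case. */
    size_t lang = (language >= 0 && (size_t)language < NUM_LANGUAGES)
                      ? (size_t)language
                      : 0;
    return language_tables[lang][n];
}
```

Adding a language then means adding one table and one entry in `language_tables`, with no change to the lookup code.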
Translation woes
One of the biggest problems I've had with localization is finding translation staff who
can properly put text strings into other languages. Remember, these people generally
aren't computer whizzes, so technical terms and markings can throw them. For instance,
what do you expect a nonprogrammer to do with %5.2f or %%? Likewise, line continuations
and padding spaces can throw off their work. The following programming
guidelines should help translators do their work:
- Use macros for computer terms in the middle of text strings, and give them uppercase names
that definitely don't look like text.
- Keep line lengths short. Remember about string-length expansion with other languages.
- When programming in C, don't continue a string literal onto the next line with a "\";
instead, let the compiler join adjacent literals: "text 1" "text 2" (no comma between the strings).
- Create and give the translators a style guide for the system. Do you want acronyms for
various functions or should translators spell them out? Can or does the acronym change
with different languages? How much space can a translator use for technical and
product-specific terms? Are there any display-size restrictions that might limit how long
a single word can be (a non-trivial issue for German and Italian)?
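The first and third guidelines can be combined as in the sketch below. The macro name `VOLTS_FIELD`, the message, and `format_supply_low` are hypothetical; the point is that the translator sees an obviously non-prose token instead of a raw `%.2f`, and the long message spans two source lines through adjacent-literal concatenation rather than a backslash.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholder macro: an all-uppercase name that a translator
 * can recognize as "not text, leave alone". It expands to the printf
 * conversion the program actually needs. */
#define VOLTS_FIELD "%.2f"

/* The compiler joins adjacent string literals, so this message spans two
 * source lines with no backslash continuation and no comma between parts. */
static const char msg_supply_low[] =
    "Supply voltage is low: " VOLTS_FIELD " V. "
    "Check the power adapter.";

/* Fill in the voltage and return the finished message in buf. */
static char *format_supply_low(double volts, char *buf, size_t len)
{
    snprintf(buf, len, msg_supply_low, volts);
    return buf;
}
```

A translator edits only the quoted prose around `VOLTS_FIELD`; the conversion specifier stays under the programmer's control.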
Adapted from an article that appeared in Personal Engineering & Instrumentation
News.