Globalization Guidelines

Overview

Dojo has addressed globalization from the very beginning of its development. This document presents the rules and describes how to use these features to globalize your Web applications based on Dojo version 1.0. Each rule uses one of the following directive words:

* Must: You must always follow the rules; otherwise your application cannot be globalized.
* Should: You are recommended to follow the rules in some situations. Your application can be globalized if you do not follow these rules, but more effort might be needed.
* May: These rules do not affect the capability of globalization. You can choose whether to follow them or not.

Use the following guidelines to implement internationalization.

Encoding Guidelines

You should always use UTF-8 for encoding settings wherever applicable.
You should encode all text files in UTF-8.
You must specify the UTF-8 encoding in every HTML file before any non-English characters.
You must use the BOM header for UTF-16 files.
You must use UTF-8 to decode XHR request parameters.
You must use UTF-8 encoding when using a non-English string in a URL.
You must set Content-Type in an HTTP response header if the response is not encoded in UTF-8.

Locale and Resource Bundle Guidelines

You must set djConfig.locale in all files to the same as the locale used by the server code.
You must always use resource bundles to store the strings displayed to users.
You may use djConfig.locale to set the default locale and extra locales, and use only dojo.requireLocalization without the locale parameter.
You should make a build to include resource bundles in the locales that you use.

String Manipulation Guidelines

You should use the Js2Xlf tool to convert JSON files into XLIFF files for translation.
You should deal with free text using ICU library at the server side.
You should use only casing functions for locale neutral situation.
You should not use locale sensitive casing functions provided by JavaScript.
You should always escape a string as a whole rather than character by character.
You should not use any comparing, searching, or replacing functions for strings that might contain combining character sequences.
You should not use inserting, removing, or splitting functions for strings that might contain special characters.
You should not use trimming functions for strings that might contain special characters.
You should not use counting functions for strings that might contain special characters.
You must check the return value of String.charAt() when the string contains surrogates.

Formatting and Validation Guidelines

You must use dojo.string.substitute() to generate text output rather than simply use "+" between strings.
You must use Dojo format functions to convert locale sensitive data into text.
You must use Dojo validating and parsing functions to convert text from the users' input into data.
You should not hard-code patterns and locales when formatting data.

Dijit Widgets Guidelines

You should not specify both the height and the width of a widget to be translated by numeric units.
You must ensure that all resources used in widgets are localizable.
You should consider BiDi support in development and customization.

Encoding Guidelines

The following guidelines should be used to implement internationalization in encoding.

You should always use UTF-8 for encoding settings wherever applicable.
You should encode all text files in UTF-8.
You must specify the UTF-8 encoding in every HTML file before any non-English characters.
You must use the BOM header for UTF-16 files.
You must use UTF-8 to decode XHR request parameters.
You must use UTF-8 encoding when using a non-English string in a URL.
You must set Content-Type in an HTTP response header if the response is not encoded in UTF-8.




The UTF-8 Encoding

You should always use UTF-8 for encoding settings wherever applicable.

This is a general rule for Web application design and development. You should manually set the encoding for files or I/O streams wherever applicable, because the default encoding (usually ISO-8859-1) cannot handle all Unicode characters. UTF-8 is the best choice when you use Dojo in your application, since Dojo only uses UTF-8.

The rest of this chapter describes the details of setting encodings in a Web application.



UTF-8 File Encoding

You should encode all text files in UTF-8.

You should encode all text files in UTF-8, including HTML files, CSS files, JavaScript files, etc.

You must specify the UTF-8 encoding in every HTML file before any non-English characters.

You must specify the encoding of every HTML file as early as possible. Ideally, the server marks the document by sending an HTTP header that declares the encoding; otherwise, the encoding must be declared in the page itself using the meta tag. For example:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>Hello World!</title>
    </head>
    ...
</html>

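This encoding declaration must appear before any non-English characters in the file; otherwise a browser might fail to read the file correctly. For example, IE 6.0/7.0 cannot render the following content (encoded in UTF-8), because non-English text appears before the declaration:

<html>
    <head>
        <title>???</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>
    <body>Hello!</body>
</html>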

By default, browsers assume that all files referenced by an HTML file use the same encoding as the referencing HTML file. So if you have specified the encoding of every HTML file, you do not need to declare the encoding again in each CSS or JavaScript file. You can still override it with the charset attribute of the referencing element when some files are not in the same encoding as the HTML file. For example:

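<link rel="stylesheet" type="text/css" href="..." charset="...">
...
<script type="text/javascript" src="..." charset="..."></script>
...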


Sending and Receiving Requests

You must use the BOM header for UTF-16 files.

A BOM header consists of 2, 3, or 4 bytes at the very beginning of a text file to indicate its encoding. For example, 0xFF 0xFE means that the file is encoded in UTF-16LE, while 0xEF 0xBB 0xBF means that the encoding is UTF-8. The BOM header can override the encoding settings mentioned above in a browser.

Using UTF-16 is not recommended, but if you choose it for some reason, the BOM header is required. Because UTF-16 is not compatible with ASCII, a browser does not even have a chance to read an encoding declaration inside the file content.
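
As a rough illustration of how a BOM identifies an encoding, the following sketch inspects the first bytes of a file. The function name detectBom and the array-of-bytes input are assumptions made for this example; they are not part of Dojo or of any browser API.

```javascript
// Hypothetical helper (not a Dojo API): guess the encoding from a BOM.
// `bytes` is an array of integers (0-255) holding the start of the file.
function detectBom(bytes) {
    function startsWith(signature) {
        if (bytes.length < signature.length) { return false; }
        for (var i = 0; i < signature.length; i++) {
            if (bytes[i] !== signature[i]) { return false; }
        }
        return true;
    }
    // The 4-byte UTF-32LE signature begins with the UTF-16LE one, so test it first.
    if (startsWith([0xFF, 0xFE, 0x00, 0x00])) { return "UTF-32LE"; }
    if (startsWith([0x00, 0x00, 0xFE, 0xFF])) { return "UTF-32BE"; }
    if (startsWith([0xEF, 0xBB, 0xBF])) { return "UTF-8"; }
    if (startsWith([0xFF, 0xFE])) { return "UTF-16LE"; }
    if (startsWith([0xFE, 0xFF])) { return "UTF-16BE"; }
    return null; // no BOM found
}
```

The longer signatures are checked before the shorter ones because a 2-byte check alone cannot tell UTF-16LE from UTF-32LE.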

You must use UTF-8 to decode XHR request parameters.

The dojo.xhr* functions are the most common way in Dojo to enable Ajax features -- sending an asynchronous request to the server by an XMLHttpRequest object. The typical call to one of these functions can be:

dojo.xhrGet({
    url: "foo.jsp",
    content: {"name": "\u4e00"} // \u4e00 ("一") is the Chinese character for "one"
});

The url property is the address to which the request is sent, and the content property is the JSON object that is sent with the request. In Dojo's implementation, each key and value in content is first encoded by the encodeURIComponent function, and the pairs are then joined into a query string like "key=value&key=value&...". The xhrPost function puts the query string into the request content, and other functions like xhrGet append the query string to the end of the url, so the previous code is equivalent to the following code:

dojo.xhrGet({
    url: "foo.jsp?name=%E4%B8%80" // %E4%B8%80 are the UTF-8 bytes for \u4e00
});

Because the encodeURIComponent function always uses UTF-8, you must use UTF-8 at the server side to decode the request parameters both in the URL (xhrGet) and in the request content (xhrPost).
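
The encoding step described above can be sketched in plain JavaScript. The helper name toQuery is hypothetical and the code is a simplified illustration, not Dojo's actual implementation.

```javascript
// Hypothetical helper for illustration; Dojo's own code is more general.
function toQuery(content) {
    var pairs = [];
    for (var key in content) {
        if (Object.prototype.hasOwnProperty.call(content, key)) {
            // encodeURIComponent always emits percent-encoded UTF-8 bytes.
            pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(content[key]));
        }
    }
    return pairs.join("&");
}

var query = toQuery({ name: "\u4e00", lang: "zh" });
// query is "name=%E4%B8%80&lang=zh"
```

Whatever the page encoding is, the bytes in the query string are UTF-8, which is why the server has to decode them as UTF-8.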

For example, in Tomcat, you can set the encoding of the URL with the URIEncoding attribute of the Connector element in server.xml:

<Connector ... URIEncoding="UTF-8"/>

You can set the encoding of the request content (xhrPost) by simply calling request.setCharacterEncoding before using the request object:

<%@page contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<%request.setCharacterEncoding("utf-8");%>
...
name=<%=request.getParameter("name")%>

You MUST manually set the encoding on your server, because almost no Web server uses UTF-8 to decode URLs and request content by default. For example, Tomcat always uses ISO-8859-1 to deal with requests if you do not set the encoding. WebSphere uses a locale-encoding map to determine the request encoding from the client's language, but no locale is mapped to UTF-8 by default.

You must use UTF-8 encoding when using a non-English string in a URL.

Some browsers like IE always send URLs using the default system encoding. For example, in a Simplified Chinese Windows XP operating system, IE sends a URL encoded in GB2312. If you need to put some non-English parameters in a URL, make sure that you have encoded it first using the encodeURIComponent function. For example, in a Simplified Chinese Windows XP, if you run the following script in IE:

dojo.xhrPost({
    url: "foo.jsp?name1=\u4e00",
    content: {"name2": "\u4e00"}
});

You might get different results for name1 and name2 at the server side:

name1 --> 0xD2 0xBB (in GB2312, Wrong!)
name2 --> 0xE4 0xB8 0x80 (in UTF-8, Right!)

The right way is to encode name1 first:

dojo.xhrPost({
    url: "foo.jsp?name1=" + encodeURIComponent("\u4e00"),
    content: {"name2": "\u4e00"}
});


Sending Responses

You must set Content-Type in an HTTP response header if the response is not encoded in UTF-8.

An XMLHttpRequest object first checks the HTTP headers of a response for a Content-Type header that sets the encoding of the response; if there is none, it uses UTF-8 to decode the response into a string. Web servers usually set the Content-Type header automatically for dynamic files like JSP. However, for static files, Web servers often do not know the encoding of the files and do not declare it in the Content-Type header.

Locale and Resource Bundle

You must set djConfig.locale in all files to the same as the locale used by the server code.
You must always use resource bundles to store the strings displayed to users.
You may use djConfig.locale to set the default locale and extra locales, and use only dojo.requireLocalization without the locale parameter.
You should make a build to include resource bundles in the locales that you use.

Locale Setting in Dojo

There is a slight difference in the locale naming conventions between Dojo and Java. Dojo uses "-" (hyphen) as the separator to concatenate the language code, country code, and variants, whereas Java uses "_" (underscore). For example, "zh_CN" in Java corresponds to "zh-cn" in Dojo.
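
If a Java-style locale name reaches the browser as a string, the naming difference can be bridged with a small conversion. The function name toDojoLocale is an assumption made for this sketch; it is not a Dojo API.

```javascript
// Hypothetical helper: convert a Java-style locale name to the Dojo style.
function toDojoLocale(javaLocale) {
    // Replace every underscore with a hyphen and use lowercase throughout.
    return javaLocale.replace(/_/g, "-").toLowerCase();
}

var dojoLocale = toDojoLocale("zh_CN"); // "zh-cn"
```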

Like the default locale in Java, Dojo has a global locale value that is stored in a global variable: dojo.locale. This default locale value affects the behavior of several locale-related functions and widgets. The value of dojo.locale is not supposed to be changed. You should use djConfig.locale to initialize this value.

You must set djConfig.locale in all files to achieve server-based personalization.

If djConfig.locale is undefined, Dojo will consult the browser's navigator object for the setting chosen at browser install time. Note that this is unrelated to the locale setting in the preferences dialog, which is for interaction with the server only. To provide personalization from the server to control locale settings in an application, you must set djConfig.locale in the page at the server side, prior to loading dojo.js. For example, here is a JSP page that sets the default locale for Dojo:

...
<%
String actualLocale = ResourceBundle.getBundle("my.app.test",
    request.getLocale()).getLocale().toString().replace('_', '-');
%>
<script type="text/javascript">
    var djConfig = { locale: "<%=actualLocale%>" };
</script>
...


Resource Bundle Files

You must always use resource bundles to store the strings displayed to users.

Dojo brings resource bundles to JavaScript. If you are familiar with Java resource bundles, you will find that Dojo resource bundles are very similar. The following table summarizes the differences between Java and Dojo:

                    Java                                Dojo
File Format         Properties file                     JSON file
Locale Identifier   Suffix of file name                 Directory name
Locale Naming       Use "_" (underscore) as separator   Use "-" (hyphen) as separator
Get Bundle          ResourceBundle.getBundle            dojo.requireLocalization, dojo.i18n.getLocalization
Get Message         ResourceBundle.getString            JSON object

For example, there are two resource bundles named "bar" and "foo" in a package named "my.app" with some of their localized versions:

In Java (6 files with different names):

my/
   app/
       bar.properties
       bar_zh.properties
       bar_zh_CN.properties
       bar_zh_TW.properties
       foo.properties
       foo_zh_CN.properties

And in Dojo (4 directories and 6 files):

my/
   app/
       nls/
           bar.js
           foo.js
           zh/
              bar.js
           zh-cn/
              bar.js
              foo.js
           zh-tw/
              bar.js

The fallback strategy in Dojo is the same as that in Java.
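
The fallback idea can be sketched as follows: a message is taken from the most specific locale that defines it, and the search then moves toward the root. The bundle contents, the "ROOT" key, and the lookup function are all hypothetical stand-ins for illustration; real bundles are loaded from files by Dojo.

```javascript
// Hypothetical in-memory stand-in for the files under my/app/nls/.
var bundles = {
    "ROOT":  { greeting: "Hello", farewell: "Goodbye" },
    "zh":    { greeting: "你好" },
    "zh-cn": { farewell: "再见" }
};

// Looks a key up in the most specific bundle that defines it,
// dropping one locale segment at a time until the root is reached.
function lookup(locale, key) {
    var current = locale;
    while (true) {
        var bundle = bundles[current];
        if (bundle && Object.prototype.hasOwnProperty.call(bundle, key)) {
            return bundle[key];
        }
        if (current === "ROOT") {
            return undefined; // not defined anywhere
        }
        var cut = current.lastIndexOf("-");
        current = (cut === -1) ? "ROOT" : current.substring(0, cut);
    }
}
```

With these sample bundles, lookup("zh-cn", "greeting") falls back to the "zh" bundle, and lookup("zh-tw", "farewell") falls back to the root.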

Using Resource Bundle

First, use the dojo.registerModulePath function to register the directory that contains the resource bundles as a module. The module name is needed in later calls to the dojo.requireLocalization and dojo.i18n.getLocalization functions. For the previous example, you can use the following line to define the module "my.app":

dojo.registerModulePath("my.app", "../../my/app");

Note: Here, the "../../my/app" path is relative to the directory that contains "dojo.js".

Then you can use the dojo.requireLocalization function to load resource bundles from files. After a resource bundle is loaded, the dojo.i18n.getLocalization function returns a copy of the bundle object.

When you get the bundle object, you can use it as a normal JSON object (a hash) to get messages. If you modify values in the bundle object, the original global bundle object will not be affected.

You may use djConfig.locale to set the default locale and extra locales, and use only dojo.requireLocalization without the locale parameter.

djConfig.locale overrides the browser's default locale as specified by the navigator JavaScript object. This setting is effective for the entire page and must be declared prior to loading dojo.js. djConfig.extraLocale establishes additional locales whose resource bundles will be made available. It is rarely used, and serves to accommodate multiple languages on a single page. No other locales may be used on the page.

If you omit the locale parameter when calling the dojo.requireLocalization function, the function will load the resource bundles for locales in djConfig.locale as well as for all the locales in djConfig.extraLocale.

For example, if you define:

var djConfig = { locale: "zh-cn", extraLocale: ["zh-tw", "fr"] };

then the following two code blocks are equivalent:

Code block A:

dojo.requireLocalization("my.app", "bar");

var bar = dojo.i18n.getLocalization("my.app", "bar");

Code block B:

dojo.requireLocalization("my.app", "bar", "zh-cn"); // default locale
dojo.requireLocalization("my.app", "bar", "zh-tw"); // extra locale
dojo.requireLocalization("my.app", "bar", "fr");    // extra locale

var bar = dojo.i18n.getLocalization("my.app", "bar", "zh-cn"); // default locale

The first method is preferred as it is less brittle.



Builds

Before you deploy your Web application using Dojo, you should consider building the Dojo layers that are used by your application into a single JavaScript file. Using such a build brings you many advantages. The unused scripts, white spaces, comments, and overridden string values can be removed to make smaller downloads, and the need to search by locale can be skipped such that extra server requests and 404 responses are avoided. In general, the build reduces the request time from the browser to the server to avoid latency issues.

You should make a build to include resource bundles in the locales that you use.

Resource bundles can either be included in a build or be used without a build. If you use resource bundles without a build, the first request for each resource bundle will generate N+1 HTTP requests when it searches the server for values, where N is the number of segments in the target locale. For example, a call of dojo.requireLocalization("my.app", "bar") in the "zh-cn" locale looks for "bar.js" first in the "zh-cn", then in "zh", and finally in the root. Without optimization, some of these requests might result in harmless HTTP 404 errors (page not found) if a variant does not need to override any definitions from its parent.
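
The N+1 requests mentioned above can be made concrete by listing the files that may be fetched for one bundle. This is only a sketch: the function name bundleCandidates is hypothetical, and the paths simply follow the nls/ directory layout shown earlier rather than Dojo's actual loader logic.

```javascript
// Hypothetical helper: lists the files that may be requested for a bundle,
// most specific locale first.
function bundleCandidates(moduleDir, bundleName, locale) {
    var parts = locale.split("-");
    var paths = [];
    for (var n = parts.length; n > 0; n--) {
        paths.push(moduleDir + "/nls/" + parts.slice(0, n).join("-") + "/" + bundleName + ".js");
    }
    paths.push(moduleDir + "/nls/" + bundleName + ".js"); // root bundle
    return paths;
}

var requests = bundleCandidates("my/app", "bar", "zh-cn");
// requests holds 3 paths (N + 1 with N = 2):
//   my/app/nls/zh-cn/bar.js, my/app/nls/zh/bar.js, my/app/nls/bar.js
```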



Translation

JSON is a convenient and efficient format for resource bundles in JavaScript, but the JSON format is not well supported by many professional translation centers. XLIFF is the industry standard file format for localization and translation. Among other things, XLIFF makes it easier to declare the encoding and hides details such as JavaScript character escapes from the translator. Tools will be developed to support round-trip transforms between JSON and XLIFF. Support for gettext PO files is also possible in the future.

Translators must also be aware of the substitution syntax of Dojo: ${x}.
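
Because a translator can accidentally drop or mistype a ${x} placeholder, it can help to compare the placeholders of a translated string with those of the original. The sketch below is one possible check; the function names are hypothetical and the regular expression only covers the ${name} and ${name:format} forms used in this document.

```javascript
// Collects placeholder names such as "0" or "item" from "${0}" or "${item:fmt}".
function placeholders(template) {
    var names = [];
    var pattern = /\$\{([^\s:}]+)(?::[^\s:}]+)?\}/g;
    var match;
    while ((match = pattern.exec(template)) !== null) {
        names.push(match[1]);
    }
    return names.sort();
}

// True when the translation uses exactly the same placeholders as the source,
// in any order.
function samePlaceholders(source, translation) {
    return placeholders(source).join(",") === placeholders(translation).join(",");
}
```

The comparison ignores order on purpose, since a translation may legitimately move placeholders around.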

String Manipulation

Although the string data in JavaScript is specified in UTF-16 internally, JavaScript does not provide enough manipulation functions for Unicode strings. It is not practicable to develop a full Unicode compatible JavaScript library like ICU, since some Unicode algorithms, like normalization and collation, are too complicated to run in a script language. This chapter shows what you can do and what you should not do with strings in JavaScript. If you need full support for Unicode compatible string manipulations, you must use ICU library (ICU4C or ICU4J) at the server side instead of in JavaScript.

I18N Guidelines for String Manipulation

You should use the Js2Xlf tool to convert JSON files into XLIFF files for translation.
You should deal with free text using ICU library at the server side.
You should use only casing functions for locale neutral situation.
You should not use locale sensitive casing functions provided by JavaScript.
You should always escape a string as a whole rather than character by character.
You should not use any comparing, searching, or replacing functions for strings that might contain combining character sequences.
You should not use inserting, removing, or splitting functions for strings that might contain special characters.
You should not use trimming functions for strings that might contain special characters.
You should not use counting functions for strings that might contain special characters.
You must check the return value of String.charAt() when the string contains surrogates.

Special Kinds of Characters

Some special characters in Unicode can probably cause problems in a string to be manipulated:

Combining Character Sequence: a combining character sequence starts with a base character followed by one or more combining marks. For example, "A" followed by U+0300 (a combining grave accent) is displayed as "À". A combining character sequence should be treated as one character.
Surrogate Character: surrogate characters must appear in pairs. For example, the pair "U+D840 U+DC00" represents an ancient Chinese character (U+20000). A surrogate pair must be treated as one character.

Basic String Manipulations

There are several basic string manipulations. Some of them are locale-sensitive, which means that they might cause different results in different locales:
Transformation: to transform a string into a specified form, for example, casing and escaping. Casing is a locale-sensitive operation, because some characters might have different casing results in different locales. For example, the Latin letter "i" is uppercased to "İ" (a dotted "I") in the Turkish language.
Recognition: to recognize special characters in strings, for example, white-space characters.
Join: to concatenate two strings into one.
Division: to separate one string into two substrings from a certain position.
Enumeration: to count characters in a string, for example, getting the string's length.
Comparison: to compare one string with another, for example, searching and sorting. Comparison is locale-sensitive.

Possible Problems

The following table shows the possible problems that might occur when you perform a specific manipulation on strings that contain special characters:

                 Ordinary Character              Combining Character Sequence              Surrogate Character
Transformation   Locale Sensitive (Unsolvable)   Locale Sensitive (Unsolvable)             Locale Sensitive (Unsolvable)
Recognition      Unicode Specific (Unsolvable)   Unicode Specific (Unsolvable)             Unicode Specific (Unsolvable)
Join             (No Problem)                    (No Problem)                              (No Problem)
Division         (No Problem)                    Broken Combining Character (Unsolvable)   Invalid Code Point (Solvable)
Enumeration      (No Problem)                    Wrong Character Number (Unsolvable)       Wrong Character Number (Solvable)
Comparison       (No Problem)                    No Canonical Comparison (Unsolvable)      (No Problem)

Locale Sensitive: JavaScript has a pair of locale sensitive casing functions -- String.toLocaleUpperCase and String.toLocaleLowerCase, but they only honor the default locale of the client's operating system and cannot be controlled by Web applications.

Unicode Specific: JavaScript might not recognize all Unicode characters that have a specific property. For example, some white-space characters (e.g., the no-break space -- U+00A0) cannot be removed when the string is trimmed in JavaScript:

alert("\u00A0".replace(/\s/g, "").length); // 0 (in Firefox), 1 (in IE)

Broken Combining Character: a combining character sequence might be broken into meaningless characters. For example, when you replace every "A" in a string with "B", every "À" stored as a combining sequence becomes "B̀", which is obviously not what you want. This kind of problem cannot be solved in JavaScript.
Invalid Code Point: a pair of surrogates can be separated into unpaired surrogates, which are invalid code points in Unicode. This can be a very serious problem, because invalid code points might crash some browsers (Safari 2.0.3 stops responding when you try to select such invalid characters). Fortunately, unlike the other kinds of problems, this one can be solved in JavaScript, because the surrogate range is simple to check (from U+D800 to U+DFFF). You must make sure that no unpaired surrogates appear in the result string.
Wrong Character Number: because of combining character sequences and surrogate characters, one meaningful character can consist of several JavaScript characters, each of which is a 16-bit integer. You want the number of meaningful characters, but you only get the number of 16-bit integers. This kind of problem cannot be solved in JavaScript, unless it is caused by surrogate characters.
No Canonical Comparison: JavaScript cannot perform canonical comparison on strings that are in different normalization forms. Although the ECMAScript Specification v3 recommends that canonical comparison be supported by the String.localeCompare(str) function, no browser actually implements this feature.

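
The surrogate half of the "Wrong Character Number" problem is the part that can be handled in script. A minimal sketch, assuming a hypothetical helper named countCodePoints, counts a well-formed surrogate pair as one character. It does not address combining character sequences, which remain two or more units.

```javascript
// Hypothetical helper: counts Unicode code points instead of UTF-16 units.
function countCodePoints(s) {
    var count = 0;
    for (var i = 0; i < s.length; i++) {
        var code = s.charCodeAt(i);
        // A high (lead) surrogate followed by a low (trail) surrogate is one code point.
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < s.length) {
            var next = s.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                i++; // skip the trail surrogate
            }
        }
        count++;
    }
    return count;
}

var units = "a\uD840\uDC00b".length;            // 4 UTF-16 units
var points = countCodePoints("a\uD840\uDC00b"); // 3 code points
```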
You should deal with free text using ICU library at the server side.

The only safe operation for all strings is joining. To manipulate a string, first you need to know what kinds of characters are allowed in it and whether the manipulation is locale sensitive. If the string accepts combining character sequences and surrogate characters, or if the manipulation is locale sensitive like casing, it is recommended that you send the string to the server and deal with it by using the ICU library at the server side. For example, your application might need to handle the following kinds of strings:

Restricted Identifier: most identifiers, such as the user's login name, are restricted to letters and numbers, so you can manipulate them in JavaScript freely and safely.
Program Information: program information is a string that is only used by your application code, for example, enumeration item names, program commands, and statements. Make sure that no combining character sequence or surrogate character is included in your program information. Then you can manipulate it without potential problems.
URL or E-mail Address: URLs and e-mail addresses might contain almost any Unicode characters. You should deal with them at the server side.
Free Input Text: the text entered freely by users should be manipulated at the server side.

String Manipulating Functions in JavaScript and Dojo

You should use only casing functions for locale neutral situation.

Some casing functions perform only locale neutral casing conversion. You should use them only for program information that is not displayed to end users directly. These functions include:

  • String.toLowerCase()
  • String.toUpperCase()

You should not use locale sensitive casing functions provided by JavaScript.

Other casing functions perform locale sensitive casing conversion with the default locale of the client's operating system. You should never use them, for two reasons. First, browsers are not guaranteed to implement these functions exactly as defined in the latest Unicode Standard. Second, your Web application cannot control the output: it is hardly acceptable that an end user must change the default locale of the operating system in order to change the locale of your Web application. These functions include:

  • String.toLocaleLowerCase()
  • String.toLocaleUpperCase()

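You should always escape a string as a whole rather than character by character.

Escaping a string one character at a time can split a surrogate pair. The escaping functions include:
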
  • window.escape()
  • window.unescape()
  • window.encodeURI()
  • window.decodeURI()
  • window.encodeURIComponent()
  • window.decodeURIComponent()
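
The difference between escaping a whole string and escaping it piece by piece shows up as soon as a surrogate pair is involved. In the sketch below (variable names are illustrative only), the whole string encodes cleanly, while encoding the first UTF-16 unit alone raises a URIError because an unpaired surrogate cannot be converted to UTF-8.

```javascript
var text = "\uD840\uDC00"; // one character (U+20000) stored as a surrogate pair

// Escaping the whole string works.
var whole = encodeURIComponent(text); // "%F0%A0%80%80"

// Escaping each UTF-16 unit separately fails on the unpaired surrogate.
var failed = false;
try {
    encodeURIComponent(text.charAt(0));
} catch (e) {
    failed = e instanceof URIError;
}
// failed is now true
```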

    You should not use any comparing, searching, or replacing functions for strings that might contain combining character sequences.

    Functions in this section might encounter problems with comparison and division when applied to free text. They might replace the wrong characters or return invalid strings or characters, and they cannot perform canonical comparison. Use them only for program information without special characters. These functions include:

    • String.indexOf()
    • String.lastIndexOf()
    • String.localeCompare()
    • String.match()
    • String.replace()
    • String.search()
    • dojo.string.substitute()
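
A short demonstration of why these functions are risky with combining character sequences: the two strings below are displayed identically as "À", yet they compare as different, and a search or replace for "A" matches inside the decomposed form. The variable names are chosen only for this example.

```javascript
var precomposed = "\u00C0"; // "À" as a single code point
var decomposed = "A\u0300"; // "A" followed by COMBINING GRAVE ACCENT

var equal = (precomposed === decomposed);     // false: no canonical comparison
var found = decomposed.indexOf("A");          // 0: the base letter is matched on its own
var replaced = decomposed.replace(/A/g, "B"); // "B\u0300": the accent now sits on "B"
var untouched = precomposed.replace(/A/g, "B"); // "\u00C0": nothing is replaced
```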

    You should not use inserting, removing, or splitting functions for strings that might contain special characters.

    Functions in this section might have problems with division when applied to free text. They might return invalid strings or characters. Use them only for program information without special characters. These functions include:

    • String.slice()
    • String.split()
    • String.substring()
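
For surrogate pairs, the division problem can be avoided by checking the cut position before calling substring. The sketch below assumes a hypothetical helper named safePrefix; it only protects surrogate pairs and does nothing for combining character sequences.

```javascript
// Hypothetical helper: like s.substring(0, end), but never ends between
// the two halves of a surrogate pair.
function safePrefix(s, end) {
    if (end > 0 && end < s.length) {
        var last = s.charCodeAt(end - 1);
        var next = s.charCodeAt(end);
        if (last >= 0xD800 && last <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
            end--; // back up so the whole pair is left out
        }
    }
    return s.substring(0, end);
}

var cut = safePrefix("a\uD840\uDC00b", 2); // "a", not "a\uD840"
```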

    You should not use trimming functions for strings that might contain special characters.

    Trimming functions might have the problem of Recognition: they cannot recognize all white-space characters defined by Unicode. Use them only for program information without special characters. These functions include:

    • dojo.string.trim()
    • dojo.trim()

    You should not use counting functions for strings that might contain special characters.

    Functions for counting characters can only return the number of UTF-16 units rather than the real meaningful characters. These functions include:

    • String.length
    • dojo.string.pad()

    You must check the return value of String.charAt() when the string contains surrogates.

    The String.charAt() function might return an unpaired surrogate, which is invalid. You should always check the returned value, for example:

    function codePointAt(s, pos) {
        var c = s.charAt(pos);
        if (c == "") {
            return c; // pos is out of range
        }
        var code = c.charCodeAt(0);
        if (code >= 0xD800 && code <= 0xDBFF) {
            // high (lead) surrogate detected, so the next unit must be a low surrogate
            if (pos < s.length - 1) {
                code = s.charCodeAt(pos + 1);
                return code < 0xDC00 || code > 0xDFFF ? "" : s.substring(pos, pos + 2);
            } else {
                return ""; // error in the string
            }
        } else if (code >= 0xDC00 && code <= 0xDFFF) {
            // low (trail) surrogate detected, so the previous unit must be a high surrogate
            if (pos > 0) {
                code = s.charCodeAt(pos - 1);
                return code < 0xD800 || code > 0xDBFF ? "" : s.substring(pos - 1, pos + 1);
            } else {
                return ""; // error in the string
            }
        }
        return c; // ordinary character outside the surrogate range
    }

    Formatting and Validation

    Dojo can format messages in much the same way as Java. Message formatting functions are very important to the translatability of an application: they separate code from translatable material and allow an application to be translated without any modification to the code. Furthermore, Dojo provides a full set of functions for data formatting and validation based on Unicode CLDR data, just as the ICU library does. You can easily format your output and validate users' input in Dojo without invoking code at the server side.

    You must use dojo.string.substitute() to generate text output rather than simply use "+" between strings.
    You must use Dojo format functions to convert locale sensitive data into text.
    You must use Dojo validating and parsing functions to convert text from the users' input into data.
    You should not hard-code patterns and locales when formatting data.

    You must use dojo.string.substitute() to generate text output rather than simply use "+" between strings.

    You can use dojo.string.substitute() to substitute placeholders in the message string. This function is similar to the java.text.MessageFormat.format() function in Java. You must use this function to get your text output instead of simply using the "+" operator; otherwise your application loses its translatability. For example, an English message "File 'foo.txt' is not found in directory '/root/bar'." can be translated as "在'/root/bar'目录中找不到'foo.txt'文件。" in Chinese. If you write your code like msg = pieceA + fileName + pieceB + dirName + pieceC, you cannot translate this message into Chinese without modifying your code, because the positions of fileName and dirName are swapped. Therefore, the right approach is to write the code like this:

    dojo.require("dojo.i18n");
    dojo.require("dojo.string");
    dojo.requireLocalization("my.app", "message");
    var message = dojo.i18n.getLocalization("my.app", "message");
    var msg = dojo.string.substitute(message.FILE_NOT_FOUND_IN_DIR, ["foo.txt", "/root/bar"]);

    And the resource bundle "message.js" is like this:

    { 
        FILE_NOT_FOUND_IN_DIR: "File '${0}' is not found in directory '${1}'."
    }
    Now you can translate the message into Chinese by only providing a translated resource bundle in the locale "zh-cn": /* GeSHi (C) 2004 - 2007 Nigel McNie (http://qbnz.com/highlighter) */ .geshifilter {font-family: monospace;} .geshifilter .imp {font-weight: bold; color: red;} .geshifilter .kw1 {color: #000066; font-weight: bold;} .geshifilter .kw2 {color: #003366; font-weight: bold;} .geshifilter .kw3 {color: #000066;} .geshifilter .co1 {color: #009900; font-style: italic;} .geshifilter .coMULTI {color: #009900; font-style: italic;} .geshifilter .es0 {color: #000099; font-weight: bold;} .geshifilter .br0 {color: #66cc66;} .geshifilter .st0 {color: #3366CC;} .geshifilter .nu0 {color: #CC0000;} .geshifilter .me1 {color: #006600;} .geshifilter .re0 {color: #0066FF;}
    { 
        FILE_NOT_FOUND_IN_DIR: "在'${1}'目录下未找到'${0}'文件。"
    }
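
    The effect of reordering placeholders can be tried outside Dojo with a small stand-in. The following is a minimal sketch, not Dojo's implementation: substituteSketch is a hypothetical helper that handles only plain ${key} placeholders, the bundles object is a hypothetical in-memory copy of the two resource bundles, and the Chinese string is an assumed translation used for illustration.

```javascript
// Hypothetical stand-in for dojo.string.substitute(); handles only ${key}.
function substituteSketch(template, values) {
    return template.replace(/\$\{([^}:\s]+)\}/g, function (match, key) {
        if (!(key in values)) {
            throw new Error("No value for placeholder " + match);
        }
        return String(values[key]);
    });
}

// Hypothetical in-memory copies of the root and zh-cn bundles.
var bundles = {
    "en": { FILE_NOT_FOUND_IN_DIR: "File '${0}' is not found in directory '${1}'." },
    "zh-cn": { FILE_NOT_FOUND_IN_DIR: "在'${1}'目录下未找到'${0}'文件。" }
};

// The calling code passes the same arguments for every locale.
var args = ["foo.txt", "/root/bar"];
var english = substituteSketch(bundles["en"].FILE_NOT_FOUND_IN_DIR, args);
var chinese = substituteSketch(bundles["zh-cn"].FILE_NOT_FOUND_IN_DIR, args);

console.log(english); // File 'foo.txt' is not found in directory '/root/bar'.
console.log(chinese); // 在'/root/bar'目录下未找到'foo.txt'文件。
```

    Only the template differs between locales; the argument order in the code stays fixed, which is what string concatenation cannot offer.
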
    Here are more examples of using dojo.string.substitute():
    dojo.require("dojo.string");
    dojo.require("dojo.number");
    // Use a format function.
    // "dojo.number.format" is a format function defined by Dojo.
    // It uses the default locale in Dojo, as defined by the user's environment.
    console.debug(dojo.string.substitute(
        "The number of '${0}' is '${1:dojo.number.format}'.", ["saved files", 123456]));
    // Output in an English locale: The number of 'saved files' is '123,456'.

    // Use named substitutions.
    console.debug(dojo.string.substitute(
        "The number of '${item}' is '${number:dojo.number.format}'.",
        {item: "saved files", number: 123456}));
    // Output in an English locale: The number of 'saved files' is '123,456'.

    Cultural Formatting and Validation

    You must use Dojo format functions to convert locale sensitive data into text

    Dojo uses the Unicode CLDR data to format and validate locale-sensitive information, such as times, numbers, and currencies. The built-in conversion functions in JavaScript, such as Date.toString(), are not fully Unicode-compatible, and you should not use them in a globalized Web application. For dates and numbers, you do not need to call the Dojo format functions explicitly in your code: the dojo.string.substitute() function described above can apply a format function to each value. You only need to specify the format function name in the template string, for example:
    dojo.require("dojo.string");
    dojo.require("dojo.number");
    dojo.require("dojo.date.locale");
    var msg = "Number of files processed before ${1:dojo.date.locale.format}: "
        + "${0:dojo.number.format}";
    console.debug(dojo.string.substitute(msg, [123456, new Date()]));
    If you want more control over the output format and locale, you can call the specific format functions directly. For details, refer to the API documentation of the following functions:
    • dojo.number.format
    • dojo.date.locale.format

    You must use Dojo validating and parsing functions to convert text from the users' input into data

    Although JavaScript provides some methods to convert text to data, they do not fully support localized text. For example:
    console.debug(Number("123456")); // output: 123456
    console.debug(Number("123,456")); // output: NaN
    You must use Dojo's functions to handle validation and parsing. For details, refer to the API documents of the following functions:
    • dojo.number.regexp
    • dojo.number.parse
    • dojo.date.locale.regexp
    • dojo.date.locale.parse
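
    To see why parsing has to know the locale, consider the following minimal sketch. It is a standalone illustration, not the Dojo parser: parseGroupedNumber is a hypothetical helper, and the separators are passed in explicitly here, whereas Dojo derives them from its locale data. The sketch also skips checks such as group length that a real parser would perform.

```javascript
// Hypothetical helper: parse grouped decimal text using explicit separators.
// It does not check that groups are three digits long.
function parseGroupedNumber(text, separators) {
    var normalized = text
        .split(separators.group).join("")   // drop grouping separators
        .replace(separators.decimal, ".");  // use "." as the decimal point
    return /^[+-]?\d+(\.\d+)?$/.test(normalized) ? Number(normalized) : NaN;
}

// Separators used in English and German number formats.
var enSeparators = { group: ",", decimal: "." };
var deSeparators = { group: ".", decimal: "," };

var builtIn = Number("123,456");
var enValue = parseGroupedNumber("123,456.78", enSeparators);
var deValue = parseGroupedNumber("123.456,78", deSeparators);
var deAmbiguous = parseGroupedNumber("123,456", deSeparators);

console.log(builtIn);     // NaN
console.log(enValue);     // 123456.78
console.log(deValue);     // 123456.78
console.log(deAmbiguous); // 123.456
```

    The last call shows that the same text, "123,456", stands for a different number depending on the locale, so the text alone is not enough to parse it.
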

    You should not hard-code patterns and locales when formatting data

    Usually, you only need to set the selector and formatLength properties for dates (or the type property for numbers and the currency property for currencies), and use the default values of the other properties when calling formatting, validating, or parsing functions. Dojo then looks up the correct pattern for the current default locale. If you hard-code the pattern, the output format cannot follow the user's locale.
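
    The idea of a per-locale pattern lookup can be pictured with a small standalone sketch. It does not use Dojo or the CLDR data: shortDatePatterns, pad2, and formatShortDate are hypothetical names, and the two patterns are illustrative values assumed for this example. The caller passes only a locale, never a pattern:

```javascript
// Illustrative short date patterns per locale (assumed values for this sketch).
var shortDatePatterns = {
    "en-us": "M/d/yy",
    "de-de": "dd.MM.yy"
};

// Left-pad a non-negative integer to two digits.
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}

// Hypothetical helper: format a date with the pattern looked up for the locale.
function formatShortDate(date, locale) {
    var pattern = shortDatePatterns[locale];
    if (!pattern) {
        throw new Error("No pattern for locale " + locale);
    }
    var fields = {
        "dd": pad2(date.getDate()),
        "d": String(date.getDate()),
        "MM": pad2(date.getMonth() + 1),
        "M": String(date.getMonth() + 1),
        "yy": pad2(date.getFullYear() % 100)
    };
    // Longer tokens come first so "dd" is not read as two "d" tokens.
    return pattern.replace(/dd|d|MM|M|yy/g, function (token) {
        return fields[token];
    });
}

var date = new Date(2008, 0, 5); // 5 January 2008, local time
var usText = formatShortDate(date, "en-us");
var deText = formatShortDate(date, "de-de");

console.log(usText); // 1/5/08
console.log(deText); // 05.01.08
```

    A hard-coded pattern in the calling code would fix the output to one of these forms for every user.
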

    Generating files for cultural support in other locales

    Dojo uses the Unicode Common Locale Data Repository (CLDR) to format and parse locale-specific data. A subset of the locales supported by the CLDR is transformed to JavaScript and checked into the dojo.cldr package as resource bundles. Although only a handful of locales ship in dojo.cldr, any or all of the hundreds of locales supported by the CLDR may be generated easily by invoking a script. Dojo provides an Ant task to perform the transformation from LDML, an XML-based markup, to JSON files. The entire CLDR source is available with these tools in the util/buildscripts/cldr directory. To invoke this script, you need to check out the current Dojo source code or download the buildscripts archive file and overlay it with your Dojo build, and consult the README file in util/buildscripts/cldr for details. It is as simple as invoking 'ant' in that directory. After the script is executed, all transformed JSON files are added to the "dojo/cldr/nls" directory.

    Dijit Widgets

    All Dijit Widgets are implemented according to these Globalization guidelines except for right-to-left bi-directional (BiDi) support, which is not complete as of the 0.9 release.

    You should not specify both the height and the width of a widget to be translated by numeric units.
    You must ensure that all resources used in widgets are localizable.
    You should consider BiDi support in development and customization.

    Translation Spread

    You should not specify both the height and the width of a widget to be translated by numeric units

    When the text content in a widget is translated into another language, the translated string might be longer than the original one, i.e., the one in English. If you specify the size of the widget by numeric units, the widget may not render properly in other languages. For example, some text might be truncated or unexpected text wrapping may distort the layout. Therefore, keeping the translation spread in mind at the beginning of development might be a better practice. You can use the following approaches to set the size of a widget:

    • Use percentage rather than numeric units.
    • Set only one of the dimensions (height or width) in numeric and leave the other as auto. You can also use the white-space style to control the text-wrapping style. This can make the widget expand on the right.

    Using BiDi Support

    Eventually, all Dijit widgets will be consistent with the direction of the HTML document. [TODO: the following text needs to be changed; I think we're only checking for the .dijitRtl class which is based on the document's direction -AP] By default, the display direction is inherited from the parent node, and the direction of the whole page is left-to-right. There are two ways to change the display direction for an HTML node: one is to set the DIR attribute, and the other is to set the direction style. The direction style can override the setting to the DIR attribute, and the computed value of the direction style is the actual way in which the node is displayed. To make the pop-up menu mirrored, for example, you can just set DIR="rtl" on the HTML node to mirror the whole page.

    If a page contains a right-to-left layout, you must also import the RTL style sheet of the current Dojo theme and make sure that it is the last one to be imported. The RTL style sheet has no effect on the left-to-right layout.

    <style>
    @import "../../../dojo/resources/dojo.css";
    @import "../../themes/tundra/tundra.css";
    @import "../../themes/tundra/tundra_rtl.css";
    </style>

    If the whole page is in a right-to-left layout, you only need to import the RTL style sheet. To make the whole page right-to-left, you can set DIR="rtl" on either the HTML node or the BODY node.

    If you want to mix both left-to-right and right-to-left layouts in a page, you must add a dijitRtl class to each right-to-left layout container besides importing the RTL style sheet.

    <div dir="rtl" class="dijitRtl">
        <div dojoType="dijit.Tree">
        ...
        </div>
    </div>

    Using Localized Widgets

    Building on the CLDR support in the Dojo core library, Dijit provides a number of localized widgets for processing date, number, and currency data. You can find their introduction and manual in the corresponding chapters of the Dojo Book 0.9. These widgets include:

    • dijit.form.NumberTextBox
    • dijit.form.CurrencyTextBox
    • dijit.form.TimeTextBox
    • dijit.form.DateTextBox

    Widgets Development and Customization

    You must ensure that all resources used in widgets are localizable.

    The resources that are used in widgets include texts, images, audio and video clips, etc. Only HTML code and style names can be hard-coded in the widget code. You can use the resource bundles in Dojo to store localized resources. See "Locale and Resource Bundle" for more information.

    You should consider BiDi support in development and customization.

    Most browsers and HTML itself support mirrored display in a BiDi environment, but this does not mean that Dijit widgets support BiDi automatically. Dijit still contains specific code to handle mirroring. For example, when displayed in the right-to-left direction, a popup menu should appear from the left rather than from the right, and the arrow that indicates submenus should also be flipped.

    [TODO: strike this para? I think dir is gone? -AP]
    Dijit widgets have a dir attribute that is defined in the dijit._Widget class, the base class of all widgets. The dir attribute can be set to indicate the direction of the widget when it is being created. To know the actual direction in the code, you should use the _Widget.isLeftToRight() function. The returned value of this function is not supposed to be changed after the widget is created. That is, currently, Dijit widgets do not support dynamic change of the display direction.

    To develop or customize a Dijit widget with BiDi enabled, you should use different styles for the left-to-right and right-to-left directions. For example, styles like float: left and background-image: url(images/left-arrow.png) must be changed for the right-to-left layout in some portions of a widget. You can write all right-to-left styles in an RTL style sheet, and every selector in it should begin with the dijitRtl class:

    .dijitRtl .TreeContainer .TreeExpando {
        float: right;
    }

    Better still, try to avoid any gestures or visual cues which imply left or right direction, so that styling and coding for BiDi becomes unnecessary. For example, dijit.Tree uses the space key to open a tree instead of a left or right arrow. A "+" symbol might be used for an indicator for extended data, rather than a left or right arrow.