SXSW2008 notes – Client-Side Code and Internationalization

Monday, 10 March 2008 – 3:30PM
Abstract:
This presentation will cover tips, techniques, best practices, and gotchas for designers working with XHTML, CSS, and JavaScript in multiple languages. Special attention will be given to right-to-left and bidirectional content. XHTML + CSS + JS + UTF8 + LTR + RTL = client-side i18n fun

Jon Wiley - User Experience Designer, Google

Google is localized into 117 languages.

Does internationalization = translation? No.

Internationalization is enabling your product for localization. Localization is the adaptation of a product's cultural content (including language).

globalization: carries too much baggage to be precise in this context.

Translation is not transliteration.

Localization is more than translation
- local content
- legal compliance
- marketing is culturally dependent
- keyboards
- currency formats
- date formats
- cultural appropriateness

This session will not focus on those - only markup.

Character encoding
- In the beginning, there was ASCII, and it was limited to the English character set.
- Unicode attempts to bring the character sets of all languages together.
- UTF-8 is what we want to use (vs. UTF-16 or UTF-32) because it is backwards compatible with ASCII: ASCII is a subset of UTF-8.
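
The ASCII compatibility can be checked directly. This is a minimal sketch, assuming a runtime that provides the standard TextEncoder (modern browsers and Node.js); the helper name utf8Bytes is made up for illustration.

```javascript
// Encode a string as UTF-8 and return the bytes as a plain array.
// utf8Bytes is a hypothetical helper name, not from the talk.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// For pure ASCII text, every UTF-8 byte equals the ASCII code of the character.
const asciiText = "i18n";
const asciiCodes = Array.from(asciiText, (ch) => ch.charCodeAt(0));
const sameAsAscii =
  JSON.stringify(utf8Bytes(asciiText)) === JSON.stringify(asciiCodes);

// A non-ASCII character needs more than one byte.
// "\u05D0" is the Hebrew letter alef.
const alefBytes = utf8Bytes("\u05D0");

console.log(sameAsAscii, alefBytes.length);
```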

Unicode has many thousands of characters, so no single font has all of them. Test everything.

Possible to present a mix of scripts at one time.

Another advantage of UTF-8 is that it is smaller than UTF-16 except for CJK (Chinese-Japanese-Korean) languages.
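
A rough way to compare the sizes yourself. This is a sketch rather than anything shown in the session: the sample strings and function names are my own, and it counts text only (real pages also carry ASCII markup, which favors UTF-8 further). Note that for scripts such as Hebrew the two encodings come out about even on raw text.

```javascript
// Byte count of a string when encoded as UTF-8.
function utf8Size(text) {
  return new TextEncoder().encode(text).length;
}

// JavaScript strings are sequences of UTF-16 code units, two bytes each.
function utf16Size(text) {
  return text.length * 2;
}

// Sample strings chosen for illustration only.
const samples = {
  english: "hello",
  hebrew: "\u05E9\u05DC\u05D5\u05DD", // "shalom"
  japanese: "\u3053\u3093\u306B\u3061\u306F", // "konnichiwa"
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name, "UTF-8:", utf8Size(text), "UTF-16:", utf16Size(text));
}
```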

No need to use character escapes, since special characters can be typed directly in UTF-8, except for:
- reserved characters: > < &
- hard-to-see characters (such as a non-breaking space)
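
A minimal sketch of escaping only what needs it. The function name is hypothetical, and treating the non-breaking space as the "hard to see" character is my assumption; everything else, including non-Latin text, passes through unchanged.

```javascript
// Escape only the reserved characters, plus one hard-to-see character.
// escapeReserved is a hypothetical name; handling U+00A0 is an assumption.
function escapeReserved(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later escapes are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\u00A0/g, "&nbsp;"); // non-breaking space made visible in source
}

console.log(escapeReserved("a < b && c > d"));
console.log(escapeReserved("\u05E9\u05DC\u05D5\u05DD")); // Hebrew stays as typed
```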

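Tell the browser which encoding to use in the HTTP Content-Type header, in a meta element, or (for stylesheets) with @charset in the CSS. When the HTTP header and the meta element disagree, browsers give priority to the HTTP header, so keep the two in agreement.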

Specifying a language
You really want to declare the right language, since screen readers and other tools need to know how to pronounce the words.
Specifying a lang does NOTHING to specify encoding or direction.
<html lang="en" xml:lang="en"...
Again, a meta element or the response header can be used, but the attribute on the html element is best.
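
Because lang says nothing about direction, a template has to set both. The following is a sketch under assumptions: the function name and the short list of right-to-left languages are mine and deliberately incomplete.

```javascript
// Small, deliberately incomplete list of languages written right to left.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

// Build the opening html tag with language and direction set separately.
// htmlOpenTag is a hypothetical helper for illustration.
function htmlOpenTag(lang) {
  const primary = lang.toLowerCase().split("-")[0]; // "en-US" -> "en"
  const dir = RTL_LANGUAGES.has(primary) ? "rtl" : "ltr";
  return `<html lang="${lang}" xml:lang="${lang}" dir="${dir}">`;
}

console.log(htmlOpenTag("en"));
console.log(htmlOpenTag("he"));
```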

Direction:
LTR = left to right text
RTL = right to left
bidi = bidirectional (e.g., Hebrew and English in the same document)

Logical order in the source

Visual Hebrew: literally coded backwards

Scripts have a default direction, which need not be specified.
Markup is LTR; numbers are always LTR.
Spaces and punctuation are directionally neutral and take on the direction of the surrounding script. Exception: punctuation between two scripts defaults to the document's direction. This can be handled in markup.
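
One way to handle the punctuation case in markup is to wrap a run of text in an element that carries its own dir, so trailing punctuation stays with its script. This sketch guesses the direction from the first letter in the string. It is a simplification of what browsers actually do, covers only Hebrew and Arabic, and uses function names I made up. It assumes a JavaScript engine with Unicode property escapes.

```javascript
// Letters from scripts written right to left (only two scripts, by assumption).
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;
const ANY_LETTER = /\p{L}/u;

// Return "rtl" or "ltr" based on the first letter found, or null when the
// text has no letters (digits, spaces, and punctuation are skipped).
function firstStrongDirection(text) {
  for (const ch of text) {
    if (!ANY_LETTER.test(ch)) continue;
    return RTL_LETTER.test(ch) ? "rtl" : "ltr";
  }
  return null;
}

// Wrap the text in a span with an explicit dir so that neutral characters
// at its edges inherit the right direction. Text is assumed already escaped.
function wrapWithDirection(text) {
  const dir = firstStrongDirection(text);
  return dir === null ? text : `<span dir="${dir}">${text}</span>`;
}

console.log(wrapWithDirection("\u05E9\u05DC\u05D5\u05DD!")); // Hebrew word plus "!"
```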

Avoid changing direction in CSS because direction is not primarily a layout aspect.

Text expansion
English is a compact language. Small words from English can easily expand to 200%-300% of their length in other languages. This hits the hardest in nav tabs and areas designed to be snug.
Use 40% as a rule of thumb.
Some languages join words into long compounds without spaces (e.g., German), and this causes word-wrap issues.
Some scripts have wider or taller characters.

Watch out for abbreviations
They are not as common in other languages.

Tools
Google translation service (translate.google.com) can be used for machine checks for text expansion. Do not use this for production because you will get crap.
CSS Janus - script for flipping CSS-based layouts. Table-based layouts flip on their own when the direction changes, but CSS-based layouts are harder to mirror. http://cssjanus.commoner.com/
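
To give a feel for what flipping a stylesheet involves, here is a toy version of the idea. It is not CSS Janus's actual implementation, only a sketch that swaps the words left and right. It ignores four-value shorthands, background positions, URLs, and comments, all of which a real tool has to handle.

```javascript
// Swap every standalone "left" and "right" in a CSS string.
// flipCss is a hypothetical name; this is a toy, not how CSS Janus works.
function flipCss(css) {
  return css.replace(/\b(left|right)\b/g, (word) =>
    word === "left" ? "right" : "left"
  );
}

console.log(flipCss("float: left; margin-right: 10px;"));
```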

Javascript
Embedded text will render as written regardless of direction.

IE actually flips the scrollbar to the other side. FF does not.

There needs to be a bidi acid test.