Pango: Text handling for the World

July 27, 2001
Forrest Cook

Owen Taylor of Red Hat Labs gave a talk on the Pango library. In brief, Pango provides international language support for GTK+, with textual display and editing functions for a large number of the commonly used written languages. Written languages differ in many ways. English and its associated ASCII encoding represent one of the simplest written language systems found on computers. There is a one-to-one correspondence between ASCII characters and displayed characters, and there are no compound characters.

To represent the many languages that are found in the world, a number of complexities must be dealt with. Some languages are written right to left and others are written left to right. Vertical writing is also a consideration, although it is not one that is currently being dealt with. Languages such as Arabic and Hindi have displayed units that are built from smaller pieces. These units are stored on the computer as several distinct characters, but are displayed as a single entity. The glyph used for a character can also differ with its position, for example at the beginning of a word. Furthermore, complicated rules are required to determine where to break lines and paragraphs, and these rules vary from language to language.
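
To make the "several characters, one displayed entity" point concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It does not use Pango; the sample strings and the helper name `describe` are chosen purely for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, name, general category, bidi class) per character."""
    return [
        (f"U+{ord(ch):04X}",
         unicodedata.name(ch, "?"),
         unicodedata.category(ch),
         unicodedata.bidirectional(ch))
        for ch in text
    ]

# A Hindi syllable: stored as two characters (a consonant plus a vowel
# sign), but drawn on screen as a single unit.
hindi = "\u0915\u093F"

# An Arabic word: stored in reading order, displayed right to left.
arabic = "\u0633\u0644\u0627\u0645"

for row in describe(hindi) + describe(arabic) + describe("A"):
    print(row)
```

The category column shows that the second Hindi character is a combining mark rather than a free-standing letter, and the last column shows the directional class that a layout engine has to take into account.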

Text boundaries mark where words can be broken. In English, there is white space between words. Other languages don't necessarily use white space. Sentences and paragraphs are also considered to be boundaries, and the rules vary from one language to another. High-quality formatting of English text, such as kerning, ligatures, and hyphenation, also requires special rules.
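
A small sketch shows why white space alone cannot serve as a universal word boundary. The function name `whitespace_words` and the sample sentences are illustrative assumptions; this is not how Pango finds boundaries.

```python
def whitespace_words(text):
    """Naive word segmentation: split on runs of white space."""
    return text.split()

english = "Pango handles text"

# A Japanese sentence, written (as is normal) without spaces between words.
japanese = "\u6771\u4eac\u306b\u884c\u304d\u307e\u3059"

print(len(whitespace_words(english)))   # three words found
print(len(whitespace_words(japanese)))  # the whole sentence comes back as one "word"
```

A language-aware boundary finder needs per-language rules or dictionaries to do better, which is the kind of detail the talk describes Pango as hiding from the programmer.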

The Pango library's function is to manage all of these language complexities while presenting a single interface to the programmer. Display and editing of international text is the main focus of Pango.

Pango solves many of the aforementioned language representation problems. The basic Pango core is designed to be small. Each language is represented by a separate module. This cuts down on memory usage, since an Arabic user may not want to load a Japanese language module. Pango rendering is system independent: it can be used on the X window system and Win32, and with printers as well.
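
The memory argument for per-language modules can be sketched as a registry that loads a module only on first use. Everything here (`ModuleRegistry`, the loader functions, the module names) is hypothetical and is not Pango's actual module API; it only illustrates the load-on-demand idea.

```python
class ModuleRegistry:
    """Hypothetical registry that loads language modules lazily."""

    def __init__(self, loaders):
        self._loaders = loaders   # name -> zero-argument loader function
        self._loaded = {}         # name -> module object, filled on demand

    def get(self, name):
        # Load the module the first time it is requested, then reuse it.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded_names(self):
        return sorted(self._loaded)


# Stand-ins for real language modules (assumed names, illustration only).
registry = ModuleRegistry({
    "arabic": lambda: {"script": "Arabic", "direction": "rtl"},
    "japanese": lambda: {"script": "Japanese", "direction": "ltr"},
})

registry.get("arabic")
print(registry.loaded_names())  # only the module that was actually used
```

A user who only ever displays Arabic text never pays for the Japanese module, because its loader is never called.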

Pango is structured into several pieces: fundamental types, rendering libraries, language modules, and shape modules. Fundamental types provide public interfaces and driver routines. Rendering libraries provide for backend connectivity such as to the X window system and printers. Language modules contain the specifics for dealing with each of the various supported languages. Shape modules provide functionality for different font styles such as italics.

Under the hood, Pango has the job of breaking input text into runs of characters. Attributes such as color and size are applied to the text. Boundary resolution is performed to calculate line breaks. Shaping is done to deal with context variation and direction. Line breaking is then performed to split the text into individual lines.
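
The first step above, breaking text into runs, can be sketched as follows. This is a deliberately simplified illustration built on Python's `unicodedata` module: it is neither Pango's implementation nor the full Unicode bidirectional algorithm, and the rule that neutral characters (spaces, punctuation) simply join the current run is an assumption made to keep the example short.

```python
import unicodedata

def direction_runs(text):
    """Split text into (direction, substring) runs of uniform direction."""
    runs = []  # each entry is [direction or None, list of characters]
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            direction = "rtl"
        elif bidi_class == "L":
            direction = "ltr"
        else:
            direction = None  # neutral or weak: has no direction of its own

        if runs and (direction is None or runs[-1][0] in (direction, None)):
            # Same direction, a neutral character, or a run whose
            # direction is still undecided: extend the current run.
            if runs[-1][0] is None:
                runs[-1][0] = direction
            runs[-1][1].append(ch)
        else:
            runs.append([direction, [ch]])

    # Runs made only of neutral characters default to left to right.
    return [(d or "ltr", "".join(chars)) for d, chars in runs]


mixed = "Pango \u0633\u0644\u0627\u0645 GTK"
for direction, run in direction_runs(mixed):
    print(direction, repr(run))
```

Each run could then be handed to the later stages (attribute application, shaping, line breaking) independently, which is what makes mixed English and Arabic lines tractable.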

GTK+ 2 is the primary user of Pango. Pango is not directly tied to GTK+, but all GTK+ text rendering is done through Pango. The use of Pango benefits GTK+ by adding support for internationalization. Short font names are another benefit. Pango allows the font abstractions to be moved away from the toolkit. With Pango, it is possible to write applications that work in many languages. Widget labels and text editing boxes automatically deal with bidirectional text. An interesting demo was given involving a text entry box that contained English and Arabic words. As the cursor was moved across the line from one language to the other, it would change direction and starting position.

Future plans for Pango include support for high-quality printing, better hyphenation support, full justification, and the ability to handle vertical text such as Chinese and Japanese. Better support is also planned for backends other than the X window system. Pango seems like an ideal choice for adding international text to embedded Linux applications.