February 13, 2007

Linux Breaks Through the Language Barrier

This article was published in the January 2005 issue of "Linux For You" magazine.

Localization, or l10n as it is popularly known, involves translating the user interface (menus, dialogs, help, documentation, etc.) into your native language. This article will equip you with the basics and help you get started with the enormous task ahead.

Ever dreamt of using an application with a user interface in your native language? If so, read on: this article will help you get started on making that dream come true. Many people argue that Linux could have a very bright future in India if it were translated into various Indian languages, because India is a vast country where different states have different cultures and people speak different languages.

A number of language teams are working on the l10n (localization) of OSS (Open Source Software). They provide local language support for applications by translating the UI (User Interface), i.e. dialogs, menus, help, documentation etc., into various Indian languages.

Various tools are available for translating Linux applications. For an overview of different translation tools, check out the December 2003 issue of LFY (page 70). Here we will focus on Hindi translation using KBabel, one of the most powerful translation tools available.

Prerequisites
Before getting started, you'll need Hindi language support in your distro. For this, download the latest IndLinux package from http://www.indlinux.org and extract it by running the following command in a Linux terminal:

# tar -zxvf indlinux-*.tar.gz

Now, change (cd) into the newly created directory and run the installer using the following command:

# ./install.sh

Read the README file for instructions on how to add the Hindi language option to the login screen.

Since KBabel relies on the system's fonts and keymaps, you need to set up the Hindi keymap before you can type in Hindi. To do this, right-click on the panel and add the keyboard layout switcher to it. Then right-click on the keyboard layout switcher and add the Hindi keymap. You can now switch between keymaps simply by clicking the map icon in the panel.

To see an application's interface in Hindi, you either need to log in with Hindi support selected or set the LANG environment variable in a terminal as follows:

# export LANG=hi_IN.UTF-8

Any application you now start from the same terminal, such as gedit or Epiphany, will run with a Hindi interface, provided a Hindi translation for it is installed.

Figure 1: Epiphany in Hindi Interface
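Exporting LANG affects every program started from that terminal. If you only want to try one program in Hindi, you can set the locale for that single command instead. The sketch below wraps this in a helper whose name, run_in_hindi, is our own invention, and it assumes the hi_IN.UTF-8 locale is installed. It also removes LC_ALL and LC_MESSAGES, which take precedence over LANG, and LANGUAGE, which GNU gettext consults first when choosing a translation.

```shell
#!/bin/sh
# run_in_hindi: hypothetical helper that starts ONE program with a Hindi
# locale and leaves the rest of the terminal session untouched.
# Assumes the hi_IN.UTF-8 locale exists (check: locale -a | grep -i hi_IN).
run_in_hindi() {
    # LC_ALL and LC_MESSAGES override LANG, and GNU gettext gives LANGUAGE
    # priority when picking message catalogs, so drop all three for this
    # command only, then set LANG.
    env -u LC_ALL -u LC_MESSAGES -u LANGUAGE LANG=hi_IN.UTF-8 "$@"
}

# Usage:
#   run_in_hindi gedit &
#   run_in_hindi epiphany &
```

Using env keeps the change local to the child process, so LANG in your own shell stays as it was.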

One problem with Hindi rendering that you can clearly see is the misalignment of the shirorekha (the continuous line along the top of the characters). To fix this, add the following lines to /etc/fonts/fonts.conf:






Replace Raghindi with the name of your Devanagari font.
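The following is an illustration only. If we assume the misalignment is caused by font hinting, a fontconfig rule that switches hinting off for a single font family would look like the one written by the sketch below. Both the helper name write_font_rule and the idea that hinting is the culprit are assumptions. The rule goes into a separate file, so you can test it without editing /etc/fonts/fonts.conf. Current fontconfig versions also read numbered files from ~/.config/fontconfig/conf.d/.

```shell
#!/bin/sh
# write_font_rule FAMILY OUTFILE
# Hypothetical helper: writes a standalone fontconfig file containing one
# rule that disables hinting for FAMILY.
# ASSUMPTION: hinting is what breaks the shirorekha joins.
# FAMILY is inserted as-is, so it must not contain XML special characters.
write_font_rule() {
    family=$1
    outfile=$2
    cat > "$outfile" <<EOF
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <match target="font">
    <test name="family"><string>$family</string></test>
    <edit name="hinting" mode="assign"><bool>false</bool></edit>
  </match>
</fontconfig>
EOF
}

# Usage (per-user; restart applications afterwards):
#   mkdir -p ~/.config/fontconfig/conf.d
#   write_font_rule Raghindi ~/.config/fontconfig/conf.d/90-raghindi.conf
```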

PO File Basics

Most applications that run under Linux use the GNU gettext framework. All the text visible to the user through dialogs, menus, help, documentation etc. is kept in message catalogs called PO (Portable Object) files, and it is these files that get translated. The translated PO files are then compiled into binary MO (Machine Object) files. When an application runs, it looks for an MO file for the current language. If one is available, the interface is displayed in that language; otherwise it falls back to English.

Figure 2: PO File Example
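You can model the MO file lookup in a few lines of shell. This is a simplified model under stated assumptions, not how gettext is implemented. The helper find_mo and the domain myapp are hypothetical. Real GNU gettext also tries codeset and @modifier variants and honours the LANGUAGE variable. The sketch only shows the two directories searched for a Hindi catalog, in order, and the English fallback when neither has one.

```shell
#!/bin/sh
# find_mo DOMAIN [LOCALEDIR] [LOCALE]
# Simplified model of how a gettext program locates its catalog.
# Prints the first matching MO file, or "none" when there is no catalog
# (the program then shows its built-in English strings).
find_mo() {
    domain=$1                          # text domain, usually the program name
    localedir=${2:-/usr/share/locale}
    loc=${3:-$LANG}
    loc=${loc%%.*}                     # hi_IN.UTF-8 -> hi_IN
    for dir in "$loc" "${loc%%_*}"; do # try hi_IN first, then plain hi
        mo=$localedir/$dir/LC_MESSAGES/$domain.mo
        if [ -n "$dir" ] && [ -f "$mo" ]; then
            echo "$mo"
            return 0
        fi
    done
    echo none
    return 1
}

# Usage:
#   find_mo gedit                      # uses $LANG and /usr/share/locale
```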

Before you begin translating, you need a POT (Portable Object Template) file. It contains all the translatable text displayed in an application's interface, with an empty slot for the translation of each string. POT files are usually included in an application's source package. If not, you can create your own POT file using the following command:

# xgettext -o <filename>.pot <source files>

l10n links
http://i18n.kde.org/tools/kbabel
http://www.indlinux.org
http://www.indictrans.org
http://www.ankurbangla.org
http://hi.openoffice.org
http://www.cdacindia.com
http://www.ncb.ernet.in/projects/indix
http://translate.sourceforge.net
http://l10n-status.gnome.org
http://i18n.kde.org

See the xgettext man page for a complete description of its input and output options.
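The sketch below shows xgettext in action. It creates a tiny C source file for a hypothetical program called hello, marks its user-visible string with the customary _() macro, and extracts that string into hello.pot. It assumes the GNU gettext tools are installed. The option --keyword=_ tells xgettext to treat _() as a marker, and --from-code declares the source encoding.

```shell
#!/bin/sh
# make_pot DIR: build hello.pot from a tiny sample program.
# "hello" and hello.c are hypothetical; requires GNU gettext's xgettext.
make_pot() {
    dir=$1
    mkdir -p "$dir"
    # Minimal C program whose user-visible string is wrapped in _().
    cat > "$dir/hello.c" <<'EOF'
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#define _(s) gettext(s)

int main(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("hello", "/usr/share/locale");
    textdomain("hello");
    printf("%s\n", _("Hello, world!"));
    return 0;
}
EOF
    # Extract every _("...") string into a POT template.
    xgettext --keyword=_ --from-code=UTF-8 \
        --package-name=hello --package-version=1.0 \
        -o "$dir/hello.pot" "$dir/hello.c"
}

# Usage:
#   make_pot /tmp/hello && cat /tmp/hello/hello.pot
```

The resulting template has a header followed by msgid "Hello, world!" with an empty msgstr, ready for translation.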

A POT file consists of two parts:

  • POT header
  • msgid-msgstr pairs

The POT header contains the name and version of the package being translated, translator details, the POT file creation and revision dates, language team details and the encoding scheme used.

Each msgid field holds an original English string as it appears in the application's UI, and the msgstr that follows it holds the corresponding translation. In a fresh POT file the msgstr fields are empty. You simply read the original string in msgid and type its Hindi translation into msgstr. As simple as that!
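The sketch below uses a shell heredoc to write out what a small, fully translated Hindi PO file might look like. The program name myapp, the translator and team details, the dates and the source references (#: lines) are all made up. The header fields are the standard ones described above. An untranslated template would have the same layout, but every msgstr after the header would be empty.

```shell
#!/bin/sh
# write_sample_po FILE: writes a hypothetical, fully translated Hindi catalog.
# First entry (empty msgid): the header. Remaining entries: msgid/msgstr pairs.
write_sample_po() {
    cat > "$1" <<'EOF'
# Hindi translation for myapp (illustrative example).
msgid ""
msgstr ""
"Project-Id-Version: myapp 1.0\n"
"POT-Creation-Date: 2005-01-01 10:00+0530\n"
"PO-Revision-Date: 2005-01-05 10:00+0530\n"
"Last-Translator: Example Translator <translator@example.com>\n"
"Language-Team: Hindi <hindi@example.org>\n"
"Language: hi\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: src/main.c:42
msgid "Open"
msgstr "खोलें"

#: src/main.c:43
msgid "Save"
msgstr "सहेजें"
EOF
}

# Usage:
#   write_sample_po hi.po
```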

Using KBabel for translation

KBabel comes with a standard KDE installation, but if you are the geeky type you can download the latest version from http://www.kde.org. It has a host of advanced features such as full navigation capabilities, statistics, spell checking, dictionary functionality, plural form support, automatic header updates, UTF-8 support, and a rough translation tool that pre-translates your files using a translation database. You can add entries to the translation database from already translated PO files. KBabel also validates translated strings, checks syntax and so on. To look up a particular word, select it and click search; KBabel then displays all the possible translations for that word.

Tip: Fill up the translation database before you begin translating, so that you have a quick and consistent reference. For that you'll need the PO files of already translated applications. Just visit http://www.gnomebangalore.org and request a free CD containing the source code of GNOME, OpenOffice, Mono, etc.

KBabel has a very good catalog manager for keeping track of translations in large projects like OpenOffice or GNOME. Give it the path of the top-level directory containing your translations, and it shows the status of every file in the project: translated, untranslated or fuzzy. A fuzzy entry has a translation that is only a guess, for example because the original string changed, and it needs review. Then click an untranslated file and start translating!
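You can get the same three-way status outside KBabel. GNU msgfmt's --statistics option prints these counts for a PO file. The sketch below uses awk to show what is being counted, via a hypothetical helper called po_status. It is deliberately rough. It assumes entries are separated by blank lines, as in files written by the gettext tools. It judges plural entries by msgstr[0] only, and it skips the header and obsolete (#~) entries.

```shell
#!/bin/sh
# po_status FILE.po
# Rough count of translated, fuzzy and untranslated entries.
po_status() {
    awk '
    BEGIN { RS = ""; FS = "\n" }      # one record per blank-line-separated entry
    {
        fuzzy = 0; msgid = ""; msgstr = ""; state = ""
        for (i = 1; i <= NF; i++) {
            line = $i
            if (line ~ /^#,/ && line ~ /fuzzy/) {
                fuzzy = 1                                 # "#, fuzzy" flag
            } else if (line ~ /^msgid /) {
                state = "id"; msgid = substr(line, 7)
            } else if (line ~ /^msgstr /) {
                state = "str"; msgstr = substr(line, 8)
            } else if (line ~ /^msgstr\[0\] /) {
                state = "str"; msgstr = substr(line, 11)  # first plural form
            } else if (line ~ /^"/) {                     # continuation line
                if (state == "id") msgid = msgid line
                else if (state == "str") msgstr = msgstr line
            } else {
                state = ""            # comments, msgid_plural, msgstr[1], ...
            }
        }
        gsub(/"/, "", msgid); gsub(/"/, "", msgstr)
        if (msgid == "") next         # header entry or obsolete entry
        if (fuzzy) f++
        else if (msgstr == "") u++
        else t++
    }
    END { printf "%d translated, %d fuzzy, %d untranslated\n", t, f, u }
    ' "$1"
}

# Usage:
#   po_status hi.po
#   msgfmt --statistics -o /dev/null hi.po   # the official counts, for comparison
```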

KBabelDict lets you translate any text using KBabel's automated translation capabilities. Together, the KBabel suite helps you translate quickly and keep your translations consistent. For convenience, the KBabel window is divided into four parts:

  • The upper-left edit box is read-only and shows the msgid field of the current entry in the opened PO file, i.e. its original English text.
  • The bottom-left edit box contains the msgstr field for the msgid shown; this is where you edit the translated text.
  • The top-right part of the window is a comments panel, where you can view the comments for the entry currently being edited.
  • The bottom-right part of the window has two tabbed panels: one for search information, the other for context information.

Figure 3: KBabel GUI Interface



Read the KBabel Handbook for more details.

Table: KBabel keyboard shortcuts

Key                     Description
Page Up                 Move to previous message
Page Down               Move to next message
Ctrl+Page Up            Move to previous fuzzy message
Ctrl+Page Down          Move to next fuzzy message
Alt+Page Up             Move to previous untranslated message
Alt+Page Down           Move to next untranslated message
Shift+Page Up           Move to previous error message
Shift+Page Down         Move to next error message
Ctrl+Shift+Page Up      Move to previous fuzzy or untranslated message
Ctrl+Shift+Page Down    Move to next fuzzy or untranslated message

Here is the step-by-step process for translating your first file:

  1. Open KBabel, choose Settings > Configure KBabel and enter your details.
  2. Open Settings > Translation Database and add the PO files of already translated packages to the database.
  3. Open an untranslated file and click Settings > Rough Translation.
  4. Now you only need to translate the strings that rough translation could not find in the database.
  5. Change the keymap to Hindi, read the untranslated string (msgid) and write the translated string (msgstr) in Hindi.
  6. Once you are done with the file, run a spell check (if you have a Hindi dictionary), a syntax check, etc.
  7. The header is updated automatically when you save the file with a .po extension.

Running the Application

Once you have translated all the files of an application, you can build a PO compendium. This is a single PO file containing all the translations extracted from a set of PO files, and it helps keep future translations consistent. The gettext utility msgcat can build one, and msgmerge can reuse it through its --compendium option. Since an application needs MO files at runtime, you'll also need to compile the PO file into an MO file using the following command:

# msgfmt -o <filename>.mo <filename>.po

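For Hindi, copy this MO file to /usr/share/locale/hi/LC_MESSAGES. The file must be named after the application's text domain (usually the program name, as in gedit.mo), because that is the name the application looks for: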

# cp <filename>.mo /usr/share/locale/hi/LC_MESSAGES/
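You can collect these last steps into a small script. This is a sketch under stated assumptions. The helper names build_compendium and install_catalog are hypothetical, and the GNU gettext tools msgcat and msgfmt must be installed. The locale directory is a parameter, so you can try the script in a scratch directory before writing to /usr/share/locale, which needs root. When two files translate the same message differently, msgcat's --use-first option keeps the first translation instead of merging the alternatives into a fuzzy entry. The msgfmt --check option verifies format strings and the header before the MO file is written.

```shell
#!/bin/sh
# Hypothetical helpers; require GNU gettext's msgcat and msgfmt.

# build_compendium OUT.po IN1.po IN2.po ...
# Concatenate several PO files into one compendium, keeping the first
# translation found for each message.
build_compendium() {
    out=$1
    shift
    msgcat --use-first -o "$out" "$@"
}

# install_catalog DOMAIN FILE.po [LOCALEDIR]
# Compile FILE.po and install it as LOCALEDIR/hi/LC_MESSAGES/DOMAIN.mo,
# the file a program using text domain DOMAIN looks for under Hindi.
install_catalog() {
    domain=$1
    po=$2
    localedir=${3:-/usr/share/locale}
    dest=$localedir/hi/LC_MESSAGES
    mkdir -p "$dest" || return 1
    # --check validates format strings and header fields before compiling.
    msgfmt --check -o "$dest/$domain.mo" "$po"
}

# Usage (writing to /usr/share/locale needs root):
#   build_compendium compendium.po app1/hi.po app2/hi.po
#   msgmerge --compendium=compendium.po -o new-hi.po old-hi.po myapp.pot
#   install_catalog myapp hi.po
```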

To test the application, set the LANG variable to Hindi:

# export LANG=hi_IN.UTF-8

Now run the application from the same terminal, and you will see its interface in Hindi.

So you have localized your first application. But wait: this is not even the tip of the iceberg. A single application can contain thousands of strings to translate, and each Linux distro ships with thousands of applications, so you can imagine what a mammoth task this is.

Start by translating small files, and move on to larger ones once you are comfortable.

Contributing

If you are keen to jump into the world of translation, it is advisable to join one of the translation groups already doing the job rather than working alone. You can also subscribe to their mailing lists to keep in touch with other members of the group, ask for and offer help, follow the status of translations, share files, etc.

So get going, and use your translation skills for the benefit of the non-English-speaking open source community.
