GNU gettext and the correct locale


Sometimes I could not get a translated string from GNU gettext. I read the gettext info document. I was actually using the GNU gettext library from PHP, so I checked the PHP documentation as well. (gettext in PHP is a wrapper around the C API of the GNU gettext library.) Still, I did not understand why gettext did not work.

A program to study gettext

Here is the directory structure of the program I wrote to study the gettext library.

bin
|-- i18n
|   |-- en
|   |   `-- LC_MESSAGES
|   |       `-- messages.mo
|   |-- en_US
|   |   `-- LC_MESSAGES
|   |       `-- messages.mo
|   |-- ja
|   |   `-- LC_MESSAGES
|   |       `-- messages.mo
|   `-- ja_JP
|       `-- LC_MESSAGES
|           `-- messages.mo
`-- i18n-msg

You can get this software from here.

This program simply prints a message to the console. If you run it with
the -l ja_JP -c UTF-8 options, you get the message translated into Japanese. But
if you run it with any of the following options, you do not get the expected translated message. I wondered why the gettext library did not work with some options.

  • -l ja
  • -l ja -c UTF-8
  • -l ja_JP
  • -l en

Here is the sequence of gettext-related calls.

/* locale is the command option you specified. */
setlocale(LC_MESSAGES, locale);

/* domain_dir is "absolute path to i18n directory" */
bindtextdomain("messages", domain_dir);

/* codeset is the command option you specified. */
bind_textdomain_codeset("messages", codeset);
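
The three calls above can be wrapped in one function that reports which step failed. This is only a minimal sketch: init_messages and translate are hypothetical names, not the ones used in i18n-msg. It relies on "messages" being gettext's default domain, which is why no textdomain call appears.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: runs the three calls in order.
   Returns 0 on success, or the number (1-3) of the step that failed. */
int init_messages(const char *locale, const char *domain_dir,
                  const char *codeset)
{
    /* Step 1: select the locale used to pick a message catalog.
       setlocale returns NULL when the locale is not available. */
    if (setlocale(LC_MESSAGES, locale) == NULL)
        return 1;

    /* Step 2: tell gettext where the "messages" catalogs live. */
    if (bindtextdomain("messages", domain_dir) == NULL)
        return 2;

    /* Step 3: request the encoding of the translated strings. */
    if (bind_textdomain_codeset("messages", codeset) == NULL)
        return 3;

    return 0;
}

/* "messages" is the default domain, so plain gettext() looks there. */
const char *translate(const char *msgid)
{
    return gettext(msgid);
}
```

When no translation is found, gettext returns the msgid itself, so a failure at step 1 usually shows up as untranslated output rather than as a crash.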

You cannot call setlocale successfully

When you run the program with some of the options above, the locale cannot be set.

$> bin/i18n-msg -l ja
failed to bind text domain with ja

As the directory tree above shows, a ja directory exists, but setlocale failed.

You get an unexpected string with some options

When you run the program with the following option, you get an unexpected string.

$> bin/i18n-msg -l ja_JP
gettext????????????

My message files are encoded in UTF-8 and LC_CTYPE in my environment is ‘en_US.UTF-8’. man 7 locale explains that LC_CTYPE is the category used to display strings.

To call setlocale successfully

I read some man pages about locales. I learned that a locale has to exist on the system before setlocale can select it. My Linux system has the following locales.

$> locale -a
aa_DJ
aa_DJ.iso88591
aa_DJ.utf8
...
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
en_ZA
...
ja_JP
ja_JP.eucjp
ja_JP.ujis
ja_JP.utf8
japanese
japanese.euc
...
zu_ZA
zu_ZA.iso88591
zu_ZA.utf8

‘ja’ is not in the list above. That is why setlocale failed.
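
The same check can be done from inside a program instead of reading the locale -a output. The following is a small sketch; locale_is_available is a hypothetical helper name. It tries the locale for LC_MESSAGES and then puts the previous setting back.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if setlocale accepts `name` for
   LC_MESSAGES, 0 if it does not, and -1 on allocation failure.
   The previous locale is restored before returning. */
int locale_is_available(const char *name)
{
    const char *current = setlocale(LC_MESSAGES, NULL);
    char *saved;
    int available;

    if (current == NULL)
        return -1;

    /* Copy the name: later setlocale calls may overwrite the buffer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    available = setlocale(LC_MESSAGES, name) != NULL;

    /* Put the original locale back. */
    setlocale(LC_MESSAGES, saved);
    free(saved);
    return available;
}
```

On a system with the list shown above, I would expect this to return 0 for "ja" and 1 for "ja_JP".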

To call setlocale(“ja”) successfully

To call setlocale(“ja”) successfully, you need a “ja” locale. You can create one with the localedef command. A locale is a directory; the files under it contain binary data for the locale-related functions. The default locale data resides in a system directory, so it is not easy to install a new custom locale there. Instead, I created a new “ja” locale under my account directory and set the environment variable ‘LOCPATH’ so that setlocale can find it.

Here is how I generated the “ja” locale on my Linux system.

$> localedef -i ja_JP -f UTF-8 locales/ja

Here is how I call setlocale successfully.

    char* locale_res;
    int result = -1;
    locale_res = setlocale(LC_MESSAGES, locale);
    if (!locale_res) {
        /* The system does not have the locale. Retry with LOCPATH */
        /* pointing at the "locales" directory next to the executable. */
        char* exec_dir;
        exec_dir = get_executable_dir();
        if (exec_dir) {
            const char* locales_dir = "locales";
            const char* old_loc_path;
            char* saved_loc_path = NULL;
            char* loc_path_buffer;
            size_t loc_path_buffer_size;
            /* room for both strings and the terminating NUL */
            loc_path_buffer_size = strlen(exec_dir) + strlen(locales_dir) + 1;
            loc_path_buffer = (char*)malloc(loc_path_buffer_size);
            /* Copy the old LOCPATH. The pointer returned by getenv belongs */
            /* to the environment and may become invalid after setenv. */
            old_loc_path = getenv("LOCPATH");
            if (old_loc_path) {
                saved_loc_path = strdup(old_loc_path);
            }
            if (loc_path_buffer) {
                /* No separator is inserted, so exec_dir must end with '/'. */
                snprintf(loc_path_buffer, loc_path_buffer_size,
                   "%s%s", exec_dir, locales_dir);
                setenv("LOCPATH", loc_path_buffer, 1);
                /* Now setlocale can find the custom locale. */
                locale_res = setlocale(LC_MESSAGES, locale);
            }
            result = locale_res ? 0 : -1;
            /* restore the previous LOCPATH, or remove it if there was none */
            if (saved_loc_path) {
                setenv("LOCPATH", saved_loc_path, 1);
            } else if (!old_loc_path) {
                unsetenv("LOCPATH");
            }
            /* both buffers were allocated here with malloc/strdup */
            free(saved_loc_path);
            free(loc_path_buffer);
            free_str(exec_dir);
        }
    } else {
        result = 0;
    }
    return result;
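
The code above formats the path with "%s%s", so it only works if the directory returned by get_executable_dir ends with a path separator. If that is not guaranteed, a helper like the following could build the LOCPATH value instead. This is a sketch and join_path is a hypothetical name, not part of i18n-msg.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins dir and name, adding a '/' only when dir
   does not already end with one. Returns a malloc'ed string that the
   caller must free, or NULL on allocation failure. */
char *join_path(const char *dir, const char *name)
{
    size_t dir_len = strlen(dir);
    int needs_sep = dir_len > 0 && dir[dir_len - 1] != '/';
    size_t size = dir_len + (needs_sep ? 1 : 0) + strlen(name) + 1;
    char *path = malloc(size);

    if (path != NULL)
        snprintf(path, size, "%s%s%s", dir, needs_sep ? "/" : "", name);
    return path;
}
```

With this helper, both "/opt/app" and "/opt/app/" joined with "locales" give "/opt/app/locales".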

To get the expected string from gettext

To get the expected string from gettext, you have to call setlocale(LC_CTYPE, <locale.codeset>) or bind_textdomain_codeset(<domain>, <codeset>).

To call setlocale(LC_CTYPE, <locale.codeset>)

I read the gettext source code. I learned that the returned string is converted by iconv to the codeset you specify. If no codeset is bound to the message domain, gettext uses nl_langinfo(CODESET), whose value is ultimately determined by setlocale(LC_CTYPE, <locale.codeset>). According to man 3 setlocale, a program’s locale is “C” at startup. The environment variables may name another locale, but they do not affect your program at startup.

Here are the locale settings in my environment.

LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

In the “C” locale, nl_langinfo(CODESET) returns “ANSI_X3.4-1968”. The Japanese characters cannot be represented in that codeset, which is why you get the unexpected string.
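
To see which codeset gettext will fall back to, you can set LC_CTYPE yourself and then ask nl_langinfo. The following is a sketch under my own naming: make_locale_name, set_ctype and current_codeset are hypothetical helpers. set_ctype builds a name such as "ja_JP.UTF-8" from the two command options and passes it to setlocale.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: writes "<locale>.<codeset>" into buf.
   Returns 0 on success, -1 if buf is too small. */
int make_locale_name(char *buf, size_t size,
                     const char *locale, const char *codeset)
{
    int n = snprintf(buf, size, "%s.%s", locale, codeset);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: sets LC_CTYPE to "<locale>.<codeset>".
   Returns 0 on success, -1 if the name is too long or setlocale
   rejects it (for example because the locale is not installed). */
int set_ctype(const char *locale, const char *codeset)
{
    char name[128];

    if (make_locale_name(name, sizeof name, locale, codeset) != 0)
        return -1;
    return setlocale(LC_CTYPE, name) != NULL ? 0 : -1;
}

/* The codeset gettext uses when the domain has no codeset bound. */
const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}
```

Printing current_codeset() before and after a successful set_ctype("ja_JP", "UTF-8") should show the value changing from the “C” locale's codeset to UTF-8, provided the ja_JP.utf8 locale is installed.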

To call bind_textdomain_codeset

Calling bind_textdomain_codeset is the straightforward way. man bind_textdomain_codeset explains what the function does. In short, it binds a domain to the codeset you want iconv to produce as output.
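
A small sketch of this approach follows. set_output_codeset and bound_codeset are hypothetical wrapper names. Passing NULL as the codeset to bind_textdomain_codeset does not change anything and only returns the current binding.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: asks gettext to return the strings of `domain`
   in `codeset` (which must not be NULL), independent of LC_CTYPE.
   Returns 0 on success, -1 on failure. */
int set_output_codeset(const char *domain, const char *codeset)
{
    return bind_textdomain_codeset(domain, codeset) != NULL ? 0 : -1;
}

/* Returns the codeset currently bound to `domain`. NULL means no
   codeset is bound and the LC_CTYPE locale's codeset is used. */
const char *bound_codeset(const char *domain)
{
    return bind_textdomain_codeset(domain, NULL);
}
```

Calling set_output_codeset("messages", "UTF-8") once at startup corresponds to what the -c UTF-8 option does in the call sequence shown earlier.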

Conclusion

To use the gettext utilities, you need to consider the following.

  1. Use localedef and set the LOCPATH environment variable if the locale you want to use does not exist on the system.
  2. Call setlocale(LC_CTYPE, <locale.codeset>) or bind_textdomain_codeset(<domain>, <codeset>) to get the expected output from gettext.