{"id":911,"date":"2021-01-17T08:21:34","date_gmt":"2021-01-17T08:21:34","guid":{"rendered":"https:\/\/oc-soft.net\/?p=911"},"modified":"2022-04-26T10:12:07","modified_gmt":"2022-04-26T10:12:07","slug":"gnu-gettext-and-correct-locale","status":"publish","type":"post","link":"https:\/\/oc-soft.net\/en\/gnu-gettext-and-correct-locale\/","title":{"rendered":"GNU gettext and the correct locale."},"content":{"rendered":"\n<p>Sometimes I could not get a translated string from GNU gettext. I read the gettext info document. I was actually using the GNU gettext library from PHP, so I checked the PHP documentation as well. Incidentally, gettext in PHP is a wrapper around the C API of the GNU gettext library. I still did not understand why gettext did not work.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A program to study gettext.<\/h3>\n\n\n\n<p>Here is the directory structure of a program I wrote to study the gettext library.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\n|-- i18n\n|   |-- en\n|   |   `-- LC_MESSAGES\n|   |       `-- messages.mo\n|   |-- en_US\n|   |   `-- LC_MESSAGES\n|   |       `-- messages.mo\n|   |-- ja\n|   |   `-- LC_MESSAGES\n|   |       `-- messages.mo\n|   `-- ja_JP\n|       `-- LC_MESSAGES\n|           `-- messages.mo\n`-- i18n-msg<\/code><\/pre>\n\n\n\n<p>You can get this software from <a href=\"https:\/\/github.com\/oc-soft\/i18n-t-0.git\">here<\/a>.<\/p>\n\n\n\n<p>This program simply prints a message to the console. If you run it with the<br><code>-l ja_JP -c UTF-8<\/code> options, you will get the Japanese translation of the message. But<br>if you run it with any of the following options, you will not get the Japanese translation. 
I wondered why the gettext library does not work with some options.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>-l ja<\/code><\/li><li><code>-l ja -c UTF-8<\/code><\/li><li><code>-l ja_JP<\/code><\/li><li><code>-l en<\/code><\/li><\/ul>\n\n\n\n<p>Here is the sequence of calls the program makes to set up gettext.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/* locale is the command option you specified. *\/\nsetlocale(LC_MESSAGES, locale);\n\n\/* domain_dir is \"absolute path to i18n directory\" *\/\nbindtextdomain(\"messages\", domain_dir);\n\n\/* codeset is the command option you specified. *\/\nbind_textdomain_codeset(\"messages\", codeset);<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">You cannot call setlocale successfully<\/h3>\n\n\n\n<p>When you run the program with some of the options above, it cannot set the locale.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$&gt; bin\/i18n-msg -l ja\nfailed to bind text domain with ja<\/code><\/pre>\n\n\n\n<p>As the directory tree above shows, a message catalog for ja exists, but setlocale failed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">You get an unexpected string with some options.<\/h3>\n\n\n\n<p>When you run the program with the following option, you get an unexpected string.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$&gt; bin\/i18n-msg -l ja_JP\ngettext????????????<\/code><\/pre>\n\n\n\n<p>My message files are encoded in UTF-8, and LC_CTYPE in my environment is &#8216;en_US.UTF-8&#8217;. As man 7 locale explains, LC_CTYPE affects how strings are displayed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">To call setlocale successfully<\/h3>\n\n\n\n<p>I read some man pages about locales and learned that a locale must be installed on the system before I can call setlocale with it. My Linux system has the following locales. 
<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$&gt; locale -a\naa_DJ\naa_DJ.iso88591\naa_DJ.utf8en_US\n...\nen_US.iso88591\nen_US.iso885915\nen_US.utf8\nen_ZA\n...\nja_JP\nja_JP.eucjp\nja_JP.ujis\nja_JP.utf8\njapanese\njapanese.euc\n...\nzu_ZA\nzu_ZA.iso88591\nzu_ZA.utf8<\/code><\/pre>\n\n\n\n<p>&#8216;ja&#8217; does not appear in the list above. That is why the setlocale call fails.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">To call setlocale(&#8220;ja&#8221;) successfully<\/h4>\n\n\n\n<p>To call setlocale(&#8220;ja&#8221;) successfully, you need a &#8220;ja&#8221; locale. You can create one with the localedef command. A locale is a directory structure; each file under the directory contains binary data for the locale-related functions. The default locale data resides in a system directory, and adding a new custom locale there is not easy. Instead, I created a new &#8220;ja&#8221; locale under my own account directory and set the environment variable &#8216;LOCPATH&#8217; so that the &#8220;ja&#8221; locale can be found. 
<\/p>\n\n\n\n<p>Here is how I generated the &#8220;ja&#8221; locale on my Linux system.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$&gt; localedef -i ja_JP -f UTF-8 locales\/ja<\/code><\/pre>\n\n\n\n<p>Here is how I call setlocale successfully.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    char* locale_res;\n    int result = -1; \/* stays -1 unless setlocale succeeds *\/\n    locale_res = setlocale(LC_MESSAGES, locale); \n    if (!locale_res) {\n        char* exec_dir;\n        exec_dir = get_executable_dir();\n        if (exec_dir) {\n            size_t exec_dir_len;\n            size_t locales_dir_len;\n            size_t loc_path_buffer_size;\n            const char* locales_dir = \"locales\";\n            char* loc_path_buffer;\n            const char* cur_loc_path;\n            char* saved_loc_path = NULL;\n            locales_dir_len = strlen(locales_dir);\n            exec_dir_len  = strlen(exec_dir);\n            \/* exec_dir is assumed to end with a path separator *\/\n            loc_path_buffer_size = exec_dir_len;\n            loc_path_buffer_size += locales_dir_len;\n            loc_path_buffer_size += 1;\n            loc_path_buffer = (char*)malloc(loc_path_buffer_size);\n            \/* save a copy of the old LOCPATH; the pointer returned by *\/\n            \/* getenv may become invalid after setenv *\/\n            cur_loc_path = getenv(\"LOCPATH\");\n            if (cur_loc_path) {\n                saved_loc_path = strdup(cur_loc_path);\n            }\n            if (loc_path_buffer) {\n                snprintf(loc_path_buffer, loc_path_buffer_size,\n                   \"%s%s\", exec_dir, locales_dir);\n                \/* loc_path_buffer points to the \"locales\" directory in *\/\n                \/* the directory which contains the executable *\/\n                setenv(\"LOCPATH\", loc_path_buffer, 1);\n            }\n            \/* Now the setlocale call should succeed. *\/\n            locale_res = setlocale(LC_MESSAGES, locale); \n            result = locale_res ? 
0 : -1; \n            if (saved_loc_path) {\n                \/* restore LOCPATH *\/\n                setenv(\"LOCPATH\", saved_loc_path, 1);\n                free_str(saved_loc_path);\n            } else {\n                \/* LOCPATH was not set before *\/\n                unsetenv(\"LOCPATH\");\n            }\n\n            if (loc_path_buffer) {\n                free_str(loc_path_buffer);\n            }\n        }\n        if (exec_dir) {\n            free_str(exec_dir);\n        }    \n    } else {\n        result = 0;\n    }\n    return result;\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">To get the expected string from gettext.<\/h3>\n\n\n\n<p>To get the expected string from gettext, you have to call setlocale(LC_CTYPE, &lt;locale.codeset>) or bind_textdomain_codeset(&lt;domain>, &lt;codeset>).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">To call setlocale(LC_CTYPE, &lt;locale.codeset>)<\/h4>\n\n\n\n<p>I read the source code of gettext and learned that the returned string is converted by iconv to the codeset you specify. If the domain is not bound to a codeset, gettext uses nl_langinfo(CODESET), whose value is ultimately determined by setlocale(LC_CTYPE, &lt;locale.codeset>). According to man 3 setlocale, a program&#8217;s locale is &#8220;C&#8221; at startup. The environment variables may specify another locale, but they do not take effect in your program at startup.<\/p>\n\n\n\n<p>Here are the locale settings in my environment.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>LANG=en_US.UTF-8\nLC_CTYPE=en_US.UTF-8\nLC_NUMERIC=\"en_US.UTF-8\"\nLC_TIME=\"en_US.UTF-8\"\nLC_COLLATE=\"en_US.UTF-8\"\nLC_MONETARY=\"en_US.UTF-8\"\nLC_MESSAGES=\"en_US.UTF-8\"\nLC_PAPER=\"en_US.UTF-8\"\nLC_NAME=\"en_US.UTF-8\"\nLC_ADDRESS=\"en_US.UTF-8\"\nLC_TELEPHONE=\"en_US.UTF-8\"\nLC_MEASUREMENT=\"en_US.UTF-8\"\nLC_IDENTIFICATION=\"en_US.UTF-8\"\nLC_ALL=<\/code><\/pre>\n\n\n\n<p>In the &#8220;C&#8221; locale, nl_langinfo(CODESET) returns &#8220;ANSI_X3.4-1968&#8221;. 
This is why you get the unexpected string. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">To call bind_textdomain_codeset<\/h4>\n\n\n\n<p>Calling bind_textdomain_codeset is the straightforward way. The man page for bind_textdomain_codeset explains what the function does, so I describe it only briefly here. It binds a domain to the codeset that you want iconv to produce as its output. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>To use the gettext utilities, you need to consider the following.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Use localedef and set the LOCPATH environment variable if the locale you want to use is not available.<\/li><li>Call setlocale(LC_CTYPE, &lt;locale.codeset>) or bind_textdomain_codeset(&lt;domain>, &lt;codeset>) to get the expected output from gettext. <\/li><\/ol>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes I could not get a translated string from GNU gettext. I read the gettext info document. I was actually using the GNU gettext library from PHP, so I checked the PHP documentation as well. Incidentally, gettext in PHP is a wrapper around the C API of the GNU gettext library. I still did not understand why gettext did not work. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":924,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/oc-soft.net\/?p=911","footnotes":""},"categories":[1],"tags":[],"class_list":["post-911","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","en-US"],"_links":{"self":[{"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/posts\/911","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/comments?post=911"}],"version-history":[{"count":14,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/posts\/911\/revisions"}],"predecessor-version":[{"id":930,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/posts\/911\/revisions\/930"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/media\/924"}],"wp:attachment":[{"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/media?parent=911"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/categories?post=911"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/oc-soft.net\/wp-json\/wp\/v2\/tags?post=911"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}