Canonical Voices

Posts tagged with 'i18n'

David Planella

As part of the Ubuntu App Developer Week, I just ran a live on-air session on how to internationalize your Ubuntu apps. Some of the participants on the live chat asked me if I could share the slides somewhere online.

So here they are for your viewing pleasure :) If you’ve got any questions on i18n or in Ubuntu app development in general, feel free to ask in the comments or ping me (dpm) on IRC.

The video

The slides

Enjoy!


Read more
Barry Warsaw

Recently, as part of our push to ship only Python 3 on the Ubuntu 12.10 desktop, I've helped several projects update their internationalization (i18n) support.  I've seen lots of instances of suboptimal Python 2 i18n code, which leads to a liberal sprinkling of cargo-culted .decode() and .encode() calls simply to avoid the dreaded UnicodeErrors.  These get worse when the application or library is ported to Python 3, because then even the workarounds aren't enough to prevent nasty failures in non-ASCII environments (i.e. the non-English-speaking majority of the world :).

Let's be honest though: the problem is not that these developers are crappy coders!  In fact, far from it, the folks I've talked with are really, really smart, experienced Pythonistas.  The fundamental problem is Python 2's 8-bit string type, which doubles as a bytes type, and the terrible API of the built-in Python 2 gettext module, which does its utmost to sabotage your Python 2 i18n programs.  I take considerable blame for the latter, since I wrote the original version of that module.  At the time, I really didn't understand unicodes (this is probably also evident in the mess I made of the email package).  Oh, to really have access to Guido's time machine.

The good news is that we now know how to do i18n right, especially in a bilingual Python 2/3 world, and the Python 3 gettext module fixes the most egregious problems in the Python 2 version.  Hopefully this article goes some way toward making up for my past sins.

Stop right here and go watch Ned Batchelder's talk from PyCon 2012 entitled Pragmatic Unicode, or How Do I Stop the Pain?  It's the single best description of the background and effective use of Unicode in Python you'll ever see.  Ned does a brilliant job of resolving all the FUD.

...

Welcome back.  Your Python application is multi-language friendly, right?  I mean, I'm as functionally monolinguistic as most Americans, but I love the diversity of languages we have in the world, and appreciate that people really want to use their desktop and applications in their native language.  Fortunately, once you know the tricks it's not that hard to write good i18n'd Python code, and there are many good FLOSS tools available for helping volunteers translate your application, such as Pootle, Launchpad translations, Translatewiki, Transifex, and Zanata.

So there really is no excuse not to i18n your Python application.  In fact, GNU Mailman has been i18n'd for many years, and pioneered the supporting code in Python's standard library, namely the gettext module.  As part of the Mailman 3 effort, I've also written a higher level library called flufl.i18n which makes it even easier to i18n your application, even in tricky multi-language contexts such as server programs, where you might need to get a German translation and a French translation in one operation, then turn around and get Japanese, Italian, and English for the next operation.
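
To give a flavour of that multi-language juggling, here is a minimal sketch using only the standard library's class-based API (this is not the flufl.i18n API itself; the domain name, locale directory, and language codes are illustrative):

import gettext

# Each operation can fetch its own catalogs on demand; fallback=True means a
# missing catalog degrades to the English source strings instead of raising.
def get_translator(language_code):
    catalog = gettext.translation('myserver', localedir='/usr/share/locale',
                                  languages=[language_code], fallback=True)
    return catalog.ugettext   # on Python 3, use catalog.gettext instead

# One operation needs German and French...
de, fr = get_translator('de'), get_translator('fr')
notice_de = de('Your message is being held for approval')
notice_fr = fr('Your message is being held for approval')

# ...and the next needs Japanese, Italian, and English.
ja, it, en = map(get_translator, ['ja', 'it', 'en'])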

In one recent case, my colleague was having a problem with a simple command-line program.  What's common about these types of applications is that you fire them up once, they run to completion and then exit, and they only have to deal with one language during the entire execution of the program, specifically the language defined in the user's locale.  If you read the gettext module's documentation, you'd be inclined to do this at the very start of your application:

import gettext
from gettext import gettext as _
# my_program_name stands in for your application's gettext domain
gettext.textdomain(my_program_name)

then, you'd wrap translatable strings in code like this:

print _('Here is something I want to tell you')

What gettext does is look up the source string (i.e. the argument to the underscore function) in a translation catalog, returning the text in the appropriate language, which will then be printed.  There are some additional details regarding i18n that I won't go into here.  If you're curious, ask in the comments, and I'll try to fill things in.
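
Conceptually, the lookup is not much more than a dictionary keyed on the source strings.  A rough sketch (not the real implementation; the empty catalog is just a stand-in for a loaded .mo file):

# The compiled catalog maps msgid (source string) to msgstr (translation);
# a miss falls back to the source string itself, which is why untranslated
# programs still work.
catalog = {}

def _(source):
    translated = catalog.get(source)
    return translated if translated is not None else source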

Anyway, if you do write the above code, you'll be in for a heap of trouble, as my colleague soon found out.  Just running his program with --help in a French locale, he was getting the dreaded UnicodeEncodeError:

"UnicodeEncodeError: 'ascii' codec can't encode character"

I've also seen reports of such errors when trying to send translated strings to a log file (a practice which I generally discourage, since I think log messages usually shouldn't be translated).  In any case, I'm here to tell you why the above "obvious" code is wrong, and what you should do instead.

First, why is that code wrong, and why does it lead to the UnicodeEncodeErrors?  What might not be obvious from the Python 2 gettext documentation is that gettext.gettext() always returns 8-bit strings (a.k.a. byte strings in Python 3 terminology), and these 8-bit strings are encoded with the charset defined in the language's catalog file.

It's always best practice in Python to deal with human readable text using unicodes.  This is traditionally more problematic in Python 2, where English programs can cheat and use 8-bit strings and usually not crash, since their character range is compatible with ASCII and you only ever print to English locales.  As soon as your French friend uses your program though, you're probably going to run into trouble.  By using unicodes everywhere, you can generally avoid such problems, and in fact it will make your life much easier when you eventually switch to Python 3.

So the 8-bit strings that gettext.gettext() hands you have already sunk you, and to avoid the pain, you'd want to convert them back to unicodes before you use them in any way.  However, converting to unicodes makes the i18n APIs much less convenient, so no one does it until there's way too much broken code to fix.
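
To see how inconvenient that conversion is, here is a minimal sketch of the dance every call site would need in Python 2 (the domain 'myprog' is a stand-in; with fallback=True a missing catalog just gives you back the source string):

import gettext

# gettext() hands back an 8-bit str encoded with whatever charset the catalog
# declares, so it has to be decoded by hand before it is safe to mix with unicodes.
catalog = gettext.translation('myprog', fallback=True)
message = catalog.gettext('Here is something I want to tell you')
message = message.decode(catalog.charset() or 'ascii')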

What you really want in Python 2 is something like this:

from gettext import ugettext as _

which you'd think you should be able to do, the "u" prefix meaning "give me unicode".  But for reasons I can only describe as based on our misunderstandings of unicode and i18n at the time, you can't actually do that, because ugettext() is not exposed as a module-level function.  It is available in the class-based API, but that's a more advanced API that again almost no one uses.  Sadly, it's too late to fix this in Python 2.  The good news is that in Python 3 it is fixed, not by exposing ugettext(), but by changing the most commonly used gettext module APIs to return unicode strings directly, as it always should have done.  In Python 3, the obvious code just works:

from gettext import gettext as _

What can you do in Python 2 then?  Here's what you should use instead of the snippet at the beginning of this article:

import gettext
_ = gettext.translation(my_program_name).ugettext

and now you can wrap all your translatable strings in _('Foo') and it should Just Work.

Perhaps more usefully, you can use the gettext.install() function to put _() into the built-in namespace, so that all your other code can just use that function without doing anything special.  Again, though, we have to work around the boneheaded Python 2 API.  Here's how to write code that works correctly in both Python 2 and Python 3:

import sys, gettext
kwargs = {}
if sys.version_info[0] < 3:
    # In Python 2, ensure that the _() that gets installed into built-ins
    # always returns unicodes.  This matches the default behavior under Python
    # 3, although that keyword argument is not present in the Python 3 API.
    kwargs['unicode'] = True
gettext.install(my_program_name, **kwargs)

Or you can use the flufl.i18n API, which always returns unicode strings in both Python 2 and Python 3.

Also interesting was that I could never reproduce the crash when ssh'd into the French locale VM. It would only crash for me when I was logged into a terminal on the VM's graphical desktop.  The only difference between the two that I could tell was that in the desktop's terminal, locale(1) returned French values (e.g. fr_FR.UTF-8) for everything, but in the ssh console, it returned the French values for everything except the LC_CTYPE environment variable.  For the life of me, I could not get LC_CTYPE set to anything other than en_US.UTF-8 in the ssh context, so the reproducible test case would just return the English text, and not crash.  This happened even if I explicitly set that environment variable either as a separate export command in the shell, or as a prefix to the normally crashing command.  Maybe there's something in ssh that causes this, but I couldn't find it.

One last thing.  It's important to understand that Python's gettext module only handles Python strings, and other subsystems may be involved.  The classic example is GObject Introspection, the newest and recommended interface to the GNOME Object system.  If your Python-GI based project needs to translate strings too (e.g. in menus or other UI elements), you'll have to use both the gettext API for your Python strings, and set the locale for the C-based bits using locale.setlocale().  This is because Python's API does not set the locale automatically, and Python-GI exposes no other way to control the language it uses for translations.
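
For completeness, here is a minimal sketch of wiring up both halves for a Python-GI application; the domain 'myapp' and the locale directory are illustrative, and locale.bindtextdomain() is only available where Python is built against libintl:

import locale
import gettext

# C side: let GTK and friends (loaded through GObject Introspection) pick up
# the user's locale, and tell the C-level gettext where this app's catalogs live.
locale.setlocale(locale.LC_ALL, '')
locale.bindtextdomain('myapp', '/usr/share/locale')

# Python side: install _() into builtins for the app's own strings (on Python 2,
# pass unicode=True as described above; Python 3 needs no extra keyword).
gettext.install('myapp', '/usr/share/locale')

Depending on how the UI is built, you may also need to point the toolkit at your translation domain, for example via GtkBuilder's set_translation_domain() method.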

Read more
pitti

Suppose you install Ubuntu and select a language other than English (it’s known to happen!). This will install the general and the GNOME language packs, translated LibreOffice help, and so on. Now, install a KDE package or GIMP. You’ll notice that the new application is not translated and has no help available for your language. The next time you open the language selector from control-center, it will tell you that you are missing some language support and offer to install it, but this has been pretty hard to discover, and we really can do better.

Today’s language-selector upload provides an aptdaemon plugin which, for any newly installed package, automatically marks the corresponding language support packages (translated help, dictionaries, spell checker modules, and the translations themselves) for installation, for all languages that are configured on your system.

For example, I have German and English locales on my system, and no KDE packages. Before, installing GIMP got me just this:

$ aptdcon -i gimp
The following NEW package will be installed (1):
gimp

Now it automatically installs the corresponding localized help:


$ aptdcon -i gimp
The following NEW packages will be installed (4):
gimp gimp-help-common gimp-help-de gimp-help-en

I am using aptdcon here because it shows the effect more clearly than software-center, which does all this in the background; but both use aptdaemon, so the effect will be the same.

Likewise, installing the first KDE-ish package will automatically install the KDE language packs:


$ aptdcon -i kate
The following NEW packages will be installed (71):
kate kate-data [...] kdelibs5-data [...] language-pack-kde-de language-pack-kde-en [...]

This is now possible because I rewrote the check-language-support logic from scratch; the old code was very slow, hard to read and a nightmare to maintain, and also depended on a lot of data files. The new code is very fast (figuring out all missing language support packages for all installed packages for all available locales takes 8 ms on my system), and has full test coverage.

While the check-language-support program still works (I rewrote it using the new API), it is easier and probably a lot faster to just use the new API directly now, e.g. in our Ubiquity installer.

Say goodbye to this 2.5 year old bug!

Read more
Jonathan Riddell

Bug 83941 “bzr doesn’t speak my tongue” has been closed: bzr core can now be translated. (The qbzr and bzr-explorer GUIs have been internationalized for a couple of years.) If you want to help bring bzr to those who prefer to work in non-English languages, please help translate it on Launchpad.

The translation will involve quite a bit of specialist language (what is French for “colocated branch”?), and I expect there are still strings that need to be added to the translation file. I also need to look at translations for plugins.  Please report issues either to the Bazaar mailing list or as bugs against bzr on Launchpad.

Philippe Lhoste wrote a while ago about the issues of translating DVCS terminology.

Read more
Martin gz

Last week was the Bazaar sprint, which was fantastic and tiring. Somehow even the people who’d been at UDS just before made it through five packed days of fixing bugs, preparing releases, and debugging package imports. We were most hospitably hosted at the Canonical offices a long way up Millbank Tower. But even those who couldn’t be there in person to enjoy the view were part of the experience. At home in Ukraine, Alexander wore his Bazaar shirt in support during the first day. On IRC, larstiq and santagada ran the test suite on pypy and investigated incompatibilities. And all week we had John, on the line from the Netherlands as a small robot sitting in the middle of the table, working on performance bugs and offering helpful advice.

There were two new faces introduced. Max has been a stalwart maintaining the ~bzr PPAs and getting daily builds working. Jonathan is joining the Bazaar team on rotation from Kubuntu, which is very exciting for fans of qbzr. He started getting to know bzrlib by taking on some bugs tagged ‘easy’ and pair programming on harder ones. It was a bit tough to keep track of everything going on, but good progress was made on the Ubuntu Distributed Development front, the translation framework branches Naoki put together were landed, and lots of pet bugs were fixed. Download bzr 2.4b3 now to see the rest of the results for yourself.

After these long days in front of screens a nice meal out was a welcome treat. Over dinner we even managed to get on to topics other than code on occasion. On Thursday evening everyone went to As You Like It at the Globe as groundlings. Even with the language barrier to overcome for some of the sprinters, the comedy lived up to the categorisation. Trying to use the cycle hire scheme to travel there and back proved more of an obstacle. The bikes themselves were fine, provided you could get past the terrible computer interface and persuade the system to let you rent them. Now, if only they took patches for that…

Read more
Martin Pool

TortoiseBzr (integration into the Microsoft Windows shell) is now being internationalized, with the Japanese and Spanish translations already almost complete.

If you speak any language other than English, you can help with translation.


Read more
Martin Pool


Thanks to INADA Naoki (and friends?), Bazaar documentation is now available in Japanese.

Read more
Martin Pool


The Bazaar Explorer GUI is now available in 11 human languages (and partially translated into a few more), including Algis Kaballa’s recent translation into Lithuanian.

[Screenshot: Bazaar Explorer in Lithuanian]

Read more