Monthly Archives: April 2009

Unicode with Python 2 and PyGTK

Working with Unicode in Python 2 is not fun, and combining it with third-party libraries brings even more headaches. This post explains how PyGTK handles Unicode.

Note: This information is only valid for Python 2.x. It will likely change when PyGTK releases support for Python 3.

Calling GTK+ functions: PyGTK accepts both str and unicode objects as input. str objects are assumed to be encoded in UTF-8. If you pass a non-UTF-8 str to a GTK+ function, the call itself succeeds, but as soon as the text is displayed you’ll get a warning such as “PangoWarning: Invalid UTF-8 string passed to pango_layout_set_text()”.
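One way to avoid that warning is to check a byte string before handing it to GTK+. The following is only a sketch: the helper name is_valid_utf8 and the sample strings are made up for illustration, and it assumes the offending data is Latin-1, which you would have to know from wherever the data came from.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample data: the same text in two different encodings.
latin1_bytes = u'caf\xe9'.encode('latin-1')  # a lone 0xE9 byte, not valid UTF-8
utf8_bytes = u'caf\xe9'.encode('utf-8')

# Re-encode the Latin-1 bytes as UTF-8 so they are safe to pass to GTK+.
fixed_bytes = latin1_bytes.decode('latin-1').encode('utf-8')
```

Passing a unicode object instead of a str sidesteps the problem entirely, since PyGTK accepts unicode input as well.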

Handling GTK+ return values: PyGTK functions always return strings as str objects. In most (all?) cases, the strings are encoded in UTF-8. Ideally, Python programs should use unicode strings internally, so it’s wise to convert the output of PyGTK function calls to unicode.

Example:

import gtk

label1 = gtk.Label()
label1.set_text("Some UTF-8 string")  # str input is assumed to be UTF-8.
label1.set_text(u"Some Unicode string")  # unicode input is accepted as well.
x = label1.get_text()  # x is a str object containing UTF-8 bytes.
y = unicode(x, 'utf-8')  # y is the unicode version of x.
y = x.decode('utf-8')  # Same as above.
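If you decode PyGTK return values in many places, a small wrapper keeps the conversion in one spot. This is a minimal sketch, not part of PyGTK: to_text is a hypothetical helper name, and it assumes the returned bytes are UTF-8, which the post above notes holds in most cases.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to unicode text; pass other values through."""
    # In Python 2.6 and later, bytes is an alias for str.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for a value returned by a PyGTK call such as label1.get_text().
raw = u'caf\xe9'.encode('utf-8')
text = to_text(raw)
```

A value that is already unicode is returned unchanged, so calling to_text on it a second time is harmless.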