Playing with Unicode in Python 2 is not fun, and combining it with third-party libraries brings even more headaches. This post explains how PyGTK handles Unicode.
Note: This information is only valid for Python 2.x. It will likely change when PyGTK releases support for Python 3.
Calling GTK+ functions: PyGTK accepts unicode objects as input. str objects are assumed to be in UTF-8. If you pass a non-UTF-8 str to a GTK+ function, it will work until you try to display it, at which point you’ll get a “PangoWarning: Invalid UTF-8 string passed to pango_layout_set_text()”.
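As a rough illustration of the rule above, the following sketch checks whether a byte string is valid UTF-8 before it is handed to a GTK+ function, and re-encodes it if it is not. The helper names is_valid_utf8 and ensure_utf8, and the Latin-1 fallback encoding, are assumptions made for this example; they are not part of PyGTK.

```python
def is_valid_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def ensure_utf8(data, fallback_encoding="latin-1"):
    """Return `data` as a UTF-8 encoded byte string.

    Hypothetical helper: `fallback_encoding` is only a guess at the
    original encoding and has to be chosen by the caller.
    """
    if is_valid_utf8(data):
        return data
    # Decode from the assumed source encoding, then re-encode as UTF-8.
    return data.decode(fallback_encoding).encode("utf-8")


latin1_bytes = b"caf\xe9"  # "cafe" with an accented e, in Latin-1 (invalid UTF-8)
utf8_bytes = ensure_utf8(latin1_bytes)
# utf8_bytes is now valid UTF-8 and could be passed to label1.set_text().
```

The fallback only helps if you actually know the source encoding; where possible, it is simpler to pass unicode objects and avoid the question.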
Handling GTK+ return values: PyGTK functions always return strings as str objects. In most (all?) cases, the strings are encoded in UTF-8. Ideally, Python programs should use unicode strings internally, so it’s wise to convert the output of PyGTK function calls to unicode.
    import gtk

    label1 = gtk.Label()
    label1.set_text("Some UTF-8 string")
    label1.set_text(u"Some Unicode string")
    x = label1.get_text()    # x is an str object containing UTF-8 string.
    y = unicode(x, 'utf-8')  # y is the unicode version of x.
    y = x.decode('utf-8')    # Same as above.
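One way to apply this advice consistently is to decode at the boundary, in a single helper, so the rest of the program only sees unicode objects. The sketch below uses a hypothetical helper name, to_unicode, and assumes the returned strings are UTF-8 as described above. It does not import gtk, so it runs on its own with a stand-in byte string.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode a byte string returned by a PyGTK call into a unicode object.

    Hypothetical helper, not part of PyGTK. In Python 2, `bytes` is an
    alias for `str`, so this matches the str objects PyGTK returns.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # Already unicode (or None): pass through unchanged.


raw = b"caf\xc3\xa9"    # Stand-in for the str returned by label1.get_text()
text = to_unicode(raw)  # unicode object with 4 characters
```

In a real program this would be called as to_unicode(label1.get_text()), and passing an object that is already unicode is harmless.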