PyGTK 2.22+ binaries for Windows

2010-10-26

These are actually development versions of PyGObject and PyGTK as downloaded on 2010-10-27, not any specific release versions. Credits to John Stowers, who maintains PyGTK’s Windows port.

Update (2011-01-08): pygtk.org now provides official packages for PyGTK 2.22. I’ve removed all download links to my own builds.

NumPy dependency

PyGTK uses NumPy in one function and this is not documented in the Win32 README. There is some automated check to disable NumPy support if it’s not available, but it fails on my machine because I have NumPy’s binaries installed; PyGTK’s setup.py assumes I also have the include files, which I don’t.

No big deal; --disable-numpy.

pkg-config and space characters

pkg-config only gained support for space characters five months ago, and that’s still in their development version. (Insert expletives here.)

I don’t think I’ve used C:\PROGRA~1-style paths in years. Thanks for the nostalgia, pkg-config.


Facebook doesn’t confirm e-mail addresses

2010-09-17

I don’t have a Facebook account.

If Facebook has been harassing you with messages that seemingly came from me, please take it up with them.

I should have received an e-mail when somebody registered an account under my e-mail address. Evidently, either the FAQ lies, or I threw the mail to the spam bin ages ago and forgot about it. The latter is more likely, but that does not excuse websites from applying a simple “confirm before use” policy.

Update: What makes me even angrier is that whoever registered under my e-mail address was able to see my list of potential contacts.


Chrome/Chromium with GPU acceleration

2010-09-14

Chromium just switched on GPU acceleration. There was a changeset in the source tree a few hours ago causing some test failures, prompting me to check what interesting development was happening. Apparently they changed the command-line switch needed to enable GPU acceleration, which seems to indicate that they wanted it enabled by default. This is likely a reaction to yesterday’s Slashdot coverage on GPU acceleration in web browsers.

Image showing Chromium automated test results with more than 20 failures

The first thing I noticed when using this Chromium build was its bugginess. As in crash-your-tabs buggy. There were quite a lot of test failures caused by the change.

The other problem could only be seen in pages with HTML5 Video (and most likely Canvas as well): the text in these pages looked blurry.

Image showing blurry font rendering

Compare with normal rendering:

Image showing sharp font rendering

(Update: r59324 has been reverted, so no GPU acceleration for now.)
(Update (2010-09-18): Looks like it’s back. I haven’t noticed any particular instabilities, so that’s great. Blurry fonts I can manage; that’s also what Firefox nightly looks like, and I’m starting to suspect my graphics driver.)

At least now we know that GPU acceleration in Chromium is not ready yet. I also know that whatever technique these browsers use to render pages on the GPU doesn’t sit well with mine (an Intel chip).


Pango: Determine if a font is monospaced

2010-09-05

If you have a GtkFontButton, finding out whether the chosen font is monospaced is quite a complicated process. Here is a complete walk-through.

(By the way, I will be using PyGTK’s Pango documentation because the C version is a mess.)

FontButton.get_font_name returns the font family (a.k.a. “font name”), style, and size; for example, “Liberation Serif Italic 14”. The first thing we need to do is pick just the family name. We do this by going through a PangoFontDescription.

desc_str = font_button.get_font_name()  # Liberation Serif Italic 14
desc = pango.FontDescription(desc_str)
family_name = desc.get_family()  # Liberation Serif

Next, check whether the font family describes a monospaced font. Here is where it gets dodgy. We need an arbitrary PangoContext, which can be obtained from a GtkWidget using Widget.get_pango_context. We then list all available font families and find the one with the appropriate name. Call FontFamily.is_monospace to finish the job.

(By the way, this is also a good place to show off Python’s for-else construct.)

context = widget.get_pango_context()  # widget can be any GtkWidget.
for family in context.list_families():
	if family.get_name() == family_name:
		break
else:  # Should not happen.
	assert False
family.is_monospace()  # False -- Liberation Serif is proportional.
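
If you would rather not assert when the family is missing, the lookup can be wrapped in a small helper that returns None instead. This is only a sketch: find_family and is_monospace_font are hypothetical names, not part of PyGTK, and FakeFamily is a made-up stand-in so the helper can be tried without a display. In real use you would pass context.list_families() as the first argument.

```python
def find_family(families, family_name):
    """Return the family object whose name matches, or None."""
    for family in families:
        if family.get_name() == family_name:
            return family
    return None


def is_monospace_font(families, family_name):
    """Return True or False, or None if the family is not in the list."""
    family = find_family(families, family_name)
    if family is None:
        return None
    return bool(family.is_monospace())


# Hypothetical stand-in for pango.FontFamily, for demonstration only.
class FakeFamily(object):
    def __init__(self, name, monospace):
        self._name = name
        self._monospace = monospace

    def get_name(self):
        return self._name

    def is_monospace(self):
        return self._monospace


families = [FakeFamily("Liberation Serif", False),
            FakeFamily("Liberation Mono", True)]
serif_is_mono = is_monospace_font(families, "Liberation Serif")
missing_is_mono = is_monospace_font(families, "No Such Font")
```

Returning None for an unknown family lets the caller decide what to do, which is probably friendlier than an assertion when the font name comes from a saved setting.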


Win32 Python: getting user’s display name using ctypes

2010-06-19

This post explains how you can obtain the user’s display name (a.k.a. “real name” or “full name”) in Windows, using Python’s ctypes module. However, it also serves as a mini tutorial/demonstration of ctypes.

First, a bit of background. I researched this while working on a patch for Jokosher. When you create a new project in Jokosher, it will prompt you with a dialog asking for the name of the project and so on. One of the fields in this dialog is the Author field, which by default should be filled with the logged-in user’s real name. While there are several ways to get the user’s login name (a.k.a. “username”), there is no easy way to get their real name in Windows.

This is where ctypes and GetUserNameEx come in. ctypes is a Python library that lets you call C functions. GetUserNameEx is the C function in Win32 API that we want to call.

For the impatient, here is the full code. Continue reading if you want to know how it works and maybe learn a bit about ctypes. Otherwise, copy away. Note, however, that it does not have any error checking whatsoever.

import ctypes

def get_display_name():
	GetUserNameEx = ctypes.windll.secur32.GetUserNameExW
	NameDisplay = 3

	size = ctypes.pointer(ctypes.c_ulong(0))
	GetUserNameEx(NameDisplay, None, size)

	nameBuffer = ctypes.create_unicode_buffer(size.contents.value)
	GetUserNameEx(NameDisplay, nameBuffer, size)
	return nameBuffer.value



Using matplotlib in a Web application

2010-06-11

matplotlib’s FAQ has a section dealing with the exact topic of this post: using matplotlib in a web application server. The problem is that I couldn’t find it easily from my Web searches. What I’m doing here is adding my own twist to the answer, and hopefully making it slightly more search-friendly.

While testing a Web app that I was working on, I noticed that it would often hang. At first I dismissed it as a server problem, but it kept occurring on one particular page. A few hours and many head scratches later, I narrowed the problem down to matplotlib.

# Negative example; do not use.

import matplotlib.pyplot as plt

def callback():
	# ... (process data)
	fig = plt.figure()
	# ... (draw stuff)
	fig.savefig(path)

I used matplotlib to draw a plot and save it to a file. The code was quite long, but it involved steps similar to the above listing. The first time it ran, everything went OK. The second time, it always hung at the pyplot.figure call. This smelled like a threading / deadlock problem, so I tried to put a lock on the pyplot calls (which I should have done anyway, considering pyplot operates on a single plot at a time). Still, it didn’t work.

After some Web searches and more head scratching, I accidentally arrived at the FAQ entry mentioned earlier.

Here’s the gist of the available solutions.

First option: configure matplotlib to use the Anti-Grain Geometry backend. Continue using pyplot, carefully grouping its commands together and surrounding them with a lock.

from threading import Lock
lock = Lock()

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

def callback():
	# ... (process data)
	with lock:
		fig = plt.figure()
		# ... (draw stuff)
		fig.savefig(path)

Second, better option: dump pyplot and use matplotlib’s object-oriented API instead. For this one, you don’t need to care about threads or locking or whatever.

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

def callback():
	# ... (process data)
	fig = Figure()
	canvas = FigureCanvas(fig)
	# ... (draw stuff)
	canvas.print_figure(path)
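
For a slightly fuller picture, here is a sketch of the same approach with the drawing step filled in. The render_plot name and the sample data are made up for illustration, and writing to an in-memory buffer instead of a path is my own choice here (handy when the image goes straight into an HTTP response); it assumes matplotlib is installed.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def render_plot(xs, ys):
    """Draw a simple line plot and return it as PNG bytes."""
    fig = Figure()
    canvas = FigureCanvas(fig)  # Attach the Agg canvas to the figure.
    ax = fig.add_subplot(111)
    ax.plot(xs, ys)
    buf = io.BytesIO()
    canvas.print_figure(buf, format="png")
    return buf.getvalue()


png_data = render_plot([0, 1, 2, 3], [0, 1, 4, 9])
```

Each call builds its own Figure and canvas and never touches pyplot’s global state, which is why no lock is needed.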


Converting MDB to CSV using OOo/LibO

2010-05-18

This post explains the steps necessary to convert a Microsoft Access database file into CSV, using OpenOffice.org (or LibreOffice) Base.

  1. OOo can’t open an MDB file natively. You have to create a new ODB file that connects to the MDB file as a “remote” database. Just don’t try to move around the files.
  2. Copy the tables over into a different, native ODF database (ODB) file. This may sound arbitrary, but it’s necessary; otherwise, the next step would either be impossible or unbearable.
  3. Base doesn’t have an “export to CSV” menu entry. You have two options:
    1. Run the following SQL command (Tools → SQL…).
      SELECT * INTO TEXT "filename_without_ext" FROM "TableName"
      

      This command only works on a native ODB because Access’s SQL engine doesn’t support the INTO TEXT construct.

      Note that this actually creates an external table in the database (if it doesn’t show up, reload the database). As a consequence, you can’t use the same name for the CSV file as an existing table.

    2. Copy the table into Calc and export it to CSV. This would have been extremely slow if done from the non-native DB.

Credits to the people at an OOoForum.org thread who figured this out.


User agent madness

2010-05-15

Have you seen Chrome/Chromium’s user agent string?

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.8 (KHTML, like Gecko) Chrome/5.0.394.0 Safari/533.8

I’m surprised they didn’t throw IE and Opera in there as well, for good measure.

I know of the compatibility arguments, but really, this is absurd.

Update (2010-05-26): Compare with Midori’s UA string:

Midori/0.2 (X11; Linux; U; en-us) WebKit/531.2+


EBML/Matroska parser for Python

2010-04-28

This post explains an EBML parser that I wrote in Python. (EBML is Matroska’s binary “markup language”.) It is implemented as a single-file library and is available under a free software licence.

Background

I’ve been working to implement Matroska (mka, mkv, webm) tag-reading support in Exaile. Mutagen—the tag library that we use—currently doesn’t have this feature, so I looked elsewhere.

[Update (2010-06-16): News about Exaile 0.3.2, location of the Matroska parser in Exaile's source tree, and discussion on WebM support.]

Choices

Previously, I had a working solution using hachoir-metadata, but it doesn’t really make sense to depend on another large tagging library when we’re already using Mutagen. To make matters worse, I accidentally deleted the branch during our recent Bazaar upgrade problem.

I started shopping around for other possible solutions and found videoparser, which seemed quite nice and compact. It’s still a different library, though, and it doesn’t seem to be packaged in Debian.

I was considering just using it anyway for yet another temporary hack when I chanced on MatroskaParser.pm, a Perl library written by “Omion (on HA)”. It’s only 816 lines of Perl; discounting the README and the Matroska elements table, we’re looking at fewer than 450.

Solution

I took the crazy decision of translating MatroskaParser.pm into Python. Despite the horror stories out there about Perl, this particular code is written in a style that is extremely readable if you’re somewhat familiar with the language.

Well, I’ve finished the porting: 250 lines of EBML parser written in Python. Parts of MatroskaParser.pm that are not relevant—mainly the validity checker and the Block parser—have been removed, and the output data structure has been simplified. The next job is to actually extract tags out of the structure.

Matroska tags

Matroska tags are quite different from MP3 and Vorbis tags, in that they’re not just a flat list of key-value pairs. Consider the following snippet.

[{'SimpleTag': [{'TagDefault': [1],
                 'TagLanguage': ['und'],
                 'TagName': ['TITLE'],
                 'TagString': ['Light + Shade']},
                {'TagDefault': [1],
                 'TagLanguage': ['und'],
                 'TagName': ['ARTIST'],
                 'TagString': ['Mike Oldfield']}],
  'Targets': [{'TargetTypevalue': [50]}]},
 {'SimpleTag': [{'TagDefault': [1],
                 'TagLanguage': ['und'],
                 'TagName': ['TITLE'],
                 'TagString': ['Surfing']}],
  'Targets': [{'TargetTypevalue': [30]}]}]

There are two types of tags in this example. The first (target type: 50) describes the album (title: Light + Shade, artist: Mike Oldfield), while the second (target type: 30) describes the track (title: Surfing). Translating this structure into tags that Exaile can understand is not hard; it just needs a bit of planning.

(By the way, notice that Matroska makes implementing album artists / compilation albums very intuitive: you can have an artist tag at album level, and another at track level. There are even other levels specified.)
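
As a rough illustration of that planning, the nested structure can be regrouped by target type first. This is a sketch, not the code that went into Exaile: group_tags_by_target is a hypothetical name, the sample data is a trimmed copy of the snippet above, and falling back to 50 when no target type is given is an assumption based on the default in the Matroska specification.

```python
tags = [
    {'SimpleTag': [{'TagName': ['TITLE'], 'TagString': ['Light + Shade']},
                   {'TagName': ['ARTIST'], 'TagString': ['Mike Oldfield']}],
     'Targets': [{'TargetTypevalue': [50]}]},
    {'SimpleTag': [{'TagName': ['TITLE'], 'TagString': ['Surfing']}],
     'Targets': [{'TargetTypevalue': [30]}]},
]


def group_tags_by_target(tags):
    """Map each target type value to a {tag name: [strings]} dict."""
    grouped = {}
    for tag in tags:
        targets = tag.get('Targets', [{}])[0]
        # Assumption: a missing target type means album level (50).
        level = targets.get('TargetTypevalue', [50])[0]
        names = grouped.setdefault(level, {})
        for simple in tag.get('SimpleTag', []):
            name = simple['TagName'][0]
            names.setdefault(name, []).extend(simple.get('TagString', []))
    return grouped


grouped = group_tags_by_target(tags)
album_title = grouped[50]['TITLE'][0]
album_artist = grouped[50]['ARTIST'][0]
track_title = grouped[30]['TITLE'][0]
```

With the tags grouped like this, the album-level artist can serve as the album artist whenever the track level carries its own artist tag.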

Another tricky part is getting the track length out of the structure. Under /Segment/Info, you’ll find something like

[{'Duration': [14821615.0],
  'TimecodeScale': [22674]}]

At first I randomly assumed the duration is specified in seconds, and got around 171 days as output, which is obviously wrong. Apparently you need to apply this formula to get the length in seconds:

Length = Duration * TimecodeScale / 10^9
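
Plugging in the values above gives a sanity check. TimecodeScale is the number of nanoseconds per tick, so the product is a length in nanoseconds; length_in_seconds is just an illustrative name, not a function from the parser.

```python
def length_in_seconds(duration, timecode_scale):
    """Duration is in ticks; TimecodeScale is nanoseconds per tick."""
    return duration * timecode_scale / 1e9


info = [{'Duration': [14821615.0], 'TimecodeScale': [22674]}]
seconds = length_in_seconds(info[0]['Duration'][0],
                            info[0]['TimecodeScale'][0])
minutes, rest = divmod(int(round(seconds)), 60)
```

That comes out at about 336 seconds, or 5 minutes 36 seconds, which is a far more believable track length than 171 days.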

Code

The code is now available at Exaile’s repository. It’s licensed under GPL 2+ with the standard Exaile exception, although I will consider relicensing it if there is interest.

Notice that the last 100-or-so lines make up the Matroska tagging part. Depending on your needs, you may need to expand the list of elements based on either MatroskaParser.pm or the Matroska specification.

Future

Matroska read-only tag support will be in Exaile 0.3.2. Maybe one day I’ll add write support and integrate the whole thing into Mutagen, but don’t count on it. If anyone wants to do it, I’m more than happy to help.

My next goal is to create a subclass of the EBML parser that uses GIO. It probably won’t be relevant to most people, so just be aware.

What about WebM?

Funny how I made this post shortly before WebM was announced. Coincidence? Yes, unfortunately; I’m not as cool as the Mozilla and Opera people, who were let in on Google’s secret.

At this point, the WebM container is mostly just a subset of Matroska (the only incompatibility I’ve noticed is the change in doctype, from matroska to webm). As far as I know, they use the exact same EBML structure for tags, so there’s no reason Exaile or this code shouldn’t be able to read tags from a WebM file.


Google presentations: Go, mobile optimisation

2010-03-17

There were two short presentations by Google earlier today at my university. The first was on the Go programming language, the second on optimising the Google Maps API for mobile devices.

This post serves as rough notes for the things I learned. I’m not sure everything here is correct, considering this is all stuff I’ve never researched before.

Go

  • Concurrent programming language. Déjà vu.
  • Weird type system: static typing, no inheritance, implicit interface declaration.
  • Composition = Ruby mixins?
  • Goroutines = coroutines = Erlang processes.
  • My guess is this will end up like D, Pike, Groovy, Nemerle, et al., but who knows. The syntax is certainly nicer than Erlang.

Google Maps optimisation

  • Situation
    • JavaScript parsing takes ages.
    • 3G is high latency but high bandwidth.
  • Solution
    • Static initial image (standard practice among UI people, but interesting application).
    • Less code (well duh).
    • MVC (not very clear how).
