Tuesday, July 28, 2009

Extracting Clipart from the Office Website

Despite things like the recent GPL-licensed(!) contribution from Redmond, Micro$oft is not known for being friendly toward the open source movement. One example is the clipart available on the Microsoft Office website. It's an excellent resource for teachers and office workers. With the Office 2007 clipart manager dead in the water under Crossover Office and Wine last I checked, getting clipart to make newsletters and such in Linux is a problem.

The catch is that clipart from the Office website is downloaded as MPF files. These are simply XML files with the actual file data base64-encoded. There is a Perl script to extract the files, but it requires several modules to be downloaded from CPAN. Not too horrible for me, but definitely not something that, say, the secretary at my church would be able to tackle. No more. I have written a GPL-licensed Python script, mpfextract.py, to extract all clipart from MPF files. This'll also be a nice thing to have once Haiku has WMF support beyond having libwmf ported. :)
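
To give a feel for what "XML with base64-encoded file data" means in practice, here is a minimal sketch of the general idea. It is not mpfextract.py itself. This post doesn't spell out the MPF schema, so the sketch makes no assumptions about element names: it walks every element and keeps any text that decodes cleanly as base64. The function name `extract_base64_payloads` and the `min_length` cutoff are my own inventions for illustration.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def extract_base64_payloads(xml_text, min_length=16):
    """Return (tag, bytes) pairs for each element whose text decodes as base64.

    Heuristic only: the real MPF element names are not assumed here, and
    min_length is an arbitrary cutoff to skip short strings such as names.
    """
    payloads = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Base64 inside XML is often wrapped across lines, so drop whitespace.
        text = "".join((element.text or "").split())
        if len(text) < min_length:
            continue
        try:
            data = base64.b64decode(text, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64, so not file data
        payloads.append((element.tag, data))
    return payloads
```

Each `bytes` object that comes back could then be written to disk with `open(path, "wb")`. A proper extractor would also pull the original file names out of the XML, which this sketch doesn't attempt.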

