Reverse engineering LibreOffice
OpenDocument is a great format, but it does have its problems. One of them is that some features are left unspecified. For instance, which image formats should be supported? PNG, JPEG and GIF are the obvious ones. SVG too nowadays… But we could create a Calligra Picture Format if we wanted to, store it in an OpenDocument file… and every other office suite would be lost.
Introducing StarView Metafile
Of course, we don’t do that. It’s pointless. There are already so many formats that we should never need to create our own.
But over ten years ago… Things were different, especially in the StarOffice world.
StarOffice already had a drawing application, 20 years ago. To serialize drawings, its developers invented a format they call “GDI Metafile”, stored under the .svm extension (StarView Metafile). This format is used in StarOffice/OpenOffice.org/LibreOffice when you copy and paste drawings, for instance.
The main problem with this format is that it is completely undocumented.
So for years, in documents in your mailboxes, on the internet… you may have come across pieces of an undocumented format that no one can read except (Star|Open|Libre)Office(\.org)? (that regexp is simpler than the usual list, so I’ll use it from now on…), even if the document respects the OpenDocument specification (of course, 20 years ago, it was the StarOffice binary format… we won’t mention it at all, it’s off-topic).
Solving the blob
The SVM format is, unlike SVG or OpenDocument, based on a binary stream and not on XML or any other human-readable format. I took a shot at it using Okteta, and saw pieces of text that may describe elements, but nothing else except numbers…
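To give an idea of what that first look amounts to, here is a minimal Python sketch that lists the runs of printable ASCII text found in a binary blob, together with their offsets. It is not what Okteta does internally and it assumes nothing about the real SVM layout: the function names, the `min_length` threshold and the sample bytes are all made up for illustration.

```python
import re


def printable_runs(data: bytes, min_length: int = 4):
    """Return (offset, text) pairs for each run of printable ASCII bytes.

    Hypothetical helper: it knows nothing about SVM, it only looks for
    bytes in the range 0x20..0x7e that appear at least min_length in a row.
    """
    pattern = re.compile(
        rb"[\x20-\x7e]{" + str(min_length).encode("ascii") + rb",}"
    )
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def dump_file(path: str, min_length: int = 4) -> None:
    """Print the printable runs of a file, one per line, with hex offsets."""
    with open(path, "rb") as handle:
        for offset, text in printable_runs(handle.read(), min_length):
            print(f"{offset:08x}  {text}")


# Made-up sample bytes (NOT a real SVM file): text surrounded by binary noise.
sample = b"\x01\x00\x02hello\x00\xff\x10world!\x00ab\x03"
runs = printable_runs(sample)
```

Calling `dump_file` on a real file gives roughly the same "pieces of text" view as scrolling through a hex editor. Everything between those runs is still just numbers, which is why the source code is the next stop.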
So I went down to the LibreOffice source code directly. And it’s true:
- They still have lots of comments in German (fortunately, I speak a bit of German),
- They reinvented the wheel, at least four times,
- They have lots lots lots of old forgotten backward compatibility code…
I really respect the hard-working guys trying to clean that up. They have so much to do… And I didn’t even dig far beneath the dust…
Anyway, I started working on both implementing and documenting that format. I’ll explain how it works later, when I really understand it.
Since talking about images with only text is not really appreciated, here is one of my test SVM files, rendered with no LibreOffice code…