Reverse engineering LibreOffice

April 18th, 2011, posted at KDE

OpenDocument is a great format, but sure, it has its problems. One of these problems is the lack of specification for some features. For instance, which image formats should be supported? PNG, JPEG and GIF are the logical ones. SVG too, nowadays… But we could create a Calligra Picture Format if we wanted to and store it in the OpenDocument file… and every other office suite would be lost.

Introducing StarView Metafile

Of course, we don’t do that. It’s pointless. There are already so many formats that we would never need to create our own.

But over ten years ago… Things were different, especially in the StarOffice world.

StarOffice already had a drawing application, 20 years ago. To serialize drawings, they invented a format they called “GDI Metafile”, with the .svm extension (StarView Metafile). This format is used in StarOffice/OpenOffice.org/LibreOffice when you copy and paste drawings, for instance.

The main problem with this format is that it is not documented at all.

So for years, in documents in your mailboxes, on the internet… you may have come across pieces of an undocumented format that no one can read except (Star|Open|Libre)Office(\.org)? (that regexp is simpler than the usual list, I’ll use it from now on…), even if the document respects the OpenDocument specification (of course, 20 years ago, it was the StarOffice binary format… we won’t mention it at all, it’s off-topic).

Solving the blob

The SVM format is, unlike SVG or OpenDocument, based on a binary stream and not on XML or any other human-readable format. I took a shot at it using Okteta and saw pieces of text that may describe elements, but nothing beyond that except numbers…
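
That kind of first look can also be scripted. Here is a minimal sketch in Python that lists the runs of printable ASCII inside a binary blob, roughly what the text column of a hex editor such as Okteta lets you spot by eye. The function name `printable_runs` and the sample bytes are made up for illustration; they say nothing about the actual SVM layout.

```python
import re


def printable_runs(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Made-up bytes for illustration only; this is NOT a real SVM file.
sample = b"\x00\x01\x02HELLO\x00\x10\x00\x00Some label text\xff\xfe\x03ab\x00"

for offset, text in printable_runs(sample):
    print(f"{offset:#06x}  {text}")
```

To try it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. Everything that is not text stays opaque, which is exactly the problem described here.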

So I went down to the LibreOffice source code directly. And it’s true:

  1. They still have lots of comments in German (fortunately, I speak a bit of German),
  2. They reinvented the wheel, at least four times,
  3. They have lots, lots, lots of old, forgotten backward-compatibility code…

I really respect the hard-working guys trying to clean that up. They have so much to do… And I didn’t dig far beyond the dust…

Anyway, I started working on both implementing and documenting that format. I’ll explain how it works later, when I really understand it.

 

Since talking about images with only text is not really appreciated, here is one of my test SVM files, rendered without any LibreOffice code…

Who doesn’t love Calligra?

5 Responses to “Reverse engineering LibreOffice”

  1. You, Sir, are my hero! SVM is the bane of my existence, and the part of ODF which makes it nearly impossible for me to go out and push it with a straight face… But now… wow, seriously great work :)


  2. TheBlackCat says:

    Is this going to be the default format for storing images internally in Calligra, or is something more sane going to be used? Have you talked with the LibreOffice developers about switching to a more sane default format, which Calligra would also use (perhaps PNG and SVG)?


    moi Reply:

    There is no way this can be the default format for Calligra (I refuse to write a generator, only a parser should be written), and it’s not exactly the default format of OpenOffice either. Well, it sort of is, but I don’t know in which cases exactly.
    As of recent releases, LibreOffice and OpenOffice.org support SVG, making it a much saner format to choose, but still, there are thousands of documents containing SVM files.


  3. Jos says:

    Excellent!
    I’m very curious about this format. Are the LibreOffice / OpenOffice guys helping here? Surely having documentation will help them sensibly clean up their code.


    moi Reply:

    They haven’t provided any kind of documentation in all these years…
    So I just decided to go ahead and do it myself.

