Open Packaging Convention (OPC)

I was reading MSDN Magazine during the week and there is an article in it on OPC. In short, OPC is a new standard for saving complex data, perhaps stored in multiple files, to a single file (a package) using a combination of two standard technologies: ZIP and XML.

Here is the article...

OPC: A New Standard For Packaging Your Data
http://msdn.microsoft.com/msdnmag/issues/07/08/OPC/default.aspx

The idea is simple: you store your data files separately and use XML to create relationships between them. All the separate files are then combined into a single ZIP file. Office 2007 uses these new file formats, calling them Open XML Formats, but there is nothing stopping you creating your own.
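To make the "create your own" idea concrete, here is a minimal sketch in Python using only the standard library. It zips a set of parts together with a `[Content_Types].xml` file and a `_rels/.rels` relationships file, which are the two pieces of XML that OPC expects in a package. The function name `build_package` and the relationship type URI are made up for this example; a real format would define its own relationship types.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

CONTENT_TYPES_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
RELATIONSHIPS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
RELS_CONTENT_TYPE = "application/vnd.openxmlformats-package.relationships+xml"

# Hypothetical relationship type, invented for this example.
EXAMPLE_REL_TYPE = "http://example.com/relationships/report-data"


def build_package(parts):
    """Zip the given parts into a single package and return its bytes.

    `parts` maps a part name such as "data/sales.xml" to a
    (content_type, payload_bytes) tuple.
    """
    overrides = []
    relationships = []
    for index, (name, (content_type, _)) in enumerate(parts.items(), start=1):
        # Declare the content type of each part.
        overrides.append(
            "<Override PartName=%s ContentType=%s/>"
            % (quoteattr("/" + name), quoteattr(content_type))
        )
        # Relate the package root to each part.
        relationships.append(
            '<Relationship Id="rId%d" Type=%s Target=%s/>'
            % (index, quoteattr(EXAMPLE_REL_TYPE), quoteattr(name))
        )

    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Types xmlns="%s">'
        '<Default Extension="rels" ContentType="%s"/>'
        "%s</Types>" % (CONTENT_TYPES_NS, RELS_CONTENT_TYPE, "".join(overrides))
    )
    rels = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Relationships xmlns="%s">%s</Relationships>'
        % (RELATIONSHIPS_NS, "".join(relationships))
    )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", content_types)
        archive.writestr("_rels/.rels", rels)
        for name, (_, payload) in parts.items():
            archive.writestr(name, payload)
    return buffer.getvalue()
```

Calling `build_package({"data/sales.xml": ("application/xml", b"<sales/>")})` gives you the bytes of a package that any ZIP tool can open and inspect.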

I'll give you an example. A Word document is made up of text, styles, maybe a macro and perhaps some images. With the new format, instead of saving all of this in one binary file, the following happens: the text is stored as a WordprocessingML (XML) file, the styles are saved in an XML file, the macros are stored in a separate binary file and each image is saved to its own file (a JPEG, for example). Relationships are then used to knit the document together and essentially describe how the document is built from each of the separate files. All the files are then zipped into a single archive and the normal .zip extension is changed (to .docx, or .docm when macros are present) to associate the single-file package with Word.
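If you want to see those relationships for yourself, a rough sketch like the following will list them. It uses only the Python standard library; the function name `list_relationships` is my own, and the only things taken from the standard are the relationships namespace and the `_rels/.rels` location of the package-level relationships part.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_relationships(package, rels_part="_rels/.rels"):
    """Return (id, type, target) tuples from one relationships part.

    `package` is a path or file-like object for any OPC package,
    for example a .docx file.
    """
    with zipfile.ZipFile(package) as archive:
        root = ET.fromstring(archive.read(rels_part))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.iter(REL_NS + "Relationship")
    ]
```

For a Word document (say a hypothetical `report.docx`), `list_relationships("report.docx")` should include an entry pointing at `word/document.xml`, and passing `rels_part="word/_rels/document.xml.rels"` should show the styles and images that the main document part pulls in.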

Here is a noob-friendly article that describes the file format, a good place to start:

Ecma Office Open XML Formats architecture guide
http://office.microsoft.com/en-us/products/HA102057841033.aspx

and here is another that's not so noob, but more complete:

Introducing the Office (2007) Open XML File Formats
http://msdn2.microsoft.com/en-us/library/ms406049.aspx

You also don't need to wait until your financially paranoid company decides to upgrade to Office 2007. There are both software development kits and Office 2003 Compatibility Packs to get you up and running (although I haven't had time to give them a proper once over).

2007 Office System: Microsoft SDK for Open XML Formats
http://www.microsoft.com/downloads/details.aspx?familyid=ad0b72fb-4a1d-4c52-bdb5-7dd7e816d046&displaylang=en

Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint 2007 File Formats
http://www.microsoft.com/downloads/details.aspx?familyid=941B3470-3AE9-4AEE-8F43-C6BB74CD1466&displaylang=en

If any of the links don't work then do a search for the title and you'll get the pages.

It has always been a nightmare trying to extract data from Office files. If the company you work for is anything like the one I work for, where the report is more important than the data held within the report, or you get many requests to extract data from Word or Excel, or even if you're fed up telling people that the 36 MB spreadsheet that they cannot produce a report from should have been a database, then you'll be happy, very happy, as sometime soon things are going to get easier.
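As a taste of how much easier, here is a rough sketch that pulls the paragraph text out of a Word 2007 document with nothing but the Python standard library. It assumes the main document part lives at `word/document.xml`, which is where Word normally puts it (strictly speaking you should follow the package relationships to find it), and the function name `extract_paragraphs` is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_paragraphs(docx):
    """Return the text of each paragraph in a .docx file as a list of strings.

    Assumes the main document part is word/document.xml.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the text of every w:t.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

This ignores tables, headers, footers and anything else outside the plain paragraph text, so treat it as a starting point rather than a finished extractor.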

Published Saturday, July 21, 2007 6:33 AM by dsmyth

