Existing corpus of binary documents.
Malaysia raised 23 comments against the OOXML Draft standard at the end of the five-month review period. By January 14th, Ecma had responded with a set of proposed dispositions for Malaysia to review. One of the items we raised was about how OOXML handles dates.
I wrote a blog post on this issue "Malaysia's history is ill-formed" back in June 2007, to illustrate what is wrong with the spec, why it's not good for users and how to fix it. I even discovered that the predecessor to MSOOXML, the Microsoft Office 2003 XML file format (MSO2K3XML), actually solved the problem by fully implementing ISO 8601 date format!

MSO2K7 actually supports ISO 8601 dates ... but ironically, only when it writes to OOXML's predecessor file format MSO2K3XML, which was touted as the next big thing in 2003 but was forgotten within the year.
Ecma's response:
Proposed Disposition
We agree that it is important for SpreadsheetML to support ISO 8601, and we propose the following changes to allow dates in the ISO 8601 format. These changes update the description of date and time representation within SpreadsheetML. They also add the necessary schema to support the added format, and they provide examples of using ISO 8601 dates in SpreadsheetML.
Additionally, there was a request to remove one of the existing serial date base systems, but in order to maintain compatibility with the existing corpus of binary documents, those date bases will remain in the specification
To its credit it goes on to describe how to use the ISO 8601 date formats within the spec. But what is peculiar is that it still includes the old way of using the serial date base within the spec. The new ISO dates are only "recommended" while the old serial dates are still normative (i.e. must be implemented if a third party needs to write applications to support OOXML).
Corpus Porkus
The justification for including the arcane date encoding method is this: "maintain compatibility with the existing corpus of binary documents". Ecma would argue that they are protecting current customers' interests, in that those customers have a huge investment in documents in the old format, and that OOXML should therefore be compatible with them.
There are two ways of handling this "existing corpus of binary documents". The first is to leave it as it is and hope that filters will always be around in the future to open the files. The second is to convert the documents to an archival format or migrate them to a future-proof file format.
The first way is risky. People would not like to risk relying on a vendor, let alone a tech company, to be around for the next 50 to 100 years. The second way is possible, but painful in the short term.
Do nothing or Convert. Those are the choices. OOXML is not a magic device which is "compatible" with any of the binary file formats. It's merely another file format.
The magic lies in the file converters or translators. Translators know how to convert from one format to another. The more a translator knows about the two formats, the higher the fidelity of the conversion. So if Ecma was really worried about the existing corpus of binary documents, then it should start work on fully specifying the Binary File Format (BIFF) of Microsoft Office, macros and all. Hopefully we hear good news tomorrow (15th Feb), when it's due to be (re-)released by Microsoft. And pigs may fly over the Petronas Twin Towers 2.
So it doesn't matter if the application needs to hold its dates internally as a serial number. OOXML can easily save them in the ISO 8601 date format. All the translation between string and serial number is done during the File Load/Save process, where the computer converts the internal representation to a user-friendly, XML-readable format.
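To show how little work that Load/Save translation is, here is a minimal sketch in Python. It is not taken from the spec or from any real product: the function names and the `date1904` flag are my own, and I am assuming whole-day serials in the two familiar spreadsheet date bases (the 1900 base, which counts a non-existent 29 February 1900 as serial 60, and the 1904 base, where serial 0 is 1 January 1904).

```python
from datetime import date, timedelta

# Assumed epochs for the two common spreadsheet date bases.
EPOCH_1900 = date(1899, 12, 31)  # so that serial 1 is 1900-01-01
EPOCH_1904 = date(1904, 1, 1)    # so that serial 0 is 1904-01-01


def serial_to_iso(serial: int, date1904: bool = False) -> str:
    """On Save: turn an internal serial day number into an ISO 8601 date."""
    if date1904:
        return (EPOCH_1904 + timedelta(days=serial)).isoformat()
    if serial == 60:
        # The 1900 base treats 1900 as a leap year; this day never existed.
        raise ValueError("serial 60 maps to the non-existent 1900-02-29")
    if serial > 60:
        serial -= 1  # skip over the phantom leap day
    return (EPOCH_1900 + timedelta(days=serial)).isoformat()


def iso_to_serial(text: str, date1904: bool = False) -> int:
    """On Load: turn an ISO 8601 date back into the internal serial number."""
    d = date.fromisoformat(text)
    if date1904:
        return (d - EPOCH_1904).days
    serial = (d - EPOCH_1900).days
    if serial >= 60:
        serial += 1  # re-insert the phantom leap day
    return serial


print(iso_to_serial("2008-02-14"))                 # 39492
print(iso_to_serial("2008-02-14", date1904=True))  # 38030
print(serial_to_iso(39492))                        # 2008-02-14
```

The quirks of the old date bases stay inside these two functions. Nothing in the file on disk needs to know about them.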
If they say "Oh no! Then it will not be compatible with our customers' customized programs which go through millions of Excel rows!", the answer is simple. Your customers will have to re-write all their customized programs anyway to support your new file format OOXML, regardless of whether you use ISO 8601 or serial numbers.
If they say "But there already exist a few thousand Ecma 376 files in the wild which have encoded dates as serial numbers!", I can only respond: Bad, bad excuse. It's the vendor's risk to base its product on a file format which is immature and liable to change during the standardisation process. Don't expect the National Bodies to have to compromise just because of a vendor's zeal to rush out a product in the full knowledge that the standardisation process will cause the specifications to change. The vendor has to deal with it, which means product recalls, patches or freely available 'converters'.
[Update, 1am 15th Feb: I just remembered that Doug Mahugh had something to say about the risks of implementing Ecma 376 content creators before DIS 29500 was approved:
"Well, it's too early for other vendors to commit to this file format. After the BRM (Ballot Resolution Meeting - in February 2008) there may be changes to it, so it is risky, and may not make commercial sense to implement OpenXML as it is at the moment."]
Back to the date issue.
Malaysia is obviously not the only one who is worried about this issue. The Czech Republic, Denmark, France, Britain, India, Ireland, Kenya, the Philippines, the USA, and many others have all submitted comments on this. What is interesting is the work of Antonis Christofides, a representative of the National Technical University of Athens and a committee member of the Greek National Body. He reviewed this item thoroughly and prepared a constructive document which is worth reading.
"Alternative Disposition on Dates"
It's still in its draft stage, but it seems very well thought out. It is presented in a very visual manner and is easy to understand. The solution is elegant in that ultimately only one form of encoding is used, while the hard work is done during the conversion process, which handles the fringe case of a numeric formula result being displayed as a date.
The Greek contribution goes on to compare how other applications have handled this problem, and demonstrates prior work on how it can be done:
The way to address the problem is similar to what has been done in OpenOffice and Open Document Format (ODF). ODF dictates that timestamps are stored as timestamps, leaving it to the application to handle legacy conversions. While OpenOffice Calc apparently treats timestamps in the same way as Microsoft Excel, in fact it includes underlying conversions so that it properly stores timestamps as required by ODF.
I think the Greek solution is sound, and I would recommend that the National Bodies support it during the BRM in a few weeks' time.
The ramifications
Of course this is not just about dates. It's about all the issues where the excuse by the Ecma proposition was "No we can't fix the spec, we got to keep the way things are, because that's how it's always been and there are billions of documents out there. So tell you what we can do, let's just add in your suggestion as an additional solution, just to complicate matters further. kthxbai"
To highlight the additional complexities, the proposed disposition also includes new XML elements called valIso, maxValIso and minValIso. This is to complement the val, maxVal and minVal within the spec. Oh, if valIso is in the cell, use it and ignore val. But val is not really val as it depends on the epoch and if the year is 1900.
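Here is roughly what that rule means for anyone writing a reader. This is only a hypothetical sketch: the `attrs` dictionary, the `resolve_date` function and the `date1904` flag are my own inventions for illustration, and I am assuming whole-day values and the usual 1900 and 1904 date bases rather than quoting the exact wording of the disposition.

```python
from datetime import date, timedelta


def resolve_date(attrs: dict, date1904: bool = False) -> date:
    """Hypothetical reader logic: prefer valIso, fall back to serial val."""
    if "valIso" in attrs:
        # New path: the ISO 8601 value wins and val is ignored.
        return date.fromisoformat(attrs["valIso"])

    # Old path: the meaning of val depends on the workbook's date base.
    serial = int(attrs["val"])
    if date1904:
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 1900-02-29")
    if serial > 60:
        serial -= 1  # the 1900 base counts a leap day that never happened
    return date(1899, 12, 31) + timedelta(days=serial)


# Three spellings of the same day that a reader has to accept.
print(resolve_date({"val": "39492", "valIso": "2008-02-14"}))
print(resolve_date({"val": "39492"}))
print(resolve_date({"val": "38030"}, date1904=True))
```

Every implementer carries both branches forever. With a single ISO 8601 encoding, only the first three lines of the function would be needed.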
The "existing corpus of binary documents" is Ecma's stock answer to most of Malaysia's comments. Instead of cleaning things up, they give the impression that they are brushing things under the carpet and putting the burden of document fidelity on the shoulders of future developers instead of addressing it today. This is a fixable problem which can be handled by today's conversion software. Let's put an end to the propagation of 20-year-old bugs once and for all.
yk.
[Update, 1am 15th Feb 2008: Identification of the Greek committee member, and Doug Mahugh's comment on the risks of implementing Ecma 376 before it's ratified as an ISO standard]

How do you convert Macros from Excell97 to ECMA376?
Posted by: zoobab | Thursday, 14 February 2008 at 08:48 PM