
Thursday, 22 February 2007

Billions of Documents

“... all the features and functions of Office can be represented in XML and all your older Office documents can be moved from their binary formats into XML with 100 percent compatibility. We see our investment in XML support as the best way for us to meet customers’ interoperability needs while at the same time being compatible with the billions of documents that customers create every year.”

July 5th 2006 Chris Capossela

“A Foundation for the New World of Documents”

“Open XML standard covers the full set of features used in the existing corpus of billions of documents.”

9th December 2006 Roberto D'Angelo

“There are billions and billions of documents out there that were created in Microsoft Office. We are taking on the task of migrating all those into an open XML format that will fully preserve everything.”

19th May 2006 Brian Jones

“Given the billions of documents that have been created using Office and how many will be created in the future, this is a huge step for Microsoft and for the industry as a whole.”

22nd November 2005 Richard Godfrey

“The new Open XML file formats offer true compatibility with all of the billions of Office documents that already exist. You can take any existing Office document (well, any created in the last decade or so), open it in Office 2007 Beta 2, and then save it as an Open XML Format document that will look like the original, print like the original, and behave like the original.”

28th July 2006 Doug Mahugh

“The specification enables implementation of the standard on multiple operating systems and in heterogeneous environments, and it provides backward compatibility with billions of existing documents.”

14th February 2007 Tom Robertson and Jean Paoli

Billions of Documents


Whenever I read statements of this sort, an image of Dr. Evil from “Austin Powers” pops into my mind: he places his little pinky at the corner of his mouth, smiles, and threatens the world with weapons of mass destruction, extracting huge ransoms to further fund his plan for world domination.

The image could not be more appropriate when minions recite the same phrase, although with reference to a much geekier subject: office document formats. However, “One Hundred Billion Dollars” is at stake here too.

Throw me a frickin' bone here! Need the info!

So I just had to take this point to task. During the deliberation of Ecma 376 at the SIRIM TC4 meeting, Microsoft was represented by Stephen Mutkoski (Regional Director for Interoperability and Innovation, Microsoft Asia APAC Corporate Affairs team), who is actually based at Microsoft Singapore.

You would have thought that his credentials, and the fact that his company would fly him all the way up to Malaysia, would mean that he knew the intricacies of the Microsoft Office Open XML (MSOOXML) specification well enough to explain everything to us at the Technical Committee.

After his marketing spiel on the benefits of Ecma 376, we just had to ask:

“What exactly in the 6000 page MSOOXML specification guarantees backward compatibility with the 40 billion documents created worldwide?”

Here were some talking points he could have used to help his cause:

If he could point out specific technical definitions within MSOOXML which elaborate on the legacy issues, and explain the inclusion of the undescriptive stubs like “lineWrapLikeWord6” which simply state “utilize and duplicate the output of those applications” then perhaps we may be convinced.

If he elaborated on how MSOOXML addresses the legacy bug of treating 1900 as a leap year, instead of just replicating and propagating the problem, then perhaps there would be some justification for including the bug in the supposedly forward-looking specification.

If he could convince us that backward compatibility can only be achieved by rehashing the arcane Rich Text Format into XML, instead of architecting a robust set of File Format Import and Export Filters, then maybe we would see that their claim may be justified.

If he could persuade us that Office Macros play a small role in the backward compatibility problem, and that's why they are excluded from the MSOOXML spec, then perhaps we could forgive Ecma for leaving them out and, in their place, describing over many pages how to render 3D widgets.

However, he could do none of the above, nor did he even try. Perhaps it was all the laughing after the “billions of documents” which caused the uneasy silence.
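For readers unfamiliar with the 1900 leap year problem raised in the talking points above, here is a minimal sketch in Python. It assumes the legacy spreadsheet date system in which serial 1 is 1 January 1900 and serial 60 is treated as 29 February 1900, a day that never existed. The function name `serial_to_date` is my own for illustration and is not taken from Ecma 376.

```python
import calendar
from datetime import date, timedelta


def serial_to_date(serial: int) -> date:
    """Convert a serial from the legacy 1900 date system to a real date.

    Assumption (hypothetical helper, not from the specification): serial 1
    is 1 January 1900 and serial 60 is the phantom 29 February 1900.
    """
    if serial < 1:
        raise ValueError("serials start at 1")
    if serial == 60:
        raise ValueError("serial 60 is 29 February 1900, which never existed")
    # Every serial after the phantom day is one higher than the true day
    # count, so the epoch is shifted back by one day to compensate.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


# The Gregorian calendar says 1900 is not a leap year.
print(calendar.isleap(1900))
# The days on either side of the phantom serial.
print(serial_to_date(59))
print(serial_to_date(61))
```

Any application that reads such serials has to carry a special case like this forever, which is the point of asking why the bug was written into the specification rather than handled by a converter.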

And when Mr. Bigglesworth gets upset ...

There was a question by a fellow committee member regarding Microsoft's “Covenant not to Sue” with regards to the MSOOXML spec.

Stephen explained that Microsoft took great pains to make the licensing as clear and as friendly as possible, that Microsoft's pledge not to sue implementers of MSOOXML is sincere, and that this is confirmed by his work with Larry Rosen on making the legal aspects compatible with Open Source licenses.

So the rep from Microsoft is a lawyer. They sent a lawyer to a Technical Committee briefing. Okay, perhaps we could get more legal information then.

I asked, “Would Microsoft sue developers of office applications who implement areas which are not directly covered by Ecma 376?”

“Don't say 'sue' ...”

“But there is a 'Covenant NOT to SUE', and my limited, layman legal understanding would just assume that the opposite would be 'to sue'.”

“...”

“So specifically, if a developer independently implemented 'Macros' to be compatible with Microsoft Excel 2007, would they get sued?”

“Are Macros included in the specification?”

“ ... !?  Sorry? Are you asking me? Haven't you read the 6000 page document?”

“ ... no ....”

“Well, in the brief time I had to review Ecma 376, I could not locate a section on Macros within the large document. Also, Microsoft Excel 2007 (beta) offers an alternate file format to save in, one which “includes Macros”, which suggests that the Ecma 376 specification is deficient.

So if Macros are not included in Ecma 376, will developers who develop it independently get sued?”

“I will have to get back to you ...”

This has direct implications for the guys at Novell who are working on VBA integration with OpenOffice.org. Sure, with the new partnership between Microsoft and Novell, customers of Novell would not get sued, but the Novell team themselves and all other OpenOffice.org users are at risk here. So it's imperative that Microsoft provides an answer on whether technologies closely related to MSOOXML are also covered by this covenant, so that it is possible for third parties to develop solutions.

Macros (together with Formatting) are among the major problems in allowing users to interoperate between legacy and alternative applications, i.e. we have trouble between versions of Microsoft Office, and bigger problems if we want to use alternative applications like OpenOffice.org or Lotus 1-2-3. This severely limits choice.
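To make the Macros point concrete, here is a rough sketch of how one might check whether a saved file carries a macro payload. It relies on two assumptions of mine, not on anything stated at the meeting: that the file is a ZIP package, and that the macro project is stored in a binary part conventionally named `vbaProject.bin`. The helper names and the synthetic packages below are hypothetical stand-ins, not real Office output.

```python
import io
import zipfile


def find_macro_parts(package) -> list:
    """Return the names of parts that look like a binary VBA project.

    Assumption: the macro project lives in a part whose name ends with
    'vbaProject.bin'; this is a naming convention, not a guarantee.
    """
    with zipfile.ZipFile(package) as zf:
        return [n for n in zf.namelist() if n.lower().endswith("vbaproject.bin")]


def make_package(part_names) -> io.BytesIO:
    """Build a tiny in-memory ZIP with placeholder parts (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in part_names:
            zf.writestr(name, b"placeholder")
    buf.seek(0)
    return buf


plain = make_package(["[Content_Types].xml", "xl/workbook.xml"])
with_macros = make_package(
    ["[Content_Types].xml", "xl/workbook.xml", "xl/vbaProject.bin"]
)

print(find_macro_parts(plain))
print(find_macro_parts(with_macros))
```

Finding the part is the easy bit. Its contents are an opaque binary, so a third-party application can detect that macros are present but gets no help from the specification in reading or running them.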

I'm surprised that Barclays Bank, which sits in the Ecma TC, didn't request that Macros be included, considering they are large users of spreadsheets. If it's an issue of volume, what difference would another couple of hundred pages make to the specification?

So does MSOOXML really provide “true backward compatibility” or define a “full set of features” or “fully preserve everything” or “behave like the original”? No matter how many people in technical authority state it, this is blatant “marketing speak.”

Billions and Billions

Dr. Evil made famous the use of “one hundred billion dollars”. Because of this, I never fail to chuckle when people use the word “billion”.


Predating the Austin Powers movie franchise, however, the extensive use of the word “billions” was made famous by the great Dr. Carl Sagan. [He claimed that he never uttered the phrase “billions and billions” but he eventually succumbed and wrote a poignant book on the phrase.] Far from being Evil, he was a great critical thinker and scientist, and he helped save the world from a nuclear holocaust.

His words of advice to amazing promises such as these are:

“Extraordinary claims require extraordinary evidence.”

 

Microsoft's claims are extraordinary. We need proof.

 

 

yk.


Comments


Wol: Thanks for the info on WordPerfect. It's interesting that their file formats have not changed much over so many years while Microsoft has jumped from one format to another. My guess is that most people would ignore MSOOXML anyway and select the binary .doc as their default file format in MSOffice07.

SM: Can you be more specific on the "binary blobs" which are not documented? For example we know that Macros are still in binary and not included in the specs. Perhaps something which we can identify, and I can ask the Microsoft folks the next time I meet them.

Stephane: Your information is valuable. You should get yourself heard in more channels.

Zaine: Yes, the "billions" chant has been going on for quite a while. I guess if you say it often enough, people will start to believe it, yeh? I personally won't mind a 6000 page specification IFF it contained useful information. Like you, I find it written like a hodgepodge of what lazy programmers put together at the last minute.


yk.

Their "billions" theme sounds like a Republican talking point memo. Everyone on message, repeat the same ad nauseam, no matter the question.

What's caught my eye so far are the shallow reasons the Microsoft folks have provided for the initial set of contradictions uncovered. And every one of them is, ironically, proud that the OXML spec exceeds 6,000 pages — "If it were any less, you'd be bitchin' about that, too!"

No, no I wouldn't. No one can tell me what's in it. Even though I've read through it, it reads like the composition of a confused freshman.


Great post.

And for what it's worth, my modest contribution here : http://www.codeproject.com/useritems/office2007bin.asp

(a copy of this article was actually sent to Doug Mahugh (MS) back in August 2006, and subsequently received no backing. I wonder why...)

In fact, not only are the VBA macros not part of the specs, but even the attaching of macros to document parts is not properly documented (which is something new in OOXML, since so far VBA macros and the document's content lived 100% separately in OLE streams).

As for the lack of backwards compatibility, there is more to say. I can send a few details if that helps. Send an email over at stephane dot rodriguez at gmail dot com.

From the comments returned on the 6000 page ECMA submission, it is clear that Open XML (now called Office Open XML (OOXML) presumably to try and create confusion with OpenOffice which uses the truly open ISO/IEC 26300 ODF format) is nothing more than undocumented proprietary Microsoft binary blobs wrapped up in XML. Since the binary blobs are secret and undocumented, nobody except Microsoft can implement an application that makes use of content in those binary blobs.

This means that OOXML is just another one of Microsoft's incompatible proprietary file formats which nobody can implement fully except Microsoft, and so is completely useless for interoperability or long term storage.

OOXML is actually less interoperable than the old binary .DOC file formats, because the Microsoft patents on the "standard" can be used to prevent third party vendors implementing it properly if they ever become a threat to Microsoft, unlike the old binary formats which can be legally reverse engineered.

As for OOXML as an ISO standard, since OOXML does not define the proprietary blobs that are wrapped in the XML, Microsoft is the only company that can write applications that can fully access the actual information in the files. What is the point of an international standard that only one company can implement? The only useful function of OOXML is therefore to allow third party vendors to write OOXML save and load plugins for MS Office. Why is there any need for that when Microsoft will bundle a save and write plug-in with MS Office?

Apart from that, ODF exists as an international standard already, and according to ISO rules, existing standards must be used if they exist rather than creating new incompatible standards. To justify that OOXML should be made an ISO standard rather than incorporating its functionality into the existing ODF ISO standard, Microsoft claims that it is impossible to implement the full functionality of OOXML into ODF. To prove this is a bald-faced lie, the Open Document Foundation wrote a plug-in for MS Word called ACME 376 which does everything that OOXML does with 100% fidelity, but using extensions to ODF. You can try it here for yourself:
http://opendocument.foundation.googlepages.com/home
Of course, because Microsoft's undocumented proprietary binary blobs cannot be properly accessed by anyone other than Microsoft, like OOXML, it is only possible to get full functionality on Microsoft products that can understand the undocumented binary blobs. Hence like OOXML it is only good for writing plug-ins for MS Office, and it is completely useless for long term archiving or interoperability since the content in the binary blobs is secret and undocumented, and only accessible to Microsoft.

So MS needs to create this new stuff to provide "10 years of compatibility"? (ie back to 1997 and Office 97).

Let's "compare and contrast" with WordPerfect, which still uses the SAME file format that came out in 199*4*. I can read documents that far back, no problem. Indeed, I can create documents using the LATEST version, and somebody using WP6 (from 1994) can READ them no problem.

And WordPerfect blamed MS for forcing this change onto them - this ability would go back to the 1980s if it weren't for that! Oddly enough - you are apparently recommended to save WP documents in that 80s format for best conversion to Word ... it can't cope properly with the "new" WP file format!

Cheers,
Wol
