
Friday, 07 March 2008

MSOOXML BRM - XML Names

[Preamble: my opinions are mine alone, and may not represent Malaysia's official positions. We have "Technical Committees", "Industrial Standards Committees", the "Department of Standards" and "Ministries" to decide on this matter. The following is merely what I, as an unpaid volunteer and n00b to XML, observed and think about this whole thing. My memory is not perfect. If you feel I misrepresented anyone or any ideas, please comment below and I WILL correct it. Promise!]

Now that the Edited Notes of the BRM have appeared, I have been getting queries on Malaysia's decisions at the BRM last week. So I guess the best way to answer is to blog about it, and hopefully we get the story straight and don't jump to any unwanted conclusions. After all, the rather concise notes of the BRM hide the extensive dialogue exchanged on each resolution.

Due to the large number of issues in contention, it was decided that countries would raise issues in a round-robin fashion, in alphabetical order, each putting forward the issues it considered most important for deliberation. Malaysia was the last country to take a turn on the first day. Lucky us.

Our first item was the issue of XML Names (MY-0016): there seems to be no logical and consistent naming convention in the Office Open XML (OOXML) spec. For example, 'scrgbClr', 'blurRad', 'dir' and 'algn' are names that follow different naming conventions, yet all of them appear within one XML element, 'outerShdw'. I detailed this in a blog post entitled "OOXML has poor element names" more than a year ago now.
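To make "no logical and consistent naming convention" a bit more concrete, here is a small Python sketch that splits each of the four names above into its camelCase fragments and flags the fragments that are not whole English words. The tiny `ENGLISH_WORDS` set is a hypothetical stand-in for a real dictionary, and the helper names are mine, not from any existing tool.

```python
import re

# The four names cited above, all used on or inside the 'outerShdw' element.
NAMES = ["scrgbClr", "blurRad", "dir", "algn"]

# Hypothetical stand-in for a real English dictionary.
ENGLISH_WORDS = {"blur", "radius", "direction", "align", "alignment", "color", "colour"}


def fragments(name: str) -> list[str]:
    """Split a lowerCamelCase name into lowercase fragments: 'blurRad' -> ['blur', 'rad']."""
    return [part.lower() for part in re.findall(r"[a-z]+|[A-Z][a-z]*", name)]


def abbreviated_fragments(name: str) -> list[str]:
    """Return the fragments of `name` that are not whole words in ENGLISH_WORDS."""
    return [f for f in fragments(name) if f not in ENGLISH_WORDS]


if __name__ == "__main__":
    for name in NAMES:
        print(f"{name:10} fragments={fragments(name)} abbreviated={abbreviated_fragments(name)}")
```

Every name ends up with at least one abbreviated fragment, and each one abbreviates in its own way: dropped vowels ('algn', 'clr'), truncation ('dir', 'rad') and run-together letters ('scrgb'). A written convention is exactly what would let a check like this decide which of those styles is allowed.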

In that post, I argued that these names should at least conform to Microsoft's own developer's manual, "Code Complete" by Steve McConnell. Additionally, the ISO/IEC Directives, Part 2 "Rules for the structure and drafting of International Standards" (2004) clearly state (page 11, my emphasis):

4.4 Consistency of documents

In order to achieve the aim of consistency within the complete corpus of documents published by ISO and IEC, the text of every document shall be in accordance with the relevant provisions of existing basic documents published by ISO and IEC. This relates particularly to

a) standardized terminology,
b) principles and methods of terminology,
c) quantities, units and their symbols,
d) abbreviated terms,
e) bibliographic references,
f) technical drawings and diagrams,
g) technical documentation, and
h) graphical symbols.

Ecma did provide a response to this concern in its proposed dispositions on the 14th of January. It was not satisfactory. To summarise, their explanation was:

  1. element names (short or long) have always been an open debate in XML
  2. it's a personal preference rather than a technical issue
  3. terseness is for performance and document size (<c> instead of <cell>)
  4. there is already consistency among names, but it could be improved during maintenance
  5. there should be a "consistent abbreviation principle"
  6. DSRL is recommended for dynamically mapping XML names to users' preferences

Now I don't know where or how to start addressing this response. Is there an 'open debate' on XML names? Among the design goals of XML are that "XML documents should be human-legible and reasonably clear" and that "terseness in XML markup is of minimal importance." Performance? MSOOXML documents are zipped, and the compressor replaces frequently repeated element names with short back-references, so long names add very little to the file size. So it becomes a non-issue. Of course we want a consistent abbreviation principle. That's exactly what this concern requests.
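Here is a rough way to test the size argument, sketched in Python under my own assumptions: build the same toy spreadsheet-like document twice, once with terse tags (`<c>`, `<r>`) and once with spelled-out ones (`<cell>`, `<row>`), then compress both with `zlib`, which implements DEFLATE, the method ZIP archives normally use. The document layout and the `make_sheet` helper are made up for illustration; this is not real SpreadsheetML, and it only measures stored size, not parsing speed.

```python
import zlib


def make_sheet(cell_tag: str, row_tag: str, rows: int = 1000, cols: int = 10) -> bytes:
    """Build a toy spreadsheet-like XML document (not real SpreadsheetML) using the given tag names."""
    parts = ["<sheetData>"]
    for r in range(rows):
        cells = "".join(f"<{cell_tag}>{r * cols + c}</{cell_tag}>" for c in range(cols))
        parts.append(f"<{row_tag}>{cells}</{row_tag}>")
    parts.append("</sheetData>")
    return "".join(parts).encode("utf-8")


# Same content, terse names versus spelled-out names.
terse = make_sheet(cell_tag="c", row_tag="r")
verbose = make_sheet(cell_tag="cell", row_tag="row")

# zlib.compress applies DEFLATE, the compression method most ZIP entries use.
terse_zipped = len(zlib.compress(terse, 9))
verbose_zipped = len(zlib.compress(verbose, 9))

raw_overhead = len(verbose) - len(terse)
zipped_overhead = verbose_zipped - terse_zipped

print(f"uncompressed: terse={len(terse)} verbose={len(verbose)} extra={raw_overhead}")
print(f"compressed:   terse={terse_zipped} verbose={verbose_zipped} extra={zipped_overhead}")
```

With these settings the long names add exactly 64,000 bytes before compression, but the compressed difference should be only a small fraction of that, because the repeated tags turn into back-references. Exact compressed sizes will vary with the zlib version and the content.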

Back to what was discussed at the BRM - logistical issues were raised about this concern:

  1. The changes to the spec would be quite extensive, affecting almost ALL the Parts.
  2. This would make existing MSOffice 2007 documents incompatible (use DSRL instead!)
  3. There will not be enough time within the BRM to come up with the clear instructions for the Editor to follow

I pushed on and said that these quirky names were difficult to understand even for native English speakers, let alone non-native speakers, who would have trouble finding them in a dictionary.

The Canadian and Australian delegations suggested that the name changes could be handled using DSRL, which would treat the original spec as one 'language' and map each element name to proper English. After all, why be so presumptuous as to have English as the only language?
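The DSRL idea is essentially a rename table applied to a document. DSRL itself is an XML vocabulary (ISO/IEC 19757-8); rather than guess at its exact syntax, here is the same kind of element and attribute renaming sketched in Python with `xml.etree.ElementTree`. The `NAME_MAP` entries, the sample shadow values and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical name map: terse spec names -> spelled-out English names.
# The target names are illustrative, not taken from any standard.
NAME_MAP = {
    "outerShdw": "outerShadow",
    "scrgbClr": "scRgbColor",
    "blurRad": "blurRadius",
    "dir": "direction",
    "algn": "alignment",
}


def rename(elem: ET.Element, mapping: dict[str, str]) -> None:
    """Rename element tags and attribute names in place, recursively.

    Names missing from `mapping` are left untouched. One flat map serves both
    elements and attributes, which only works while no name means two things.
    """
    elem.tag = mapping.get(elem.tag, elem.tag)
    old_attrs = list(elem.attrib.items())
    elem.attrib.clear()
    for name, value in old_attrs:
        elem.set(mapping.get(name, name), value)
    for child in elem:
        rename(child, mapping)


def invert(mapping: dict[str, str]) -> dict[str, str]:
    """Reverse a one-to-one name map so documents can be converted back."""
    return {new: old for old, new in mapping.items()}


# Illustrative attribute values only.
original = '<outerShdw blurRad="40000" dir="5400000" algn="tl"><scrgbClr r="0" g="0" b="0"/></outerShdw>'

readable = ET.fromstring(original)
rename(readable, NAME_MAP)
print(ET.tostring(readable, encoding="unicode"))

# Round trip: the original names can be restored from the readable form.
restored = ET.fromstring(ET.tostring(readable, encoding="unicode"))
rename(restored, invert(NAME_MAP))
```

Because the map is one-to-one it can be inverted, which is also why this approach appealed at the BRM: existing Office 2007 files would not have to change. The catch is that someone still has to write and maintain the map.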

The Chinese delegation added that in UOF they tagged each element with a number and used that number to rename each element into whichever language the user wanted. Another country argued that writing these mappings would take time, and that is something we do not have ...
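The UOF-style approach described above can be sketched as a registry keyed by number, with one name per language. The IDs, the Malay names and the functions below are purely illustrative; I have not checked them against the UOF specification.

```python
# Hypothetical registry: every element has a stable numeric ID, and each
# language supplies its own name for it. IDs and names are illustrative only.
REGISTRY: dict[int, dict[str, str]] = {
    101: {"en": "outerShadow", "ms": "bayangLuar"},
    102: {"en": "blurRadius", "ms": "jejariKabur"},
    103: {"en": "alignment", "ms": "penjajaran"},
}


def index_by_name(lang: str) -> dict[str, int]:
    """Map each element name in `lang` back to its numeric ID."""
    return {names[lang]: elem_id for elem_id, names in REGISTRY.items()}


def translate(name: str, src: str, dst: str) -> str:
    """Translate an element name from language `src` to `dst` via its numeric ID."""
    elem_id = index_by_name(src)[name]
    return REGISTRY[elem_id][dst]


print(translate("blurRadius", "en", "ms"))
print(translate("penjajaran", "ms", "en"))
```

In this model the numeric ID is the element's identity, and every language, English included, is just a label for it.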

All of these suggestions were valid and were possible solutions; however, they did not answer the basic question of why OOXML does not have a proper naming convention, let alone reflect a conscious decision to cater for internationalisation or even naming.

There was then consensus that although this item was simple in concept, it involved changes too vast to be handled by the editor, and therefore it is a problem to be addressed during the 'maintenance phase' of this spec.

Malaysia had to concede this issue so as not to waste further time (after all, it was the end of the day, and the Swiss were getting twitchy). That is why the convenor noted that:

MY-16: Names
No consensus emerged on the MY proposal; a reformulation by MY remained possible.

=== Remember Housekeeping notice: My personal views come in thick now ===

Now my personal opinion is that this basic issue should have been addressed when Ecma was preparing Microsoft's original spec, which, to me, looks as if C-language type and structure names from various naming conventions (if any) were 'dumped' into an international standard. I see it as a failing on Ecma's part to apply the minimum requirements of the ISO/IEC Drafting Rules to the specification.

Hey, what is Microsoft paying them for? To produce an international standard, right?

BTW, Ecma held a cocktail event on Thursday evening (28th), and I had an opportunity to chat with Mr Jan van den Beld. He was the head of Ecma and instrumental in seeing OOXML through the process. He re-justified the case for OOXML to us delegates, and yes, he did reiterate his favourite phrase and Ecma's unofficial motto, “Better a good standard today than a perfect one tomorrow”... I guess different people have different definitions of "good"!

He did, with good humour, apologize for being the one responsible for bringing all of us here. That was nice of him ;-)

Maybe I'm too much of an idealist, and he, having been in the standards business for so long and having seen so much rubbish (^H^H^H^H vendor specifications) pushed through over the years, has lower expectations of international standards. I dunno. Hey, maybe I AM cynical, too!


... more itty bitty finger food.

So I was disappointed that Malaysia's request for clearer XML names was pushed back to an undefined later date, but I was also relieved that I didn't have to do the tedious and impossible work of writing editorial instructions to rename all the XML attribute and element names in the 6000-page spec within the following 4 days.

That work, I still believe, should have been Ecma's responsibility 2 years ago, and I hope they address it soon as a work item in SC34, or at least come up with a comprehensive naming convention guide as a first step ...

It should also be mentioned that when this item was raised, the UK chipped in with a related issue of its own: the 'e' element has 18 different definitions (see GB-0537). They wanted these occurrences to be named appropriately.

Ecma's disposition (Response 559) was a null response: they merely agreed with the UK, and then listed the 18 'e' elements with their different meanings in a table. Gee, thanks, Ecma! That's really helpful.
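To see why the UK wanted the 'e' occurrences renamed, here is a small Python sketch that counts under which parents a given element name appears. The toy document, its parent names and the `parents_of` helper are hypothetical, and namespaces are left out; the real spec lists 18 meanings for 'e', which I have not reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy document in which one element name, 'e', appears under two different
# parents. Parent names are illustrative; namespaces omitted for brevity.
DOC = """<root>
  <radical><deg>3</deg><e>x</e></radical>
  <sharedItems><e v="#N/A"/><s v="text"/></sharedItems>
</root>"""


def parents_of(tag: str, xml_text: str) -> Counter:
    """Count how often `tag` occurs directly under each distinct parent element."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for parent in root.iter():
        for child in parent:
            if child.tag == tag:
                counts[parent.tag] += 1
    return counts


print(parents_of("e", DOC))
```

The name alone tells you nothing: a reader or a tool has to look at the parent of each 'e' to know which meaning applies, and a table listing those meanings does not change that.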

I hope you can now understand the frustration of many National Bodies who found Ecma's responses to be consistently poor and unresponsive. Oh yeah, they are really good with the editorial fixes, but the technical responses unfortunately have a poor track record. Read them for yourselves at www.dis29500.org. It's a great resource.

I have since done more reading up on the ISO/IEC Directives Part 2, and the UK's concern is very valid with regard to proper ISO Standards Drafting Rules (Section 4.3 Homogeneity, page 11, my emphasis):

The same term shall be used throughout each document or series of associated documents to designate a given concept. The use of an alternative term (synonym) for a concept already defined shall be avoided. As far as possible, only one meaning shall be attributed to each term chosen.

These requirements are particularly important not only to ensure comprehension of the document, or of the series of associated documents, but also to derive the maximum benefit available through automated text processing techniques and computer-aided translation.
 

So it becomes obvious that the UK's concern GB-0537 is definitely not addressed by Ecma's Response 559. Maybe the Ecma editor needs to brush up on the ISO Standards Drafting Rules? Or is it another opportunity for DSRL to save the day?

yk.


Comments


Good post, yk. What is remarkable is that almost 2 weeks after the BRM, many details are still only now coming to light.

Thanks.

In addition, the lack of consistency in DIS 29500 directly contradicts other normative provisions of the quoted ISO/IEC Directives.

Page 10:
"Uniformity of structure, of style and of terminology shall be maintained not
only within each document, but also within a series of associated documents."

Page 11:
"Analogous wording shall be used to express analogous provisions; identical
wording shall be used to express identical provisions. The same term shall be used throughout each document or series of associated documents to designate a given concept. The use of an alternative term (synonym) for a concept already defined shall be avoided. As far as possible, only one meaning shall be attributed to each term chosen."

Dario,

Thanks for the input. That's an interesting example, as it highlights the XML Naming issue, the voting issues, and the issue of backward compatibility and how Ecma handles it by default: "Just add it on!"


Jomar,

Your request for the Binary to XML mapping is a valid one at this stage. After all, Microsoft never fails to claim "full fidelity to existing corpus of legacy documents". Ecma should be ashamed if they don't have this information available today.


Sander,

I agree with your assessment. Ecma's default resolution is NOT to change the spec. That was also reiterated to us at the BRM when items failed to be resolved because they were voted down.

Interestingly enough, Doug Mahugh just wrote a blog post about this issue. Perhaps you can ask him about this matter yourself?

http://blogs.msdn.com/dmahugh/archive/2008/03/06/brm-results-publicly-available.php


Regards,

yk

--quote--
But in the proposed disposition (Response 164) ECMA proposed to *add* "top", "center", "bottom" elements, *leaving the old elements* "t", "mid" and "b".
--end quote--

Of course. As posted previously on this very blog, any and all changes made to the OOXML spec must preserve compatibility with the current implementation of OOXML in MS-Office 2007. Ecma does not allow any change that breaks this compatibility. Office 2007 currently outputs t, mid and b, so they cannot simply replace those values. They have to add to them.

The entire purpose of this exercise is to ensure that MS-Office 2007 outputs ISO-conformant documents if this standard gets approved by ISO. If they did not do this, MS-Office would still not be eligible for government tenders that require ISO-conformant applications, while ODF-conformant applications would be eligible. That's the entire point of this standardisation debacle: making sure that governments worldwide don't switch away from MS-Office (and by extension Windows).

I believe that a lot of ECMA responses followed this premise:

-------
"Just give NBs something to pretend that we have fixed the problem, and don't forget to put the word 'AGREED' at the beginning"
-------

One example of many, related to the poor quality of DIS 29500 regarding XML naming:

See Response 164 to this GB-0634 comment (http://www.dis29500.org/GB-0634/):

--------------
Comment GB-0634:

ST_VerticalAlignment is defined in sml-styles.xsd and in
dml-diagramTypes.xsd. In the former, it can be top, center or bottom
(among other values). In the latter, these are "t", "mid" or "b".
Terminology should be consistent within a standard.
Remedy: Change the latter to use "top", "center" and "bottom" in place
of "t", "mid" and "b".
--------------

But in the proposed disposition (Response 164) ECMA proposed to *add* "top", "center", "bottom" elements, *leaving the old elements* "t", "mid" and "b". So the problem (terminology should be consistent within a standard) is not resolved but aggravated!

I believe that many of the NBs that "bulk" approved these kinds of dispositions didn't even read what ECMA responded.

In these cases (and there are many of them) the "new proposed" text of DIS 29500 (still not provided by ECMA) will be *worse*.


To paraphrase Tim Bray:

"Treated purely as a spec for representing documents, OOXML is lousy. Frank Farance of the US ISO delegation was quoted as saying there are probably hundreds of defects. He’s being way optimistic. Every time I open it and start reading, I pretty soon come across some unforgivably-ugly piece of XML or hideous piece of English grammar or statement that just doesn’t make sense. There are going to be interoperability problems up the wazoo."
(read the full post at http://www.tbray.org/ongoing/When/200x/2008/03/02/On-OOXML, where he discusses both sides of the issue in full context)


Welcome to
Open Malaysia blog!

  • Bloggers @ Open Malaysia
    We are a group of individual bloggers working to build openness in Malaysia's ICT culture. Most of us have day jobs and a couple of us are students. Those with a job work for companies ranging from large international enterprises to self-run Malaysian start-ups.
    Email us at this address:
    open -AT- openmalaysiablog -DOT- com

Disclaimer...

  • We declare our independence of opinions from our employers, institutions, associations and clients, past and present. Thoughts and expressions in the Open Malaysia blog are rightly each blogger's own, and each of us stands by what we individually write. Views by readers who post comments and others whose writings we link to in this blog are theirs.
