
Friday, 19 January 2007

OOXML has poor XML Element names

XML was designed to be human-readable: a developer looking at an XML document should find it self-descriptive and easily understandable without much need to refer to the specification. Human readability is also a measure of good interoperability, as the easier a document is to understand, the easier it is to write translators for it. A poorly defined XML specification defines tags that are non-descriptive, confusing and inconsistently named, which contradicts the spirit and purpose of XML.

The specific goals of XML which MSOOXML contradicts are:

  1. XML documents should be human-legible and reasonably clear.

  2. Terseness in XML markup is of minimal importance.

An example of where this is a problem with EOOXML is the treatment of XML Element and Attribute names. In the element “5.1.10.45 outerShdw (Outer Shadow Effect)” (page 4413) in the VML section, the second word is stripped of its vowels ('Shadow' becomes 'Shdw').

This is a common programming naming convention in which the programmer trades readability for fewer keystrokes. However, this convention should not apply at the XML level, as it will cause confusion and hinder future understanding of a document.

Here is the EOOXML specification itself displaying the many Attributes of this Element 'outerShdw':

[Image: Xmlnamesoutershadow, the specification's entry for 'outerShdw']

Take, for example, the associated Child Elements and Attributes:

scrgbClr, algn, blurRad, dir, dist, rotWithShape.

We need to ask why these Attributes need to be so cryptic within the specification. As a recommendation, it would help readability, clarity and consistency if they were renamed as follows:

ScreenRGBColor, ShadowAlignment, BlurRadius, Direction, Distance, RotateWithShape
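To illustrate the proposed renaming, here is a minimal Python sketch that applies such a translation table to an XML fragment. The fragment is hypothetical: it omits namespaces, its attribute values are made up, and the 'OuterShadow' name for the element itself is an assumption that is not in the list above.

```python
import xml.etree.ElementTree as ET

# Short name -> descriptive name, following the recommendation above.
RENAMES = {
    "outerShdw": "OuterShadow",  # assumption: not part of the list above
    "scrgbClr": "ScreenRGBColor",
    "algn": "ShadowAlignment",
    "blurRad": "BlurRadius",
    "dir": "Direction",
    "dist": "Distance",
    "rotWithShape": "RotateWithShape",
}

# Hypothetical, namespace-free fragment with placeholder values.
FRAGMENT = (
    '<outerShdw blurRad="50800" dist="38100" dir="2700000" '
    'algn="tl" rotWithShape="0">'
    '<scrgbClr r="0" g="0" b="0"/>'
    "</outerShdw>"
)


def expand_names(element):
    """Rename the element, its attributes and all descendants in place."""
    element.tag = RENAMES.get(element.tag, element.tag)
    renamed = {RENAMES.get(key, key): value for key, value in element.attrib.items()}
    element.attrib.clear()
    element.attrib.update(renamed)
    for child in element:
        expand_names(child)


root = ET.fromstring(FRAGMENT)
expand_names(root)
print(ET.tostring(root, encoding="unicode"))
```

Names that are not in the table are left untouched, so the same function can be run over a larger document without harm.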

The 3 bytes saved by writing 'blurRad' instead of 'BlurRadius' do not justify the loss of readability. The 1 byte saved by shortening 'align' to 'algn' is even less necessary.

Additionally, why do the Element and Attribute naming conventions contradict each other? The Element 'outerShdw' has its second word de-vowelled, while Attributes like 'blurRad' and 'rotWithShape' are not de-vowelled but instead truncate either the first or the second word. This selection seems arbitrary and inconsistent.


In addition, the naming convention itself is not consistent throughout the EOOXML specification.

In WordprocessingML, the attributes do not have the vowels removed. For example, the Element “2.15.1.78 settings (Document Settings)” (page 2020) has a whole list of Child Elements whose naming conventions differ from those in the VML section.

[Image: Xmlnamessettings, the specification's entry for 'settings']

The ones of interest are:

activeWritingStyle, attachedSchema, documentType, docVars, endnotePr, hdrShapeDefaults

'endnotePr' resembles the VML convention, except that 'Pr' denotes 'Properties'; under the de-vowelling convention the name would instead be 'endnotePrprts'.

Note that even within the Child Elements of 'settings', there are inconsistencies. 'documentType', which is verbose and easy to read, contrasts with its sibling 'docVars', where the words 'document' and 'variables' are both truncated. These two Child Elements should have their naming conventions harmonized to prevent significant confusion.

'hdrShapeDefaults' is another example of vowel removal, except that it is applied to the first word, which is yet another contradiction in naming conventions.
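The mix of conventions discussed so far can be made concrete with a small Python sketch that labels each abbreviation as truncated, de-vowelled or spelled out. The pairing of each fragment with a full word is my own reading of the names (for example 'hdr' as 'header' and 'Clr' as 'Color'), and the rule used for de-vowelling (keep the first letter, drop every later vowel) is an assumption, not something the specification states.

```python
VOWELS = set("aeiou")


def classify(abbreviation, word):
    """Label how `abbreviation` was derived from `word` (case-insensitive)."""
    short, full = abbreviation.lower(), word.lower()
    if short == full:
        return "spelled out"
    if full.startswith(short):
        return "truncated"
    # Assumed de-vowelling rule: keep the first letter, drop later vowels.
    devowelled = full[0] + "".join(c for c in full[1:] if c not in VOWELS)
    if short == devowelled:
        return "de-vowelled"
    return "other"


# (fragment as it appears in a name, assumed full word)
PAIRS = [
    ("Shdw", "Shadow"),     # outerShdw
    ("algn", "align"),      # algn
    ("Clr", "Color"),       # scrgbClr
    ("Rad", "Radius"),      # blurRad
    ("dist", "distance"),   # dist
    ("dir", "direction"),   # dir
    ("rot", "rotate"),      # rotWithShape
    ("doc", "document"),    # docVars
    ("hdr", "header"),      # hdrShapeDefaults
    ("Type", "Type"),       # documentType
]

for abbreviation, word in PAIRS:
    print(f"{abbreviation:6} {word:10} {classify(abbreviation, word)}")
```

Under these assumptions the listing shows all three styles side by side, sometimes within a single element's attribute list.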

Naming conventions in source code should be as descriptive as possible and should avoid unnecessary abbreviations. Microsoft's own Best Practices Manual, “Code Complete” by Steve McConnell, poses pertinent questions regarding naming decisions:

  • Does the code avoid abbreviations that save only one character?

  • Are all words abbreviated consistently?

  • Are the names pronounceable?

  • Are short names documented in translation tables? ... and more.

The argument for saving bytes is moot on modern computers: memory is abundant and, in addition, the file will efficiently compress repetitive elements. There is therefore no justification for the readability trade-offs of yesteryear.
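The compression point can be checked with a rough Python sketch. It builds two synthetic documents that differ only in their names, one using the short names and one using the descriptive names, and compares their sizes before and after DEFLATE compression with the standard zlib module. The documents, the placeholder attribute values and the 'OuterShadow' element name are all assumptions made for illustration; real files will behave differently in detail.

```python
import zlib

# (element name, attribute names) for the short and the descriptive variants.
SHORT_NAMES = ("outerShdw", ["blurRad", "dist", "dir", "algn", "rotWithShape"])
LONG_NAMES = (
    "OuterShadow",  # assumption: descriptive name for the element itself
    ["BlurRadius", "Distance", "Direction", "ShadowAlignment", "RotateWithShape"],
)


def build_document(names, count=1000):
    """Build a synthetic document with `count` elements and placeholder values."""
    element, attributes = names
    parts = ["<root>"]
    for i in range(count):
        attribute_text = " ".join(
            f'{name}="{i + offset}"' for offset, name in enumerate(attributes)
        )
        parts.append(f"<{element} {attribute_text}/>")
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def raw_and_compressed_size(names):
    """Return (uncompressed bytes, DEFLATE-compressed bytes) for one variant."""
    document = build_document(names)
    return len(document), len(zlib.compress(document, 9))


short_raw, short_packed = raw_and_compressed_size(SHORT_NAMES)
long_raw, long_packed = raw_and_compressed_size(LONG_NAMES)

print("raw size gap:       ", long_raw - short_raw, "bytes")
print("compressed size gap:", long_packed - short_packed, "bytes")
```

Because every name repeats in every element, the compressor can replace most occurrences with short back-references, so the gap between the two variants after compression should be far smaller than the gap between the raw documents.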

This issue betrays the source of the specification: it is derived directly from a single implementation by Microsoft. This is why we see inconsistent, programmer-centric naming conventions which really should not be in a modern, progressive and descriptive international standard. The naming conventions used in EOOXML contradict Microsoft's internal Best Practices.

Readability matters in ISO standards because the audience is, in essence, international, and English is often not the reader's native language. Abbreviated or obscured words whose meanings English speakers would take for granted are hard for a non-English-speaking programmer of a different background and culture to deduce and translate. This additional barrier is unnecessary and should be removed if this specification is meant for an international audience.

The inconsistent and contradictory naming conventions within the different sections (WordprocessingML, SpreadsheetML, VML), and even within Attributes, Parent and Child Elements, will cause considerable confusion and frustration and impede technical usage of this draft specification.

Until there is a consistent naming convention of the XML Elements and Attributes throughout the EOOXML specification, this specification remains a reference document at best and is far from a usable international standard.

The effect of this is that it contradicts the spirit of XML and SGML (ISO 8879) which is to promote readable structured document interchange.

yk


Comments


You should read a bit further!

4.4 Consistency of documents
In order to achieve the aim of consistency within the complete corpus of documents published by ISO and IEC, the text of every document shall be in accordance with the relevant provisions of existing basic documents published by ISO and IEC. This relates particularly to
a) standardized terminology,
b) principles and methods of terminology,
c) quantities, units and their symbols,
d) abbreviated terms,
e) bibliographic references,
f) technical drawings and diagrams,
g) technical documentation, and
h) graphical symbols.


(a) to (d) are obviously wanting in ECMA-376.

It is interesting to read the ISO Directives, Part 2 "Rules for the Structure and Drafting of International Standards", section 4.3, where it sets out the requirement for avoiding this type of inconsistent use of terminology:

"Analogous wording shall be used to express analogous provisions; identical wording shall be used to express identical provisions.

The same term shall be used throughout each document or series of associated documents to designate a given concept. The use of an alternative term (synonym) for a concept already defined shall be avoided. As far as possible, only one meaning shall be attributed to each term chosen."


