MY-0006 - Percentages
Well, if Doug Mahugh wanted a technical discussion, all he needed to do was email me, and I could have clarified many things with him. He then wouldn't have needed to travel all the way from Seattle to KL to grandstand at PIKOM, nor to bend his ethics and fabricate a business card just to be in a meeting he wasn't invited to, nor to deal with the international backlash which he is facing now. Oh yes, and he wouldn't need to avoid all the pertinent questions (internal and external) this poorly planned 72 hours in KL has raised.
Nor would he have had to "burn the bridges" which Microsoft has built over the many years with Malaysian Government agencies and Industry associations, and which have now been obliterated ("dihapuskan").
Quite a pity. Microsoft Malaysia wanted something to shout about, and boy did they get something! What were they thinking? The management team will sure have a lot to answer to.
Anyway, here is my take on one item which Doug raised in his first post dedicated specially for me.
MY-0006 "Percentages"
I blogged about this more than a year ago, during the contradiction period. It was one of the few items which jumped out at me in OOXML when I was flipping through it for "inspiration". The blog post was entitled "Use of Percentage as a measurement unit in MSOOXML is inconsistent." So I was happy that this item was heard at the BRM.
Malaysia submitted this item as a comment at the Sept07 vote.
Ecma had 5 months to work on the 3000+ comments and ultimately 1027 unique responses. On January 14th 2008, their response was this:
I don't exactly know what planet these guys live on, but this response is completely unacceptable. "If its decided that a string datatype is desired .."? Excuse me, Mr Ecma, but isn't XML just a whole long list of strings? What you do with this string "datatype" is to parse it and convert it to the "integer and floating-point values" which is how a normal developer would proceed! Talk about bad excuse. The dog ate my homework.
[Fixed: Transcription from desired to decided. Thanks Wouter. Paragraph still applicable]
This is exactly the type of response I had to deal with in reading through Ecma's "proposed dispositions". The quality was really low, and it didn't bode well for confidence in the other, more important resolutions.
I mean, for them to delay this as "an appropriate topic for consideration during future maintenance of the spec" is truly irresponsible, considering HTML solved this 10 years ago! Crazy.
Anyway, during the BRM, Finland decided to take up the issue of "Units of measurements". One of the items was this issue of Percentages. After a few days, they came up with text which, I had the impression, replaces all the crazy ways of encoding Percentages (decimals without the percentage sign, arbitrary units, fiftieths or thousandths of a unit) with a sane way of encoding percentages: a number and a "%" sign (e.g. 45.34% means what it means).
Fantastic.
Although they left out the enumerated types for shading (where pct62 == 62.5%), we were happy. What's interesting was that in the PIKOM meeting, where Doug Mahugh wanted to do his smoking gun, high drama presentation, Microsoft themselves couldn't even get their facts right.
In one of their handouts, it was stated that the BRM "Approved" MY-0006 to be DEFER'ed for maintenance.
In actuality, Finland raised it on the first day, and resolved (Resolution 6) it a few days later (I forget - could be Thursday or Friday - it was hectic), and Malaysia happily supported the Approval even though enum pct was not resolved.
So are we to believe voting summaries which Microsoft has passed around to various National Bodies? I would suggest making your own, and comparing the differences between these two docs.
BTW, Malaysia also reserves the right to review its votes in the BRM and do a post analysis after reviewing the actual Resolutions within these 30 days. What we may have Approved then, we may now not agree with after careful study. There is nothing in the rules which says that just because a country approves 100% of all the dispositions, it will have to ultimately Approve the DIS come March 29th. In fact the rules are very clear about how NBs decide on their end vote. See Section 6.7 in the BRM FAQ: There are no rules. [There is hope yet for countries like Chile and Ivory Coast]
And this is what I'm going to do now. I have reviewed the Resolutions which were passed by the BRM to see if they appropriately address Malaysia's concern. If you look at the actual resolution from Finland, something strange is happening. [ Open this zip, and read Response_FI-0010_percentages.doc ]
What we voted in the BRM not only means that we have to implement the crazy way of preserving percentages, but in addition, the new, clean way. In effect, the effort to implement OOXML is made harder! There is no indication of the conformance levels of these different ways of doing the same thing. As far as the text tells me, the amount of 'b' (which means Blue) colour can be encoded in two ways: in thousandths of a percent, or as a number with "%" appended!
Is this something which would promote interoperability? What happens now is that MSOffice 2007 will encode the colour as a thousandth of a percent. A new fully conforming OOXML application will implement it with "%"s. The documents created by both applications are still considered to be in full conformance to the OOXML spec. But the documents are definitely NOT compatible nor interchangeable between the two applications!
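To make the worry concrete, here is a minimal Python sketch of what a consumer has to do once both encodings are legal. The function name `read_percentage` is hypothetical and not taken from the spec or the resolution; the sketch assumes the legacy form is an integer count of thousandths of a percent and the new form is a decimal number followed by "%".

```python
from decimal import Decimal


def read_percentage(raw: str) -> Decimal:
    """Return the value in percent for either encoding (hypothetical helper)."""
    text = raw.strip()
    if text.endswith("%"):
        # New style: a plain number with a percent sign, e.g. "50%".
        return Decimal(text[:-1])
    # Legacy style (assumed): integer thousandths of a percent, e.g. "50000".
    return Decimal(int(text)) / Decimal(1000)


# Both spellings are meant to describe the same amount of blue.
print(read_percentage("50%"))
print(read_percentage("50000"))
```

An application that only implements one of the two branches would misread or reject files written with the other, even though both files conform.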
I know that Microsoft has the great mantra of loving choice and having more standards is a great thing. But can you tell me with a straight face that having more than one way of encoding percentages within a spec is a good thing? I think not.
One of the reasons why we couldn't make positive changes in the BRM was because it would break compatibility with existing Ecma 376 files created by one vendor's product. I personally had to wrangle with Ecma to dance around this artificial restriction, and people outside the BRM couldn't believe it. This is yet another case where the "improvements" from the BRM were specifically designed not to break MSOffice 2007, and yet give an impression that progress was made.
So Malaysia, after reviewing the resolution, will have to oppose this. If Microsoft wants to support the documents which it prematurely released because of Vista in 2007, then that is their problem to solve, by fixing their file filters. That's the risk which Doug Mahugh described back in September 2007 when I asked him about vendors who want to implement OOXML early.
"Well, it's too early for other vendors to commit to this file format. After the BRM there may be changes to it, so it is risky, and may not make commercial sense to implement OpenXML as it is at the moment."
- Doug Mahugh, TechEd KL 2007
Finland could at least have indicated which percentages are "transitional" and which are "strict". I guess they wouldn't have known, because the Canadian Conformance Clause was being worked on in parallel and only resolved at the end of Thursday. Which highlights why the BRM was too short a time to make these changes.
So Doug wants to know why Malaysia is so suspicious about OOXML as it stands today? He doesn't need to look far. Read the specs, read the resolutions and come back to me on whether "4920" is a better way to write "98.4%" in a strict OOXML document.
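For readers who want to check the arithmetic behind "4920" and "98.4%", here is a small Python sketch. It assumes the stored integer counts fiftieths of a percent, which is what the pairing above implies; the function names are made up for illustration.

```python
from fractions import Fraction


def fiftieths_to_percent(raw: str) -> Fraction:
    """Interpret a stored integer as fiftieths of a percent (assumed unit)."""
    return Fraction(int(raw), 50)


def percent_to_fiftieths(text: str) -> int:
    """Go the other way, refusing values that do not fit the fiftieths grid."""
    scaled = Fraction(text.rstrip("%")) * 50
    if scaled.denominator != 1:
        raise ValueError(f"{text!r} is not a whole number of fiftieths")
    return int(scaled)


print(float(fiftieths_to_percent("4920")))
print(percent_to_fiftieths("98.4%"))
```

The conversion is trivial either way; the question raised here is only which form belongs in the file.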
So to summarise:
- Ecma's proposed dispositions are poor in quality
- Resolutions made in the BRM may not address the concerns of NBs who have raised issues
- Ecma is resistant to change which would break Ecma 376
- Resolutions may not have had the time to harmonize amongst themselves (in this case Finland labeling which method is transitional and which is strict)
- Two ways of doing things in a spec means that two conforming documents may not be compatible
- BRM was too short a time for a thorough review
- OOXML is becoming more of a Frankenstein than it already was
yk




Hail OpenBIFF!
So we now have from several sources "close to MS" the following claims about the relation between the design of DIS29500 and XML:
- Readability is unimportant
- Ease of parsing is unimportant (Rick Jelliffe even suggested to add bit operations to XSLT)
- Tag names are unimportant (but no translation is needed)
- Tag and attribute names do not have to be consistent
- Tag and attribute names do not have to be unique
- Escape characters, ie, 0-31, are included (see Rob Weir's blog). That is, XML 1.0 is irrelevant.
- Other XML standards are irrelevant (eg, MathML, Vector graphics)
- The same semantics, eg, text color and alignment, can be implemented in as many ways as there are coders
- Values must follow Intel x86 and C/C++ types
- Units of measurement and their formats are only for internal use and have no counterpart in the rest of the world.
This is just a selection.
DIS29500 is an incomplete and incorrect description of BIFF, nothing more. XML was never necessary at all. It was only inconvenient.
So it is clear, we are not talking about OpenXML, but about OpenBIFF.
Winter
Posted by: Winter | Wednesday, 26 March 2008 at 05:02 PM
Look, let's start by getting one thing clear. An XML document, whether stored in core or fetched over the network, appears to the parser as a stream of bytes. A convenient layman's term for a stream of bytes is 'a string'. So yes, every XML document is 'a string'. But of more significance is its parsed form, which is, in LISP terms, an S-Expression, which is to say a list of atoms and lists; or more formally an acyclic directed graph of nodes.
Secondly, there is nothing in the least bit difficult about parsing '99.999%', and it is in no way more difficult to parse '99.999%' than it is to parse '99999'.
There is nothing inherent in the bytestream for '99999' which says that it represents the predecessor to one hundred thousand; that is an interpretation made by a parser. The representation '99999', just like the representation '99.999%', is for the convenience of humans, not machines. And one is more convenient to humans than the other.
Thirdly, the suggestion that the rounding error between the floating point representation of '99.999%' and its BCD representation might be significant in page layout is simply specious. On a screen as wide as a cinema screen, this difference would still be below the acuity of a human eye, let alone below the limits of any foreseeable technology to render.
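The size of that rounding error can be estimated with a few lines of Python. This is only a sketch: the 20 metre screen width is an assumed figure for illustration, and the helper name is hypothetical. It compares the exact rational value of the percentage with what an ordinary double-precision float ends up holding.

```python
from fractions import Fraction


def float_percent_error(text: str, width_m: float) -> float:
    """Layout error, in metres, caused by holding a percentage in a float."""
    digits = text.rstrip("%")
    exact = Fraction(digits) / 100      # exact rational value of the text
    approx = float(digits) / 100        # what a float-based parser would hold
    # Fraction(approx) recovers the exact binary value stored in the float.
    return abs(float((Fraction(approx) - exact) * Fraction(width_m)))


# Assumed screen width of 20 metres.
print(float_percent_error("99.999%", 20.0))
```

On IEEE double precision the printed figure is many orders of magnitude smaller than a single pixel at any realistic resolution.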
Posted by: Simon Brooke | Wednesday, 26 March 2008 at 03:50 AM
Wouter,
Thanks for your detailed answers. I appreciate your input, and I don't mean any disrespect if I seem to have misunderstood or disagree with your views. I do not work with XML 100% of the day like you do, and your expertise is certainly valuable.
Here are some issues which I think you may need to clarify with me so that I can fully understand you. You say: "stating that XML is ‘just a long list of strings’ is quite a few steps outside of reality ... Are you reasoning that XML should only support one data type? Type string?"
You have misunderstood my statement as what the Ecma proposed disposition said. I am NOT saying that the percentage datatype should be type string, but that its representation in XML, as with all other units, is at the end of the day a series of Unicode chars in an XML file. Which comes to your next point "Validation!". What do you have to do BEFORE you validate? You will need to load the XML (strings) from file into memory, then PARSE these strings and create proper machine representations from the description (XML) of the data (percentages/floats/strings/etc).
So you may have skipped a step there.
Additionally your XSLT kungfu is quite impressive. Even I (XSLT newbie), understand what you are saying. However in the real case, the selection of value depends on the precedence that "bestFit" overrides "dxa", and when a "%" is present, then its considered a Percentage with a Sign (see Resolution 6 please! link above). Do you think that this:
[xsl:choose]
[xsl:when test="@type='bestFit'"][xsl:value-of select="-1"/][/xsl:when]
[xsl:when test="ends-with(@w:val,'%')"][xsl:value-of select="ConvertPctWithSignToPct(@w:val)"/][/xsl:when]
[xsl:when test="@type='dxa'"][xsl:value-of select="ConvertToDXA(@w:val)"/][/xsl:when]
[xsl:when test="@type='pct'"][xsl:value-of select="ConvertThousandthToPct(@w:val)"/][/xsl:when]
[/xsl:choose]
is better and simpler than the one liner:
[xsl:value-of select="ConvertToSize(@w:val)"/]
where you write ConvertToSize just once to parse all the different sizing units: "bestFit", "10.3%", "355px", "132pt", "2.5x" etc, which is reusable and written once. I think it's far better than the overly complex xsl:choose statement, especially if there are other elements within the XML. After all, CSS uses it all the time, so it's not entirely alien nor hard to implement (I would think).
[My understanding of XSLT could be completely wrong, so please correct me if that is so.]
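As a rough illustration of the "write it once" idea, here is a minimal Python sketch of such a parser. It is not taken from any spec: the name `convert_to_size`, the `Size` tuple and the accepted unit list are all assumptions chosen to match the examples above.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class Size(NamedTuple):
    unit: str                   # "%", "px", "pt", "x", or a keyword
    value: Optional[Decimal]    # None for keywords such as "bestFit"


_KEYWORDS = {"bestFit", "auto"}
_NUMBER_WITH_UNIT = re.compile(r"^(-?\d+(?:\.\d+)?)(%|px|pt|x)$")


def convert_to_size(raw: str) -> Size:
    """Parse one CSS-like size string into a (unit, value) pair."""
    text = raw.strip()
    if text in _KEYWORDS:
        return Size(text, None)
    match = _NUMBER_WITH_UNIT.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {raw!r}")
    number, unit = match.groups()
    return Size(unit, Decimal(number))


for sample in ["bestFit", "10.3%", "355px", "132pt", "2.5x"]:
    print(sample, convert_to_size(sample))
```

Adding a future unit would then mean extending one pattern rather than adding another attribute and another branch.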
===
>> OpenDocument Essentials book
> Well that is just dandy, that a spec actually doesn’t
> define any samples so I need to reach for an external
> book to find stuff out. The book isn’t normative is it?
I believe I did note that the book was non-normative. Isn't it good that it's non-normative? It's like reaching out to use your book for examples and explanations, isn't it? Isn't your book non-normative as well?
===
Questions to you Answers to the Questions.
You didn't answer questions 1,2,3,4,6,8. Not bad, its better than the Ecma response.
1 and 3 are deliberately the same, because I wanted to see how you handled two ways of doing one thing. And you chose the correct way, which is to ignore the redundant one. However, although you now create a conforming answer sheet, your answer sheet may not be interoperable with other answer sheets which require the other question to be answered.
So you sort of answered the question in that one way, which is, "One way is the best way of doing things." Thanks.
For Answer #2, I really think you should look at all the Resolutions approved by the BRM, because, as the convenor says, this is the editing instructions which form the Final Text. These are the instructions which you will have to base your next version of your book on.
I suggest that you look at all the resolutions which have been approved in the BRM, so that you can comment more constructively here.
For Answer 4, 6 and 8, I don't see how you can not have your own independent opinion on these important matters. I also do not appreciate that you can 'hand-wave' away my questions to someone else's 'real' questions. Please show a bit more respect, and I will certainly reciprocate.
For Answer 5 (and 6), you may think it's irrelevant, but I think it is very relevant. The fact that you don't have a Final Text also means that other National Bodies, too, don't have final text to mull over and approve. Additionally, the TCs in NBs are not as experienced and as well versed as you are with regards to OOXML (not many people have the expertise to write books on it). Their interpretation of the OOXML text, as modified by the BRM, may vary significantly. Do you think that it is responsible for a Nation to Approve something they cannot see/touch/smell?
For Answer 7, I have to disagree with you to a point. I don't think it's fair that DIS 29500 should be shackled / hindered / restricted by the commercial interest of releasing MS Office 2007 back in 2007. It was their business risk to release a product based on a format which they knew would morph during the ISO process. If you believe so, then as time goes by, DIS 29500 will become more and more of a Frankenstein than a Fairy Godmother.
For Answer 8, it's relevant, because I would like to know what you think of certain people's actions in their desperation to influence National Bodies. It was a hypothetical question, and it's a test of your ethical values.
For Answer 9, I would like to respectfully disagree with you that it's "intuitive to the max", as highlighted and underlined by the examples above.
You have only answered 5 questions out of 9. Please try again, and we shall see if we can get to your other questions.
=====
Now nksingh and Wouter,
I have noticed something. You two are bickering about the fact that my suggestion of having Percentages encoded as "34.25%" is 'silly' and just another 'nice-to-have' feature. Unfortunately for you two, that is not the case. Other countries too find it necessary to have modern and readable ways of encoding Percentages.
That's why Finland proposed the resolution. So the issue here is not this squabbling on whether [type="pct" val="3023"] is better than [val="60.46%"] NOR is it about IEEE Floating point issues, NOR is it about "readability". It's about what we do with these TWO different ways of encoding percentage, which have been "ratified", "approved", "written in", "etched in stone", in the OOXML spec because of the decisions made in the BRM. It has so been decided. Back in February.
If you do not believe me, then check the Resolutions, given by the link I provided earlier.
So, can you two please stick to the topic, and discuss if this change is a "Good Thing" and whether this should be the pattern of how the development of OOXML should proceed in the future:
"If we don't like something, we will add our preferred way, AND still retain the bad way"
I personally think the Finland resolution was not thought through carefully with regards to the conformance clause proposed and resolved by the Canadians, late in the BRM. I also think that the Scope is clearly defined as the formats of MSOffice 97 - 2008 inclusive (but excludes OpenXML formats from that product line) [Resolution 19/20], thus we should be allowed to break compatibility with Ecma376 as we deem fit, but list the appropriate change details, as decided in Resolution 21.
Read the BRM Convenor's Edited Notes for this information.
Regards nksingh and Wouter,
yk.
Thanks John, and DarkPhoenix, for explaining issues which I would not be able to verbalise so concisely.
Posted by: Yoon Kit | Tuesday, 25 March 2008 at 04:32 PM
> The claim that XML is a
> structure that always needs to
> focus on human readability is
> just so way of with the real
> world, it is not even funny.
Mind if I do some quoting here?
"6. XML documents should be human-legible and reasonably clear."
"10. Terseness in XML markup is of minimal importance."
Know where those quotes come from? XML 1.0 Recommendation, published by the World Wide Web Consortium. Specifically the section titled "Origin & Goals". Link is here: http://www.w3.org/TR/2006/REC-xml-20060816/#sec-origin-goals .
The purpose of XML has always been human readability; there are far more efficient ways of structuring data for understanding by computers. It is MICROSOFT that claims otherwise, mainly because they prefer using XML as a technique for obfuscation.
Yoon Kit is right; ultimately, XML is simply a bunch of strings stuck together. The specification and/or the application determine what the strings mean; schema validation is simply a method of enforcing what the computer wouldn't be able to understand from simply looking at the entry. You're arguing as if the specification should leave this sort of information out because the application already knows. Yeah, but that's the whole point of having a spec; if it's not in the spec, reverse engineering becomes necessary.
> C# versus VB
Are you mainly a Visual Studio programmer or something? There are better languages on this planet.
And calling this mess "Open XML" is Newspeak. It's not open, and whether it's XML or not is debatable.
> wouldn’t you feel that
> having Open XML defined as a
> clear standard is better for
> business than not?
Clear standard, yes. If you think this is a "clear" standard, I've got a bridge to sell you in San Francisco.
I hate how all these Microsoft apologists act as if the rest of the world should be grateful that Microsoft might possibly listen to their opinions before doing whatever they want. Microsoft is not God, and the rest of the world can get along just fine without them. Besides, the last time Microsoft asked for opinions, they ignored them all. I know because I actually participated then; I filed bugs and problems with Internet Explorer when asked. None of the responses I gave were even paid attention to.
Posted by: DarkPhoenix | Tuesday, 25 March 2008 at 01:03 PM
John,
With integer-based fixed-point math, there is only one valid answer. A machine and runtime system wouldn't be able to conform to the C specification if this weren't the case. The same cannot be said for floating point support.
This is getting pretty esoteric, though. There are good arguments for using fixed point just like there are good arguments for using the other method. Either way, it's not that hard to implement what is in the spec, and changing this would be a gratuitous and unnecessary change to the format just to satisfy someone's opinion rather than some real customer requirement.
Posted by: nksingh | Tuesday, 25 March 2008 at 09:02 AM
nksingh,
Ah. Different chipsets may round differently no matter the format provided (this is why conformance details are useful in specs, how much one can deviate..). However the request here is to have a saner storage format.
Posted by: John Drinkwater | Tuesday, 25 March 2008 at 01:57 AM
John,
I meant consistency between different machines, not between different parts of the documents. Different machines might round slightly differently (some might be entirely missing floating point and would have to emulate it).
Posted by: nksingh | Tuesday, 25 March 2008 at 12:20 AM
nksingh:
You said “The reason to restrict oneself to fixed point decimals of a specific size is for consistency.” Have you not read the whole spec? Consistency? LOL. Please don’t defend poor design decisions with the reason of consistency, it has been shown Office Open XML is one of (if not the) least consistent specs ever.
Wouter:
The advantage of simplifying width attributes to _one_ attr, rather than 2, 3, or even 4, is ease of processing. It also helps human readability, but it means your processing code doesn’t have to have a scrambled mess of cases for each of the randomly named attrs when it comes to find out how an element is to be rendered.
As for your “Arabic Percentage” comment, the OOXML schema is wholly in American English, so don’t try to pull that one. You might as well suggest we support “,” as the decimal point for some European countries…
The language conveyed in the document should not have to dictate the storage format (though the locale/language should be correctly labelled).
YK, your
w:bottom w:width="15.1pt"
w:bottom w:width="60.46%"
w:bottom w:width="auto"
point is absolutely correct, I can’t imagine how engineers could disagree with that… it also increases ease of extending the spec.
(lets say, to use mm, or em, or a future unit, pife etc)
Posted by: John Drinkwater | Monday, 24 March 2008 at 10:21 PM
Hi Yoon Kit,
I am afraid you are way off yet again. To me stating that XML is ‘just a long list of strings’ is quite a few steps outside of reality. It would even be better to claim that XML is just a long list of ones and zeros, and say that Open XML is a binary format just as the old one (note that I don’t think this is true). Are you reasoning that XML should only support one data type? Type string? I think you are missing one glaring and obvious detail. Validation! You will want to express that a value is an int, not a sequence of numbers encoded as a string. That is why you have XSD and RelaxNG. Validation!
> [w:bottom w:width="60.46%" versus w:bottom w:type="pct" w:w="3023"].
Sorry, but totally irrelevant again. C# versus VB, Open XML vs ODF. The claim that XML is a structure that always needs to focus on human readability is just so way off with the real world, it is not even funny. For human readability, your encoding works better, true. But the fact of the matter is that usually the markup is not read by humans, but by applications, and the Open XML method works better for that (less string parsing, more string comparison, which is factually easier to do). One more issue is that perhaps not all countries like using the % sign, how about Arabic? How about XSLT? A lot easier to say [xsl:if test="@type='dxa'"][xsl:value-of select="ConvertToPct(@w:val)"/][/xsl:if] than to parse the XML manually. Again easy peasy for developers.
>If OOXML was really a good standard, it should take care of its units at least to make it user friendly on a developer and end user level
This is fundamentally incorrect, you cannot please everyone, and it is probably best to focus on the way most will use Open XML, to develop applications. Why the tirade against Open XML because of these non-issues? IF you would really be looking at Malaysian best interests, wouldn’t you feel that having Open XML defined as a clear standard is better for business than not? I can forgive you for not using it based on missing percentage signs. BTW, end-users really are blissfully un-aware of all these discussions. My dad or friends don’t care a single bit about the internals of a document format.
>Its in the OpenDocument Essentials book: http://books.evc-cit.info/OD_Essentials.pdf
Well that is just dandy, that a spec actually doesn’t define any samples so I need to reach for an external book to find stuff out. The book isn’t normative is it?
Ok, now for your question list, of which a few are irrelevant or not addressed at me, but I’ll try anyway. (and yes, I am a real person, look me up on google)
> 1. Do you think that having more ways of encoding percentages makes a developers life more difficult?
This question is too narrow for me to answer directly. It will probably make my life easier, depending on how the encoding works, and what it is used for.
> 2. Do you think Finland sufficiently addressed the conformance issue?
I haven’t investigated dispositions specific to Finland, nor do I think that taking them out of context for reference here brings any more to the discussion why MYs investigation and result based on the dispositions is so strange.
> 3. Would you rather have many ways of writing percentage values than just one way?
You are repeating question 1, so I’ll just step over this one.
> 4. Do you feel that the proposed Ecma dispositions (such as the one above) was high in quality, and addressed the countries concerns satisfactorily?
Go and start asking real questions: http://www.oreillynet.com/xml/blog/2008/03/critical_questions_for_nationa.html
> 5. When will you work on the next version of your book - can you visualise the changes made at the BRM already? Do you have the Final Text?
Why is this a relevant question? The book was drafted on the ECMA spec, and the book will in most likelihood be updated for the ISO version. It is totally out of focus for the discussion of MY responses to the disposition like MY-20, which you haven’t addressed.
But no, I do not have a Final Text.
> 6. Would you expect NBs around the world to be able to visualise the post-BRM OOXML text and Approve on it in its entirety?
http://www.oreillynet.com/xml/blog/2008/03/critical_questions_for_nationa.html
> 7. Do you think that DIS29500 should be backward compatible with Ecma376?
Are you asking me if I am a C# or VB developer? I think backwards compatibility is of extreme importance.
> 8. If your company was explicitly not invited to a meeting, would you fabricate a business card of another organisation to attend it?
Again a question outside of the discussion. Take this up with Doug. It wasn’t me attending with that card was it? Why are you asking me this, and not focusing on the real issues?
http://www.oreillynet.com/xml/blog/2008/03/critical_questions_for_nationa.html
> 9. Why is OpenXML so deliberately non-intuitive?
Again strange, because I find it intuitive to the max. I get so much explanation, my head hurts. It is just that you like to get up in the morning and start writing documents by opening Notepad (eh… vi) and start typing XML tags from what I am reading, while I like to use an actual editor.
How about you explain MY stance on MY7, MY 8, MY 10, MY 11, MY 12, MY 13, MY 17, MY 20, MY 22, and MY 23?
Posted by: Wouter van Vugt | Monday, 24 March 2008 at 05:41 PM
YK,
The reason to restrict oneself to fixed point decimals of a specific size is for consistency. The sensible thing to do when parsing the fields of XML is to convert them to the expected type in machine memory, be it float or integer.
I could imagine you may be able to parse 95.4 as a fixed point decimal, but doing this would require a separate parser from the normal float parser, and a naive implementer of the spec is likely to not realize this subtlety and get it wrong. For colors this would hardly make a big difference, but for properties which require alignment or for anything else where layout is determined by the precise location of an item, using a floating point number could result in poor interoperability (text reflowing in the wrong place or items being horribly off on the page).
This isn't just a limitation of modern floating point units or something like that which can be fixed. It's a mathematical reality of trying to represent things in binary, just like you can't represent 1/3 in decimal (but you would be able to in a 'ternary' format). If you're willing to limit the potential precision of your units, you can avoid this pitfall.
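The representation point can be checked directly. The Python sketch below shows that the text 95.4 has no exact binary double, and that the same text can still be turned into an exact integer count of fiftieths without passing through a float. The helper name `to_fiftieths` is hypothetical, and the fiftieths grid is assumed from the 'pct' unit discussed elsewhere in this thread.

```python
from decimal import Decimal
from fractions import Fraction

# A binary double cannot hold 95.4 exactly; compare the stored value
# with the exact rational 954/10.
stored = Fraction(95.4)       # exact value of the nearest double
wanted = Fraction("95.4")     # exact value of the decimal text
print(stored == wanted)


def to_fiftieths(text: str) -> int:
    """Parse decimal text straight to an integer count of fiftieths."""
    scaled = Decimal(text) * 50
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text!r} does not fit the fiftieths grid")
    return int(scaled)


print(to_fiftieths("95.4"))
```

Whether the decimal-to-integer step lives in the writer or in the reader, it is a few lines of code in either place.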
The '%' symbol is just an opinion. Reasonable people can disagree on whether it is necessary and, strictly speaking, no competent developer will have a problem parsing the text with or without the '%' symbol. We're talking about 2 extra lines of code to separate the symbol from the number or 1 minute looking at that section of the spec to realize that the number is a percent if it's not obvious already.
Posted by: nksingh | Monday, 24 March 2008 at 04:43 PM
Hi Wouter,
It's nice to have the author of the much desired OpenXML book (especially in MS conferences) comment on this post.
[ http://www.openmalaysiablog.com/2007/09/microsoft-tech-.html (see comments) ]
> where you take a sentence, edit a single word to make it
> silly and then go spend an entire paragraph on that
Let me first apologize for transcribing the extracted text wrongly. It's a good thing that the actual text is displayed as a screenshot just above my mistake. I have fixed it accordingly. Fortunately for me however, my gripe still is valid.
Would you agree that all XML is a long strand of a string type? So there is no "deciding" on whether we should represent %'s as strings or not. They are strings by default. In no way does the MY comment suggest how to represent variable types in memory for application use (i.e. malloc an array of char for percentage types - that's just ridiculous). So the Ecma disposition misunderstands what we want, and uses that as an excuse.
So my entire paragraph still stands. I still think that the response is silly.
> w:zoom w:percent="71"
> How does adding a percentage sign in here help me as a developer one little bit?
Well, this is the simplest case, and the attribute name is 'percent' so it's quite self-explanatory. Well done for pointing out the easiest one. However, why the strange redundancy? 2.15.1.95 details it:
w:zoom w:val="bestFit" w:percent="90"
It says that if val is present, then percent is ignored. How about something more elegant like this:
w:zoom w:factor="bestFit"
if we need best fit zoom or
w:zoom w:factor="90%"
for a percentage scale or
w:zoom w:factor="2.5x"
if we ever wanted to have camera type zooms. But you see the point, why have more attributes for the different units rather than having a properly named attribute which is more intuitive and HTML like?
> Where ‘pct’ identifies a value in fiftieths of a percent.
> You are probably arguing that a ’50.34343434343434%’ should also be acceptable?
No, I'm arguing that encoding it in 50ths of a percent is NOT acceptable. Why the need for this confusion? Why must it be 50ths of a percent? Because the designers of OpenXML were too lazy to do a quick conversion from binary to this pseudo-XML? So they get the easy way out, while future developers have to worry about this?
> ‘auto’, ‘dxa’ and ‘pct’.
Now that you've brought up 2.18.97, why is it so unintuitive?
w:bottom w:type="dxa" w:w="302"
w:bottom w:type="pct" w:w="3023"
w:bottom w:type="auto"
Why can't it be something like
w:bottom w:width="15.1pt"
w:bottom w:width="60.46%"
w:bottom w:width="auto"
Doesn't that make things so much easier on the eyes and brain? Please explain, and provide good reasoning.
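The arithmetic behind the two sets of examples can be sketched in a few lines of Python. It assumes 'dxa' means twentieths of a point and 'pct' means fiftieths of a percent, which is what the numbers above imply (302 / 20 = 15.1 and 3023 / 50 = 60.46). The function name `tbl_width_to_text` is hypothetical.

```python
def tbl_width_to_text(width_type, w=None):
    """Render a (type, w) pair in the readable form suggested above."""
    if width_type == "auto":
        return "auto"
    if width_type == "dxa":
        # Assumed unit: twentieths of a point.
        return f"{w / 20}pt"
    if width_type == "pct":
        # Assumed unit: fiftieths of a percent.
        return f"{w / 50}%"
    raise ValueError(f"unknown width type: {width_type!r}")


print(tbl_width_to_text("dxa", 302))   # 15.1pt
print(tbl_width_to_text("pct", 3023))  # 60.46%
print(tbl_width_to_text("auto"))       # auto
```

Every reader of the current markup has to carry these two divisors around; in the readable form the unit travels with the value.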
> I think the entire comment is quite non-useful.
> It is like VB versus C# and it hits the core of two camps.
I don't think you can trivialise this argument as such. If OOXML were really a good standard, it would at least take care of its units, to make it friendly at both the developer and the end-user level.
> (forgive my small rant at ODF: where’s the markup samples??)
It's in the OpenDocument Essentials book: http://books.evc-cit.info/OD_Essentials.pdf
where non-normative text is placed, so it keeps the main spec clean.
I'm not sure why OpenXML folk keep bringing up ODF whenever they are queried. You are falling back on the argument that if X did Y, and Y was wrong, then Z can do W! That's rather childish.
Additionally, I do not appreciate your personal attacks. Unless you are Doug Mahugh in disguise, I don't understand why you should be so vindictive in your posts. Yes, I am NOT a full-time developer (anymore, anyway), but please don't be so presumptuous and patronising.
If you think that Percentages is "no big a deal", then why did Finland and other countries try to fix it?
Before you jump the gun and ask me more questions, I think I deserve some answers from OpenXML experts such as yourself first.
1. Do you think that having more ways of encoding percentages makes a developer's life more difficult?
2. Do you think Finland sufficiently addressed the conformance issue?
3. Would you rather have many ways of writing percentage values than just one way?
4. Do you feel that the proposed Ecma dispositions (such as the one above) were high in quality, and addressed the countries' concerns satisfactorily?
5. When will you work on the next version of your book - can you visualise the changes made at the BRM already? Do you have the Final Text?
6. Would you expect NBs around the world to be able to visualise the post-BRM OOXML text and Approve on it in its entirety?
7. Do you think that DIS29500 should be backward compatible with Ecma376?
8. If your company was explicitly not invited to a meeting, would you fabricate a business card of another organisation to attend it?
9. Why is OpenXML so deliberately non-intuitive?
Once you answer 98.4% of these questions, I will then entertain more questions from you.
Regards!
yk.
Posted by: Yoon Kit | Monday, 24 March 2008 at 09:41 AM
Hi Yoon Kit,
I would like to respond to some of the technical details of your discussion of MY-6, because your story contains some glaring omissions in detail, scoping it such that you can write about missing percentage signs and try push the story that Open XML is a bad standard.
First of all your response to the proposed change, where you take a sentence, edit a single word to make it silly and then go spend an entire paragraph on that:
"If its desired that a string datatype is desired"
The text actually says ‘If it is *decided* that a string datatype is desired’. E.g. we don’t see a good place for it now, but perhaps in other places in the future. Not that hard to understand when you examine the spec based on the locations mentioned in the MY – 6 text, which you can of course find on www.dis29500.org.
The first part of the spec referenced is 2.15.1.95: Zoom. The markup sample displayed for that setting (forgive my small rant at ODF: where’s the markup samples??), is as follows:
w:zoom w:percent="71"
How does adding a percentage sign in here help me as a developer one little bit? You claim to be a developer, so you can at least feel with me when I say that parsing this piece of content is easier without the percentage sign than with it. Parse an int, or parse text and substring to the int, then parse it. Hmmm. Now for readability. Does ‘percent’ as the attribute name ring a bell to it being a percentage? If the quality of this blog post reflects the quality of your work relating to Open XML, I can only hope that there are more capable and less biased (anti Microsoft?) people on the committee.
The second item referenced by MY -6 is 2.18.97: ST_TblWidth. This simple type is used to indicate the width type of various things such as margins and spacings (which you can easily find out since the spec references all parent elements, and the places that a simple type is used). You can have width such as ‘auto’, ‘dxa’ and ‘pct’. Where ‘pct’ identifies a value in fiftieths of a percent. You are probably arguing that a ’50.34343434343434%’ should also be acceptable? First of all it is funny to me that you think that this super high amount of accuracy is useful. Fiftieth of a percent is more than enough for me personally, but perhaps you enjoy setting table margins to ’50.0123456789%’ instead of ’50.01’, whatever makes your day.
To me this is not something to be overly stressed about. It is defined based on legacy documents (ODF as well, isn't it?), but a total non-issue in the larger discussion.
As a side note, I think the entire comment is quite non-useful. It is like VB versus C# and it hits the core of two camps. MSDN magazine camp versus Raymond Chen camp[1]. Open XML versus ODF. If you want a file format which is not backward compatible with the existing corpus of documents, and is slightly under-documented and sometimes difficult to work with[2], choose ODF; if you want it the other way around, use Open XML. I wish that the discussions could focus on the real questions that you should be asking yourself, and not block the process by these ill-formatted and ill-explained responses to the proposed dispositions. Formats are not perfect, ODF is not perfect [3], heck, it is even darn difficult (whitespace anyone?).
Now that I have you on the line, and you were so kind to display the MY votes to the dispositions I would like to discuss a whole lot of other items I find. Take MY 20 for instance. There was an attribute which could contain app-defined markup for equations. That is being commented on. Disposition is to add the content-type of the equation so that it is no longer undefined, people happy, but somehow MY votes disapprove.
Wouter van Vugt
[1] http://blogs.msdn.com/oldnewthing/archive/2003/10/15/55296.aspx
[2] http://blogs.msdn.com/oldnewthing/archive/2003/10/15/55296.aspx
[3] http://blogs.msdn.com/brian_jones/archive/2008/03/20/out-of-time.aspx
Posted by: Wouter van Vugt | Monday, 24 March 2008 at 07:43 AM
Yoon Kit, +1, well said.
As for “I mean floats do have their rounding problems”, mhmm, that’s an issue with current CPUs and their faulty maths, no reason to cripple our formats because of this — one would hope in the future this issue goes away and we can enjoy our accurate 99.99999999% numbers stored in ODF…
Posted by: John Drinkwater | Monday, 24 March 2008 at 02:54 AM
Hi NK Singh,
Thanks for commenting again.
> 0.01% of people who will actually crack the OpenXML file
Although 0.01% will crack open an XML file, 100% of the developers creating OOXML-conformant applications would be completely confused by the 4 different ways of encoding percentages, and the same 100% would breathe a sigh of relief when they see the familiar "%" sign.
> A % sign is a waste of time to write or read in these circumstances
Ah, if you have been a long-time follower of this blog, you will know I actually did a study on how much time we 'waste' on these pesky symbols which make readability better.
"Will readability hinder performance?"
http://www.openmalaysiablog.com/2007/06/will-readabil-1.html
I wrote a little program to do just that. I didn't optimize the code, and according to the results, reading 5.5 million percentages with a trailing "%" cost us 3 seconds more than reading them without the "%". This is without fixing the inefficient sscanf function. I'm sure it can be faster. But the point is, the upper limit is 3 seconds.
Is it really that much of a sacrifice? I don't think so. Gimme the standardised Percentages any day, and Moore's Law will solve that problem.
> At most this is a 'nice to have' feature
I would argue against this. The purpose of XML is that it's readable by humans. The more we compromise on this, the more unreadable and unusable the spec becomes.
> There's actually a really good reason to use fixed point numbers.
Does that mean we can throw away our FPUs today? I'm not too sure, NK. I mean floats do have their rounding problems, but it's not something which should restrict a file format and its representation, right?
So for the case of encoding it in thousandths of a percent, what happens if we need 4 decimal places, say 5.4324%? How will this be encoded in OOXML? 5432.4? No, it will be rounded down to 5432, which, when loaded up again, becomes 5.4320%. Which is worse: the rounding errors of floats/doubles or the rounding errors of integers?
Worse still, the maximum accuracy for percentages is three decimal places. Why restrict an application of percentages just because of a legacy file format?
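The round trip described above can be sketched in Python. This assumes an integer field holding thousandths of a percent and a writer that truncates, as in the example. The helper names are invented for illustration, and a real application might round to nearest instead.

```python
from decimal import Decimal


def to_thousandths(percent_text):
    """Store a percentage as an integer number of thousandths of a percent.

    int() truncates, matching the "rounded down" case described above.
    """
    return int(Decimal(percent_text) * 1000)


def from_thousandths(stored):
    """Recover the percentage from the stored integer."""
    return Decimal(stored) / 1000


stored = to_thousandths("5.4324")
print(stored)                    # 5432
print(from_thousandths(stored))  # 5.432
```

The fourth decimal place is lost on save however precise the application is in memory, because the integer field has no room for it.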
Additionally, what happens in the future when we have 128/256 bit computers where the default floats are just as good as decimals?
So I would disagree that a modern XML file format should dictate how you should store variables in memory. Instead, it should be abstract enough to just represent the data.
What do you think?
yk.
Posted by: Yoon Kit | Monday, 24 March 2008 at 02:22 AM
Also yk,
Did you know that if, for instance, you wanted to represent percentages to two decimal points that only five numbers can be accurately represented as a floating point type for any 1-percent interval?
65.00
65.25
65.50
65.75
66.00
There's actually a really good reason to use fixed point numbers.
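The claim is easy to check. Below is a small Python sketch that assumes IEEE 754 binary doubles (which is what Python's `float` is on common platforms) and tests which two-decimal values in one interval are stored exactly. The helper name `exactly_representable` is invented for illustration.

```python
from fractions import Fraction


def exactly_representable(text):
    """True if the decimal string survives conversion to a binary float unchanged."""
    # Fraction(float) gives the exact value the float holds;
    # Fraction(str) gives the exact decimal value that was written.
    return Fraction(float(text)) == Fraction(text)


candidates = [f"65.{i:02d}" for i in range(100)]
exact = [c for c in candidates if exactly_representable(c)]
print(exact)  # ['65.00', '65.25', '65.50', '65.75']
```

Together with the upper endpoint 66.00, that gives the five values listed above; every other two-decimal value in the interval is stored as a nearby approximation.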
Posted by: nksingh | Monday, 24 March 2008 at 01:48 AM
YK,
I don't think this is a particularly fair comment. These XML files are meant to be read by a program or, during development, a human with a spec. The values are quite well-defined in the spec and it's not particularly onerous to read it. A % sign is a waste of time to write or read in these circumstances (for the 0.01% of people who will actually crack the OpenXML file). At most this is a 'nice to have' feature and does not particularly impede someone implementing OpenXML. If people really care about it, this can be fixed in maintenance.
Posted by: nksingh | Monday, 24 March 2008 at 01:43 AM