« ODF Alliance Newsletter 28 June 2007 | Main | INCITS denies OOXML approval in the United States, Portugal Subverted »

Tuesday, 10 July 2007

Mathematically Incorrect

Rob Weir writes about the inadequacies of the Microsoft Office Open XML (MSOOXML) Formula definitions in his recent post: "The Formula for Failure". He highlights the loud bragging of Microsoft supporters and employees of how the MSOOXML is superior to the Open Document Format (ODF) as MSOOXML includes a 324 page "Formula Definition".

In this post I will cover this issue in further detail, and highlight the arguments we are getting from  anti-ODF vendors here in Malaysia, in the realm of the standardisation process. The complaint is as follows:

ODF Version 1.0 lacks the support for Formulas, hence is not relevant for Malaysia where its citizens and businesses uses spreadsheet extensively.

Interoperability test between OpenOffice and KOffice, both supporting ODF fails. [Example: "oooc:=SUM([.A1];[.A3];CEILING([.A4];3;2)) + LOG([.A1]+10;10) - SQRT([.A4]) ]"

Although I have already covered this issue on the "grammar" for formulas, as the old newsforge article on ODF interop with OOo and KOffice, the anti-ODF vendors seem to want to leverage on the fact that MSOOXML does have a "Formula Definition" and is willing to use this as a major advantage.

They weren't banking on the event that if we were to study this issue a little deeper, and their "asset" may turn out to be a huge liability on their efforts to make MSOOXML an ISO standard.

User Interfaces and Help files

While it is true that ODF v1.0 does not include a "Formula Definition", it is also true that throughout the entire history of spreadsheet usage, there has been no "Formula Definition." Although this has hindered interoperability to a certain extent, people got by using whatever information they had available to them.

The most immediate form of information is via the Graphical User Interface (GUI) Wizards when it comes to Formula building. This is a screenshot of OpenOffice.org's "Function Wizard" which is similar to most modern spreadsheet features:

Screenshot070121openformula_openoff

Please note the descriptions of each of the parameters for each formula. When the user clicks on the "Number" parameter, the explanation on screen changes to "Number(required) The number to be rounded up." and when "Significance" is focused, the on screen help changes to "Significance(required) The number to whose multiple the value is rounded." So this user friendly wizard contains the necessary requirements and describes in detail the parameters for the function.

Additionally, if we were to press "F1" to call up the Help file, in OOo, we get something like this:

Ooohelpfile

This Help file is interesting not only because it reinforces the implementation in the Function Wizard, but it also gives information on why certain parameters are required, and provides the justifications for them. It also provides some examples and expected results. The story about the third parameter will be covered in the later part of this post.

Now if we were to compare the Help file entries of other spreadsheet apps, we can see how well this function definition is entrenched. In KOffice:

Kofficehelp

In Gnumeric:

Gnumerichelpp

And last but not least; in Microsoft Excel 97:

Msofficehelp

Now this is information which people have used throughout the 20 years of spreadsheet existence. It would be interesting to find out what exactly is so revolutionary in MSOOXML which the Microsoft supporters are rejoicing in!

Here is the "CEILING" definition from the Ecma 376 / DIS 19500 specification (Part 4, Section 3.17.7.33):

Ecmadef

Compare the MSOOXML with the Microsoft Excel 97 definition. You would be hard pressed to find any new information in the Ecma specs over the Help file provided by MS Office 97. Interestingly, the American flavoured example on nickels and dimes, and the #NUM test case have been omitted.

So between April 2006 (MSOOXML Draft 1.2) and May 2006 (MSOOXML Draft 1.3), the Ecma TC45 was provided with the Microsoft Office Help Files which was dutifully inserted in the draft standard. It is no wonder then how the technical committee managed to pull off such a feat by defining the "Formula Definition" within 2 months!

Compare this to the work done by the OASIS ODF Formula Subcommittee. They started work back in October 2005 and are still working on the "Formula Definition." This is the definition of "CEILING" for ODF Open Formula:

Odf12

The thorough work by the ODF team cannot be over emphasised here. In this new definition of "CEILING" the ODF subcommittee brought together several innovations

  1. in KOffice and Gnumeric CEILING requires a minimum of only 1, and can default the Significance to "1"
  2. Microsoft Excel and MSOOXML can only have 2 parameters no more, no less.
  3. In OOo CEILING requires at least two parameters, with the third parameter for legacy compatibility

OpenFormula combines the best of all words from OOo, KOffice, Gnumeric and MSOffice where the CEILING function can accept 1,2, or even 3 parameters. OpenFormula  caters for all of these quirks in spreadsheet applications and attempts to be as inclusive for all the needs of the different applications. This is testament to the deliberations from the many vendors involved in the standardisation process.

Also look at the extensive "test cases" provided within the document itself. The OASIS team can extract these examples and automatically compile test spreadsheets to verify implementation accuracies.

Additionally, OpenFormula attempts to correct a wrong set by spreadsheets a long time ago ...

CEILING is Mathematically Incorrect

Inspect the extensive interoperability notes appended to the OpenFormula document. It details the different behaviours of applications from Gnumeric, KSpread (from KOffice), Microsoft Excel and OpenOffice.org.

For this function, there are current interoperability problems in that the evaluation of the CEILING formula is incompatible with the standard mathematical definition; Gnumeric, KSpread and Excel will give the wrong results for negative numbers.

OpenOffice.org and OpenFormula correctly evaluates the result, and additionally provides "backward compatibility" which is the third parameter where users can flag it to use the wrong but legacy type calculation.

In contrast, MSOOXML just has one definition which is mathematically wrong.

Here is the standard mathematical definition of the function "CEILING" as defined in all Maths Textbooks, and best illustrated by Wolfram Research, the creators of "Mathematica":

Mathdef

You will find that Mathematica and OpenFormula specifies "CEILING" correctly in that negative numbers; CEILING( -4.5 ) is and always should be -4. Whereas Microsoft Office and MSOOXML incorrectly specifies CEILING( -4.5 ) as -5.

It would be shameful if this mathematical error were to be ratified as an ISO standard.

Interoperability

The OpenFormula document is an important piece of work as it goes out of its way for the sake of interoperability not just between the vendors who chose to be represented in OASIS, but also to the leading de-facto vendor who chose NOT to contribute to ODF. The copious notes on the different behaviours of the different applications will help ensure that developers are aware of implementation pitfalls. It will also ensure that the irregular and legacy implementations are inclusive in the specification, to support the "billions" of documents currently in existence.

This is a significant area which clearly demonstrates that ODF is better built for interoperability for legacy documents yet maintaining mathematical accuracy for FUTURE documents. It also nicely demonstrates the collaborative efforts from multiple vendors to allow OASIS to cherry pick the best features from all contributors to create the best Open Formula yet.

So when it comes to comparing MSOOXML and ODF v1.0 on the basis of the inclusion of "Formula Definitions", it becomes clear that the anti-ODF folk have not much to shout about. In fact MSOOXML's "Formula Definition" is deficient and inaccurate. As Rob Weir pointed out, in addition to this CEILING problem, most of the other formulae are inadequately defined and not well tested.

yk.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/t/trackback/686627/19932952

Listed below are links to weblogs that reference Mathematically Incorrect:

» Blood Guts: News From TheArena from Opportunity Knocks
Some Odds And Ends From The File Formats Arena Does anyone else feel like they are in one of the old Roman gladiator arenas? On one side, we fight to destroy monopoly and lack of choice. On the other side, they fight against a single standa... [Read More]

» OpenXML is Really Funny (But Its a Joke That Can Cost Lives) from Boycott Novell
OOXML looks worse the longer you stare at it (or the more you read about) ... [Read More]

Comments

Feed You can follow this conversation by subscribing to the comment feed for this post.

The mathematical definition includes complex numbers. Will we see those supported at all? Is there a need for them. It looks like it does CEIL(MOD(x)).

You guys are correct. I didnt notice the "negative significance for negative values" requirement.

Ill try to locate the ODF Formula subcommittee to see if it would be better to keep the significance the same sign no matter what.

BTW, the MSOOXML definition is deliberately vague on this issue. If you note the MS Excel 97 Help file, it defines

CEILING( -2.5, 2 ) equals #NUM

while the MSOOXML spec leaves the case where differing significance signs as unsaid. Did they know something, or was #NUM too application centric?

Anyway, lets see what the OASIS guys think.

Regards!

yk.

Following what Bill said (and agreeing with him)...

Surly the mathematical definition of Ceiling should apply precisely to a 'significance' of 1, even for negative values of the first argument. I don't see why you *need* a negative significance to cope with a negative argument x.

Then for s>0 you would have have the simple relationship:
Cei(x,s) = s*Cei(x/s,1) = s*Cei(x/s)

The sensible way to extend to negative values of s (if you choose to do this) is to use the above as the definition of Cei(x,s). For s=0, you notice that there's a well-defined limit as s-->0, and so define Cei(x,0)=x (not 0 as ODF does).

Your function is then mathematically sensible, and the two argument version matches the excel definition when it's defined. (Of course you would probably still want the mode flag to precisely preserve the Excel behaviour.)

The mathematical notion of ceiling doesn't involve a second parameter. So I don't think it makes sense to say that CEILING(-2.5,-1) "doesn't follow the mathematical definition". There *is* no standard two-argument ceiling function in mathematics. How do you expand the mathematical notion, explained on the Mathworld page, above to one which does deal with fractional values? There are multiple ways, but one way is to say CEILING(x,p) gives you the value k*p where k is the smallest integer with k >= x/p. I think that pretty much gives you Excel's behavior. Seems pretty reasonable to me, except for some reason Excel won't let you do CEILING(-2.5,1). If it did, then *that* would be the one I would expect to match mathemetics and return -2.

Converting from Excel to ODF, the translation would be CEILING(N,S) -> CEILING(N,S,1). The third arg in ODF is there for interoperability purposes, supporting well-known incorrect implementations.

Going from ODF to Excel, the translation might be:

CEILING(N,S) -> SUM(CEILING(N,S),IF(N lt 0, IF(MOD(N,S)!=0, -S, 0), 0))
and
CEILING(N,S,1) -> CEILING(N,S)

Somehow, I think that SUM function is what ought to be called STUPID_CEILING

Nick, when a hypothetical ODF supporting version of Excel converts a legacy CEILING call to ODF format, it will of course use the legacy flag that has been provided by the specification. For it to not to do so would be mathematically incorrect. What is your point?

I mean, of course, CEILING(-2.5; -1)

Much though I hate the idea of maintaining compatibility for Excel users, it's clear that, for interoperability reasons, CEILING(-2.5) needs to continue to be -3.
No, it's not the well-known mathematical function. Tough.
Unless you're going to convert occurrences of CEILING to STUPID_CEILING when importing from legacy formats such as Excel? And convert back on export?

again - microsoft making a mockery of the ISO process by submitting bogus and inaccurate standards just so they can keep their monopoly going.

please ISO kick ooxml out the door it came in on. just from this information here it should rejected.

Excellent post! Thank you for describing this case so explicitly.

Post a comment

If you have a TypeKey or TypePad account, please Sign In

Welcome to
Open Malaysia blog!

  • Bloggers @ Open Malaysia
    We are a group of individual bloggers working to build openness in Malaysia's ICT culture. Most of us have day jobs and a couple of us are students. Those with a job work for companies ranging from large international enterprises to self-run Malaysian start-ups.
    Email us at this address:
    open -AT- openmalaysiablog -DOT- com

Disclaimer...

  • We declare our independence of opinions from our employers, institutions, associations and clients, past and present. Thoughts and expressions in the Open Malaysia blog are rightly each blogger's own and each of us stand by what we individually write. Views by readers who post comments and others whose writings we link to in this blog are theirs.

May 2008

Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Subscribe to this site
- FeedBurner Feed

Subscribe to this site
- email alert options

Your email address:


Powered by FeedBlitz

Enter your email address:

Delivered by FeedBurner

Blog powered by TypePad