Mathematically Incorrect
Rob Weir writes about the inadequacies of the Microsoft Office Open XML (MSOOXML) Formula definitions in his recent post: "The Formula for Failure". He highlights the loud bragging of Microsoft supporters and employees of how the MSOOXML is superior to the Open Document Format (ODF) as MSOOXML includes a 324 page "Formula Definition".
In this post I will cover this issue in further detail, and highlight the arguments we are getting from anti-ODF vendors here in Malaysia, in the realm of the standardisation process. The complaint is as follows:
ODF Version 1.0 lacks the support for Formulas, hence is not relevant for Malaysia where its citizens and businesses uses spreadsheet extensively.
Interoperability test between OpenOffice and KOffice, both supporting ODF fails. [Example: "oooc:=SUM([.A1];[.A3];CEILING([.A4];3;2)) + LOG([.A1]+10;10) - SQRT([.A4]) ]"
Although I have already covered this issue on the "grammar" for formulas, as the old newsforge article on ODF interop with OOo and KOffice, the anti-ODF vendors seem to want to leverage on the fact that MSOOXML does have a "Formula Definition" and is willing to use this as a major advantage.
They weren't banking on the event that if we were to study this issue a little deeper, and their "asset" may turn out to be a huge liability on their efforts to make MSOOXML an ISO standard.
User Interfaces and Help files
While it is true that ODF v1.0 does not include a "Formula Definition", it is also true that throughout the entire history of spreadsheet usage, there has been no "Formula Definition." Although this has hindered interoperability to a certain extent, people got by using whatever information they had available to them.
The most immediate form of information is via the Graphical User Interface (GUI) Wizards when it comes to Formula building. This is a screenshot of OpenOffice.org's "Function Wizard" which is similar to most modern spreadsheet features:
Please note the descriptions of each of the parameters for each formula. When the user clicks on the "Number" parameter, the explanation on screen changes to "Number(required) The number to be rounded up." and when "Significance" is focused, the on screen help changes to "Significance(required) The number to whose multiple the value is rounded." So this user friendly wizard contains the necessary requirements and describes in detail the parameters for the function.
Additionally, if we were to press "F1" to call up the Help file, in OOo, we get something like this:
This Help file is interesting not only because it reinforces the implementation in the Function Wizard, but it also gives information on why certain parameters are required, and provides the justifications for them. It also provides some examples and expected results. The story about the third parameter will be covered in the later part of this post.
Now if we were to compare the Help file entries of other spreadsheet apps, we can see how well this function definition is entrenched. In KOffice:
In Gnumeric:
And last but not least; in Microsoft Excel 97:
Now this is information which people have used throughout the 20 years of spreadsheet existence. It would be interesting to find out what exactly is so revolutionary in MSOOXML which the Microsoft supporters are rejoicing in!
Here is the "CEILING" definition from the Ecma 376 / DIS 19500 specification (Part 4, Section 3.17.7.33):
Compare the MSOOXML with the Microsoft Excel 97 definition. You would be hard pressed to find any new information in the Ecma specs over the Help file provided by MS Office 97. Interestingly, the American flavoured example on nickels and dimes, and the #NUM test case have been omitted.
So between April 2006 (MSOOXML Draft 1.2) and May 2006 (MSOOXML Draft 1.3), the Ecma TC45 was provided with the Microsoft Office Help Files which was dutifully inserted in the draft standard. It is no wonder then how the technical committee managed to pull off such a feat by defining the "Formula Definition" within 2 months!
Compare this to the work done by the OASIS ODF Formula Subcommittee. They started work back in October 2005 and are still working on the "Formula Definition." This is the definition of "CEILING" for ODF Open Formula:
The thorough work by the ODF team cannot be over emphasised here. In this new definition of "CEILING" the ODF subcommittee brought together several innovations
- in KOffice and Gnumeric CEILING requires a minimum of only 1, and can default the Significance to "1"
- Microsoft Excel and MSOOXML can only have 2 parameters no more, no less.
- In OOo CEILING requires at least two parameters, with the third parameter for legacy compatibility
OpenFormula combines the best of all words from OOo, KOffice, Gnumeric and MSOffice where the CEILING function can accept 1,2, or even 3 parameters. OpenFormula caters for all of these quirks in spreadsheet applications and attempts to be as inclusive for all the needs of the different applications. This is testament to the deliberations from the many vendors involved in the standardisation process.
Also look at the extensive "test cases" provided within the document itself. The OASIS team can extract these examples and automatically compile test spreadsheets to verify implementation accuracies.
Additionally, OpenFormula attempts to correct a wrong set by spreadsheets a long time ago ...
CEILING is Mathematically Incorrect
Inspect the extensive interoperability notes appended to the OpenFormula document. It details the different behaviours of applications from Gnumeric, KSpread (from KOffice), Microsoft Excel and OpenOffice.org.
For this function, there are current interoperability problems in that the evaluation of the CEILING formula is incompatible with the standard mathematical definition; Gnumeric, KSpread and Excel will give the wrong results for negative numbers.
OpenOffice.org and OpenFormula correctly evaluates the result, and additionally provides "backward compatibility" which is the third parameter where users can flag it to use the wrong but legacy type calculation.
In contrast, MSOOXML just has one definition which is mathematically wrong.
Here is the standard mathematical definition of the function "CEILING" as defined in all Maths Textbooks, and best illustrated by Wolfram Research, the creators of "Mathematica":
You will find that Mathematica and OpenFormula specifies "CEILING" correctly in that negative numbers; CEILING( -4.5 ) is and always should be -4. Whereas Microsoft Office and MSOOXML incorrectly specifies CEILING( -4.5 ) as -5.
It would be shameful if this mathematical error were to be ratified as an ISO standard.
Interoperability
The OpenFormula document is an important piece of work as it goes out of its way for the sake of interoperability not just between the vendors who chose to be represented in OASIS, but also to the leading de-facto vendor who chose NOT to contribute to ODF. The copious notes on the different behaviours of the different applications will help ensure that developers are aware of implementation pitfalls. It will also ensure that the irregular and legacy implementations are inclusive in the specification, to support the "billions" of documents currently in existence.
This is a significant area which clearly demonstrates that ODF is better built for interoperability for legacy documents yet maintaining mathematical accuracy for FUTURE documents. It also nicely demonstrates the collaborative efforts from multiple vendors to allow OASIS to cherry pick the best features from all contributors to create the best Open Formula yet.
So when it comes to comparing MSOOXML and ODF v1.0 on the basis of the inclusion of "Formula Definitions", it becomes clear that the anti-ODF folk have not much to shout about. In fact MSOOXML's "Formula Definition" is deficient and inaccurate. As Rob Weir pointed out, in addition to this CEILING problem, most of the other formulae are inadequately defined and not well tested.
yk.








The mathematical definition includes complex numbers. Will we see those supported at all? Is there a need for them. It looks like it does CEIL(MOD(x)).
Posted by: Richard | Thursday, 12 July 2007 at 09:34 PM
You guys are correct. I didnt notice the "negative significance for negative values" requirement.
Ill try to locate the ODF Formula subcommittee to see if it would be better to keep the significance the same sign no matter what.
BTW, the MSOOXML definition is deliberately vague on this issue. If you note the MS Excel 97 Help file, it defines
CEILING( -2.5, 2 ) equals #NUM
while the MSOOXML spec leaves the case where differing significance signs as unsaid. Did they know something, or was #NUM too application centric?
Anyway, lets see what the OASIS guys think.
Regards!
yk.
Posted by: Yoon Kit | Thursday, 12 July 2007 at 08:23 PM
Following what Bill said (and agreeing with him)...
Surly the mathematical definition of Ceiling should apply precisely to a 'significance' of 1, even for negative values of the first argument. I don't see why you *need* a negative significance to cope with a negative argument x.
Then for s>0 you would have have the simple relationship:
Cei(x,s) = s*Cei(x/s,1) = s*Cei(x/s)
The sensible way to extend to negative values of s (if you choose to do this) is to use the above as the definition of Cei(x,s). For s=0, you notice that there's a well-defined limit as s-->0, and so define Cei(x,0)=x (not 0 as ODF does).
Your function is then mathematically sensible, and the two argument version matches the excel definition when it's defined. (Of course you would probably still want the mode flag to precisely preserve the Excel behaviour.)
Posted by: Robert Whittaker | Thursday, 12 July 2007 at 08:05 PM
The mathematical notion of ceiling doesn't involve a second parameter. So I don't think it makes sense to say that CEILING(-2.5,-1) "doesn't follow the mathematical definition". There *is* no standard two-argument ceiling function in mathematics. How do you expand the mathematical notion, explained on the Mathworld page, above to one which does deal with fractional values? There are multiple ways, but one way is to say CEILING(x,p) gives you the value k*p where k is the smallest integer with k >= x/p. I think that pretty much gives you Excel's behavior. Seems pretty reasonable to me, except for some reason Excel won't let you do CEILING(-2.5,1). If it did, then *that* would be the one I would expect to match mathemetics and return -2.
Posted by: Bill Baxter | Thursday, 12 July 2007 at 05:11 PM
Converting from Excel to ODF, the translation would be CEILING(N,S) -> CEILING(N,S,1). The third arg in ODF is there for interoperability purposes, supporting well-known incorrect implementations.
Going from ODF to Excel, the translation might be:
CEILING(N,S) -> SUM(CEILING(N,S),IF(N lt 0, IF(MOD(N,S)!=0, -S, 0), 0))
and
CEILING(N,S,1) -> CEILING(N,S)
Somehow, I think that SUM function is what ought to be called STUPID_CEILING
Posted by: Mike TIllberg | Wednesday, 11 July 2007 at 09:03 PM
Nick, when a hypothetical ODF supporting version of Excel converts a legacy CEILING call to ODF format, it will of course use the legacy flag that has been provided by the specification. For it to not to do so would be mathematically incorrect. What is your point?
Posted by: Mike Chamberlain | Wednesday, 11 July 2007 at 08:52 PM
I mean, of course, CEILING(-2.5; -1)
Posted by: Nick Barnes | Wednesday, 11 July 2007 at 06:53 PM
Much though I hate the idea of maintaining compatibility for Excel users, it's clear that, for interoperability reasons, CEILING(-2.5) needs to continue to be -3.
No, it's not the well-known mathematical function. Tough.
Unless you're going to convert occurrences of CEILING to STUPID_CEILING when importing from legacy formats such as Excel? And convert back on export?
Posted by: Nick Barnes | Wednesday, 11 July 2007 at 06:51 PM
again - microsoft making a mockery of the ISO process by submitting bogus and inaccurate standards just so they can keep their monopoly going.
please ISO kick ooxml out the door it came in on. just from this information here it should rejected.
Posted by: james | Wednesday, 11 July 2007 at 03:52 AM
Excellent post! Thank you for describing this case so explicitly.
Posted by: Ben Langhinrichs | Tuesday, 10 July 2007 at 10:35 PM