A review of Rick Jelliffe’s talk at Catalyst on the 26th of July, 2007

NOTE: This was originally an email on the NZOSS mailing list. I’m posting this here upon request by several readers.

———————–

Hi folks,

As you know Rick Jelliffe has been traveling around New Zealand talking to SSC and various other groups about Microsoft’s proposed OOXML, the ISO Standards process, and “Wiki-gate” (which for those who don’t know was Microsoft looking to hire Rick to edit Wikipedia’s OOXML entry and the resulting media furore). Rick has a history in various ISO standards, Schematron (XML validation), and the original XML working group at W3C.

Last Thursday Rick did a talk at Catalyst IT here in Wellington and afterwards I was asked by a few people to post a review of the evening (I’m an SGML/XML guy too, I do docvert.org and analysis of ODF and OOXML).

Rick’s (and Microsoft’s) explanation of Wiki-gate is that he was paid to tell the truth — to correct inaccuracies about OOXML. Rather than Microsoft editing the article themselves (which is apparently against Wikipedia’s “Conflict of Interest” rules) they say they paid someone to correct mistakes.

While correcting mistakes is good, Rick challenges people to accuse him of “Microsoft [payment] for my opinion: i.e. that I would change my opinion to suit them or whoever paid me”.

So he says he’ll get his meal-ticket angry if he wants because he’s an independent man, and to say otherwise is to attack his credibility.

In my opinion the Wikipedia thing was no big deal, and I think to focus on whether he’s been bought or not is a VERY weak argument because in theory he could be posting what he wanted regardless of money.

I say “in theory” because the real issue is that he doesn’t post balanced stuff which should really be the focus — not the payment.

The talk was due to start at 4:30, which gave the 30-odd people in attendance (mostly Catalyst folks) time to have a drink and mingle. I was introduced to Rick and he seemed an approachable, friendly guy. We talked a little about how, if Government want to use OOXML or ODF, they need to define a “profile” (a subset so that people don’t embed proprietary binaries or use poorly supported parts of the spec) in the same way that they do with the Web Standards specification, which constrains use of HTML and CSS. Soon after this it was time for the talk.

The first topic was “Wiki-gate”. Rick explained how the event unraveled and took on a life of its own: the initial email where Microsoft were worried about their competitors changing Wikipedia; how he posted Microsoft’s offer on his blog before any contractual agreement; and how news sites took that as Microsoft buying him. Rick pointed out quite clearly that the ‘Conflict of Interest’ rules as shown at http://urltea.com/13b9?wikipedia-coi applied to biased views, not what Rick was asked to do. Certainly there was a great deal of misinformation and many people had accused him of accepting Microsoft bribes (he had screenshots).

Rick never denied taking payments; he only said that he would have the same opinion regardless (and this may well be true, he comes off as a likeable and sincere guy).

IBM’s Rob Weir had this to say about “Wiki-gate” (it’s a tired cliche, that *-gate): Rob Weir’s blog: Crocodile tears

(go read that post by Rob, I’ll wait…)

So this post by Rob Weir was interesting (the approach of seeing what Wikipedia’s OOXML entry looked like before this: was it as bad as Microsoft said?), but in particular read the comments that followed the post, where Rob and Rick talk it out and Rick sees accusations of bribery where there were none. Rick posts under his full name, so it’s easy enough to find his posts.

What I wanted to see was whether Rick would present a balanced view of OOXML and ODF, because his online posts didn’t give me much hope…

Rick Jelliffe writes,

The OOXML specification requires conforming implementations to accept and understand various legacy office applications

Several readers respond,

I’m not sure if I’ve heard that one. I know that in the OOXML spec just about everything is optional, so the compliance issue isn’t really the point.

A more relevant point is that real-world OOXML files will contain stuff from all over the spec, and even external file formats like RTF. There’s no reasonable way that I, as a software developer, can handle that. This is a practical concern, not a legal issue.

[...]

Perhaps I’m missing something but I thought the purpose of a standard was for people to create interoperable implementations. Having something like “strings” spew out bits of ascii text from an ODF document would be “conformant” under your definition it would just not implement 99% of the “bits” and say so. This seems even more absurd.

Rick got it quite wrong here. OOXML conformance is defined in the spec as “A conforming consumer shall not reject any conforming documents of the document type expected by that application” (OOXML, Part 1, Section 2.5 ‘Application Conformance’). In other words, conformance is defined in terms of whatever documents the application decides to read, rather than any feature set or compliance with the standard. See http://urltea.com/13cw?rob-weir-ooxml-conformance

Rick Jelliffe writes,

On the other hand, the kind of openness that a completed external specification like OOXML can have is different from the kind of openness that a work-in-progress external specification like ODF affords

Several readers respond,

Er, um, ODF is an ISO standard — ISO/IEC 26300 — hardly a “work in progress”.

[...]

Every format is a work in progress, even Microsoft are working on improvements to OOXML. What’s that kind of snide comment supposed to mean Rick?

Heh… the slant he’s put on that one speaks for itself.

Rick Jelliffe writes,

As I have mentioned before on this blog, I think OOXML has attributes that distinguish it: ODF has simply not been designed with the goal of being able to represent all the information possible in an MS Office document

A reader responds,

This is inaccurate to the extent that it requires qualification. ODF was not designed specifically for MS Office, but it _was_ designed to be able to represent any office document, including, but not limited to, old, current, and future functionality of MS Office. This, of course, requires an explanation and I will explain.

Before I explain, I want to note that I know the ODF spec well, and I’m familiar with the EOXML spec. The only thing I’ve noticed in EOXML that can’t be done with ODF features is inserting RTF files.

Now my explanation:

ODF can be extended. ODF section 1.5 explicitly allows extensions to the spec, with only minor conditions (essentially, you have to use your own namespace).

Microsoft is free to use ODF features for the bulk document and create their own, separate namespace for functionality they couldn’t map. This is permitted. This doesn’t make the file invalid, and I know of one other ODF application that does this too.

Of course, it is highly preferable that MS standardize their extension through a standards body, but that is a separate argument now.

So, as I was saying, I didn’t hold much hope for hearing a balanced view at this Catalyst talk, which this blog post should now get back to…

The next topic Rick covered was the OOXML format and in particular BITMASKS

A bitmask is a technique to encode multiple values inside a single variable, by assigning a meaning to each individual bit of the variable. For example, the binary 10110001 (decimal 177) would mean Yes/No/Yes/Yes/No/No/No/Yes and contain the answers to 8 different yes/no questions.
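To make that concrete, here is a minimal sketch in Python (chosen purely for illustration). The function names `decode_bitmask` and `encode_bitmask` are made up for this example and have nothing to do with the OOXML spec; the sketch also assumes the leftmost bit answers the first question.

```python
def decode_bitmask(value: int, width: int = 8) -> list[bool]:
    """Split an integer into `width` yes/no flags, most significant bit first."""
    return [bool((value >> shift) & 1) for shift in range(width - 1, -1, -1)]


def encode_bitmask(flags: list[bool]) -> int:
    """Pack yes/no flags (most significant bit first) back into one integer."""
    value = 0
    for flag in flags:
        value = (value << 1) | int(flag)
    return value


answers = decode_bitmask(0b10110001)  # 0b10110001 is decimal 177
print(["Yes" if answer else "No" for answer in answers])
# prints ['Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes']
```

A language with bitwise operators can do this in a line or two, which is the contrast to keep in mind for the XML tooling discussed next.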

Rick brought up bitmasks because they’re a popular argument for OOXML detractors. He said that the complaints about OOXML using bitmasks for storing the code pages that a given font supports were exaggerated and that this was an acceptable use of bitmasks, that XML processors wouldn’t touch this, and that it was like a hexadecimal RGB value in that it had its own non-XML string syntax. He thought there was only a single other place in the spec that used bitmasks (when actually there are at least 8 other places, but I’ll get to that).

XML itself avoids endian issues and is encoding independent; it doesn’t have ways of referring to bits. You can, however, store strings like “010101101” and then validate them, but as W3C Schema has no bitwise operators this will be an arduous task of string comparison. That much is simply true, and I brought this up with Rick. His response was to say that you can validate a bitmask serialized as a string like “010110110” by doing lots of string comparisons. For example, in XSLT you could write…

<xsl:choose>
  <xsl:when test="element[substring(@attr, 1, 1) = '0']">
    ….
  </xsl:when>
  <xsl:when test="element[substring(@attr, 1, 1) = '1']">
    ….
  </xsl:when>
  <xsl:when test="element[substring(@attr, 2, 1) = '0']">
    ….
  </xsl:when>
  <xsl:when test="element[substring(@attr, 2, 1) = '1']">
    ….
  </xsl:when>
  <xsl:when test="element[substring(@attr, 3, 1) = '0']">
    ….
  </xsl:when>
  <xsl:when test="element[substring(@attr, 3, 1) = '1']">
    ….
  </xsl:when>

  [...]

</xsl:choose>

Compare that with something half the size that would have been more readable, maintainable, extensible and bug-free if they’d used attribute values. Rick’s specific suggestion is validating via “regular expressions or enumerations or unions or numeric ranges”. He also suggests a somewhat cleaner Schematron schema here, http://urltea.com/13cf

The other areas that define bitmasks are these,

  • Section 2.3.1.18, Paragraph conditional formatting (page 842).
  • Section 2.4.7, Table cell conditional formatting (page 1085).
  • Section 2.4.8, Table row conditional formatting (page 1087).
  • Section 2.4.51, Table style conditional formatting settings (page 1211).
  • Section 2.4.52, Table style conditional formatting settings exceptions (page 1213)
  • Section 2.15.1.86, Suggested filtering for list of document styles (page 2034)
  • Section 2.15.1.87, Suggested sorting for list of document styles (page 2036)
  • Section 6.1.2.7, tableproperties attribute of shape group (page 5227)

That’s the kind of stuff I do in Docvert so it’s definitely something that I’d want to be able to manipulate (people use such formatting to suggest headings which Docvert then converts into DocBook <section> tags and such).

But even if I didn’t want to manipulate a bitmask (or an RGB value) I may want to validate it. For the RGB example it’s good to be able to notice that a value of “#FF00GF” is invalid.
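As a rough sketch of the regular-expression style of validation (Python again, purely for illustration), both checks fit in a few lines. The names `is_valid_hex_colour` and `is_valid_bit_string` are hypothetical, and the patterns assume a six-digit `#RRGGBB` colour and a fixed-width string of 0s and 1s; neither is taken from the OOXML spec.

```python
import re

# Assumed syntax: a '#' followed by exactly six hexadecimal digits.
HEX_COLOUR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_valid_hex_colour(text: str) -> bool:
    """True if `text` looks like #RRGGBB with only hexadecimal digits."""
    return HEX_COLOUR.fullmatch(text) is not None


def is_valid_bit_string(text: str, width: int) -> bool:
    """True if `text` is exactly `width` characters, each '0' or '1'."""
    return re.fullmatch(r"[01]{%d}" % width, text) is not None


print(is_valid_hex_colour("#FF00GF"))         # False: 'G' is not a hex digit
print(is_valid_bit_string("010110110", 9))    # True
```

Note that this only checks the shape of the string. It says nothing about which bit combinations are meaningful, which is where the string-comparison approach gets arduous.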

Personally I consider bitmasks to be one of the weaker arguments against OOXML. It’s a poor design decision but at least it’s documented, unlike many other parts of the specification.

But enough about bitmasks,

NOW TO ISO STANDARDS

Someone in the audience asked a great question: can OOXML achieve interoperability with other implementations?

Rick said that Standards are not about interoperability but rather about people with common interests coming together and agreeing something mutually useful.

I was a little shocked at Rick’s view, as were others in the room. Fortunately the ISO/IEC appear to disagree with Rick here and they set higher standards…

A purpose of IT standardization is to ensure that products available in the marketplace
have characteristics of interoperability, portability and cultural and linguistic adaptability.
Therefore, standards which are developed shall reflect the requirements of the following
Common Strategic Characteristics

● Interoperability;
● Portability;
● Cultural and linguistic adaptability;

- http://urltea.com/13bg

He mentioned that ODF didn’t define the ZIP format, and that OOXML did (Open Packaging Conventions, or OPC as it’s called). This was actually a good point, even though it’s already well known that ODF should have defined the container format.

After the presentation we all transformed (*Insert cool Transformers noise here*) into groups.

I had a chat with him about the legal issues of OOXML. Microsoft have granted patents over the required parts of the standard, but not the non-required parts — effectively restricting competitors to only implementing a subset of OOXML.

His argument was that it didn’t matter if Microsoft would only grant patents over the required parts of the spec. Microsoft wouldn’t win any court case because courts would frown on defining a standard only to legally restrict people from implementing it (bait and switch), and there were several court cases to do with pipe fittings that established precedent.

So the obvious response is that precedent is good if it goes to court, but the effect can start much earlier (e.g. Microsoft claiming that Linux violates 235 patents without ever saying which patents they are or taking it to court). Similarly I see any OOXML patents having a life of their own outside the court for marketing and legal threats. Many others agree with this idea…

http://www.youtube.com/watch?v=6YExl9ojclo

I also asked what he thought about “Office Open XML” having a similar name to OpenOffice.org and he didn’t have a problem with it. He said that whatever name ISO gives it won’t affect what Microsoft call their product/feature, so ISO don’t have much influence there. British Standards Institution panelists have suggested the name “RODDL” (see http://urltea.com/12nw ) but this won’t affect what Microsoft calls their effort.

There was a question about ‘stacking the vote’, how companies interested in OOXML had suddenly joined standards groups all around the world in order to vote for it. What happened was this,

“As you can see, at the start of the year, V1’s membership consisted of seven organizations, six of whom on Friday voted ‘Disapproval, with comments’, and one (Microsoft) who voted ‘Approval, with comments’.

The membership spurt came at the very end, in the last month, when 16 new members joined V1. Of these 16 new members, 14 of them voted, ‘Approval, with comments’ on Friday.”

- http://urltea.com/13cu?rob-weir-blog

Rick excused this by saying that, since the voting process allows companies to join, it was acceptable.

And I don’t have a good response to that. I don’t know what I could suggest that would be better. As Rick quite rightly said, are you going to disallow companies who are interested in the tech from voting? If so, many people are guilty, not just Microsoft and OOXML supporters.

Finally he talked a little more about whether the OOXML and ODF standards can be merged, and again he quoted an ODF author as saying they couldn’t be. Luckily Microsoft’s Alan Yates and Tim Bray (co-lead developer of XML1) think it can and I do too… see http://urltea.com/12nt .

…and that was the talk.

Rick is a smart guy and — as I hope I’ve described — he’s very pleasant and likable. Wikigate was overblown — I think we can all agree on that. I didn’t write this to personally hassle Rick, just to explain his arguments as I understood them and if I got any of it wrong I’ll say so.

I wrote this to point out that his analysis of OOXML is typically one-sided and it ignores established facts (again he talked about OOXML requiring legacy support at the Catalyst talk).

I think Rick is very wrong in his analysis of OOXML, and I hope I’ve got enough references here to prove that.
