Separation of Concerns

In the field of programming there’s universal appreciation for abstraction. It’s been beaten into programmers since day zero through the painful experience of changing brittle software full of code duplication and poorly expressed metaphors. Programmers learnt to remove and refactor duplication and to architect beautiful abstractions that identify and reuse commonalities in code.

The programmers were right about abstraction, but most have had a blind spot when it came to HTML.

In the past few years there’s been increasing interest in the semantics of HTML and debate about interpretations of the HTML 4.01 standard. Menus that were previously in <table>s were moved to <ul>s. Then <table>s were relegated to tabular data and layout was achieved with the <div>ision tag. FAQ lists were expressed as <dl>, <dt>, <dd> and even the archaic <address> tag has seen a comeback, with its semantics more clearly defined than ever before. There comes a time, however, when poring over the W3C specification will not reveal more semantics and when this particular markup language will not be sufficient for the demands that programmers put upon it.

And the demands are mounting. Pages should be able to be repurposed into syndication feeds (RSS 0.92/2.0, Atom), which means programmers need article metadata. Interactive forms with validation should be available in HTML 4, HTML 5, Flash/Flex, XUL and XAML (with native controls where possible). Generating PDFs, ODTs and OOXML/DOCs would be nice, and data-tables within pages should be exportable to CSV or ODS.

HTML 4.01 is currently stirring the same kind of doubt in programmers that duplication in source code does. The abstraction in HTML 4.01 is insufficient. It seems wrong to code an application to produce HTML 4.01 when there are commonalities with HTML 5, XUL, Flash/Flex, XAML, etc. A series of tabs could be native in XUL and XAML but emulated with a hack of <div> tags in HTML. A slider control or a date picker could be native in HTML 5 but emulated in HTML 4. A programmer putting slider controls on a page shouldn’t code to a particular output format; they shouldn’t have to care.

Most programmers could easily imagine how an HTML 5 date picker could be automatically converted into HTML 4 and JavaScript, but the reverse is not so easily conceivable. All the inconsistent hacks that HTML 4 pages use to emulate date pickers make a conversion from HTML 4 to HTML 5 impossible for all but contrived and simplistic examples. Converting HTML 4 hacks will be considerably more difficult precisely because HTML 4 lacks standard semantics for expressing these widgets. The 4.xx series of HTML has been with us for so long that it has come to be perceived as a constant, and applications have been hard-coded to output HTML 4. Any abstraction from HTML 4 was considered over-engineering, widget hacks became commonplace, and the blind spot grew.
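To make the asymmetry concrete, here is a minimal sketch in Python of the easy direction. Everything in it is my own illustration rather than any existing library: the DatePicker class, the two renderer functions and the "js-date-picker" class name are all hypothetical, and the HTML 4 output assumes some script on the page would upgrade the text box into a calendar.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class DatePicker:
    """Hypothetical abstract widget: it says what the control is, not how it is drawn."""
    name: str
    value: str = ""  # an ISO 8601 date such as "2008-03-01"


def to_html5(widget: DatePicker) -> str:
    # HTML 5 has a native date control, so the mapping is direct.
    return '<input type="date" name="%s" value="%s">' % (
        escape(widget.name, quote=True),
        escape(widget.value, quote=True),
    )


def to_html4(widget: DatePicker) -> str:
    # HTML 4 has no date control, so emit a plain text box with a class hook
    # (an assumed convention) that a script could turn into a calendar pop-up.
    return '<input type="text" name="%s" value="%s" class="js-date-picker">' % (
        escape(widget.name, quote=True),
        escape(widget.value, quote=True),
    )


picker = DatePicker(name="due", value="2008-03-01")
print(to_html5(picker))
print(to_html4(picker))
```

Going the other way would mean recognising every page's own flavour of text box plus script as "a date picker", which is exactly the information the HTML 4 markup never recorded.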

The proliferation of JavaScript libraries to emulate widgets shows an inherent weakness in HTML 4, but the problem isn’t just in form widgets. Textual content also has limitations in HTML 4, as anyone who knows DocBook or TEI will tell you.

After recognising this problem of abstraction programmers are now looking for higher-level languages that can be converted into lower-level ones such as HTML 4, RSS, PDF, ODT, etc. This conversion may occur server-side with an abstract page definition… perhaps a source in HTML 5, Flex, XUL, or XAML that is converted into output formats such as HTML 4 with JavaScript, HTML 5 with JavaScript, XUL, XAML, RSS, ODT, PDF, and so on.
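As a rough sketch of what such a server-side conversion might look like, assume a tiny abstract Article record that carries the metadata a feed needs. The Article class, its field names and both renderer functions are hypothetical, invented here purely for illustration; a real source format would need far richer semantics.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Article:
    """Hypothetical abstract page definition holding content plus metadata."""
    title: str
    link: str
    summary: str


def to_html4(article: Article) -> str:
    # One output format: an HTML 4 fragment with a linked heading and a paragraph.
    return '<h1><a href="%s">%s</a></h1>\n<p>%s</p>' % (
        escape(article.link, quote=True),
        escape(article.title, quote=False),
        escape(article.summary, quote=False),
    )


def to_rss_item(article: Article) -> str:
    # Another output format: an RSS 2.0 <item> built from the same record.
    return "<item><title>%s</title><link>%s</link><description>%s</description></item>" % (
        escape(article.title, quote=False),
        escape(article.link, quote=False),
        escape(article.summary, quote=False),
    )


post = Article(
    title="Separation of Concerns",
    link="http://example.org/separation-of-concerns",  # placeholder URL
    summary="Abstraction & HTML",
)
print(to_html4(post))
print(to_rss_item(post))
```

The point of the design is that the application only ever builds the Article; adding another output format means adding another renderer, not touching the code that produces the content.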

What would a source format that straddled all these formats look like? Well, the output formats all can represent tables, paragraphs, headings, images, text-boxes, date pickers, hyperlinks and so on. Some features clearly overlap. Beyond that it’s more difficult, and more debatable (do XUL and XAML have features that map poorly to HTML and SVG? Is an abstraction still useful without these features?) so I’ll save that for another blog post.

Of course the devil’s in the details, and the feasibility of straddling a useful subset will only be proven in the semantics chosen. Still, going through the process will be an interesting exercise.

What would wxWidgets for the web look like? Stay tuned…

(Please note: Although I specifically mention HTML 4 as being a poor abstraction or internal data-model for modern applications the same goes for XHTML 1 as they have the same semantics, obviously)

4 Responses to “Separation of Concerns”

  1. maetl Says:

    Not having to code templates to a particular output format is nice in theory, but is it really just a matter of layers to the cake (or trifle, if you prefer)?

    I still don’t quite understand why you would *need* to have a site outputting a UI in both HTML5 and HTML4 and other formats all at once… Surely, HTML4 is good for a certain limited range of content and hypertext stuff, but there are other better choices for more complex or dynamic UI’s. I would expect it to be more of an either/or proposition in most cases.

    It may be a restrictive convention, but surely, you can derive most of the transforms you mention from just ‘id’ and ‘class’ attributes?

  2. admin Says:

    Hi Mark,

    Not having to author input for a particular output format is nice in practice too, I assure you (yes, there’s working software already). Of course this isn’t just about abstracting “templates”: it’s content, forms, menus, tabs, and every part of the output that has previously been hard-coded and inflexible.

    Some of your questions would be better answered by going into great detail about the similarities and dissimilarities between these formats. Contrasting the different table models, image models, etc., would surely take more than a blog comment; however, I’ll just say for now that the majority of semantics available in DocBook5 and HTML5 could not be easily encoded in the @class and @id attributes of HTML4. The exact choice of semantics will come out as we go through the tag mappings, which may help justify a new abstraction rather than a naming convention in HTML4.

    Speaking more generally about “better choices for more complex or dynamic UI’s”, I would like to see HTML5 become the future of the web more than I would XAML, Flash/Flex, or XUL because HTML5 truly is an open standard. The other platforms are advancing quickly… to paraphrase Brendan Eich “the obvious conflict of interest between the standards-based web and proprietary platforms advanced by Microsoft, and the rationales for keeping the web’s language small while the proprietary platforms rapidly evolve support for large languages, does not help maintain the fiction that only clashing high-level philosophies are involved here.” –Source

    An abstraction of the authoring format in order to support multiple output formats would encourage HTML5 without losing the HTML4 audience. I see this as necessary for HTML5, but there are many ways of achieving this abstraction and I imagine that this project will just be one of many.

  3. Family Holloway » Blog Archive » Bear Patrol Says:

    [...] Family Holloway: Blog « Separation of Concerns [...]

  4. Malcolm Says:

    Hi Matthew,

    If you’ll cast your mind back to 1998 for a moment and remember your days at University House, yes it’s Malcolm Phillips from room AL.01 here.

    Just writing to say Hi, and that I saw you on the cover of a Computer World the other day (nice glasses). It’s good to see that you’re doing well, some 10 years further on. I can’t think of anyone more qualified to be representing NZ’s views on OOXML.

    btw, my own website is a bit prehistoric (though at least I have one now).
