future.web

ON THE HORIZON: HTML 4.0 and Beyond

A new version of the HyperText Markup Language (HTML) will significantly advance the accessibility, interoperability and internationalization of the World Wide Web.

"The medium is the message," according to Marshall McLuhans's famous paradigm of the modern mass-media era. In the case of the Web, the medium is controlled by HTML code. When created by Tim Berners-Lee, HTML's primary guiding principle was the separation of content and presentation. The goal was to make content truly browser independent. Different browsers would present page layout in different ways, but content would always retain its coherence.

However, the web has fallen short of this laudable goal. Content developers were used to the print medium, where content and layout were fixed and inextricably merged. We shunned bare-bones HTML and designed graphically intensive web pages following the print model: the pages looked great on one particular combination of browser, operating system, and screen size, but didn't fare so well when that combination changed.

Browser developers, eager to give content developers advanced layout features mimicking those of desktop publishing, added new proprietary tags to HTML. The resulting mix of content and presentation makes today's web nearly unusable by the blind, who rely on text-to-speech or text-to-braille conversion. Web pages also fail to carry over to devices like WebTV or to the small-screen browsers planned for PDAs, hand-helds, and cell phones. Furthermore, the provisions for languages other than English are extremely crude.

Recognizing the need for a standard HTML and a return to original principles, the World Wide Web Consortium (W3C) was formed in 1994. At first, the W3C tried to get ahead of the curve and offered drafts (HTML+ and HTML 3.0) that would advance the state of the art. However, the king of browser developers, Netscape, was enjoying incredible success with its Navigator browser and the widespread use of its proprietary tags, and had too much momentum to be swayed by these drafts. When Microsoft released Internet Explorer, supporting nearly all of the Netscape extensions, the W3C apparently realized that a de facto standard could be had. So it offered the HTML 3.2 "Recommendation", confirming a core set of the new extensions. The acceptance of HTML 3.2 by the web development community gave the W3C leverage, which it can now use to push its latest recommendation, HTML 4.0.

HTML 4.0 separates content and style by moving nearly all layout control to style sheets. A style sheet is basically a separate part of the HTML page that controls the presentation of the page content. The W3C drafted the CSS1 cascading style sheet convention currently supported by Netscape and Microsoft in their latest browsers. With HTML 4.0, the W3C would have the importance of style sheets increase greatly. All font specifications, paragraph formatting, color choices, and alignment directives would move to the style sheet.
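
As a rough illustration of that division of labor, the sketch below keeps every presentation decision in a STYLE block and leaves only structure in the body. The class name "note" and the particular colors and fonts are made up for the example; only CSS1 properties are used.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Style sheet sketch</TITLE>
<STYLE TYPE="text/css">
  /* Presentation lives here instead of in FONT, CENTER, or BGCOLOR= */
  BODY   { background: white; color: black }
  H1     { text-align: center; font-family: Helvetica, Arial, sans-serif }
  P.note { font-size: 90%; color: #663300 }  /* "note" is a hypothetical class */
</STYLE>
</HEAD>
<BODY>
<H1>On the Horizon</H1>
<P CLASS="note">The markup carries only structure and content.</P>
</BODY>
</HTML>
```

A browser that ignores the style sheet still gets a heading and a paragraph, which is the point of the separation.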

Tags like <FONT> and <CENTER> and attributes like BGCOLOR=, COLOR=, or ALIGN= are discouraged; their work should be taken over by the enhanced power of the style sheet. Furthermore, the web author can define different style sheets to meet different needs: for example, an "aural" style sheet designed for text-to-speech browsers (with appropriate sound cues), "tv" for television sets, "handheld" for PDAs, and "print" for printing hard copy. Widespread support for style sheets used in this manner will greatly improve how the web works across different browsers and platforms, and for a greater number of users. Finally, HTML 4.0 provides a clean way to supply alternate text for all graphical elements.
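
One way this might look in practice is a set of LINK elements, each tagged with a MEDIA value, plus ALT text on an image. This is a sketch only: the style sheet and image file names are hypothetical, and how many of the media types a given browser honors is an open question.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>One document, several presentations</TITLE>
<!-- All file names below are hypothetical -->
<LINK REL="stylesheet" TYPE="text/css" MEDIA="screen"   HREF="screen.css">
<LINK REL="stylesheet" TYPE="text/css" MEDIA="print"    HREF="print.css">
<LINK REL="stylesheet" TYPE="text/css" MEDIA="aural"    HREF="aural.css">
<LINK REL="stylesheet" TYPE="text/css" MEDIA="handheld" HREF="handheld.css">
<LINK REL="stylesheet" TYPE="text/css" MEDIA="tv"       HREF="tv.css">
</HEAD>
<BODY>
<!-- ALT gives non-graphical browsers something to read or speak -->
<P><IMG SRC="logo.png" ALT="future.web column logo"></P>
</BODY>
</HTML>
```

The document body stays the same for every device; only the style sheet that gets applied changes.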

Another challenge for disabled users is the limited capability of today's tables and forms. HTML 4.0 adds the ability to specify abbreviated or alternate text for table cells, along with identifiers that tie each data cell to its table headers. This will make the table understandable when you "hear" it. It also adds tags to the TABLE specification that will speed up page rendering, allowing the table head, foot, and columns to display before the table data is loaded. Forms would be greatly enhanced under HTML 4.0, with tab-key switching between form elements, grouping of related fields, and hot-key shortcuts to specific form areas.
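
The fragment below sketches how these table and form features might be combined, using the THEAD, TFOOT, TBODY, and COLGROUP elements and the HEADERS, ABBR, TABINDEX, and ACCESSKEY attributes from the HTML 4.0 drafts. The table data, the ID values, and the form's ACTION URL are invented for illustration.

```html
<!-- Body fragment; the data and the ACTION URL are hypothetical -->
<TABLE SUMMARY="Sample price list with two columns">
<COLGROUP SPAN="2" WIDTH="50%"></COLGROUP>
<THEAD>
<TR><TH ID="item">Item</TH><TH ID="price" ABBR="Cost">Price in dollars</TH></TR>
</THEAD>
<TFOOT>
<TR><TD COLSPAN="2">Sample data only</TD></TR>
</TFOOT>
<TBODY>
<!-- HEADERS names the header cell that a speech browser should announce -->
<TR><TD HEADERS="item">Widget</TD><TD HEADERS="price">10</TD></TR>
<TR><TD HEADERS="item">Gadget</TD><TD HEADERS="price">25</TD></TR>
</TBODY>
</TABLE>

<FORM ACTION="/cgi-bin/feedback" METHOD="post">
<FIELDSET>
<LEGEND>Contact details</LEGEND>
<!-- ACCESSKEY defines a hot key; TABINDEX sets the tab-key order -->
<LABEL FOR="name" ACCESSKEY="n">Name</LABEL>
<INPUT TYPE="text" ID="name" NAME="name" TABINDEX="1">
<LABEL FOR="email" ACCESSKEY="e">E-mail</LABEL>
<INPUT TYPE="text" ID="email" NAME="email" TABINDEX="2">
</FIELDSET>
<P><INPUT TYPE="submit" VALUE="Send" TABINDEX="3"></P>
</FORM>
```

Note that TFOOT is written before TBODY, which is what lets a browser lay out the head and foot before the rows of data arrive.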

The character specification for HTML 4.0 is identical to Unicode 2.0. Unicode is an attempt to create a universal character set, supporting all the major languages of the world. Web authors could create pages in any of these languages, as well as access the full range of characters through the &#NNNN; (numeric character reference) notation. Also, HTML 4.0 includes support for altering the direction of text (Hebrew and Arabic, for instance, run right to left).
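
A small sketch of both features follows: decimal character references for characters outside the usual keyboard range, and the DIR attribute and BDO element for text direction. The sample words are chosen only for illustration, and a browser needs a suitable font to display them.

```html
<!-- Body fragment; sample text is illustrative only -->
<!-- &#233; is e-acute and &#945; is the Greek small letter alpha -->
<P>caf&#233; and the angle &#945;</P>

<!-- A Hebrew word (shalom) written with decimal references;
     DIR="rtl" asks the browser to lay the paragraph out right to left -->
<P LANG="he" DIR="rtl">&#1513;&#1500;&#1493;&#1501;</P>

<!-- BDO overrides the direction of the characters it contains -->
<P><BDO DIR="rtl">forced right to left</BDO></P>
```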

Additionally, HTML 4.0 supports the Portable Network Graphics (PNG) image file format, which offers 24-bit color with lossless compression in compact files. PNG is also special in that it can carry an alpha channel, so each pixel can be partially as well as fully transparent.

Like its predecessor, HTML 4.0 recognizes some of Netscape's most significant developments, notably frames and scripting. It also supports Microsoft's <IFRAME> inline frame.
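
For readers who have not used them, here is a minimal sketch of an inline frame and a script with fallback content for browsers that support neither. The file name news.html is hypothetical.

```html
<!-- Body fragment; news.html is a hypothetical file -->
<IFRAME SRC="news.html" WIDTH="300" HEIGHT="120" FRAMEBORDER="1">
  <!-- Shown only by browsers that do not support inline frames -->
  <A HREF="news.html">Read the news</A>
</IFRAME>

<SCRIPT TYPE="text/javascript">
  // Writes one line into the page as it loads
  document.write("This line was written by a script.");
</SCRIPT>
<NOSCRIPT>
  <P>This line appears when scripting is unavailable.</P>
</NOSCRIPT>
```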

In summary, HTML 4.0 is a major step forward in the evolution of the web. Although no browser fully implements it yet, Netscape and Microsoft have pledged to support it in the next major releases of their browsers (Internet Explorer 4 has some support for it now). In addition, the W3C has released a "testbed" browser/editor client called Amaya that can read and write HTML 4.0, including style sheets. Amaya is available for UNIX, Windows 95, and Windows NT. You can get started today by using style sheets in all your web documents. Prepare for the next generation of the web!


Dynamic HTML and Math Support

HTML 4.0 does not directly support the latest Netscape Dynamic HTML tags such as <LAYER> (absolute positioning, relative positioning, and z-ordering). However, work is underway on an enhanced style sheet specification that will include full positioning and z-ordering. When it is complete, web developers will be able to have the best of both worlds: fine control over positioning to achieve exact layout, and interoperability through the use of differing style sheets. Also under construction is a complete Document Object Model (DOM) standard. Microsoft has pledged support for the DOM, and if Netscape follows, the differing implementations of JavaScript will finally converge.
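
To give a feel for style sheet positioning, the sketch below stacks two copies of a title to fake a drop shadow. It assumes the property names position, top, left, and z-index from the W3C positioning work; the IDs "banner" and "shadow" are hypothetical, and browser support may vary.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Positioning sketch</TITLE>
<STYLE TYPE="text/css">
  /* Higher z-index values are drawn on top of lower ones */
  #shadow { position: absolute; top: 14px; left: 24px; z-index: 1; color: gray }
  #banner { position: absolute; top: 10px; left: 20px; z-index: 2; color: navy }
  /* Relative positioning nudges an element from where it would normally sit */
  SPAN.raised { position: relative; top: -3px }
</STYLE>
</HEAD>
<BODY>
<DIV ID="shadow">future.web</DIV>
<DIV ID="banner">future.web</DIV>
</BODY>
</HTML>
```

Unlike <LAYER>, this leaves the markup itself untouched, so a different style sheet can drop the effect entirely for print or speech.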

In another auxiliary development, the W3C is working on a markup language for equations and scientific notation, termed MathML. MathML consists of a series of tags similar to HTML's table markup syntax. The equation (x + 2)^2 would be written as follows in MathML:

<MSUP>
	<MROW>
		<MF>(</MF>
		<MROW>
			<MI>x</MI>
			<MO>+</MO>
			<MN>2</MN>
		</MROW>
		<MF>)</MF>
	</MROW>
	<MN>2</MN>
</MSUP>

As you can see from a simple equation (even without subscripts!), MathML is extremely unwieldy. The MathML working group claims that this complexity is necessary for MathML to be "flexible and extensible," and that they intend for this markup to be created by equation editors, not by hand. To that end, converters from TeX and Mathematica notation are being produced. However, I'm still skeptical; I don't know why an easier format such as Mathematica's could not be used "in-place."
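
For comparison, the same expression in TeX notation, one of the formats the converters are meant to read, fits on a single line:

```latex
$(x + 2)^2$
```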


Links
HTML 4.0 Reference
The W3C Home Page
The Web Accessibility Initiative
Cascading Style Sheets
Math Markup Language Draft

Guy McArthur (smiley@seds.org)