WEB DESIGN LECTURE NOTES
WEEK 2: THE "ML"S--HTML, SGML, AND XML  


html, the document format of web pages
html, the current universal format for most web data, stands for "hypertext markup language". there's a lot of meaning in that little acronym. understanding it will help you understand why html was chosen as the web document format and what its benefits and limitations are.

as we saw in week 1, the web was originally conceived of by tim berners-lee as a hypertext system--a global database of documents linked to each other. the inclusion of "ht" in "html" is, therefore, an obvious reminder of hypertext's role on the web. after all, no hypertext, no web.

which leaves the "ml" in "html". it too has important meaning...here's a short history of why: for berners-lee, conceiving of a world of linked electronic information was one thing; building the actual linking mechanism was quite another. before the web could exist and thrive, it needed a format for the information it contained. simply put, how exactly would a web-document author go about specifying that a word or phrase in their document should connect to another document? web authors needed a labelling system that allowed them to mark the start and end of linked sections in their documents while also indicating the destination of the marked links.

but there's more to web pages than links: even in the beginning, authors also needed a document format that could accommodate some text display features like bold, italics, lists, and paragraphs. so there were really two aspects to the proposed web documents: 1) the content itself, and 2) the meta-information that described the features of the content--things like links, display, and information types.

on top of linking and display features, the yet-to-be-born web format had other logistical requirements. it had to be platform independent: people on any kind of computer should be able to access the information, no matter what kind of system it was created on. ideally the document format would also be text-based and not owned by a single corporation, so documents would be easy to create, efficient to transfer (remember that in 1989 a 2400 baud modem was pretty speedy), and not tied to a single piece of costly software whose fate was determined by only one company. given these requirements, the most appropriate choice of format for the web in 1989 was, *drumroll please*, a markup language (the "ml" of "html").

what's a markup language?
so what was (and is) this illustrious thing called a "markup language"? let's start with the notion of markup itself: markup is traditionally defined as annotation. when an editor or a teacher writes comments on a printed document, they are "marking it up", providing supplemental information about the content of the document. in the electronic world, "markup" is, similarly, supplemental information embedded in a document that describes aspects of the document's content. usually, electronic markup is used by computers to either manipulate or display data. in its simplest form then, a markup language is a single, limited set of labels that are used to describe a particular kind of information.

let's make that definition a little more tangible with an example. here's what an individual label (piece of markup) might look like inside a document:

example.1


hi there, my name is <NAME>colin</NAME>.

the label (or "tag," as labels are called in the markup world) shown above is used to identify the word "colin" as a name. it indicates the beginning of the labelled data with a "start tag": <NAME>, and the conclusion of it with an "end tag": </NAME> (note the forward slash in the end tag). most markup surrounds, or "contains" the real content in this way.

a tag like "NAME" is used to give semantic meaning to data that would otherwise have none. this kind of descriptive tag allows for sophisticated machine-based data manipulation--things like better searching. imagine how much more accurately you could find references to a person named "will" if you tell a computer to ignore all uses of the word not contained by a "NAME" tag. pretty useful in an international hypertext web of linked documents (even though semantic tags are currently under-used on the web).

other kinds of tags describe the formatting and structure of a document. for instance, the tag "<I>" could be used to mark a piece of text as being italicized. but formatting tags don't have to be so specific. remember that one of the requirements of html was that it be "platform independent", because web documents would be viewed by potentially any electronic display system. well, ideally html should also be "device" independent, meaning that you should be able to view it with any hardware or software. a markup language can offer that flexibility: in the case of the <I> tag, we could have used something like "<EMPHASIS>" instead of <I> for italics, leaving the decision of how to render the importance of the content up to the device displaying it. then if you were accessing the document via a voice application instead of a web browser, the emphasized words could be spoken more loudly and the extra meaning in the markup could still be conveyed.
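
to make the difference concrete, here's a minimal sketch of the same sentence marked up both ways. html does in fact include an emphasis tag, <EM>, alongside <I>; the sentence itself is just made-up sample text.

```html
<!-- presentational: says exactly how the text should look -->
this part is <I>really</I> important.

<!-- descriptive: says the text is emphasized, and leaves the rendering to the device -->
this part is <EM>really</EM> important.
```

a graphical browser will typically show both lines in italics, but a voice application is free to render the second one with a change in tone instead.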

let's look at an example of a slightly longer document that shows these various types of tags:

example.2
<LETTER>
    <SALUTATION>hi there,</SALUTATION>
    <BODY>
    my name is <NAME>colin</NAME>. how are you?
    </BODY>
</LETTER>


in example 2 we see something much closer to a complete marked-up document. some additional features of markup languages are shown: some tags can contain other tags (this is known as "nesting"); some tags are purely structural ("BODY"); and the first tag of the document indicates the type of document we are dealing with ("LETTER"). but the key concept that example 2 demonstrates is the notion that markup languages are not all about hypertext. there are more "ml"s than "html". for instance, based on example 2, we could theoretically create "lml", a markup language for writing letters. typically, separate markup languages are created to describe different types of documents. airplane parts manuals, mathematical notation, pharmaceutical documents, and of course, web content--each of these has its own markup language. and though their tags differ, the syntax of the languages (e.g. the angle brackets used to define the start/end tags) is identical. which leads us to the ultimate question...what governs the syntactical composition of markup languages? the answer (we might need another drumroll here) is "sgml".

sgml is the "standard generalized markup language" which sets the rules for how to make a markup language. when tim berners-lee and his colleagues decided that a new markup language (html) was needed for web documents, they had to follow the guidelines laid out by sgml in order to create it. the connection between html and sgml is an extremely useful thing for a serious web designer to know. here are a few reasons why:
  1. html (like other markup languages also based on sgml) was at first primarily an information handling format, not a multimedia format. as a designer, it's useful to realize that philosophically and technically, anything html is forced to do in the realm of advanced multimedia is in some senses quite opposed to the original vision for the language. there is ongoing tension on the web between data manipulation and design or multimedia. currently, one very often suffers when the other is favoured (e.g. the text on a page created in flash is not searchable, and flash sites are not accessible outside the graphical world of applications that can display them). the web designer's constant goal is to find the balance between data and design.
  2. you now know that html is not an island. it belongs to a much larger family of data formats, and as such, html may one day connect much more extensively to the wealth of information buried in computers around the world. "xml" is the w3c's current attempt to make that happen.
  3. you'll be able to understand the history behind the limitations of html, and the cause of its structural rules. as you learn more html, you'll start to see that it's not a collection of obscure jargon you need to memorize, but a structured system for composing documents. once you've learned the principles of that system, more advanced html usage is often simply a matter of looking up tag names.

your first web page
the simplest kind of web page is a lone html document. to create an html document, all you need is a text editor (notepad, simpletext, vi, etc). go ahead and start one up now, and create a new file. every html document has a required skeleton that you will always start with. here's a sample skeleton that you can cut'n'paste into your text editor (you'll have to remove the line numbers):

1    <!DOCTYPE HTML PUBLIC 
        "-//W3C//DTD HTML 4.0 Transitional//EN" 
        "http://www.w3.org/TR/REC-html40/loose.dtd">

2    <HTML>

3    <HEAD>
4    <TITLE></TITLE>
5    </HEAD>

6    <BODY>
7    </BODY>

8    </HTML>

some explanation: line 1 ("!DOCTYPE...") states the version of html you are using. note that a tag can span many lines. line 2 ("HTML") declares that the document will be written in html. line 3 ("HEAD") starts the "header" of the document. the header contains the document's title (line 4), and can also store general information about the document (such as the author and summary), and any programming (scripts). line 5 ("/HEAD") signals the end of the head, while line 6 ("BODY") starts the document proper. all of the normal content for the web page will go between the body start tag and the body end tag (line 7). the last line formally identifies the end of the document.

once you've got your skeleton in your text editor, you can add some content to your page. you're only 4 steps away from seeing your page:
  1. between the title tags, type "my first web page".
  2. between the body tags, type "hello world! this is my first web page!".
  3. save your file somewhere on your computer. name it "mypage.html".
  4. open up a web browser. from the browser's "file" menu, choose "open", then use "browse" (ie) or "choose file" (netscape) to locate the file you just saved, and click "ok" (ie) or "open" (netscape).
congratulations! you haven't even coded a single tag of html yet, and you've already got a page. here's what it should look like. you should see the title you gave your page at the very top of the browser, and the "hello world!" sentence displayed in the browser. notice that the <HEAD> tag isn't displayed--it's a structural tag that contains the title. the placement of the title is up to the display device.
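
for reference, here's roughly what mypage.html should contain after the four steps above. this is a sketch that assumes you started from the skeleton exactly as given, with the line numbers removed:

```html
<!DOCTYPE HTML PUBLIC 
    "-//W3C//DTD HTML 4.0 Transitional//EN" 
    "http://www.w3.org/TR/REC-html40/loose.dtd">

<HTML>

<HEAD>
<TITLE>my first web page</TITLE>
</HEAD>

<BODY>
hello world! this is my first web page!
</BODY>

</HTML>
```

if your page doesn't display as described, comparing your file against this one is a good first check.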

while you're looking at your page, select the "view" menu and choose "source" (ie) or "page source" (netscape). the browser will display the html code for your page. this can be a useful problem solving tool as well as a means of examining code used on other sites.

now then, this is a web *design* course, so you're probably dying to make your first page look a little more interesting. to do that, you'll have to learn a few tags.

how a tag is structured
first let's look at the general structure of a tag. here's a paragraph tag:
<P ALIGN="CENTER">your content goes here</P>
once again, some explanation: the first angle bracket "<" identifies the beginning of the tag. the letter "P" is the name of the tag. tag names may not have spaces, and can be upper or lower case. the first space after the tag name marks the end of the tag name. the next word, "ALIGN", is an attribute of the tag. attributes are used to state characteristics of the tag in question--for a paragraph you might want to set the alignment, for an image you might want to set the height or width. these things are considered attributes of the parent tags. most attributes assign a value with an equals sign and quotation marks, as in ALIGN="CENTER". after the ALIGN attribute, our paragraph start tag ends with another angle bracket. nearly all start tags have a corresponding closing tag. though it is not strictly mandatory, we end our paragraph with a close tag which is simply the name of the tag surrounded by angle brackets and preceded by a forward slash.
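
here's a small sketch of those rules in action. the text content is made up; ALIGN, with values such as LEFT, CENTER, and RIGHT, is a real attribute of the paragraph tag in html 4.0 transitional.

```html
<!-- tag name P, one attribute (ALIGN) with the value RIGHT -->
<P ALIGN="RIGHT">this paragraph hugs the right margin.</P>

<!-- tag and attribute names can be lower case too -->
<p align="center">this paragraph is centered.</p>

<!-- a tag with no attributes: just the name between the angle brackets -->
<P>this paragraph uses the browser's default alignment.</P>
```

each paragraph follows the same pattern: a start tag, the content, then an end tag with a forward slash before the name.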

working with fonts and text layout
now let's add those tags i said would make the page more interesting. first you'll likely want to give a heading to your page. originally, the mechanism html provided for headings was the "H" tags: <H1>, <H2>, <H3>, <H4>, <H5>, and <H6>. these tags were very much in keeping with the data-oriented tradition of html. the "H" tags simply specified the structural headings in a document. the title of a chapter might be an <H1>, while a subtitle would be an <H2>, and so on. the rendering style of the text contained in each "H" tag was completely up to the browser. usually, the visual representation of the "H" family was simply based on font size, with <H1> being the biggest, and <H6> being the smallest. try adding the following to the top of your first web page, right after the body start tag:
<H1>my first web page</H1>
with the <H1> tag added, your page should now look like this. functionally, the <H1> tag does the trick, but visually it leaves a little to be desired. not surprisingly, as soon as the web started to become a mass medium, web page creators wanted more control over the look of their documents. the interim solution that was introduced by the browser makers (netscape & microsoft), and only ever partially recognized by the w3c (in html 3.2), was the <FONT> tag.
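
before we get to <FONT>, here's a quick sketch of how the "H" tags might describe the structure of a longer document. the headings and paragraph text are invented for illustration; only the tags themselves matter.

```html
<!-- the top-level heading: one chapter title -->
<H1>chapter one: markup languages</H1>
<P>some introductory text for the chapter.</P>

<!-- second-level headings: sections within the chapter -->
<H2>what's a markup language?</H2>
<P>text for the first section.</P>

<H2>sgml</H2>
<P>text for the second section.</P>
```

notice that nothing here says how big or bold the headings should be. the tags only record which text is a heading and at what level, and the browser decides the rest.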

using the font tag, you can set the size, face, and colour of fonts. around the words "hello world!" in your mypage.html file, add the following start and end tags, just like you see them below:
<FONT FACE="helvetica, arial" SIZE="+1" COLOR="#0000FF">hello world!</FONT>
the font tag has attributes that describe the treatment of the font on the tagged text. the "FACE" attribute gives the name of the font family to be used. in order for the font to appear on the user's screen, the user must have the specified font installed on their system. to accommodate this limitation, the "FACE" attribute allows multiple fonts to be listed ("helvetica, arial"). the browser will try to display the text using the first font in the list, but if it can't find the font, it will try the next, until it comes to the end of the list. you can count on the fonts below being present on a user's system (note that there aren't any overlaps so you'll have to list windows and mac equivalents). more info on default system fonts can be had from microsoft.
windows98 default fonts
system, arial, book antiqua, calisto mt, copperplate gothic bold, copperplate gothic light, courier new, century gothic, impact, lucida handwriting italic, lucida sans italic, lucida sans unicode, lucida console, marlett, matisse itc, modern, news gothic mt, ocr a extended, ms serif, small fonts, ms sans serif, symbol, tempus sans itc, times new roman, verdana, fixedsys, terminal, webdings, westminster, wingdings

mac os fonts
times, courier regular, helvetica, symbol, chicago, new york, geneva, monaco, palatino, charcoal (added to mac os 8)
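the "SIZE" attribute of the font tag increases or decreases the size of the font relative to the default, in steps of an arbitrary amount (about 2 points). html font sizes run on a scale of 1 to 7 with a default of 3, so the useful relative values are -2, -1, +1, +2, +3, and +4 (you can also give an absolute size from 1 to 7). the smallest size is roughly 7 or 8pt, which is pretty much unreadable on most systems.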

the "COLOR" attribute changes the colour of the text. the colour itself is entered as a pound sign (#) followed by the hexadecimal equivalent of the decimal rgb value. in hex, the decimal value 255 is written as "FF", and the decimal value 0 is written as "00". "0000FF" then is r: 0, g: 0, b: 255. if you want to choose colours without doing the conversions, you can use my colour scheme machine.
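
here are a few more colour values worked out the same way, as a sketch you could paste between your body tags. each pair of hex digits is one channel, in the order red, green, blue; the sample text is made up.

```html
<!-- r: FF (255), g: 00 (0), b: 00 (0) = pure red -->
<FONT COLOR="#FF0000">red text</FONT><BR>

<!-- r: 00 (0), g: 80 (128), b: 00 (0) = medium green -->
<FONT COLOR="#008000">green text</FONT><BR>

<!-- r: 80 (128), g: 80 (128), b: 80 (128) = medium grey -->
<FONT COLOR="#808080">grey text</FONT><BR>

<!-- r: FF (255), g: FF (255), b: 00 (0) = yellow (red plus green light) -->
<FONT COLOR="#FFFF00">yellow text</FONT>
```

to convert a channel by hand, divide the decimal value by 16: the quotient is the first hex digit and the remainder is the second (with 10 through 15 written as A through F). for example, 128 / 16 = 8 remainder 0, which gives "80".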

with the addition of the font tag, your page should now look like this. but as i mentioned earlier, the font tag isn't recommended officially by the w3c. instead, they advocate the use of stylesheets, which are separate general descriptions of the visual formatting in your documents. the stylesheet format for the web is known as "css", and support for it was introduced officially in html 4.0, 18 December 1997. css is an extremely powerful component of current web design, and is much more in keeping with the original philosophy of the web: keep the data and the formatting separate so the display of the information doesn't jeopardize its manipulation and re-use. css has taken a while to catch on because it was not supported by netscape 3. fortunately, though, enough of the web's population has updated to version 4 browsers that css is now a very appropriate means of formatting web content.

add the following css code after the "TITLE" tags and before the closing "HEAD" tag in your page:
<style type="text/css">
     body { color: #CCCCCC; 
            background: #006600; 
            font-family: arial, helvetica;
          }
</style>
your page should now look like this. in the above code, the stylesheet is contained by a pair of style tags. if the browser doesn't recognize the tags, it ignores them (some older browsers may then display the stylesheet's content as plain text, which is why you'll often see stylesheets wrapped in html comments). as you can see from the example, a stylesheet simply lists tag names, and then gives them display or formatting characteristics. the characteristics of each tag's style are enclosed in curly brackets "{", "}". you can add style to any html element. for more on css, try my css primer, or the intro to using css by the w3c's dave raggett, or the wdg's reference centre for css. many authoring applications also generate stylesheets automatically.
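
as a rough sketch of styling more than one element, the stylesheet below adds rules for the <H1> and <P> tags alongside the body rule from above. the particular colours and sizes are arbitrary choices for illustration, not recommendations.

```html
<style type="text/css">
     body { color: #CCCCCC; 
            background: #006600; 
            font-family: arial, helvetica;
          }
     /* every H1 on the page: white and centered */
     h1   { color: #FFFFFF; 
            text-align: center;
          }
     /* every paragraph: slightly larger, italic text */
     p    { font-size: 14pt; 
            font-style: italic;
          }
</style>
```

because the rules live in one place, changing the look of every heading on the page means editing one line of the stylesheet instead of every heading tag.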

that should give you a good start on html's text-display capabilities. html also allows for font-embedding, but the technology is currently pretty young, and needs to be developed before it becomes practical for the mainstream web (as usual, microsoft and netscape each have their own implementations). even after font embedding becomes common, there will probably still be times when the only way to get exactly the look you want with your text is to create an image of it in a graphics app like photoshop, and embed the image on your page. this is a sadly necessary evil in the current consumer-driven web that yearns for absolute control over corporate look and messaging. there are lots of trade-offs in web design, and the need for visual control is a powerful force in many designers' approach to making websites. when you're wondering whether or not to use images for text-based design on the web, always remember that when you convert text to pixels, you are losing the value of the information contained in the image you create. no search engine can find it, no screen-to-voice reader can speak it, only a few applications can display it, and nearly none can manipulate it.

ok, i don't want to scare you off using images on your web sites...you just need to be aware that there are some quite serious implications to the use of images on the web, and often these implications go overlooked, particularly in the area of image-based typography. that said, let's see how to put an image on your page.

putting an image on your page
netscape and internet explorer currently support the inline display of two kinds of images: gif and jpeg. gifs are usually used for images with flat areas of colour while jpegs are used for photographic images or images with subtle gradients. this convention is based mostly on the fact that each format compresses its respective kind of data better, but also on the additional colours offered by jpegs. to put an image on your page, you use the "IMG" tag. to try it out, first save the test image below to the same location as your "mypage.html" file. to save the image, right click on it (windows), or click-and-hold (mac) and choose "save picture as" (ie) or "save image as" (netscape).

[test image: "my first image"]

now that you've saved the image, you have to place a reference to it in your html. add the following code after the sentence that ends "this is my first web page!":
<BR>
<IMG SRC="myimage.gif" ALT="my first image" HEIGHT="60" WIDTH="210">
just like the <FONT> tag, the <IMG> tag has a name ("IMG") and a list of attributes. the first attribute, "SRC" specifies the location of the image you want placed on your page. the second attribute "ALT" gives a text description of the image that is displayed when a non-graphical device is being used to render the page. you also get a free mouse-over effect in the 4.0+ browsers (hold your mouse still over the image to see what i mean). the "HEIGHT" and "WIDTH" attributes let the browser know in advance how much space to allocate for the image, causing the page to start displaying faster.
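
as a sketch of the same tag used for a photographic image, here's how a jpeg might be placed. the file name "myphoto.jpg", its description, and its dimensions are all hypothetical; use the real file name and pixel dimensions of your own image.

```html
<!-- SRC: the image file, saved in the same folder as the page -->
<!-- ALT: text shown (or spoken) when the image itself can't be displayed -->
<!-- HEIGHT and WIDTH: the image's size in pixels -->
<IMG SRC="myphoto.jpg" ALT="a photo of my cat" HEIGHT="120" WIDTH="160">
```

note that <IMG> has no end tag: it's one of the few html tags that doesn't contain any content.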

once you've added the image tag, your page should look like this.

finally...your first hypertext link
so you've successfully made most of your first page. once you make others, you're going to want to link to them. you do that with the "A" tag ("A" stands for anchor). under your image, add the following code:
<BR>
i learned how to make this page from
<A HREF="http://www.moock.org/webdesign/lectures/">moock.org</A>
whatever text you surround with the "A" start and end tags becomes the hypertext link. the only attribute of "A" you'll need to worry about for now is the "HREF" attribute, which is where you identify the page you want to link to. to link to a page, enter its url as the value of your "HREF" attribute. to link to a page on the web (as shown in the example), you use exactly what you might type into a browser as the value of your "HREF" attribute. you can also link to pages on your own system. if you had made a different page called "myotherpage.html" and saved it in the same folder as your first page, you could link to it by using simply "myotherpage.html" as the value of the "HREF". but your other pages might not always be in the same folder as the page you are linking from. you can link to pages in other folders using what's called a "relative url". here are a few examples that show how relative urls work:

../sample.html
	links to a page called "sample.html"
	in the folder above the current page
	
../../sample.html
	links to a page called "sample.html"
	two folders above the current page

products/widget.html
	links to a page called "widget.html"
	in a folder called "products" that
	resides in the same folder as the
	current page
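
put into actual links, those relative urls might look like the sketch below. the file and folder names are the hypothetical ones from the examples above, so the links won't lead anywhere unless you create matching pages.

```html
<!-- a page in the same folder as the current page -->
<A HREF="myotherpage.html">my other page</A><BR>

<!-- a page in the folder above the current page -->
<A HREF="../sample.html">a sample page, one folder up</A><BR>

<!-- a page two folders above the current page -->
<A HREF="../../sample.html">a sample page, two folders up</A><BR>

<!-- a page inside the "products" folder, which sits beside the current page -->
<A HREF="products/widget.html">the widget page</A>
```

the advantage of relative urls is that the links keep working if you move the whole set of folders to another location, as long as the pages stay in the same positions relative to each other.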
your page should now look like this. and that's it. you've finished your first web page. there's only really one question left to answer: where do you go from here? well, you've only just scratched the surface of making web content. there's lots more fun stuff to learn. luckily, the web is its own best resource. to learn more about creating web pages, all you have to do is go surfing...

getting help online
with the multitude of websites that offer information on web building, there's nearly no need to buy an $80 book about web design that is likely a year out of date the minute it hits bookstores. part of learning to build websites is learning to research problems effectively online. you can't trust everything you read on the web, but in combination with thorough testing, you should be able to find answers to your problems.

below i've listed some of the best resources i've found in my years of building websites. if you can't find what you're looking for on one of those sites, try typing your question into a search engine like altavista or google and see what you get. lots of developers have personal sites with help for the web design community, and often a keyword on a specific issue will find someone else's notes on how they dealt with it. speaking of community, newsgroups and mailing lists are another super way to get answers to questions quickly. many of the html and javascript newsgroups are just so crowded though, that it may be more efficient to use dejanews (a web-based archive of most usenet newsgroup activity) to see if the information you're looking for has been discussed in the past. the flash dev. community is a little more manageable. you can get in on the lively discussion of flash design by subscribing to the most popular current mailing list for flash developers: flasher-l.

if you just love the feel of paper in your hands, i recommend spending your money on designer books more than technical ones (unless they're official programmer's references like the kind o'reilly publishes). try lynda weinman and john warren lentz's deconstructing web graphics, or lynda's own designing web graphics, or david siegel's creating killer websites. for flash, pick up my book co-authored with rob reinhardt and john warren lentz: the flash 4 bible (available december 99).
(copyright colin moock, fall 1999)