A gentle, rambling HTML tutorial

Introduction

It's not always appreciated that when you use a Browser, such as Internet Explorer, what you are actually doing is running a computer program - called, not surprisingly in that case, Internet Explorer! (There are many other Browsers; some better, some better still.)

As everybody knows, most computer programs require data in order to work.

The basic set of data a Browser needs is a Web Page. (Key concepts, such as Browser and Web Page, will, for emphasis, receive capital letters.)

Computer programs are usually pretty finicky about their data: they like the data to be just so or, in the jargon, in the correct format.

A Browser is no exception: if you present it with garbage for data, the results will be unpredictable - or, actually, only too predictable! Remember GIGO: Garbage In Garbage Out.

So, if for some crazy reason you want to produce Web Pages, you have to try to humour the Browser by getting the format right.

That's where HTML comes in.

Some preliminaries about HTML

The idea of HTML is to provide you with a set of rules for writing a Web Page so that a Browser will know what to do in order to produce the display that you intend.

Why is it called HTML? The letters stand for HyperText Markup Language. Three little words, so let's take them one at a time - in reverse order, if you don't mind.

My dictionary says that Language is 'the method of human communication, either spoken or written, consisting of the use of words in an agreed way'. So that's alright, then! HTML is the method of communicating with a Browser in an agreed way, so that the poor Browser can try to do what you intend. The good news is that, as with many languages, it's not difficult to get started with HTML - trust me!

Inexplicably my dictionary has never heard of Markup. In prehistoric times - that is, before the Internet - Markup was the standard way of communicating Text instructions to typists or printers. (Nothing to do with increasing prices, by the way!) Markup consisted of an agreed set of symbols - it was hardly a Language, really - and there were rules for using the symbols. Similarly HTML has an agreed set of symbols, called Tags, and a set of rules for assembling these Tags into a Web Page. A Tag consists of one or more symbols - typically a word, but not always - enclosed in a pair of angle brackets, such as <a>, <br />, <table> and so on.

Finally HyperText: fancy word; simple idea. HyperText allows you to tell the Browser to jump about, either within a Web Page or between one Web Page and another. It is no exaggeration to claim that HyperText is by far the most important concept behind the Web; there would be no Web to speak of but for HyperText. Imagine only being able to look at a single Page of Text, without the ability to Click and thus jump to somewhere else. For some reason, the Tag in HTML which effects the jump is called an Anchor and that's what the aforementioned <a> Tag is about. All will be made clear!

The ideas encapsulated in HTML have been around ever since the Web first began. We needn't go into too much history but, as with any Language, HTML has developed over the years. The dialect many people use is HTML4.01, which is certainly fine for producing perfectly adequate results but, from the point of view of software purists (such as your humble servant), it is a shade unhygienic - and where would we be without hygiene? The version I normally use these days is called XHTML, standing for eXtensible HTML; specifically, XHTML1.1.

As with any other speciality, life in computerland is replete with grotesque acronyms.

It's worth being aware that there is a sort of governing body of recognised standards on the Internet - nothing trivial such as moral standards, but relating, for example, to what should or should not be recommended components of HTML in its various incarnations. This body is the W3C, which stands for the World Wide Web Consortium. They try to persuade the writers of Web Browsers and Web Pages to adhere to a common set of standards. Needless to say, they have a Web site which, among other things, offers an extremely useful free on-line HTML validation service, enabling HTML authors to quickly check for any error or lack of conformity with whatever standard is being claimed.

Incidentally, the W3C tries to set and maintain software standards in relation to a variety of trinkets, such as mobile 'phones and so on.

The great pioneer of the Web is the British scientist Tim Berners-Lee, who came up with, amongst other things, the HyperText idea while at the Particle Physics Laboratory CERN, Geneva. He is a W3C director.

OK, enough waffle. Let's see some HTML.

A skeletal Web Page

The following is the general shape of a properly written Web Page document:

[One or two lines telling the Browser what HTML or XHTML standard we are 
adopting so that it knows how to interpret our instructions in our document.
We'll come to these lines shortly.]

<html>

 <head>
  <title>A skeletal Web Page</title>
 </head>

 <body>
  [A sequence of lines of HTML Tags telling the Browser what to display.]
 </body>

</html>

and that's all there is to it!

Well, OK, maybe a few bits are missing - such as any content within the <body> whatsoever - but you get the overall structure. One or two comments are in order.

First, notice that the encompassing <html> Tag is opened at the top and closed with </html> at the bottom. The <html> and </html> obviously form a kind of outer pair, enclosing everything else and identifying the part of the document that contains the HTML Tags.

Next, observe that, within the <html>...</html> pair there are two other main Tag pairs: the <head>...</head> pair and the <body>...</body> pair.

The <head>...</head> pair contains descriptive information about the document, such as its <title>...</title>, which will appear in the title bar at the top of the Browser window when the Page is displayed.

The <body>...</body> pair obviously contains the real meat, the document's displayable content.

Although there are about a hundred different Tags (in XHTML1.1), there's a core of about twenty of them that are worth being familiar with. That's not too many, is it? You have already met about half a dozen of them.

Provided you obey the appropriate set of HTML rules, most Browsers are not concerned with the visual appearance of your document: a rat's nest that toes the HTML line is perfectly acceptable to a Browser - but from the point of view of the human reader you would do well to write prettily, in a clearly structured manner, for example using suitable indentation where one Tag occurs within another - rather like the layers of an onion. Programming - for that is what we are doing - is best regarded as an artistic endeavour.
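
As a small illustration of the point about layout (a sketch only, reusing the skeletal Page from above), a Browser should treat the following rat's nest

<html><head><title>A skeletal Web Page</title></head><body>[Content goes here.]</body></html>

in just the same way as the indented version

<html>
 <head>
  <title>A skeletal Web Page</title>
 </head>
 <body>
  [Content goes here.]
 </body>
</html>

but only the second lets a human reader see at a glance which Tag sits inside which.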

You can always see how the author of any Web Page on the Internet has done it: just click on the word Source in the View menu heading at the top of the Page. It's a wonderfully instructive revelation, sometimes - and, sometimes, a huge disappointment! It's also a great way of pinching (sorry, sharing) other people's ideas; don't worry, everybody does it on the Internet!

Finally, note that it is now standard to write Tag names in lowercase - not in UPPERCASE, although many Browsers still allow this. (By the way, people who habitually communicate over the Internet in UPPERCASE - the equivalent of SHOUTING - are regarded within the Internet community as somewhat lacking in the social graces!)

A few practicalities

Lest we forget: we've seen a bit of what you do, but how do you do it, using what, and with what result?

Perhaps the simplest tool for producing a Web Page is a Text editor. On a PC there are two of these supplied free: WordPad, which is more or less OK for Web Page authoring (though pretty grim for word processing), and NotePad, which is pretty basic. I recommend the former, at least to start with. Both can be found in the Accessories folder in the Programs folder, which can be got at under the Start button.

I suppose I ought to mention that there are proprietary tools for designing and writing Web Pages. They have two big disadvantages: you have to pay for them; and they usually produce execrable HTML. (There's no law which states that someone writing a tutorial may not have a dyspeptic opinion of something, is there?)

Having produced your wondrous work, say using WordPad, you need to save it in an appropriately named file. Again, for PC users, this means giving it the correct Extension, namely .htm or .html. (These extensions are a reflection of the fact that Windows sits on top of DOS - which requires extensions - which sits on top of ... and so on.) So, you might end up with a file called Masterpiece.htm or Masterpiece.html.

Finally - and very importantly - one Browser does not behave like another, believe it or not. You would be amazed how differently they display a given Web Page. It is therefore necessary to inflict your masterpiece on as many Browsers as you can lay your hands on.

Document Types and other highly important rubbish

Let's quickly dispose of the one or two strange lines that come before the <html> Tag.

We are talking about letting the Browser know what HTML dialect we are supposed to be using: for example, if we are using strict HTML4.01, the first line of our document should be

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Don't ask what it all means, just do it!

If we are using the strict XHTML1.1 dialect, then start the document with the two lines

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

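The first of these two declares that the document is written in XML and tells the Browser which character encoding to expect: here ISO-8859-1, which covers the characters needed for English and other Western European languages, but not those of Japanese or Russian, for example.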

Unfortunately, if you use XHTML1.1, the <html> Tag should be a bit more complicated

<html xmlns="http://www.w3.org/1999/xhtml">

Again don't ask; just copy.

If you really must know more, see the subject of Document Type Definitions, DTDs, on the W3C Web site. There you will also learn that an even bigger and better HTML dialect, called XHTML2.0, is on the stocks, together with its attendant DTD paraphernalia.

Another Page turned in life

So a Web Page in strict HTML4.01 looks a bit like

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>

 <head>
  <title>A skeletal Web Page</title>
 </head>

 <body>
  [A sequence of lines of HTML Tags telling the Browser what to display.]
 </body>

</html>

and a Web Page in XHTML1.1 looks a bit like

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

 <head>
  <title>A skeletal Web Page</title>
 </head>

 <body>
  [A sequence of lines of HTML Tags telling the Browser what to display.]
 </body>

</html>

So: already we have a dialectical dichotomy and, if you're confused by that, so you should be and, what is more, prepare to be confounded; there are several more dialects, none of which you need take any notice of whatsoever.

Notice, by the way, that we have sneaked in here our first example of a Tag which includes an Attribute, so that, instead of a mere

<html>

we have the far more impressive

<html xmlns="http://www.w3.org/1999/xhtml">

where the xmlns is an Attribute and the equality sign announces the Attribute Value. An Attribute modifies the meaning of a Tag. Attributes require Attribute Values that are set with an equals sign and are enclosed within single or double quotes.
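
To make the quoting rule concrete, here is a brief sketch: the two lines below are intended to mean exactly the same thing to a Browser, the first using double quotes and the second single quotes.

<html xmlns="http://www.w3.org/1999/xhtml">
<html xmlns='http://www.w3.org/1999/xhtml'>

Whichever you choose, the closing quote must match the opening one; double quotes are the more usual choice.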

Gosh we are making progress!

Get a head

Whilst the <head>...</head> pair must contain a <title>...</title> pair, several other Tags might also appear, but mostly don't.

While you are not looking, maybe this is a good moment to sneak in another bit of confusing terminology: Element. Up to this point I have talked about pairs of Tags, such as the <head>...</head> pair. The strictly correct term for a pair of Tags, together with whatever they enclose, is an Element. So we should really talk about the <head>...</head> Element.

A fairly simple <head>...</head> Element might look like

<head>
 <title>A fairly simple head</title>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
 <link rel="stylesheet" href="stylesheet.css" type="text/css" />
</head>

which gives us the excuse to talk, briefly, about the two new Elements: <meta /> and <link />

Whatever they mean, the first thing to notice is that neither of them comes with a closing friend - there is no </meta> or </link> to be seen. This is because both these Tags are so-called Empty Elements. Empty Elements do not enclose content and thus need no close Tags. In fact, as you can see, they contain their own closure: the / character before the closing angle bracket is required by the rules of XHTML, and the space before it is not a misprint either; it is there to keep older Browsers happy and is worth getting used to.

Perhaps the simplest example of an Empty Element is <br /> which, as you might guess, is used to insert a line Break on the Web Page.

Before we sketch their meanings, note that both <meta> and <link> have Attributes, just as we saw with the XHTML version of <html>. Note also that, whereas the title Element caused something visible to happen - namely, a title to appear at the top of the Browser window - the effects of these two Elements are not directly visible: they work behind the scenery, so to speak.

The <meta> (short for meta-information) Element has a number of functions. A common use is to enable so-called indexing tools, such as the Google search engine, to quickly identify information about the Web Page and, indeed, about the Web site containing that Page - assuming that you want such publicity. Another common use is for so-called Client-Pull Page loading, enabling a document to automatically load another document after a specified delay (say 10 seconds). The above example illustrates its use in telling your Browser (at the bits and bytes level) that you are using a particular set of characters, namely those set out in the ISO-8859-1 definition (basically, the usual set for writing in English).
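
The other two uses mentioned above might be sketched as follows. The description, the keywords and the address http://www.example.com/next.html are made up purely for illustration; only the Attribute names are the conventional ones.

<meta name="description" content="A gentle, rambling HTML tutorial" />
<meta name="keywords" content="HTML, XHTML, tutorial, Web Page" />
<meta http-equiv="refresh" content="10; url=http://www.example.com/next.html" />

The first two lines offer information to indexing tools; the third asks the Browser to load the named document after a delay of 10 seconds.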

The <link> Element specifies relationships between the current document and other documents. In modern HTML authoring this frequently means linking the document to a Style Sheet. This is a whole topic in itself, rather exotically called Cascading Style Sheets (CSS), which we can only touch upon in a skimpy account of the subject. The idea of CSS is to separate matters of style (such as, for example, which fonts to be used or whether some of the Text should be in red or blue) from those of document structure. A Style Sheet, linked to by means of the <link> Element, will contain the appropriate instructions to effect this.

That's enough on the head of the Web Page beast; now for its Body.

Vile body

Arguably the basic ingredient of a Web Page is Text; after all, the T of HTML does stand for Text and, not surprisingly, there are more HTML Tags dealing with the processing of Text than anything else. For example, Text can appear in a variety of sizes, of use in the creation of Headings; it can be laid out in Paragraphs; it can be subject to Breaks; it can be italicised or emboldened; it can even come in a variety of colours.

To illustrate, the Headings Tag comes in six levels <h1> to <h6> and has the following effects:

<h1>Heading level 1</h1>

displays as

Heading level 1

namely, quite large, whereas

<h6>Heading level 6</h6>

shows up as

Heading level 6

quite a bit smaller.

You might have thought the original designers of HTML would have chosen the smaller number for the smaller size - but no, they didn't think of that and so we are stuck with it the wrong way round.

The effect of a Break in a line is to cause the line to <br />
drop to the line below.

<pre>
It's  a  slight  oddity  that  Browsers  ignore     multiple     spaces,
	returns,
		tabs  and  other  formatting  characters.
So, if you want the Text to appear,  Preformatted,  more or less
			as    you    typed    it,
	then the <pre>...</pre> Element is what you need.
</pre>

<p>
It's sometimes nice to group one or more parts of your Web Page by use of the Paragraph Tag <p> and this has the effect of causing the Browser to insert a blank line before and after each group.
</p>
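
Putting a few of these Text Elements together, a fragment of a <body> might look like the following sketch. The wording is invented for illustration, and the <em> and <strong> Elements have not been introduced above: most Browsers display them as italic and bold Text respectively.

<body>
 <h1>Heading level 1</h1>
 <p>
  This Paragraph contains a line Break here <br />
  and some <em>emphasised</em> and <strong>strongly emphasised</strong> Text.
 </p>
 <p>
  This is a second Paragraph.
 </p>
</body>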

There are several more Text Markup Elements (or, in XHTML1.1-speak, the Text Module), but that's about enough for a lazy introduction, such as this, so we'll just mention one last fun Element: acronym. It's perhaps best illustrated by holding the cursor over the following set of letters for a few moments: XML.

How is that done? Like this:

<acronym title="eXtensible Markup Language">XML</acronym>

in which you will also notice another example of an Attribute (namely, title), together with its Attribute Value (in quotes).

Listmania

People love making lists; HTML is list heaven for such people because there are so many tricks one can play. Here we shall content ourselves with just two examples. HTML lists come in two basic forms: Unordered, using the <ul> Tag, and (you've guessed it) Ordered, using <ol>. Each comprises a set of List Items with Tag <li>.

An example of an Unordered List: the HTML sequence

<ul>
 <li>This Unordered List Item</li>
 <li>That Unordered List Item</li>
 <li>The other Unordered List Item</li>
</ul>

displays as

  * This Unordered List Item
  * That Unordered List Item
  * The other Unordered List Item

And an example of an Ordered List: the HTML sequence

<ol>
 <li>The first Ordered List Item</li>
 <li>The second Ordered List Item</li>
 <li>The third Ordered List Item</li>
</ol>

gives

  1. The first Ordered List Item
  2. The second Ordered List Item
  3. The third Ordered List Item

Turning the tables

It would be difficult to exaggerate the importance of the <table> Tag, together with its associated Tags, three of which we touch upon here: not only does it enable the obvious function of laying out a simple set of data; it actually provides a convenient structure within which to design entire Web Pages. In this sketch of the subject we can only illustrate the former and merely mention the latter.

A <table>...</table> Element contains a set of <tr> Rows.

Within a <tr>...</tr> Element there is a set of items of <td> Data.

An item of Data looks something like this <td>...</td> Element.

It's sometimes nice to grace each column with a table Heading Element: <th>...</th>

A simple table:

<table border="0">
 <tr>
  <th> &nbsp; </th>
  <th> Column1 </th>
  <th> Column2 </th>
 </tr>
 <tr>
  <td> Row1 </td>
  <td> Datum11 </td>
  <td> Datum12 </td>
 </tr>
 <tr>
  <td> Row2 </td>
  <td> Datum21 </td>
  <td> Datum22 </td>
 </tr>
</table>

displays as

       Column1  Column2
Row1   Datum11  Datum12
Row2   Datum21  Datum22

Only moderately exciting - but it's progress. It's probably best to leave the reader to figure out what is happening by comparing input with displayed output. The only non-obvious item is the occurrence of the rather strange &nbsp; in the third line, standing for Non-Breaking space, which is commonly used to fill in a 'vacant' cell in a table: here, the one in the top-left corner.

A slightly different simple table:

<table border="1">
 <tr>
  <td rowspan="3"> TallDatum </td>
  <td> Datum11 </td>
  <td> Datum12 </td>
  <td> Datum13 </td>
 </tr>
 <tr>
  <td> Datum21 </td>
  <td> Datum22 </td>
  <td> Datum23 </td>
 </tr>
 <tr>
  <td> Datum31 </td>
  <td> Datum32 </td>
  <td> Datum33 </td>
 </tr>
</table>

displays as

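+-----------+---------+---------+---------+
| TallDatum | Datum11 | Datum12 | Datum13 |
|           +---------+---------+---------+
|           | Datum21 | Datum22 | Datum23 |
|           +---------+---------+---------+
|           | Datum31 | Datum32 | Datum33 |
+-----------+---------+---------+---------+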


You probably get the general idea of the <table>...</table> Element. Try to imagine this greatly expanded to lay out a whole Web Page.
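
As a rough sketch of that idea, a whole Page might be laid out with a banner across the top, a narrow menu down the left and the main content on the right. The layout and wording here are invented for illustration, and the colspan Attribute has not been introduced above: it makes a cell span several columns, much as rowspan makes one span several rows.

<table border="0">
 <tr>
  <td colspan="2"> Banner across the top of the Page </td>
 </tr>
 <tr>
  <td> Menu </td>
  <td> Main content of the Page </td>
 </tr>
</table>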

There is another popular way of laying out Web Pages which you should, at least, have heard of: it's called a Frame. In this method, several separate partial Pages are combined to produce one complete Web Page.

Pulling up the Anchor and jumping ship

Now for the big one: this is what makes the Web the Web. No doubt much to the relief of the reader, it's the last bit of HTML we shall expand upon: the famous Anchor Tag <a>. This Tag, when used in the form of the <a>...</a> Element, indicates the portion of the Web Page that is a HyperLink and names the target destination for that HyperLink. This is what it looks like:

<a href="http://www.w3.org/TR/xhtml11/doctype.html">W3C XHTML1.1 Tags list</a>

and this is what you see on your Web Page:

W3C XHTML1.1 Tags list

and, assuming that the target destination actually exists, you will be transported there in a trice or two.

You can also use the Anchor Tag to leap about in the current Web Page:

Back to Top
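
A link of that kind might be written along the following lines. This is only a sketch: the name top is an assumption chosen for illustration, and the markup actually used on this Page may differ. First, the destination is labelled with an id Attribute:

<h1 id="top">A gentle, rambling HTML tutorial</h1>

and then an Anchor elsewhere on the Page refers to that label, prefixed by a # character:

<a href="#top">Back to Top</a>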

Java jive

As we have seen, HTML (or XHTML) is a fairly crude Language which ultimately grew out of the primitive Markup idea. But HTML is still fairly limited. It is possible to considerably beef it up by importing components from other computer Languages. It would not be appropriate to detail these here, but mention might be made of JavaScript, Java, Flash, Perl, VisualBasic, etc.

For example, it's fairly easy to implement a simple graph-drawing capability within a Web Page using JavaScript. This would be an example of so-called Client-Side computing: namely, an embedded program running on your own computer.

By contrast there is Server-Side computing: a program running on the remote computer on which your Web Site resides. For example, a Perl program, running on the remote computer, might count how many hits (visitors) each Page of your Web Site experiences and compile appropriate statistics.

Infamous last words

To describe - as we have done - the foregoing account as an HTML tutorial is a gross abuse of language. Basically, we have just walked by on the other side of the road. The aim has really just been to familiarise the reader with a few relevant words from the subject and to provide a bit of background to those words. The rest is up to the reader.

One suggestion: there are loads of free (proper) tutorials out there on the Internet; try Googling.

It's quite hard to quote any sort of appropriate reference: the books on the subject are either too fat and expensive or too thin and superficial. The fat, expensive one many professionals use is called:

HTML and XHTML The Complete Reference, by Thomas A Powell, published by Osborne/McGraw-Hill

I suggest you persuade your local library to buy the latest edition of it!

