Dark Moon Velvet

Semantics is everything

Posted on: April 15, 2009

What are the first words you hear when tags such as <b>, <i>, <small> and <big> come up in discussion? The most common line I keep hearing lately is: they are not semantic, use the <strong> and <em> tags instead. But are they really not semantic?

The purpose of HTML

As I see it, an HTML document is (or should be) two very simple things, in no particular order:

  • readable by machines, so that the distributed information can be aggregated and put to use. The most obvious example is the common web search engine, with Google as the main candidate even though there are older ones.
  • readable by human beings, since if all we do is turn thought to 1s and 0s, the sum of our efforts may have zero value.

So from this we can assert a document needs to have both human semantics and machine semantics.

What we have

Surprisingly, the original HTML is pretty well designed for this. Most tags accomplish both tasks by encapsulating purpose and meaning in the same envelope. Take the humble h1, h2, h3 tags. A machine will understand them as the start of a section and use them to determine the stacking (nesting) of sub-sections, while a human will perceive them as titles.
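To make the machine side of this concrete, here is a minimal sketch of how a program might read headings as an outline. It uses Python's standard html.parser module; the names OutlineParser and outline_of and the sample markup are my own inventions for illustration, not how any particular search engine works.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects (level, title) pairs for h1..h6 (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, title) in document order
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H1" arrives here as "h1".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline_of(html):
    """Return the heading outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Made-up sample document.
sample = (
    "<h1>Semantics</h1><p>intro</p>"
    "<h2>The purpose of HTML</h2>"
    "<h2>What we have</h2><h3>Example</h3>"
)

# The heading level becomes the nesting depth of the printed outline.
for level, title in outline_of(sample):
    print("  " * (level - 1) + title)
```

The same tags a human reads as titles give the program its nesting for free; nothing else in the markup had to be consulted.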

But not all tags bridge both kinds of understanding. Consider the anchor tag: it offers machine-only semantic sense, since as human beings we cannot magically parse text/url data; at best we might guess based on the words contained within. We also have tags which carry only human semantics: i, b, big and small, to mention just the prime candidates, are there so humans can understand HTML.

They are not just style! This so-called “style” existed and was understood long before the web even had the foundations to stand on its own two feet. Given the state of the web today and its common misuses, it is obvious that human expression cannot be captured with generic terminology and boxed up like plain text. Not in the near future at least. That is why it is often necessary to hint at the meaning behind the words rather than butcher it by slicing it up.


Let's take something which color coding is supposedly good at keeping us away from. It is not the best example, but let us say I have the following:

I had a car crash, the driver will pay for this.

What exactly am I saying here: did I have a car crash, or did one of my drivers crash my car? I could emphasize the text with either an <em> or an <i>, but here’s the catch: in the context of the rest of the content I supposedly have, it would make absolutely no sense to use an em. An em may carry both machine and human meaning, but in this document I just want to avoid the confusion, and the rest of the content has nothing to do with either of the two possible meanings of the incident, so adding machine semantics would be an error and would only skew the meaning. Just think of a document that has a lot of catchy phrases like that: if the topic is not the phrases themselves but how they are conceived, does it make sense to add an em or strong to every clarification of them? After all, the words they contain have no meaning in the given context.

The awful truth

Presented with the above, some may press the following question: can you not place it in a <span> and style it with CSS? And the answer is: how would that then be separation of semantics and presentation?

Presentation should be something that goes on top of semantics in a document; it should not be something that guarantees semantics in a document. These are not just empty words because someone wants it to be so. Let's think for a moment: what exactly guarantees that the screen of the client device will be at least 800 x 600, or that the client device supports even first-edition CSS, let alone the exotic properties you used? Nothing guarantees this.

With current technology even the boundary of what the interpreter can be is vague. Forget about exotic stuff such as a screen reader; it may be something as simple as an aggregation service such as an RSS reader, or one integrated into the network where you reside. Think of WordPress.com and the posts on this blog, for example: you can subscribe to them and WordPress will show them in a sort of blog-surfing listing. I don’t think anyone is under the misguided assumption that this listing has any of the originating blog’s CSS styling, so much for any span-styled semantics.
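A rough sketch of what such a listing does to markup may help. The code below imitates a hypothetical reader that unwraps span and div and drops the style and class attributes; the names FeedSanitizer and as_seen_in_reader, and the choice to emphasize the word "driver", are assumptions made only for illustration. Real aggregators have their own rules, and this is not how WordPress.com actually processes posts.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: the reader unwraps these tags and ignores these attributes.
PRESENTATIONAL_WRAPPERS = {"span", "div"}
STYLING_ATTRIBUTES = {"style", "class"}


class FeedSanitizer(HTMLParser):
    """Re-emits markup the way a style-less reader might (hypothetical)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL_WRAPPERS:
            return  # wrapper tag dropped, its text content is kept
        kept = ""
        for name, value in attrs:
            if name not in STYLING_ATTRIBUTES:
                kept += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.out.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag not in PRESENTATIONAL_WRAPPERS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def as_seen_in_reader(html):
    """Return the markup that survives in the style-less listing."""
    sanitizer = FeedSanitizer()
    sanitizer.feed(html)
    sanitizer.close()
    return "".join(sanitizer.out)


# The same sentence, once with CSS-only emphasis and once with <i>.
styled = ('I had a car crash, the <span style="font-style: italic">'
          'driver</span> will pay for this.')
marked = 'I had a car crash, the <i>driver</i> will pay for this.'

print(as_seen_in_reader(styled))
print(as_seen_in_reader(marked))
```

Under these assumptions the span version comes out as flat text with the hint gone, while the <i> version still carries it to whatever ends up displaying the post.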

So, even if you stick to <em> and <strong>, consider whether having your site look like a big blob of text once the divs and spans (the true presentation markup) are stripped out is desirable and worth it.

