The search engine benefit of valid HTML explained - clsc.net

I was probably one of the first guys to post the statement that “SE’s and bots like valid html”. That said, valid markup does not cut it by itself; it takes a wee bit more than that to get rankings. So, what’s the deal?

The SE benefit of valid HTML explained

From code to pages:

By validating your documents (pages, and you should validate your links too, btw) you will be forced to correct errors. Although validating is a #*$! when you’re not used to it, it gets easier quite quickly – you simply learn as you go, and make fewer mistakes along the way.
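To give a feel for the kind of error a validator makes you correct, here is a minimal sketch in Python. It is not a validator: it only checks that every closing tag matches the most recent open tag, and it is stricter than real HTML, which lets you omit some closing tags such as </p> and </li>. The names NestingChecker and check_nesting are made up for this illustration; for real work, use the W3C validator.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Reports closing tags that do not match the most recent open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []      # elements that are currently open
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            line, _ = self.getpos()
            self.problems.append(f"line {line}: unexpected </{tag}>")


def check_nesting(html):
    """Return a list of nesting problems found in `html` (hypothetical helper)."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack was opened but never closed.
    for tag in checker.stack:
        checker.problems.append(f"<{tag}> is never closed")
    return checker.problems


print(check_nesting("<p><b>bold</p></b>"))
```

A page that nests its tags properly comes back with an empty list; the overlapping <b> and <p> above produce two findings.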

So, validating is actually the easy part.

The really interesting benefit comes not so much from the validating itself, but from the mindset you will find yourself in after valid markup becomes routine. You’ll start by learning about the markup codes, and their specific use of course – quite probably you’ll be able to trim your pages in the process, giving you faster download times and eliminating unnecessary code.

Then you’ll find that “the codes” are specific elements of “a page” that all serve specific purposes, and, as you start working with these elements, and learn more about their intended use, you will start making changes to the way you create pages in the first place. This will lead to changes in the way you create web sites, and this will, again, lead to changes in the way your particular web site integrates with the rest of the web. At the end of this long road lies the real benefit.

The following points will illustrate this:

From Pages To Documents:

No, you won’t have a web page anymore. Instead you’ll have a structured document. A structured document has characteristics such as a title, headline, paragraphs, sections, and perhaps even sub-sections with sub-headlines.

Working with document structure is essential. Although it’s not as easy as validating (i.e. there’s no one-size-fits-all formula) you’ll be able to set up your own rules (templates, even) for what types of content go where, and when.

This might sound basic, but it’s not: A well-structured document contains all the necessary information to easily determine what the document is really about, put in the right places. Determining what web pages are really about is the most important task for a search engine – do I need to say more?
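As a rough illustration of what “the right places” means to a machine, the Python sketch below pulls only the title and the headings out of a page. It is a simplification under my own assumptions, not how any particular search engine works, and OutlineParser and extract_outline are names invented for the example. If the outline alone tells you what the page is about, the structure is doing its job.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collects the title and headings of a page in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []     # (tag, text) pairs
        self._current = None  # the title/heading tag we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in HEADINGS:
            self._current = tag
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so the outline reads cleanly.
            text = " ".join("".join(self._text).split())
            self.outline.append((tag, text))
            self._current = None


def extract_outline(html):
    """Return the (tag, text) outline of `html` (hypothetical helper)."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


page = """<html><head><title>Widgets explained</title></head>
<body>
<h1>Widgets</h1>
<p>What a widget is.</p>
<h2>Blue widgets</h2>
<p>Details about the blue kind.</p>
</body></html>"""

for tag, text in extract_outline(page):
    print(tag, text)
```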

From Websites To Documents:

No, you won’t have a web site anymore. Instead you’ll have a structured collection of structured documents. Yes, your whole web site becomes one big document that contains the individual pages. Think “book” – one big document with separate parts, hosting separate chapters, with separate sections, and …pages.

Working with site structure is essential. Although it’s not as easy as validating, or structuring individual documents (i.e. there’s no one-size-fits-all formula) you’ll be able to set up your own rules for what types of content go where, and when.

Your task when structuring a set of documents is to make sure that each document is grouped with related documents, and can be reached from the relevant other documents on higher, same, and lower levels (i.e. your internal link policy). In particular, you should make sure that it is easy to locate the most relevant pages on the particular subjects that your users will find interesting (yes, your money terms, or keywords). You do that by creating the right sections and sub-sections, and by creating the right linking between those and the relevant pages.
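One way to picture such an internal link policy is as a small tree. The Python sketch below is only an assumption about how you might write the rule down: the section names are invented, and related_pages is a hypothetical helper that answers “which pages should this one link up, sideways, and down to?”.

```python
def related_pages(tree, page):
    """Return the parent, siblings, and children of `page`.

    `tree` maps each section or page to the list of pages directly below it.
    """
    parent = next((p for p, kids in tree.items() if page in kids), None)
    if parent is not None:
        siblings = [k for k in tree[parent] if k != page]
    else:
        siblings = []
    return {"up": parent, "same": siblings, "down": list(tree.get(page, []))}


# A made-up site: one home page, two sections, two pages in one of them.
site_tree = {
    "home": ["horses", "trains"],
    "horses": ["ponies", "saddles"],
    "trains": [],
}

print(related_pages(site_tree, "horses"))
```

For the “horses” section this gives “home” as the level above, “trains” on the same level, and “ponies” and “saddles” below.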

Why is this good? Because search engines seek clues that your individual page is really about widgets, not only from the widget page itself, but also from the pages surrounding it. These clues can be anchor text, section names, URLs, in- and outbound links, and a variety of other stuff.
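To make the anchor text part concrete, here is a small Python sketch that gathers, for each page, the words other pages use when linking to it. It is an illustration under my own assumptions (three invented pages held in a dictionary, and the hypothetical names LinkParser and anchor_clues), not a description of what any search engine actually does.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects (href, anchor text) pairs from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_clues(pages):
    """Map each link target to the anchor texts that other pages use for it."""
    clues = defaultdict(list)
    for url, html in pages.items():
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href, text in parser.links:
            if href != url:  # a page linking to itself says nothing new
                clues[href].append(text)
    return dict(clues)


# Three made-up pages in a "widgets" section.
site = {
    "/widgets/": '<h1>Widgets</h1><a href="/widgets/blue.html">Blue widgets</a>',
    "/widgets/red.html": '<a href="/widgets/">All widgets</a> '
                         '<a href="/widgets/blue.html">blue widget guide</a>',
    "/widgets/blue.html": '<a href="/widgets/">All widgets</a>',
}

print(anchor_clues(site)["/widgets/blue.html"])
```

Both surrounding pages describe the blue widget page in their own words, and that is exactly the kind of clue meant above.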

From “The Web” To Documents:

No, your web site will not be “your website” anymore. Instead, it will be a structured document in a bigger, more-or-less-structured set of structured sets of documents, called the web. Think “library” – a collection of topics and sub-topics containing “books” with sections, chapters, and so on…

Working with web structure is essential. As it’s not easy, and you can’t really do it totally on your own, you’ll have to set up your own rules for what types of content go where, and when.

The task here is to make your “book” stand out as the right one for the chosen subject. A web site (“book”) about “horses, printers, and railroads” will most likely not be thought of as an authority on any of these issues by any potential reader. Also, it will be quite hard for the librarians to find the proper shelf to put it on, so how should anyone potentially interested in one of these subjects ever find it?

Find your shelf for that book (yes, your niche for the site) and stick to it. Group with other books on the shelf (link to their homepages) or quote sections (deeplink). Get links back, not only from related sites, but also from relevant niche directories. If necessary, split the site up into separate sites about horses, trains, and whatever.

But, that wasn’t W3C, that was SEO 101?

Exactly, and that’s the beauty of it. As you get accustomed to working with the structural elements of your page (the HTML markup) sooner or later you will “automagically” start to see the bigger picture. The W3C is working towards a thing called “the semantic web”, which (apart from being fancy words) is a rule set for a flexible, yet standardized way to show the intrinsic meaning of documents.

Each one of those HTML tags is there for a reason. It’s no longer there to format the look of your text (as many people think); it is there to help you give an even better presentation of what your page is really about. Get it? Study those tags and their meaning (i.e. the intended use), and use them as intended.
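If you want a quick way to spot where a page still uses tags for looks rather than meaning, something like the Python sketch below can help. The PRESENTATIONAL list is my own hand-picked, non-exhaustive selection of purely visual tags, and presentational_tags is a hypothetical helper, so treat it as a starting point only.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that only describe appearance; a hand-picked, non-exhaustive list.
PRESENTATIONAL = {"font", "center", "big", "strike", "tt", "u"}


class PresentationalCounter(HTMLParser):
    """Counts opening tags that are purely presentational."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL:
            self.counts[tag] += 1


def presentational_tags(html):
    """Return {tag: count} for presentational tags found in `html`."""
    counter = PresentationalCounter()
    counter.feed(html)
    counter.close()
    return dict(counter.counts)


old_style = '<font size="5"><center>Widgets</center></font><p>text</p>'
new_style = "<h1>Widgets</h1><p>text</p>"

print(presentational_tags(old_style))
print(presentational_tags(new_style))
```

The first snippet says “big and centered”, the second says “this is the main headline”. Only the second one tells a machine what the text is for.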

Working “from the bottom up”, so to speak, you will start by correcting errors, then you will get your individual pages more focused and on-topic, then you will do this to your site, and then your site will fit better into the relevant sections of that one great document known as the web, and hence it will become easier to find you.

Yeah great, but that’s what I do without valid html

You think you do. You will learn, eventually, that “doing the right things” will imply that you also “get things right”. It’s not about pixel-perfect design, CSS, tables, or about having one font or another; it’s the mindset you will enter (with some routine) from the very beginning of putting a page together – you will be forced to think about the relation of one element on your page to another element, and the relation between this page and that page, and so on.

It’s simply a framework that will enable you to build better pages, i.e. pages that more easily convey what they’re all about. You can do this without valid html, but in the long run, you can’t take the valid html route without doing this.

So, …

… it’s not the valid html that does it – it’s the choices that validating will force you to make, and the understanding of the semantics of the web that you will gain in the process.

Here’s a quote from The W3C Semantic Web page. Read this, and think about the concept of a “Search Engine” for a moment… exactly what is a SE except for a machine that tries to understand your data?

Facilities to put machine-understandable data on the Web are becoming a high priority for many communities. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. For the Web to scale, tomorrow’s programs must be able to share and process data even when these programs have been designed totally independently. The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.

…valid html is only the first step.


(This is a slightly edited version of a post of mine on the WebmasterWorld discussion forum, September 19 2004)

2010.09.03 02:52