A discussion and brief typology (taxonomy) of links and their maintenance.
You might already have read "Cool URIs don't change" (1) by Tim Berners-Lee. While he makes a fair point as to why cool URIs should not change, the fact remains that they do: 24 hours a day, 365 days a year. And often there are good reasons for this. Just as people in the real world do not stay at the same address forever, for a multitude of reasons, pages on the WWW will not always be found at the same addresses.
Perhaps the WWW at large should be considered un-cool, but then, if relocation, change, and termination are un-cool, so is real life. These issues are not likely to go away, and we just have to deal with them, even if we might wish otherwise.
So, if we can agree on the un-cool nature of the WWW and of life in general, let's start being cool about it.
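Before we decide what to do about it, we have to realize that links are not just links. A basic grouping might look like this:
1. Links within your own site (sub-types: navigation and context)
2. Outbound links (sub-types: context, link directory, and content partners)
3. Inbound links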
But this is not all. There are actually seven different types of links (2) to manage, each existing for separate reasons, having its own potential problems, and requiring different actions.
The "easy parts", and those normally attended to by webmasters, are types 1 and 2. These are under the webmaster's full control and can be changed at any time. Webmasters normally also have a strong interest in type 3 (and 6) because of the need to be found and to rank well in search engines. There is even a term, "link development", that normally refers to type 3, or to a combination of types 2 and 3 (reciprocal links: I link to you, you link to me).
Link types 3 to 7 are generally "risky" for webmasters. They can be created without the webmaster knowing anything about them and without him/her being able to influence them in any way. Types 4 to 7 are especially risky, as their "life" is normally longer than that of types 1-3 (they are not changed easily or often). Printed matter, in particular, tends to be stored for ages.
In this article I can only scratch the surface of the problems involved with the different types of links; it is not intended to become a book, after all. Hence, I will not describe every single issue in detail.
Let's start by realizing that you will change the architecture or structure of your site at one point or another. You will do this because, no matter how hard you think it through, your initial structure is unlikely to fulfill your needs five or ten years from now.
When considering the implications, it's helpful to distinguish between "navigation" and "context" (or "content").
Some software used in web development will automatically check that site navigation is okay as new pages are added, old pages are removed, and some pages change name and/or location. That is: once you are on the site, you will not have problems getting from page A to page B. This does not, however, safeguard you against problems with link types 3-7.
Whenever a page changes location or name, make sure there is a replacement page (or other mechanism) that points to the new location. Over time, this "redirection system" will become a system in its own right, further complicated by pages that are deleted for whatever reason. Such mechanisms can be, e.g., 301 (permanent) redirects, or a redirect database that suggests new locations to visitors who would otherwise end up on your 404 page.
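A redirection system of this kind can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical in-memory table (`RedirectTable`) rather than any particular web server or CMS: moved pages answer with a 301 pointing straight at their newest location, pages deleted on purpose answer with 410 (Gone), and unknown paths get a 404 plus a suggested near match.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class RedirectTable:
    """Hypothetical redirect store for a site's moved and deleted pages."""
    moved: dict[str, str] = field(default_factory=dict)  # old path -> new path
    deleted: set[str] = field(default_factory=set)       # removed on purpose

    def record_move(self, old: str, new: str) -> None:
        # Re-point earlier entries that led to `old`, so an old bookmark
        # needs one redirect, not a chain of them.
        for src, dst in self.moved.items():
            if dst == old:
                self.moved[src] = new
        self.moved[old] = new
        self.moved.pop(new, None)  # the new location is live, never redirect it

    def resolve(self, path: str, live_pages: set[str]) -> tuple[int, str | None]:
        """Return (HTTP status, target or suggestion) for a requested path."""
        if path in live_pages:
            return 200, path
        if path in self.moved:
            return 301, self.moved[path]
        if path in self.deleted:
            return 410, None
        # Unknown path: answer 404, but suggest the closest live page.
        guess = get_close_matches(path, sorted(live_pages), n=1, cutoff=0.6)
        return 404, (guess[0] if guess else None)
```

Collapsing chains in `record_move` keeps every old address one hop away from the current page, however many times the page has moved. In practice the same table would feed your server's redirect rules or your 404 handler.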
For context/content, the problem is somewhat different. Typically an article is written, and it includes one or more links. As time goes by, the whole site changes, pages get relocated, and they might even get different names. Even if a navigation system is in place, the article may still point to a page that has been moved. This is a real challenge even on highly sophisticated sites, as articles and the links in them are rarely part of the same system as the navigation and/or the outbound links.
For outbound links, your main interest will be to make sure that your users are not met by a 404 error or by unwanted content.
First, there is the same "context issue" as above. Only now you do not always know whether the documents you link to have moved since the article was written, or whether the content of the page you link to is still what you think it is.
Second, there is the "link directory": if links are assembled into some directory-type section, they are easy to locate, but checking them (unless done automatically) will take time.
Third, there are the content partners: external sites that you link to, but that act as part of your site from the end user's perspective. If they change location or fail, it will harm your site.
As stated above, the term "link development" describes the pursuit of inbound links, mainly for search engine reasons. Although this topic is relevant to most webmasters, it is outside the scope of this article.
When you change your site, inform the people who have critical deep links to you well in advance. Critical deep links could be links to your payment processing system, or to similar services involving trusted partners. Also, as you will not know everybody who links to you, you should establish measures to catch the error traffic that will follow from the changes (see #1, sub-type: Navigation).
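One way to catch that error traffic is to read the 404 responses out of your server's access log and group them by referring page. The sketch below assumes the log is written in the Combined Log Format (as e.g. Apache and nginx can produce); the function name `broken_inbound` and the host `www.example.com` are placeholders.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# One line of the Combined Log Format (assumed; adapt to your server's format):
# host ident user [time] "METHOD path PROTOCOL" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def broken_inbound(lines, own_host):
    """Count 404 responses per (referring page, requested path), most frequent
    first. Referrers on `own_host` are skipped: those are internal links."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m["status"] != "404":
            continue
        referrer = m["referrer"]
        if referrer in ("", "-"):
            source = "(no referrer)"   # often bookmarks, typed or printed URLs
        elif urlsplit(referrer).hostname == own_host:
            continue
        else:
            source = referrer
        hits[(source, m["path"])] += 1
    return hits.most_common()
```

Requests without a referrer often come from bookmarks, typed-in addresses or printed matter, so they are reported separately rather than discarded. Referring pages that keep showing up are the ones worth contacting.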
Management of inbound links should be considered part of your navigation system, although the links in question are physically located outside of your site.
Consider a site with 1 million users per month. Let's say that only 10% of the users have bookmarked the site. That's 100,000 links to portions of the site that you simply cannot control; it might very well be that this site has more inbound links than pages. And this is just link type 4.
Management of inbound bookmarks is similar to management of all other types of inbound links.
A specific treatment of link types 5-7 is not included in this article at present.
Step one is to find out which links you actually have, plus where and how they are stored. Then you can begin to check them.
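For HTML pages, a first inventory can be made with Python's standard `html.parser`. This minimal sketch assumes you already have the pages as strings (`pages` maps each page URL to its HTML) and that every link to your own host counts as internal; links stored elsewhere (databases, templates, PDFs, printed matter) are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element, resolved to an absolute URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip empty hrefs, same-page anchors and non-web schemes.
        if href and not href.startswith(("#", "mailto:", "javascript:")):
            self.links.append(urljoin(self.page_url, href))


def inventory(pages, own_host):
    """pages: {page_url: html}. Returns internal and outbound links as
    (page the link is on, link target) pairs, i.e. what you have and where."""
    result = {"internal": [], "outbound": []}
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        collector.close()
        for target in collector.links:
            kind = "internal" if urlsplit(target).hostname == own_host else "outbound"
            result[kind].append((page_url, target))
    return result
```

Keeping the source page next to each target matters later: when a link breaks, you need to know which article to edit, not just which URL failed.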
Some kind of guideline or framework is a must, but I rarely, if ever, see one employed. I see people checking their "directory section" at some arbitrary frequency, or issuing a statement like "these links worked when we planted them, they might not do so forever". Some rely on their software to make sure their internal navigation works, and then simply forget the rest.
An automatic link checker is a good starting point. There are several of them, and I will not recommend one above the rest. Make sure, though, that the chosen tool does not consume unnecessary bandwidth on the sites that you link to.
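As an illustration of what "bandwidth-friendly" can mean, here is a sketch using only the standard library: it sends `HEAD` requests, so no page bodies are downloaded, and waits `min_interval` seconds between two requests to the same host. The user-agent string and the 5-second default are assumptions; `fetch`, `clock` and `sleep` are parameters so the pacing can be tested without network access.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """HTTP status of a HEAD request (no body downloaded), or None if unreachable."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "example-linkcheck/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status          # status after any redirects were followed
    except urllib.error.HTTPError as err:
        return err.code                 # 404, 410, 500, ...
    except OSError:
        return None                     # DNS failure, refused connection, timeout


def check_links(urls, fetch=head_status, min_interval=5.0,
                clock=time.monotonic, sleep=time.sleep):
    """Check each distinct URL once, waiting at least `min_interval` seconds
    between two requests to the same host. Returns {url: status} for failures."""
    last_hit = {}
    broken = {}
    for url in dict.fromkeys(urls):     # drop duplicates, keep order
        host = urlsplit(url).hostname
        if host in last_hit:
            wait = last_hit[host] + min_interval - clock()
            if wait > 0:
                sleep(wait)
        status = fetch(url)
        last_hit[host] = clock()
        if status is None or status >= 400:
            broken[url] = status
    return broken
```

Some servers answer `HEAD` with 405 (Method Not Allowed) even though the page exists; a real tool would retry such URLs with `GET` before reporting them as broken.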
An automatic link checker is only a starting point, though, because it is not always enough. There are two types of problems here.
Problem number one, a link that no longer works, will be caught by such a program. But if you link to, e.g., www.example.com and this link works fine, that does not mean you can just forget about it. Sometimes sites are closed, and the addresses of popular ones are often sold so that other parties can use them for their own (not necessarily related) purposes. Even less popular sites do not necessarily disappear; rather, they tend to turn into advertisements for some hosting company or domain name service.
That is problem number two: the link still works, but it leads somewhere else. This might be acceptable for some types of sites, but for others (e.g. banks, ISPs, authorities, shops, associations) it is simply not an option to link to, say, pornography by accident because the link once pointed to something else and still works.
Please note that manual inspection is the only 100% fool-proof solution. You cannot easily program your way out of these situations.
This, in turn, implies that either you should not have that many outbound links, or you should spend more resources on them. Only if you have no other option should you post one of these "these links worked when..." statements. If you do, however, make it clear to your visitors on which date the links were last found to be okay.
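A program cannot judge whether a page is still appropriate, but it can tell the people doing the manual inspection where to look first. The sketch below, with hypothetical names, assumes you store a fingerprint of each outbound page's text when a person last approved it; every page whose text has changed since, or that was never approved, goes into the review queue.

```python
import hashlib
import re


def fingerprint(html):
    """SHA-256 of the page text, ignoring tags, case and whitespace runs."""
    text = re.sub(r"<[^>]+>", " ", html)             # crude tag stripping
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def review_queue(current_pages, approved):
    """current_pages: {url: html fetched now}.
    approved: {url: fingerprint recorded at the last manual inspection}.
    Returns the URLs a person should look at, sorted."""
    return sorted(url for url, html in current_pages.items()
                  if approved.get(url) != fingerprint(html))
```

Pages with rotating ads, dates or counters will change on every visit and land in the queue each time; stripping such parts before hashing is left out here. An unchanged fingerprint is no guarantee either (images, for instance, are ignored), so this only sets priorities for the manual check.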
The automatic link check, the navigation, etc. (everything that concerns "the site", e.g. link types 1 and 2) are normally the responsibility of the webmaster, which in some cases might be a whole department.
The manual link check is another matter. It would not be feasible to spend (high) webmaster salaries on it; it would be better to hire a few students, especially if you have a lot of links.
How often should links be checked? The classical response: it depends.
Navigation (#1, sub-type 1) should always work. Literally. You simply do not put up a new page, move one, or delete one without implementing the relevant changes in your navigation system.
Context links (#1, sub-type 2) should be treated as navigation when they point to other pages on-site. This means: always. If you have an old article pointing to some old version of your Terms of Service (TOS), then, when the TOS changes, the link in the old article should point to the new version, or to the older version if that is the correct one in context.
This can involve a lot of work, as it is not just link checking but also decision making. Even locating the links "hidden" in content might be a difficult task for some sites.
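The mechanical part of this work can be scripted; the decisions cannot. The sketch below assumes a hypothetical map of moved pages (`moved`) and a set of live paths. It rewrites site-relative links that point to moved pages and returns the remaining broken on-site links for a person to decide on (for instance, whether an old article should point to the current or an archived TOS). It matches only double-quoted `href` attributes with a regular expression, which is a simplification, not a robust HTML rewriter.

```python
import re

HREF = re.compile(r'href="([^"]+)"')  # double-quoted hrefs only (simplification)


def update_context_links(article_html, moved, live):
    """Rewrite on-site links that point to moved pages.
    moved: {old path: new path}; live: set of paths that currently exist.
    Returns (new html, list of broken on-site links left for a human)."""
    unresolved = []

    def fix(match):
        href = match.group(1)
        on_site = href.startswith("/") and not href.startswith("//")
        if not on_site or href in live:
            return match.group(0)          # outbound or still valid: keep
        if href in moved:
            return f'href="{moved[href]}"'
        unresolved.append(href)            # deleted or unknown: decide manually
        return match.group(0)

    return HREF.sub(fix, article_html), unresolved
```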
Outbound links (#2, sub-types 1 and 2) differ. In my experience, if you link to the main domain, you do not have to check as often as when you link "deep" (e.g. www.example.com/folder/sub-folder/page.html).
For non-critical links, I'd say a frequency of once per month, or at most once per week, should be enough. A link to the White House front page should not need to be checked at all, as it should simply be the right one (such addresses rarely change). Still: check it. Although the White House has not moved to another address, there might be a typo on your own page, and thus the link might point to an entirely different house than the one you think (3).
On the other hand, a link to a personal homepage, or a deep link into any site (e.g. a well-known portal), might need to be checked more frequently, as these change more often.
I'd check once per day only when we are getting close to something "critical" (e.g. outbound, sub-type 3). An example could be your hosted shopping cart solution or hosted payment gateway: if these sites are not up and running, you will lose sales. Even though they are external sites, they often let you customize the layout and integrate them into your own site.
The user will see them as part of your site due to the look and feel, but in fact they remain external and what you provide is a link, so you should monitor that link closely if your business depends on it.
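The rules of thumb above can be turned into a simple schedule. In this sketch the intervals (1, 7 and 30 days) and the category names are assumptions chosen to match the text: critical links daily, deep links and other volatile links such as personal homepages weekly, and front pages of well-established domains monthly.

```python
from datetime import date, timedelta
from urllib.parse import urlsplit

# Days between checks per category (assumed values, see the text above).
INTERVALS = {"critical": 1, "volatile": 7, "normal": 30}


def check_interval(url, kind="normal"):
    """Return how long a link may go unchecked."""
    if kind == "normal" and urlsplit(url).path not in ("", "/"):
        kind = "volatile"  # deep links change more often than front pages
    return timedelta(days=INTERVALS[kind])


def due_for_check(links, today):
    """links: iterable of (url, last_checked, kind), last_checked a date.
    Returns the URLs that are due, in input order."""
    return [url for url, last_checked, kind in links
            if today - last_checked >= check_interval(url, kind)]


if __name__ == "__main__":
    links = [("https://pay.example.net/checkout", date(2024, 3, 30), "critical"),
             ("http://www.whitehouse.gov/", date(2024, 3, 10), "normal")]
    print(due_for_check(links, date(2024, 3, 31)))
```

A personal homepage on its own domain looks like a front page to this code, so it has to be marked `"volatile"` by hand.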
(1) Tim Berners-Lee, "Cool URIs don't change", W3C, 1998. https://www.w3.org/Provider/Style/URI
(2) This list builds on my personal experience with all seven types, for types 5-7 especially during the last 5-6 years or so. If you re-publish it, include a link to this page and my name (the link will then become a type 7, or a type 3 if it's on the WWW).
(3) The White House example was not selected at random. Although the official site resides at the address "www.whitehouse.gov", other sites with substantially different content reside at the dot-net, dot-org, and dot-com versions of this address.