A discussion and explanation of the Google localrank patent dealing with ranking and re-ranking.
New patent means new way of ranking.
In this thread of a similar name, zafile pointed me towards a news.com article on a new Google patent. The link in the news.com article turned out to be wrong, but I found the patent anyway. Specifically, it is this one:
You can find the details at the US Patent & Trademark Office, Patent Full-Text and Image Database: search for 6,526,440. Text from this patent, published to the public domain by the USPTO, is also quoted in this post where necessary to make a point. There are no copyrights on patent texts, but I have tried to keep quotes to a minimum anyway.
As the title suggests, this is a patent that deals with ranking and re-ranking of results.
The patent was granted on February 25, 2003, and filed January 30, 2001 - so Google researchers have known about it for at least two years already. Still, a patent grant means that the source description is published. This is the reason for (as well as the "Google News" of) this post.
I have spent a few hours studying it, and it clearly has implications for users of this forum. I'll get to the nitty-gritty of it, but let me point out the major points first.
It's not an easy read. And there are 7 unknowns as well as some scope for flexibility and judgement (either by trial-and-error or by manual or automated processes). It's really interesting though.
It's a patent. Nothing more and nothing less. A description of some procedure for doing something. This does not mean that it will ever be put to use, as lots of patents are granted and never used. Patents don't come with a release date, but some elements of the confusion we are seeing now could be explained by this.
Chances are, however, that this one will be put to use. Having spent a few hours on it, I must say that it makes some sense. It is intended to provide better and more relevant results for users of the Google SE, and at the same time (I quote the patent text here):
... to prevent any single author of web content from having too much of an impact on the ranking value.
Sounds serious, especially for the SEO community. And it probably is, too. But don't panic. Notice that it says "too much of an" and not "any". It's still a ranking derived from links, not a random Google rank.
We know about the Page Rank algorithm. This is the tool that Google uses to make sure that the pages it has indexed are displayed to the user with the most important pages first. Without being too specific, it simply means that for each and every page, Google calculates a value that ranks this page relative to the other pages in the index.
This is something else. Rephrase: This is the same thing plus something else. It is, essentially, a new way to order the top results for any query.
What the new patent implies is a ranking, then a reranking, then a weighting, and then a display. It goes something like this:
In this process there will actually be "two ranks", or rather, there will be three: the initially ranked documents (OldScore ~ Page Rank), the reranked documents (LocalScore ~ Local Rank), and the combined "NewScore ~ New Rank". The serps will still show only one set of documents, ranked according to this NewScore, which is (sort of) a weighted average of PR and LR.
Don't be confused by the fancy words. It's more straightforward than it seems. In other words, this is what happens:
Well, if you are one of the few who know all about chea.. hrm... optimizing for Google, then it means that the world just got a bit tougher. And then again, perhaps not: there are still tricks in the bag for the very, say, experienced. Nuff said, just a feeling, I will not elaborate on that.
It will become harder, it seems, if for no other reason than that you now have to pass not one but two independent ranking filters. Instead of optimizing for PR, you will now have to optimize for both PR and LR.
Let's assume, as a very simple example only, that 0, 1, and 2 are the only possible values for both PR and LR. If you get a PR of 2 and an LR of 0, then the NewRank will be 0. If you get a PR of 0, you will not even make it into the top set of 1000 that will ever get an LR calculated. On the other hand, if you get a PR and an LR of 1, then you're better off than the guy having a top PR but no LR.
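The toy example above can be checked in a couple of lines of Python. This is only a sketch: it assumes that NewRank is the plain product of PR and LR, which is what the numbers in the example imply, and the function name new_rank is made up for illustration.

```python
def new_rank(pr: int, lr: int) -> int:
    """Toy NewRank: plain product of PR and LR (an assumption for this example)."""
    return pr * lr


# Top PR but no LR, versus a modest PR and a modest LR.
top_pr_no_lr = new_rank(2, 0)
modest_both = new_rank(1, 1)
print(top_pr_no_lr, modest_both)
```

Under that assumption the page with a modest score on both counts outranks the page with a top PR and no LR.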
It's a device constructed to yield better results and (repeat quote):
... prevent any single author of web content from having too much of an impact on the ranking value.
I have been looking at the patent for a while, and this intention could very well be enforced by it. That is, if "authorship" is equal to "domain ownership" or "some unspecified network of affiliated authors".
The LocalScore, or Local Rank, is both a filter and a ranking mechanism. It only considers pages among the 1000 or so selected by the PR.
Here, "same host" refers to the first three octets of the IP address, that is, the first three quarters of it. In other words, these IPs are the same host:
"Similar or affiliated hosts" refers to mirrors or other kinds of pages that (quote) "contain the same or nearly the same documents". This could be (quote) "determined through a manual search or by an automated web search that compares the contents at different hosts". Here's another patent number for the curious: 5,913,208 (June 15, 1999)
That is: Your on-site link structure means zero to LR. Linking to and from others on the same IP means zero. Near-duplicate pages mean zero. Only one page from your "neighborhood", the single most relevant page, will be taken into account.
So, although you positively know that you have 1,234 inbound links, only the top "k" of these that are not from "your neighborhood", or even "part of the same neighborhood", will count.
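The "same host" test described here is easy to sketch in Python. The function name same_host is my own, and the addresses are made-up examples from the ranges reserved for documentation. The sketch covers only the three-octet rule, not the "similar or affiliated hosts" check, which the patent leaves to a manual or automated content comparison.

```python
def same_host(ip_a: str, ip_b: str) -> bool:
    """True when two dotted-quad IPv4 addresses share their first three octets."""
    return ip_a.split(".")[:3] == ip_b.split(".")[:3]


# Same first three octets: treated as the same host.
print(same_host("192.0.2.10", "192.0.2.200"))
# Different first three octets: treated as different hosts.
print(same_host("192.0.2.10", "198.51.100.10"))
```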
LocalRank = SUM(i=1..k) PR(BackSet(i))^m
Again, the m is one of those annoying unknowns (quote): "the appropriate value at which m should be set varies based on the nature of the OldScore values" (OldScore being PR). It is stated, however, that (quote) "Typical values for m are, for example, one through three".
That's it. Really, it is. There's nothing more to the Local Rank than this.
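To make the steps concrete, here is a minimal Python sketch of the Local Rank calculation as I read it. The names host_key and local_score, the input layout (a list of IP and PR pairs for the pages linking to yours), and the k and m values in the example are all assumptions for illustration. Affiliated hosts and mirrors are not modelled, only the three-octet rule.

```python
from typing import Dict, List, Tuple


def host_key(ip: str) -> str:
    """First three octets of a dotted-quad IPv4 address."""
    return ".".join(ip.split(".")[:3])


def local_score(target_ip: str, linkers: List[Tuple[str, float]], k: int, m: float) -> float:
    """Sum of PR^m over the top k linking pages, one page per neighborhood.

    linkers holds (ip, pr) pairs for pages in the initial result set
    that link to the target page.
    """
    best_per_host: Dict[str, float] = {}
    for ip, pr in linkers:
        key = host_key(ip)
        if key == host_key(target_ip):
            continue  # links from your own neighborhood count for nothing
        if pr > best_per_host.get(key, float("-inf")):
            best_per_host[key] = pr  # keep only the strongest page per neighborhood
    top_k = sorted(best_per_host.values(), reverse=True)[:k]
    return sum(pr ** m for pr in top_k)


linkers = [
    ("192.0.2.77", 9.0),    # same neighborhood as the target: ignored
    ("198.51.100.1", 3.0),  # shares a neighborhood with the next one
    ("198.51.100.2", 5.0),  # strongest page of that neighborhood: kept
    ("203.0.113.9", 2.0),
]
print(local_score("192.0.2.5", linkers, k=2, m=2))  # 5^2 + 2^2 = 29.0
```

Note how the strongest inbound link in the example (PR 9) contributes nothing, because it sits in the same neighborhood as the target page.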
This is getting a very long post you know... Well, luckily it's simple, the formula is here, it's as public as the rest, you can't have a patent that's also a secret (quote):
NewScore(x) = (a+LocalScore(x)/MaxLS)(b+OldScore(x)/MaxOS)
x being your page
a being some weight *
b being some weight *
MaxLS being maximum of the LocalScore values, or some threshold value if this is too small
MaxOS being maximum PR for the original set (the PR set)
* Isn't this just beautiful (quote): "The a and b values are constants, and, may be, for example, each equal to one"
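A short Python sketch of the NewScore formula, for those who prefer to see it run. The function name new_scores, the input layout, and the min_max_ls floor (standing in for the unspecified threshold on MaxLS) are assumptions of mine, and the sample numbers are invented. It also assumes the largest OldScore is greater than zero.

```python
from typing import Dict, Tuple


def new_scores(
    pages: Dict[str, Tuple[float, float]],
    a: float = 1.0,
    b: float = 1.0,
    min_max_ls: float = 1.0,  # hypothetical floor for MaxLS, value not given in the patent
) -> Dict[str, float]:
    """NewScore(x) = (a + LocalScore(x)/MaxLS) * (b + OldScore(x)/MaxOS).

    pages maps a page name to its (LocalScore, OldScore) pair.
    """
    max_ls = max(max(ls for ls, _ in pages.values()), min_max_ls)
    max_os = max(old for _, old in pages.values())
    return {
        name: (a + ls / max_ls) * (b + old / max_os)
        for name, (ls, old) in pages.items()
    }


pages = {
    "A": (0.0, 10.0),  # top OldScore, no LocalScore
    "B": (4.0, 5.0),   # half the OldScore, top LocalScore
}
print(new_scores(pages))                # a = b = 1: A = 2.0, B = 3.0
print(new_scores(pages, a=0.0, b=0.0))  # a = b = 0: A = 0.0, B = 0.5
```

The weights matter: with a = b = 1 a page with no LocalScore still keeps a nonzero NewScore and is merely outranked, while with a = 0 it drops to zero.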
Inbound links are still important. Very much so. But not just any inbound links, rather: It is important to have inbound links spread across a variety of unrelated sources.
It could be that blogger sites on blogger.com, tripod sites, web rings, and the like will see less impact in serps from crosslinking. Mirror sites, and some other types of affiliate programs, e.g. the SearchKing type, will probably also suffer.
My primary advice from the algebra right now is to seek incoming links from "quality unrelated sites" that are still within the same subject area. Unrelated means: Sites that are not sharing the first three quarters of an IP and are not in other ways affiliated or from the "same neighborhood" (at the very least not affiliated in a structured manner). Quality means what it says.
Links from direct competitors will suddenly have great value, as odd as it sounds.