LocalRank: Google's 2 rankings & you

A discussion and explanation of the Google LocalRank patent dealing with ranking and re-ranking.

New patent means new way of ranking.

Abstract:
A discussion and explanation of the Google LocalRank patent dealing with ranking and re-ranking of web pages. This article does not intend to prove or suggest that Google is, or will be, applying this patent to the ranking of web pages. Instead, it offers a discussion of the potential implications of such use for webmasters.
Published here on March 23, 2004.
Copyright: © 2003-2004 Claus Schmidt, clsc.net
Citations (quotes, not full-text copy) are considered fair use if accompanied by author name and a link to this web site or page.

The new Google patent


In this thread of a similar name, zafile pointed me towards a news.com article on a new Google patent. The link in the news.com article turned out to be wrong, but I found the patent anyway. Specifically, it is this one:

Ranking search results by reranking the results based on local inter-connectivity

You can find the details at the US Patent & Trademark Office, Patent Full-Text and Image Database - search for 6,526,440. Text from this patent - published to the public domain by the USPTO - is also quoted in this post where necessary to make a point. There are no copyrights on patent texts, but I have tried to keep quotes to a minimum anyway.

As the title suggests, this is a patent that deals with ranking and re-ranking of results.

The patent was granted on February 25, 2003, and filed January 30, 2001 - so Google researchers have known about it for at least two years already. Still, a patent grant means that the source description is published. That publication is both the reason for, and the "Google news" in, this post.

I have spent a few hours studying it, and it clearly has implications for users of this forum. I'll get to the nitty-gritty of it, but let me point out the major points first.

It's not an easy read. And there are 7 unknowns as well as some scope for flexibility and judgement (either by trial-and-error or by manual or automated processes). It's really interesting though.

What is it?

It's a patent. Nothing more and nothing less. A description of some procedure for doing something. This does not mean that it will ever be put to use, as lots of patents are granted and never used. Patents don't come with a release date, but some of the confusion we are seeing now could be explained by this patent being put to use.

Chances are, however, that this one will be put to use. Having spent a few hours on it, I must say that it makes some sense. It is intended to provide better and more relevant results for users of the Google SE, and at the same time (I quote the patent text here):

... to prevent any single author of web content from having too much of an impact on the ranking value.

Sounds serious, especially for the SEO community. And it probably is, too. But don't panic. Notice that it says "too much of an" and not "any". It's still a ranking derived from links, not a random Google rank.

What does it do?

We know about the Page Rank algorithm. This is the tool Google uses to make sure that the pages it has indexed are displayed to the user with the most important pages first. Without being too specific, it simply means that, for each and every page, Google calculates a value that ranks the page relative to the other pages in the index.
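
As a reminder of what that per-page value is, here is a bare-bones sketch of the textbook PageRank iteration in Python. It is purely illustrative - a made-up three-page link graph, and none of the scale or refinements of the real thing:

    # Bare-bones PageRank (textbook power iteration) over a toy link graph.
    # Purely illustrative - not Google's implementation.
    links = {                    # page -> pages it links to (made-up data)
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
    }

    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        pr = {p: 1.0 / len(pages) for p in pages}        # start out equal
        for _ in range(iterations):
            new = {p: (1 - damping) / len(pages) for p in pages}
            for page, outlinks in links.items():
                for target in outlinks:
                    new[target] += damping * pr[page] / len(outlinks)
            pr = new
        return pr                                        # one value per page

    print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))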

This is something else. Rephrase: This is the same thing plus something else. It is, essentially, a new way to order the top results for any query.

The ultra-brief three-step version:

What the new patent implies is a ranking, then a reranking, then a weighting, and then a display. It goes something like this:

  1. The usual pagerank algo (or another suitable method) finds the top ranking (e.g.) 1000 pages. The patent's term for the score behind this ranking is the OldScore.
  2. Each page in this set then goes through a new ranking procedure, resulting in a LocalScore for that page.
  3. Finally, for each page, the LocalScore and the OldScore are normalized, assigned a weight, and then multiplied in order to yield the NewScore for that page.

In this process there will actually be "two ranks", or rather, there will be three: the initially ranked documents (OldScore ~ Page Rank), and the reranked documents (LocalScore ~ Local Rank). The serps will still show only one set of documents, but this set will be ranked according to the "NewScore ~ New Rank", which is (sort of) a weighted average of PR and LR.

Confused?

Don't be confused by the fancy words. It's more straightforward than it seems. In other words, this is what happens:

  1. you search for a keyword or phrase - as usual
  2. pagerank finds the top 1000 results (or so) - as usual
  3. localrank calculates a new rank for each page - this, and the rest, is new
  4. each page now has two ranks (PR and LR)
  5. the two sets of ranks are multiplied using some weights
  6. the multiplication gives a third rank
  7. each page now has one rank; the NewRank (sort of equal to PR times LR)
  8. pages are sorted according to the NewRank
  9. and finally displayed with the best "NewRanking" ones on top.

- better?
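
If code reads easier than lists, here is the same flow in a few lines of Python. It is only meant to show the shape of steps 4 through 9: the page data and weights are made up, the LR values are simply handed in as numbers (how they are actually calculated is the subject of the next sections), and the combination is the "sort of PR times LR" shorthand from step 7 rather than the exact patent formula quoted further down.

    # The flow of steps 4-9 in (over)simplified Python - made-up numbers only.
    pages = [                                  # pretend these are the top results
        {"url": "page-1", "pr": 0.9, "lr": 0.2},
        {"url": "page-2", "pr": 0.5, "lr": 0.8},
        {"url": "page-3", "pr": 0.7, "lr": 0.0},
    ]

    def new_rank(page, w_pr=1.0, w_lr=1.0):
        # steps 5-7: weight the two ranks and multiply them into one NewRank
        return (w_pr * page["pr"]) * (w_lr * page["lr"])

    pages.sort(key=new_rank, reverse=True)     # step 8: sort by NewRank
    for p in pages:                            # step 9: best "NewRanking" first
        print(p["url"], round(new_rank(p), 2))
    # page-2 0.4 / page-1 0.18 / page-3 0.0 - note that page-3 had the
    # second-best PR but still ends up last, because its LR is zero.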

What does it mean to me then?

Well, if you are one of the few who knows all about chea... hrm... optimizing for Google, then it means that the world just got a bit tougher. And then again, perhaps not - there are still tricks in the bag for the very, say, experienced. Nuff said; just a feeling, and I will not elaborate on that.

It will become harder, it seems. If for no other reason, then because you now have to pass not one, but two independent ranking filters. Instead of optimizing for PR you will now have to optimize for both PR and LR.

Let's assume, as a very simple example only, that 0, 1 and 2 are the only possible values for both PR and LR: if you get a PR of 2 and an LR of 0, then the NewRank will be 0. If you get a PR of 0, you will not even make it into the top set of 1000 that ever gets an LR calculated. On the other hand, if you get a PR and an LR of 1, then you're better off than the guy having a top PR but no LR.

Got it - what's that LR thing then?

It's a device constructed to yield better results and (repeat quote):

... prevent any single author of web content from having too much of an impact on the ranking value.

I have been looking at the patent for a while, and this intention could very well be enforced by it. That is, if "authorship" is taken to mean "domain ownership" or "some unspecified network of affiliated authors".

Here goes:

The LocalScore, or Local Rank, is both a filter and a ranking mechanism. It only considers pages among the 1000 or so selected by the PR.

  1. The first step in calculating the Local Rank for a page is to locate all pages that have outbound links to this page - all pages among the top 1000, that is.

  2. Next, all pages that are from the same host as this page, or from "similar or affiliated hosts", get thrown away. Yes. By comparing any two documents within the set, the one having the smallest PR will always be thrown away, until there is only one document left from the (quote) "same host or similar" as the document that is currently being ranked.

    Here, "same host" refers to three octets of the IP. That means the first three quarters of it. In other words, these IPs are the same host:

    111.111.111.0
    111.111.111.255

    "Similar or affiliated hosts" refers to mirrors or other kinds of pages that (quote) "contain the same or nearly the same documents". This could be (quote) "determined through a manual search or by an automated web search that compares the contents at different hosts". Here's another patent number for the curious: 5,913,208 (June 15, 1999)

    That is: your on-site link structure means zero to LR. Linking to and from others on the same IP means zero. Near-duplicate pages mean zero. Only one page from your "neighborhood" - the single best-ranking one - will be taken into account.

  3. Now, the exact same procedure is repeated for each "host" in the set, until each "host/neighborhood" has only one page left in the set.

  4. After this (tough) filtering comes another. Each of the remaining pages has a PR value, and they are sorted according to it. The top k pages pass; the rest get thrown away. Here "k" is (quote) "a predetermined number (e.g., 20)".

    So, although you positively know that you have 1,234 inbound links, only the top "k" of them - those that are not from "your neighborhood" or even "part of the same neighborhood" - will count.

  5. The remaining pages are called the "BackSet". Only at this stage can the LR be calculated. It's pretty straightforward, but then again, the filtering is tough (quoted, although not verbatim - the deviations just keep it in the current context):
    LocalRank = SUM(i=1..k) PR(BackSet(i))^m

    Again, the m is one of those annoying unknowns (quote): "the appropriate value at which m should be set varies based on the nature of the OldScore values" (OldScore being PR). It is stated, however, that (quote) "Typical values for m are, for example, one through three".

That's it. Really, it is. There's nothing more to the Local Rank than this.
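
For the technically inclined, here is a rough Python sketch of the five steps above. It is my own illustration, not code from the patent: the page dictionaries are made up, the "neighborhood" test is reduced to the three-octet IP match alone, and k and m are simply set to the example values mentioned (k = 20, and an m in the one-to-three range).

    # Rough sketch of the LocalScore / Local Rank steps - an illustration only.
    def same_host(ip_a, ip_b):
        # "Same host" here: the first three octets of the IP address match.
        return ip_a.split(".")[:3] == ip_b.split(".")[:3]

    def local_rank(target_url, top_set, k=20, m=2):
        # top_set: the ~1000 pages found by PR. Each page is assumed to be a
        # dict like: {"url": ..., "ip": "111.111.111.42", "pr": 0.8,
        #             "links_to": {"http://...", ...}}

        # Step 1: pages in the top set that link to the page being ranked.
        linkers = [p for p in top_set if target_url in p["links_to"]]

        # Steps 2-3: keep only the highest-PR page per host/neighborhood.
        # (The patent also allows manual or automated detection of mirrors
        # and affiliated hosts; only the IP test is sketched here.)
        survivors = []
        for page in sorted(linkers, key=lambda p: p["pr"], reverse=True):
            if not any(same_host(page["ip"], kept["ip"]) for kept in survivors):
                survivors.append(page)

        # Step 4: only the top k survivors count - this is the BackSet.
        back_set = survivors[:k]        # already sorted, highest PR first

        # Step 5: LocalRank = SUM(i=1..k) PR(BackSet(i))^m
        return sum(p["pr"] ** m for p in back_set)

The part that bites is the middle: no matter how many pages link to you from one and the same neighborhood, at most one of them survives into the BackSet.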

What about the New Rank then?

This is getting to be a very long post, you know... Well, luckily it's simple. The formula is here, and it's as public as the rest - you can't have a patent that's also a secret (quote):

NewScore(x) = (a + LocalScore(x)/MaxLS) * (b + OldScore(x)/MaxOS)

x being your page
a being some weight *
b being some weight *
MaxLS being the maximum of the LocalScore values, or some threshold value if that maximum is too small
MaxOS being the maximum PR for the original set (the PR set)

* Isn't this just beautiful (quote): "The a and b values are constants, and, may be, for example, each equal to one"
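
And, in the same illustrative spirit, here is the combination step in a couple of lines of Python. Again my own sketch: a and b are set to one as in the patent's own example, and the ls_floor parameter is just a made-up stand-in for the unspecified threshold on MaxLS.

    # Illustration of the NewScore formula; a, b and ls_floor are assumptions.
    def new_score(local_score, old_score, max_ls, max_os, a=1.0, b=1.0, ls_floor=1.0):
        max_ls = max(max_ls, ls_floor)   # "or some threshold value if this is too small"
        return (a + local_score / max_ls) * (b + old_score / max_os)

    # Tiny check: LocalScore 4 out of a maximum of 8, and PR 0.5 out of a
    # maximum of 1.0, gives (1 + 0.5) * (1 + 0.5) = 2.25
    print(new_score(4, 0.5, max_ls=8, max_os=1.0))   # 2.25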

Wrap up

Inbound links are still important. Very much so. But not just any inbound links; rather, it is important to have inbound links spread across a variety of unrelated sources.

It could be that blogger sites on blogger.com, tripod sites, web rings, and the like will see less impact in serps from crosslinking. Mirror sites, and some other types of affiliate programs, e.g. the SearchKing type, will probably also suffer.

My primary advice from the algebra right now is to seek incoming links from "quality unrelated sites" that are still within the same subject area. Unrelated means: sites that do not share the first three quarters of an IP and are not in other ways affiliated or from the "same neighborhood" (at the very least, not affiliated in a structured manner). Quality means what it says.

Links from direct competitors will suddenly have great value, as odd as it sounds.

 



Originally posted by the author at 1:15 am on July 8, 2003 (UTC +1) here (defunct URL): www.webmasterworld.com/forum3/15073.htm
This is a copy of the original post, which has since been taken down. A lengthy discussion followed it; that discussion is not published here.