Anand Rajaraman recently spoke with Peter Norvig, who revealed that:
their best machine learning algorithm is already as good as, and sometimes better than, their current hand-rolled relevancy algorithms
but they still prefer to use their hand-rolled algorithms because of hubris, and because they feel that machine learning algorithms may be more inclined to make catastrophic errors on searches that do not look much like those in the training set
I think a third piece (that you will never hear Google employees admit to) is that as the web's structure changes, Google feels they have to use FUD to police the web and help ensure Google has revenue entry points into important markets. In their 2007 Google search quality rater guidelines they used a typical Commission Junction link as an example of a sneaky redirect. It is doubtful that Google would ever do that with AdSense code or a Performics link (since they own those).
In the follow-up post about his chat with Peter Norvig, Anand highlighted how Google measures relevancy. In the post he stated why Google prefers internal review data over direct usage data:
Peter confirmed that Google does collect such [usage] data, and has scads of it stashed away on their clusters. However -- and here's the shocker -- these metrics are not very sensitive to new ranking models! When Google tries new ranking models, these metrics sometimes move, sometimes not, and never by much. In fact Google does not use such real usage data to tune their search ranking algorithm.
Exposure from top rankings already creates a self-reinforcing effect because of the power of defaults. Tying search usage data directly into relevancy might not add much benefit for searchers, especially as more people click on the first search result. Anand further explained why direct usage data is not used to refine Google's relevancy algorithms:
The first is that we have all been trained to trust Google and click on the first result no matter what. So ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
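To make the first factor concrete, here is a rough sketch of a position-biased click model. It is not Google's model, and every number in it is a made-up assumption: `EXAMINE_PROB` is a hypothetical chance that a searcher looks at each rank, and the relevance lists are hypothetical chances of a click once a result has been looked at. The point is only to show how a small reordering below the first result can barely move an aggregate click metric.

```python
# Hypothetical position-bias click model; all numbers are illustrative assumptions.
EXAMINE_PROB = [0.60, 0.15, 0.08, 0.05, 0.03]  # chance a user looks at each rank


def expected_clicks(relevance_by_rank):
    """Expected clicks per search: sum over ranks of P(examined) * P(click | examined)."""
    return sum(e * r for e, r in zip(EXAMINE_PROB, relevance_by_rank))


# Appeal of each result (chance of a click once examined), listed in ranked order.
baseline = [0.50, 0.40, 0.45, 0.30, 0.20]
# A new ranking model that swaps the results at ranks 2 and 3 (better one moves up).
candidate = [0.50, 0.45, 0.40, 0.30, 0.20]

base_ctr = expected_clicks(baseline)
cand_ctr = expected_clicks(candidate)
relative_change = (cand_ctr - base_ctr) / base_ctr

print(f"baseline:  {base_ctr:.4f} expected clicks per search")
print(f"candidate: {cand_ctr:.4f} expected clicks per search")
print(f"relative change: {relative_change:.2%}")
```

With these assumed numbers the candidate ranking is genuinely better, yet the click metric moves by less than one percent, because most of the attention goes to the first result regardless of what sits below it.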