Following up on my post about Blekko Banning “Content Farm” sites because some of their users marked the 20 sites as not useful, I decided to take a look at how big an impact these blanket blacklistings have on the Blekko search engine.
As I mentioned in the original post, I noticed that the absence of sites like eHow actually caused many more “no results found” pages for specific searches that no other website took the time to write about. So I decided to see how detrimental the absence of the 20 sites might be to the Blekko search index.
Blekko’s Big Black Hole
Okay, probably not the most attractive headline, but Blekko’s sweeping ban on the 20 sites I list here does have a large impact.
I performed a site:sitename.com command in Google for each site to see how many pages Google has indexed that Blekko is now missing out on. The site: command isn’t the most accurate, but it’s the best I’ve got and should give some sense of how many pages Google decided to keep in its index. Google’s approach is to determine which pages on these sites deserve to rank for certain queries, while Blekko’s new approach is to ban them altogether.
After totaling up all the pages in Google’s index from those sites, I see that there are at least 322 million pages missing from Blekko’s search index. Of those, 23.2 million come from Demand Media’s eHow.com alone.
It’s hard to grasp how large a hole this creates in Blekko’s search index. Using the same method on Google’s index, I see that removing 322 million pages is the equivalent of removing two Wikipedias! Or, looking purely at Blekko, it is the equivalent of removing all pages from their index that include the word “because” or “since” (177 million results for each).
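To put those comparisons in numbers, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted in this post, all of which are rough site: or result-count estimates. The variable names and the `share` helper are my own, purely for illustration, and the Wikipedia figure is simply what the “two Wikipedias” comparison implies, not a count reported here.

```python
# Figures quoted in the post. These are search-engine estimates, so treat
# everything derived from them as approximate.
TOTAL_MISSING = 322_000_000        # pages Google lists across the 20 banned sites
EHOW_PAGES = 23_200_000            # eHow.com alone
BLEKKO_WORD_RESULTS = 177_000_000  # Blekko results for "because" (same for "since")


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# eHow's slice of the missing pages.
ehow_share = share(EHOW_PAGES, TOTAL_MISSING)

# Wikipedia page count implied by "two Wikipedias" (an inference, not a measurement).
implied_wikipedia = TOTAL_MISSING / 2

# How the hole compares with a single common-word result set on Blekko.
word_ratio = TOTAL_MISSING / BLEKKO_WORD_RESULTS

print(f"eHow share of missing pages: {ehow_share:.1f}%")
print(f"Implied Wikipedia page count: {implied_wikipedia:,.0f}")
print(f"Missing pages vs. one common-word result set: {word_ratio:.2f}x")
```

By these numbers, eHow accounts for roughly 7 percent of the missing pages, so the other 19 sites make up the bulk of the hole.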
Speaking of Wikipedia, isn’t that a user-generated content farm that Blekko might want to remove? And what about Yahoo Answers? What about eBay? Heck, you might as well remove Facebook and LinkedIn too.