Finally: Google Wants Your Help To Identify Scraper Sites And Further Improve Its Algorithm

Scraper sites outranking the source in search results is not a new story.

This has been one of the biggest problems with Google search, and it appears that no algorithmic update has been able to filter it out yet.

Meanwhile, Panda arrived.

If a site has a high volume of low-quality content, let's hammer down the entire tree in one stroke. Move from the page level to the domain level: that is the unique theory behind Google's infamous Panda algorithmic update.

Honestly, I am not sure how effective Panda is, but I can say without any doubt that this animal is color-blind and sees things only in black and white. Why the hell, then, are scrapers outranking my page for content I have written? Okay, I agree that my site may be low quality, but this scraper site is a piece of junk and it should NEVER BE RANKED, for Christ's sake.

Before moving on, you may want to take a peek at the following video, where I have shown how scraper sites can beat authority sites that have good backlinks, social mentions, brand awareness and everything else.

I know.

This doesn't make any sense at all, but this is the way it has always been. Panda made things worse! (Only the wearer knows where the shoe pinches.)

Here is another typical scenario, where a number of spam sites are outranking a blog post by Matt Cutts, the guy who leads the webspam team at Google.

Algorithmic Changes For Detecting Scraper Sites

Until now, no special attention has been given to detecting scrapers and imposters who blindly rip off content from RSS feeds. There is only one universal algorithm, which makes no exceptions and works on its own for every website on earth, including scrapers, imposters and blogs running on autopilot.

But things are soon going to change.

Matt Cutts just tweeted about a form in which Google asks webmasters to help by submitting data about scraper sites. The form reads:

Google is testing algorithmic changes for scraper sites (especially blog scrapers). We are asking for examples, and may use data you submit to test and improve our algorithms


This is not to be confused with the webspam report in Google Webmaster Tools or the public DMCA form for copyright infringement. The data you submit will be used for testing, and the observations will be used to improve Google's algorithm going forward.

Long story short: Google wants to find a pattern that can uniquely identify scraper sites. Millions of websites are born every single day, and it is impossible to scan the whole web manually at every instant. There has to be a formula that automatically detects scrapers and keeps them at bay.


And to devise this algorithmic improvement, Google engineers need data. The more data and real examples they have, the more precisely they can improve their algorithm.

Does that mean Panda is, in a way, incompetent? A scraper site is a low-quality site, and it has been six months since Panda was released from her cage. If Panda worked the way they thought it would, everything would have been fine by now.

Why are Google engineers requesting data from webmasters around the globe?

Why now?

Refinement? Further improvements? Revisions? Iterations? Trial and error? You name it.

I am just wondering how the web is going to explode if this new update falls flat on its face, the way Panda did when it changed the fate of thousands of webmasters.


Published by

Amit Banerjee

Amit has been writing for Techie Buzz since early 2009 and keeps a close eye on web apps, Google and all things tech. He also writes at his own tech blog, Ampercent. Follow him on Twitter @amit_banerjee.