Statistics is not only for junkies. It is used by businesses, webmasters, the mobile industry and practically everyone who has access to some sort of data, inventory or anything else, tangible or intangible. However, statistics in general has been biased towards the data one agency happens to hold, and does not reflect reality as a whole. I cannot think of an industry where statistics are not based on speculation or on what one company has to say.
In the webmaster's world, Alexa is the basis of ranking in many cases. However, Alexa ranking is highly biased. A site that receives 300K page views per month can be ranked higher than a site that receives 2 million visits. How is this possible? Because Alexa judges the ranking using its own data, collected through its toolbar and other hidden methods. If people visiting your website don't have the toolbar installed, your website will be ranked lower.
This is not just a problem; it is a very big problem, because advertisers use this data to judge your site. And it is ironic, because the ranking does not reflect reality in any way.
Today, Hitwise, an analytics company, says that 1 in 4 users in the US are referred through Facebook. This data is not wrong at all, but the bigger question is how many US users Hitwise actually tracks. Is it 1 in 4, or is it 1 in 100? Though these companies make their data public, it is really hard to sit down and reconcile data from hundreds of sources to put two and two together.
This is why statistics is a funny business. One company will walk up and say Facebook is doing well. Then a second will come in and say Google is doing well. Whom should we believe? People have to understand that this data comes from individual companies that do not have access to more than 10% of the entire data (yes, I made up that 10% myself) and in no way represents what is actually happening in the US or anywhere else.
So What Is the Big Problem?
The big problem in every industry, including television, is that no one has the right to ascertain who visits which website or watches which television show. Nor can a central agency step in and demand this data. This is, of course, done to protect users' privacy. But several individual services and agencies track views across websites. One of the things these services track is referrers, which is where this type of data comes from.
So, assuming a company tracks 100 million users a month, it might determine that 30 million of them came from the US and that 10 million of those came from Facebook, and then report that 1 in 3 US users came from Facebook. There is nothing wrong with that data; it is simply based on the data that company has, not on the entire US internet traffic. So saying that 1 in 3 visits came from Facebook is true for that company. However, from a purely statistical view, that figure is speculation at best when you take the entire traffic into consideration.
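A minimal sketch of the arithmetic in the example above, in Python. The function name `referral_share` and the second company's numbers are hypothetical, made up for illustration. The point is that both counts come from the same company's panel, so the ratio describes that panel only and says nothing about traffic the panel never sees.

```python
from fractions import Fraction


def referral_share(referred_users: int, region_users: int) -> Fraction:
    """Share of a region's *tracked* users who arrived via one referrer.

    Both counts come from the same company's panel, so the result
    describes that panel only, not the region's whole traffic.
    """
    if region_users <= 0:
        raise ValueError("region_users must be positive")
    return Fraction(referred_users, region_users)


# Company A: the figures from the example in the text.
a_us_users = 30_000_000        # tracked users from the US
a_us_facebook = 10_000_000     # of those, referred by Facebook

# Company B: hypothetical numbers from a different, smaller panel.
b_us_users = 8_000_000
b_us_facebook = 2_000_000

print(f"Company A: {referral_share(a_us_facebook, a_us_users)} of its US users came from Facebook")
print(f"Company B: {referral_share(b_us_facebook, b_us_users)} of its US users came from Facebook")
```

Both results are "true" for their own panel (1/3 and 1/4), yet they disagree, and neither says what share of *all* US traffic Facebook referred.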
Imagine 10 companies each run a survey to determine something. Each company can only reach 100 people. Each company will have something different to report, but that does not mean each of them is right. However, when you take the data from all these companies and measure it collectively, you get a true measure of what all 1,000 of those people wanted.
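The pooling idea can be sketched in a few lines of Python. The yes-counts below are hypothetical, invented only to illustrate the survey thought experiment, and `pooled_proportion` is a made-up helper name. Pooling adds up the raw counts from every company before dividing, rather than averaging each company's percentage.

```python
def pooled_proportion(results: list[tuple[int, int]]) -> float:
    """Combine (yes_count, sample_size) pairs from several surveys into
    one proportion over everyone surveyed."""
    total_yes = sum(yes for yes, _ in results)
    total_n = sum(n for _, n in results)
    if total_n == 0:
        raise ValueError("no respondents")
    return total_yes / total_n


# Hypothetical yes-counts: 10 companies, 100 respondents each.
yes_counts = [62, 41, 55, 70, 38, 49, 58, 44, 66, 52]
surveys = [(yes, 100) for yes in yes_counts]

# Each company reports a different number (38% to 70%) ...
for i, (yes, n) in enumerate(surveys, start=1):
    print(f"company {i}: {yes / n:.0%}")

# ... but the pooled figure covers all 1,000 people at once.
total_people = sum(n for _, n in surveys)
print(f"pooled over {total_people} people: {pooled_proportion(surveys):.1%}")
```

Summing counts instead of averaging percentages matters once companies have different sample sizes: a company that surveyed 10,000 people should carry more weight than one that surveyed 100.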
What Is the Second Big Problem?
The second big problem is that none of the big companies, including Google, Facebook and Twitter, actually make their traffic numbers public. So everything is left to speculation. Considering how many companies run analytics, we will always see different results from different companies. If this is not a joke, what is?
Granted, we have to use some sort of data to judge the popularity of websites, and this has been done successfully for years. But the way we determine it is always based on partial data accumulated by one company or another, which may be only a fraction of the entire industry and will never lead to a fair conclusion.
If we really need to determine who rules this industry, let's base the data on one thing or nothing at all. There is no point in getting up every morning to see one stats company say that this site is the biggest, and then the next morning reading some other site saying that a different one is the biggest.
Let's have these statistics companies contribute basic data to a central agency or source, and then figure it out by adding two and two. Let's turn this joke into some sort of reality where we can truly measure something, rather than having conflicting reports come out every day.
(Image Credit: http://www.stat.columbia.edu)