Ergasiophobe's Error
25th May 2016
Some folk have said, with justification, that this article is dull. So as a TL;DR option I present this short explanation, added 20/9/2016.
Imagine a water pipe. Lots of water goes into it, but Ergasiophobe finds a report from a credible source on the web saying that 60% of the water is lost due to a leak in the pipe.
Wings Over Scotland is at the far end of the pipe with a bucket. After some time he reports that he has caught 10 buckets of water.
Ergasiophobe says, "No, you've only really caught 4 buckets of water, because 60% of the water gets lost due to the leak".
The 60% figure Ergasiophobe has found may be correct, but he is making a huge error by applying it to the wrong number. The leaked water never reached the bucket, so it has already been excluded from Wings' count. Even though 60% of the water leaked, Wings may still be perfectly honest about the number of buckets he caught.
I've been creating websites since the early 1990s, and it's been my day job now for almost two decades.
It's difficult to call yourself an 'expert' in web programming, as the breadth of knowledge required in the industry has grown at the same rate as the web itself and it's impossible to keep up with everything. However, I'm not embarrassed to say that I've picked up a fair bit of knowledge in a number of areas over the years.
One of those areas is web stats, or analytics. We used to talk about 'hits' when we discussed how much traffic a website received, but that was an oversimplification too far. In truth, it's impossible to give precise, definite figures for the number of people viewing your site.
Of course, it's very important for publishers to be able to give such figures: they want to tell advertisers what they're getting for their money. In fact, though, all publishers can only give guesstimates. They're guesstimates based on the best information available, but guesstimates nonetheless.
This is nothing new. Even in the days of print, a newspaper might boast of having a million 'readers', but how does it know? How many people read a single copy of a newspaper once it's been bought? How many buy it only to do the crossword or to line the bottom of the budgie cage?
Online, you can tell for certain how many times a document has been requested from your server, but it's impossible to determine whether every one of these requests came from a human being or a bot. Or whether the human being actually read the page, or whether three people did, all gathered round the screen. Or whether a single human requested the page on their phone's Twitter app, then opened it in the main browser app, then decided to read it on their PC instead.
So when publishers refer to 'online readership', as they often do, they're really referring to what web professionals call 'unique users'.
A unique user is the industry's best guess at the number of readers a website has. The analytics system a website uses takes all the data at its disposal to determine whether a request for a page comes from someone who hasn't read it already, using the IP address and what it knows about the requesting device to decide whether that person's visit has already been counted. But if you view an article on your work PC, then view it again on your tablet at home, you'll be counted as two 'unique users', even though in the real world you're only one 'reader'.
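To illustrate why one reader can become two 'unique users', here is a minimal Python sketch. It assumes a hypothetical analytics system that identifies a visitor by nothing more than IP address plus a device signature; real products combine more signals (cookies, client IDs and so on), and the class and field names here are made up for illustration, not any real product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    """One page request as a hypothetical analytics system might record it."""
    ip: str
    device: str  # hypothetical browser/OS signature
    page: str


def count_unique_users(requests: list[PageRequest]) -> int:
    """Count distinct (IP, device) pairs as 'unique users'.

    Simplifying assumption: IP + device is the only identity signal.
    """
    return len({(r.ip, r.device) for r in requests})


requests = [
    # One person reads the article on their work PC...
    PageRequest(ip="203.0.113.7", device="work-pc-chrome", page="/article"),
    # ...reloads it (same IP and device, so not counted again)...
    PageRequest(ip="203.0.113.7", device="work-pc-chrome", page="/article"),
    # ...then reads it again at home on a tablet.
    PageRequest(ip="198.51.100.23", device="home-tablet-safari", page="/article"),
]

print(count_unique_users(requests))  # 2 'unique users', 1 real reader
```

The reload is correctly ignored, but the switch to a different device and network looks like a brand new visitor, which is exactly the kind of over-count every publisher's figures share.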
Every publisher uses this data when they talk of their online reach, whether selling advertising or just boasting about how well they're doing. It's not an accurate number, but it is the industry-standard best guess, and it'd be ridiculous to single out one publisher and criticise them for using it.
Ridiculous or not, that's what the odd Ergasiophobe does in his article about Wings Over Scotland's web statistics. [Link is to an archive; a live copy can be found here.]
But there's a much more blatant error in his article.
Ergasiophobe likes to forensically take apart things which Wings Over Scotland has said and printed in his attempts to discredit him. You'd imagine that someone with that sort of mindset would be keen that his own writing was beyond reproach, but despite this error being pointed out to him, he has yet to correct his article.
Last night he tweeted this.
. @G4rve Your assertion that there is an error doesn't make it so. It just makes it an assertion based on your incomplete knowledge.
— Ergasiophobe (@Ergasiophobe) May 24, 2016
So let me explain at more length than Twitter allows.
A normal web page is simply a text file containing a number of links to other resources: the JPG image files your browser embeds in it, CSS stylesheets that set the colours and much more, videos, audio files, and an increasing number of JavaScript files. JavaScript tells your browser what to do with the page you've requested once it has loaded. For instance, it might create the behaviour that opens a big image in a box when you click a thumbnail.
JavaScript is also the mechanism used to trigger Google Analytics. By viewing this page (unless you've configured your browser not to) you've triggered a request for the file at this address:
http://www.google-analytics.com/ga.js
This JavaScript file then tells your browser to pass details to Google Analytics, which stores lots of anonymized data about you and counts you as an entry in my website stats.
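The key point is that a visit only reaches Google Analytics if the client fetches and runs that script. Here is a rough Python model of that, under stated assumptions: the page paths are made up, only the ga.js URL comes from above, and "fetching ga.js" stands in for the separate tracking request the script really sends once it runs.

```python
# Hypothetical page: the HTML document plus the resources it links to.
# The ga.js URL is the one quoted above; the other paths are made up.
GA_SCRIPT = "http://www.google-analytics.com/ga.js"
PAGE_HTML = "/article.html"
EMBEDDED_RESOURCES = ["/style.css", "/photo.jpg", GA_SCRIPT]


def requests_made(runs_page: bool) -> list[str]:
    """URLs a client requests when it visits the page.

    A browser fetches the HTML, then every resource it links to, and runs
    the JavaScript. A typical bot downloads the HTML and stops there.
    """
    urls = [PAGE_HTML]
    if runs_page:
        urls.extend(EMBEDDED_RESOURCES)
    return urls


def counted_by_analytics(urls: list[str]) -> bool:
    """Simplification: treat fetching ga.js as the visit being counted.

    In reality the script, once running, sends a separate tracking request
    to Google Analytics; if the script is never fetched, that never happens.
    """
    return GA_SCRIPT in urls


browser_visit = requests_made(runs_page=True)
bot_visit = requests_made(runs_page=False)

print(counted_by_analytics(browser_visit))  # True
print(counted_by_analytics(bot_visit))      # False
```

Both clients show up in the web server's own logs, because both request the HTML, but only the browser ever appears in the Google Analytics figures.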
This is how Wings Over Scotland's stats are counted too.
In his article, Ergasiophobe says Wings claims "just under 300,000 unique visitors or 'readers'" in October 2015, and goes on to say that is "very impressive".
However, he then works to whittle those numbers down, which is after all the point of his article. He has done a little research, and has found this article by Incapsula which states that 62.3% of traffic to sites like Wings is generated by automated bots.
So Ergasiophobe subtracts 62.3% of Wings' reported unique users to give a figure 'exponentially less' [sic] of 95,400.
He does a little more whittling further on in the article, but its main basis is this calculation, and it's completely wrong.
It's wrong because 90% or more of automated bots do not trigger Google Analytics in the first place. When someone creates a bot, it's for a purpose: maybe to harvest email addresses from websites, maybe to probe for vulnerabilities. Most bots simply download a page's HTML and never run the JavaScript it refers to, and it makes no sense at all for a bot to request a file from Google Analytics.
So Wings' figure of around 300,000 (which, remember, comes from Google Analytics and is only counted when the Google Analytics script is requested and run) already excludes almost all automated bot traffic, and there is no need to apply the 62.3% reduction.
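To see why the reduction shouldn't be applied twice, here's a rough Python sketch with made-up numbers. The 1,000,000 server requests are hypothetical, and the sketch assumes for simplicity that Google Analytics counts no bots at all (the point above is that it counts very few); only the 62.3% figure comes from the Incapsula article.

```python
# Incapsula's estimate (quoted above) of the share of traffic that is bots.
BOT_SHARE = 0.623


def remove_bots(count: float, bot_share: float) -> float:
    """What's left of a count once a given share of it is removed as bots."""
    return count * (1 - bot_share)


# Hypothetical raw traffic seen by the web server, bots included.
server_requests = 1_000_000

# Correct: bots are removed once. Simplifying assumption: Google Analytics
# sees only the humans, so its count already equals this figure.
analytics_count = remove_bots(server_requests, BOT_SHARE)

# The error: removing the bot share again from a count with no bots in it.
double_discounted = remove_bots(analytics_count, BOT_SHARE)

print(round(analytics_count))    # 377000
print(round(double_discounted))  # 142129
```

The second number throws away 62.3% of real human readers, which is the bucket-counting mistake from the top of this article in numerical form.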
Ergasiophobe compounds the same error later in the article, attributing some of Wings' traffic to DDoS (Distributed Denial of Service) attacks. Such an attack is an attempt to overwhelm a website by sending it more requests than it can handle. But you'd have to be the world's stupidest DDoS programmer to set your system up to make requests to Google Analytics, halving your botnet's efficiency in one fell swoop.
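The "halving" is simple arithmetic. A minimal Python sketch, assuming a botnet with a fixed request budget and a hypothetical, badly written bot that requests Google Analytics once for every request it sends to the target:

```python
def share_reaching_target(target_requests: int, analytics_requests: int) -> float:
    """Fraction of a botnet's requests that actually hit the site under attack.

    Assumes a fixed request budget split between the target site and
    Google Analytics (a hypothetical, badly written bot).
    """
    return target_requests / (target_requests + analytics_requests)


# One Google Analytics request for every request to the target.
print(share_reaching_target(1, 1))  # 0.5
# A sensible bot sends nothing to Google Analytics.
print(share_reaching_target(1, 0))  # 1.0
```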
So, Wings' figures are as accurate as anyone else's, which is to say a good guess.
As for Ergasiophobe, next time he tweets about a factual inaccuracy he believes he's found in Wings Over Scotland, ask him to put his own house in order.