So, many thanks to the thousands of people who have now completed my Estimation Quiz. Special thanks to Michael Williams, who posted a link to del.icio.us, Dave Weeden, Chris Bertram of Crooked Timber, Nick Barlow, Chris Brooke, and many others for linking to the site, including a user of Metafilter -- which link drove most of the traffic. (I was expecting to have to wait months for enough people to complete the thing for the results to be interesting, as with my Political Survey. Instead it took two days. That should probably tell you as much as you need to know about my ability at estimating things.)
So, when I posted the link to the quiz, I said that I had an ulterior motive for building the thing. Michael Williams speculated that my purpose was to,
do something terrifying with the data.
I'm not sure whether the below will actually terrify you, but I'll try my best. (There's quite a lot to say, and some will have to follow tomorrow.)
For those who didn't do the quiz, I'll quote from the description:
How far is it from Edinburgh to Cardiff? When did the English Civil War break out? How long does light from the sun take to reach the Earth? You probably have some idea of the answers to questions like these -- or you could make a guess. But do you know when your guesses are right, and when they're wildly off?
This is a general knowledge quiz which tests you on how well you can answer questions like these -- and whether you know how good your guesses are.
For each question, you will give an answer in the form,
a ± b
a should be your best guess at the answer. b is your idea of roughly how far off your guess might be. If you're absolutely sure of the answer, you can tick ``this is the exact answer''; but if you do, and you are wrong, your score will suffer.
You get points for how good your guess of a is, and whether b was an honest estimate of how wrong you were.
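The scoring rule the quiz actually used isn't spelled out here. For comparison, one standard way to reward both a good a and an honest b is the logarithmic score: the log of the stated normal density evaluated at the true value. This is a sketch with a hypothetical function name, not the quiz's own rule, and it only handles answers with b > 0 (how to score a ticked ``exact answer'' is a separate design choice).

```python
import math

def log_score(truth, a, b):
    """Hypothetical scoring rule: log of the N(a, b^2) density at the true value.

    Higher is better. It rewards a close to the truth, and penalises b that is
    either too large (vague) or too small (overconfident)."""
    if b <= 0:
        raise ValueError("this sketch only scores answers with b > 0")
    z = (truth - a) / b
    return -0.5 * z * z - math.log(b * math.sqrt(2 * math.pi))

# Three answers for the English Civil War (1642), all guessing 1640:
for b in (0.5, 2, 20):
    print(b, round(log_score(1642, 1640, b), 3))
```

For a fixed error e = |truth - a|, this score peaks at b = e, so overconfident and overly vague error bars both cost points.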
The quiz asks for estimates of thirty-one quantities. Most are straight general knowledge questions, for instance,
Others require more specialised knowledge, such as,
And some ask for things which few people are likely to know, but which are very easy to estimate, for instance:
(I hadn't realised that the term `carrier bag' isn't understood to mean a disposable plastic shopping bag outside the UK. I adjusted the wording of the question when I discovered people asking, ``what's a carrier bag?'' In fact the quiz as a whole was rather Anglocentric, basically because I expected it to be answered by this web log's half-dozen readers -- mostly in Britain -- and their friends. The results below incorporate data from about 3,000 responses.)
Note that some of the quantities -- like the three astronomical quantities above -- vary or aren't actually known exactly. More on this later.
So, the first question you might ask is, ``are people actually any good at estimating things?'' The answer is that... it depends.
For some quantities -- especially ones which some respondents actually do know exactly -- the crowd's wisdom isn't bad. For instance, asked to estimate the latitude of London (51.5°N, but ±0.1° or so because London's quite big), the results look like this:
What I've done in this and the plots below is to combine the results from every answer to the question. Every answer a ± b is treated as specifying a normal distribution having mean a and standard deviation b; the combined distribution is the sum of all those distributions, divided by the number of responses. Answers which are exact (i.e., with b = 0) appear as single, thin spikes on this plot, like those at 0°, 80°, 90° and 100°. The red curve tells you, roughly, `how probable is this value for the latitude, according to the combined opinion of the respondents?' The blue curve is the corresponding cumulative distribution; it tells you `what fraction of the respondents think that the latitude is smaller than this value?'
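A minimal Python sketch of that combination, assuming each answer is stored as a pair (a, b) and that exact answers (b = 0) are treated as point masses. The function names are hypothetical, and this is not the code actually used to draw the plots.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def combined_density(x, answers):
    """Average of the respondents' normal densities at x (the red curve).

    Exact answers (b == 0) are point masses: they have no finite density,
    so they are skipped here and would be drawn separately as spikes."""
    total = sum(normal_pdf(x, a, b) for a, b in answers if b > 0)
    return total / len(answers)

def combined_cdf(x, answers):
    """Fraction of the respondents' probability at or below x (the blue curve)."""
    total = 0.0
    for a, b in answers:
        if b > 0:
            total += normal_cdf(x, a, b)
        else:
            total += 1.0 if a <= x else 0.0  # exact answer: a step
    return total / len(answers)

# Hypothetical answers for the latitude of London, including one exact 0°:
example_answers = [(51.5, 0.5), (52.0, 2.0), (0.0, 0.0), (50.0, 5.0)]
print(combined_cdf(0.0, example_answers))  # about 0.25: only the exact 0° lies at or below 0
```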
So, the peak of the distribution -- the mode, the most frequent value cited by the respondents -- the middle (median) value, and the mean, all lie close to the correct answer; and the distribution is quite strongly peaked -- for comparison, the black curve shows the single normal distribution having the same mean and variance as the red curve. This obviously isn't an efficient way to find out the latitude of London -- to design the quiz, I looked it up in Google just like anyone else would (and just like some of the respondents no doubt did, despite strictures against cheating in the rubric) -- but at least the technique works.
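The black comparison curve is a single normal with the same mean and variance as the combined distribution. For an equal-weight mixture of normals those moments follow directly from the (a, b) pairs: the mean is the average of the a values, and the variance is the average of b² + a² minus the square of that mean. A short sketch, with hypothetical function names:

```python
import math

def mixture_moments(answers):
    """Mean and variance of the equal-weight mixture of N(a, b^2) answers."""
    n = len(answers)
    mean = sum(a for a, _ in answers) / n
    second_moment = sum(b * b + a * a for a, b in answers) / n
    return mean, second_moment - mean ** 2

def moment_matched_pdf(x, answers):
    """Density of the single normal with the mixture's mean and variance."""
    mean, variance = mixture_moments(answers)
    sd = math.sqrt(variance)
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
```

Exact answers (b = 0) still contribute their a² to the second moment, which is why a few wild spikes can widen the black curve a great deal.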
Note also that about 3% of people thought that the latitude of London was 0° -- suggesting that they're confusing latitude and longitude -- and that about 8% of them thought that the latitude was more than 90°N. Well, they're in good company. Even some trained economists don't understand what latitude is.
Similarly, despite attempts by the gutter press to incite mass hysteria about immigration and asylum, some respondents had a decent idea of how much money ``scrounging'' asylum seekers receive in benefit: £37.77 per week:
but many others did not: more than 80% overestimate the amount, with about 50% believing that asylum seekers receive £100 per week or more, and 16% believing that they receive more than £300 per week, an overestimate by a factor of roughly eight or more. (Note, of course, that these results do not come from anything which resembles a representative sample, especially not a representative sample of the UK population. Nevertheless I was shocked that about 3% think asylum seekers receive more than £1,000 per week, though clearly some of these, like whoever answered `2345' or `323232', were taking the piss.)
One thing you might ponder at this point is whether asking people to estimate their uncertainties actually makes any difference. Here's a version of the above plot, with an added purple curve -- the empirical cumulative distribution of the answers, ignoring uncertainties -- and a brown curve, giving a smoothed (`kernel density') approximation to the distribution of answers:
While the two distributions are fairly similar, ignoring the uncertainty information clearly decreases the accuracy of the estimate. (Note also how the brown distribution is peaked at round numbers; this isn't true of the distribution incorporating uncertainty information, because most people who pick £50 or £100 or whatever obviously know that they're guessing, and put in sensible error bounds.)
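For the comparison curves that ignore uncertainties, here is a sketch of an empirical CDF and a Gaussian kernel density estimate computed from the best guesses a alone. The post doesn't say which kernel or bandwidth was used; Silverman's rule of thumb below is an assumption, and the function names are hypothetical.

```python
import math
import statistics

def empirical_cdf(x, guesses):
    """Fraction of best guesses at or below x, ignoring each respondent's b."""
    return sum(1 for g in guesses if g <= x) / len(guesses)

def silverman_bandwidth(guesses):
    """Rule-of-thumb bandwidth; an assumed choice, not necessarily the author's."""
    return 1.06 * statistics.stdev(guesses) * len(guesses) ** (-1 / 5)

def gaussian_kde(x, guesses, bandwidth=None):
    """Smoothed density of the best guesses: one Gaussian bump per guess."""
    if bandwidth is None:
        bandwidth = silverman_bandwidth(guesses)
    n = len(guesses)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for g in guesses)
```

Because every guess gets the same bump regardless of how uncertain its author said they were, repeated round numbers such as £100 pile up into peaks, which is the effect described above.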
The crowd turns out to be pretty decent at guessing dates. For instance, asked to identify the start of the English Civil War (1642), they came up with,
Note that, as Sellar and Yeatman pointed out a long time ago, for many people 1066 is the only memorable date in English history, and they're prepared to state it without uncertainty as the date of any significant event (many people gave 1066 as the date of the 1707 Act of Union, too). I don't know whether the same is true of 1861 for Americans, or whether that was a misunderstanding about precisely which Civil War was in question here. Asked to identify the date of the first space flight by a woman (1963), respondents suffered the same problem:
Here other popular choices -- memorable years in space history, so to speak -- included 1961, the year of the first human spaceflight, and 1986, the year of the `Challenger' space shuttle accident which killed seven astronauts. Given this I was slightly surprised that 1969 (the year of the first moon landing) wasn't a more popular choice.
(The results for the question on the height of the Eiffel Tower had a peak at 1,789 feet. This suggests the following splendid thought process: `the thing was built by the French to celebrate something French; that can only be the French Revolution; that happened in 1789; so I must be expected to know that the thing was built to be 1,789 feet high'. It's a nice idea -- full marks for imagination -- but sadly (a) the technology of the time wasn't up to building a 1,789-foot-high tower, and (b) it would have had to be a round number in metres, you parochial bastards! Many others said that the tower was 300 feet high, suggesting a units confusion which may have been partly my fault. On another question, asked to give the time taken for light from the Sun to reach the Earth, this `nice round number' effect led many people to state with absolute certainty that the time was some integer number of minutes.)
Moving on, it's hard to describe the `crowd's' response to other questions as anything like `wise'. `Haphazard' is nearer the mark. People know the distance from the earth to the moon surprisingly well, but haven't a clue about the length of the Nile and very little idea of the distance from Edinburgh to Cardiff. They have no idea at all about the GDP of the UK:
the most popular answer being about a tenth of the true value. The question asking for the number of words in Pride and Prejudice -- I was going to ask about `a typical novel', but of course there's no such thing, so I had to pick one everyone would have heard of -- left the crowd totally stumped:
with about two-thirds underestimating the length of the novel and many believing that it's only ten thousand words long. The shape of these we-haven't-a-clue distributions seems to be pretty characteristic; in another example, asked to estimate the maximum take-off weight of a 747-400 airliner, we get the following:
Now, I suspect that some of the people who estimated ten tonnes (the weight of enough fuel for an hour's flight, in the ~230-tonne 'plane) thought the question was asking for the weight of the passengers -- this is one of several cases, like the plastic bag one, where I didn't word the questions as clearly as I should have; but even the people who got the order of magnitude about right don't seem to have a very good idea of what they're grasping for. I've plotted the curve for Benford's law (which gives the frequency of leading digits of numbers drawn from a scale-invariant distribution) for 100, 200, 300, ... tonnes but I'm not sure this really applies here.
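Benford's law gives the probability that a number's leading digit is d as log10(1 + 1/d). Applied to answers of 100, 200, 300, ... tonnes, it indicates how often each leading digit would turn up if the answers were drawn from a scale-invariant distribution. A sketch, with hypothetical names; how the curve was normalised on the plot isn't stated.

```python
import math

def benford_probability(d):
    """Probability that the leading digit is d under Benford's law."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Relative weight Benford assigns to answers of 100, 200, ..., 900 tonnes.
weights = {100 * d: benford_probability(d) for d in range(1, 10)}

for tonnes, p in weights.items():
    print(f"{tonnes:>4} t: {p:.3f}")
```

The probabilities telescope to log10(10) = 1, and a leading 1 is about six times as likely as a leading 9.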
Straight estimation questions show the same pattern. (Many of the questions above can be answered by plugging in plausible numbers, but they are quantities which you could reasonably know; the next two are things few people will know, but which are easy to estimate.) Asking for the number of petrol stations in the UK (about 12,000) gave this:
-- the mode is 1,000 (a quantity which would leave each station to be shared by 58,000 people). An even easier question -- how many plastic shopping bags are used every year in Australia (~20 million Australians buying something like one item in a carrier bag every day gives about 7 billion bags per year) -- left respondents completely adrift:
-- the mean is out by about a factor of ten here.
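The back-of-envelope arithmetic behind these two questions, as a sketch. The UK population of about 58 million is inferred from the `58,000 people per station' figure above, and one bag per person per day is the rough assumption already stated; the variable names are illustrative.

```python
UK_POPULATION = 58_000_000   # inferred: 1,000 stations x 58,000 people
TRUE_STATIONS = 12_000       # approximate figure from the text
MODAL_GUESS = 1_000          # most popular answer

people_per_station_modal = UK_POPULATION / MODAL_GUESS   # 58,000
people_per_station_true = UK_POPULATION / TRUE_STATIONS  # roughly 4,800

AUSTRALIANS = 20_000_000
BAGS_PER_PERSON_PER_DAY = 1  # rough assumption from the text
DAYS_PER_YEAR = 365
bags_per_year = AUSTRALIANS * BAGS_PER_PERSON_PER_DAY * DAYS_PER_YEAR  # 7.3 billion

print(f"{people_per_station_modal:,.0f} people per station at the modal guess")
print(f"{people_per_station_true:,.0f} people per station at the true figure")
print(f"{bags_per_year / 1e9:.1f} billion bags per year")
```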
So, that concludes today's foray through slightly eccentric statistics. I'll leave the last few bits and a summary for tomorrow (hopefully), including some comments on the scoring of the quiz, which many people (quite rightly) thought was rather silly.
Not much to say here, but I'm going to point you all at my Estimation Quiz, which might entertain some of you. You should bear in mind that it is quite difficult -- even after having compiled the questions, I don't usually score much more than 80% on it. (That may just tell you about my memory, of course.) Other guinea-pigs did rather worse.
Comments and questions appreciated. Feel free to link to it, if you want.
(I should say that there is an ulterior motive for building this quiz. But I'll tell you about that later. Also, sorry for not writing a more exciting piece. Later, I hope.)
So, long time no 'blog. Sorry about that (and isn't ``'blog'' an ugly word?). Meanwhile, lots of interesting things have happened, but instead I thought I'd write something about this article by Oliver Kamm, some of whose rants are now published in The Times (`a British tabloid', as I would say if I were obeying the Economist's style guide; I've linked to the piece on Oliver's site, rather than on the newspaper's, since The Times doesn't have a proper archive any more).
Oliver has decided to criticise the comments of the Information Commissioner, Richard Thomas, who the other day -- in an interview with the very same Times -- stated that,
My anxiety is that we don't sleepwalk into a surveillance society where much more information is collected about people, accessible to far more people shared across many more boundaries than British society would feel comfortable with.
and pointed out how dangerous this could be:
Some of my counterparts in Eastern Europe, in Spain, have experienced in the last century what can happen when government gets too powerful and has too much information on citizens. When everyone knows everything about everybody else and the Government has got massive files, whether manual or computerised.
These are fair and reasonable concerns, as has been pointed out here before, most notably by Chris Williams. Roughly speaking, the usual reasons for a country to get (for instance) an ID card scheme are one or more of (a) having been conquered by the Germans; (b) having been conquered by the Russians; (c) having been subjugated by a home-grown autocratic government. Richard Thomas points out that the experience of countries which have suffered those unhappy fates should inform our own.
Oliver, of course, thinks that Richard Thomas is being silly, chiefly it seems because the text of his interview contained a grammatical error:
``I don't want to start talking paranoia language,'' said Mr. Thomas, his indifference between noun and adjective serving as a cipher for his wider confusions....
Now, Oliver Kamm is by his own admission an objective commentator, so we can be certain that he has checked that these were the actual words of the Information Commissioner and that a more grammatical statement (for instance, ``... talking the language of paranoia...'') had not been rearranged by a subeditor or the journalist who wrote the piece. In any case I am sure that The Times never makes such typographical mistakes.
Anyway, he doesn't really explain what he thinks that Thomas's `confusions' are (so far as I understand it he seems to think that anybody who makes a comparison between what goes on in happy democratic Britain and a bad thing that happened in another country is bad evil and wrong). Instead Oliver goes on to make the usual unthinking defence of the schemes -- the National Identity Register, the ID card, a separate population register and the Children Act database -- that Thomas warns against. He writes,
How much information a democracy should amass on its citizens is plainly important. When we know that some British citizens support terrorist groups, then the balance between personal liberty and national security may need to be reassessed. There is a plausible case that better information would reduce the State's intrusiveness for the peaceable, while circumscribing the activities of the malevolent. More widely, as governments have duties beyond public order and national security, welfare, for example, they require accurate records of earnings and employment.
The last part especially suggests that he doesn't actually know anything about the schemes which Thomas has criticised. The National Identity Register won't contain information on earnings and employment; indeed, that information is already held by the Inland Revenue. The arguments made for the `nothing to hide' position may be `plausible' for Kamm, but it's rare to see them actually articulated, and Oliver of course doesn't bother.
This is actually an example of a wider problem with Kamm's style which this piece nicely illustrates. In The Times (though not, oddly, on his web log), his piece was subtitled,
Anxiety about data protection diverts the West from the genuine threat.
which is a fair summary and will save you the bother of reading it. It's pretty bog-standard Kamm stuff -- complete with slightly obscure cultural reference, portentous tone, grammatical sniping, the inevitable bitching about a Liberal Democrat, and a rhetorical attack on Soviet communism, only fifteen years too late -- and in sum is pretty content-free. He finishes up with a return to his conversational strange attractor -- the `genuine threat' of the subtitle, which is, of course....
... terrorists and bloody ``weapons of mass destruction'' again: (emphasis mine)
The forces of theocratic totalitarianism aim at the destruction of Western civilisation and its replacement by a restored Caliphate. Armed with technologies that they must never secure, they could in principle inflict grievous harm on us and our way of life. The more our public servants talk of totalitarianism without really meaning it, the less serious will that threat be taken. That really would be, as the Information Commissioner put it, ``a danger, yes''.
-- though, rather than talking about the vague and meaningless category of `weapons of mass destruction', he's now talking even more vaguely about completely unnamed `technologies'. A long time ago Scott Adams wrote, in Dilbert,
Stupidity is like nuclear power: it can be used for good or evil.
-- and this, coincidentally, is about all that chemical, biological and nuclear weapons have in common. As I've said before, anybody who talks seriously about `weapons of mass destruction' without saying whether they are talking about nuclear, chemical or biological weapons either doesn't know what they're talking about, or is trying to mislead. As for his strange attractor, Oliver makes no case for his idea that -- completely reasonable -- concerns about data protection are in any way distracting us from concerns about terrorists and the vague `technologies' about which he is so worried but too lazy to name. It's hard to see how he could without saying anything specific about either subject, which would be an unimaginable break from his usual style.
In other news, I see that the Police have now charged some of the alleged terrorists they arrested last week with various offences, including the marvellously-named,
conspiracy to commit a public nuisance by using radioactive material, toxic gas, chemicals or explosives,
-- yes, I suppose that would be a nuisance (though note how the `WMD' idiocy has burrowed its way into the law) -- and the completely ludicrous,
possession of information of a kind likely to be useful to a person committing or preparing an act of terrorism,
because, apparently, they had `a reconnaissance plan' (presumably Police for `a map') of some financial institutions in New York, and `an extract of the Terrorist's Handbook'. (I had a look on Amazon but, to my surprise, couldn't find a copy of the Terrorist's Handbook. Perhaps they meant the Anarchist's Cookbook -- let's hope that's the kind of material on which the terrorists are relying!)
For those who haven't seen this latest piece of legal lunacy, the offence of `possessing information...' is defined in s.58 of the Terrorism Act 2000:
(1) A person commits an offence if--
(a) he collects or makes a record of information of a kind likely to be useful to a person committing or preparing an act of terrorism, or
(b) he possesses a document or record containing information of that kind.
(2) In this section "record" includes a photographic or electronic record.
What, one might wonder, is ``information of a kind likely to be useful to a person committing or preparing an act of terrorism''? So far as I can tell, the answer is, `almost anything'. For instance, the other day, the Government sent me a paper copy of their booklet, Preparing for Emergencies. (Coincidentally, the leaflet arrived on my birthday. I was jolly chuffed that the Government had given me the gift of safety from terrorists, I can tell you.)
Much of the information in this booklet is ``of a kind likely to be useful to a person committing or preparing an act of terrorism.'' For instance, it would tell the prospective terrorist how people have been trained to behave when a bomb goes off, which would be useful information to a terrorist who wanted to know where to plant a second bomb. Of course, now I've told you that, this page is information ``of a kind likely to be useful to a person committing or preparing an act of terrorism.'' Happily, there is a get-out clause for the person caught by the Police in possession of a copy of Preparing for Emergencies:
(3) It is a defence for a person charged with an offence under this section to prove that he had a reasonable excuse for his action or possession.
-- you have to prove to the authorities that you had a `reasonable excuse'. This is, suffice it to say, completely nuts. How on earth can you prove anything about why you have information in your possession?
Anyway, it's promising that the Police have actually charged the latest bunch of alleged terrorists with something. Presumably this means that they're all British citizens, since otherwise they could just be locked up in Belmarsh indefinitely, saving the expense and bother of a trial. It will be even more interesting if there is any actual evidence against them. I'm guessing that Oliver didn't know about this in advance, but anyway we should admire his luck in talking about the `technologies that [the terrorists] must never secure' at such a propitious moment....
Both Anthony and Francis argued, in reference to my comments on Tory `bed blockers' and the claims made about them, that rather than analysing the data in terms of the age of the individual MPs, I should have used the length of their service in Parliament. In fact, this was what I had originally intended to do, but I couldn't find a good reference which collated the lengths of service of the various MPs, so I assumed that their age would be a reasonable proxy. Since I wrote that, Anthony has very kindly supplied me with a table of the relevant data, for which many thanks are due.
It turns out that (as you'd expect), age and length of service of Tory MPs are pretty well-correlated: (this and later plots ignore various special cases -- speakers, ill MPs and outliers)
but treating the data in terms of length of service does change the results slightly. While there are no significant correlations between MP ages and their performance on the indicators for which I have data, there are with length of service. In particular, both the number of written questions asked and attendance at divisions are inversely correlated with length of service -- that is, MPs who've been in the Commons longer ask fewer questions and vote less often:
(Slightly surprisingly, although the plot -- not shown -- looks convincing, it turns out that there's no significant variation of Fax Your MP response rate with length of service.)
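Which test was used to judge significance isn't stated. As one illustration, here is Pearson's correlation coefficient together with a simple two-sided permutation test; both the choice of test and the function names are assumptions, and no real MP data is included.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided p-value: how often shuffling ys gives |r| at least as large
    as the observed |r|. The +1 terms count the observed arrangement itself."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

With xs as years of service and ys as, say, written questions asked, a negative r with a small p-value would correspond to the `inversely correlated' result above.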
There's not a whole lot to say about this, really. It makes the Torygraph's story a little more plausible, though I'd be cautious about broad-brush use of these indicators. Anthony suggested in email that I see whether it makes any difference whether an MP has served several disconnected stretches or one continuous term; from what I've looked at, it doesn't. If you want a causal theory to inform these vague statistical stumblings, it would be as well to start with the theory (expounded, I think, in Yes, Minister, though I can't find the quote at the moment) that a typical government consists of three-hundred-and-fifty-odd MPs. About a hundred will be too old to be Ministers; about a hundred will be too young; this leaves about a hundred and fifty to fill about a hundred Ministerial posts.
Some of these will be too useless for the job, leaving the government with rather little choice in matters. As MPs age, they risk passing from `eligible' to `elderly', or fucking up badly enough that they become `useless'. (In Matthew Parris's autobiography, he recalls being called to one side by a whip and told that, had he not forgotten to vote in an important division, he would probably have achieved office of some sort; as it was, he was left to languish on the back benches....) As MPs carry on through their Parliamentary careers, with the prospect of high office receding, the incentives to attend divisions, speak and hope to be noticed, and ask endless written questions with the hope of uncovering hidden Governmental scandal must recede with it.
Which is a plausible enough theory if you like that sort of thing. I'm sure if the correlation had come out the other way -- it's hardly striking, after all -- I could have come up with something equally plausible-sounding about the loyalty and work ethic of the long-serving, or some other such nonsense.
This is all done with wwwitter.
Copyright (c) Chris Lightfoot; available under a Creative Commons License. Comments, if any, copyright (c) contributors and available under the same license.