January 2004


28 January, 2004: Silliness

It's snowing. I don't approve, but of course doing so is not a necessary prerequisite to mindless photography:

And in other news.... Earlier, Tom wrote me an email in which he complained that the quote at the end of an email I'd sent,

drags your sig average down.

Being nothing if not quick to respond to criticism, I therefore present Am I Sig Or Not? in which you, my half-dozen readers, get to choose which of the quotes that get stuck on the end of my emails you like most. Go mad! Vote early, vote often.
27 January, 2004: Pig out

So, the wisdom of Microsoft has visited upon us yet another email worm, which will no doubt please the `anti-virus' software vendors who rely on this sort of rubbish to create a market for their software. Microsoft will presumably squash these people in a little while by flogging its own `anti-virus' product; as an alternative, it could try closing some of the security holes in the Microsoft internet applications, the most severe of which appears to be the way that Windows will guess the type of a file in a web page or email attachment when it's opened, so that an executable program with (say) a .zip extension is presented to the user as a compressed archive but run by the system as an executable.
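To make the file-type point concrete: the type a file claims (its extension) and the type its first few bytes reveal can disagree. Here is a minimal sketch of that mismatch. It knows only two well-known signatures (ZIP archives start with `PK\x03\x04`, Windows executables with `MZ`). The function names are made up for illustration, and this is not how Windows itself decides what to do with an attachment.

```python
# Leading "magic" bytes for a couple of well-known formats.
MAGIC = {
    b"PK\x03\x04": "zip",  # ZIP local file header
    b"MZ": "exe",          # DOS/Windows executable header
}

def sniff(data: bytes) -> str:
    """Guess a file's real type from its first few bytes."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return "unknown"

def claimed_type(filename: str) -> str:
    """The type the user is shown, based only on the extension."""
    return filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"

def is_disguised(filename: str, data: bytes) -> bool:
    """True if the contents are executable but the name says otherwise."""
    return sniff(data) == "exe" and claimed_type(filename) != "exe"

print(is_disguised("document.zip", b"MZ\x90\x00"))  # True
print(is_disguised("archive.zip", b"PK\x03\x04"))   # False
```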

This latest worm is, apparently, the work of some teenage idiot who's trying to start a denial-of-service attack on the web site of SCO, the internet's favourite litigants. It's nice to see Linux advocates doing their best to make the Free software community look mature. (Alternative conspiracy theories welcome.)

This worm doesn't seem to be too bad, compared to some of the previous ones. Here's the traffic to my personal mailbox: (this is the number of emails having the right size range that were stopped by the spam checker, so it might be a slight overestimate)

and on a server which handles mail for a couple of hundred people: (this was done by looking for mails of the proper size range and with no message-ID in the SMTP logs, which seemed the least privacy-invading way to do this; looking at data from before the worm appeared, this seems to give one or two false positives per day, so this plot is probably reasonably accurate)
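The server-side count can be sketched like this. The log record format, the field names and the size bounds are all hypothetical (real SMTP logs vary by mail server). The point is only the shape of the heuristic: keep messages in the worm's size range that lack a Message-ID, count them per day, and subtract the baseline of false positives seen before the worm appeared.

```python
from collections import Counter
from typing import NamedTuple, Optional

class LogRecord(NamedTuple):
    day: str                   # e.g. "2004-01-27"
    size: int                  # message size in bytes
    message_id: Optional[str]  # None if the message had no Message-ID

def suspected_worm_mail(records, min_size, max_size, baseline_per_day=0):
    """Per-day count of messages that look like the worm, minus the
    false-positive baseline measured before the worm appeared."""
    counts = Counter(
        r.day for r in records
        if min_size <= r.size <= max_size and r.message_id is None
    )
    return {day: max(0, n - baseline_per_day) for day, n in sorted(counts.items())}

# Hypothetical records and size bounds, for illustration only.
records = [
    LogRecord("2004-01-27", 31_000, None),
    LogRecord("2004-01-27", 31_500, None),
    LogRecord("2004-01-27", 2_000, "<abc@example.com>"),
    LogRecord("2004-01-28", 30_800, None),
]
print(suspected_worm_mail(records, 30_000, 33_000, baseline_per_day=1))
# {'2004-01-27': 1, '2004-01-28': 0}
```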

So, this thing pales in comparison to the amount of spam and forged bounces I receive (hundreds per day), and is nothing really to worry about (unless you're still using Microsoft Windows, but it's hardly my place to mourn for those who will do nothing to help themselves).

But Bill Gates is worried, proposing

... some solutions to stop [people] receiving unwanted messages.

and predicting that

In the next 18 months, spam will no longer be a serious problem....

(His statements were made at the Davos shindig for very rich semi-celebrities. Err, I mean `global leaders'. Whatever.)

Gates didn't really say very much, but apparently his death-to-spam scheme will be based on one or several of,
- Small monetary payments for sending mail, the idea being that a user sending personal email could easily afford a fraction of a penny per message, but a spammer could not afford tens of thousands of pounds to send millions of emails.
- Turing-test-style tasks which must be completed by email senders before mail is delivered to its recipients, the idea being that the recipient of an email can then be certain that it was sent by a person.
- Computational tasks which must be completed by the sending machine (for instance, factorising a large number), which would be trivial for small-scale email but prohibitive for sending millions of emails.
The first and last are designed to stop bulk mail; the second to stop automatic mail. (Oddly, none of them resemble a strategy which turns out -- surprisingly -- to be effective in practice, which is filtering based on the content of the mail. The reason that the effectiveness of content-based filtering is surprising is that, on the face of it, spam is defined by the fact that it's sent automatically in bulk, rather than what it's about. But typically its content contains enough clues for techniques like Paul Graham's to work.)
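For comparison, the heart of the content-based approach is small. In Paul Graham's `A Plan for Spam', each word gets a spam probability learned by counting its occurrences in known spam and known good mail. A message's score then combines the most telling of those probabilities as though they were independent evidence. Below is a minimal sketch of just the combining step. The per-word probabilities in the example are invented, and real filters (Graham's included) add details omitted here.

```python
from math import prod

def combined_spam_probability(token_probs, n_interesting=15):
    """Combine per-word spam probabilities in the style of Graham's
    'A Plan for Spam': keep the n words furthest from a neutral 0.5,
    then treat them as independent evidence."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spam = prod(interesting)
    ham = prod(1 - p for p in interesting)
    return spam / (spam + ham)

# Invented per-word probabilities, for illustration only.
word_probs = {"remove": 0.95, "offer": 0.9, "seminar": 0.1}
message = ["remove", "offer", "seminar"]
print(round(combined_spam_probability([word_probs[w] for w in message]), 2))  # 0.95
```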

Although each of the proposed schemes sounds pretty good in principle, they all suffer from a common flaw: there's a lot of legitimate bulk and automatic mail. For instance, every time Amazon dispatches a book, it notifies the customer by email (as must Microsoft properties like Expedia and so forth); Microsoft itself sends out bulletins by email to numerous customers, and email discussion lists distribute messages among thousands of subscribers. Of course, you could imagine a system in which some central server provided, in return for a payment or whatever, a token which could be attached to an email and which recipients could verify -- and which would need to be obtained only once per message, rather than once per recipient. But that doesn't help much, because a spammer could do exactly the same. And consider the computational load on a system like, say, Hotmail if it had to factor 100-digit numbers every time it sent an email.
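To see why the computational-task idea appeals in principle, and why its cost scales with the volume of mail sent, here is a toy sketch of the factorising variant. How Microsoft's tasks would actually work wasn't described, so everything here is assumed, including the function names and the choice that the recipient sets the puzzle. Making a puzzle (multiplying two primes) is cheap. Solving it (factorising the product) is comparatively expensive. Checking the answer is one multiplication. The primes are tiny so that it runs instantly; a real scheme would need far larger numbers. (Adam Back's hashcash, the best-known real proposal of this kind, uses partial hash collisions rather than factoring.)

```python
import random

def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def make_challenge(rng: random.Random, lo: int = 10_000, hi: int = 100_000) -> int:
    """Puzzle-setter's side (assumed to be the recipient): multiply two random primes."""
    primes = []
    while len(primes) < 2:
        p = rng.randrange(lo, hi)
        if is_prime(p):
            primes.append(p)
    return primes[0] * primes[1]

def solve_challenge(n: int) -> tuple[int, int]:
    """Sender's side: factorise n by trial division. This is the expensive step."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    raise ValueError("n is prime")

def verify(n: int, p: int, q: int) -> bool:
    """Checking an answer costs a single multiplication."""
    return 1 < p < n and p * q == n

rng = random.Random(2004)
n = make_challenge(rng)
p, q = solve_challenge(n)
print(n, p, q, verify(n, p, q))
```

The asymmetry is the whole point: one puzzle per message is negligible for a person, but a sender of millions of messages, whether a spammer or Hotmail, pays it millions of times.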

Another story suggests that recipients could choose whether or not to charge the sender of a mail, based on whether they judged a mail to be spam. This is a slightly more interesting idea. Presumably the sender of a mail would get a token which identifies them as the sender, and put it in their mail. The recipient could test for the presence of this token -- on the assumption that if it is present the mail is legitimate -- and if it turns out not to be, then the sender's account is debited by an amount set by the recipient. Evidently there would have to be a protocol to discover the fee the recipient has set, since the recipient could arbitrarily decide whether or not to charge.
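Since no details were given, here is one guess at the bookkeeping such a scheme would need, as a small Python model. Everything in it is hypothetical: the `Directory' class, its method names, fees in pence, and the choice to charge nothing for an unrecognised token.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class Directory:
    """Hypothetical central directory for a 'recipient may charge' scheme."""
    fees: dict[str, int] = field(default_factory=dict)      # recipient -> fee (pence)
    balances: dict[str, int] = field(default_factory=dict)  # sender -> balance (pence)
    tokens: dict[str, str] = field(default_factory=dict)    # token -> sender

    def issue_token(self, sender: str) -> str:
        """Give a sender a token to put in their mail."""
        token = secrets.token_hex(16)
        self.tokens[token] = sender
        return token

    def fee_for(self, recipient: str) -> int:
        """The fee-discovery step: what will this recipient charge for spam?"""
        return self.fees.get(recipient, 0)

    def report_spam(self, token: str, recipient: str) -> int:
        """Recipient judges a tokened mail to be spam: debit the sender.
        Returns the amount charged (0 for an unknown or forged token)."""
        sender = self.tokens.get(token)
        if sender is None:
            return 0
        fee = self.fee_for(recipient)
        self.balances[sender] = self.balances.get(sender, 0) - fee
        return fee

d = Directory(fees={"bob": 5}, balances={"alice": 100})
t = d.issue_token("alice")
print(d.report_spam(t, "bob"), d.balances["alice"])  # 5 95
```

Even in a toy like this, nothing stops a recipient reporting every tokened mail as spam, and a sender posting to a mailing list cannot call `fee_for` for subscribers it doesn't know about.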

(Note that this shares the same problems as many of the other schemes, but in a slightly more subtle way. For instance, it can't be used when sending mail to mailing lists, since the sender may have no way of knowing who is on the list and therefore what they may be charged for their mail. Equally, it doesn't help much for Amazon's automated emails. In other respects it has a certain elegance.)

But from Microsoft's perspective this is a great scheme, because it requires a huge and complicated directory of senders and recipients with which their software -- and only their software -- is integrated. They can patent the `technology' and try to stop anybody else from writing software which consults the directory, and then advertise Outlook and Outlook Express as being spam-free email clients. From the perspective of a near-monopoly vendor, making email proprietary is clearly an excellent move. If they think their strategy through properly, they'll be able to make the new scheme compatible with existing email, and build support for it into Microsoft clients and servers, with the intention of convincing desktop users to stick with Windows, and server users to move to Windows.

The obvious response -- to get to market first with an open, non-Microsoft version of a similar idea -- has a number of flaws. The first is that, if they have any sense, Microsoft will patent the technique in use. Nowadays the existence of prior art (or even an existing patent on the same thing) doesn't seem to affect the patentability of an idea, so we can assume that Microsoft won't have any trouble doing so. And the second is that, even if a competing system were up and running, to win any market share it would have to be integrated with Microsoft email products -- which would require Microsoft's consent to do well. (Microsoft Outlook Express doesn't even have a `plug-in' architecture, as I recall, making integrating third-party software with it a total pain.)

Expect more bad news on this front in the future.
23 January, 2004: After-dinner discussion

Here's another thing that annoys me:

Home Office figures show that violent crime has risen 14%

-- as reported by BBC radio all day, and in stories on their web site like this one. No doubt the Daily Mail had an equally fair and balanced piece on the subject. (John Band has also written about the crime figures, and about CD Wow too. Oh well. I suppose the ``blogosphere'' is nothing if not unoriginal.)

`Home Office figures', of course, show no such thing. Total crime rates are measured by the British Crime Survey, which asks a 40,000-person sample of the whole population whether they have been victims of crime. This is like an opinion poll: by asking a representative sample of people what crime they've experienced, the survey can make an accurate estimate of total crime rates in the whole country.
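To put a number on `accurate': for a simple random sample, the standard error of an estimated proportion p from n respondents is sqrt(p(1 - p)/n). Here is a quick check with n = 40,000, assuming (hypothetically) that a quarter of respondents report being victims of some crime. The real survey's design is more complicated than a simple random sample, so its true margin of error is likely to be somewhat wider than this; the point is only that it is small.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return sqrt(p * (1 - p) / n)

n = 40_000  # British Crime Survey sample size, as quoted above
p = 0.25    # hypothetical fraction reporting being a victim of crime
margin = 1.96 * standard_error(p, n)  # half-width of a 95% confidence interval
print(f"{p:.0%} plus or minus {margin:.2%}")  # 25% plus or minus 0.42%
```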

By contrast, the figures which show a 14% increase are rates of crime reported to the police. Of course, not all crime is reported to the police, for one reason or another. The reported crime figures give a self-selecting sample of all crime, and therefore are not an accurate estimate of total crime levels. Naturally, those who have an interest in suggesting that crime is rising -- the media, who like sensation, and opposition politicians, who want a stick with which to beat the government -- pick whichever figures look worse; the government prefer the British Crime Survey figures, but (for instance) the BBC report this by saying,

Ministers prefer a different set of statistics....

without explaining either (a) why they prefer them, or (b) that they are more accurate.

Another comment which has been made is that, well, whatever the statistics, the most serious violent crime, murder, has risen over the past year. (Of course, murder doesn't figure in the British Crime Survey, since none of the 40,000 people in the sample can have been murdered. But murders are typically reported, so the recorded-crime figures are likely accurate here.)

There are about 850 homicides per year in the UK (of which something like a third result in convictions for murder; others are manslaughter, infanticide, or committed by somebody who is insane or later kills themself, or whatever). But last year was unusual: because of the activities of the late Dr. Harold Shipman, as revealed by the public inquiry, 172 deaths from previous years have now been recorded as murder in 2003, although they had taken place much earlier. This is about 17% of the total, a significant percentage.

Excluding these, the homicide rate for 2002/3 was about the same as that of the year before; nevertheless, homicide has become much more common since about 1960:

-- and this isn't just related to increasing population (more people: more killings). In fact, the total number of homicides doesn't seem to be closely related to the population:

and this is reflected in the plot of homicides per million population:

(The 1919/20 discontinuity in that plot is presumably something to do with the way that population series -- which comes from the splendid What was the UK GDP Then? -- represents casualties of the First World War. It doesn't affect the plot particularly. Update: No, I was being stupid. As Anthony Wells has pointed out, the drop in 1920 is a result of the Partition of Ireland. More observant readers will recall that this took place in 1922, not 1920; however, the study from the UK GDP site explains that the population figures from 1920 onwards are given for Great Britain and Northern Ireland only. The moral? It's worth digging out this stuff to avoid looking foolish, even if you have to wade through a 98-page document to discover it.... Note also that the population series is for the whole UK, whereas the homicide figures are for England and Wales. So the per-head figures are too low, but show the correct form.)
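The `too low, but correct form' point can be checked with a little arithmetic. Dividing England-and-Wales homicides by the whole UK's population shrinks every year's rate by the same factor (UK population divided by England-and-Wales population), so long as that ratio stays roughly constant over time. Here is a sketch with round, illustrative numbers (not the series used for the plot):

```python
def per_million(homicides, population):
    """Homicides per million people."""
    return homicides * 1_000_000 / population

# Illustrative round numbers only.
homicides_ew = 850          # homicides counted for England and Wales
population_uk = 59_000_000  # whole-UK population (the mismatched denominator)
population_ew = 52_000_000  # England-and-Wales population (the right one)

too_low = per_million(homicides_ew, population_uk)
correct = per_million(homicides_ew, population_ew)
print(round(too_low, 1), round(correct, 1), round(correct / too_low, 3))  # 14.4 16.3 1.135
```

The ratio printed last is just 59/52: a constant scale factor, which changes the height of the curve but not its shape.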

A very interesting question is why the murder rate should have risen during this period. There are lots of theories. Gun nuts like to say that it's a result of increasing gun control; Bufton-Tuftons like to say that it's a result of the abolition of capital punishment; racists that it's to do with non-white immigration; bleeding-heart liberals that it's a result of economic inequality; prohibitionists think it's to do with drugs; puritans that it's to do with booze; trades unionists that it's to do with unemployment; and so forth. Some commentators combine several of these arguments into an incoherent whole, as with this opinion piece by Mark Steyn in (what else) the Telegraph.

The gun control argument is pretty difficult to buy into. Most homicide in England and Wales is committed with a sharp implement, and there's no evidence of large numbers of people scaring off knife-wielding attackers with guns. In any case, handguns -- the most convenient for shooting people -- have only been controlled relatively recently, and the 160,000 handguns surrendered under the 1997 Firearms Act were enough to arm only about one in 250 of the adult population, even if they had been distributed uniformly (they weren't, since many gun enthusiasts preferred to have several...). Other data show increasing homicide rates with increasing gun ownership.
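The one-in-250 figure is just division. Running it backwards shows that it corresponds to an adult population of roughly 40 million, about right for England and Wales. The adult-population figure below is an assumption, since the text doesn't give one.

```python
guns_surrendered = 160_000  # handguns handed in under the 1997 Firearms Act
adults = 40_000_000         # assumption: rough adult population of England and Wales

print(adults / guns_surrendered)                         # 250.0
print(f"{guns_surrendered / adults:.2%} of adults armed")  # 0.40% of adults armed
```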

The abolition of capital punishment could have had something to do with it (hangings stopped in 1964), but it's not clear that the deterrent effect of a life sentence is much less than that of a hanging, and very few homicides are committed by repeat offenders (six out of about 500 convictions in 2000/1, of which two were homicides committed in gaol) so violent people are being removed from circulation fairly effectively even without killing them.

Unemployment has been in decline since the early 1980s, at least. Income inequality (as measured by the Gini coefficient) has been rising since the mid-1970s (see, for instance, the charts in this paper by Alderson and Nielsen). Daly and Wilson show (for instance, in this 1997 paper in the BMJ) that rates of homicide are significantly correlated with life expectancy (correcting for murder victims...) and income inequality among a set of data for murder rates in different parts of Chicago. (Caveat: they use stepwise regression which will overestimate the correlations of some variables.)
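Since the Gini coefficient keeps coming up, here is what it measures. It is half the average absolute difference between every pair of incomes, divided by the mean income. So 0 means perfect equality, and values near 1 mean one person has nearly everything. Below is a short Python version that uses an equivalent formula on the sorted incomes (the function name is made up):

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes.

    0 means everyone has the same income; the maximum for n people is
    (n - 1) / n, reached when one person has everything.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted sum of the sorted incomes; this closed form equals the
    # pairwise mean-absolute-difference definition but needs only a sort.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))  # 0.0
print(gini([0, 0, 0, 1]))  # 0.75
```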

But in another study of US and Canadian data, the correlations observed between income inequality and homicide rates over time were much less impressive. Although the argument that income inequality causes homicide -- by increasing the potential rewards of killing for the least well-off in society -- is sort-of plausible, most murders in the UK don't seem to be done for personal material gain. (An alternative is, perhaps, that the Gini coefficient is a `proxy' for something else; for instance, income inequality will differ between urban and rural areas, as will homicide rates.)

(As an aside, Daly and Wilson are also responsible for a lovely bit of research in which they showed that men's natural `discounting rate' -- which measures how much less a future material reward is worth to them than the same reward now -- rises when subjects are shown pictures of an attractive woman just before the experiment. They got the pictures from Am I Hot Or Not, which just goes to show that the most pointless internet site will have its day in the sun. There's no analogous discounting effect for women, or for men shown pictures of material goods -- cars, in their example. Worth a read.)
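One common way to model a discounting rate is exponential discounting. A reward R received t years from now is worth R/(1 + r)^t today, so a higher rate r means the future counts for less. This isn't necessarily the model Daly and Wilson fitted; it is just meant to show what `the rate rises' means. Both rates below are hypothetical.

```python
def present_value(reward, rate, years):
    """Value today of `reward` paid after `years`, discounted exponentially at `rate` per year."""
    return reward / (1 + rate) ** years

# The same £100 in a year, as valued by a patient and an impatient chooser.
print(round(present_value(100, 0.05, 1), 2))  # 95.24
print(round(present_value(100, 0.50, 1), 2))  # 66.67
```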

At this point it's worth reading Ted Goertzel's Econometric Modelling as Junk Science (or see the longer version which has graphs and tables, but is in `Word' format), in which he argues that many of these types of `econometric' (meaning `regression') studies aren't much use, since they have no predictive value. In the case of homicide rates, I could obviously pick any variable which increased in the years 1960--2000 but not before then, regress the homicide rate against it, find a significant positive correlation and declare that it was the `cause' of the increased rate. But I wouldn't have learned much by doing so unless I could show that it predicted homicide rates in cases which I hadn't used to fit the model. Goertzel concludes that, despite a further quarter-century or so of work,

We are no closer to having a useful mathematical model for predicting homicide rates than we were when Ehrlich published his paper in 1975.
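Goertzel's point about trend-fitting is easy to demonstrate. Below, two synthetic series (nothing to do with real homicide data) share only the shape of their trend: flat until 1960, then rising. They are generated separately, with independent noise, yet their correlation comes out very high. `unrelated' stands for whichever variable you'd like to blame.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def ramp(year, start=1960):
    """Zero before `start`, then rising by one per year."""
    return max(0, year - start)

rng = random.Random(0)
years = range(1900, 2001)
# Two synthetic series that share nothing but the shape of their trend.
homicide_like = [10 + 0.3 * ramp(y) + rng.gauss(0, 1) for y in years]
unrelated = [5 + 2.0 * ramp(y) + rng.gauss(0, 3) for y in years]

print(round(pearson(homicide_like, unrelated), 2))
```

A high correlation here says nothing about causes; only prediction on data not used to fit the model would.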

Using a complicated regression model, Ehrlich concluded that capital punishment deters homicide. Other workers used the same techniques and data to reach exactly the opposite conclusion, and a panel of experts reviewing the dispute opined that,

the emergence of a definitive behavioral study lying to rest all controversy about the behavioral effects of deterrence policies should not be expected.

-- not, perhaps, a surprise. Gary LaFree takes a different, `longitudinal' (time-domain) approach in this 1999 review, but concludes, rather unhelpfully,

The greatest impediment to longitudinal analysis is simple data availability.

He's happy that crime rates in the US fell in the 1990s, but isn't able to offer an explanation of why.

So as not to ramble on any further, I'll finish by saying that I have no idea why the homicide rate in England and Wales is rising and has been doing so since the 1960s, and I'm not certain that anyone else does either (though Peter Hitchens claims to with monotonous regularity on Start The Week, which really isn't a good thing on a Monday morning). I haven't found anything which answers the question (nor even approaches it in a very satisfactory way). If any of my half-dozen readers have any suggestions for further reading, I'd love to hear them....
22 January, 2004: Running on time?

There is a rather silly piece in the Guardian's `Online' section by Michael Cross: (my attention was drawn to it by `Spy Blog', which also has some discussion of it)

Anyone arguing that Britain shouldn't repair its railways because a future regime might transport undesirables to death camps by train would be dismissed as a nutter. Yet apparently intelligent people trot out the same argument against proposals to repair the state's outdated data infrastructure.

These self-appointed guardians say we should oppose the proposed national population register because of the use to which a totalitarian government might put it. Likewise identity cards and, with better reason for concern, DNA databases.

[...]

But they [ID card opponents, referred to as `pygmies' in Blunkett fashion] should be honest about what they want, which is the end of the state's role in health and social care.

To start with, the analogy is stupid; railways are genuinely useful as well as being potential aids to totalitarianism. ID cards aren't useful, for reasons rehearsed before.

Moving on, I note that Cross's previous piece was spent bemoaning yet another government IT cockup. Probably his `Public Domain' column on public-sector IT issues gets a lot of mileage out of that sort of thing. Yet he apparently thinks that not only will ID cards work as designed (and presumably at acceptable cost), but that they will provide an `underlying joined-up infrastructure' for `e-government'. If that were true, Cross should be able to explain how and why, rather than just stringing together a few buzzwords. He hasn't managed, which isn't a surprise, given that the government also seem clueless on the purpose of the scheme. (Glossary: here, `infrastructure' means `a list of people', and `e-government' means `a website run by the government'. `Joined-up' means `expensive' and `underlying' probably doesn't mean anything.)

And the idea that people who are opposed to ID cards `[want] the end of the state's role in health and social care' is probably the most ridiculous straw man I've seen in, oh, weeks (not counting the one in the first paragraph of the story). Just to get this straight, ID cards have nothing to do with the state's rôle in health and social care. I'm not sure what ID cards are supposed to be for, but they sure as hell aren't going to help the state provide `health and social care'.

While Cross believes that it's a `bizarre accident' that Britain has no national register of population, and that providing services on this basis is `unsustainable', there's no evidence for either claim. Indeed, our last national register of population (during the War) was hardly abolished by accident, and we seem to be doing a perfectly adequate job of providing services without such a register. While there are many complaints about (say) the NHS, very few of them result from having to do without `a list of customers'. (Ignore for a moment the appearance in the Guardian of the claim that beneficiaries of social services are `customers'.)

Also worth reading is this evidence to the Home Affairs Committee from various Home Office worthies, which includes the following classic exchange:

Mr Prosser: Have you considered taking samples of DNA?

Katherine Courtney: No, we have not considered taking samples of DNA.

Mr Prosser: I am not suggesting it.

Chairman: Do not put ideas into their heads, for goodness sake.

Also see questions 70--93, in which the members of the Committee try to find out how much the scheme would cost. You can tell from the fact that they took 23 questions about it that they didn't get a very straight answer.... But, as I've said before, expect the cost to be between £120 and £400 (mostly paid for through general taxation), based on the government's figures multiplied by an `incompetence factor' of between three and ten to account for the near-inevitable overruns in the associated IT project.
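Here is the arithmetic behind that range, for the record. The government's own figure isn't quoted here, so the £40 below is simply the baseline that the £120 to £400 range implies (120/3 = 400/10 = 40). Treating it as a per-person cost is likewise an assumption.

```python
def cost_range(baseline, low_factor=3, high_factor=10):
    """Scale an official cost estimate by an 'incompetence factor' range."""
    return baseline * low_factor, baseline * high_factor

# £40 is the baseline implied by the quoted range, not a quoted figure.
print(cost_range(40))  # (120, 400)
```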

Nicola Roche, from the Home Office, also repeated an old claim about the consultation exercise: (emphasis mine)

Nicola Roche: Some people do take a principled stand and in response to the consultation exercise we did get responses from those who do object in principle, but the overwhelming response from the public has been in favour.

Oh dear.

The Committee also tried to obtain information on how the cards would help prevent crime. Roche gave (Q16) only two, rather tenuous, examples:

Yes, I think the first example would be ID theft which is a major component of ID fraud, which are two slightly different things. ID fraud is costing the UK economy about £1.3 billion a year and it is an increasing problem. So having a secure Government confirmed ID, by using the biometric with the National Identity Register, will stop that happening. I think the second example would be the use of multiple identities for money laundering. We estimate that about £390 million a year of money laundered is through the use of multiple identities. So we do anticipate the ID card would bite on that.

Leaving aside the fact that spending what will probably be tens of billions to stop £1.69 billion per year in fraud and money-laundering is a doubtful economy, this rather assumes that ID cards are likely to decrease identity theft. Which is unlikely, since the card and National Register would make it much easier to identify yourself as somebody else as soon as you had whatever unique ID number the database uses. Oddly, Roche was unable to give examples of non-identity-related crime which the card would stop. Terrorism didn't figure in the evidence.
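The sums, in £ millions per year, are below. The scheme's cost is set, hypothetically, at £20 billion as a stand-in for `tens of billions'. Even on the generous assumption that the card would stop all of this fraud and laundering, the payback period comes out at over a decade.

```python
# Figures from Roche's answer to Q16, in £ millions per year.
id_fraud = 1_300
laundering_via_multiple_ids = 390
annual_total = id_fraud + laundering_via_multiple_ids  # 1,690, i.e. £1.69 billion

# Hypothetical stand-in for "tens of billions": £20 billion.
scheme_cost = 20_000

# Years to recoup the cost, even if the card eliminated *all* of this crime.
payback_years = scheme_cost / annual_total
print(annual_total, round(payback_years, 1))  # 1690 11.8
```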

Sadly, it looks like I've missed the deadline to submit evidence to the Committee on ID cards. But it seems that STAND and others have done a good job already, so no matter.
21 January, 2004: Shifting stands

Others may be interested to discover that BBC Radio 7 is broadcasting (at 0630h, rather irritatingly) old episodes of Yes, Minister. As far as I can tell, they've just taken the soundtracks from the TV shows and broadcast them over the radio, which works pretty well, since the original programme had very few sets and it's easy enough to visualise the actors and their surroundings.

Unfortunately, the BBC don't offer their RealPlayer `listen again' service for Radio 7 yet, so the only way to listen to this at a sane time of day is to record it and time-shift.

Time-shifting is, of course, one of the few exceptions allowed under copyright law for people in the UK. (Really dodgy link coming up....) Another is importing a copyright work into the UK for private use. In other cases, you're not allowed to, because copyright works are `licensed' for sale in certain countries; rather than leaving this `licensing' up to contract law, it has been written into copyright law, in the form of s.22 of the Copyright, Designs and Patents Act 1988:

(22) The copyright in a work is infringed by a person who, without the licence of the copyright owner, imports into the United Kingdom, otherwise than for his private and domestic use, an article which is, and which he knows or has reason to believe is, an infringing copy of the work.

and by part of s.27, which defines an article to be `infringing' if, (emphasis mine)

(a) it has been or is proposed to be imported into the United Kingdom, and

(b) its making in the United Kingdom would have constituted an infringement of the copyright in the work in question, or a breach of an exclusive licence agreement relating to that work.

Basically what this means is that if you buy some CDs abroad and return to the UK with them, you don't have to destroy them to avoid an action for copyright infringement; but you can't set up a business which buys CDs abroad -- where they're often much cheaper, since record companies charge what the market will bear, which is usually rather less than the retail price in Britain -- and sell them here, because that, rather than being a victory for globalisation resulting from the exchange of goods for mutual advantage, would be an altogether more shocking infringement of copyright.

The latest outfit to fall foul of this particular bit of corporate welfare legislation is on-line retailer CD Wow, a company which buys CDs wherever they're cheap and sells them through their web site, posting them to their customers around the world (chiefly in the UK). Presumably they were hoping to shelter behind s.22 by arguing that posting a CD to someone is a private import; in any case, those ever-cheerful bastards at the British Phonographic Industry, noticing an opportunity to rip off the public which was going unexploited, naturally decided to sue CD Wow. In the event, threats were enough; CD Wow have now caved in, on the basis that they are,

``a small business'' and it would be financially ``imprudent'' for them to try and take the case to the Court of Appeal or the European Court.

and have `agreed' to put their prices up by £2 per album. (They have, according to the BBC, one million `users' per month; how this translates to sales, I don't know.) The BPI have released a rather smug statement crowing about their `victory'; the BBC rather archly comments that,

The BPI would not comment on the impact the settlement would have on UK consumers who had been using CD-Wow!

-- presumably because the `impact' of the settlement on consumers will be obvious even to the dolts at the British Phonographic Industry.

Now (and with the obvious caveat that making infringing copies of copyright works is Bad and Wrong) it's fairly obvious that this will increase `piracy'. After all, customers presumably buy from CD Wow because to do so is cheaper than to buy from UK retailers. That's the same reason that they download music from the internet (or copy it off their friends) rather than buying it here. The BPI have acted to decrease the difference between CD Wow's prices and inflated UK prices; but they haven't done anything to bring UK prices closer to the cost of downloaded `pirated' material. So, expect downloads of music from the internet here to rise.

The record companies' response to this is to threaten lawsuits. Leaving aside the argument that it's never wise for an industry to sue its own customers, it's nice to see that this sort of bluster hasn't made much difference to rates of music downloading in the United States; after an initial fall when the Recording Industry Ass. of America began suing, rates of music downloading have started to rise again.

I should repeat that downloading infringing copies of music is Bad and Wrong. But the record industry isn't going to stop it unless it's prepared to confront it as an economic problem to be solved by changing pricing, rather than a legal one to be solved in the courts.
20 January, 2004: Figures of convenience

Skimming That OECD Report on the British economy (which happens to contain a section about higher education funding to warm the heart of Tony Blair), I found an interesting plot which is almost a good summary of the debate on the government's top-up fees proposals. With a tiny modification, it is. I can't be bothered to re-plot this in a less ugly form, so here's an amended version of the OECD's diagram:

This plots the fraction of GDP countries spend on higher education against the fraction of higher education expenditure which comes from things other than general taxation. Note that the very highest-spending countries (the United States and South Korea) have very high levels of student contributions; the lowest-spending ones (Greece, Italy, ...) have low levels of student contributions. Note also that the UK presently has relatively low levels of funding, and relatively high levels of student contributions.

What the OECD don't show on their plot -- and what I've added above -- is the effect of the proposed reforms. The report says that these will `generate' (i.e. extract) about 0.2% of GDP from students for higher education funding (this is based on an assumption about how universities set their fees which is noted in the OECD report). Measuring the figures from the plot (these are approximate, of course) this gives the following before-and-after table:

Conditions	Total funding (% of GDP)	Fees (% of GDP)	Tax (% of GDP)	Contribution fraction (% of total)
current	1.03	0.285	0.745	27.7
proposed	1.23	0.485	0.745	39.4

This makes an interesting comparison with the other countries in the plot.
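For anyone who wants to check the table: the contribution fraction is fees divided by total funding, and the `proposed' row adds the OECD's 0.2% of GDP to fees while leaving the tax-funded share alone. Here is a quick check (the function name is made up):

```python
def contribution_fraction(fees, tax):
    """Percentage of higher-education funding that doesn't come from general taxation."""
    return 100 * fees / (fees + tax)

TAX = 0.745  # % of GDP from general taxation; unchanged by the proposals
rows = [("current", 0.285), ("proposed", 0.285 + 0.2)]  # fees, % of GDP
for label, fees in rows:
    print(label, round(fees + TAX, 2), round(contribution_fraction(fees, TAX), 1))
```

This prints the same totals and contribution fractions as the table.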

(As an aside, I should say that there's one important thing missing here, which is consideration of the numbers of students receiving higher education. That's for another day, I think.)

17 January, 2004: Hanging on the telephone

Lots of people have produced lists of words which are represented by the same list of numbers in the `T9' text input method on mobile phones. My favourite is the equivalence of `pint' and `riot', which often leads to me inciting civil disobedience:

Fancy a riot later?
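Finding these equivalences is easy: map each letter to the key it lives on, and group words by the resulting digit string. Here is a minimal sketch. It assumes plain lowercase words (anything outside a to z will raise a KeyError), and the function names are made up.

```python
from collections import defaultdict

# The standard phone keypad letter layout.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}

def t9(word: str) -> str:
    """The key sequence for a word, e.g. 'pint' -> '7468'."""
    return "".join(LETTER_TO_KEY[c] for c in word.lower())

def collisions(words):
    """Group words by key sequence, keeping only sequences shared by 2+ words."""
    groups = defaultdict(list)
    for w in words:
        groups[t9(w)].append(w)
    return {code: ws for code, ws in groups.items() if len(ws) > 1}

print(collisions(["pint", "riot", "scorch", "rampage"]))  # {'7468': ['pint', 'riot']}
```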

Dave suggested the following slightly different question about T9:

What is the biggest change to a word you are typing which you can get by entering a single extra number?

For instance, suppose that you want to type `scorched'. After 6 keypresses your phone displays,

(726724) scorch

and then you press 3, and your phone displays

(7267243) rampage

and your reaction is, to quote Dave, `Holy fuck! Vodafone has something to answer for!' (Actually, T9 is AOL's `technology'. Whatever.)

Then you get over yourself and press 3 again, and you see,

(72672433) scorched

and all is right with the world, though arguably Vodafone still has something to answer for.... (Apart from the bit where you're sending text messages about scorched stuff. Whatever. Actually, this is a nice example because the button you press is the same every time.)

Measuring `biggest change' by the number of letters that change (where a key sequence could stand for several words, counting the pair that changes least), here are the top few, largest first:

Changes	Shorter T9	Longer T9	Shorter word	Longer word
6	874653	8746538	uphold	trinket
6	867243	8672437	unpaid	torches
6	787887	7878873	struts	rupture
6	742253	7422533	ribald	shacked
6	7267243	72672437	rampage	scorcher
6	7267243	72672433	rampage	scorched
6	726724	7267243	scorch	rampage
6	726533	7265333	ranked	scolded
6	667874	6678745	onrush	nostril
6	4674464	46744642	gorging	insignia
6	237438	2374387	adrift	cepheus
5	86722	867225	tosca	unpack
5	84676	846766	thorn	vinson
5	78737738	787377385	superset	stressful
5	78675	786759	stork	rumply
5	78675	786753	stork	rumple
5	78625	786253	stock	rumble
5	7833537	78335377	ruffles	steelers
5	78333	783333	steed	puffed
5	782688	7826886	rubout	quantum
5	766638	7666385	sonnet	roomful
5	76625	766253	smock	ronald
5	74784	747845	shrug	pistil
5	744686	7446863	shinto	sigmund
5	74225	742253	shack	ribald
5	737243	7372437	repaid	perches
5	737243	7372433	repaid	perched
5	736733	7367337	sensed	reorder
5	736633	7366335	penned	remodel
5	727423	7274235	scribe	raphael

(Of course, these aren't from the real T9 dictionary, but just from the first word list that came to hand. So your phone may not behave exactly as suggested above. For instance, mine doesn't know `superset', but rather displays `stressfu' when you type 78737738. But you get the idea.)
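Here is a minimal sketch of how a table like the one above might be generated from a word list. It assumes the standard keypad letter groups and counts `changes' as letters of the shorter word that differ from the same-length prefix of the longer word, taking the smallest count over all candidate words. All the function names are hypothetical, and this is not necessarily how the original table was computed:

```python
from collections import defaultdict
from itertools import product

# Letter groups on a standard phone keypad (keys 2-9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def t9(word):
    """Key sequence typed for an ASCII alphabetic word."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def changes(shorter, longer):
    """Number of letters in `shorter` that differ from the same-length prefix of `longer`."""
    return sum(a != b for a, b in zip(shorter, longer))


def biggest_changes(words):
    """Rank one-keypress extensions by how much the displayed word changes.

    For each key sequence s whose extension s + d is also in the word list,
    take the smallest number of changes over every pair of candidate words
    (one for s, one for s + d), then sort so the largest minima come first.
    Returns tuples (changes, shorter_keys, longer_keys, shorter_word, longer_word).
    """
    by_keys = defaultdict(set)
    for w in words:
        if w.isascii() and w.isalpha():  # skip apostrophes, accents, etc.
            by_keys[t9(w)].add(w.lower())

    results = []
    for keys, shorter_words in by_keys.items():
        for d in "23456789":
            longer_words = by_keys.get(keys + d)  # .get does not add new keys
            if not longer_words:
                continue
            best = min(
                (changes(s, l), s, l)
                for s, l in product(sorted(shorter_words), sorted(longer_words))
            )
            results.append((best[0], keys, keys + d, best[1], best[2]))
    results.sort(reverse=True)
    return results
```

For example, `biggest_changes(["scorch", "rampage", "scorched", "scorcher"])` gives three rows, each with 6 changes. The first is `(6, "7267243", "72672437", "rampage", "scorcher")`. Feeding in a real dictionary file instead would give a longer list, subject to the same word-list caveat as above.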

Update: Matt remarks,

I want to make the point that any T9 word-equivalence list which omits ``Smirnoff'' and ``Poisoned'' (as in ``Get me a bottle of Poisoned Black Ice, would you?'') is incomplete.

Sadly, my phone's tiny mind is lacking a list of trademarks (and swear words). Oh well. Matt also points me to something he mentioned a while ago: this splendid report on British pub etiquette, from the Social Issues Research Centre:

Research findings: We observed that, on average, `initiating' round-buyers (those who regularly buy the first round) spend no more money than `waiting' round-buyers (those who do not offer a round until later in the session). Yet `initiating' round-buyers are perceived as friendly and generous, and enjoy great popularity among other regulars, whereas `waiting' round-buyers are less well-liked, and often regarded as miserly. In fact, far from being out-of-pocket, `initiating' round-buyers end up materially better off than `waiting' round-buyers, because their reputation for generosity means that others are inclined to be generous towards them.

-- that's the sort of research I like to see....

14 January, 2004: That Data Protection Act again

I shouldn't just link to things here (that's what BIMBO is for), but I was pleased to see this piece in the Guardian about the Information Commissioner's clarifications of the Data Protection Act. Cf. my previous comments about companies using the Act to make excuses:

Idiot-proof clarification of the Data Protection Act was unveiled today, in a bid to prevent the police and private companies using the act as a smokescreen for their own incompetence or errors.

-- splendid. The spin is slightly -- informatively -- different from The Times and the BBC, but the basic point remains:

Mr Thomas [the Commissioner] told BBC Radio 4's Today programme that British Gas had accepted the act did not prevent it from passing on information. He said he was ``completely surprised'' at the line taken by police in the Huntley case.

13 January, 2004: Goeth the hour

I appear to have succeeded in my quest to find a large public organisation with a sense of humour, though I may have failed to get myself an Extremely Well Paid Job at the same time:

British Nuclear Fuels plc.,
1100 Daresbury Park,
Daresbury,
Warrington.

Dear Mr. Lightfoot,

Thank you for your letter of 28th December, recommending yourself for the position of Chairman of BNFL plc.

Although it appears that you have some of the qualifications required for the position, may I suggest you direct a formal application to the Department of Trade and Industry who placed the recruitment advertisement.

I am sure your application will warrant their full consideration.

Yours sincerely,

Brian Partridge
HR Director

-- so, votes for digging out the advert and making a formal application to DTI?

7 January, 2004: Round objects

Ach, I shouldn't write anything more about this speed cameras nonsense, but this is particularly entertaining. Convicted criminal Idris Francis -- who, incidentally, was on the radio the other day crowing about his latest futile lawsuit -- is now claiming that I've been sending him paper mail in which I claim to,

have some sort of phsycological (sic.) flaw which makes you prefer snail mail to email.

-- well, what can I say?

Read the whole thing (reproduced below with its original typographical errors intact):

Date: Wed, 07 Jan 2004 14:23:33 +0000
From: Idris Francis <irfrancis@onetel.net.uk>
Subject: Your snail Mail
To: Chris Lightfoot

Dear Mr. Lightfoot

Over recent days you have accused me, in an email copied to at least one other, of dishonesty in my analysis of figures, of incompetence and have shown a high-handed supercilious attitude singularly at odds with the gaps in your understanding of these issues.

You have accused ABD of not having data which they did have, you have made totally invalid comparisons between the post 1993 period and earlier periods and in addition made a number of other statements which would be laughed out of a Form 3 arithmetic class.

For all of these reasons I told you some days ago that our correspondence was at an end. Yet, this morning, I received an envelope from you in which there were at least two letters, the first of which explained that you have some sort of phsycological flaw which makes you prefer snail mail to email.

I round-filed the contents without reading any further, as I have no interest whatever in your opinions, not least because it has become clear that you are not prepared to review them in any way when presented with conflicing evidence.

I continue the campaign, with others, confident that regardless of any trivial erros of detail or presentation, our analysis is correct, that most, if not all, speed cameras will be removed and that casualty trends will improve as a direct result.

Please do not waste any more of my time

Idris Francis

-- a stunning communication. I shan't rehearse the arguments on this issue again; Idris Francis is dishonest, the ABD are idiots, and I certainly wouldn't waste my time sending Idris Francis any letters. Though I wish some kind-hearted person would buy him a dictionary and a copy of Eats, Shoots and Leaves to improve his prose style.

(In general, of course, one shouldn't publish private email like this. But Idris Francis is clearly quite unhinged when it comes to this sort of thing; I've already had emails of the form

So you get email from Idris Francis too. You poor bastard.

from several correspondents, and it appears that at least one government department and numerous private individuals have him filed under `vexatious correspondents'. Perhaps if I publish his drivel he'll stop sending me it. We can but hope.)

Update

I have now received another email from Idris Francis, relating to the previous one. It reads: (again, typos as in original)

apologies - misread the name on the letter I binned. There being so very few people who have disagreed with me on any of these issues, having binned the letter and walked through to my computer, by the time Iarrived and looked through the emails to find the address, I chose yours instead of the correct one.

That does not however change anything else

I have now seen your foul web site and its references to me.

I am not amused. Your opinions are not only worthles, but you are a jerk of the first order

Over and out

I note that, unlike the previous email, he did not copy this email to the five-hundred-odd members of the `Unsigned Forms' email list. Again, I think this is an occasion to engage `surprise mode'.

6 January, 2004: Don't take a curve at 50 per: we hate to lose... an argument

(This is a bit of a rant, and, sadly, it's about speed cameras again. Sorry. I'll write about something more interesting later.)

Since writing about speed cameras a little while ago, I've received a bunch of email from ageing boy-racers who are horrified that anybody would dare to question one of their great articles of faith: that speed cameras `cause accidents'. Chief among these have been the members of the `Unsigned Forms' mailing list -- an oddly named group of people who have the ambition to evade justice for any crimes they may commit by exploiting supposed procedural errors -- and Idris Francis, whose major rôle in life over the recent past -- apart from acting as a cautionary example to the `Unsigned Forms' mob -- appears to have been injecting a little seasonal ill-cheer into my life by emailing me ever-less-coherent (and poorly formatted) screeds about my previous article, accompanied by monumental `Word' documents setting out his `arguments' about speed cameras, illustrated with poorly constructed graphs. (You can read his stuff by downloading this .zip file, though frankly I wouldn't bother, as it's all pretty feeble.)

Anyway, the basic point here is that a large set of anti-speed-camera people have mistaken a recent statistical fluctuation in data for road casualties for an `effect' of speed cameras; because they believe it to be a positive fluctuation, they immediately scream `speed cameras are bad', and have one or other crank pressure group issue press releases about it. (We can only speculate on what would have happened if these people had observed a negative statistical fluctuation at around the same time. If the experiment were practical, I would be willing to bet that they would not describe a negative fluctuation as a `success for speed cameras'.)

I get the impression that surprisingly many people, having seen the data presented by the ABD, are convinced by the theory. Typically they write something like (paraphrasing)

I agree that the analysis is subjective, but if you can't accept that something went terribly wrong after 1993, it's obvious that you don't know what you're talking about.

For those whose minds are slightly less closed, let's see if we can't kill this stupid theory dead once and for all.

The way to do this is to formulate the theory in terms of a hypothesis which can be tested, then test it.

The hypothesis is that, up to 1993, there was a falling `trend' in road casualties; and after 1993, this `trend' slowed. 1993 is important here because it was the year in which speed cameras were first introduced. (It was also -- coincidentally -- the year when the amount of traffic on the roads began rapidly to increase after a hiatus during the recession of the early 1990s. Make of this what you will. In any case, to do this properly the analyst would have to show that 1993 was a significant year; but here we're only testing previous claims, so we skip that step.)

Clearly, the number of road casualties in any given year is the result of a random process. By saying that there is a `trend', we are claiming that the number of deaths in a given year is given by some smooth function which changes slightly year-on-year -- the trend -- plus some random variable, a `residual'. Here are some possible models:

Linear trend: we assume that the mean number of deaths in any given year is smaller than the mean number in the preceding year by some constant amount. Obviously this model is wrong at some point, since the number of deaths cannot be negative; but over a short interval this may be a valid model:

Linear trend

Exponential model: we assume that the mean number of deaths in any given year is some constant fraction less than the mean number in the preceding year. This has the nice property that it can never be negative.

Exponential trend
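In symbols (the notation here is mine, not the post's): write N_t for the number of deaths in year t, μ_t for the trend and ε_t for the residual. The two models then differ only in how μ_t moves from one year to the next:

```latex
\begin{aligned}
N_t &= \mu_t + \varepsilon_t \\
\text{linear:}\quad \mu_t &= \mu_{t-1} - b = \mu_0 - b\,t, \qquad b > 0 \\
\text{exponential:}\quad \mu_t &= r\,\mu_{t-1} = \mu_0\, r^{t}, \qquad 0 < r < 1
\end{aligned}
```

The exponential trend stays positive for every t. The linear one crosses zero once t exceeds μ_0 / b, which is why it can only be valid over a short interval.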

(Note that I haven't justified why there should be a downward trend or why it should be linear or exponential. If I wanted to make some kind of positive prediction, I would need to do so; in particular, I would need a model of why this should happen. But I don't need to do that here, since I'm interested in analysing somebody else's claim, which itself assumes a particular trend. To do anything useful with this kind of trend, you really need a theory which explains why there should be a trend and what form it takes. There's also the issue of how you find the trend. In the ABD's case, they appear to have chosen the trend line which is most favourable to their theory; in the plots above I have used the conventional procedure of finding the best-fitting curve or line by least squares. The two models fit about equally well, but that's not really relevant since we haven't explained why we would expect such a trend in the first place. I should say that I do believe that there is a trend -- but I can't explain exactly why there is a trend. I expect that it relates to increasing road and vehicle safety, though there are other possibilities.)
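Here is a minimal sketch of the two least-squares fits, in Python rather than the R the post mentions. The function names are hypothetical, t counts years from the start of the series, and the sample numbers are made up. One caveat: the exponential fit here is done by least squares on log(deaths). That is a common shortcut, but it is not quite the same as minimising squared residuals on the original scale, so it may not reproduce the curves in the plots exactly:

```python
import math


def fit_linear(t, y):
    """Ordinary least-squares fit of y = a + b*t. Returns (a, b).

    For a falling trend the slope b comes out negative.
    """
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def fit_exponential(t, y):
    """Fit y = A * r**t by least squares on log(y) (a log-linear fit).

    Requires every y > 0. Returns (A, r).
    """
    c, k = fit_linear(t, [math.log(yi) for yi in y])
    return math.exp(c), math.exp(k)


def residuals(t, y, trend):
    """Observed minus fitted values, for a trend given as a function of t."""
    return [yi - trend(ti) for ti, yi in zip(t, y)]


# Made-up data that halves every year, purely for illustration.
t = [0, 1, 2, 3]
y = [100.0, 50.0, 25.0, 12.5]
A, r = fit_exponential(t, y)
res = residuals(t, y, lambda ti: A * r ** ti)
```

With these made-up numbers the fit recovers A = 100 and r = 0.5 (up to rounding), and the residuals are essentially zero. Real casualty data would leave residuals of both signs, which are the inputs to the test discussed below.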

Another theory which is also popular, and which I'm also not going to try to justify, is that road casualties depend linearly on the amount of road traffic -- measured in vehicle kilometers, so that two movements of one kilometer by different vehicles count the same as one movement of two kilometers by one vehicle -- and fall according to some trend. (This theory is obviously partly sensible -- increasing road usage will, presumably, lead to more accidents -- but the assertion about linear dependence would need to be tested for this theory to be very useful. In particular, a nonlinear dependence would be easier to justify, since motorists often crash into one another rather than into stationary objects, and the rate at which that occurred would presumably depend upon the square of the number of vehicle kilometers driven.)
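Here is a sketch of the traffic-dependent version, again in my own notation. V_t is the traffic in year t in vehicle kilometers, and I have chosen an exponential form for the trend, which the theory itself leaves open. The linear-dependence assumption and the quadratic alternative mentioned above are then:

```latex
\begin{aligned}
\text{linear in traffic:}\quad \mu_t &= V_t \cdot \lambda_0\, r^{t}
  \quad\Longleftrightarrow\quad \frac{\mu_t}{V_t} = \lambda_0\, r^{t} \\
\text{quadratic in traffic:}\quad \mu_t &\propto V_t^{2}\, r^{t}
  \quad\Longrightarrow\quad \frac{\mu_t}{V_t} \propto V_t\, r^{t}
\end{aligned}
```

Under the first form, deaths per vehicle kilometer should follow a falling exponential however much traffic there is. Under the second, the rate per vehicle kilometer would itself rise as traffic grows.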

Anyway, here's the rate of road usage in vehicle kilometers on Britain's roads from 1993 to 2000 (note the increase starting after 1993):

Road usage

... and here's the plot of deaths per vehicle kilometer driven with an exponentially-falling trend:

Exponential trend, per vehicle kilometer

(Superficially, I'll remark that this looks pretty good. But that doesn't necessarily mean anything much.)

Now we want to ask whether these `trends' changed after 1993. We can't do this just by looking at the plots, because a pattern that appears to be obvious to the eye might just be coincidental. Instead, we need to do some kind of formal test to find out whether the `trend' has changed.

If the trend had indeed changed, the residuals after 1993 would be distributed differently from those before 1993. Either the variance -- i.e., the spread -- of the residuals would increase; or, more interestingly, the mean of the residuals would change. In English, that means that the trend would consistently under- or overestimate the actual number of deaths. (The anti-speed-camera people would like to say that the trend has underestimated the number of deaths, and interpret this as evidence that some change -- speed cameras -- `caused' the extra deaths.)

A standard technique to answer this question is the Kolmogorov-Smirnov test (like everything in statistics, it's named after its inventors); this test can be used to tell us whether two sets of samples -- in this case, the pre- and post-1993 residuals from the various models -- are either (a) drawn from different distributions or (b) consistent with having been drawn from the same distribution. The idea of this test is that we take the two cumulative distributions, plot them on the same axes, and find the greatest vertical distance between the two curves. This maximum distance (called the `Kolmogorov-Smirnov test statistic') can then be compared to a critical value which tells us whether the two sets of samples were drawn from different distributions or not. (Surprisingly enough given my description, this is actually formally correct and backed up by all sorts of hideous maths.) Here's an illustration with the residuals for the exponential case above:

Kolmogorov-Smirnov test
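The statistic itself is simple enough to compute by hand. The following Python is a minimal sketch; the function names are hypothetical, and whether the cutoff year counts as `before' or `after' is my assumption. It splits residuals at a cutoff year and finds the largest vertical gap between the two empirical cumulative distribution functions:

```python
from bisect import bisect_right


def split_at(years, values, cutoff=1993):
    """Split values into (year <= cutoff, year > cutoff).

    Putting the cutoff year itself in the first group is an assumption.
    """
    before = [v for y, v in zip(years, values) if y <= cutoff]
    after = [v for y, v in zip(years, values) if y > cutoff]
    return before, after


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest vertical gap between the empirical cumulative
    distribution functions of samples a and b. Both functions are step
    functions that only jump at sample points, so checking every sample
    point is enough to find the maximum.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d
```

Turning D into a p-value depends on the two sample sizes. That part is best left to a statistics package, such as R's `ks.test(x, y)` or SciPy's `scipy.stats.ks_2samp(x, y)`.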

Pleasingly, those nice people at the R Project have implemented software which will do almost all of the work for us. So:

Model	K-S statistic	p-value	Conclusion
linear, total deaths	0.1806	0.9718	same distribution
exponential, total deaths	0.3403	0.4059	same distribution
exponential, deaths per vehicle kilometer	0.1714	0.9943	same distribution

-- that is, in each case, there is no evidence that the trend is any worse a fit after 1993 than before. None of these data support the hypothesis that there was a change in trend in road casualties in 1993, measured either in total or per vehicle kilometer.

This doesn't, of course, tell us anything new about speed cameras, or whether they are good or bad for safety. It tells us that one argument used against them -- by the ABD, and Idris Francis and other assorted loons from the `Safe Speed' campaign -- is bollocks. (As are most of their other arguments, but this post is already too long; the other arguments are mostly handwaving anyway, and can be dismissed without needing to resort to anything like the above, either because they are nonsense or because there isn't any data to confirm them. Another day, perhaps, if these bozos irritate me any more.) This doesn't stop them repeating the same lie, for instance on The World At One on 30th December last year, when the ABD's Mark Macarthur-Christie said,

According to the work we've done, if the road accident trend before 1993, when `Speed Kills' and cameras became the... the major tool of road enforcement... if that pre-1993 trend had continued to date, we'd have around 5,500 fewer people dead.

As above, this is not true; and there's precious little evidence that the ABD's error was an honest mistake. (And, as ever, the BBC -- typically innumerate -- let it pass without comment, as did other interviewees. I emailed World at One, concluding,

Although most of the statistics used in the speed cameras debate are pretty suspect, the ABD's stuff is the worst of the lot. I was saddened that the program repeated their claim without any dissenting voice pointing out that it was rubbish.

-- no response, naturally.)

(Update: I've corrected a typo in the above table. The third column is the p-value of the test, not the critical value. This doesn't change the conclusions. None of the results show a significant change in distributions.)

Over at `Safe Speed', things aren't much better. Particularly hilarious is this collection of graphs of casualty statistics; as ever, these people would do well to read Tufte. But the graphic design isn't the main problem. Consider the graph of annual change in the number of deaths (referred to, charmingly, as `fatals' -- say what you mean, people!) per billion vehicle kilometers. Now, generally, differencing a series like the casualty data is going to give very noisy results, since the random fluctuations are large by comparison with any background trend. So the `Safe Speed' people have decided to plot a moving average of the data too. Well, sort of:

Moving average

Note -- blue box to right of plot -- how these people have, unaccountably, managed to average three quantities and get a result which is larger than any of those quantities. How remarkable that this -- frankly, bizarre -- error should yield the result they are trying to prove -- namely, that road casualties are rising or soon will start to rise. (It would be politic to engage `surprise mode' in response to this stunning coincidence.) Oddly enough they've managed to make the very same error in about half of their `annual change' plots -- always in their favour....

(Update: Paul Smith of `Safe Speed' now appears to be claiming that the black curve is not in fact a moving average, but simply a made-up polynomial that is put on the graph to create the impression that there is a stationary long-term trend in accidents per vehicle kilometer. He gives no justification for the form of the curve; the point is just to make things look Really Bad. Again, to do this properly, the analyst should be using some procedure to filter the data in a window; like a correctly-applied moving average, such a procedure would not yield trend values outside the range of the real data values. A pretty poor effort either way.)
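The property at issue is easy to check. An ordinary (unweighted) moving average is the sum of the values in the window divided by their number, so it can never fall outside the range of those values. Here is a minimal sketch in Python; the function name is hypothetical and the numbers are made up:

```python
def moving_average(values, window=3):
    """Unweighted moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Made-up annual changes, purely for illustration.
annual_changes = [-4.0, 1.0, -2.0, 3.0, -1.0]
averages = moving_average(annual_changes)

# Every average lies between the smallest and largest value in its window.
for i, m in enumerate(averages):
    window_values = annual_changes[i:i + 3]
    assert min(window_values) <= m <= max(window_values)
```

A weighted average with non-negative weights that sum to one has the same property, so no honest smoothing of this kind can produce the value in the blue box.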

So, basically, where the arguments used by the anti-speed-camera people touch on actual data, they're crap and/or dishonest. The anti-speed-camera campaigners are liars, and innumerate too. They don't understand how to write emails, won't accept criticism, and they spend too much time in the company of Microsoft `Word'. What's not to dislike?

Idiots.

3 January, 2004: Credibility

Much discussion of Michael Howard's credo, published at great expense in the Times and, apparently, sent out by email to 100,000 members of the Conservative party, each of whom was apparently encouraged to pass it on to a further ten friends. Leaving aside our surprise that the Conservative Party has 100,000 members each of whom can receive email, and any tendency to recoil from this policy Ponzi scheme, we move on to the accusations of plagiarism made by the Mirror and the Guardian, apparently based on a similarity observed between one of the Michael Howard statements and one made by philanthropist John D. Rockefeller Jr. in 1941. The Guardian describes the Howard statements as `an almost exact replica' of Rockefeller's, on which basis we'd expect there to be a one-to-one correspondence between Howard's and Rockefeller's beliefs.

Well, there isn't one. You can read the Rockefeller statements; Howard's are -- apparently -- not on the Conservative website, but are linked to above. This tableau presents the two lots side by side; I've reordered the statements to show any correspondence which exists -- in a thematic sense, since as you'll see there is no similarity of construction whatever between the two sets of statements:

Rockefeller	Howard
I believe in the supreme worth of the individual and in his right to life, liberty and the pursuit of happiness.	I believe it is natural for men and women to want health, wealth and happiness for their families and themselves.
I believe that every right implies a responsibility; every opportunity, an obligation; every possession, a duty.	I believe there is no freedom without responsibility. It is our duty to look after those who cannot help themselves.
I believe that the law was made for man and not man for the law; that government is the servant of the people and not their master.	I believe that the people should be big. That the state should be small. I believe red tape, bureaucracy, regulations, inspectorates, commissions, quangos, `czars', `units' and `targets' came to help and protect us, but now we need protection from them. Armies of interferers don't contribute to human happiness.
I believe in the dignity of labor, whether with head or hand; that the world owes no man a living but that it owes every man an opportunity to make a living.	I believe that people must have every opportunity to fulfil their potential. I believe in equality of opportunity. Injustice makes us angry.
I believe that thrift is essential to well ordered living and that economy is a prime requisite of a sound financial structure, whether in government, business or personal affairs.	n/a
I believe that truth and justice are fundamental to an enduring social order.	n/a
I believe in the sacredness of a promise, that a man's word should be as good as his bond, that character -- not wealth or power or position -- is of supreme worth.	n/a
I believe that the rendering of useful service is the common duty of mankind and that only in the purifying fire of sacrifice is the dross of selfishness consumed and the greatness of the human soul set free.	I believe it is the duty of every politician to serve the people by removing the obstacles in the way of these ambitions.
I believe in an all-wise and all-loving God, named by whatever name, and that the individual's highest fulfillment, greatest happiness and widest usefulness are to be found in living in harmony with His will.	n/a
I believe that love is the greatest thing in the world; that it alone can overcome hate; that right can and will triumph over might.	n/a
n/a	I believe every parent wants their child to have a better education than they had.
n/a	I believe every child wants security for their parents in their old age.
n/a	I do not believe that one person's poverty is caused by another's wealth.
n/a	I do not believe that one person's ignorance is caused by another's knowledge and education.
n/a	I do not believe that one person's sickness is made worse by another's health.
n/a	I believe the British people are only happy when they are free.
n/a	I believe that Britain should defend her freedom at any time, against all comers, however mighty.
n/a	I believe that by good fortune, hard work, natural talent and rich diversity, these islands are home to a great people with a noble past and exciting future.

The notion that Howard's statements are plagiarised from Rockefeller's is ludicrous. About the closest similarity is that both contain lots of sentences which begin `I believe', and even then Howard has broken free from the constraints of the form by `not believ[ing]' in various things too. There isn't even a strong thematic correspondence between the two sets; while Rockefeller is concerned with antique notions of social justice -- love, religion, the sanctity of contracts and the purity of thrift -- Howard is much more concerned with economics and public services, talking about poverty, pensions, healthcare, and so forth.

Of course, none of this detracts from the point that Howard's statements are repetitive and mostly pretty banal. Many others have said the same; I think the person who expressed it best was Matthew Parris, who pointed out that you can discard any statements of this type if the opposite is so ridiculous as to be unthinkable. So, for instance,

I believe it is the duty of every politician to serve the people by removing the obstacles in the way of these [health, wealth and happiness] ambitions.

is perfectly reasonable, but also superfluous, since its opposite is absurd:

I believe it is not the duty of politicians to serve the people by removing the obstacles in the way of these ambitions.

Equally, stating that,

I believe the British people are only happy when they are free.

is pointless, given that,

I believe the British people can be happy without being free.

is not a description of a policy which is likely to win him any votes.

By contrast, some of the statements are silly rather than meaningless. For instance,

I do not believe that one person's poverty is caused by another's wealth.

means nothing unless he names the one person. (`I do not believe that the poverty of Mr. J. Bloggs of 323 High Street, Nether Wallop, Borsetshire is caused by another's wealth.') Equally, it's obviously true that there do exist people who have been made poor by others who have used their wealth to do so, but that's not really what Howard's getting at. The point is actually about the question of whether measurements of absolute or of relative wealth should inform government economic policy. While this is an important question, it's not original and it hardly takes a two-page advert in The Times to remind people that the Conservatives don't, generally, believe that wealth inequality is an important issue.

Others have pointed out that,

I believe that Britain should defend her freedom at any time, against all comers, however mighty.

is an odd statement. You can read anything you want into it; Phil Hunt speculated that the `mighty all comer' is the United States, which is absurd. This statement is presumably in fact intended to convey scepticism about European law; the troubling might is bureaucratic, rather than military, which is for the best. Again, we don't need newspaper advertisements to tell us that Conservatives don't much like the European Union.

To be honest, Howard may as well have just told us that he's in favour of good things and against bad things. This would have taken up less space in the newspaper, too.

This is all done with wwwitter.

Copyright (c) Chris Lightfoot; available under a Creative Commons License. Comments, if any, copyright (c) contributors and available under the same license.
