I tallied the daily and Saturday Times crossword for the calendar year 2008; a total of 313 crosswords. There were 9187 clues altogether – an average of 29.35 per puzzle [this is pretty much as you’d expect from the set of grids – see April 2008 Grids article]. 8085 distinct entries (words or phrases) were used as solutions. (Words like ABJURED and ABJURING were counted as two distinct words – counting roots seemed too difficult without a fair amount of human intervention.) The number of times each entry appeared as a solution is shown in the table below. An overwhelming majority (some 88%) of the entries appeared only once, which is probably testament to the setters’ skills. That said, 946 words appeared more than once, with 131 appearing more than twice.
Appearances per entry | Frequency (entries) | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency |
1 | 7139 | 0.8830 | 7139 | 0.8830 |
2 | 815 | 0.1008 | 7954 | 0.9838 |
3 | 110 | 0.0136 | 8064 | 0.9974 |
4 | 17 | 0.0021 | 8081 | 0.9995 |
5 | 4 | 0.0005 | 8085 | 1.0000 |
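For anyone wanting to repeat the tally, here is a minimal sketch of how a table like this could be built from a flat list of solutions. It is not the method actually used for the 2008 count; the function name `frequency_table` and the toy list `sample` are made up for illustration. The last few lines check that the published figures are consistent with each other: the frequencies should add up to the 8085 distinct entries, and appearances times frequency should add up to the 9187 clues.

```python
from collections import Counter


def frequency_table(solutions):
    """Return rows of (appearances, entries, relative, cumulative, cumulative relative)."""
    per_entry = Counter(solutions)          # entry -> number of appearances
    by_count = Counter(per_entry.values())  # appearances -> number of entries
    total = len(per_entry)                  # number of distinct entries
    rows, cumulative = [], 0
    for appearances in sorted(by_count):
        entries = by_count[appearances]
        cumulative += entries
        rows.append((appearances, entries, entries / total,
                     cumulative, cumulative / total))
    return rows


# Toy data, invented for illustration (not the 2008 solutions).
sample = ["TEA", "ROD", "TEA", "APSE", "TEA", "ROD", "ERATO"]
for row in frequency_table(sample):
    print("%d | %d | %.4f | %d | %.4f |" % row)

# Consistency check on the published table above.
published = {1: 7139, 2: 815, 3: 110, 4: 17, 5: 4}
distinct_entries = sum(published.values())
total_clues = sum(k * n for k, n in published.items())
print(distinct_entries, total_clues)  # 8085 9187
```

Note that, as in the tally above, ABJURED and ABJURING would count as separate entries here, since the sketch compares the strings exactly as given.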
Those which appeared 5 times were: ROD, TEA, TIE, and TOPIC. TEA was also included in many compound words – TEA GARDEN, TEA LEAF, etc. It is a fitting accolade to a national obsession.
The words which appeared 4 times were: ADO, APPLE, APSE, ARCHIVIST, ARENA, ASCOT, ASSET, AWARD, , ERATO, GENTRY, INFERNO, LEGIT, OMANI, PEA, RELAPSE, RUT, SEA. There are some words here that I wouldn’t have suspected, but others such as ERATO, OMANI and APSE I would have declared regulars. SEA and PEA also appear in compound expressions, but do not rival TEA.
The list of thrice-used entries appears below. It contains some of the usual suspects and some which are surprising, such as FAHRENHEIT and INDONESIA. The first phrases start to appear, such as GET UP AND GO.
ADHERENT | ADVANTAGE | ALLOA | ALOFT | ALUMNUS | AMBLE | ANDROMEDA | ANNEAL | ANON | ANVIL |
ASPIDISTRA | ATTEST | BETA | CABIN | CANON | CRACK | DEGREE | DETRIMENT | DOMICILE | DOWNTOWN |
DRESSER | EGO | EIDER | ELITE | EMMY | EQUIP | ERA | ERRATUM | EVADE | EXTOL |
FAHRENHEIT | FELT | GALILEO | GASOHOL | GET UP AND GO | GOO | GRANDSON | HANDS DOWN | ICON | IDLE |
IDYLL | ILL | IMPASTO | INCA | INDONESIA | IRON | KNIGHT | LACONIC | LADEN | LEA |
LEARN | LEONARDO | LIE | LOTTO | LULU | MASTHEAD | MASTODON | MORIBUND | NEIGHBOUR | NIGHT |
NOEL | NONPLUS | NOSE | OAR | OATH | OP ART | OPERA | ORANGE | ORIGINATE | ORMER |
PALATE | PALMA | PAR | PATTERN | PIKE | PLEASANTRY | POSER | PRIME | RALLYING | REALISTIC |
RESERVE | RHINO | RHONE | RIOJA | ROMEO | RUBY | SEASONED | SHYSTER | SINK | SOHO |
STAG | STUD | STUN | TABLE | TADPOLE | TAHITI | THETA | THROW | TITAN | TO WIT |
TORRENT | TRACE | TRINITY | TRIPPER | TRUMP | TRUMPED-UP | TUB | TWEETER | VEIN | ZERO |
My theory is that most of the frequent words are just good grid-fillers, so it should be no surprise that they appear often. No doubt they appear just as regularly in the Sunday Times and Jumbos or the Guardian for that matter, so if you’re doing puzzles from multiple sources their frequency is multiplied, and their hackneyed reputation earned. In fact, there’s an analysis of Guardian puzzles by one of the fifteensquared bloggers, which shows that the list of most-used words in those puzzles (for a 10-year period) is not that similar to the Times “thrice or more” list for 2008. This probably means that the Times list for 2009 will be quite different too.
Of course, a word being hackneyed doesn’t imply the same for the clue to it. Setters no doubt come up with alternative clues for words and store the better ones for later use.
I was surprised to find words such as GONER appearing on a Friday and again on the following Monday, so I began to wonder whether the recurrence of words was a purely random process or whether there was some form of self-regulation or editorial intervention. Without getting too technical, by considering the distribution of interarrival times (the gaps between successive appearances of the same word) I convinced myself (but probably not Sir J.F.C. Kingman) that the hand of the editor was evident. Peter took the more pragmatic approach of actually asking – My understanding from Richard Browne, Times Crossword Editor, is that he works quite hard to reduce repetition of words in the puzzle, and will tell the setters that he’s noticed particular words coming up frequently, for example.
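For the curious, here is a rough sketch of one way the interarrival idea could be tested. It is an assumption on my part rather than a record of the calculation described above: it measures each gap in puzzles (so Friday to the following Monday is a gap of 2, with Saturday in between), then compares the share of short gaps with the average share obtained when the same puzzles are shuffled into random order. The function names, the `threshold` parameter and the toy data are all hypothetical. If an editor is spacing repeats out, the real share of short gaps should come out lower than the shuffled one.

```python
import random


def interarrival_gaps(puzzles):
    """Gaps, in puzzles, between successive appearances of the same entry."""
    last_seen = {}
    gaps = []
    for index, entries in enumerate(puzzles):
        for entry in set(entries):  # ignore repeats within a single puzzle
            if entry in last_seen:
                gaps.append(index - last_seen[entry])
            last_seen[entry] = index
    return gaps


def share_of_short_gaps(gaps, threshold):
    """Fraction of gaps no longer than `threshold` puzzles."""
    if not gaps:
        return 0.0
    return sum(1 for gap in gaps if gap <= threshold) / len(gaps)


def shuffled_baseline(puzzles, threshold, trials=1000, seed=0):
    """Average share of short gaps when the puzzle order is randomised."""
    rng = random.Random(seed)
    order = list(puzzles)
    shares = []
    for _ in range(trials):
        rng.shuffle(order)
        shares.append(share_of_short_gaps(interarrival_gaps(order), threshold))
    return sum(shares) / len(shares)


# Toy data, invented for illustration: four puzzles in date order.
toy = [["GONER", "TEA"], ["ROD"], ["GONER"], ["TEA", "ROD"]]
observed = share_of_short_gaps(interarrival_gaps(toy), threshold=2)
baseline = shuffled_baseline(toy, threshold=2, trials=200)
print(observed, baseline)
```

Shuffling whole puzzles keeps each grid intact and leaves every word's total count unchanged, so only the spacing differs between the real order and the baseline. With a year of real data you would want to look at the whole spread of shuffled results, not just the average, before drawing conclusions.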
I’m impressed by the lack of repetition that this analysis shows – it means that once you’ve seen a phrase like MIDDLE OF THE ROAD, you’ll be unlucky to see it more than once again in the same year. We should probably remember, when grumbling about a plant like the recent MANZANITA, that we may have been saved from another MARGARITA.
There’s similar analysis available for the New York Times puzzle, covering a much longer time period, at http://www.xwordinfo.com/. If you remember the rules about US-style crossword grids, it’s no surprise that they have much more repetition – over the period of about 16 years covered, there are currently 344 words which have been used 100 times or more – very roughly equivalent to the 5-timers above.
One of my own favourites, for example, is ELI – apparently, the only priest in the Bible! I reckon to see him crop up monthly or thereabouts, in one crossword or another and in one form or another, usually as part of a word and so perhaps not caught in the above.
One day I will set a crossword consisting entirely of cliches. Eli will be in 1ac, and Erato and muse will be there somewhere too…
As for my own project, I intend to write a stage show with dialogue and music consisting entirely of Times crossword clues. The working title for this year’s production is Abednego of Abu Simbel: The Abstergent Acupuncturist featuring the hit songs A Shropshire Lad and Abide With Me. As you can see, it still needs a bit of work.
For example, another one from my cliche list, the world’s only ancient city, UR, raised its ugly head again today (11 Feb). I forget, 3dn was it? UpriseR... would that have been recorded as a stat or a fact, I wonder?