David Parlett on word games

HOW TO WIN WORDLE

The junk food of serious word-gamers

Players 1   Type Deductive   Equipment Online resource  

Wordle is an online deductive word game that challenges you to discover a 5-letter target word by typing out test words and getting radar-like responses. A letter in your test word turns green if it appears in the same position in the target word, yellow if it appears in the target word but in another position. These test words are called 'guesses' and the game itself a guessing game. But it's more than that. Skill plays a large part too, though exactly how much varies from word to word.
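The colouring rule just described can be sketched in a few lines of Python. This is only an illustration, not the game's own code: the function name `feedback` and the G/Y/- notation are assumptions made for this example, and the treatment of repeated letters (a letter turns yellow only as often as the target has unmatched copies of it) is the commonly described behaviour rather than anything stated above.

```python
from collections import Counter

def feedback(guess: str, target: str) -> str:
    """Return one mark per letter: G (green), Y (yellow) or - (blank)."""
    guess, target = guess.upper(), target.upper()
    marks = ["-"] * len(guess)
    # Target letters not matched by a green remain available for yellows.
    spare = Counter(t for g, t in zip(guess, target) if g != t)
    # First pass: right letter in the right position.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            marks[i] = "G"
    # Second pass: right letter in another position (assumed duplicate rule).
    for i, g in enumerate(guess):
        if marks[i] == "-" and spare[g] > 0:
            marks[i] = "Y"
            spare[g] -= 1
    return "".join(marks)

print(feedback("SITAR", "VIVID"))  # -G---
print(feedback("LINED", "VIVID"))  # -G--G
```

The two calls reproduce the first two lines of the SITAR - VIVID example discussed further on: a green I from SITAR, then a green D added by LINED.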

In fact, the first thing you have to recognise is that you can't expect to win Wordle every time. By way of example, consider this spectacular failure (SITAR - BOXER).

Having got the essential structure of the word with DONER at line 2, there was no other way of reaching the actual result than by simply trying out all 19 possibilities for the first and third letter. And since those tests happened not to include either F or Y (why should they?) I failed to reach the final target word FOYER.

Nevertheless, I think my average success rate of 97-98% shows to what extent the vagaries of chance can be overcome with skill and care. Especially care, as it's so easy to make a silly mistake, such as forgetting to include in your test word a letter that you have already established as being part of the solution.

A couple of examples will show how skill can be successfully applied. In the first one (SITAR - VIVID), we start with SITAR as it uses five of the commonest letters. Two of them are vowels, as the average proportion of vowels in English words is about 40%. It also follows a pattern of alternating consonants and vowels, since until we have established the basic pattern of the word it's not yet worth trying to clump consonants together.

At line two, we keep the I and try another vowel and three more common consonants, producing a green for D. Now what? Should we try any of the remaining vowels O U Y? Maybe, but it’s hard to think of a word containing one of them in fourth position.

So the likeliest outcome would be a duplicated I, forming a skeletal word -I-ID. We can reject both RIGID and LIVID because we've already seen that neither R nor L is in it. But LIVID immediately puts one in mind of VIVID, which is a typical Wordle target. Problem solved! (Though, according to next day's papers, this one left many players stumped.)

By contrast, our second example (ROUTE to SHAKE) is solved by about 50% skill and 50% chance. Unusually, I tested for all six vowels in the first two words, together with four of the commonest consonants. Given E at the end, I suspected A in the middle, to produce the skeletal word ‑‑A‑E, which in turn suggested a two-consonant cluster at the beginning. S and H are two more common letters yet to be tested, and together produce a very common digraph. With the structure SHA-E we have now, in effect, solved this one in three turns. The next three shots just involve slogging through the alphabet to get the remaining letter. No skill involved. It could also have been SHADE, SHALE or SHARE, were it not for the fact that these three letters had already proved missing. For those of us who take word games seriously, ambiguous words differing by only one letter are a pain in the brain, and ought not to be allowed.
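The 'slogging' stage can be made concrete with a small sketch. It assumes a skeleton written with hyphens for unknown letters (as in SHA-E) and a string of letters already shown to be absent. The function name `fits` and the short word list are hypothetical, chosen for illustration only, and are not Wordle's actual stock.

```python
import re

def fits(skeleton: str, absent: str, candidates):
    """Words matching a skeleton such as 'SHA-E' (hyphen = unknown letter)
    that contain none of the letters already shown to be absent."""
    regex = re.compile(skeleton.upper().replace("-", "[A-Z]"))
    banned = set(absent.upper())
    return [w for w in candidates
            if regex.fullmatch(w.upper()) and not banned & set(w.upper())]

# A hand-picked illustrative list, not the real target stock.
WORDS = ["SHADE", "SHAKE", "SHALE", "SHAME", "SHAPE", "SHARE", "SHAVE", "SHAKY"]

# D, L and R are the letters the article reports as already eliminated.
print(fits("SHA-E", "DLR", WORDS))  # ['SHAKE', 'SHAME', 'SHAPE', 'SHAVE']
```

With several candidates surviving and only one unknown letter, nothing but trial and error separates them, which is the point being made above.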

On these grounds, I now make the following proclamation: that once you have got four greens, or (what comes to the same thing) three greens and a yellow, you have solved the puzzle, and slogging through the alphabet for the missing letter is a waste of time.

Solving Wordle

The first thing you need to know is that the target word in the NY Times Wordle (and probably all its copycats) is selected at random from a stock of 2,500 words. According to one report I've read these are all words known to the inventor's partner, so they're unlikely to include anything outrageously obscure like SYLPH.

Some knowledge of letter frequencies in English is obviously helpful. But there are two problems with this. The first is that letter frequencies vary depending on whether the sample text is horizontal or vertical (my terminology). Horizontal text is words put together to be read for their content as messages; vertical text is an index or word list, such as a list of dictionary headwords - or the stock of 2,500 in Wordle's treasury.

There was no Internet in my teens when I first needed a frequency distribution for word games. At that time the only frequency tables I could find came from books or articles on cryptography, as filtered through the medium of The Children's Encyclopaedia, and/or an appendix to Nuttall's Standard Dictionary of the English Language ('based on the labours of the most eminent lexicographers', if I remember aright).

These in turn tended to be based on Samuel Morse's counting of the relative number of sorts (letters) in compositors' cases, which proved to be:

ETAIN OSHRD LUCMF WYGPB VKQJXZ

But this is a count of horizontal text, in which the frequency of letters is skewed by their appearance in the most frequently used words. Everyday text is full of words like THE, THIS, THAT, THESE, THOSE, HIS, HER, THEM, THERE and so on, from which you might assume that English words are composed almost entirely of S, H and T. By contrast, in word games based on individual words rather than messages it is the frequency of letters in vertical text that counts – that is, the frequency of their appearance in dictionary headwords. When I got my first home computer in 1979 one of the first things I did with it was to list the first headword on every tenth page of Chambers dictionary and from these to calculate the relative frequencies of individual letters. My result (as reported in my Penguin Book of Word Games, 1982) was almost identical to the more recent and sophisticated table at this cryptography page.

For comparison, the following table shows (1) the relative frequencies of letters in horizontal text; (2) the same in vertical text; (3) frequency of letters in initial position; (4) letters in final position (vertical).

hor ETAINOSHRDL UCMFWYGPBVKQJ XZ
ver EARIOTNSLCU DPMHGBFYWKVX ZJQ
ini SPCAMTBRDF HEIWGLOUNVKJQ YZX
fin SETYNLRDCA HMGPOKFWBXIUZ JQV
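Anyone wishing to compile such rankings from a word list of their own might proceed along the following lines. This is a minimal sketch under stated assumptions: the function name `rank_letters` is invented for the example, and the seven-word sample (taken from words used in this article) is far too small to reproduce the table above. A real count would need a full list of headwords.

```python
from collections import Counter

def rank_letters(words, position=None):
    """Letters ordered from most to least frequent in a word list.
    position=None counts every letter; 0 counts initials; -1 counts finals."""
    counts = Counter()
    for word in words:
        word = word.upper()
        counts.update(word if position is None else word[position])
    # Descending count, then alphabetical so that ties sort predictably.
    return "".join(sorted(counts, key=lambda c: (-counts[c], c)))

# A tiny 'vertical' sample for demonstration only.
SAMPLE = ["PETAL", "CHASE", "LATHE", "CATER", "POUND", "FILMY", "SATED"]

print(rank_letters(SAMPLE))      # all positions
print(rank_letters(SAMPLE, 0))   # initial letters: CPFLS
print(rank_letters(SAMPLE, -1))  # final letters:   DELRY
```

Feeding the same function a passage of running prose split into words, rather than a list of distinct headwords, would give the 'horizontal' ranking instead.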

It can help to memorise a four-fold scheme of letters:

It's certainly worth starting with words containing the vertically commonest letters. Even an unusual word like SYLPH contains three of the commonest, and using this as your first test word would quite likely give you two or three yellows, perhaps even a green.

Vowel-consonant patterns

Before plunging directly into the letter-frequency approach, however, it might be helpful to think about the pattern of vowels relative to consonants in the target word. The relative frequency of vowels in vertical text is about 40 per cent. With x representing a consonant and O a vowel, five-letter words are often based on the pattern xOxOx, such as PETAL; sometimes xxOxO, like CHASE; and sometimes xOxxO, like LATHE. I therefore usually start with three words each containing two vowels and three consonants, and representing all six vowels and nine different consonants. A good selection of three initial test words might be CATER, POUND, FILMY, though the second and third words can be varied in light of any useful results afforded by the first.
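The x/O notation and the claim about the three starting words are easy to check mechanically. The sketch below assumes that Y always counts as a vowel, which is a simplification (it is consonantal in YEAST, for instance). The function names `pattern` and `coverage` are made up for the example.

```python
VOWELS = set("AEIOUY")  # Y treated as a vowel throughout (a simplification)

def pattern(word: str) -> str:
    """Vowel-consonant pattern: O for a vowel, x for a consonant."""
    return "".join("O" if ch in VOWELS else "x" for ch in word.upper())

def coverage(words):
    """Return (number of distinct vowels, number of distinct consonants)."""
    letters = set("".join(words).upper())
    return len(letters & VOWELS), len(letters - VOWELS)

for word in ("PETAL", "CHASE", "LATHE"):
    print(word, pattern(word))  # xOxOx, xxOxO, xOxxO

print(coverage(["CATER", "POUND", "FILMY"]))  # (6, 9)
```

The last line confirms that CATER, POUND and FILMY between them cover all six vowels and nine different consonants, with no letter wasted on a repeat.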

Six vowels? Yes - never neglect Y, which is a vowel more often than not. There are many words containing no other vowel - LYMPH, SLYLY, PYGMY, to name but a few. Y is only more or less consonantal at the start of a distinct syllable, as in YEAST and YOYOS, though a delightful counter-example is the (admittedly archaic) YCLEPT. In this one (LATER-POUCH-CLING-CYNIC) I had already tested for A, E, O and U, and should have tried both I and Y at the third attempt. Even with hindsight I can't see any alternative to CYNIC at this stage of the game, and ought to have got it in three.

But of course you have to modify your approach depending on what results you get for your first words. For example, if CATER turns up three yellows for A, T and E then a sensible choice for your second shot would be something like SATED, thus giving you further information about the positions of A, T and E, and introducing two more test consonants (S, D). I’d be very surprised if SATED did not produce at least one green in addition to the two known yellows.

Even if your first word produces five blanks, at least that's useful information, and your second line may prove correspondingly more profitable, as it did in this one (LATER-POUCH-POUND). It also shows how important it is to avoid words containing letters already shown to be absent, except for a very specific purpose. Here's an example of a specific purpose that paid off (RATED-POUCH-FILMY-SPILL). Since four of the six vowels had proved missing, I needed to test for I and Y. There are few words of the form -I-I- (though PIPIT might have been worth a try if T had not already been eliminated), so it now seemed likely that the structure was xxIxx. FILMY didn't follow that pattern and doesn't include the P that we know to be in it, but it's one of my favourite test words, and it only then remained to test the common letter S to produce the final result. In the event, SLIMY would have been a better choice than FILMY; but no matter.

Following this method, you can usually establish the basic vowel-consonant pattern of the target word, if not the thing itself, by your third shot. To generalise, I suggest your first three shots should include nine different consonants and all six vowels. After that your fourth shot should usually hit the target, unless (a) you have more than three possibilities for one particular letter, or (b) the target word contains any duplicated letters.

The positioning of specific letters in a word is another part of its basic structure. Given a yellow letter you then need to consider whereabouts it's most likely to appear as a green. For example, B is most likely to be found as the first letter of the target word and least likely to appear at the end. It's also unlikely to be in second position as it can't be preceded by a consonant. If it isn't the initial letter, it's most likely to be in the middle of the word. H, on the other hand, is most worth testing in second or fifth position, bearing in mind that it frequently follows C, G (as in THIGH), P, S, T, and (at the start of a word) W. Letter X is usually best at the end of a word (INDEX, LATEX), but it also frequently occurs in second position following an initial vowel, as in AXIOM, EXTOL, OXBOW.

If you get three or more yellows on your first shot it’s often possible, and can be helpful, to just shift them by one position en bloc, as in the example of LATER-SLATE-PLEAT.

British solvers will soon become aware that the target word might be missed because it appears in its American spelling (FAVOR, HUMOR, etc.). And it’s easy to get fixated on a particular but misleading pronunciation. On one occasion I tried MOULD, which produced four greens following the M. Being unable to find a word of similar spelling and pronunciation I completely overlooked COULD and WOULD!

Scoring

Accepting Wordle as a game of skill, you might want to devise a method of scoring for your successes. After all, it's pretty meaningless to say 'Solved in three' or 'Solved in six' if the target word is of sufficient ambiguity to admit of half a dozen candidates for just one of its letters. My system is to value greens at 2 points each and yellows at one, and to score for reaching the target word 10 minus the value of letters obtained on my first shot. Thus LATER-POUCH-POUND gave me 10/10, while CATER-ARGON-PARKA-GRASP-SPRAG gave me 8/10, a lucky outcome, as I didn’t know the word SPRAG and had to look it up before entering it.

Note that my scoring doesn’t take into account the number of test words it takes to reach the target. Getting it right on the first shot is just a lucky guess, and you quite rightly score 0/10 for it. On the other hand, I got 9/10 for one that took up all six shots, namely LATER-INGOT-POSIT-JOIST-FOIST-MOIST. Having got the essentials of the target word by line 3, it was only a matter of guessing the first letter, which, besides J, F and M, could also have been B (obsolete word for box or cask), H, or R. Hence my dictum that once you have got four greens you have in fact solved the puzzle. The rest is just slogging through the alphabet.
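The scoring rule is simple enough to express as a short calculation. The sketch below is one possible reading of it, not a definitive implementation: the function names are invented, the game is assumed to have been solved (so the last shot is the target), and yellows for repeated letters are assumed to be counted no more often than the target contains them.

```python
from collections import Counter

def first_shot_value(guess: str, target: str) -> int:
    """2 points per green and 1 per yellow earned by a single shot."""
    greens = sum(g == t for g, t in zip(guess, target))
    # Letters common to both words regardless of position, counting repeats.
    shared = sum((Counter(guess) & Counter(target)).values())
    yellows = shared - greens
    return 2 * greens + yellows

def game_score(shots) -> int:
    """Score out of 10 for a solved game: 10 minus the first shot's value."""
    first, target = shots[0].upper(), shots[-1].upper()
    return 10 - first_shot_value(first, target)

print(game_score(["LATER", "POUCH", "POUND"]))                    # 10
print(game_score(["CATER", "ARGON", "PARKA", "GRASP", "SPRAG"]))  # 8
print(game_score(["LATER", "INGOT", "POSIT", "JOIST", "FOIST", "MOIST"]))  # 9
```

The three results agree with the scores quoted above, and a one-shot solution such as `game_score(["MOIST"])` duly returns 0.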

I did consider adding to my score 1 point for each of the six shots I left unused, so my first example (LATER-POUCH-POUND) would have counted an extra 3, giving a slightly anomalous score of 13/10. But, in the interests of consistency (that ‘hobgoblin of little minds’, according to Ralph Waldo Emerson), I now prefer to avoid scores exceeding 10.

One aspect of Wordle that can be annoying is the appearance of duplicated letters, as each duplication reduces by one unit the amount of information provided by each shot containing it. One of the worst possible target words would be MAMMA, and if it were ever set I'd be tempted to add 3 points to my score: two for the duplicated Ms and one for the duplicated A. Some really nasty duplicated-letter words that I have yet to see used include ESSES and PZAZZ. Incidentally, one useful thing to remember about duplicates is that the first time you enter one it automatically appears in its earliest position. For example, if the target word is MAMMA, your first M will be the initial, your second will appear in third and your third in fourth position.

As a matter of interest I find that the average number of shots it takes me to hit the target is 3.7, that I get (on average) the equivalent of a green or two yellows on the first test-word, and thus make an average score of 8/10.

Further thoughts on the vowel-consonant pattern of individual words might not come amiss. Amongst those omitted above are Oxxxx (such as the German-derived ANGST, and ANKHS, borrowed from ancient Egyptian), OOxxx (AITCH, EIGHT, etc), and xxxxO (SCHWA, defined by Wikipedia as ‘the mid central vowel sound in the middle of the vowel chart, denoted by the IPA symbol ə’). Even an example of xxxxx exists in Collins Official Scrabble Words, namely CRWTH, a Welsh musical instrument. W is a vowel in Welsh, and 'crwth' rhymes, more or less, with 'sooth'. There are also the lovely patterns xOOOO of QUEUE and OOOxO of OUIJA, a word useful for testing four vowels at a time.

Some oddities

If you can WOW somebody, do they become a WOWEE? Come to think of it, is someone being WOOED a WOOEE?

The ordinal series beginning first, second, third, etc, also includes the indeterminate ordinal NTH. That being so, would its adverbial form be, by analogy with firstly, secondly, thirdly, etc, NTHLY?

Ancestors and copycats

In my youth we played a pencil-and-paper ancestor of Wordle called Bull and Cow. One player thought of a five- or six-letter target word and wrote down a row of as many dashes as it contained. The other sought to deduce it by calling out test words. For each test word the setter would state how many bulls or cows it scored. A ‘bull’ was a letter in the test word appearing in exactly the same position in the target word, a 'cow' a letter that appeared in both words but in different positions. So if the target was ATTIC and you tested with CATER you’d be told ‘One bull and two cows’.

I don't suppose the game originated as late as my introduction to it – it looks more like the sort of thing that might have been invented by prisoners-of-war in 1939-45 or even 1914-18. Whatever, you can readily see this as the essence of Wordle, with green letters equivalent to bulls and yellow ones to cows.

Bull & Cow must have been known to the Romanian-born Israeli telecommunications expert Mordecai Meirowitz, for in 1971 he invented an equivalent game played with coloured pegs and marketed in Britain (by Invicta Plastics) under the name Mastermind. The following year BBC TV started its still popular quiz game of the same name but bearing no relation to the deductive mechanics of the board game. I used to think the deductive board game owed its title, and hence its success, to the popularity of the quiz show, and have just been surprised to discover that the board game came first.

Wordle, as you can readily find out by Googling it, was created by a software engineer called Josh Wardle, whose name is felicitously reflected in the title of the game itself. Launching it in 2021, he says he created it during the Covid-19 pandemic, to keep his wife amused at the height of the lockdown. It was then free of ads, as all ethical games should be, but in 2022 he sold it for a large sum to the New York Times, who immediately debased it with irritating ads. (And by excluding from its word-hoard certain words on grounds of political correctness, which I have always regarded as a form of illiteracy.)

The official New York Times Wordle has spawned many copycats, such as Wordlegame, Wordle Play, Wordle.name, Wordleunlimited, and no doubt many others. If you take the original version seriously, the only reason for playing any of the others is for pure practice, as they're not restricted to one puzzle per day.

Wordlegame.org is the one I regularly use for practice, though it has its faults. One of its target words was VITRO, which is unacceptable partly because it isn't an English word, but chiefly because it only ever appears in the technical phrase 'in vitro', and therefore has no independent existence. (See 'When is a word not a word?')

When practising, a good exercise is to take the solution to the previous target word as the first test word of the new one.

Copyright David Parlett