Teaching Tips - The Anagram problem solved in `Unix style

Published on: 2005-11-18

Home | Archive

Teaching Tips - The Anagram problem solved in `Unix style'

2005-11-18T17:15:00

What do "parliament" and "partial men" have in common - well, they are anagrams, one sentence can be obtained from the other by simply interchanging the individual characters. Look here if you want some really interesting ones ... Jon Bentley, in his classic Programming Pearls develops an interesting program for discovering anagram sets in a large file of English words (say /usr/share/dict/words). This can be used as a very effective demonstration of the utility of pipelines and the `Unix way' of doing things. Let's take a small sample file, `dat' containing the words:


top
Eat
opt
tea
Pot
ate
Some words begin with an uppercase character, the first step is to convert everything to lowercase:

cat dat | tr 'A-Z' 'a-z'
Now write a program which reads a word from stdin and prints its sorted form together with the original word to stdout - the process is repeated till end of file is encountered. Let's call this program `sign'.

cat dat | tr 'A-Z' 'a-z' | ./sign
The output is:

opt top
aet eat
opt opt
aet tea
opt pot
aet ate
Now, do a dictionary sort based on the first word:

cat dat | tr 'A-Z' 'a-z' | ./sign | sort
The output is:

aet ate
aet eat
aet tea
opt opt
opt pot
opt top
All words which are anagrams have come over to adjacent lines. Now we write a program which will eliminate the signature and bring all words in an anagram set onto the same line.

cat dat | tr 'A-Z' 'a-z' | ./sign | sort | ./sameline
Now, we can print all anagram sets of say, 4 words by giving this output to `awk':

cat dat | tr 'A-Z' 'a-z' | ./sign | sort | ./sameline
| awk '{if(NF==4)print}'
Isn't it interesting? We have ourselves written just two simple programs (sign and sameline) - the others are all standard Unix tools!

Maarten van Emden

Sat Jul 12 15:16:45 2008

I vaguely remembered that Bentley had used this example, so I was happy to find your document to have it confirmed. I also remember Bentley (article? talk?) reporting that Tom Cargill vividly explained the method to him by saying "Sort this way (moves his hand sideways), then sort that way (moves his hand up and down)".