I’ve been looking for projects with which to test various methods of work distribution, without resorting to yet another Fibonacci series that has been done to death already, and I stumbled on the idea of finding all the reversible words and palindromes within a given range of lengths (3–7 letters).
It immediately seemed like a fun challenge: the simplest approach would be to load an entire dictionary from one of the open-source word lists out there, then simply iterate over each word and keep those that still make a valid word when reversed.
But just as easily, you could start ignorant of the lexicon and use brute (or genetic) force to find them… Remember, this is an exercise; it’s not really so much about finding the results.
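For the dictionary-iteration approach, a minimal sketch looks like this. The word set here is a tiny hypothetical stand-in; in practice you’d load a full open-source word list (typically one word per line) from disk:

```python
# Hypothetical stand-in for a real word list loaded from disk.
words = {"abut", "tuba", "level", "stop", "pots", "civic"}

def reversible(lexicon, min_len=3, max_len=7):
    """Return words whose reversal is also a word (palindromes included)."""
    return sorted(
        w for w in lexicon
        if min_len <= len(w) <= max_len and w[::-1] in lexicon
    )

print(reversible(words))
```

Because membership tests against a set are O(1), this approach is bounded by the size of the dictionary rather than the size of the candidate space.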
One of my variations used ZeroMQ to distribute tasks across 20 free Amazon EC2 servers: both Linux and Windows instances are available for free (within limits) if you use the “t1.micro” instance type.
Searching for palindromes and reversible words (“Elle”, “tuba” => “abut”) provides a number of sub-tasks which you can compartmentalize in a variety of ways. For example: combine letters to generate a word, match the word against a dictionary, then reverse the word and match that against the dictionary.
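Those three sub-tasks can be sketched as separate functions, each of which could later become its own worker. The LEXICON here is a hypothetical stand-in for a real dictionary back end:

```python
from itertools import product
from string import ascii_lowercase

# Hypothetical stand-in for a real dictionary back end.
LEXICON = {"abut", "tuba", "elle"}

def generate_words(length):
    """Sub-task 1: combine letters into candidate words."""
    for combo in product(ascii_lowercase, repeat=length):
        yield "".join(combo)

def is_word(w):
    """Sub-task 2: match the candidate against a dictionary."""
    return w in LEXICON

def reversed_is_word(w):
    """Sub-task 3: reverse the candidate and match it again."""
    return w[::-1] in LEXICON

hits = [w for w in generate_words(4) if is_word(w) and reversed_is_word(w)]
```

Splitting the work along these seams is what lets you move each stage onto its own thread, process, or machine later.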
At worst, a brute-force, single-threaded approach in Python might look like this:
import enchant

checker = enchant.Dict("en_US")
wordLen = 5

def descend(letters):
    # Check anything of 3 or more letters for wordiness.
    if len(letters) >= 3:
        # Check the word forwards and backwards.
        if checker.check(letters) and checker.check(letters[::-1]):
            print(letters)
    if len(letters) >= wordLen:
        return
    for addition in range(0, 26):
        descend(letters + chr(ord('a') + addition))

descend('')
And with a maximum length of 5 letters, you’ll start getting results pretty fast:
But each increment of wordLen multiplies the run time by 26 – so increasing wordLen from 5 to 7 letters multiplies the single-threaded run time by 26² = 676.
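That 26×-per-letter growth is just the branching factor of the search; a quick back-of-the-envelope check:

```python
# Each extra letter multiplies the candidate space by the alphabet size.
ALPHABET = 26

def candidates(max_len, min_len=3):
    """Total candidate strings of length min_len..max_len inclusive."""
    return sum(ALPHABET ** n for n in range(min_len, max_len + 1))

print(candidates(5))        # 12,355,928 candidates at wordLen = 5
print(26**7 // 26**5)       # 676: the jump from 5 to 7 letters
```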
There are all kinds of factors to weigh here. For example: just spitting out every word you can concoct to a “word checker” thread will be very wasteful – you’ll spend more time communicating the data than you will analyzing it.
So you’ll want to experiment with chunking / partitioning of some kind: my Amazon version chunks three letters deep, and forwards the results back into the result-set chain.
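My Amazon version distributed those chunks over ZeroMQ, but the partitioning idea can be sketched on a single machine with a process pool – each task is a 3-letter prefix, each worker exhausts that prefix’s subtree, and only the hits travel back. The word set is again a hypothetical stand-in for a real dictionary:

```python
from itertools import product
from multiprocessing import Pool
from string import ascii_lowercase

# Hypothetical stand-in for a real dictionary back end.
LEXICON = {"abut", "tuba", "level", "stop", "pots"}

def explore(prefix, max_len=5):
    """Worker task: check every completion of a 3-letter prefix up to max_len."""
    hits = []
    for extra in range(max_len - len(prefix) + 1):
        for tail in product(ascii_lowercase, repeat=extra):
            word = prefix + "".join(tail)
            if word in LEXICON and word[::-1] in LEXICON:
                hits.append(word)
    return hits

if __name__ == "__main__":
    # 26**3 = 17,576 chunks; each worker returns only its hits, so the
    # communication cost per chunk stays tiny compared to the compute.
    prefixes = ["".join(p) for p in product(ascii_lowercase, repeat=3)]
    with Pool() as pool:
        found = sorted(w for chunk in pool.map(explore, prefixes) for w in chunk)
    print(found)
```

The same explore() function could just as easily be the body of a ZeroMQ worker pulling prefixes from a queue – the partitioning, not the transport, is what keeps the communication-to-computation ratio sane.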