D EAR V ANITY [F AIR], -Just a year ago last Christmas, two young ladies-smarting under that sorest scourge of feminine humanity, the having "nothing to do"-besought me to send them "some riddles." But riddles I had none at hand, and therefore set myself to devise some other form of verbal torture which should serve the same purpose. The result of my meditations was a new kind of Puzzle-new at least to me-which, now that it has been fairly tested by a year's experience and commended by many friends, I offer to you, as a newly-gathered nut, to be cracked by the omnivorous teeth which have already masticated so many of your Double Acrostics.Your first task will be to build the underlying data structures that will allow you to find a chain of links yielding a solution to Carroll's doublets for any pair of words. Then you'll complete the search algorithm and do some experiments with it.The rules of the Puzzle are simple enough. Two words are proposed, of the same length; and the Puzzle consists in linking these together by interposing other words, each of which shall differ from the next word in one letter only. That is to say, one letter may be changed in one of the given words, then one letter in the word so obtained, and so on, till we arrive at the other given word. The letters must not be interchanged among themselves, but each must keep to its own place. As an example, the word "head" may be changed into "tail" by interposing the words "heal, teal, tell, tall." I call the two given words "a Doublet," the interposed words "Links," and the entire series "a Chain," ...It is perhaps, needless to state that it is de rigeuer that the links should be English words, such as might be used in good society. ...I am told that there is an American game involving a similar principle. I have never seen it, and can only say of its inventors, "pereant qui ante nos nostra dixerunt!"
Download and unzip this data file
containing large dictionaries of words
on your computer so that the Doublets
project
and the unzipped DoubletsData
folder are siblings in your file system:
If you aren't sure where your Doublets project is, look in Eclipse under File > Properties > Resource > Location: and go to the folder containing Doublets.
Note that you don't need to import this folder into Eclipse in any way, you just need to place it in this directory so that the Doublets code can find it.
Of course, since this data isn't imported into Eclipse and isn't part of your project, you should never commit/push these dictionaries (we will test using our own copies). So when you see the dictionaries appear as unstaged changes, either leave them there as unstaged or right-click > Ignore to have git ignore them.
Your first task is to build a data structure that represents the allowable links for each word. For example, we should be
able to efficiently determine that the candidate plays from heal include heel,
hell, heat, heap,
real, hear, deal,
zeal, seal, peal,
head, veal, and
meal, while the candidates from glass are grass,
glads, class, gloss.
Create a class called Links that contains such a structure. It should implement the LinksInterface so that it
has the right methods needed by the unit tests. Its constructor should take a file name as a parameter and populate the links
data with word information from the file, which should have one word
per line. Include a member method, The getCandidates(word) method in Link must execute in O(1) time. It gets called OFTEN when searching for doublets. In particular, it should probably just do an O(1) lookup into a data structure and return what it finds, or null if there are none; there is no time to do string calculations. Therefore, you will need to do all of the work to figure out the set of words that differ from the given word in the constructor. So what basic data structure/s to use for your main Links field?
|
The Links structure for the tiny.dictionary.txt datafile looks like this:
Make sure your code passes the given tests in LinkTest.java.
Create a class Chain representing an immutable sequence of link words. Recall that immutable just means that after the data is initialized in the constructor, no method may change any fields. It should support adding a new word to the (end) of the chain, as well as retrieving the last word in the chain (i.e., to see if it is the desired word). Because the objects are immutable, adding a word should construct and return a new Chain.
Your class should be an Iterable<String>, so that you can easily look over the contents (in order) and ultimately print the solution. Include a length method as well.
When searching for a chain between the doublets, it will be important not to repeat any words (otherwise your solution will be unnecessarily long). Thus, to ensure your chain does not repeat words, you should include a boolean method contains to determine efficiently whether a given word appears in the chain. What data structure is best to use here? After you have thought about this and have a good idea, see here for a hint.
Starting from the first doublet, there are many possible links that could lead to a solution. For instance, in finding a valid chain for Carroll's doublet from head to tail, you may decide to use a link from head to heal, or you might decide instead to use a link from head to hear.
Which one should you choose? If you are playing cerebrally, you might choose the link that seems most promising. However, it could end up being a dead-end, requiring you to resort to some alternative link.
Computationally, this process requires storing the alternatives you intend perhaps to try later if your search should come up short. Of course, you'd also need a method to retrieve one of those alternatives for examination. You can visualize the search as a tree:
GLASS has 4 links to candidate words. GRASS has 2 links shown. Searching for the correct chain just means expanding this
tree until we find the end word. This makes sense if we can expand words in parallel. But we can't do this; we can
only call getCandidates() on one word at a time. So we need to store the other chains in some kind of data structure.
Say we wanted to call getCandidates() to examine all the words at depth = 1 first before looking at anything
on level 2: this is called a breadth-first
search. (Can you see why? We search broadly on 1 level before we go deep along a path.) When discussing binary trees, what kind of traversal did we
call this? What kind of data structure did we use for the corresponding iterator?
But maybe we think a depth-first search would be quicker. Then we would want to call getCandidates() on the first chain at depth 1 (grass), then call getCandidates() on the first new candidate at depth 2 (crass), then getCandidates() on the first new candidate at depth 3, etc., all before expanding any of the other chains at depth 1! When discussing binary trees, what kind of traversal is this? What kind of data structure would we use for the corresponding iterator?
See the difference? In the first case, we expand GLADS before CRASS, but in the second, we expand CRASS before GLADS.
Here is another, more complete example.
To take advantage of polymorphism, we have given you a ChainManager abstract class that supports two core methods: add, which simply adds a chain for future consideration, and next, which returns (and removes) a chain.
In addition to these, we may be interested in tracking the performance of the manager. We include two other methods, numNexts, which returns the number of times next has been called, and maxSize, which returns the largest number of chains the manager has stored at any point so far. We also include updateMax(size), which should be called when the number of chains has potentially increased.
Create two implementations of ChainManager, one called QueueChainManager that treats chains in a FIFO
fashion, and another called StackChainManager that treats chains in a LIFO fashion. The next method
should return null when there are no more chains left to consider. These correspond to the two ways to expand
the tree (breadth-first and depth-first) discussed above.
|
With all of the pieces in place, you are now ready to find a chain of links between doublets. But how? A sketch of the algorithm is as follows.
To get the process started, the initial chain is quite simple: it should contain the start word. This is your current (first) candidate chain. Add it to the chain manager to initialize it.
The algorithm should repeatedly ask the chain manager for chains to expand. What should be done with a chain?
Once you've done that, repeat the process for the next chain the manager gives you. If the chain manager says there's no chain left, you know there is no chain between the doublets.
Implement a class called Doublets whose constructor takes a LinkInterface object. Add a member method findChain that takes the two doublet strings (starting and ending) and a ChainManager, returning a solution Chain or null if there is none. The findChain method should implement the algorithm sketched above. |
At this point, you should be able to pass the unit tests in DoubletTest.
Note: Please don't treat the output above as a unit test: the numbers you get will vary depending on the order you generate links. The relative sizes of candidates, max size, etc are important, though: you should find that each manager has its own strengths and weaknesses.
The stack manager tends to find chains that are unnecessarily long. Meanwhile, the queue manager is guaranteed to find the shortest solution, but can take gobs of memory. Both of these managers somewhat unintelligently choose which candidate to explore next. Can we perhaps do better?
When new chains are added to a queue, they are necessarily at the end. Thus the queue manager returns the candidate with the shortest chain for examination. While this is an advantage when several chains have the same length, we have no way of distinguishing which may be better. For instance, which of the following two chains would you prefer to be working from for the doublet head/tail?
If you've played the game before, you'd probably pick the first. Why? After five moves, the end of the first chain differs from the goal word by only a single character. By contrast, the end of the second chain differs from the goal word in all four characters. We have an intuitive sense that the first is better. Can we formalize this somehow?
Instead of returning the shortest chain for examination, we would like to return the chain with the shortest total estimated solution length. That is, when a particular incomplete chain is extended to yield a solution, what will the solution's length be? We already know how long the candidate chain is, but how can we estimate how much longer the chain must be to reach the solution? One optimistic estimate is to count the number of characters that differ between the chain's end and the goal word. Because you can only change one character at a time, the remaining chain must have at least this many words in it (though perhaps more).
The value of a chain (for a particular doublet) is thus given by
|
If you hadn't already guessed, one of the most efficient ways to keep a collection of things and quickly access the "best" is a priority queue. In this part, we will build up a PriorityQueueChainManager that extends the ChainManager abstract class. You can begin by creating a skeletal implementation.
Java's PriorityQueue uses the elements' so-called "natural ordering", which is revealed by the compareTo method. That means the items it stores must be Comparable. To this point, you've likely been storing items inside the stack and queue as Chains. However, these are not comparable. While it might make sense to compare chains by their length, our new manager will require that we compare candidate chains by their total estimated length.
One might argue that a chain is independent of a solution or even the doublets or the search strategy; it is just a sequence of words. Your prior managers knew nothing about the doublets, chain length (explicitly), or even the search they played a part in. Because we might use estimation methods other than the one outlined above, the estimated total length is something that should be the chain manager's responsibility for calculating. In addition, it is often true in real-world practice you are handed classes and objects and you cannot change them. Consider that to be the case here: the Chain class simply cannot be modified.
Constructor Add a constructor to your PriorityQueueChainManager class that takes the ending word as a parameter (which should then be a member of your class).
Distance Estimate Add a private method to your PriorityQueueChainManager class that takes a string and returns the number of characters that differ from the target string. It may take as a precondition that the strings are the same length.
Entry Class Because chains are not directly comparable using their length, we must create a new composite type to store in the priority queue. This new type will use a sum of the chain's length and the "distance from target" of the chain's last word, as defined above, as the basis of its natural ordering.
Create an inner class of PriorityQueueChainManager called Entry. Its constructor should take a Chain as a parameter, calculate and store the total value of that chain using its length and your distance estimate, as outlined above.
The class should implement Comparable using the total value as the basis for comparison.
You should now be able to complete the implementation of your priority queue-based chain manager. A call to next should always return the chain that has the lowest value (length plus differences).
Augment your Doublets program so that it can now use the new search strategy as an option. That is, your program
asks:
Enter chain manager (s: stack, q: queue, p: priority queue, x: exit): p
PS: once you finish this part, you will have implemented a cornerstone of artificial intelligence (AI): A* search! http://en.wikipedia.org/wiki/A*_search_algorithm
To enable you to focus on the non-routine parts of this program, we have provided you with a small amount of starting code and several test classes.
In ChainManager.java, we have given you a basic framework and some variable declarations that you may find helpful. Feel free to change anything except the signature of the methods, since we may write test code that calls these directly. You may wish to write additional methods to help with your computations.
There are unit tests for the Link, Chain, and Doublets classes, which are some of the first you'll need to write. PQPerformanceTest.java will check to see if your PriorityQueue-based solutions are better than the others. And of course you should test your complete interactive program.
Turn-in your programming work by committing and pushing to your repository. (Of course, you should have committed many times by now). Be sure that your commit message contains the name(s) of the driver, as in "worked as pair, Mary drove".
Link unit tests | 20 |
Chain unit tests | 10 |
Doublets unit tests (for stack and queue chain managers) | 22 |
PQPerformance unit tests (mostly on PriQ, 3 points are for saying no path when no path exists) | 23 |
Program works interactively (-2 if crashes when start or end word isn’t in dictionary, -2 if crashes when start and end words have different lengths) | 15 |
Efficiency. This is to make sure that getCandidates is O(1), it doesn't take more than 30 seconds to load Dict35, or reloads the dictionary each time it looks for a chain. | 10 |
Style. Good layout, comments, variable names, good use of methods to break up the task, use of local variables instead of fields where appropriate, use meaningful constant names to avoid magic numbers. Up to -10 for egregious problems. | |
Total | 100 |