Due: Wednesday, April 9, 8pm
Make sure the code we worked on in class runs correctly.
Try running your Markov data structure generator on another sample input. Let's try Jorge Luis Borges' "The Garden of Forking Paths".
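If you want something to check your in-class code against, here is a minimal sketch of the data structure that the expected outputs below appear to assume. It is a dict mapping each word to a list of every word that follows it, with duplicates kept and punctuation left attached, which is why entries like 'train.' show up. It assumes plain whitespace tokenization with str.split(). The function name build_markov is hypothetical, and the version from class may differ.

```python
import pprint


def build_markov(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that immediately follow it.

    Assumes whitespace tokenization: punctuation stays attached to words,
    and repeated successors are kept so that frequency is preserved.
    """
    words = text.split()
    markov: dict[str, list[str]] = {}
    # zip pairs each word with its successor; the final word has none
    for current, following in zip(words, words[1:]):
        markov.setdefault(current, []).append(following)
    return markov


if __name__ == "__main__":
    sample = "In the beginning the garden forked. In the end the paths met."
    pprint.pp(build_markov(sample)["the"])  # ['beginning', 'garden', 'end', 'paths']
```

To run it on the story, read your copy of the text with open(path, encoding="utf-8") and pass the file's contents to build_markov. The path is whatever you saved the file as.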
Try using pprint.pp() to print out the next-word lists for a few select words. Compare your outputs and make sure they are identical to these:
pprint.pp(markov["In"])
['his', 'despite', 'the', 'ten', 'all', 'all', 'the', "Ts'ui", 'the', 'the', 'your', 'some', 'this', 'another,', 'yet', 'the', 'point']
pprint.pp(markov["by"])
['thirteen', 'fourteen', 'Dr.', 'a', 'train.', 'making', 'besting', 'an', 'a', 'the', 'distance.', 'clapping', 'the', 'the', 'many', 'the', 'one', 'them', 'means', 'the', 'careless', 'lightning.', 'the', 'the', 'the']
pprint.pp(markov["other"])
['men,', 'men,', 'enterprise', 'than', 'times.', 'bifurcations.', 'possible', 'through', 'dimensions', 'course']
Can you find any other interesting outputs from running this code on this input data? Upload any that you find.
There seem to be a lot of garbage characters and some odd formatting in the David Copperfield text. Try cleaning up that file and see whether doing so improves the behavior of the Markov process. Does the data structure look good, or is there junk in it? What can you do to "clean up" this data? Post your results.
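Below is one possible cleanup pass, sketched under assumptions about what the junk is: stray invisible characters such as a byte-order mark, typographic quotes and dashes, and hard line breaks. Inspect your copy of the file to see what it actually contains before relying on any of these rules. The REPLACEMENTS table and both function names are hypothetical. suspicious_keys is a rough check you can run on the finished data structure to list keys containing characters outside a plain-ASCII word pattern.

```python
import re
import unicodedata

# Assumed replacements: typographic punctuation mapped to plain ASCII so that,
# e.g., a curly and a straight apostrophe don't produce two different keys.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # single curly quotes
    "\u201c": '"', "\u201d": '"',  # double curly quotes
    "\u2014": " ", "\u2013": " ",  # em and en dashes, which often join two words
}


def clean_text(raw: str) -> str:
    """Normalize Unicode, simplify punctuation, and drop invisible characters."""
    text = unicodedata.normalize("NFKC", raw)
    text = text.translate(str.maketrans(REPLACEMENTS))
    # Drop control and format characters (e.g. a byte-order mark) but keep whitespace.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse runs of whitespace, including line breaks, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def suspicious_keys(markov: dict[str, list[str]],
                    allowed: str = r"[A-Za-z0-9'\".,;:!?()-]+") -> list[str]:
    """List keys containing any character outside `allowed` (a rough junk detector)."""
    pattern = re.compile(allowed)
    return sorted(k for k in markov if not pattern.fullmatch(k))
```

Feed the output of clean_text to your generator, then compare what suspicious_keys reports before and after. It will also flag legitimate words with accented letters, so read its output by eye rather than deleting everything it lists. Mapping dashes to spaces is a judgment call: it splits dash-joined words into separate keys, which may or may not be what you want.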