Assigned: Thursday, March 27
Due for in-class presentation, Thursday, April 17 (with 2-4 people presenting works-in-progress earlier on Tuesday, April 15)
Final draft of code: due in your Google Drive folder on Friday, April 18, 8pm.

Using "scrapism" techniques, create an "active archive". These terms are concepts borrowed from our reading by Sam Lavigne, so consider reviewing that text to make sure you are clear on their meanings.
Start by using scraping techniques to build a corpus of text. You can either scrape pages from a list of URLs you choose by hand, or write a crawler that follows links to gather text automatically.
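However you gather pages, the core of the scraping step is extracting readable text from HTML. A minimal sketch using only the standard library's html.parser is below; the HTML string is a stand-in for a page you would actually download (for example with urllib or the requests library), and the choice to keep only &lt;p&gt; tags is just one reasonable heuristic.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text inside <p> tags, ignoring everything else."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Only keep text that appears inside a paragraph tag
        if self.in_p and data.strip():
            self.chunks.append(data.strip())

# A literal string stands in for a fetched page here.
html = ("<html><body><p>First paragraph.</p>"
        "<div>navigation junk</div>"
        "<p>Second paragraph.</p></body></html>")

parser = TextExtractor()
parser.feed(html)
corpus_text = "\n".join(parser.chunks)
print(corpus_text)
```

In your actual build_corpus.py you would run this extraction over each downloaded page and write the results into text files in your corpus folder.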
Ultimately, you will submit a written text or collection of texts that have been generated by algorithm and the data of your corpus.
We will see how to use a Markov chain to build a data structure that models an entire corpus and the relationships of words within it. You will also see how to use this Markov chain to generate new sentences or phrases that seem like they could plausibly be from that corpus.
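The data structure in question can be sketched as follows: map each word (or run of words) to the list of words observed to follow it, then generate by repeatedly sampling a successor. This is a minimal order-1 version; the toy corpus string is only illustrative.

```python
import random
from collections import defaultdict

def build_model(text, order=1):
    """Map each tuple of `order` consecutive words to its observed successors."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model

def generate(model, order=1, length=12, seed=None):
    """Walk the chain from a random starting key, one sampled word at a time."""
    rng = random.Random(seed)
    key = rng.choice(sorted(model))
    out = list(key)
    while len(out) < length:
        choices = model.get(tuple(out[-order:]))
        if not choices:  # dead end: this key was never followed by anything
            break
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran on the rug"
model = build_model(corpus, order=1)
print(generate(model, order=1, length=8, seed=1))
```

Raising `order` makes output sentences more faithful to the corpus but less surprising; with a small corpus, order 1 or 2 usually gives the most interesting results.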
Your final output can be entirely Markov-generated, or can be generated in a more structured way (think like a Mad Lib), or some hybrid of these two approaches. In other words, maybe you might use a Markov process to generate sentences within the larger template of a story, legal document, or some other template.
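The template approach can be sketched like this. The slot vocabularies below are hypothetical examples; in the hybrid version, each slot's candidate phrases could instead be generated by your Markov process from the scraped corpus.

```python
import random

# Hypothetical slot vocabularies -- in a hybrid approach these lists
# could be Markov-generated phrases drawn from your corpus.
slots = {
    "party": ["the plaintiff", "the undersigned"],
    "ruling": ["hereby dismissed", "granted with prejudice"],
}

template = "Upon review, the motion of {party} is {ruling}."

rng = random.Random(0)
filled = template.format(**{name: rng.choice(options)
                            for name, options in slots.items()})
print(filled)
```

The fixed template supplies the overall shape of a legal document (or story, recipe, and so on), while the slot contents carry the voice of your corpus.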
Refer back to Unit 2 lessons 2-4 for how to implement this.
Your Unit 2 Project folder in Google Drive should have two separate Python programs:

- build_corpus.py, which will include your web scraping and/or crawling code, and will generate one or more text files within a subfolder called corpus.
- markov.py, which will read from the file(s) in this corpus and generate your output in a folder called output.
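The expected folder layout can be sketched with pathlib. A temporary directory stands in for your project folder, and uppercasing the text is just a placeholder for your Markov generation step.

```python
from pathlib import Path
import tempfile

# A temporary directory stands in for your Unit 2 Project folder.
project = Path(tempfile.mkdtemp())

# build_corpus.py's job: write scraped text into corpus/
corpus_dir = project / "corpus"
corpus_dir.mkdir()
(corpus_dir / "scraped.txt").write_text("Example scraped text.\n")

# markov.py's job: read every corpus file, generate, write into output/
output_dir = project / "output"
output_dir.mkdir()
text = "".join(p.read_text() for p in sorted(corpus_dir.glob("*.txt")))
(output_dir / "generated.txt").write_text(text.upper())  # placeholder for Markov generation
```

Keeping the two stages in separate scripts means you can re-run markov.py many times against the same corpus without re-scraping.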
To complete the assignment, experiment with running the Markov algorithm several times and adjusting it until you get something that you like. Then, in class, you can present your output text, the code used to generate it, and the code used to build your corpus.