Code as a Liberal Art, Spring 2021

Unit 2, Tutorial 3 homework

Due: Tuesday, March 29, 8pm

  1. Review the class notes for this week.
  2. Reminder: based on how far we got in lecture, the expectations for this week have been adjusted as follows. The struck-out text below will be carried over to next week's homework, which will lead into the Unit 2 presentations.

  3. Try some data scraping that finds unstructured data and adds it to a data structure.

    Find a website that has a search function, search for something, and use the browser inspector to examine the HTML of the results. For example, you might search for "widgets" on eBay.

    View the page source (or use your browser's inspect tool) and try to find a general rule that you can use to consistently locate some data point on the page. For example, try to locate 2-3 data points for each item in the search results.

    Make a new Python file in Atom for this program. Copy/paste the URL into your Python code, use requests.get() to fetch that page, get its content, and create a BeautifulSoup object from it.
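    The setup for this step might look like the sketch below. BeautifulSoup doesn't care where the HTML came from, so for illustration we parse a small hard-coded string; in your actual program you would replace the `html` variable with the content fetched by `requests.get()` from the URL you copied.

```python
from bs4 import BeautifulSoup

# In your real program:
#   import requests
#   html = requests.get("PASTE YOUR SEARCH URL HERE").content
# Here we use a tiny stand-in page so the sketch is self-contained.
html = "<html><head><title>Search results</title></head><body></body></html>"

# Create the BeautifulSoup object from the page content.
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)
```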

    Create a new, empty data structure to store your data. Use a list of dictionaries, where each item in the list (each dictionary) corresponds to one of the search results, and the key/value pairs of the dictionary are the properties of one search result.
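    As a sketch, the "list of dictionaries" shape could look like this. The keys here (`title`, `price`) are made up for illustration; use whichever properties you actually decided to scrape.

```python
# Start with an empty list; your scraping loop will fill it in.
results = []

# Each search result becomes one dictionary appended to the list.
results.append({"title": "Deluxe Widget", "price": "$9.99"})
results.append({"title": "Widget Pro", "price": "$19.99"})

# Each item in the list is one result; each key/value pair is one property.
print(len(results))
print(results[0]["title"])
```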

    Now, using BeautifulSoup, target the HTML element(s) containing the data that you are interested in and add that data to your data structure.
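    A sketch of what that targeting can look like, using find_all(). The class names here (`s-item`, `s-item__title`, `s-item__price`) are hypothetical stand-ins for whatever rule you found in the inspector, and the HTML string stands in for the real page you requested.

```python
from bs4 import BeautifulSoup

# Stand-in HTML imitating a page of repeated search results.
html = """
<ul>
  <li class="s-item"><h3 class="s-item__title">Deluxe Widget</h3>
      <span class="s-item__price">$9.99</span></li>
  <li class="s-item"><h3 class="s-item__title">Widget Pro</h3>
      <span class="s-item__price">$19.99</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

results = []
# Loop over every element matching the rule for one search result...
for item in soup.find_all("li", class_="s-item"):
    # ...and pull out each data point, adding one dictionary per result.
    results.append({
        "title": item.find("h3", class_="s-item__title").get_text(),
        "price": item.find("span", class_="s-item__price").get_text(),
    })

print(results)
```

    Note that the same rule (the class names) locates every result on the page, which is what makes the scrape consistent.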

    At the end of your program, print out the data structure. Let's use Python's pprint module. Add the following lines to your program (at the end):

    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    
    Then, after that, add a line like this:
    pp.pprint(YOUR DATA STRUCTURE)
    
    replacing YOUR DATA STRUCTURE with the name of your data structure.
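    Put together, the end of the program might look like this (with `results` standing in for whatever you named your data structure):

```python
import pprint

# Hypothetical scraped data, for illustration.
results = [
    {"title": "Deluxe Widget", "price": "$9.99"},
    {"title": "Widget Pro", "price": "$19.99"},
]

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(results)               # prints the structure, nicely indented
formatted = pp.pformat(results)  # same output, returned as a string
```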

  4. Create a Google Doc note that reflects on this. Can you think of some things that you might be able to do with this data now that it's what we could call structured? Can you link this back to any concepts from this week around advantages and disadvantages of distant reading and algorithmic methods for analyzing texts? Think about how this technique could be applied to other examples and at other scales.