Due: Tuesday, March 29, 8pm
Reminder: based on how far we got in lecture, the adjusted expectations for this week are as follows. The struck-out text below will be carried over to next week's homework, which will lead into the Unit 2 presentations.
Try some data scraping that finds unstructured data and adds it to a data structure.
Find a website that has a search function, search for something, and use the browser inspector to examine the HTML of the results. For example, search for "widgets" on eBay.
View the page source (or use your browser's inspect tool) and try to find a general rule that you can use to consistently locate some data point on the page. For example, try to locate 2-3 data points for each item in the search results.
Make a new Python file in Atom to create a program for this. Copy/paste the URL into your Python file, make a request to get() that page, get its content, and create a BeautifulSoup object.
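One way this step might look, as a minimal sketch. It assumes the requests and beautifulsoup4 packages are installed. The helper name fetch_soup is made up for illustration, and the timeout value and the "html.parser" choice are assumptions, not requirements of the assignment.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url):
    """Request a page and return its HTML parsed into a BeautifulSoup object.

    fetch_soup is a hypothetical helper name used only for this sketch.
    """
    # Make the request; the 10-second timeout is an arbitrary choice.
    response = requests.get(url, timeout=10)
    # Raise an error right away if the site answered with a 4xx/5xx status.
    response.raise_for_status()
    # response.content holds the raw bytes of the page.
    # "html.parser" is the parser that ships with Python.
    return BeautifulSoup(response.content, "html.parser")
```

To use it, call soup = fetch_soup(url) with the search results URL you copied from your browser. Some sites block or alter responses to scripted requests, so the HTML you get back may differ from what the inspector shows.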
Create a new, empty data structure to store your data. Use a list of dictionaries, where each item in the list (each dictionary) corresponds to one of the search results, and the key/value pairs of the dictionary are the properties of one search result.
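The shape of that data structure can be sketched as follows. The keys ("title", "price") and the values are made up and typed in by hand just to show the layout; your own keys depend on which data points you chose.

```python
# Start with an empty list; each search result becomes one dictionary in it.
results = []

# Hypothetical values, added by hand only to show the shape of the data.
results.append({"title": "Blue widget", "price": "$4.99"})
results.append({"title": "Red widget", "price": "$5.49"})

# Each item in the list is a dictionary, so a property is looked up by key.
first_title = results[0]["title"]
```

In your real program the append() calls would happen inside a loop over the search results rather than being written out one by one.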
Now, using BeautifulSoup, target the HTML element(s) containing the data that you are interested in and add them to your data structure.
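A minimal sketch of this targeting step, run against a small made-up HTML string so it works without a network connection. The tag names and class names ("result", "title", "price") are assumptions for illustration. A real site will use different ones, which is why the inspection step matters.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a search results page. A real site's tags
# and class names will differ, so inspect the page to find the right ones.
html = """
<ul>
  <li class="result"><h3 class="title">Blue widget</h3><span class="price">$4.99</span></li>
  <li class="result"><h3 class="title">Red widget</h3><span class="price">$5.49</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
results = []

# find_all() returns every element matching the tag name and class.
for item in soup.find_all("li", class_="result"):
    title_tag = item.find("h3", class_="title")
    price_tag = item.find("span", class_="price")
    # Skip any result that is missing one of the data points.
    if title_tag is None or price_tag is None:
        continue
    # One dictionary per search result, added to the list.
    results.append({
        "title": title_tag.get_text(strip=True),
        "price": price_tag.get_text(strip=True),
    })
```

Checking for None before calling get_text() keeps the program from crashing when one result is laid out differently from the rest, which is common on real search pages.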
At the end of your program, print out the data structure. Let's use the PrettyPrinter class from Python's pprint module. Add the following lines to your program (at the end):
import pprint
pp = pprint.PrettyPrinter(indent=4)
Then, after that, add a line like this:
pp.pprint(YOUR DATA STRUCTURE)
replacing YOUR DATA STRUCTURE with the name of your data structure.