Code Toolkit: Python, Fall 2020

Week 12 — Wednesday, November 18 — Class notes

In-class review of last week

In class we debugged Emma's work for part 1 of last week's homework. In working on that we came up with three different ways of implementing a solution. Here is the file that we came up with, which includes all three approaches: week11-hw-part1-mags.py. I encourage you to step through each of these.

There were also some questions about part 2 of last week's homework but we didn't have time to get to those. I think the technique being developed there is one that will be pretty useful for many people with final projects. If you have any questions about it, please send me a Gist or an email. If there is interest we can take some time to review it together as a group.

Networking and the web

Last week we ended with a tiny glimpse into the world of networking: we wrote code that made requests from Processing running on our computers, to Python code running on a different computer which returned JSON data. This allowed all of our Processing programs to access the same shared data by making requests to the same Python program. This kind of arrangement is known as a client-server model of computing, sometimes described as a client-server architecture, "architecture" being a computer science term for the structure of a digital system.

All of the Processing programs that we wrote and then ran in this case were considered the clients, and the Python code running on a different machine is what we would refer to as the server. There are many different types of servers: file servers, email servers, login servers, and thousands of other possibilities. The server that we connected to last week is what is called a web server, because it implements HTTP, which stands for hypertext transfer protocol. By "implements" I mean that this server knows how to "speak" this protocol: it has been programmed to recognize the commands that this protocol defines, and knows what commands to issue in response.

Request and response

You have probably already inferred that HTTP, like most protocols, is structured around requests and responses. The request is a bit of data that a client sends to a server, and the response is the data that the server sends back. HTTP is the protocol of the world wide web. It is the communication protocol used to transmit all web pages. This is why most URLs start with "http://". (Sometimes they start with other things like ftp://, https://, or even file:// when you are viewing a file located on your own computer.) HTTP is a convenient protocol to experiment with, because in its most basic usage, the requests and responses are plain text. Let's experiment ...

Open up a command shell (e.g. Terminal), run Python, and type the following commands (remember, don't type the command prompts $ or >>>):

$ python
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(("google.com",80))
>>> s.send(b'GET / HTTP/1.0\n\n')
16
You have just sent an HTTP request!

This is using something called a socket. A socket is kind of like a file, in that it is something that your code can write to and read from, but it is not actually a file saved on your local disk. Rather, it is an open channel of communication with another computer, over a network. A socket is usually specified by an IP address (the numerical internet address of the computer you are connecting to, or a name like google.com that gets translated into one) and a port number, which you can think of as the particular window on that computer to which you want to connect. Most computers need to do more than one network thing simultaneously (e.g. visit web pages, send and receive emails, view Zoom meetings, transfer files), and these multiple simultaneous functions are managed through separate connections that are each identified by their own port number. The port for the web is usually 80, so when doing testing, people will often use variations of this, like 8000, 8080, or 8888. (Port numbers below 1024 are reserved for special functions and usually require special permissions, but higher numbers can be used by anyone.)

The s.send command above should have displayed a number. This is just a status output telling you the number of bytes that were sent: in this case 16, corresponding to the 16 characters of my request. (If you are reading closely, you will have noticed that there appear to be 18 characters in my request, but that is because \n is an escape code and so counts as one character, meaning "newline", like pressing enter.)
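If you would like to see both ends of a socket without touching the internet, here is a minimal sketch that makes the two sockets talk to each other on your own machine. It is only an illustration: the variable names are made up for this example, and asking for port 0 (which tells the operating system to pick any free port) is a convenience I am assuming here rather than something we did in class.

```python
import socket

# A listening socket plays the role of the server. Port 0 asks the
# operating system to choose any free port for us.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# A second socket plays the role of the client and connects to that port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
connection, client_address = server.accept()

# Writing to one end of the channel ...
request = b'GET / HTTP/1.0\n\n'
bytes_sent = client.send(request)

# ... makes the same bytes readable at the other end.
received = connection.recv(1024)

print(bytes_sent)                      # number of bytes actually sent
print(len(r'GET / HTTP/1.0\n\n'))      # raw string: each \n counts as two characters
print(received == request)

for sock in (connection, client, server):
    sock.close()
```

The second print uses a raw string (the r prefix) so that Python does not interpret \n as a newline, which is where the count of 18 visible characters comes from.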

Next, we want to get the response from the server. To do that, type the following:

>>> data = s.recv(1024)
>>> repr(data)
This should print the beginning of the server's response (up to 1024 bytes, since that is the limit we passed to recv): some HTTP header lines followed by a chunk of HTML, which is the webpage that google.com returns for a basic request. You can experiment with the string in quotes above by replacing google.com with a different website that you'd like to visit. And you can also modify the / in the request to ask for a different page. For example, if I wanted to access an article on nytimes.com today, I would do the following:
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(("nytimes.com",80))
>>> s.send(b'GET /2020/11/18/health/pfizer-covid-vaccine.html HTTP/1.0\n\n')
59
and then receive the response using the same lines as above. This request is actually giving me an error at the moment as I type this, which you can tell because the response says:
" 'HTTP/1.1 500 Domain Not Found ... 
But that's OK. HTTP is a complicated protocol and we should not stress if we cannot make it work perfectly by hand. The point of this experiment is to see how the back-and-forth of request and response works. In this response, you see a status code of 500, which is the part of the HTTP protocol that indicates something went wrong on the server's side.
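Since the response is plain text, you can pull the status code out of it with ordinary string operations. The following is a rough sketch, not a complete HTTP parser: the function names are invented for this example, and the sample response is a shortened stand-in modeled on the error above rather than real server output.

```python
def parse_status_line(response_bytes):
    # The status line is the first line of the response, for example:
    # HTTP/1.1 500 Domain Not Found
    first_line = response_bytes.split(b"\n", 1)[0].strip()
    parts = first_line.split(b" ", 2)
    version = parts[0].decode("ascii")
    code = int(parts[1])
    reason = parts[2].decode("ascii") if len(parts) > 2 else ""
    return version, code, reason


def describe_status(code):
    # HTTP groups status codes into families by their first digit.
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


# Made-up sample data for illustration only.
sample = b"HTTP/1.1 500 Domain Not Found\r\n\r\n"
version, code, reason = parse_status_line(sample)
print(code, reason, "->", describe_status(code))
```

You could pass the data variable from the experiment above to parse_status_line in place of the sample.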

A simple webserver

Now that we have made some HTTP requests to a server somewhere on the internet, let's try to set up our own webserver and make requests to that. Python actually comes bundled with a very simple webserver library that implements HTTP, called SimpleHTTPServer. (That is its Python 2 name; in Python 3 it became part of the http.server module.) Using this, we can experiment with running our own very simple webserver locally on our computers.

First, make a new directory (folder) in Finder (or Explorer), create a new file in Atom, add some simple HTML code, and save it into this new folder as hello.html. You can use this basic HTML:

<html>
  <body>
    <h1>Hello!</h1>
    <p>This is some basic HTML</p>
  </body>
</html>
Now, cd into this new directory, and type the following command:
$ python -m SimpleHTTPServer 8000
What this does is run a simple webserver, locally on your machine, using port 8000. (If your python command runs Python 3, the equivalent command is python -m http.server 8000.) In the terminal, you are now viewing a log of all the requests that the server receives, and messages about how it is handling responses to them. Open up a browser and try visiting this local webserver. The address 127.0.0.1 always refers to your own computer (it is known as the loopback address), and remember that we are using port 8000, which you specify after a colon, so type this into your browser: 127.0.0.1:8000. You should see something happening in the log, and your browser should show you a list of the contents of this directory. If you click on hello.html, you should now see your HTML file displayed.

When you are ready to exit, you can type control-C to stop the webserver and return to the command line. Try adding additional files and subfolders to the folder of this simple webserver and re-running the server. As you click around to HTML files in your browser, note how the URLs of these pages correspond to the folder and subfolder structure that you create. This is the principle behind how all websites and URLs are structured: the IP address and port refer to a computer program (a server) running on a computer somewhere, and the rest of the URL with all of its slashes (/) refers to folders and subfolders which that server program can see.
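To make that correspondence concrete, here is a simplified sketch of how a server might turn the path part of a URL into a file location on disk. This is an illustration of the idea, not the code that Python's webserver actually uses: the function name and the folder name my-site are hypothetical, and a real server also handles details like %-encoded characters.

```python
import os


def url_path_to_file(root_folder, url_path):
    # Ignore anything after a "?" (that part carries form values, not a file name).
    path_only = url_path.split("?", 1)[0]
    # Split on slashes, dropping empty pieces and any "." or ".." so that
    # a request can never climb out of the folder being served.
    pieces = [p for p in path_only.split("/") if p not in ("", ".", "..")]
    return os.path.join(root_folder, *pieces)


# "my-site" is a hypothetical folder that the server was started in.
print(url_path_to_file("my-site", "/hello.html"))
print(url_path_to_file("my-site", "/projects/week12/hello.html"))
```

Each slash in the URL becomes one step down into a subfolder, starting from the folder where the server was launched.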

Running Python code in a webserver

OK, well that is nice and fun, but what if you want a webserver that does more than just return static HTML pages to a user? What if you want a webserver that dynamically generates content using all that you have learned about coding, and returns the output of a program to the web user?

For this, we can use a technique called CGI, which stands for common gateway interface. This is a technique where a server receives an HTTP request, but instead of simply returning an HTML file, the server executes a computer program, and the output of that program is returned as the HTTP response. It is similar to running a Python program on the command line, but instead of typing a command to execute, a user on the web requests a URL with their browser, which then invokes the program, and the browser then displays the response. In a way, it is also somewhat similar to all the work that we have been doing in Processing, except instead of interactive keyboard and mouse input, the only inputs come from the request, and instead of displaying visual graphic results, you can only return textual content in the web response.
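The core idea can be sketched in a few lines: run a program, capture whatever it prints, and send that back as the response. The sketch below is a simplified illustration and not how Python's CGI server is actually written. The function name is made up, and the "program" is just a short string of Python code standing in for a script file.

```python
import subprocess
import sys

# Stand-in for a CGI script: a tiny program that prints a header line,
# a blank line, and then some HTML that it computes.
SCRIPT_SOURCE = """
print("Content-Type: text/html")
print("")
print("<p>2 + 2 = " + str(2 + 2) + "</p>")
"""


def respond_by_running_program(script_source):
    # Run the program as a separate process and capture what it prints.
    program_output = subprocess.check_output([sys.executable, "-c", script_source])
    # The server supplies the status line; the program supplied everything else.
    return b"HTTP/1.0 200 OK\r\n" + program_output


response = respond_by_running_program(SCRIPT_SOURCE)
print(response.decode("ascii"))
```

Notice that the HTML in the response was never saved in a file anywhere: it only exists because the program ran at the moment of the request.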

Let's experiment with this ...

Create a subfolder called cgi-bin. Within that subfolder, create a new file called server.py, and open that in Atom. In the terminal, make sure you are in the folder that contains cgi-bin (not inside cgi-bin itself), and make sure that your server.py file is executable. You can check this by typing the following (remember, don't type the dollar signs):

$ ls -l cgi-bin
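If the file is executable, the string of letters at the start of its line will include the letter x (for example -rwxr-xr-x). It probably is not executable by default, so to make sure that it is, type the following command: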
$ chmod a+x cgi-bin/server.py
(This command is short for "change mode" and it allows you to change read, write, and execute permissions on files.)

The first line of this file should be the executable path to Python on your system. Type:

$ which python
/usr/local/bin/python
For me, that outputs /usr/local/bin/python, but for you it may be different. Whatever it says, copy that output, and paste it in as the first line of server.py, preceded by a pound sign and an exclamation mark (#!), like this:
#!/usr/local/bin/python
Now you can add any arbitrary Python code into this program. The first thing your program must print, to follow the HTTP protocol, is a Content-Type header followed by a blank line. So put the following commands in server.py:
print("Content-Type: text/html")
print("")
After that, try adding some other Python commands. For example, for now you could simply add:
print("Hello, world!")
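Putting those pieces together, here is one possible sketch of a complete server.py that returns something different each time it is requested. The build_response function and the idea of showing the current time are my own additions for illustration, and the first line assumes the Python path from my machine, so substitute whatever which python printed for you.

```python
#!/usr/local/bin/python
# (Replace the path above with the output of `which python` on your system.)
import datetime


def build_response(now):
    # The header and the blank line must come first; everything after
    # the blank line is the content that the browser will display.
    lines = [
        "Content-Type: text/html",
        "",
        "<html><body>",
        "<h1>Hello, world!</h1>",
        "<p>The time on the server is " + now.strftime("%H:%M:%S") + "</p>",
        "</body></html>",
    ]
    return "\n".join(lines)


print(build_response(datetime.datetime.now()))
```

Each time you reload the page in your browser, the server runs this program again, so the time shown should change.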

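Now try running an HTTP server in this directory. But we can't use the same one as before. We have to use a webserver that is capable of serving CGI scripts. Fortunately Python also comes with a library that does this. (On Python 3, the closest equivalent to the command below is python -m http.server --cgi 8000, though recent Python 3 releases have deprecated CGI support.) So run the following command: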

$ python -m CGIHTTPServer 8000
And now in your browser try visiting: 127.0.0.1:8000/cgi-bin/server.py

In the homework for this week, I explain how you can create an HTML page with a form into which a user can enter values, which will then be made available as a dictionary of key-value pairs to your Python CGI script. This will allow your CGI script to treat the user values from the HTML form as inputs, and work with those to generate a dynamic response.
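The homework covers the actual technique, so treat the following only as a rough sketch of the underlying idea. When a form is submitted with a plain GET request, its values travel in the URL after a question mark, in a form like name=Emma&color=blue, and a CGI server hands that text to your script in an environment variable called QUERY_STRING. The function name form_values and the field names name and color are hypothetical, chosen just for this example.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2


def form_values(query_string):
    # parse_qs returns a list of values for each key, because a form can
    # repeat a field. For simplicity, keep only the first value of each.
    parsed = parse_qs(query_string)
    return {key: values[0] for key, values in parsed.items()}


# Made-up example of what a submitted form might look like.
values = form_values("name=Emma&color=blue")
print(values["name"])
print(values["color"])
```

In a real CGI script you would read the text from os.environ.get("QUERY_STRING", "") instead of typing it in by hand.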