Code Toolkit: Python, Fall 2020

Week 15 — Wednesday, December 8 — Class notes

Networking, continued

Table of contents

(The following links jump down to the various topics in these notes.)

  1. Review of last week
  2. Running Python in a webserver with CGI
  3. Input variables to CGI
  4. Example: CGI interactive dictionary
  5. Putting it all together

I. Review of last week

Last week we discussed network protocols, focusing mainly on the Hypertext Transfer Protocol (HTTP): the protocol of the World Wide Web, which browsers use to retrieve web pages. With HTTP, browsers issue requests to web servers, which reply in turn with responses containing the requested page.

We saw how you can use the command line tool nc to manually issue HTTP requests, receiving responses that include HTTP headers, which include status codes. But this was really tedious to do.
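
To make the shape of that exchange concrete, here is a small sketch of the text that travels in each direction. The helper names (build_get_request, parse_status_line) and the sample response are made up for illustration; they are not taken from last week's code.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET {p} HTTP/1.1".format(p=path),
        "Host: {h}".format(h=host),
        "Connection: close",
        "",  # the headers end with a blank line
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response_text):
    """Pull the numeric status code and the reason out of a raw response."""
    first_line = response_text.split("\r\n", 1)[0]
    version, code, reason = first_line.split(" ", 2)
    return int(code), reason


# A made-up response, shaped like what a server sends back
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>Hello</html>"
)

print(build_get_request("example.com"))
print(parse_status_line(sample_response))
```

The request is what you typed by hand into nc; the first line of the response is where the status code lives.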

We then saw how you can open network connections in Python using a socket object, which you can read from and write to, much as you would with the input() command or a file object. This too was quite tedious.

Then I explained that Python offers utilities that implement protocols like HTTP in ways that are much easier for you as a programmer to use. The one that we'll be looking at in class is called urllib.

Then we revisited the networking example from week 13, and saw how we could put that all together to create Processing programs that ran and displayed the same data, coordinated by a web server. Each Processing program sent data to that server, and received back a list of JSON data, which it would then render to the screen. This is the principle you will use to create a networked final project.

Finally, I showed you how you could run your own webserver on your computer, using the Python utility:

$ python -m http.server 8000
Whatever directory you are in when you run this command will be shareable as the root directory of a website. All the files in that directory (as well as any subdirectories) will be shared by the webserver as long as it's running, and you can visit this local website by typing http://127.0.0.1:8000 into your browser.

This convenient Python utility only serves static pages: it simply lets the user click on any file in that directory to view it. But we can do something more interesting, which is to dynamically generate web page responses by running Python code inside a webserver. This is where we ended last week, and where we'll pick up today ...

II. Running Python in a webserver with CGI

Instead of simply creating a website that returns static pages, if you want to use Python code to dynamically generate output that gets displayed as if it were a webpage, you can run Python code in a webserver using something called CGI, which stands for Common Gateway Interface.

To be honest, CGI is an older web technology. It was more popular in the earlier days of the web (circa 2000) and has since been largely eclipsed by more advanced ways of doing this. But this is the easiest way to understand the principle, and with this knowledge as an introduction to the fundamentals, you can later go on to learn shortcuts and more advanced ways of achieving the same result.

With CGI, when a webserver receives a request, instead of simply returning the contents of the file that the request asked for, the server executes that file as a computer program, and the output of the program gets returned to the user as the content of the page that they requested.
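
The "run the file and send back whatever it prints" idea can be sketched in a few lines of Python. This is not how http.server is actually implemented (a real CGI server also passes request details to the program and uses the file's first line to decide how to run it); run_and_capture is a hypothetical helper, just to show the principle.

```python
import subprocess
import sys


def run_and_capture(script_path):
    """Run a Python file as a separate program and return what it printed."""
    result = subprocess.run(
        [sys.executable, script_path],  # the same Python that runs this code
        capture_output=True,
        text=True,
    )
    # A CGI server would send this text back to the browser
    return result.stdout
```

Calling run_and_capture on a file gives you the program's output, rather than the source code that open(...).read() would give you. That difference is the difference between a static server and CGI.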

Let's get started ...

Create a folder for this week, and within it, create a subfolder called cgi-bin. Open Atom. Create a new file, save it, call it server.py, and make sure you are saving it in the cgi-bin folder that you just created.

For CGI to work, the first line of this file needs to be the executable path to Python on your system. On the command line, type:

$ which python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
For me, that returns the above output. For you, it may be slightly different. Whatever it says, copy that output, and paste it in as the first line of server.py, preceded by a pound sign and an exclamation point (#!), like this:
#!/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
This is called a hashbang line (also known as a shebang). It tells the operating system which program should be used to run the file. Now you can add any arbitrary Python code into this program. The first thing you must do in order to comply with the HTTP protocol is to return the following text:
Content-Type: text/html
	  
including the blank line. So put the following commands next into server.py:
print("Content-Type: text/html")
print("")
After that, try adding some other Python commands. For now, you could simply add:
print("Hello, world!")
Next, cd in to the folder that you created for this week (i.e., the folder that contains your cgi-bin folder). Make sure that your server.py file is executable. You can check this by typing the following:
$ ls -l cgi-bin
-rw-r--r--  1 rory  staff  0 Dec  8 02:56 cgi-bin/server.py
You'll probably see something like the above. The rs indicate that this file is readable, and the ws indicate that it is writeable. The file is probably not executable by default, so to make sure that it is, type the following command:
$ chmod a+x cgi-bin/server.py
(This command is short for "change mode" and it allows you to change the read, write, and execute permissions on files.)

Now if you type ls -l again you should see something like this:

$ ls -l cgi-bin/server.py 
-rwxr-xr-x  1 rory  staff  0 Dec  8 02:56 cgi-bin/server.py

The xs here indicate that this file is now executable.
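
If you are curious, the same permission change can be made from Python itself. The following is a minimal sketch for macOS or Linux; the function names are made up for illustration, and the chmod command above is all you need for this class.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for everyone, like chmod a+x."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def permission_string(path):
    """Return the permissions in the same style that ls -l shows them."""
    return stat.filemode(os.stat(path).st_mode)
```

For a file that starts out as -rw-r--r--, calling make_executable and then permission_string should give -rwxr-xr-x, matching the ls -l output above.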

Now try running an HTTP server in this directory. This is similar to the above command with a slight modification:

$ python -m http.server --cgi 8000
And now in your browser try visiting: 127.0.0.1:8000/cgi-bin/server.py

In the terminal, you are now viewing a log of all the requests that the server receives, and messages about how it is handling responses to them. When you are ready to exit, you can type control-C to stop the webserver and get back to the command line.

This is somewhat similar to running a Python program on the command line, but instead of typing a command to execute, a browser makes a web request to a URL, which then invokes the program, and the browser then displays the response.

III. Input variables to CGI

You can pass dynamic variable values to CGI programs as inputs. These are called query parameters, and they are the things that you see in URLs that appear after the question mark (?).

For example, try typing the following URL in to your browser:

https://www.google.com/search?q=gritty

You can change the text after q= (here, gritty) to change your search query. This is how dynamic user input works on the web, including from web forms.
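
To see how a URL like that breaks down into names and values, you can take it apart with Python's urllib.parse library. This is only an illustration of the structure of the URL; it does not contact Google.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "https://www.google.com/search?q=gritty"

parts = urlsplit(url)
print(parts.query)  # the text after the question mark: q=gritty

params = parse_qs(parts.query)
print(params)  # each name maps to a list of values: {'q': ['gritty']}

# Going the other way: build a query string from a dictionary.
# Spaces and other special characters get encoded for you.
print(urlencode({"q": "gritty mascot"}))  # q=gritty+mascot
```

Multiple parameters are separated by an ampersand (&), which is why parse_qs returns a dictionary.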

In a Python CGI script, you can access these variables using the cgi library, and an object called FieldStorage, like this:

import cgi
inputs = cgi.FieldStorage()      

Now, you can treat the variable inputs as if it were a dictionary containing any query parameters and their values as key-value pairs. For example, with the Google URL above, your code might include something like:

import cgi
inputs = cgi.FieldStorage()

search_term = inputs["q"].value

Then the code would implement the search for search_term (whatever the value of that variable is) and return the results to the user. (I find the need to add the .value here somewhat annoying, but that is the way this library works. I could explain why if you are curious.)
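
It may help to know where these values come from. For a URL typed into the browser, a CGI webserver hands the text after the question mark to your program in an environment variable called QUERY_STRING. The sketch below reads it directly instead of using FieldStorage; get_param is a hypothetical helper name, and this only covers parameters that arrive in the URL. It is also an option on newer Python versions (3.13 and later), which no longer include the cgi library.

```python
import os
from urllib.parse import parse_qs


def get_param(name, default=None):
    """Return the first value of a query parameter, or default if missing."""
    query_string = os.environ.get("QUERY_STRING", "")
    params = parse_qs(query_string)
    values = params.get(name)
    if values:
        return values[0]
    return default
```

With the Google URL above, get_param("q") would give "gritty", and asking for a parameter that is not in the URL gives None rather than an error.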

You can read more about this in the official Python documentation here: Common Gateway Interface support.

IV. Example: CGI interactive dictionary

(a) Begin. Let's work through an example of this by adapting the interactive dictionary program that we worked on a few weeks ago.

Let's start with the simplest version of that program that we came up with, from week 11 homework, part 3. Create a new file in Atom, save it as dictionary.py, make sure that you save it into your cgi-bin directory, and make sure that you make it executable with the chmod command above. Enter the following code:

(Note that the first line will need to be adapted to the path to Python on your system that you found above.)

#!/your/path/to/python/here
      
import cgi

print("Content-type: text/html")
print("")

inputs = cgi.FieldStorage()

dictionary = {}

dictionary["apple"] = "a red or green fruit"
dictionary["banana"] = "a yellow or brown fruit"
dictionary["carrot"] = "an orange vegetable"
  
user_text = inputs["word"].value

if user_text in dictionary:
    print("The definition of {w} is:".format(w=user_text))
    print("  " + dictionary[user_text])
else:
    print("hm, I don't know that word.")

This code is importing the cgi library, using FieldStorage to access any query parameters that have been passed in, then accessing the query parameter called word, and trying to look that up in the small dictionary that we create here.

This code also shows how to use the .format() command to include variable values in a string.
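
Here is .format() on its own, outside of the CGI script, so you can try it in isolation. The variable names are just for this example.

```python
word = "apple"
definition = "a red or green fruit"

# Each {name} placeholder is filled in from the matching keyword argument
message = "The definition of {w} is: {d}".format(w=word, d=definition)
print(message)

# The same placeholder can appear more than once
print("{w}, {w}, {w}!".format(w=word))
```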

Make sure your webserver is running (if not, run it with python -m http.server --cgi 8000) and in your browser, try visiting:

http://127.0.0.1:8000/cgi-bin/dictionary.py?word=apple

Try different values for the query parameter and see what you get.

(b) Adding a form. Having to type that input as a query parameter in the URL is pretty awkward and not very user-friendly. What if we offered the user a friendly form they could use to type the input?

First, try deleting the query parameter from the URL and pressing enter. What do you get? I get a blank white screen. Why? Flip over to the command line to see the server error. I see this:

Traceback (most recent call last):
  File "/Users/rory/Documents/Code Toolkit Python Fall 2021/Week 15/cgi-bin/dictionary.py", line 16, in <module>
    user_text = inputs["word"].value
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/cgi.py", line 517, in __getitem__
    raise KeyError(key)
KeyError: 'word'

We're getting a KeyError because we're assuming the query parameter word exists. But if the user doesn't type it in, it won't exist. Let's implement some more graceful error handling:

(Starting at line 14. Note the new indentation.)

dictionary["carrot"] = "an orange vegetable"

if "word" in inputs:
    user_text = inputs["word"].value

    if user_text in dictionary:
        print("The definition of {w} is:".format(w=user_text))
        print("  " + dictionary[user_text])
    else:
        print("hm, I don't know that word.")

So now, if the query parameter word is in the dictionary of user inputs, then look up the word as before. Otherwise, do nothing.

Now let's improve on this a bit to add an HTML form, so the user doesn't have to type the query parameter into the URL. A simple HTML web form for this task looks like this:

<form action='/cgi-bin/dictionary.py'>
  Type a word: <input name='word' type='text'>
  <input type='submit' value='Lookup'>
</form>

But we need to return this from Python, which we can do like this, using triple quotes (""") for a string that spans multiple lines:

dictionary["carrot"] = "an orange vegetable"

if "word" in inputs:
    user_text = inputs["word"].value

    if user_text in dictionary:
        print("The definition of {w} is:".format(w=user_text))
        print("  " + dictionary[user_text])
    else:
        print("hm, I don't know that word.")
    
html = """
<form action='/cgi-bin/dictionary.py'>
Type a word: <input name='word' type='text'>
<input type='submit' value='Lookup'>
</form>
"""
print(html)

So now, we print out a basic HTML form, so that when the user presses "Lookup", it takes them back to this script with the query parameter specified by the HTML form.

(c) Adding another form to add new definitions. As we did last time, let's add functionality for the user to add new word definitions.

If the user's word is not already in the dictionary, let's return a different form that prompts the user to enter a definition:

dictionary["carrot"] = "an orange vegetable"

if "word" in inputs:
    user_text = inputs["word"].value

    if user_text in dictionary:
        print("The definition of {w} is:".format(w=user_text))
        print("  " + dictionary[user_text])
    else:
        print("hm, I don't know the word " + user_text)
        html = """
        <form action='/cgi-bin/dictionary.py'>
        <input type='hidden' name='word' value='{w}'>
        Enter a definition for it: <input name='definition' type='text'>
        <input type='submit' value='Save'>
        </form>
        """.format(w=user_text)
        print(html)
        exit()

html = """
<form action='/cgi-bin/dictionary.py'>
Type a word: <input name='word' type='text'>
<input type='submit' value='Lookup'>
</form>
"""
print(html)

This new form includes a hidden parameter that the user does not see in the form. But it gets secretly passed in as a query parameter. The reason for this is that HTTP is what is called stateless. When responding to a request, there is no way of knowing what a previous request may have included. In other words, if you want the user to enter a definition, you have to also specify the word that they are defining — there is no way to "remember" it from the previous request.
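
One caution about that hidden field: the code above pastes whatever the user typed directly into the HTML, so a word containing a quote mark or an angle bracket could break the page. The following is a hedged sketch of one way to guard against that using Python's html library; the function name definition_form and the FORM_TEMPLATE variable are made up for this example.

```python
import html

FORM_TEMPLATE = """
<form action='/cgi-bin/dictionary.py'>
<input type='hidden' name='word' value='{w}'>
Enter a definition for it: <input name='definition' type='text'>
<input type='submit' value='Save'>
</form>
"""


def definition_form(user_text):
    """Build the 'add a definition' form with the word safely escaped."""
    # html.escape replaces &, <, > and both kinds of quote marks
    safe_word = html.escape(user_text)
    return FORM_TEMPLATE.format(w=safe_word)


print(definition_form("it's"))
```

The browser turns the escaped text back into the original characters when the form is submitted, so your script still receives the word the user typed.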

Now, if we actually want to process this user input to expand our set of definitions, we could handle that case like this:

dictionary["carrot"] = "an orange vegetable"

if "definition" in inputs:
    # somehow save this word and definition. But how?
    pass  # placeholder so this is still valid Python

elif "word" in inputs:
    user_text = inputs["word"].value

That if statement checks if the user has submitted definition as a query parameter, but what do we do if so? For this, think back to this example from week 12 that saves the dictionary into a JSON file and opens it up again. Or better yet, this version of that code which adds some error handling to that: week11-hw-review-error-handling2.html
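
Since the linked code is not reproduced in these notes, here is a minimal sketch of the general idea rather than the week 11 or week 12 code itself. The filename definitions.json and the two function names are assumptions made for this example.

```python
import json

DEFINITIONS_FILE = "definitions.json"  # assumed filename


def load_definitions(path=DEFINITIONS_FILE):
    """Read the saved dictionary, or start empty if there isn't one yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # No file yet, or the file does not contain valid JSON
        return {}


def save_definition(word, definition, path=DEFINITIONS_FILE):
    """Add one word to the saved dictionary and write it back to disk."""
    definitions = load_definitions(path)
    definitions[word] = definition
    with open(path, "w") as f:
        json.dump(definitions, f)
    return definitions
```

In dictionary.py, the "definition" branch could then call something like save_definition(inputs["word"].value, inputs["definition"].value), and the lookup could read from load_definitions() so that saved words are found on later requests.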

Merging those together, we end up with this code:

week15-cgi-with-file-save.html

V. Putting it all together

Believe it or not, at this point you should hopefully have enough understanding of the main principles needed to create a networked version of your midterm project. Of course, the technical details will still be tricky.

For game option midterms: Examine these three pieces:

  1. week15_processing_main.pyde. Create a new Processing sketch and make this the main tab.

  2. week15_processing_network.py. Create a new tab called network and put all this code into that.

  3. week15_server.py.html. Create a new file in Atom, copy/paste this code into that, and save it into a new cgi-bin folder. Remember to adjust the permissions on this new file with chmod (see above). This uses a thing called SqliteDict to make it even easier to write JSON to a file in a secure way. This simplifies the code your CGI script will need to pass data between two clients.

Those three files are using id, x, and y fields. You can modify each of those files to work with the fields that you're using in your midterm. Then try to integrate that code with your own. One important trick: in order to run two Processing clients at the same time, you'll have to Save-As one with a different name and run them both. When testing this, I will Save-As whatever I'm working on and call it something like "tmp", then delete it after testing. I don't want to have two versions of the code floating around, because when working I'm making changes to one and I don't want to replicate them in the other. Unfortunately there isn't an easier way to run two versions of the same Processing code right now.
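
The three linked files are not reproduced in these notes, so the following is only a rough sketch of the two small jobs a client has to do: pack its own id, x, and y into a request URL, and unpack the JSON list that comes back. The server URL, the function names, and the exact shape of the response are all assumptions; check them against the real files before relying on this.

```python
import json

SERVER_URL = "http://127.0.0.1:8000/cgi-bin/server.py"  # assumed location


def build_update_url(player_id, x, y, base=SERVER_URL):
    """Put this client's fields into the query string of a request URL."""
    query = "id={i}&x={x}&y={y}".format(i=player_id, x=x, y=y)
    return base + "?" + query


def parse_positions(response_text):
    """Turn a JSON list of {"id", "x", "y"} records into a list of tuples."""
    records = json.loads(response_text)
    return [(r["id"], r["x"], r["y"]) for r in records]
```

If your midterm uses different fields, these two functions are the places where the field names would change on the client side.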

For data visualization option midterms: For this you will need to expand the HTML form aspect of the above discussion, and link that to your midterm. You can also adapt week15_processing_main.pyde and week15_processing_network.py above to retrieve the data collected via your HTML form. You would then need to open the JSON data file from within Processing and use that data to render your project.