Last week we discussed network protocols, mainly focusing on the hypertext transfer protocol (HTTP): the protocol of the world wide web, used by browsers for retrieving web pages. With HTTP, browsers issue HTTP requests to web servers, which reply in turn with responses containing the requested page.
We saw how you can use the command line tool nc to manually issue HTTP requests, receiving responses that include HTTP headers, which include status codes. But this was really tedious to do.
We then saw how you can open network connections in Python using a socket object, which you can read from and write to, similar to the input() command or a file object. This too was quite tedious.
Then I explained that Python offers utilities that implement protocols like HTTP in ways that are much easier for you as a programmer to use. The one that we'll be looking at in class is called urllib.
Then we revisited the networking example from week 13, and saw how we could put that all together to create Processing programs that ran and displayed the same data, coordinated by a web server. Each Processing program sent data to that server, and received back a list of JSON data, which it would then render to the screen. This is the principle you will use to create a networked final project.
Finally, I showed you how you could run your own webserver on your computer, using the Python utility:
$ python -m http.server 8000
Whatever directory you are in when you run this command will be shareable as the root directory of a website. All the files in that directory (as well as any subdirectories) will be shared by the webserver as long as it's running, and you can visit this local website by typing http://127.0.0.1:8000 into your browser.
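To connect this back to urllib, here is a minimal sketch that starts the same kind of static file server from inside Python and then fetches one file from it. The function name fetch_from_local_server and the file hello.txt are made up for illustration, and the server picks any free port rather than 8000 so that it does not clash with one you already have running.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def fetch_from_local_server(directory, filename):
    """Serve `directory` on a free local port and fetch one file from it."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the operating system for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:{p}/{f}".format(p=port, f=filename)
        # Ignore any system proxy settings, since this request stays on this computer.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "hello.txt").write_text("Hello from a static file")
        status, body = fetch_from_local_server(folder, "hello.txt")
        print(status, body)  # 200 Hello from a static file
```

The status code 200 is the same "OK" status that you saw in the raw HTTP responses when using nc.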
Running a webserver using this convenient Python utility only serves static pages: it simply lets the user click on any files in that directory to view them. But we can do something more interesting, which is to dynamically generate web page responses by running Python code inside a webserver. This is where we ended last week, and where we'll pick up today ...
Instead of simply creating a website that returns static pages, if you want to use Python code to dynamically generate output that gets displayed as if it were a webpage, you can run Python code in a webserver using something called CGI, which stands for common gateway interface.
To be honest, CGI is an older web technology. It was more popular in the earlier days of the web (circa 2000) and has since been largely eclipsed by more advanced ways of doing this. But this is the easiest way to understand this principle, and with this knowledge as an introduction to the fundamentals, you can later go on to learn shortcuts and more advanced ways of achieving this result.
With CGI, when a webserver receives a request, instead of simply returning the file that the request asked for, the server tries to execute the file, treating it as a computer program, and the output of the program gets returned to the user as the content of the page that they requested. In other words, with CGI, instead of a webserver returning a file to a browser, it treats the requested file as a program, runs the program, and returns the output of that program file to the user instead of the file contents.
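The idea of "run the file and return its output" can be sketched without a webserver at all. The example below is only an illustration of the principle, not how http.server is actually implemented: it writes a tiny script to a temporary file (hello.py is a made-up name), runs it as a separate program, and captures whatever the script prints. A real CGI server would run the file directly and send that captured output back to the browser.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A tiny stand-in for a CGI script: a header, a blank line, then the page.
SCRIPT_SOURCE = '''print("Content-Type: text/html")
print("")
print("Hello, world!")
'''


def run_like_cgi(script_path):
    """Run a script as a separate program and split its output into headers and body."""
    result = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    # The first blank line separates the headers from the page content.
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        script = Path(folder, "hello.py")
        script.write_text(SCRIPT_SOURCE)
        headers, body = run_like_cgi(script)
        print("headers:", headers)
        print("body:", body)
```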
Let's get started ...
Create a folder for this week, and within it, create a subfolder called cgi-bin. Open Atom. Create a new file, save it, call it server.py, and make sure you are saving it in the cgi-bin folder that you just created.
For CGI to work, the first line of this file needs to be the executable path to Python on your system. On the command line, type:
$ which python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
For me, that returns the above output. For you, it may be slightly different. Whatever it says, copy that output, and paste it in as the first line of server.py, preceded by a pound sign and an exclamation mark (#!), like this:
#!/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
This is called a hashbang line (also known as a shebang), and it is common in CGI programs and other scripts that are run directly. It tells the system which program to use to run the file. Now you can add any arbitrary Python code into this program. The first thing you must do in order to comply with the HTTP protocol is to output the following text:
Content-Type: text/html

including the blank line. So put the following commands next into server.py:
print("Content-Type: text/html")
print("")
After that, try adding some other Python commands. For example, for now you could simply add:
print("Hello, world!")
Next, cd into the folder that you created for this week (i.e., the folder that contains your cgi-bin folder). Make sure that your server.py file is executable. You can check this by typing the following:
$ ls -l cgi-bin
-rw-r--r-- 1 rory staff 0 Dec 8 02:56 cgi-bin/server.py
You'll probably see something like the above. The r characters indicate that this file is readable, and the w characters indicate that it is writeable. The file is probably not executable by default, so to make sure that it is, type the following command:
$ chmod a+x cgi-bin/server.py
(This command is short for "change mode" and it allows you to change the read, write, and execute permissions on files.)
Now if you type ls -l again you should see something like this:
$ ls -l cgi-bin/server.py
-rwxr-xr-x 1 rory staff 0 Dec 8 02:56 cgi-bin/server.py
The x characters here indicate that this file is now executable.
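If you are curious, the same permission change can be made and inspected from Python. This is only a sketch for macOS or Linux, and the function names make_executable and permission_string are made up for illustration. It creates a throwaway file, shows its permission string before and after adding the execute permission, and leaves your real server.py untouched.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add execute permission for user, group, and others, like `chmod a+x`."""
    current = os.stat(path).st_mode
    os.chmod(path, current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def permission_string(path):
    """Return the ten-character string that `ls -l` shows at the start of a line."""
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "server.py")
        with open(path, "w") as f:
            f.write('print("Hello, world!")\n')
        # Start from the typical default: readable by all, writeable by the owner.
        os.chmod(path, 0o644)
        print(permission_string(path))  # -rw-r--r--
        make_executable(path)
        print(permission_string(path))  # -rwxr-xr-x
```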
Now try running an HTTP server in this directory. This is similar to the above command with a slight modification:
$ python -m http.server --cgi 8000
And now in your browser try visiting: 127.0.0.1:8000/cgi-bin/server.py
In the terminal, you are now viewing a log of all the requests that the server receives, and messages about how it is handling responses to them. When you are ready to exit out of this, you can type control-C to stop the webserver and return to the command line.
This is somewhat similar to running a Python program on the command line, but instead of typing a command to execute, a browser makes a web request to a URL, which then invokes the program, and the browser then displays the response.
You can pass dynamic variable values to CGI programs as inputs. These are called query parameters, and they are the things that you see in URLs that appear after the question mark (?).
For example, try typing the following URL in to your browser:
https://www.google.com/search?q=gritty
You can change the text after q= (here, gritty) to change your search query. This is how dynamic user input works on the web, including from web forms.
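To see how a URL breaks down into its query parameters, here is a short sketch using Python's urllib.parse. The function name query_parameters is made up for illustration. Note that each parameter maps to a list of values, because the same name can appear more than once in a URL.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url):
    """Return the query parameters of a URL as a dictionary of lists."""
    parts = urlsplit(url)
    # parts.query is everything after the question mark, e.g. "q=gritty".
    return parse_qs(parts.query)


if __name__ == "__main__":
    params = query_parameters("https://www.google.com/search?q=gritty")
    print(params)          # {'q': ['gritty']}
    print(params["q"][0])  # gritty
```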
In a Python CGI script, you can access these variables using the cgi library, and an object called FieldStorage, like this:
import cgi
inputs = cgi.FieldStorage()
Now, you can treat the variable inputs as if it were a dictionary containing any query parameters and their values as key-value pairs. For example, with the Google URL above, your code might include something like:
import cgi
inputs = cgi.FieldStorage()
search_term = inputs["q"].value
Then the code would implement the search for search_term (whatever the value of that variable is) and return the results to the user. (I find the need to add the .value here somewhat annoying, but that is the way this library works. I could explain why if you are curious.)
You can read more about this in the official Python documentation here: Common Gateway Interface support.
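One caveat if you are using a newer Python than the 3.8 used in these notes: the cgi library was deprecated in Python 3.11 and removed in Python 3.13. The sketch below shows one possible way to read a query parameter without it. It relies on the fact that a CGI server passes everything after the question mark to your script in an environment variable called QUERY_STRING. The function name get_parameter is made up for illustration, and this only covers query parameters in the URL, not everything that FieldStorage handles.

```python
import os
from urllib.parse import parse_qs


def get_parameter(name, default=None, environ=None):
    """Return the first value of a query parameter, or `default` if it is missing."""
    if environ is None:
        environ = os.environ
    # A CGI server places everything after the "?" in QUERY_STRING.
    query = environ.get("QUERY_STRING", "")
    values = parse_qs(query).get(name)
    if not values:
        return default
    return values[0]


if __name__ == "__main__":
    # A pretend environment, standing in for what the server would provide.
    fake_environ = {"QUERY_STRING": "q=gritty"}
    print(get_parameter("q", environ=fake_environ))        # gritty
    print(get_parameter("missing", environ=fake_environ))  # None
```

Returning a default instead of raising an error also means there is no .value to remember.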
(a) Begin. Let's work through an example of this by adapting the interactive dictionary program that we worked on a few weeks ago.
Let's start with the simplest version of that program that we came up with, from week 11 homework, part 3. Create a new file in Atom, save it as dictionary.py, make sure that you save it into your cgi-bin directory, and make sure that you make it executable with the chmod command above. Enter the following code:
(Note that the first line, the hashbang, will need to be adapted to the path to Python on your system that you found above.)
#!/your/path/to/python/here

import cgi

print("Content-type: text/html")
print("")

inputs = cgi.FieldStorage()

dictionary = {}

dictionary["apple"] = "a red or green fruit"
dictionary["banana"] = "a yellow or brown fruit"
dictionary["carrot"] = "an orange vegetable"

user_text = inputs["word"].value
if user_text in dictionary:
    print("The definition of {w} is:".format(w=user_text))
    print(" " + dictionary[user_text])
else:
    print("hm, I don't know that word.")
This code is importing the cgi library, using FieldStorage to access any query parameters that have been passed in, then accessing the query parameter called word, and trying to look that up in the small dictionary that we create here.
This code also shows how to use the .format() command to include variable values in a string.
Make sure your webserver is running (if not, run it with python -m http.server --cgi 8000) and in your browser, try visiting:
http://127.0.0.1:8000/cgi-bin/dictionary.py?word=apple
Try different values for the query parameter and see what you get.
(b) Adding a form. Having to type that input as a query parameter in the URL is pretty awkward and not very user-friendly. What if we offered the user a friendly form they could use to type the input?
First, try deleting the query parameter from the URL and pressing enter. What do you get? I get a blank white screen. Why? Flip over to the command line to see the server error. I see this:
Traceback (most recent call last):
  File "/Users/rory/Documents/Code Toolkit Python Fall 2021/Week 15/cgi-bin/dictionary.py", line 16, in <module>
    user_text = inputs["word"].name
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/cgi.py", line 517, in __getitem__
    raise KeyError(key)
KeyError: 'word'
We're getting a KeyError because we're assuming the query parameter word exists. But if the user doesn't type it in, it won't exist. Let's implement some more graceful error handling:
(Starting at line 14. Note the new indentation.)
dictionary["carrot"] = "an orange vegetable"

if "word" in inputs:
    user_text = inputs["word"].value
    if user_text in dictionary:
        print("The definition of {w} is:".format(w=user_text))
        print(" " + dictionary[user_text])
    else:
        print("hm, I don't know that word.")
So now, if the query parameter word is in the dictionary of user inputs, then look up the word as before. Otherwise, do nothing.
Now let's improve on this a bit to add an HTML form, so the user doesn't have to type the query parameter into the URL. A simple HTML web form for this task looks like this:
<form action='/cgi-bin/dictionary.py'>
  Type a word: <input name='word' type='text'>
  <input type='submit' value='Lookup'>
</form>
But we need to return this from Python, which we can do like this, using the triple quotes """ for an extended string:
dictionary["carrot"] = "an orange vegetable"

if "word" in inputs:
    user_text = inputs["word"].value
    if user_text in dictionary:
        print("The definition of {w} is:".format(w=user_text))
        print(" " + dictionary[user_text])
    else:
        print("hm, I don't know that word.")

html = """
<form action='/cgi-bin/dictionary.py'>
  Type a word: <input name='word' type='text'>
  <input type='submit' value='Lookup'>
</form>
"""
print(html)
So now, we print out a basic HTML form, so that when the user presses "Lookup", it takes them back to this script with the query parameter specified by the HTML form.
(c) Adding another form to add new definitions. As we did last time, let's add functionality for the user to add new word definitions.
If the user's word is not already in the dictionary, let's return a different form that prompts the user to enter a definition:
dictionary["carrot"] = "an orange vegetable"

if "word" in inputs:
    user_text = inputs["word"].value
    if user_text in dictionary:
        print("The definition of {w} is:".format(w=user_text))
        print(" " + dictionary[user_text])
    else:
        print("hm, I don't know the word " + user_text)
        html = """
        <form action='/cgi-bin/dictionary.py'>
          <input type='hidden' name='word' value='{w}'>
          Enter a definition for it: <input name='definition' type='text'>
          <input type='submit' value='Save'>
        </form>
        """.format(w=user_text)
        print(html)
        exit()

html = """
<form action='/cgi-bin/dictionary.py'>
  Type a word: <input name='word' type='text'>
  <input type='submit' value='Lookup'>
</form>
"""
print(html)
This new form includes a hidden parameter that the user does not see in the form. But it gets secretly passed in as a query parameter. The reason for this is that HTTP is what is called stateless. When responding to a request, there is no way of knowing what a previous request may have included. In other words, if you want the user to enter a definition, you have to also specify the word that they are defining; there is no way to "remember" it from the previous request.
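One thing to be careful about with the hidden field: the word comes straight from the URL, so it could contain quote marks or HTML tags that would break the form (or worse) when pasted into the page. A common precaution is to escape the text first. Here is a minimal sketch using Python's html library. The names FORM_TEMPLATE and definition_form are made up for illustration and are not part of the class code.

```python
import html

FORM_TEMPLATE = """
<form action='/cgi-bin/dictionary.py'>
  <input type='hidden' name='word' value='{w}'>
  Enter a definition for it: <input name='definition' type='text'>
  <input type='submit' value='Save'>
</form>
"""


def definition_form(user_text):
    """Return the definition form with the user's word safely embedded."""
    # quote=True also escapes ' and ", which matters inside value='...'.
    safe_word = html.escape(user_text, quote=True)
    return FORM_TEMPLATE.format(w=safe_word)


if __name__ == "__main__":
    # The apostrophe becomes &#x27; so it cannot end the value='...' early.
    print(definition_form("it's"))
```

The browser converts the escaped text back when the form is submitted, so your script still receives the original word.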
Now, if we actually want to process this user input to expand our set of definitions, we could handle that case like this:
dictionary["carrot"] = "an orange vegetable"

if "definition" in inputs:
    # somehow save this word and definition. But how?
    pass
elif "word" in inputs:
    user_text = inputs["word"].value
That if statement checks if the user has submitted definition as a query parameter, but what do we do if so? For this, think back to this example from week 12 that saves the dictionary into a JSON file and opens it up again. Or better yet, this version of that code which adds some error handling to that: week11-hw-review-error-handling2.html
Merging those together, we end up with this code: week15-cgi-with-file-save.html
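As a rough sketch of the saving step only (this is not the code in the linked file), the two functions below load a dictionary from a JSON file and write a new definition back to it. The names load_dictionary and save_definition and the file name definitions.json are made up for illustration. In your CGI script, you would call something like save_definition with the word and definition values from inputs.

```python
import json
import os
import tempfile


def load_dictionary(path):
    """Load the saved definitions, or start empty if the file is missing or damaged."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_definition(path, word, definition):
    """Add one definition and write the whole dictionary back to the file."""
    dictionary = load_dictionary(path)
    dictionary[word] = definition
    with open(path, "w") as f:
        json.dump(dictionary, f)
    return dictionary


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "definitions.json")
        save_definition(path, "apple", "a red or green fruit")
        save_definition(path, "durian", "a spiky fruit")
        print(load_dictionary(path))
```

Because each request runs the script from scratch, the file is what lets a definition saved in one request show up in the next one.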
Believe it or not, at this point you should hopefully have enough understanding of the main principles needed to create a networked version of your midterm project. Of course, the technical details will still be tricky.
For game option midterms: Examine these three pieces:
week15_processing_main.pyde. Create a new Processing sketch and make this the main tab.
week15_processing_network.py. Create a new tab called network and put all this code into that.
week15_server.py.html. Create a new file in Atom, copy/paste this code into that, and save it into a new cgi-bin folder. Remember to adjust the permissions on this new file with chmod (see above). This uses a thing called SqliteDict to make it even easier to write JSON to a file in a secure way. This simplifies the code your CGI script will need to pass data between two clients.
Those three files are using id, x, and y fields. You can modify each of those files to work with the fields that you're using in your midterm. Then try to integrate that code with your own. One important trick: in order to run two Processing clients at the same time, you'll have to Save-As one with a different name and run them both. When testing this, I will Save-As whatever I'm working on and call it something like "tmp", then delete it after testing. I don't want to have two versions of the code floating around because when working I'm making changes to one and I don't want to replicate them in the other. Unfortunately there isn't an easier way to run two versions of the same Processing code right now.
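To give a feel for what the client side of this exchange involves, here is a small sketch of the two pieces: building a URL that carries the id, x, and y fields as query parameters, and turning the server's JSON reply back into Python data. This is not the code in the files above. It is written for ordinary Python 3 (Processing's Python mode may need different imports), the function names are made up, and the shape of the reply (a list of objects with id, x, and y) is an assumption for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed address of the CGI script when running the local webserver.
SERVER_URL = "http://127.0.0.1:8000/cgi-bin/server.py"


def build_update_url(client_id, x, y, server_url=SERVER_URL):
    """Build the URL one client would request to report its position."""
    # urlencode turns the fields into id=...&x=...&y=... and escapes them safely.
    query = urlencode({"id": client_id, "x": x, "y": y})
    return server_url + "?" + query


def parse_positions(response_text):
    """Turn the server's JSON reply into a list of dictionaries."""
    return json.loads(response_text)


if __name__ == "__main__":
    print(build_update_url("tmp", 10, 20))
    # A pretend reply, standing in for what the server might send back.
    reply = '[{"id": "tmp", "x": 10, "y": 20}, {"id": "main", "x": 5, "y": 7}]'
    for item in parse_positions(reply):
        print(item["id"], item["x"], item["y"])
```

To adapt this to your own midterm, you would swap id, x, and y for whatever fields your project needs to share.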
For data visualization option midterms: For this you will need to expand the HTML form aspect of the above discussion, and link that to your midterm. You can also adapt week15_processing_main.pyde and week15_processing_network.py above to retrieve the data collected via your HTML form. You would then need to open the JSON data file from within Processing and use that data to render your project.