Code Toolkit: Python, Spring 2024

Week 11 — Wednesday, April 10 — Class notes

Table of contents


  1. Review
    1. Python: interactive shell
    2. Python: writing & running files with VS Code
    3. Dictionaries
    4. Data structures inside data structures
    5. Homework review
  2. Data serialization
  3. File input and output
    1. Writing a file
    2. Reading a file
    3. Reading and writing, with error handling
  4. JavaScript Object Notation (JSON)
  5. JSON in Processing
  6. Homework and wrapping up

I. Review

Last week we talked about working with Python outside of the Processing framework.

a. Python: interactive shell

We started off by talking about how you can run small bits of Python code interactively in the Python shell. With this method, you type Python code one line at a time and see it get immediately evaluated. This is useful for testing syntax and small bits of logic before adding it to a computer program file.

To do this, you must first access the command line. There are several ways to do so. On Mac you can open the Terminal app. On Windows you can use PowerShell, or install Cygwin or the bash shell that comes with Git. But the method we will be using is to access the terminal through VS Code. Open a new VS Code window and from the top menu select Terminal > New Terminal. See last week's notes for screenshots.

Remember that the command line is a text-based interface that simply offers a different way of viewing the same files that you can see in Finder (Mac) or Explorer (Windows). Command line commands include pwd to show the current directory (folder), ls to list the contents of the current directory, or cd to change the current directory.

In my notes I'll show command line commands like this:

$ ls
in_class.py

Remember that $ signifies the command prompt: where you type commands. Your shell might use a different prompt, like # or %. You should not type the prompt, only the stuff that comes after. I include the prompt to show you that what follows is a command for you to type. Lines that I include from the terminal without the command prompt are the output of running that command. Sometimes what you see may be slightly different.

To access the Python shell, run the python3 command:

$ python3
Python 3.8.3 (v3.8.3:6f8c8320e9, May 13 2020, 16:29:34) 
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Note that now the command prompt has changed to >>>. This signifies that you are in the interactive Python shell and you can now type any valid Python syntax.

If you ever see an error like NameError: name 'pwd' is not defined, that means you are trying to run a command line command in the Python shell. To resolve this, type ^D or exit() to exit the Python shell.

If you ever see an error like -bash: syntax error near unexpected token, that means you are trying to type Python code in the command line. To resolve this, run the Python shell and try again.

If you are in the interactive shell and you type Python code that starts a new block (i.e. any line of code ending with a colon :), the shell prompt will change to .... This indicates that you are inside a block and you should indent this line. You can type 2 spaces and then the next line. When you wish to end the block, press return to enter an empty line. For example:

>>> i = 0
>>> while i < 4:
...   print("Hello")
...   i = i + 1
... 
Hello
Hello
Hello
Hello

You can also include blocks within blocks, just like you can do in a Python code file:

>>> i = 0
>>> while i < 4:
...   if i % 2 == 0:
...     print("Even")
...   else:
...     print("Odd")
...   i = i + 1
... 
Even
Odd
Even
Odd

Note the indentation in the above examples. Two spaces at the start of the line when I start a new block, and two additional spaces (four total) if starting a new block within that block.

b. Python: writing & running files with VS Code

The interactive Python shell can be useful but can get tiresome for testing larger chunks of code, and it is not possible using this method to save your code to run later or to give to other people to run. For this, we need to save Python code in files. In our class, we'll be using VS Code for this purpose. See the class notes for last week for detailed instructions about how to open VS Code, how to create new code files, and how to run them.

c. Dictionaries

We also talked about a new data structure called a dictionary.

Dictionaries are similar to lists in that they can store many values, but they are different in key ways. (Pun intended.)

Lists store values in sequential order: the order in which you add values to a list is the order in which they will be stored, and you access individual values by number, based on the position of the value in order. Here's a short review of the basic operations of lists:

>>> # create a new empty list
>>> my_list = []
>>> # create a new list with values
>>> my_list = ["a", "b", "c"]
>>> # add a new value to the end of a list
>>> my_list.append("d")
>>> # get the length of a list (number of items)
>>> len(my_list)
4
>>> # access a single item in a list
>>> my_list[3]
'd'
>>> # remember you can also index a list with a variable, useful in loops:
>>> i = 3
>>> my_list[i]
'd'
>>> # check if a value is in a list
>>> "a" in my_list
True
>>> "z" in my_list
False

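Dictionaries, on the other hand, are not organized by position. Items are stored as key-value pairs, and individual items are accessed not by number but by specifying the key, which can be almost any value you choose, most often a string or a number. (Since Python 3.7, a dictionary does remember the order in which its keys were added, but older versions of Python make no such promise, so it is best not to rely on the order.)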

Let's look at the similar operations of dictionaries:

>>> # create a new empty dictionary
>>> d = {}
>>> # create a dictionary with some key/value pairs, in this case two
>>> d = {"name": "Rory", "hair": "brown"}
>>> # add a new key/value pair to a dictionary
>>> d["species"] = "human"
>>> # get the length of a dictionary (number of key/value pairs)
>>> len(d)
3
>>> # access a single item in a dictionary
>>> d["name"]
'Rory'
>>> # check if a key is in the dictionary
>>> "name" in d
True
>>> "age" in d
False 
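
One operation the review above does not show is visiting every key/value pair in a dictionary. Here is a minimal sketch of one way to do it. It uses a for loop, which these notes do not otherwise rely on, so treat it as a preview rather than the method from class. The variable name lines is just an illustration.

```python
d = {"name": "Rory", "hair": "brown", "species": "human"}

# build one "key: value" piece of text per pair
lines = []
for key in d:              # the loop hands us each key in turn
    lines.append(key + ": " + d[key])

print(lines)
```

Looping over a dictionary gives you its keys, and d[key] then looks up the matching value, exactly like the single-item access shown above.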

d. Data structures inside data structures

We also looked at how you can put data structures inside other data structures, for example lists inside dictionaries or dictionaries inside lists. Here's an example:

>>> # make two dictionaries:
>>> person1 = {"name": "Rory", "hair": "brown", "species": "human"}
>>> person2 = {"name": "Gritty", "hair": "orange", "species": "monster"}
>>> # add a list to each one:
>>> person1["likes"] = [ "computers", "media", "teaching"]
>>> person2["likes"] = [ "hockey", "Philadelphia", "pranks"]
>>> # put both dictionaries in a list:
>>> personnel_file = [ person1, person2 ]
>>> # display the list:
>>> personnel_file
[{'name': 'Rory', 'hair': 'brown', 'species': 'human', 'likes':
['computers', 'media', 'teaching']},
{'name': 'Gritty', 'hair': 'orange', 'species': 'monster', 'likes':
['hockey', 'Philadelphia', 'pranks']}]
>>> # access the second person's name:
>>> personnel_file[1]["name"]
'Gritty'
>>> # access the second person's first like:
>>> personnel_file[1]["likes"][0]
'hockey'
'hockey'

Pay close attention to how you index data structures within data structures: each pair of square brackets steps one level deeper. This is not a contrived example. Nesting lists and dictionaries like this is a very common practice.
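
To make the indexing concrete, here is a small sketch that walks through a list of dictionaries with the same kind of while loop we have used all semester. The names people and summaries are made up for this illustration, and the dictionaries are trimmed down versions of the ones above.

```python
person1 = {"name": "Rory", "likes": ["computers", "media", "teaching"]}
person2 = {"name": "Gritty", "likes": ["hockey", "Philadelphia", "pranks"]}
people = [person1, person2]

summaries = []
i = 0
while i < len(people):
    person = people[i]               # each item in the list is a dictionary
    first_like = person["likes"][0]  # "likes" holds a list, so index again
    summaries.append(person["name"] + " likes " + first_like)
    i = i + 1

print(summaries)
```

Pulling people[i] into its own variable first keeps each line down to one or two pairs of brackets, which can be easier to read than chaining them all together.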

e. Homework review

Lastly, we reviewed some solutions to the homework from last week.

Here's the most basic implementation of part 1:

import random
number = random.randint(1,10)
name = input("Enter your name: ")
guess = int( input("Guess a number between 1 and 10: ") )
if number == guess:
    print("Congrats!")
else:
    print("Sorry, that's wrong.")

This code picks a random number between 1 and 10 and stores that in the variable number, then it prompts the user for their name and stores that in the variable name, then it prompts the user for a guess, converts that to a number with int() and stores that in the variable guess. Then it checks if the user's guess is equal to the randomly picked number and prints a message accordingly.

But this only allows one guess. To allow the user multiple guesses, there are a couple options. To offer the user an indefinite number of guesses, we can intentionally create an infinite loop like this: (Note the addition of the indentation.)

import random
number = random.randint(1,10)
name = input("Enter your name: ")
while True:
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        print("Sorry, that's wrong.")

Since True is always, well, true, this while loop will repeat forever. But by using the break command, we can exit out of this loop. In this case, we use break if the user guesses the number correctly.

We can also offer the user a little bit of help if they guess wrong by replacing our "Sorry" message in the else case with this if / else statement:

import random
number = random.randint(1,10)
name = input("Enter your name: ")
while True:
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        if guess < number:
            print("Too low.")
        else:
            print("Too high.")

If you'd rather limit the number of guesses instead of allowing an unlimited number, you can create a while loop with a counter variable, similar to how we've been doing all semester:

import random
number = random.randint(1,10)
name = input("Enter your name: ")
i = 0
while i < 10:
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        if guess < number:
            print("Too low.")
        else:
            print("Too high.")
    i = i + 1

And if you want to tell the user how many guesses they get and which guess they are on, you could add something like this:

import random
number = random.randint(1,10)
name = input("Enter your name: ")
print("You have 10 guesses")
i = 0
while i < 10:
    print("Guess #" + str(i + 1) + " of 10")  # i starts at 0, so add 1 for display
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        if guess < number:
            print("Too low.")
        else:
            print("Too high.")
    i = i + 1

Note that I'm using str() to convert a number to a string so that I can concatenate it with the + operator. str() is like the opposite of int(): int() converts the text of a number into a numerical value, and str() converts a numerical value into the text of that number.
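
Here is a short sketch of those two conversions side by side. It also shows what happens when int() is given text that is not a whole number: Python raises a ValueError. The try / except syntax used to catch it is explained in the error handling part of these notes. The variable names are only for illustration.

```python
guess_text = "7"               # input() always hands back text
guess = int(guess_text)        # now a number we can do arithmetic with

doubled_number = guess * 2     # arithmetic on the number
doubled_text = guess_text * 2  # "multiplying" text just repeats it

label = "Guess #" + str(guess) # back to text so + can join the pieces

# int() cannot make sense of text that is not a whole number
try:
    bad = int("seven")
except ValueError:
    bad = None                 # fall back to a placeholder value

print(doubled_number, doubled_text, label, bad)
```

The homework solutions above skip this check, so typing a word instead of a number at the guess prompt would stop the program with a ValueError.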

Here's a solution for part two. This creates a new empty dictionary, adds some definitions, asks a user for a word, checks if that word is in the dictionary, and if so displays the definition of that word.

dictionary = {}
dictionary["carrot"] = "An orange vegetable"
dictionary["orange"] = "An orange fruit"
dictionary["cherry"] = "A red tree fruit"
dictionary["strawberry"] = "A red ground fruit"

while True:
    word = input("Enter a word (or 'exit' to exit): ")
    if word == 'exit':
        print("Bye")
        break
    if word in dictionary:
        print( dictionary[word] )
    else:
        print("Sorry, I don't know that word.")

Now to allow the user to add new definitions, add the following:

dictionary = {}
dictionary["carrot"] = "An orange vegetable"
dictionary["orange"] = "An orange fruit"
dictionary["cherry"] = "A red tree fruit"
dictionary["strawberry"] = "A red ground fruit"

while True:
    # lowercase the input so lookups match the lowercased keys stored below
    word = input("Enter a word (or 'exit' to exit): ").lower()
    if word == 'exit':
        print("Bye")
        break
    if word in dictionary:
        print( dictionary[word] )
    else:
        defn = input("I don't know that word. Please define it: ")
        dictionary[word.lower()] = defn

Of course, the sad thing about this code is that if a user were to add a bunch of new definitions, when they exit the program, all those definitions would be lost. What would be nice is if the program could save all the definitions that have been added, and then load them up next time the program runs. That brings us to the new topic for today.


II. Data serialization

Data serialization is a term for the process of converting variables and data structures into a format that can be exported from your code and imported back into it. You might want to do this to share data with another computer program, or with the same computer program at a later time. In a literal sense, to serialize data means to convert it into a series, such as textual characters or binary digits. In the cases we'll be looking at, we'll be using a text-based serialization format, so the conversion is to a series of characters. But other serialization formats might convert data to a series of bits (1s and 0s).
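
To make the idea concrete before we get to files, here is a rough sketch of serializing a dictionary by hand. The "key=value" layout and the function names serialize and deserialize are invented for this illustration. They are not a standard format, and the sketch assumes every key and value is text and that no key contains an equals sign.

```python
def serialize(record):
    # turn {"name": "Rory", ...} into one "key=value" line per pair
    lines = []
    for key in record:
        lines.append(key + "=" + record[key])
    return "\n".join(lines)

def deserialize(text):
    # rebuild the dictionary from the "key=value" lines
    record = {}
    for line in text.split("\n"):
        if line == "":
            continue                     # skip empty lines
        key, value = line.split("=", 1)  # split at the first "=" only
        record[key] = value
    return record

original = {"name": "Rory", "hair": "brown"}
text = serialize(original)     # a series of characters
restored = deserialize(text)   # a dictionary again

print(text)
print(restored)
```

A hand-made format like this breaks down quickly, for example with nested lists or values that contain line breaks, which is one reason to reach for an established format instead.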

III. File input and output

We've talked about various forms of input and output this semester. For inputs, in Processing we used key presses, mouse presses, and mouse movement, while in Python outside of Processing we asked the user for textual input on the command line using the input() command. For outputs we have primarily drawn things to the screen (in Processing) or written text to the command line (outside of Processing).

Today we'll be looking at a different kind of input and output: reading and writing to files. We have already done this in one sense. In Processing, we opened image files and displayed them on screen, which required instructing Processing to read them. Today we will primarily be focused on plain text files, and we will be both reading and writing to them.

The main command in Python for opening files for reading and writing is open(). (Full official documentation.)


a. Writing a file

Let's start with opening a file and writing some text to it.

The open() command takes two arguments: the name of the file you will be opening, and a flag that indicates whether you are opening the file to read, write, or both. Let's start by writing a file. To do that we specify the flag as a "w" in the second argument:

f = open("FILENAME","w")

Important warning: Opening a file with the "w" argument zeroes out the file if it already exists. You should be very careful about this. If you have an important file with any important content, and you open the file in this way, Python will wipe out the file. I would advise you to make copies of any files that you are trying to work with in Python and work on the copies. Only when you're certain that your Python code is handling your files in the way that you want should you run them on data that is valuable to you or others.
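
You can see this behavior safely with a throwaway file. The sketch below works inside a temporary folder created with Python's tempfile module, so no real file is at risk. It also uses the "a" flag (for "append"), which these notes do not otherwise cover: it is a standard open() flag that adds to the end of a file instead of erasing it. The file name scratch.txt is arbitrary.

```python
import os
import tempfile

# work in a throwaway folder so nothing important can be overwritten
folder = tempfile.mkdtemp()
path = os.path.join(folder, "scratch.txt")

f = open(path, "w")
f.write("first version\n")
f.close()

# opening with "w" again erases the old content before writing
f = open(path, "w")
f.write("second version\n")
f.close()

f = open(path, "r")
after_w = f.read()
f.close()

# opening with "a" keeps the old content and adds to the end
f = open(path, "a")
f.write("an extra line\n")
f.close()

f = open(path, "r")
after_a = f.read()
f.close()

print(after_w)
print(after_a)
```

After the second "w", the text "first version" is gone for good. After the "a", both remaining lines are still there.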

The f here then becomes a file object. It is a variable that I can use in my Python code as a way to access this open file. The symbol f here is not a special keyword — it is just a variable name and I could have called it spaghetti or anything else, but I used f to signify "file".

After you have opened a file, you can save data into it with the write() command. (Full documentation.)

Let's make a simple example in which we prompt the user for input three times, setting each bit of input to a separate variable, and then write those three variables to a file:

name = input("Your name: ")
hair_color = input("Your hair color: ")
species = input("Your species: ")

f = open("data.txt","w")

f.write(name)
f.write("\n")
f.write(hair_color)
f.write("\n")
f.write(species)
f.write("\n")

f.close()

In this code, the file that I am opening is named data.txt. It is common to use the .txt file extension for this work because we are writing plain text, and that tells your operating system how to handle this file (i.e. what programs to use to open it).

Note that I am also using the "\n" character here. That is the symbol for a new line. It is how I tell Python to write a line break into this file, since write() does not add one on its own. That symbol is used across many different programming languages and text file formats.
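
As an aside, many Python programmers write the same thing using a with block, which closes the file automatically when the indented block ends, so there is no f.close() to forget. This is an alternative to the style used in these notes, not a replacement for it. In this sketch the three values are hard-coded stand-ins for the input() answers, and the file goes into a temporary folder.

```python
import os
import tempfile

# stand-ins for the three input() answers
name = "Rory"
hair_color = "brown"
species = "human"

path = os.path.join(tempfile.mkdtemp(), "data.txt")

# the file is closed automatically at the end of the indented block
with open(path, "w") as f:
    f.write(name + "\n")        # join the value and its line break with +
    f.write(hair_color + "\n")
    f.write(species + "\n")

with open(path, "r") as f:
    written = f.read()

print(written)
```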


b. Reading a file

Now that we have a file with some textual content in it, let's see how to read it. The command to do this is (you guessed it) read(). (Documentation.)

Here is a basic example to open a file, read it in its entirety, and print that to the command line:

f = open("data.txt","r")

contents = f.read()
print(contents)

Note that now I am specifying "r" as the second argument to open() here. This is actually the default value of this optional argument, so I could have written this code like this:

f = open("data.txt")

contents = f.read()
print(contents)

But this example isn't too interesting. Let's assume we have the text file that we created by writing text in the example above, and try to read that in some meaningful way:

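f = open("data.txt","r")

# readline() returns the next line of the file each time it is called
name = f.readline()
hair_color = f.readline()
species = f.readline()

f.close()

# each value still ends with its "\n", so print() shows a blank line after it
print(name)
print(hair_color)
print(species)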

This code reads the file line-by-line, saves the content of each line into its own variable, and then prints those variables.

Here, instead of using read(), which reads the entire file, I am using readline(), which reads the file line-by-line. Because I know that the file we are reading is the one that we created by writing above, I know that I can read those three lines and save them into corresponding variables.

Note: there is nothing special "linking" the terms name, hair_color, and species from the previous example to this one. These are just variable names and I could have called them anything here, like a, b, c. It is only because I know the format of this file (since we created it above) that I can presume that its first three lines hold the values corresponding to these different variables in my code.
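
One detail worth seeing for yourself: readline() hands back each line with its "\n" still attached, and it returns an empty string once the file runs out of lines. The sketch below first creates a small file in a temporary folder (so it runs on its own), then reads it back and uses the string method strip() to remove the line breaks. The variable names raw_name and past_end are just for illustration.

```python
import os
import tempfile

# set up a small three-line file to read
path = os.path.join(tempfile.mkdtemp(), "data.txt")
f = open(path, "w")
f.write("Rory\nbrown\nhuman\n")
f.close()

f = open(path, "r")
raw_name = f.readline()            # includes the trailing "\n"
name = raw_name.strip()            # strip() removes surrounding whitespace
hair_color = f.readline().strip()
species = f.readline().strip()
past_end = f.readline()            # nothing left to read
f.close()

print(repr(raw_name), repr(name), repr(past_end))
```

Using repr() in the print makes the hidden "\n" visible, which helps when a value read from a file does not compare equal to the text you expected.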


c. Reading and writing, with error handling

What if I want to read a file, but I am not certain in advance whether it will have been created yet or not?

There are different ways to check for file existence in Python, but one way that I like involves a new technique called error handling, which is a very useful technique for you to understand in general, beyond just working with files.

In Python, you can try to prevent errors from crashing your program by using a new syntax called exceptions and exception handling.

The syntax looks like this:

try:
    # code that may generate an error
except SOME-ERROR:
    # code that will be called after the error occurs,
    # potentially preventing the error from crashing your program
    

The SOME-ERROR above needs to be the specific type of error that you are trying to catch. In this case, if the code in the try block generates that specific error, then the code in this except block will be run after the error is thrown. This allows you to write code that hopefully handles the error in some way, preventing it from propagating further throughout your program and eventually crashing it.

In this case, we will use a try-except block to try to open a file. If that file does not exist, this code will generate a Python error called a FileNotFoundError. We will then handle this error by assuming that it was thrown because the file does not exist yet, and setting some default value and moving on.

The general pattern looks like this:

try:
    f = open("FILENAME","r")
    file_contents = f.read()
    f.close()
except FileNotFoundError:
    print("File does not exist")
    file_contents = "n/a" # This can be any default value you wish

This code tries to open a file named FILENAME for reading, and to read its contents into a variable called file_contents. However, if that file does not exist, the open() command will throw an exception, triggering our except block. That code will print an output message to the user, set a default value for file_contents, and the code should continue running without problem.
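
If you find yourself repeating this pattern, one option is to wrap it in a function. The following is a sketch under that assumption. The function name read_or_default and the file names are hypothetical, and the files live in a temporary folder so the example can run anywhere.

```python
import os
import tempfile

def read_or_default(filename, default):
    # return the file's contents, or the default if the file is missing
    try:
        f = open(filename, "r")
        contents = f.read()
        f.close()
    except FileNotFoundError:
        contents = default
    return contents

folder = tempfile.mkdtemp()
missing = os.path.join(folder, "not_there.txt")
present = os.path.join(folder, "there.txt")

# create only one of the two files
f = open(present, "w")
f.write("hello")
f.close()

a = read_or_default(missing, "n/a")
b = read_or_default(present, "n/a")

print(a, b)
```

Only a missing file is handled here. Any other problem, such as a file you do not have permission to read, would still stop the program, because the except block names FileNotFoundError specifically.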

Let's see how that might apply to the write / read example we were working on above:

try:
    f = open("data.txt","r")
    name = f.readline()
    hair_color = f.readline()
    species = f.readline()

    print(name)
    print(hair_color)
    print(species)

    f.close()
except FileNotFoundError:
    print("File does not exist yet")
    f = open("data.txt","w")

    name = input("Your name: ")
    hair_color = input("Your hair color: ")
    species = input("Your species: ")

    f.write(name)
    f.write("\n")
    f.write(hair_color)
    f.write("\n")
    f.write(species)
    f.write("\n")

    f.close()

This code uses a try-except block to merge both reading and writing into one program. First it tries to open the file for reading. If the file exists, it will read it line-by-line, saving the values into variables, and then printing out those variables. But if the file does not exist, the code will throw a FileNotFoundError. This will be handled by opening the file for writing (which will create it since it does not yet exist), telling the user the file does not exist, prompting the user for three values, and then saving those values into the file with write().

The next time this program is run, it should find the file that was created last time, and successfully open it for reading.


IV. JavaScript Object Notation (JSON)

In this example, we saved three variables to a file, and then read them in. This is a pretty simple quantity and structure of data to save and access later. But what if we want to access much more data, or data that is stored in a more complicated data structure? For this we can use a common data serialization format. In our case, we will use JSON.

JSON, which stands for JavaScript Object Notation, is a common interoperable format for exchanging data between different programs. As the name implies, JSON was originally created for use in JavaScript programs, but it has since become a kind of standard that is supported by many programming languages. It can be used to save data in files, send data across a network, or other things. It can even be used to share data between computer programs written in different languages, as long as those languages have a way to read and write the JSON format.

Working with JSON in Python is quite simple. Python has a library (called json) for doing just that. Get started by importing this library:

import json

Now you can use the two main functions of this library: json.loads() (for "load string") to read in JSON code and automatically convert it to Python data structures, and json.dumps() (for "dump string") which will take an arbitrarily complicated Python data structure and generate JSON output for it.

Here is a simple output example:

my_list = [10, 20, 30]
print( json.dumps(my_list) )

This example makes a list called my_list, then generates JSON to represent that, and prints that JSON. As you can see here, JSON looks pretty much like the way Python displays textual representation of data structures by default. That is because Python and JavaScript happen to share conventions: square brackets for lists and curly braces for key/value pairs. The two are not identical, though. JSON always uses double quotes around text, and it writes true, false, and null where Python writes True, False, and None.
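
Here is a small sketch comparing the JSON text for a value with the text Python itself would display for it. The dictionary and its keys are made up for the comparison.

```python
import json

value = {"name": "Rory", "student": True, "pet": None, "scores": [10, 20.5]}

as_json = json.dumps(value)   # JSON text
as_python = str(value)        # what print(value) would show

print(as_json)
print(as_python)
```

The two lines of output look similar, but only the first is valid JSON: the second uses single quotes, and spells True and None the Python way.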

Now let's try a JSON input example. Note that here I am making a variable called data_string that holds a string (text in quotes) that is a textual representation of a list, not a list itself. I am then using json.loads() to "convert" that string into an actual Python list called my_list, and then printing that list.

data_string = "[ 10, 20, 30 ]"
my_list = json.loads(data_string)
print(my_list)

The output here is not that exciting, but the principle being demonstrated here is quite powerful. This means that you can take a string that represents any arbitrarily complicated data structure, and convert that into an actual Python data structure for use in your program.
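
As a quick check that this works for nested structures too, here is a sketch of a round trip: a list of dictionaries (each containing a list) is converted to a JSON string and back. The data is a trimmed version of the review example and the variable names are only illustrative.

```python
import json

people = [
    {"name": "Rory", "likes": ["computers", "media", "teaching"]},
    {"name": "Gritty", "likes": ["hockey", "Philadelphia", "pranks"]},
]

text = json.dumps(people)      # one long string of JSON
restored = json.loads(text)    # a brand new list of dictionaries

same_content = (restored == people)     # compares the values inside
first_like = restored[1]["likes"][0]    # index it like any nested structure

print(same_content, first_like)
```

The restored list is a separate copy with the same contents, so changing it later does not affect the original.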

Let's see how you might actually use this by saving and loading a more complicated data structure that holds data you might use in a project.

V. JSON in Processing

In Processing, we can use the above techniques (file reading and writing, and data structure serialization with JSON) to save the values of variables that we are working with in a sketch, and load them the next time the sketch runs.

Here is a very simple example that does that:

import json

f = open("data.json","r")

file_contents = f.read()

data = json.loads(file_contents)

def setup():
    size(800,800)

def draw():
    background(255)
    
    i = 0
    while i < len(data):
        d = data[i]
        fill( d["r"], d["g"], d["b"] )
        ellipse( d["x"], d["y"], 50,50 )
        i = i + 1

Note that in order for this code to run, the sketch that includes it must be saved, and inside the sketch folder, there must be a file named "data.json" that includes the following JSON content:

[{"x": 200, "y": 200, "r": 255, "g": 0, "b": 0}, {"x": 600, "y": 200, "r": 0, "g": 255, "b": 0},
{"x": 600, "y": 600, "r": 0, "g": 0, "b": 255}, {"x": 200, "y": 600, "r": 0, "g": 255, "b": 255}]

For a slightly more interesting example, have a look at the following code:

This code uses a new Processing block that we have not talked about yet: def stop(). This creates a block that will be run whenever your user exits your program. In this case, that block is being used to save some JSON data when the user exits.

This code brings together all of the above discussion. It uses a try except block that tries to read a data file, and if the data file does not exist, it creates some default values. When the code is running, it allows the user to click to add new "creatures" as we saw in the Week 7 review that talked about how to use lists to accomplish the Week 6 HW. Then, when the user exits the program, it saves the data structure of "creatures" as a JSON file that will be opened and loaded next time the program runs.

Note that the try except block here uses an IOError, which is different from the exception that we were using above. This is because we are working with a version of Processing that still uses an older version of Python, in which FileNotFoundError does not exist and a missing file raises IOError instead.

In order to make sure that this code will work for you, create a new sketch, copy/paste the code into it, and make sure that the sketch is saved — otherwise Processing will not be able to save the JSON data file.
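
To see the load-then-save pattern without Processing, here is a plain Python sketch of the same idea. It is not the sketch code itself. The function names load_creatures and save_creatures, the file name creatures.json, the "x" and "y" keys, and the default creature are all assumptions made up for this illustration. It catches IOError, which also works in current versions of Python because FileNotFoundError is a more specific kind of that same error.

```python
import json
import os
import tempfile

def load_creatures(filename):
    # try to read saved data; fall back to a default if there is no file
    try:
        f = open(filename, "r")
        creatures = json.loads(f.read())
        f.close()
    except IOError:
        creatures = [{"x": 400, "y": 400}]   # made-up default value
    return creatures

def save_creatures(filename, creatures):
    f = open(filename, "w")
    f.write(json.dumps(creatures))
    f.close()

path = os.path.join(tempfile.mkdtemp(), "creatures.json")

first_run = load_creatures(path)         # no file yet, so we get the default
first_run.append({"x": 120, "y": 300})   # stands in for a mouse click
save_creatures(path, first_run)          # stands in for the exit step

second_run = load_creatures(path)        # this time the file is found
print(second_run)
```

The second load returns both creatures, which is the behavior you should see when you close and reopen a sketch that saves its data on exit.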


VI. Homework and wrapping up

The homework for this week.