Last week we talked about working with Python outside of the Processing framework.
We started off by talking about how you can run small bits of Python code interactively in the Python shell. With this method, you type Python code one line at a time and see it get immediately evaluated. This is useful for testing syntax and small bits of logic before adding it to a computer program file.
To do this, you must first access the command line. There are several ways to do so. On Mac you can open the Terminal app; on Windows you can install Cygwin, PowerShell, or the bash shell available from Git. But the method we will be using is to access the terminal through VS Code: open a new VS Code window and from the top menu select Terminal > New Terminal. See last week's notes for screenshots.
Remember that the command line is a text-based
interface that simply offers a different way of viewing the same
files that you can see in Finder (Mac) or Explorer
(Windows). Command line commands
include pwd
to show the current
directory (folder), ls
to list the
contents of the current directory,
or cd
to change the current
directory.
In my notes I'll show command line commands like this:
$ ls
in_class.py
Remember that $ signifies the command prompt: where you type commands. Your shell might use a different prompt, like # or %. You should not type the prompt, only the stuff that comes after. I include the prompt to show you that what follows is a command for you to type. Lines that I include from the terminal without the command prompt are the output of running that command. Sometimes what you see may be slightly different.
To access the Python shell, run
the python3
command:
$ python3
Python 3.8.3 (v3.8.3:6f8c8320e9, May 13 2020, 16:29:34)
Type "help", "copyright", "credits" or "license" for more information.
>>>
Note that now the command prompt has changed
to >>>
. This signifies that
you are in the interactive Python shell and you can now type any
valid Python syntax.
If you ever see an error like NameError: name
'pwd' is not defined
, that means you are trying to run
a command line command in the Python shell. To resolve this,
type ^D or exit()
to exit the Python
shell.
If you ever see an error like -bash: syntax
error near unexpected token
, that means you are trying to
type Python code in the command line. To resolve this, run the
Python shell and try again.
If you are in the interactive shell and you type Python code
that starts a new block (i.e. any line of code ending with a
colon :
), the shell prompt will change
to ...
. This indicates that you are
inside a block and you should indent this line. You can type 2
spaces and then the next line. When you wish to end the block,
press return to enter an empty line. For example:
>>> i = 0
>>> while i < 4:
...   print("Hello")
...   i = i + 1
...
Hello
Hello
Hello
Hello
You can also include blocks within blocks, just like you can do in a Python code file:
>>> i = 0
>>> while i < 4:
...   if i % 2 == 0:
...     print("Even")
...   else:
...     print("Odd")
...   i = i + 1
...
Even
Odd
Even
Odd
Note the indentation in the above examples: two spaces at the start of each line inside a new block, and two additional spaces (four total) for a block nested within that block.
The interactive Python shell can be useful but can get tiresome for testing larger chunks of code, and it is not possible using this method to save your code to run later or to give to other people to run. For this, we need to save Python code in files. In our class, we'll be using VS Code for this purpose. See the class notes for last week for detailed instructions about how to open VS Code, how to create new code files, and how to run them.
Lastly, we also talked about a new data structure called a dictionary.
Dictionaries are similar to lists in that they can store many values, but they are different in key ways. (Pun intended.)
Lists store values in sequential order: the order in which you add values to a list is the order in which they will be stored, and you access individual values by number, based on the position of the value in order. Here's a short review of the basic operations of lists:
>>> # create a new empty list
>>> my_list = []
>>> # create a new list with values
>>> my_list = ["a", "b", "c"]
>>> # add a new value to the end of a list
>>> my_list.append("d")
>>> # get the length of a list (number of items)
>>> len(my_list)
4
>>> # access a single item in a list
>>> my_list[3]
'd'
>>> # remember you can also index a list with a variable, useful in loops:
>>> i = 3
>>> my_list[i]
'd'
>>> # check if a value is in a list
>>> "a" in my_list
True
>>> "z" in my_list
False
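Dictionaries, on the other hand, are not indexed by position. Items are stored as key-value pairs, and individual items are accessed not by number but by specifying the key, which can be almost any value that cannot change, such as a string or a number. (In Python 3.7 and later a dictionary does remember the order in which keys were added; in older versions the order was effectively arbitrary.)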
Let's look at the similar operations of dictionaries:
>>> # create a new empty dictionary
>>> d = {}
>>> # create a dictionary with some key/value pairs, in this case two
>>> d = {"name": "Rory", "hair": "brown"}
>>> # add a new key/value pair to a dictionary
>>> d["species"] = "human"
>>> # get the length of a dictionary (number of key/value pairs)
>>> len(d)
3
>>> # access a single item in a dictionary
>>> d["name"]
'Rory'
>>> # check if a key is in the dictionary
>>> "name" in d
True
>>> "age" in d
False
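As a small sketch of what you can do beyond single lookups, here is one way to loop over a dictionary. It assumes Python 3.7 or later, where a loop visits keys in the order they were added. The names keys_seen and age are just illustrative, and .get() is a standard dictionary method that we have not used in class.

```python
# A small dictionary, built the same way as in the notes.
d = {"name": "Rory", "hair": "brown"}
d["species"] = "human"

# Looping over a dictionary gives you its keys, one at a time.
keys_seen = []
for key in d:
    keys_seen.append(key)
    print(key, "->", d[key])

# .get() returns a default value instead of an error when a key is missing.
age = d.get("age", "unknown")
print(age)
```

Using d["age"] directly would raise a KeyError here, which is why checking with in (or using .get()) comes first.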
We also looked at how you can put data structures inside other data structures, for example lists inside dictionaries or dictionaries inside lists. Here's an example:
>>> # make two dictionaries:
>>> person1 = {"name": "Rory", "hair": "brown", "species": "human"}
>>> person2 = {"name": "Gritty", "hair": "orange", "species": "monster"}
>>> # add a list to each one:
>>> person1["likes"] = [ "computers", "media", "teaching"]
>>> person2["likes"] = [ "hockey", "Philadelphia", "pranks"]
>>> # put both dictionaries in a list:
>>> personnel_file = [ person1, person2 ]
>>> # display the list:
>>> personnel_file
[{'name': 'Rory', 'hair': 'brown', 'species': 'human', 'likes':
['computers', 'media', 'teaching']},
{'name': 'Gritty', 'hair': 'orange', 'species': 'monster', 'likes':
['hockey', 'Philadelphia', 'pranks']}]
>>> # access the second person's name:
>>> personnel_file[1]["name"]
'Gritty'
>>> # access the second person's first like:
>>> personnel_file[1]["likes"][0]
'hockey'
Pay close attention to how you index data structures within data structures. This is not a contrived example but actually a very common practice that is very useful in a lot of cases.
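To see why this indexing matters, here is a rough sketch that walks through a list of dictionaries, each of which holds a list, using the same while loop pattern we have used all semester. The names people and all_likes are made up for this illustration.

```python
person1 = {"name": "Rory", "likes": ["computers", "media", "teaching"]}
person2 = {"name": "Gritty", "likes": ["hockey", "Philadelphia", "pranks"]}
people = [person1, person2]  # a list that holds two dictionaries

all_likes = []
i = 0
while i < len(people):
    person = people[i]              # one dictionary from the list
    print(person["name"] + " likes:")
    j = 0
    while j < len(person["likes"]):
        like = person["likes"][j]   # one item from the list inside the dictionary
        print("  " + like)
        all_likes.append(like)
        j = j + 1
    i = i + 1
```

The outer loop picks a dictionary by number, and the inner loop picks an item from the list stored under that dictionary's "likes" key.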
Lastly, we reviewed some solutions to the homework from last week.
Here's the most basic implementation of part 1:
import random

number = random.randint(1,10)

name = input("Enter your name: ")
guess = int( input("Guess a number between 1 and 10: ") )

if number == guess:
    print("Congrats!")
else:
    print("Sorry, that's wrong.")
This code picks a random number between 1 and 10 and stores that
in the variable number
, then it prompts the user
for their name and stores that in the
variable name
, then it prompts the user for a
guess, converts that to a number with int()
and
stores that in the variable guess
. Then it checks
if the user's guess is equal to the randomly picked number and
prints a message accordingly.
But this only allows one guess. To allow the user multiple guesses, there are a couple options. To offer the user an indefinite number of guesses, we can intentionally create an infinite loop like this: (Note the addition of the indentation.)
import random

number = random.randint(1,10)

name = input("Enter your name: ")

while True:
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        print("Sorry, that's wrong.")
Since True
is always, well, true,
this while
loop will repeat forever. But by using
the break
command, we can exit out of this loop. In
this case, we use break
if the user guesses the
number correctly.
We can also offer the user a little bit of help if they guess
wrong by replacing our "Sorry" message in the else
case with this if
/ else
statement:
import random

number = random.randint(1,10)

name = input("Enter your name: ")

while True:
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        if guess < number:
            print("Too low.")
        else:
            print("Too high.")
If you'd rather limit the number of guesses instead of giving an
unlimited amount, you can create a while
loop with
a variable, similar to how we've been doing all semester:
import random

number = random.randint(1,10)

name = input("Enter your name: ")

i = 0
while i < 10:
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        if guess < number:
            print("Too low.")
        else:
            print("Too high.")
    i = i + 1
And if you want to tell the user how many guesses they get, and which guess they are on, you could add something like this:
import random

number = random.randint(1,10)

name = input("Enter your name: ")

print("You have 10 guesses")

# count from 1 so that the first prompt says "Guess number #1"
i = 1
while i <= 10:
    print("Guess number #" + str(i) )
    guess = int( input("Guess a number between 1 and 10: ") )
    if number == guess:
        print("Congrats!")
        break
    else:
        if guess < number:
            print("Too low.")
        else:
            print("Too high.")
    i = i + 1
Note that I'm using str()
to convert the
number i
to a string so that I
can concatenate it with the +
operator. str()
is like the opposite
of int()
. The latter converts a text number into a
numerical number, and the former converts a numerical number
into the text character of that number.
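Here is a tiny sketch of that round trip between text and numbers. The variable names are only for illustration.

```python
guess_text = "7"                  # what input() gives you: text
guess_number = int(guess_text)    # now a number you can do math with
print(guess_number + 1)           # adds as numbers

joined = guess_text + "1"         # with two strings, + glues text together
print(joined)

i = 3
message = "Guess number #" + str(i)   # str() turns the number back into text
print(message)
```

The same + operator adds when both sides are numbers and concatenates when both sides are strings, which is why the conversion matters.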
Here's a solution for part two. This creates a new empty dictionary, adds some definitions, asks a user for a word, checks if that word is in the dictionary, and if so displays the definition of that word.
dictionary = {}

dictionary["carrot"] = "An orange vegetable"
dictionary["orange"] = "An orange fruit"
dictionary["cherry"] = "A red tree fruit"
dictionary["strawberry"] = "A red ground fruit"

while True:
    word = input("Enter a word (or 'exit' to exit): ")
    if word == 'exit':
        print("Bye")
        break
    if word in dictionary:
        print( dictionary[word] )
    else:
        print("Sorry, I don't know that word.")
Now to allow the user to add new definitions, add the following:
dictionary = {}

dictionary["carrot"] = "An orange vegetable"
dictionary["orange"] = "An orange fruit"
dictionary["cherry"] = "A red tree fruit"
dictionary["strawberry"] = "A red ground fruit"

while True:
    # lowercase the word once, so that looking up and saving use the same key
    word = input("Enter a word (or 'exit' to exit): ").lower()
    if word == 'exit':
        print("Bye")
        break
    if word in dictionary:
        print( dictionary[word] )
    else:
        defn = input("I don't know that word. Please define it: ")
        dictionary[word] = defn
Of course, the sad thing about this code is that if a user were to add a bunch of new definitions, when they exit the program, all those definitions would be lost. What would be nice is if the program could save all the definitions that have been added, and then load them up next time the program runs. That brings us to the new topic for today.
Data serialization is a term for the process of
converting variables and data structures into a kind of format
that can be imported and exported from your code. You might want
to do this to share data with another computer program, or with
the same computer program at a later time. In a literal sense,
to serialize data means to convert it into a
series, such as textual characters or binary digits. In the
cases we'll be looking at, we'll be using a
text-based serialization format, so the
conversion is to a series of characters. But other serialization
formats might convert data to a series of bits, 1
and 0
.
We've talked about all sorts of various forms of input and
output this semester. For inputs, in Processing we used key
presses, mouse presses, and mouse movement, while in Python
outside of Processing we asked the user for textual input on the
command line using the input()
command. For outputs
we have primarily drawn things to the screen (in Processing) or
written text to the command line (outside of Processing).
Today we'll be looking at a different kind of input and output: reading and writing to files. We have already done this in one sense. In Processing, we opened image files and displayed them on screen, which required instructing Processing to read them. Today we will primarily be focused on plain text files, and we will be both reading and writing to them.
The main command in Python for opening files for reading and
writing
is open()
. (Full
official documentation.)
Let's start with opening a file and writing some text to it.
The open()
command takes two arguments: the name of
the file you will be opening, and a flag that indicates whether
you are opening the file to read, write, or both. Let's
start by writing a file. To do that we specify the flag as
a "w"
in the second argument:
f = open("FILENAME","w")
Important warning: Opening a file with
the "w"
argument zeroes out the file if it already
exists. You should be very careful about this. If you have an
important file with any important content, and you open the file
in this way, Python will wipe out the file. I would advise you
to make copies of any files that you are trying to work with in
Python and work on the copies. Only when you're certain
that your Python code is handling your files in the way that you
want should you run them on data that is valuable to you or
others.
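One cautious habit, sketched below, is to check whether a file already exists before opening it with "w". This uses os.path.exists() from Python's standard os module, which we have not covered in class, and it works in a temporary folder (via the standard tempfile module) so that running it cannot touch any of your real files. The file name important.txt is made up.

```python
import os
import tempfile

# Work inside a throwaway folder so this sketch cannot touch real files.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "important.txt")  # hypothetical file name

# Create a file with some content we care about.
f = open(filename, "w")
f.write("precious data\n")
f.close()

# Before opening with "w" again, check whether the file is already there.
if os.path.exists(filename):
    print("File already exists, leaving it alone")
    overwritten = False
else:
    f = open(filename, "w")
    f.write("new data\n")
    f.close()
    overwritten = True

# Confirm that the original content survived.
f = open(filename, "r")
contents = f.read()
f.close()
print(contents)
```

Without the check, the second open(filename, "w") would have emptied the file before anything new was written.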
The f
here then becomes a file
object. It is a variable that I can use in my Python
code as a way to access this open file. The
symbol f
here is not a special keyword — it
is just a variable name and I could have called
it spaghetti
or anything else, but I
used f
to signify "file".
After you have opened a file, you can save data into it with
the write()
command.
(Full
documentation.)
Let's make a simple example in which we prompt the user for input three times, setting each bit of input to a separate variable, and then write those three variables to a file:
name = input("Your name: ")
hair_color = input("Your hair color: ")
species = input("Your species: ")

f = open("data.txt","w")
f.write(name)
f.write("\n")
f.write(hair_color)
f.write("\n")
f.write(species)
f.write("\n")
f.close()
In this code, the file that I am opening is named data.txt. It is common to use the .txt file extension for this work because we are writing plain text, and the extension tells your operating system how to handle this file (i.e. what programs to use to open it).
Note that I am also using the "\n"
character
here. That is the symbol for a new line. It is how I tell Python
to print a line break into this file. That symbol is fairly
universally used across many different types of programming
languages and text file encodings.
Now that we have a file with some textual content in it,
let's see how to read it. The command to do this is (you
guessed
it) read()
. (Documentation.)
Here is a basic example to open a file, read it in its entirety, and print that to the command line:
f = open("data.txt","r")
contents = f.read()
print(contents)
Note that now I am specifying "r" as the second argument to open() here. This is actually the default value of this optional argument, so I could have written this code like this:
f = open("data.txt")
contents = f.read()
print(contents)
But this example isn't too interesting. Let's assume we have the text file that we created by writing text in the example above, and try to read that in some meaningful way:
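f = open("data.txt","r")
# read the file one line at a time (calling read() first would use up
# the whole file and leave nothing for readline() to return)
name = f.readline()
hair_color = f.readline()
species = f.readline()
f.close()
print(name)
print(hair_color)
print(species)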
This code reads the file line-by-line, saves the content of each line into its own variable, and then prints those variables.
Here, instead of using read()
, which reads the
entire file, I am using readline()
, which reads the
file line-by-line. Because I know that the file we are reading
is the one that we created by writing above, I know that I can
read those three lines and save them into corresponding
variables.
Note: there is nothing special "linking" the terms name, hair_color, and species from the previous example to this one. These are just variable names and I could have called them anything here, like a, b, c. It is only because I know the format of this file (since we created it above) that I can presume the three lines I read in are the values corresponding to these variables in my code.
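One detail worth seeing in action: readline() gives you the line including the "\n" at its end. The sketch below writes a small file in a temporary folder (so it does not depend on your own data.txt) and reads it back, using the string method strip() to remove the line break. strip() is a standard string method that we have not used in class, and the sample values are made up.

```python
import os
import tempfile

# Make a small three-line file in a throwaway folder.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "data.txt")

f = open(filename, "w")
f.write("Rory\n")
f.write("brown\n")
f.write("human\n")
f.close()

f = open(filename, "r")
raw_name = f.readline()              # includes the line break at the end
hair_color = f.readline().strip()    # strip() removes the line break
species = f.readline().strip()
f.close()

name = raw_name.strip()
print(repr(raw_name))   # repr() shows the hidden "\n"
print(name, hair_color, species)
```

This is why printing a value straight from readline() shows an extra blank line: print() adds its own line break after the one already in the text.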
What if I want to read a file, but I am not certain in advance whether it will have been created yet or not?
There are different ways to check for file existence in Python, but one way that I like involves a new technique called error handling, which is a very useful technique for you to understand in general, beyond just working with files.
In Python, you can try to prevent errors from crashing your program by using a new syntax called exceptions and exception handling.
The syntax looks like this:
try:
    # code that may generate an error
except SOME-ERROR:
    # code that will be called after the error occurs,
    # potentially preventing the error from crashing your program
The SOME-ERROR
above needs to be the specific type
of error that you are trying to catch. In this
case, if the code in the try
block generates that
specific error, then the code in this except
block
will be run after the error is thrown. This allows you to write
code that hopefully handles the error in some way, preventing it
from propagating further throughout your
program and eventually crashing it.
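Before applying this to files, here is a minimal sketch using an error you may already have run into in the guessing game: int() raises a ValueError when the text is not a whole number. The function name parse_guess is made up for this illustration.

```python
def parse_guess(text):
    """Return text as a whole number, or None if it is not one."""
    try:
        return int(text)
    except ValueError:
        # int() could not make sense of the text, so report "no number"
        return None

good = parse_guess("7")
bad = parse_guess("seven")
print(good)
print(bad)
```

Instead of the program crashing when someone types "seven", the except block runs and the code carries on with a value it can check for.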
In this case, we will use a try-except block to
try to open a file. If that file does not exist, this code will
generate a Python error called
a FileNotFoundError
. We will
then handle this error by assuming that it was
thrown because the file does not exist yet, and setting some
default value and moving on.
The general pattern looks like this:
try:
    f = open("FILENAME","r")
    file_contents = f.read()
    f.close()
except FileNotFoundError:
    print("File does not exist")
    file_contents = "n/a" # This can be any default value you wish
This code tries to open a file named FILENAME for reading and read its contents into a variable called file_contents. However, if that file does not exist, the open() command will throw an exception, triggering our except block. That code will print an output message to the user, set a default value for file_contents, and the code should continue running without problem.
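If you find yourself repeating this pattern, one option is to wrap it in a function. The sketch below is only one way to do it: the function name read_or_default and the file name notes.txt are made up, and it uses a temporary folder so that it runs the same way on any computer.

```python
import os
import tempfile

def read_or_default(filename, default):
    """Hypothetical helper: return a file's text, or default if it is missing."""
    try:
        f = open(filename, "r")
        contents = f.read()
        f.close()
    except FileNotFoundError:
        contents = default
    return contents

# Use a throwaway folder so the sketch does not depend on your own files.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")  # hypothetical file name

before = read_or_default(path, "n/a")   # the file does not exist yet

f = open(path, "w")
f.write("hello\n")
f.close()

after = read_or_default(path, "n/a")    # now the file exists

print(before)
print(after)
```

The first call falls into the except block and returns the default; the second call finds the file and returns its contents.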
Let's see how that might apply to the write / read example we were working on above:
try:
    f = open("data.txt","r")
    name = f.readline()
    hair_color = f.readline()
    species = f.readline()
    print(name)
    print(hair_color)
    print(species)
    f.close()
except FileNotFoundError:
    print("File does not exist yet")
    f = open("data.txt","w")
    name = input("Your name: ")
    hair_color = input("Your hair color: ")
    species = input("Your species: ")
    f.write(name)
    f.write("\n")
    f.write(hair_color)
    f.write("\n")
    f.write(species)
    f.write("\n")
    f.close()
This code uses a try-except block to merge both
reading and writing into one program. First it tries to open the
file for reading. If the file exists, it will read it
line-by-line, saving the values into variables, and then
printing out those variables. But if the file does not exist,
the code will throw a FileNotFoundError
. This will
be handled by opening the file for writing (which will create it
since it does not yet exist), telling the user the file
does not exist, prompting the user for three values, and then
saving those values into the file with write()
.
The next time this program is run, it should find the file that was created last time, and successfully open it for reading.
In this example, we saved three variables to a file, and then read them in. This is a pretty simple quantity and structure of data to save and access later. But what if we want to access much more data, or data that is stored in a more complicated data structure? For this we can use a common data serialization format. In our case, we will use JSON.
JSON, which stands for JavaScript Object Notation, is a common interoperable format for exchanging data between different programs. As the name implies, JSON was originally created for use in JavaScript programs, but it has since become a kind of standard across many programming languages. It can be used to save data in files, send data across a network, or other things. It can even be used to share data between computer programs written in different languages, as long as those languages have a way to read and write the JSON format.
Working with JSON in Python is quite simple. Python has a
library (called json
) for doing just that. Get
started by importing this library:
import json
Now you can use the two main functions of this
library: json.loads()
(for "load string") to read
in JSON code and automatically convert it to Python data
structures, and json.dumps()
(for "dump string")
which will take an arbitrarily complicated Python data structure
and generate JSON output for it.
Here is a simple output example:
my_list = [10, 20, 30]
print( json.dumps(my_list) )
This example makes a list called my_list, then generates JSON to represent that, and prints that JSON. As you can see here, JSON looks pretty much like the way Python displays textual representations of data structures by default. The two notations are close but not identical: for example, JSON always uses double quotes around strings, and writes true, false, and null where Python writes True, False, and None.
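To compare the two notations side by side, here is a short sketch that prints the same dictionary both ways. The dictionary contents are made up for illustration.

```python
import json

record = {"name": "Rory", "is_human": True, "nickname": None}

as_python = str(record)         # how Python displays the dictionary
as_json = json.dumps(record)    # the JSON text for the same data

print(as_python)
print(as_json)
```

The first line printed uses single quotes, True, and None; the second uses double quotes, true, and null. That is why it is safer to let json.dumps() produce the text than to save the output of print() and hope it is valid JSON.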
Now let's try a JSON input example. Note that here I am
making a variable called data_string
that holds
a string (text in quotes) that is a textual
representation of a list, not a list itself. I am then
using json.loads()
to "convert" that string into an
actual Python list called my_list
, and then
printing that list.
data_string = "[ 10, 20, 30 ]"
my_list = json.loads(data_string)
print(my_list)
The output here is not that exciting, but the principle being demonstrated here is quite powerful. This means that you can take a string that represents any arbitrarily complicated data structure, and convert that into an actual Python data structure for use in your program.
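Putting the pieces together, here is a rough sketch of how the definitions dictionary from the homework review could be saved to a file and loaded again. It is one possible approach, not the only one: the file name definitions.json is made up, and the sketch uses a temporary folder so it can run anywhere without touching your files.

```python
import json
import os
import tempfile

# A throwaway folder and a hypothetical file name for this sketch.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "definitions.json")

dictionary = {"carrot": "An orange vegetable", "cherry": "A red tree fruit"}

# Save: turn the dictionary into JSON text and write that text to the file.
f = open(filename, "w")
f.write(json.dumps(dictionary))
f.close()

# Load: read the text back and turn it into a Python dictionary again.
# If the file were missing, start with an empty dictionary instead.
try:
    f = open(filename, "r")
    loaded = json.loads(f.read())
    f.close()
except FileNotFoundError:
    loaded = {}

print(loaded["carrot"])
```

In a real program the load step would run once at the start and the save step once at the end, so that definitions added during one run are still there on the next.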
Let's see an example of how you might actually use this in a way that is useful by saving and loading a more complicated data structure that holds some data you might actually use in a project.
In Processing, we can use the above techniques (file reading and writing, and data structure serialization with JSON) to save the values of variables that we are working with in a sketch, and load them the next time the sketch runs.
Here is a very simple example that does that:
import json

f = open("data.json","r")
file_contents = f.read()
data = json.loads(file_contents)

def setup():
    size(800,800)

def draw():
    background(255)
    i = 0
    while i < len(data):
        d = data[i]
        fill( d["r"], d["g"], d["b"] )
        ellipse( d["x"], d["y"], 50,50 )
        i = i + 1
Note that in order for this code to run, the sketch that
includes it must be saved, and inside the sketch folder, there
must be a file named "data.json"
that includes the
following JSON content:
[{"x": 200, "y": 200, "r": 255, "g": 0, "b": 0},
 {"x": 600, "y": 200, "r": 0, "g": 255, "b": 0},
 {"x": 600, "y": 600, "r": 0, "g": 0, "b": 255},
 {"x": 200, "y": 600, "r": 0, "g": 255, "b": 255}]
For a slightly more interesting example, have a look at the following code:
This code uses a new Processing block that we have not talked about yet: def stop(). This creates a block that will be run whenever your user exits your program. In this case, that block is being used to save some JSON data when the user exits.
This code brings together all of the above discussion. It uses a try except block that tries to read a data file, and if the data file does not exist, it creates some default values. When the code is running, it allows the user to click to add new "creatures" as we saw in the Week 7 review that talked about how to use lists to accomplish the Week 6 HW. Then, when the user exits the program, it saves the data structure of "creatures" as a JSON file that will be opened and loaded next time the program runs.
Note that the try except block here uses an IOError, which is different from the exception that we were using above. This is because we are working with a version of Processing that still uses an older version of Python, in which the exception name is slightly different.
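For what it's worth, code written with IOError still works in regular Python 3, because there IOError is kept as another name for OSError, and FileNotFoundError is a more specific kind of that error. The sketch below checks this in plain Python 3 (not in Processing); the file name is made up and lives in an empty temporary folder so that it is guaranteed to be missing.

```python
import os
import tempfile

# In Python 3 these two names refer to the very same exception type.
same_type = IOError is OSError
# FileNotFoundError is a more specific kind of OSError (and so of IOError).
is_subclass = issubclass(FileNotFoundError, IOError)

# A path inside a brand new, empty folder: the file cannot exist.
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.json")

try:
    f = open(missing, "r")
    f.close()
    caught = False
except IOError:
    # A missing file ends up here in Python 3 as well
    caught = True

print(same_type, is_subclass, caught)
```

So a try except block written for Processing with IOError should also catch a missing file if you later move the code to Python 3.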
In order to make sure that this code will work for you, create a new sketch, copy/paste the code into it, and make sure that the sketch is saved — otherwise Processing will not be able to save the JSON data file.