Code Toolkit: Python, Fall 2020

Week 11 — Wednesday, November 11 — Class notes

Data serialization: for storage and communication

This week we're going to talk about data serialization, which is a process for translating data in the form of variables and data structures into a format that can be stored for later use (like in a file), or transmitted to others (like across a network).

Serialization can be accomplished with many different data and file formats. The serialization format that we will be using today is JSON (usually pronounced JAY-sawn), which stands for JavaScript Object Notation. This is a set of rules for formatting text, and it is a very common serialization format today. JSON comes out of JavaScript, although it is used in the context of many programming languages. Most programming languages in use today have libraries that make it easier to read and write data in JSON format.

Those of you who did a data visualization for your midterm have already worked with data serialization, although we did not refer to it by that name when we were working on your code. Everyone who did data visualization this semester worked with CSV data, which stands for comma-separated values. This is an older format that existed long before JSON, and is still useful in many cases. A common use case for CSV is to represent tabular data, like a spreadsheet. A CSV-formatted file looks like this:

  Rory,human,brown,computers
  Gritty,monster,orange,hockey
  Fido,dog,beige,running

As you can see, and as the name implies, CSV is a data format that simply lists rows of data, in which each value is separated by commas. Sometimes CSV files will contain a header row indicating what each of these values represents, like the first row of a spreadsheet. In this case, that might look like this:

  name,species,hair color,favorite thing
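As a reminder of what reading this format looks like in code, here is a minimal sketch that uses Python's built-in csv module to split the sample rows above into lists. The variable names (lines, rows, header, records) are just my choices for illustration, and the rows are typed in as strings here rather than read from a file.

```python
import csv

# The sample rows from above, with the header row first.
lines = [
    "name,species,hair color,favorite thing",
    "Rory,human,brown,computers",
    "Gritty,monster,orange,hockey",
    "Fido,dog,beige,running",
]

# csv.reader splits each line on commas and gives back one list per row.
rows = list(csv.reader(lines))
header = rows[0]    # the names of the columns
records = rows[1:]  # everything after the header row

for record in records:
    # record[0] is the name column, record[1] is the species column
    print(record[0] + " is a " + record[1])
```

Notice that each value has to be looked up by its numerical position in the row, so you need to remember that position 0 is the name and position 1 is the species.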

Which data format you decide to use when you need to serialize data depends on the structure of the data, and the operations you need to do on it. Hopefully by the end of today you will have a sense of what JSON is good for, and when you might elect to use this format.

But before we get to serialization, I want to introduce a new kind of data structure. In the process, I'll introduce the command line and show how you can run Python code in this new mode. And we'll end the class today with an example that shows how you can access serialized JSON data via a network with a very simple API to create a shared, interactive experience.

(The following links jump down to the various topics in these notes.)

Dictionaries, an intro
The command line
Dictionaries, syntax and usage
Dictionary example, with JSON via network

Dictionaries, an intro

So far we have talked about one kind of data structure: lists (week 7). Lists are good at storing values in a fixed, sequential order. Remember that you create a list with empty square brackets (x = []), then you add values to it using the append() command, and those values are stored in the order in which they were added. You reference those values with a numerical index inside square brackets (x[0], x[3], etc.).

During our discussion of lists, I mentioned that Python offers many different data structures. Which data structure to use in a given situation depends on the structure of the data that you are trying to model, and the operations that you will need to do with it. Determining which data structure(s) to use to solve a given problem can often be a question of some subjective debate, and there is often not one clear right answer.

The data structure we'll learn about today is the dictionary. While lists are sequential (added to in order, and accessed by numerical index), dictionaries are unordered, and they are accessed by an index that can be one of many kinds of value (number, string, etc.). A dictionary contains a collection of mappings, also referred to as key-value pairs. Think of a key-value pair as like a pairing of a word and its definition, which is where the dictionary gets its name. In the version of Python shown in these notes (Python 2.7), dictionaries do not preserve the order in which items are added to them, and instead are accessed, we might say, randomly, by retrieving the value (the definition) for a given key (the word). (Python 3.7 and later do keep items in the order they were added, but you still access them by key rather than by position.)

Since at this point in our class we are starting to move out of Processing and into Python itself, I want to introduce dictionary syntax to you outside of Processing. But first, we need to take a detour into the command line.

The command line

The command line is just a different kind of interface for accessing files and running programs on your computer. While Finder (on Mac) and Explorer (on Windows) use a graphical user interface (GUI) to manage our files, folders, and programs, the command line is a text-based interface. The command line is typically thought of as a kind of legacy mode of computer use: text-based command line interfaces were common before the prominence of the GUI.

Why learn the command line? A command line context is often an easier way to develop computer programs because we as developers do not need to worry about designing and implementing graphical interfaces, which often require extra code. Conversely, this means that using programs in a text-based command line mode is often less intuitive and less user-friendly than in graphical modes. Working in a command line mode is an important skill for a developer, however, as it often allows you to more rapidly run and test new code, without having to build up a more developed interface. Also, since the command line is text-based, there are some situations, like data processing, where running a program in a command line context may actually be easier. Lastly, there may be times when you will be working in a server context (for example, installing code over a network onto a cloud-based machine that is physically distant and inaccessible in person), and in these cases it is often impossible to access a GUI, so you would need to navigate a command line interface.

Sidenote: If you would like to learn more about the history of the command line, and watch a more detailed technical discussion, you can access this two-part video tutorial that I created for another class: Command line tutorial part 1: history, Command line tutorial part 2: technical instruction. Keep in mind that this was created for another class, so a couple of comments will be irrelevant, but the history and explanation of how to use the command line will all be relevant.

Let's go through a few basic commands that you'll need to get started working with the command line. Note the new formatting: whenever I use a fixed-width font on a black background with rounded corners and a thin gray bar on top, I am showing you valid instructions for the command line, like this:

$ pwd
/Users/rory/dev/code-toolkit
$ ls
dictionary.py

The way to read this is to look and see that pwd and ls are valid command line instructions. You can type them in at a command prompt and press enter. The lines that come below are examples of what you may see, but your results will be different depending on the files and folders on your computer.

Please note: you should not enter the dollar sign ($) when you are entering command line instructions. I use the dollar sign to signify the command prompt. The command prompt may be signified by different punctuation in your terminal. That is how the command line tells you that it's ready for more input, and that is where you will enter new commands. Always press enter to execute the command you've typed.

pwd — stands for "print working directory", and this command will display the directory (equivalent to a folder) that you are currently working in.

ls — stands for "list", and this command will list all of the files in your current directory. This is the command line equivalent of simply opening a folder in a Finder (Explorer) window and viewing its contents.

cd — stands for "change directory", and this is how you change the directory that you are currently in. Mac has some really nice interaction between the command line and the GUI, so you can type cd, then drag a folder into the Terminal window, and press enter, and that will change to that directory. I'm not sure if there is an equivalent to this on Windows.

The command line allows you to type the name of any executable program on your system, press enter, and the command line will then try to run that program. Since we're working in Python today, let's run the Python program. Do this by simply typing python, and you should see something like this:

$ python
Python 2.7.7 (default, Jun  2 2014, 18:55:26) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Notice that the punctuation indicating my command prompt has changed from a $ to >>>, which means that we are in the Python shell. This is a place where you can type any valid Python code, and it will be interpreted and run.

To exit out of the Python shell you can type CONTROL-D, or exit(). Note that regular command line instructions will not work inside the Python shell, so if you want to cd or ls, you need to exit out of Python.

Dictionaries, syntax and usage

Now that we have a Python shell, let's learn about how to use dictionaries in this interactive Python shell. First let's review some list commands:

>>> a = []
>>> a.append(100)
>>> a.append(200)
>>> a.append(300)
>>> print(a)
[100, 200, 300]
>>> print(a[0])
100
>>> print(a[1])
200
>>> print(len(a))
3

Note that you don't need to type print() in the Python shell to see a variable value. You can simply type the variable name, and the Python shell will evaluate it and print its value:

>>> a
[100, 200, 300]
>>> a[0]
100
>>> len(a)
3

It is also valid Python code to initialize a list by using this notation:

>>> a = [ 100, 200, 300 ]
>>> print(a)
[100, 200, 300]
>>> print(a[0])
100
>>> print(a[1])
200

Now let's learn some new syntax for dictionaries:

>>> r = {}
>>> r["name"] = "Rory"
>>> r["species"] = "human"
>>> r["hair color"] = "brown"
>>> r["favorite thing"] = "computers"
>>> r
{'hair color': 'brown', 'name': 'Rory',
 'favorite thing': 'computers', 'species': 'human'}
>>> r["name"]
'Rory'

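Notice that the order in which the key-value pairs are printed is not the same order in which I entered them. This is because dictionaries in the version of Python shown here (2.7) do not preserve order, as I mentioned above. If you are running Python 3.7 or later, you will instead see the pairs in the order in which you entered them.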

You can also use the len() command as with lists. In this case, len() will return the number of key-value pairs:

>>> len(r)
4

If you try to access a key that is not in your dictionary, you will get a KeyError. For example:

>>> r["date of birth"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'date of birth'

This very helpful error message is telling you precisely the problem: that the key "date of birth" does not exist in this dictionary.

There is an operator that you can use to help avoid these errors: in. It checks whether a given key is in the dictionary, like so:

>>> "date of birth" in r
False

So you can use this boolean inside an if statement, like so:

>>> if "date of birth" in r:
...   r["date of birth"]
... else:
...   "Key not in this dictionary"
... 
'Key not in this dictionary'
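Another way to avoid a KeyError, shown here as a small sketch, is the dictionary's get() command. It takes a key and a fallback value, and returns the fallback if the key is not in the dictionary. The fallback string "unknown" and the variable names are just examples that I picked.

```python
r = {}
r["name"] = "Rory"
r["species"] = "human"

# The key "date of birth" is missing, so get() returns the fallback value
dob = r.get("date of birth", "unknown")

# The key "name" exists, so get() returns its value and ignores the fallback
name = r.get("name", "unknown")

print(dob)
print(name)
```

This is handy when a missing key is an ordinary situation that you expect to happen, rather than a mistake that you want to find out about.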

Similar to the new list initialization above, you can also initialize a dictionary with some new syntax:

>>> r = {
    "name": "Rory",
    "species": "human",
    "hair color": "brown",
    "favorite thing": "computers"
}
>>> r["name"]
'Rory'
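Since you already know how to loop over a list, it may help to see that a for loop works on a dictionary too. This is a minimal sketch: looping over a dictionary gives you each key, and you can then use that key to look up its value. If you want a predictable order, you can sort the keys first.

```python
r = {
    "name": "Rory",
    "species": "human",
    "hair color": "brown",
    "favorite thing": "computers"
}

# Looping over a dictionary gives you its keys, one at a time
for key in r:
    print(key + ": " + r[key])

# sorted() returns the keys as a list in alphabetical order
keys = sorted(r.keys())
print(keys)
```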

One last thing: you can even combine data structures in useful and important ways. So for example:

>>> r = {
    "name": "Rory",
    "species": "human",
    "hair color": "brown",
    "favorite thing": "computers"
}
>>> g = {
    "name": "Gritty",
    "species": "monster",
    "hair color": "orange",
    "favorite thing": "hockey"
}
>>> a = [ r, g ]
>>> a
[{'hair color': 'brown', 'name': 'Rory',
  'favorite thing': 'computers', 'species': 'human'},
 {'hair color': 'orange', 'name': 'Gritty',
  'favorite thing': 'hockey', 'species': 'monster'}]

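So what I've done here is create two dictionaries, r and g, and then created a list a, which holds those two dictionaries. Note the format that Python uses to display the contents of a. This is very nearly JSON! The JSON text format looks almost exactly like this. The main difference is that JSON requires double quotes (") around strings, where the Python shell displays single quotes (').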
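To see actual JSON text, here is a short sketch using Python's built-in json module. json.dumps() turns a data structure into a string of JSON, and json.loads() turns a string of JSON back into a data structure. The indent and sort_keys options are only there to make the printed text easier to read.

```python
import json

r = {
    "name": "Rory",
    "species": "human",
    "hair color": "brown",
    "favorite thing": "computers"
}
g = {
    "name": "Gritty",
    "species": "monster",
    "hair color": "orange",
    "favorite thing": "hockey"
}
a = [r, g]

# Serialize: turn the list of dictionaries into a string of JSON text
text = json.dumps(a, indent=2, sort_keys=True)
print(text)

# Deserialize: turn the JSON text back into a list of dictionaries
restored = json.loads(text)
print(restored[1]["name"])
```

Notice that the printed JSON uses double quotes around every string, unlike the way the Python shell displays the same data.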

Dictionary example, with JSON via network

For the remainder of class, let's work through an example about how dictionaries work, how they can be used to serialize JSON data, and how that JSON data can be retrieved via a network.

Step 1. Let's start with a basic Processing example that reviews many of the concepts from throughout the semester so far. Create a new Processing sketch and add one graphical element that the user can move with the mouse. I'll draw this graphical element with a simple rectangle, but I'll call this a "creature" and we can imagine that it could be something more visually interesting.

Start with one creature. If we want it to move horizontally and vertically and to have its own color, how many variables do we need to represent it? Three: x, y, color. So create these three variables, give them initial values, do all the usual Processing stuff like specifying window size, and add a keyPressed() block to respond to user input. You should end up with something like this:

week11_networked_example_part0.pyde

(Note that I have added a .html extension so that you can view this file in a browser. You can copy/paste this code into a Processing window. Or, if you want to download and run this code, save the file and remove the .html extension, but you will also need to remove the HTML code from the file, like <pre> tags.)

Step 2. Great. Now let's add a bunch of other creatures. How do we do that? With a list. And if we want these other creatures to be represented in the same way as our single creature, how many lists do we need? Three. A list for x, a list for y, and a list for color. Add those lists, then append two initial values to each list in setup(), and add a loop in draw() to iterate over the lists to draw all the creatures represented by the lists. Don't worry about moving these around on the screen yet. After doing this, you should end up with something like this:

week11_networked_example_part1.pyde

Step 3. Now let's say we want to add a size to our creatures, so that each one can have its own unique size. How would we do this? We could add a size variable. We would do this by adding one size variable for our single moving creature, and then a list to hold the size values for the creatures that we're representing with lists. But there is a different way to do this, one that involves dictionaries. And I think this will be a good example of how we can use this new data structure.

Start by replacing the three variables that represent our moving creature with one dictionary that contains three key-value pairs holding those values. So go from this:

myX = 250
myY = 250
myColor = color(155,155,255)

to this:

myCreature = {}
myCreature["x"] = 250
myCreature["y"] = 250
myCreature["color"] = color(155,155,255)

Similarly, inside setup(), where we are initializing our lists, instead of append()ing values to three different lists, let's make some dictionaries, and append those to one single list. So in global space, go from this:

xList = []
yList = []
cList = []

to this:

creatureList = []

and then, inside setup(), go from this:

xList.append(100)
yList.append(100)
cList.append( color(255,155,155) )

xList.append(200)
yList.append(200)
cList.append( color(155,255,155) )

to this:

c = {}
c["x"] = 100
c["y"] = 100
c["color"] = color(255,155,155)
creatureList.append(c)

c = {}
c["x"] = 200
c["y"] = 200
c["color"] = color(155,255,155)
creatureList.append(c)

Note that now I only have one list, and it contains a collection of dictionary objects. Putting that all together will look like this:

week11_networked_example_part2.pyde

Note that I also needed to modify draw() and keyPressed() to use these new dictionaries.
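To give a sense of how the loop changes, here is a sketch that runs in plain Python, outside of Processing. It is not the code from the linked file. Since color() only exists inside Processing, I define a stand-in for it here, and instead of drawing anything, the loop just builds a line of text describing each creature.

```python
# Stand-in for Processing's color(), so that this runs in plain Python.
# Here a color is just three numbers grouped together.
def color(r, g, b):
    return (r, g, b)

creatureList = []

c = {}
c["x"] = 100
c["y"] = 100
c["color"] = color(255, 155, 155)
creatureList.append(c)

c = {}
c["x"] = 200
c["y"] = 200
c["color"] = color(155, 255, 155)
creatureList.append(c)

# One loop over one list: each item is a dictionary holding
# everything we know about one creature.
descriptions = []
for c in creatureList:
    # In the sketch, drawing calls such as rect() would go here
    descriptions.append("creature at ({}, {})".format(c["x"], c["y"]))

for d in descriptions:
    print(d)
```

With three separate lists, the loop had to count through index numbers and use the same index in each list. With a list of dictionaries, the loop can hand you one whole creature at a time.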

Step 4. Now we can finally add that size variable that I was talking about in Step 3. To do this, simply add this new line in global space:

myCreature["size"] = 10

and use this new key-value pair inside draw():

rect(myCreature["x"],myCreature["y"],myCreature["size"],myCreature["size"])

Then, do something similar for the list of creatures. Inside setup(), add a new key-value pair to each creature, like this:

c["size"] = 30

and use that when you are looping over the creatures to draw them. Putting that all together should look something like this:

week11_networked_example_part3.pyde
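You may have noticed that we are now typing the same few lines for every creature. One possible way to tidy that up, sketched here in plain Python, is a small function that builds a creature dictionary for you. The function name makeCreature() and its parameters are my own invention for this illustration, they are not in the linked files, and I am using three numbers in parentheses in place of Processing's color().

```python
# Hypothetical helper: builds one creature dictionary from its values
def makeCreature(x, y, c, size):
    creature = {}
    creature["x"] = x
    creature["y"] = y
    creature["color"] = c
    creature["size"] = size
    return creature

creatureList = []
# (255, 155, 155) stands in for color(255, 155, 155) outside of Processing
creatureList.append(makeCreature(100, 100, (255, 155, 155), 30))
creatureList.append(makeCreature(200, 200, (155, 255, 155), 30))

for c in creatureList:
    print(c["x"], c["y"], c["size"])
```

If you later decide that every creature needs another key-value pair, you would only need to change the function, not every place where a creature gets created.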

Step 5: Serializing this data. Hopefully you might see some advantages to working this way. There are some conveniences to working with key-value pairs instead of individual variables. But maybe not that many — you could still do all of this work so far without using dictionaries, as we have been doing throughout the whole semester! But one advantage to using data structures in this way is that they can be easily serialized.

So let's modify this sketch so that instead of hard-coding the list of creature values, we are reading that data from a file. Add the following import statement as the first line of your program:

import json

This will let us use some new commands to read and write JSON data. Now add the following lines inside setup():

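# Open the data file, read the JSON in it, and close the file when done
with open("data.json") as f:
    j = json.load(f)

# Add each creature dictionary from the file to our list
for c in j:
    creatureList.append(c)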

This is going to open a file named data.json, read its contents, and populate your list of creatures automatically with data from that file. But first you need to add this file to your sketch directory. Have a look at this file: week11_data.json. That is JSON data. It looks a lot like the way Python prints out the contents of dictionaries and lists. Save this file into your sketch folder, and run your sketch. Now you should be able to modify that JSON file to change values like size or color, or add new creatures. Be careful when modifying a JSON file: the format is very precise. You need commas separating each item, but you cannot have a comma after the last item.
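To show how precise the format is, here is a small sketch that tries to read two strings of JSON. The first one is valid. The second one has a comma after the last item, so Python refuses to read it and raises an error. The keys and values here are made up for the example and are not the actual contents of week11_data.json.

```python
import json

# Valid JSON: commas between items, no comma after the last item
good = '[{"x": 100, "y": 100, "size": 30}, {"x": 200, "y": 200, "size": 30}]'

# Invalid JSON: note the extra comma after 30
bad = '[{"x": 100, "y": 100, "size": 30,}]'

creatures = json.loads(good)
print(len(creatures))

try:
    json.loads(bad)
    badWasReadable = True
except ValueError:
    # json raises a kind of ValueError when the text is not valid JSON
    badWasReadable = False

print(badWasReadable)
```

If your sketch stops with an error right after you edit the JSON file, a stray or missing comma is a good first thing to check.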

Putting that all together should look something like this:

week11_networked_example_part4.pyde
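Serialization goes in both directions. As a sketch of the other direction, this plain Python example writes a list of creature dictionaries out to a JSON file with json.dump(), and then reads it back in with json.load(). I am assuming here that colors are stored as lists of three numbers, which may not match how week11_data.json stores them, and I am writing to a temporary folder so that the example does not touch any of your own files.

```python
import json
import os
import tempfile

creatureList = [
    {"x": 100, "y": 100, "color": [255, 155, 155], "size": 30},
    {"x": 200, "y": 200, "color": [155, 255, 155], "size": 30}
]

# Make a temporary folder, and build the path of a file inside it
folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.json")

# Write the list of dictionaries to the file as JSON text
with open(path, "w") as f:
    json.dump(creatureList, f, indent=2)

# Read the JSON text back in from the file
with open(path) as f:
    loaded = json.load(f)

print(loaded == creatureList)
```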

Step 6: Networking. In Step 5 we saw how we could read JSON data from a local file, in other words, a data file that is saved on disk, on the same computer that we are working on. (The word local perhaps gets overused in computer science. So far this semester we have talked about local in opposition to global, in terms of variable scope. But now we are talking about local in opposition to remote, in terms of whether a file is located on your own computer, or on a different computer over a network.)

Save the following code file: week11_network.pyde. Once you download it, you should be able to simply drag it into your sketch window to add it as a new tab to your sketch. If you'd like, have a look at this code. It defines two functions: getData(cList) and sendData(c). getData() retrieves a JSON file from a webserver with the IP address of 174.138.45.118:5000, then it populates a list of dictionaries, and returns that list. You would use it like this:

creatureList = getData(creatureList)

sendData(c) sends the data about one single "creature" to this webserver. This webserver has its own Python code which saves this data into a list and distributes it to anyone who calls getData(cList).
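To give a rough idea of what retrieving JSON over a network involves, here is a sketch in plain Python. This is not the code from week11_network.pyde. The function names parseCreatures() and getDataSketch() are hypothetical, and you would have to supply the web address yourself. The first function turns JSON text into a list of dictionaries, and the second fetches that text from a web address. The example only runs the first one, on a sample string that I made up, so it does not need a network connection.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7

# Hypothetical helper: turn JSON text into a list of creature dictionaries
def parseCreatures(text):
    data = json.loads(text)
    creatures = []
    for c in data:
        creatures.append(c)
    return creatures

# Hypothetical helper: fetch JSON text from a web address, then parse it
def getDataSketch(url):
    response = urlopen(url)
    try:
        text = response.read().decode("utf-8")
    finally:
        response.close()
    return parseCreatures(text)

# A made-up sample of what a server might send back
sample = '[{"x": 120, "y": 80, "size": 10}]'
creatures = parseCreatures(sample)
print(creatures[0]["x"])
```

The point to notice is that once the text has arrived, reading it is the same as in Step 5. It does not matter whether the JSON came from a local file or from a remote computer.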

Putting that all together would look like this:

week11_networked_example_part5.py

Now, if you run this code (and the webserver is turned on and running!) you should be able to move your creature around, while also seeing the moving creatures of anyone else currently connecting to this webserver.

If you would like to see what the webserver code looks like, you can have a look here: week11-web-server.py. This is a relatively short amount of code! Just keep in mind that this looks simple, but it is using a lot of things that we have not talked about yet. Mainly, this is using a Python web server library called Flask, which you can read about if you are curious.
