Today I demonstrated how you can import data into your TurtleStitch composition and use it to determine how your design is rendered.
Working with data imported as a list
These days data is everywhere, and there are many places you can go searching for datasets to work with in an exercise like this. You can even create your own dataset very easily and organize it using any spreadsheet application. After working through this exercise using a found dataset, you should understand how you could create a similar composition on your own using data that you find or create yourself.
The NYC Open Data project is one place where you can find many datasets to work with. In this case these are all collections of data about various activity in New York, including transportation, education, housing, crime, and natural life. Because I've had students work with it in the past, for this demo I'm going to use the 2018 Central Park Squirrel Census. (Shoutout to Kasper Bielecki and Anne Chen for finding this and working with it in "Code Toolkit: Python" this semester.)
To work with data in TurtleStitch today, we are going to start with data in CSV format; CSV stands for comma-separated values. There are other ways to work with datasets, but for now I would recommend that you stick with CSV. Once you get more comfortable with this, you can experiment with other formats.
Files in CSV format are comprised of data arranged in rows, and as the name implies, each row is made up of data values that are separated by commas. This format — a file full of rows, where each row is a list of values — is very similar to a spreadsheet, which is also comprised of rows and columns. It is very easy to convert a CSV file into a spreadsheet and vice versa.
Unlike spreadsheets, CSV files are just plain text (no fancy formatting). This means that they are very easy to work with when writing computer programs in any language including TurtleStitch, which can easily access, open, and read the contents of data in files formatted as CSV. And the fact that CSV files are easily opened by spreadsheet applications means that we can open CSV files with any spreadsheet tool to view, examine, and manipulate the data.
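To make the "rows of comma-separated values" idea concrete, here is a minimal sketch in Python. Python is not part of the TurtleStitch workflow; it is used here only for illustration. The sample text and its X and Y header names are assumptions for this example, and parse_csv is a hypothetical helper name.

```python
import csv
import io

# Assumed sample: a header row plus two data rows, exactly as they
# would appear if you opened a .csv file in a plain text editor.
SAMPLE_CSV = "X,Y\n-73.98110784,40.76751594\n-73.98100322,40.76816256\n"

def parse_csv(text):
    """Return CSV text as a list of rows; each row is a list of strings."""
    return list(csv.reader(io.StringIO(text)))

rows = parse_csv(SAMPLE_CSV)
for row in rows:
    print(row)
```

Note that every value comes back as text, even the numbers. A spreadsheet application does the conversion to numbers for you when it opens the file.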
CSV is also a standard, widely supported data format. The NYC Open Data project offers CSV as one of several formats in which it provides data.
So to work with datasets from the NYC Open Data project, I recommend that you access that data in the CSV format. This is very easy to do. (See the above image.) Click "Export", then click "CSV". This will download a file to your computer called 2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv. The .csv is called a file extension: it tells your operating system what type of file this is, how it's formatted, and what program to use to open it. If you double-click on this file, what program does your computer try to open it with?
To examine this file, let's open it in a spreadsheet program. I recommend Google Sheets. Click that link, then click "Blank" to create a new blank spreadsheet.
Next, click "File" > "Import", then click "Upload" and drag in the file that you just downloaded. You can leave the defaults as they are specified and click "Import data". This will give you a spreadsheet full of the imported CSV data.
Click "View" > "Freeze" and freeze one row. This will let you scroll while viewing the column headings, and will also help with sorting and filtering.
Have a look through this dataset. Try scrolling to the right to see the various column headings that you have to work with, and what the data looks like in each column. How many rows are there? What are some columns with numerical values that might be interesting to use in a TurtleStitch composition?
In this section, I'll walk through how you might explore your data to find an interesting subset to work with. This is just one example meant to illustrate some exploratory techniques. You will have to examine your own dataset to find patterns within it that you think will be interesting to incorporate into your composition.
Try sorting and filtering the data. Click "Data" > "Create a filter". This will let you filter the data, showing only rows with certain values. Then click the sort / filter icon on column E ("Shift") as shown below, select either "AM" or "PM", and click "OK" to display only rows that contain this value. Filtering does not delete data from your spreadsheet; it only hides it from view, and you can always remove the filter later.
This seems interesting to me. I'll find a meaningful subset of this data in some way, and then filter it by both "AM" and "PM" so I can run that data through TurtleStitch to create a "day" and "night" version of my composition.
But what should this "meaningful subset" be? Hmm ... Looking through this huge dataset, I noticed column D ("Hectare"). This must correspond to some specific areas of the park. Reading a bit more about this dataset, I confirmed that yes, these hectares are used to divide the park up into 350 small squares. So I'll reduce my dataset to just the first 5 hectares: 1A, 1B, 1C, 1D, and 1E. That will give me a feasible amount of data to work with, and then I will compare these 5 hectares during the "AM" and "PM" shifts.
Click the sort / filter icon on column D ("Hectare"), and sort A-Z.
Now that my data is filtered to only "AM" and sorted by "Hectare", I will drag to highlight only the X and Y values (columns A and B) for all rows that correspond to hectares 1A through 1E, then click "Edit" > "Copy" (or CONTROL-click and select "Copy"). Then make a new spreadsheet and paste in this data.
Now repeat this process for "PM". Go back to the spreadsheet with the whole dataset. Click the sort / filter icon for column E ("Shift") and filter only for "PM". Then again highlight only the X and Y values (columns A and B) for all rows that correspond to hectares 1A through 1E, copy these values, and paste them into a new spreadsheet.
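The filter, sort, and copy steps above could also be sketched in Python. This is only an illustration of the logic, not a replacement for the spreadsheet steps. The rows below are invented sample values, the dictionary keys are assumed to match the column headings, and select_points is a hypothetical helper name.

```python
# Invented sample rows, standing in for the full spreadsheet.
ROWS = [
    {"X": "-73.9811", "Y": "40.7675", "Hectare": "1B", "Shift": "AM"},
    {"X": "-73.9810", "Y": "40.7682", "Hectare": "1A", "Shift": "AM"},
    {"X": "-73.9805", "Y": "40.7690", "Hectare": "1A", "Shift": "PM"},
    {"X": "-73.9700", "Y": "40.7800", "Hectare": "9C", "Shift": "AM"},
]

FIRST_FIVE = {"1A", "1B", "1C", "1D", "1E"}

def select_points(rows, shift, hectares):
    """Keep rows for one shift and the chosen hectares, sort them by
    hectare, and return only the (X, Y) pair from each row."""
    matching = [r for r in rows
                if r["Shift"] == shift and r["Hectare"] in hectares]
    matching.sort(key=lambda r: r["Hectare"])
    return [(r["X"], r["Y"]) for r in matching]

am_points = select_points(ROWS, "AM", FIRST_FIVE)
pm_points = select_points(ROWS, "PM", FIRST_FIVE)
print(am_points)
print(pm_points)
```

As in the spreadsheet, nothing is deleted from ROWS: each call just produces a smaller copy containing the rows of interest.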
These two spreadsheets are almost ready to save and import into TurtleStitch, but there is a slight problem: the values in both of these spreadsheets will not work so well as numbers to use when creating a TurtleStitch composition. Have a look at these values:
-73.98110784,40.76751594
-73.98100322,40.76816256
...

How would you use this in TurtleStitch? As a "move steps" block? As a "turn degrees" block?
The fact is, it would be hard to do something with these values that would look nice in TurtleStitch. The solution is a process called data normalization. This is a fancy term that just means munging the data a little bit to adjust its scale or range. The data will still represent the phenomena it was meant to depict, but it will operate at a scale that we can work with in our composition.
In this case, the problem is that all of our values are very close together, clustered around one number, which would not create much spatial variation in our final composition. The solution is to subtract the first value in each column from all the values in that column. This will create a column of very small numbers. We can then multiply these by a large number to scale them up as much as we want, which will give a good spread. In effect, it is like zooming in on the point where these numbers cluster.
To do this, subtract the first value in the column from all the others. Enter this formula into the first cell of column C: =A1 - -73.98110784 and press enter. (If you're working with different data, type or copy in the value of A1 here.)
Next, click cell C1, click in the bottom-right corner, and drag down all the way to apply this formula to the entire column:
Examine these values. They all look very small. That's good. Now we can multiply them by something to spread them out. In my case, it looks like 100,000 will work well. Modify the formula, noting the added parentheses: =100000*(A1 - -73.98110784)
Then, again click and drag to apply this to the entire column.
Now, repeat this process for column B, mapping it over into column D. Subtract the first value from all values in the column, and multiply the result by something. Experiment to see what gives a range of numbers that you can work well with in TurtleStitch. In this case, my formula is: =100000*(B1-40.76751594).
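The arithmetic behind these two spreadsheet formulas can be sketched in Python as follows. This is a minimal illustration using the two sample rows shown earlier. The function name rescale and its origin parameter are my own naming, not anything built into Google Sheets or TurtleStitch.

```python
def rescale(values, origin=None, factor=100000):
    """Subtract origin from every value, then multiply by factor.
    If no origin is given, the first value in the list is used,
    which matches the spreadsheet formula =100000*(A1 - first)."""
    if origin is None:
        origin = values[0]
    return [factor * (v - origin) for v in values]

# The two sample rows from the dataset.
xs = rescale([-73.98110784, -73.98100322])
ys = rescale([40.76751594, 40.76816256])

# Round only for display; the raw results carry tiny floating-point noise.
print([round(x, 3) for x in xs])
print([round(y, 3) for y in ys])
```

Passing origin explicitly is useful when a second list should be measured from the same starting point as the first, so that the two results stay comparable.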
Now let's save this file. Click "File" > "Download" > "Comma Separated Values" to save this new file to your computer. Rename this file to "Squirrel AM.csv", because you're about to make another .csv file and it's important not to get them mixed up.
Now repeat the process for your sheet with the "PM" data. Use the same formula for each column that you used in the step above. Download and save this file as "Squirrel PM.csv".
Now we have two files of comparable CSV data that we can work with somehow in TurtleStitch.
The first step needed to work with external data in TurtleStitch is to import some additional blocks. Start a new TurtleStitch project, click the file icon, and click "Import tools".
This adds several new blocks for you to work with. The main one that we need for this exercise is "for each item of", in the Variables palette.
Next, let's import our CSV data. Click the file icon and click "Import". Browse to the "Squirrel AM.csv" file that you saved and click "OK". This will create a new variable with the name of your file, but instead of containing a single value, this variable will contain all the data in your CSV file. TurtleStitch will show this to you with the "Table view" popup. Click "OK" to close it.
Repeat that process with "Squirrel PM.csv". Now you have two tables of data to work with.
To use these tables of data, let's make a composition that draws a small square at the location specified by each row of the table. Here are my code blocks to get started:
Now click on the Variables palette, drag in the "for each item of" block, and place it before "pen up" so that it includes everything except "reset". Then drag "Squirrel AM" in to the "for each item of" block, like this:
Now, within the loop created by "for each item of", you can reference the "item" variable, which will hold one row of your CSV data.
To access the specific items of the row of CSV data, use the "item of" block. In my case, I will drag that block in to the arguments of "set x to" and "set y to".
To access column C of data, specify item number 3 (since it is
the third column) and to access column D, specify item number
4. Column C will correspond to the X location of the squares,
and column D will correspond to the Y location.
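For readers who think more easily in text-based code, the loop described above can be sketched in Python. This is only an analogy to the TurtleStitch blocks, not TurtleStitch code. The table holds the two sample rows with their rescaled values in columns C and D, the square size of 10 is an assumed value, and square_corners and squares_for are hypothetical helper names.

```python
# Each row is [X, Y, rescaled X, rescaled Y], like columns A to D.
TABLE = [
    [-73.98110784, 40.76751594, 0.0, 0.0],
    [-73.98100322, 40.76816256, 10.462, 64.662],
]

def square_corners(x, y, size=10):
    """Return the four corners of a square whose first corner is (x, y)."""
    return [(x, y), (x + size, y), (x + size, y + size), (x, y + size)]

def squares_for(table, size=10):
    """Visit each row, as 'for each item of' does, and build one square."""
    squares = []
    for item in table:
        # TurtleStitch counts items from 1; Python counts from 0,
        # so item 3 (column C) is index 2 and item 4 (column D) is index 3.
        x = item[2]
        y = item[3]
        squares.append(square_corners(x, y, size))
    return squares

squares = squares_for(TABLE)
for corners in squares:
    print(corners)
```

Swapping in a different table, as with the PM data, changes where the squares land without changing any of the drawing logic.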
Putting that all together looks like this:
And then you can re-run the composition on the PM data by simply dragging the "Squirrel PM" variable into the "for each item of" block and running again:
Read Nancy Paterson, "Stock Market Skirt: The Evolution of the Internet, the Interface, and an Idea", from Database Aesthetics, edited by Christiane Paul and Victoria Vesna, 2007
You can read more about this project here:
Using a different dataset from the NYC Open Data project, create your own embroidered data visualization.
All work is due by Tuesday at 8pm.