Today we're going to build on the reading discussion this week about the aleatory in computation by learning some techniques in computer program code for working with randomness. One of the things that we read and discussed was where the aleatory comes from within the rigid, formal strictures of digital machinery; how can randomness be generated from deterministic processes?
We will not be going into the technical details today of how to generate random and pseudo-random numbers. This topic involves complicated mathematics and we could spend an entire semester (or more!) just talking about those mathematical techniques. Instead, we will start with the fact that random numbers are in fact possible to generate, and we will learn the commands that Python provides to generate them.
Building on this, we will look at techniques for working with algorithms that incorporate randomness, or in other words, algorithms that are probabilistic: step-by-step procedures that incorporate and use values that are unpredictable, but within various ranges and likelihoods, which we can shape and control.
For this lesson I'll be working with a few images. You can work with any images that you'd like as you work through these notes. But if you'd like to use the same images as me, you can download them here:
We didn't get through all the techniques in the class notes last week. Some of you made efforts with the homework parts related to that in any case. Great work to all who did that.
There are a few parts from last week's notes that will be essential to understanding some techniques for today, and they will be essential to completing the homework for this week, which is what you will present as your Unit 1 culmination work. So let's take some time to cover those topics:
Last week we covered the pixel (Section IV), and the fact that a digital image is typically made up of a grid of pixels. Each pixel in the grid is identified by two coordinates that we call x and y, with x corresponding to the pixel's horizontal position (left/right), and y corresponding to the vertical position (up/down).
We then went on to discuss how a pixel is typically represented by three numbers between 0 and 255, which typically correspond to the red, green, and blue color components that combine to form the color of a given pixel. For example: 255,0,0 would be a pure red; 0,0,255 would be a pure blue; and 255,0,255 would be purple (red plus blue). Determining more color values than that can be complicated when it's unfamiliar, and you can use a color picker tool like this one from Google to help you. Move the sliders around and note the R, G, B values for each color.
Note that the color picker also lists CMYK, HSV, and HSL values. These are called different color modes or color spaces. We only touched on this briefly last week, but this can be a very useful idea for working with color. In particular, HSV stands for hue, saturation, and value (value is like brightness). Hue is a number that corresponds to the position along the rainbow spectrum: ROYGBIV (red, orange, yellow, green, blue, indigo, violet) usually represented by a number between 0 and 360. Saturation corresponds to how pale/gray the color is. And value or brightness corresponds to how light or dark. Working with HSV can be very useful for filtering color in an image based on hue or brightness.
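To make the relationship between RGB and HSV a little more concrete, here is a minimal sketch using Python's built-in colorsys module. Keep in mind that colorsys works with numbers between 0 and 1 rather than 0 to 255 or 0 to 360, so the helper below (rgb255_to_hsv, a name I made up for this illustration) rescales the results to look like what a color picker typically shows.

```python
import colorsys

def rgb255_to_hsv(red, green, blue):
    """Convert 0-255 RGB values to (hue 0-360, saturation 0-100, value 0-100)."""
    # colorsys expects and returns numbers between 0 and 1
    h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    return (round(h * 360), round(s * 100), round(v * 100))

print(rgb255_to_hsv(255, 0, 0))    # pure red: hue 0
print(rgb255_to_hsv(0, 0, 255))    # pure blue: hue 240
print(rgb255_to_hsv(255, 0, 255))  # red plus blue: hue 300
```

Notice that all three of these colors have the same saturation and value. Only the hue changes, which is why hue is such a handy number for filtering by color.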
Putting all the above together, the first example of an image processing algorithm from last week should now make sense. Let's go over that: Example 0: Filtering image pixels. This is using a nested loop to iterate over all pixels in the grid. A nested loop is the term for one loop inside another. Imagine doing a variable trace table for this and notice that x is incremented from 0 up to the width of the image, and each time x increases by one, y is incremented from 0 up to the height of the image. So this nested loop is iterating over the image column by column.
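If the trace table is hard to picture, here is a small sketch that records the order in which a nested loop like this visits each x,y pair. It is not Example 0 itself: I'm using a tiny made-up size of 3 by 2 instead of a real image so that the whole trace fits on screen.

```python
# a tiny stand-in "image" size so the whole trace fits on screen
width = 3
height = 2

visited = []
for x in range(width):          # outer loop: one pass per column
    for y in range(height):     # inner loop: every row within that column
        visited.append((x, y))

print(visited)
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

The x value stays the same while y runs through all of its values, which is what "column by column" means. If you swapped the two for lines, the loop would go row by row instead.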
I hope you see how you could use this pattern here as a way to apply all the algorithms that we looked at in weeks 1 & 2 to image processing. Just as you wrote code to search for the largest or smallest number, or the longest or shortest word, you could write code to search for the brightest pixel in an image, or to sort all the pixels of an image from dark to bright, or from red to violet.
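As a rough sketch of that idea, here is the "search for the largest" pattern applied to brightness. To keep it self-contained I'm using a small hypothetical list of lists called pixels in place of a real image, and I'm assuming that "brightness" means the average of the red, green, and blue values (there are other ways to measure it). With Pillow you would get each color with img.getpixel((x,y)) instead of pixels[y][x].

```python
# a hypothetical 3x2 "image": pixels[y][x] is an (R, G, B) tuple
pixels = [
    [(10, 10, 10), (200, 180, 90), (0, 0, 255)],
    [(255, 255, 255), (30, 60, 90), (120, 120, 120)],
]
height = len(pixels)
width = len(pixels[0])

def brightness(color):
    """A simple brightness measure: the average of red, green, and blue."""
    red, green, blue = color
    return (red + green + blue) / 3

# same pattern as searching a list for its largest number
brightest_so_far = -1
brightest_position = None
for x in range(width):
    for y in range(height):
        b = brightness(pixels[y][x])
        if b > brightest_so_far:
            brightest_so_far = b
            brightest_position = (x, y)

print(brightest_position, brightest_so_far)
```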
Example 6 does this in the most straightforward way. It requires two command line arguments (in sys.argv). It then uses them with the PIL Image.open command to open two image files, which must already exist in the same directory. Then it uses the PIL command thumbnail() to resize each image. Then it uses Image.new( "RGB", (600,600), "white" ) to create a new image that is 600 pixels wide and 600 pixels tall, all white. Next it uses paste() to paste the first image into the new image at location 0,0, and then paste() to put the second image into the new image at location 200,200. Finally it uses save() to save the new image to a file, thus combining two images into one.
Example 7 takes the above technique and adds transparency. The putalpha() command adds a layer of transparency to the entire image. As with R,G,B, the values here are between 0 and 255. So putalpha(128) makes the entire image about 50% transparent. Now, to combine the images while preserving that transparency, instead of using the paste() command, we have to use alpha_composite(), which takes the same arguments. Finally, since JPEGs do not allow transparency, we have to convert this image to a mode without transparency before saving it as a JPEG. The command is convert("RGB"). Alternatively, we could have saved the image as a PNG, which does allow for transparency. We would do this by specifying a filename with a .png extension: save("new.png")
The last case here, Example 8, combines Example 0 above with Example 7. Notice that the new code in blue here is almost the same as in Example 0. I'm using a nested loop to iterate over each pixel. For each, I'm getting that pixel's values:
(red,green,blue,alpha) = img2_alpha.getpixel((x,y))
and then asking if that pixel is "bluish", which I'm checking like this:
if blue > red and blue > green:
If that is True, I'm using putpixel() to put a transparent pixel value at that x,y location (i.e., the fourth value, alpha, is 0, completely transparent).
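Here is the "bluish" decision pulled out on its own so you can experiment with it without opening an image. This is only a sketch: the function names is_bluish and knock_out_blue are ones I made up, and they work on a single (R, G, B, A) tuple rather than on a whole Pillow image.

```python
def is_bluish(red, green, blue):
    """Return True when blue is the strongest of the three color components."""
    return blue > red and blue > green

def knock_out_blue(pixel):
    """Given an (R, G, B, A) tuple, return it with alpha 0 if it is bluish."""
    red, green, blue, alpha = pixel
    if is_bluish(red, green, blue):
        return (red, green, blue, 0)   # fourth value 0 = fully transparent
    return pixel

print(knock_out_blue((20, 40, 200, 128)))   # bluish, so alpha becomes 0
print(knock_out_blue((200, 40, 20, 128)))   # reddish, so it is left alone
```

Inside the nested loop you could pass the result of getpixel() to a function like this and hand what comes back to putpixel(). Try changing the test in is_bluish to pick out "reddish" or "greenish" pixels instead.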
The topics for today: (Click each to jump down to the corresponding section.)
random.random()
random.seed()
random.random()
In spite of all the discussion in your reading responses and in class about the complexity of generating random numbers, Python implements various random number generating algorithms and provides them to us as a collection of easy-to-use commands. These are provided by a library aptly called random.
To get started, simply import this library:
import random
Python documentation. So far, I have been sharing with you all the various commands and functions that you have been working with. But I want to point you to the Python documentation itself, so that you may explore by accessing it directly:
This is the authoritative standard for all things Python. It is comprehensive and accurate, describing all aspects of the language. There are many other guides out there (w3schools.com is good, as is this guide called Automate the Boring Stuff with Python, by Al Sweigart) and these can be great. But if you want to go straight to the source for definitive information, the official Python docs cannot be beat.
Official programming language documentation like this can be hard to read. I often need to read slowly and carefully, and think hard about the precise meanings of the terms that are used here. I usually have to read an explanation a few times before I completely wrap my head around all of its details and implications. But since this is the official documentation, that scrutiny is usually worth it for the thorough understanding that it can offer.
At your stage of learning to code, I know that it can be tempting simply to Google a problem: an error or a task you're trying to accomplish. But keep in mind that the results you find will often be unreliable or simply wrong. Even when the advice is correct, it may not be relevant: it may reference complicated techniques you have not learned yet, or it may be describing a problem subtly different from your own. In any case, reading and interpreting this advice is very difficult, and determining if it is correct and relevant is a very complicated task. Think of it like seeking medical advice from "Dr Google."
Thus, I would like to challenge you this semester to seek advice from reliable sources like the three above, and practice getting accustomed to how to navigate these resources for the information you seek.
In particular, I recommend you look at (a) the tutorial for an instructive introduction, and most powerfully, (b) the "Library Reference", which will tell you everything that you need to know (and more) about Python and its commands.
Let's experiment with using the official Python documentation today. Click on the "Library Reference", scroll about 1-2 screens down, and in the "Numeric and Mathematical Modules" section, find the entry labeled: "random — Generate pseudo-random numbers" (or, click here to go straight there). This section explains how to generate random numbers using Python.
To get started, let's look at the function called random(), which is essentially the heart of this module. To get acquainted, let's experiment with this in the Python shell:
>>> import random
>>> random.random()
0.5536582109886334
>>> random.random()
0.519728888240708
>>> random.random()
0.21552893690804353
As you can see, random() returns an unpredictable decimal number between 0 and 1. (More precisely, the result can be exactly 0 but is always less than 1.)
There are numerous other shortcuts to make working with randomness easier. But let's start by spending some time with this one.
Using only this function, if you wanted a random integer (a number with no decimal places) between 0 and 9, what might you do? Have a look at this code and think about what it's doing:
>>> int( random.random() * 10 )
First of all, taking a decimal number between 0 and 1 and multiplying it by 10 is going to give us another number that is at minimum 0, and always less than 10. Think about what you would get if you multiply 10 times .9999 (close to the largest value of random()). You'll get 9.999.
To summarize this: when you are starting with a number in the range of 0 to 1, multiplying scales a range. You are expanding that range of 0 to 1 into the range of 0 to whatever you multiply it by. random.random() * 100 is going to give us a random decimal number somewhere between 0 and 100. This is a useful principle to keep in mind.
But I said I wanted an integer (whole number). That is what the int() command does. It truncates (or cuts off) the decimal part of the number, returning a whole number from a decimal number. If you'd prefer, you could also use round(), which doesn't simply truncate, but actually rounds to the nearest whole number:
>>> int(.1)
0
>>> int(.6)
0
>>> round(.1)
0
>>> round(.6)
1
Keep in mind that rounding random.random() * 10 can also produce 10, so the result would no longer stay between 0 and 9.
OK. Now, what if I wanted a number that was between 50 and 100? What is the size of the range here? 50. So I could use my scaling technique like this: random.random() * 50. But that is going to give a number between 0 and 50, not between 50 and 100. What can I do here? I could simply add 50: 50 + random.random() * 50. Now I'm taking a random number scaled to the range of 0 to 50, and adding 50 to it, shifting that range to 50 to 100. To summarize this principle: when working with numbers like this, addition shifts a range.
Multiplying scales a range, and adding shifts the range. Using these techniques, you can shape the range of values you get from random.random() and manipulate them to span whatever range of randomness you wish. These are very useful principles for creative coding.
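The two principles can be wrapped up into one small helper. This is just a sketch, and random_between is a name I made up for it: it scales random.random() by the size of the range and then shifts it by the low end.

```python
import random

def random_between(low, high):
    """Return a random decimal number between low and high."""
    size = high - low                      # how wide the range is
    return low + random.random() * size    # scale first, then shift

samples = [random_between(50, 100) for n in range(1000)]
print(min(samples), max(samples))   # both should fall between 50 and 100
```

The random library also has a command called random.uniform(a, b) that does essentially this for you, but writing it by hand once is a good way to see what is going on.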
Fortunately, Python gives us some commands that we can use directly to achieve the same principles. (I think it is important and valuable later on to understand what is going on here, hence the above explanation before this reveal.)
If you want a randomly generated whole number (integer) from within some range, you can use: random.randrange(start,stop). (Python docs for this.) This takes two arguments and returns a random integer in between those arguments. So for example:
>>> random.randrange(5,10)
6
>>> random.randrange(5,10)
8
>>> random.randrange(5,10)
5
Note that this returns values that include the first argument, but that are less than the second argument. In other words, random.randrange(5,10) will randomly return numbers from the list: 5, 6, 7, 8, and 9. In the language of mathematics (you may remember this from algebra or calculus) we say that the first argument is "inclusive" and the second is "exclusive", sometimes written as [5,10).
If you only pass in one argument, it returns a number between 0 and that argument:
>>> random.randrange(10)
8
>>> random.randrange(10)
3
>>> random.randrange(10)
1
As with two arguments, when passing in one argument, the value 0 is inclusive, and the argument is exclusive. So random.randrange(10) yields values in the range: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
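To connect randrange() back to the scale-and-shift idea, here is a sketch that compares it with a hand-built version. The function my_randrange is my own illustration, not how Python actually implements randrange() internally. Both are called many times and the distinct values each one produced are collected.

```python
import random

def my_randrange(start, stop):
    """A hand-built version of randrange using only random.random()."""
    size = stop - start
    return start + int(random.random() * size)   # scale, truncate, shift

built_in = set()
by_hand = set()
for n in range(2000):
    built_in.add(random.randrange(5, 10))
    by_hand.add(my_randrange(5, 10))

print(sorted(built_in))
print(sorted(by_hand))
```

With this many tries, both lists should show 5 through 9 and never 10, which is the "inclusive start, exclusive stop" behavior described above.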
random.seed()
Another very useful idea in working with randomness is the idea
of a random sequence of numbers that are reproducible. This can
be useful for testing for example. Let's say you want to draw
something that looks random, but you want it to look random the
same way each time you run your code. You don't want it to be
different each time. There is a command for
this: random.seed()
When you call random.seed() with one numerical argument, the calls to random() after that will always return the same random-looking sequence. For example, if I run Python twice without a seed, I get a different random number each time:
$ python
>>> import random
>>> random.random()
0.9998781067887237
$ python
>>> import random
>>> random.random()
0.6638400757429336
But if I call random.seed() first with the same argument, I still get a random-looking number, but it is the same each time:
$ python
>>> import random
>>> random.seed(42)
>>> random.random()
0.6394267984578837
$ python
>>> import random
>>> random.seed(42)
>>> random.random()
0.6394267984578837
This is called a pseudo-random number (it was discussed in the 10 PRINT book): the numbers appear random, and are spread evenly between 0 and 1 as a uniform distribution would be, but they are in fact generated by a deterministic process.
This is very useful in game development. For example, say you want to generate a character or a natural terrain, and you want its appearance to seem random. But you want that appearance to be the same each time the game is run, not changing every time. If that character has a name or ID number, you could use that as the seed. Each time you want to render that character, call seed() with their ID number, and the calls to random() that come after will seem random, but will be random in the same way each time.
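Here is a rough sketch of that game idea. Everything about the "character" is invented for illustration (the function name character_look, the height range, the color): the point is only that seeding with the same ID number gives the same random-looking result every time.

```python
import random

def character_look(character_id):
    """Pick a repeatable 'random' height and RGB color for a character."""
    random.seed(character_id)            # same ID, same sequence afterwards
    height = random.randrange(150, 200)
    color = (random.randrange(256), random.randrange(256), random.randrange(256))
    return height, color

first = character_look(7)
second = character_look(7)
print(first)
print(second)
print(first == second)   # True: character 7 looks the same each time
```

One thing to be aware of: random.seed() resets the sequence for every random call that comes after it in your program, not just the ones inside this function.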
You're probably familiar with the term "seed" because it is used in many games, such as Minecraft.
What if I want to generate some kind of digital object, say an image with randomly placed colored pixels on a white background, and I want to control how many colored pixels there are? You can use if statements on random values to control the probability.
For example, have a look at this code, which creates a new blank image, loops over all the pixels, and sets some of them to a color based on a random value:
Example 1
from PIL import Image
import random
# let's make a 100x100 white image
width = 100
height = 100
img = Image.new("RGB", (width,height), (255,255,255) )
for y in range(height):
    for x in range(width):
        r = random.random()
        if r > .5:
            img.putpixel( (x,y), (0,0,0) )
img.save("rando.png")
With this code, I'm using a nested loop again to iterate over each pixel in the image. The outer loop (y) repeats once for each row, and each time it repeats, the inner loop (x) repeats for each pixel in that row. For each pixel (i.e., for each x,y pair) this code picks a random number between 0 and 1 and if that number is greater than .5, it assigns that pixel to be black; otherwise the pixel will be white.
In other words, it loops over each pixel and assigns it to be black with 50% probability.
And what if I simply want there to be fewer black pixels in that image? Have a look at this:
Example 2
from PIL import Image
import random
# let's make a 100x100 white image
width = 100
height = 100
img = Image.new("RGB", (width,height), (255,255,255) )
for y in range(height):
    for x in range(width):
        r = random.random()
        if r > .9:
            img.putpixel( (x,y), (0,0,0) )
img.save("rando.png")
In Example 1, a pixel is drawn about 50% of the time: assuming a uniform distribution, a random number between 0 and 1 will be greater than .5 about half of the time. In Example 2 however, I am now only drawing the pixel if the random choice is greater than .9, so this will draw a pixel about 10% of the time. Notice the more sparse image.
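If you'd like to check those percentages rather than take my word for it, here is a small sketch that runs the same nested loop but just counts instead of drawing. The function name count_black is made up for this illustration, and no image is created.

```python
import random

def count_black(width, height, threshold):
    """Count how many pixels would be drawn black for a given threshold."""
    black = 0
    for y in range(height):
        for x in range(width):
            if random.random() > threshold:
                black = black + 1
    return black

total = 100 * 100
for threshold in (.5, .9):
    black = count_black(100, 100, threshold)
    print(threshold, black, black / total)
```

The counts will be a little different every run, but they should land close to 5000 (about 50%) for .5 and close to 1000 (about 10%) for .9. In general, a threshold of t draws a pixel with probability 1 - t.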
The above two examples loop over every pixel in the image and decide with some probability whether that pixel should be black or not. What if we wanted to have a little more control over where and how those random pixels were placed?
Instead of looping over every pixel, we could loop some number of times, and each time we could pick random numbers to use as pixel coordinates.
Let's start by looping 500 times, each time randomly selecting a pair of coordinates to use to place a black pixel:
Example 3
from PIL import Image
import random
# let's make a 100x100 white image
width = 100
height = 100
img = Image.new("RGB", (width,height), (255,255,255) )
# loop 500 times, and each time, pick a random x and a random y
# and draw a pixel there
for n in range(500):
    x = int( random.random() * 100 )
    y = int( random.random() * 100 )
    img.putpixel( (x,y), (0,0,0) )
img.save("rando.png")
But what if we didn't want the pixels evenly spaced out? What if we wanted them randomized, but clustered in some kind of way?
When we use random.random(), we are drawing from what's called a uniform distribution. Every single possible value is equally likely as the others. This creates an effect of total noise, like static on a broken TV.
If we want to create something that is random, but clustered in certain ways, we can use what's called a normal distribution, also known as a Gaussian distribution. You may have seen this depicted as a bell curve: values can still be chosen randomly within some range, but there is a greater likelihood that values will be chosen in the middle cluster than at the extremes. This type of probability distribution is used to model many things in the natural and social world, and we can use it for our purposes here.
Fortunately, Python provides a command for us to use here: random.gauss().
As the docs explain, this command takes two arguments. The first is the mean, which corresponds to the middle point of the cluster, and the second is the standard deviation, which defines how tightly or widely flared out this distribution is.
Now instead of using random.random() and getting a completely uniform distribution, we can use a Gaussian distribution and shape the probabilities somewhat.
Example 4
from PIL import Image
import random
# let's make a 100x100 white image
width = 100
height = 100
img = Image.new("RGB", (width,height), (255,255,255) )
# loop 500 times, and each time, pick a random x and a random y
# and draw a pixel there
for n in range(500):
    x = int( random.gauss(50,10) )
    y = int( random.gauss(50,10) )
    img.putpixel( (x,y), (0,0,0) )
img.save("rando.png")
By changing the mean (the 50s) you could move the cluster around in the x or y dimension. And by changing the standard deviations (the 10s) you can change how tightly clustered the values are.
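One thing to watch for when you start changing these numbers: unlike random.random(), a Gaussian value has no hard minimum or maximum. With a mean near the edge or a large standard deviation, you will sooner or later get a coordinate outside the image, which Pillow cannot draw as intended (too-large values typically raise an IndexError). Here is one possible way to guard against that, sketched with a made-up helper called gauss_coordinate that simply tries again until the value fits. No image is drawn here, so you can run it on its own.

```python
import random

width = 100
height = 100

def gauss_coordinate(mean, spread, size):
    """Draw Gaussian values until one lands inside 0 .. size-1."""
    while True:
        value = int(random.gauss(mean, spread))
        if 0 <= value < size:
            return value

points = []
for n in range(500):
    x = gauss_coordinate(50, 30, width)    # a wide spread, so misses are common
    y = gauss_coordinate(50, 30, height)
    points.append((x, y))

print(len(points))
print(min(points), max(points))
```

Trying again is only one option. You could also skip that pixel entirely, or clamp the value to the nearest edge, and each choice gives a slightly different look near the borders.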
You can also mix and match these techniques. So for example here I'm using a Gaussian distribution for x (horizontal clustering), and a uniform distribution for y (evenly spaced out vertically).
Example 5
from PIL import Image
import random
# let's make a 100x100 white image
width = 100
height = 100
img = Image.new("RGB", (width,height), (255,255,255) )
# loop 500 times, and each time, pick a random x and a random y
# and draw a pixel there
for n in range(500):
    x = int( random.gauss(50,10) )
    y = int( random.random() * 100 )
    img.putpixel( (x,y), (0,0,0) )
img.save("rando.png")
I hope that you realize that you can use these exact techniques
for more than just x,y
values. You could use
similar techniques for choosing random numbers to generate color
patterns, letters, or any time you need a value in a creative
coding context.
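As one example of what that might look like, here is a sketch that picks random colors instead of random coordinates. It uses a Gaussian distribution for the hue (so the colors cluster around orange) and a uniform distribution for the saturation, then converts to R,G,B with the built-in colorsys module. The function name random_warm_color and the particular numbers are just choices I made for illustration.

```python
import colorsys
import random

def random_warm_color():
    """Pick an RGB color whose hue clusters around orange (about 30 degrees)."""
    hue = random.gauss(30, 10) % 360           # wrap around the color wheel
    saturation = 0.5 + random.random() * 0.5   # uniform between 0.5 and 1
    value = 1.0                                # full brightness
    # colorsys works with numbers between 0 and 1
    r, g, b = colorsys.hsv_to_rgb(hue / 360, saturation, value)
    return (int(r * 255), int(g * 255), int(b * 255))

for n in range(5):
    print(random_warm_color())
```

A tuple like this could be used as the color argument to putpixel() in any of the examples above in place of (0,0,0).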
What if I want to do something with randomness that is not just generating a number, but rather randomly selecting something from a list? I could do that like this:
>>> import random
>>> lyric = ['Birds', 'flying', 'high', 'you', 'know', 'how', 'I', 'feel']
>>> i = random.randrange( len(lyric) )
>>> lyric[i]
'know'
>>> i = random.randrange( len(lyric) )
>>> lyric[i]
'Birds'
What I'm doing here is creating a list called lyric. Remember that len(lyric) gives me the length of that list. So I'm passing the length of the list in to the command random.randrange(). That will always give me a value from 0 up to, but not including, the length of the list, which is exactly the range of valid indexes. I then use that value as the index to my list.
It turns out this is such a common operation that Python makes a shortcut for it: choice(). If you pass a list to this function, it will randomly select one item from the list:
>>> random.choice(lyric)
'know'
>>> random.choice(lyric)
'how'
Keep in mind that random.seed()
also applies
here. So if you ran the above command in a Python program, it
would make different choices each time. But if you preceded it
with random.seed()
passing in a fixed number, it
would make the same apparently random choices each time you ran
the program.
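Here is a short sketch of that, using the same lyric list. The helper name three_words and the seed number 2024 are arbitrary choices of mine. The function seeds the generator and then makes three choices, so calling it twice with the same seed gives the same three words.

```python
import random

lyric = ['Birds', 'flying', 'high', 'you', 'know', 'how', 'I', 'feel']

def three_words(seed_number):
    """Seed the generator, then make three random choices from the list."""
    random.seed(seed_number)
    return [random.choice(lyric) for n in range(3)]

print(three_words(2024))
print(three_words(2024))   # same seed, so the same three words again
print(three_words(2025))   # a different seed usually gives different words
```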
Let's use this principle to randomly select a file from a folder of many files.
Let's say that I am in a directory (folder) called "Unit 1, Tutorial 4", and that it contains a subfolder called "images", like this:
From the command line, I can use ls to view the contents:
$ ls
images
$ ls images/
earth.jpg fire.jpg newspaper.png smoke.jpg
Now I'm going to run Python and use a function called listdir(), which takes the name of a directory, and returns a list of all the files in that directory. This is in the os library, so I must import it first:
>>> from os import listdir
>>> listdir("images")
['newspaper.png', 'fire.jpg', 'earth.jpg', 'smoke.jpg']
Here I have passed in the name images, the name of my subfolder, and listdir() gives me a list of all files within that. Then I could choose a random file in that list like this:
>>> from os import listdir
>>> import random
>>> files = listdir("images")
>>> random_file = random.choice(files)
Now I could go and use random_file as I would any filename, for example to open that file with Pillow.
Here is how you would do this in a Python program — and there is one more key step to the process:
from os import listdir, path
import random
from PIL import Image
files = listdir("images")
# remove the hidden Mac OS file, but only if it is actually there
if ".DS_Store" in files:
    files.remove(".DS_Store")
random_file = random.choice(files)
img = Image.open( path.join("images",random_file) )
Note that I'm importing path, and then using path.join() in the last line. This is taking the directory name and the filename and joining them together into the path needed to access this file. If you tried to access the file without the directory name you'd get an error since the file is not in your current folder, but rather in a subfolder. (Remember the discussion about paths two weeks ago.)
This technique will be crucial in getting started with the homework for this week.
Warning. If you try the above technique, you may get an error message that looks something like this:
PIL.UnidentifiedImageError: cannot identify image file '.DS_Store'
If you see this error, it is because your directory contains a hidden file called .DS_Store. These are hidden files that Mac OS uses to store information about how your folder is displayed in Finder. Obviously it is not an image file, so PIL gives an error message when trying to open it. Wikipedia has some additional useful explanation of what these files are and what they contain.
Mac OS tries to hide these files from you, but you can see them if you type ls -a from the command line:
$ ls -a
.    .DS_Store    earth.jpg    newspaper.png
..   air.jpg      fire.jpg
(Note: this is from the command line, not the Python shell. Remember that you can always tell which shell you're in from the prompt. If you see >>>, then you are in Python. If you see $ or %, then you are in the command line. You can always enter the Python shell by typing python, and you can exit it by typing exit() or CONTROL-D.)
To verify the existence of .DS_Store from within a Python program or the Python shell, you could simply print the return value of listdir(), like this:
>>> from os import listdir
>>> files = listdir("images")
>>> files
['newspaper.png', 'air.jpg', '.DS_Store', 'fire.jpg', 'earth.jpg']
Fortunately, the solution is easy: simply remove it from your list of filenames:
>>> from os import listdir
>>> files = listdir("images")
>>> files
['newspaper.png', 'air.jpg', '.DS_Store', 'fire.jpg', 'earth.jpg']
>>> files.remove(".DS_Store")
>>> files
['newspaper.png', 'air.jpg', 'fire.jpg', 'earth.jpg']
That should prevent the PIL.UnidentifiedImageError. (Note that remove() gives a ValueError if .DS_Store is not in the list, so in a program it is safest to check first with: if ".DS_Store" in files)
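If you'd rather not worry about which hidden files might be in a folder, another possible approach is to keep only the filenames that look like images. This is a sketch under some assumptions: the function names list_images and random_image_path are made up, and the list of file endings is just the ones used in these notes, so extend it if your folder has other kinds of images.

```python
from os import listdir, path
import random

def list_images(directory):
    """Return only the filenames in directory that look like image files."""
    image_endings = (".jpg", ".jpeg", ".png")      # an assumed list; extend as needed
    names = []
    for name in listdir(directory):
        if name.lower().endswith(image_endings):   # .DS_Store fails this test
            names.append(name)
    return names

def random_image_path(directory):
    """Pick one image filename at random and join it to the directory name."""
    return path.join(directory, random.choice(list_images(directory)))
```

With these two functions, something like Image.open( random_image_path("images") ) would open a randomly chosen image from the images subfolder, assuming that folder exists and contains at least one image.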