Python

Note: you can download the Python code referenced in this lecture from http://lyle.smu.edu/~tylerm/courses/cse3353/code/l3.zip . The direct links in the text may not work due to a server misconfiguration.

Installation

The version of Python we'll use in this class is 2.7.3. You can download and install from python.org. Note: do NOT install version 3.3.x. Python version 3 includes a substantial overhaul of the language, and most libraries are not compatible so very few people actually use that version.

Supplemental Readings

Optional (If you're new to Python and want to learn more than what is in these notes, then these are good resources.)

Why Python?

We already know Java and C++. Why learn Python?

Python is a very likable language. Some things that I like about it:

Python as an Environment

One of the best ways to learn Python is to type expressions into the interpreter. Run the python command from your Linux or Mac shell, or from a Python command line in Windows, and start typing expressions.

[tmoore@minnow ~] python
Python 2.7 (r27:82500, Sep 16 2010, 18:03:06)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3
>>> b = 4
>>> import math
>>> c = math.sqrt(a*a+b*b)
>>> c
5.0

To exit, invoke the quit() function or type a control-D.

Python as a Language

Python code looks like no other code that I'm familiar with. It's a complete departure from the C family of languages.

# Like shell scripts, it uses # as an end-of-line comment character

import math    # packages are "loaded" by the import statement

a = 3          # no need to declare types; very dynamic
b = 4
c = math.sqrt(a*a+b*b)

So far, not too bad. Let's see some syntax:

if a == b:
    print 'a and b are the same'
else:
    print 'a and b differ'
print "let's go on"

Hmmm. Where are the parens and braces? Gone! Python knows that the else section is over because the indentation ends. That's right, the indentation has syntactic meaning in Python. So, the following two programs are different in Python.

#first code block
i = 0
while i < 10:
    i += 1
print i

#second code block
i = 0
while i < 10:
    i += 1
    print i

The first code block prints only the final value (10), because the print statement runs once after the loop ends. The second prints every value from 1 to 10, because the print statement is inside the loop.

Functions in Python

Function declarations in Python are simple: a name, a formal argument list (no data types), and a body. The end of the body is, as expected, signaled by the end of indentation.
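
The contents of mathfuns.py are not reproduced in these notes, so here is a minimal sketch of what two of its functions might look like. The bodies are assumptions: hypo is assumed to apply the Pythagorean theorem, and gcd is assumed to use repeated subtraction. Only the function names and the results 13.0 and 1 come from the interpreter session in this section.

```python
import math

def hypo(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    # Assumed implementation: the Pythagorean theorem.
    return math.sqrt(a*a + b*b)

def gcd(a, b):
    """Return the greatest common divisor of positive integers a and b."""
    # Assumed implementation: subtract the smaller value from the larger
    # until both are equal. The body ends where the indentation ends.
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a

print(hypo(5, 12))   # 13.0
print(gcd(55, 89))   # 1
```

Note that neither function declares argument or return types, and that the nested while/if/else blocks are delimited purely by indentation.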

You can try these by downloading the mathfuns.py python file, importing the contents into python, and running the functions:

>>> import mathfuns
>>> mathfuns.hypo(5,12)
13.0
>>> mathfuns.gcd(55,89)
1

You can avoid typing the module name (which is the filename without the .py extension) by importing particular members or all members:

>>> from math import sqrt
>>> sqrt(9)
3.0
>>> from mathfuns import *
>>> fibonacci(100)   # too long!
>>> gcd(30,50)
a is 30 and b is 50
a is 30 and b is 20
a is 10 and b is 20
a is 10 and b is 10
10

Data types in Python

All the examples so far have been numeric, for no good reason but that numbers don't need much introduction. Let's look at some more interesting datatypes. To play with these test values, download this sampledata.py file.

Strings

Strings pretty much work as you expect. You can concatenate them with the + operator. You can take their length. You can print them.

>>> from sampledata import *
>>> x+x
'spam, spam, '
>>> x+x+y+' and '+x
'spam, spam, eggs,  and spam, '
>>> x+x+y+'and '+x
'spam, spam, eggs, and spam, '
>>> len(x)
6
>>> len(x+y)
12
>>> print(x+y)
spam, eggs,

You can substitute stuff into them like the C printf statement, using % and a letter code to indicate the type and format. Here are the most common codes:

Code   Data type                     Example expression       Resulting string
%s     String                        'Hi %s' % ('Tyler')      'Hi Tyler'
%i     Integer                       '%i+%i' % (1,2)          '1+2'
%f     Float                         '%f+%i' % (1.1,2)        '1.100000+2'
%.nf   Float with n decimal places   '%.2f+%i' % (1.1,2)      '1.10+2'

For complete reference see the Python documentation.
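
As a small illustration of combining several codes in one string, here is a sketch. The variable names and values are made up for the example; the parenthesized print is used only so the snippet behaves the same under Python 2.7 and Python 3.

```python
# Hypothetical values, chosen only to exercise each format code.
name = 'Tyler'
count = 3
unit_price = 1.5

# %s takes a string, %i an integer, %.2f a float shown with 2 decimals.
# The values are supplied, in order, as a tuple after the % operator.
line = '%s bought %i apples for $%.2f' % (name, count, count * unit_price)
print(line)   # Tyler bought 3 apples for $4.50
```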

Lists

Lists are denoted with square brackets with commas between the elements. You can index them numerically, and extract sub-lists. You can append stuff onto the end (actually, either end). You can store into them. Lists are one of the most useful and commonly used data structures in Python.

>>> from sampledata import *
>>> len(cheeses)
6
>>> cheeses[0]
'swiss'
>>> cheeses[1:3]
['gruyere', 'cheddar']
>>> cheeses[1:4]
['gruyere', 'cheddar', 'stilton']
>>> cheeses.append('gouda')
>>> cheeses
['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']
>>> cheeses[0] = 'emmentaler'
>>> cheeses
['emmentaler', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']

The append shows how to invoke a method on a list, and that lists are mutable, unlike tuples.
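
To make the "either end" remark concrete, here is a short sketch using the list methods insert() and pop(). It starts from a shortened, made-up version of the cheese list rather than the one in sampledata.py.

```python
# A shortened stand-in for the cheeses list (an assumption for this example).
cheeses = ['swiss', 'gruyere', 'cheddar']

cheeses.insert(0, 'brie')    # add to the front
cheeses.append('gouda')      # add to the back
print(cheeses)               # ['brie', 'swiss', 'gruyere', 'cheddar', 'gouda']

first = cheeses.pop(0)       # remove and return the front element
last = cheeses.pop()         # remove and return the back element
print(cheeses)               # ['swiss', 'gruyere', 'cheddar']
```

Each of these calls changes the list in place, which is exactly what mutability means here.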

Tuples

Tuples are just like lists, except that they use parentheses instead of square brackets and they are immutable.

>>> from sampledata import *
>>> len(troupe)
6
>>> troupe[0] = 'Homer'   # won't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>

Dictionaries

Like all civilized languages, Python has hashtables built-in, except that Python calls them dictionaries (like Smalltalk). Dictionaries are another fundamental data structure in Python that you will use frequently.

>>> from sampledata import *
>>> uni
{'Jones': 'Oxford', 'Gilliam': 'Occidental', 'Cleese': 'Cambridge', 'Chapman': 'Cambridge', 'Idle': 'Cambridge', 'Palin': 'Oxford'}
>>> uni['Palin']
'Oxford'
>>> uni['Palin'] = 'Oxford University'
>>> uni['Palin']
'Oxford University'
>>> uni.keys()
['Jones', 'Gilliam', 'Cleese', 'Chapman', 'Idle', 'Palin']
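
A few other everyday dictionary operations, sketched on a two-entry stand-in for uni (the smaller dictionary is an assumption made to keep the example short):

```python
# A cut-down stand-in for the uni dictionary.
uni = {'Palin': 'Oxford', 'Cleese': 'Cambridge'}

has_idle = 'Idle' in uni                 # membership tests look at the keys
idle_uni = uni.get('Idle', 'unknown')    # get() returns a default instead of raising KeyError

uni['Idle'] = 'Cambridge'                # assigning to a new key adds an entry
del uni['Palin']                         # del removes an entry

names = sorted(uni.keys())               # sort the keys for a predictable order
print(names)                             # ['Cleese', 'Idle']
```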

Iteration in Python

We've seen the usual while loop above, which is very normal. The Python for loop, however, is very powerful, due to its natural integration with iterators. You've seen iterators in Java, but the implementation in Python is much cleaner.

Essentially, if you pass a list or dictionary into a for loop, it will iterate over the list's values or the dictionary's keys.

>>> from sampledata import *
>>> for cheese in cheeses:
...     print '%s is tasty' % (cheese)
...
swiss is tasty
gruyere is tasty
cheddar is tasty
stilton is tasty
roquefort is tasty
brie is tasty

Similarly for dictionaries:

>>> for name in uni:
...     print name, uni[name]
...
Jones Oxford
Gilliam Occidental
Cleese Cambridge
Chapman Cambridge
Idle Cambridge
Palin Oxford

Dictionaries also have a built-in method, iteritems(), that will give you the key-value pairs directly as tuples:

>>> for k, v in uni.iteritems():
...     print k, v
... 
Jones Oxford
Gilliam Occidental
Cleese Cambridge
Chapman Cambridge
Idle Cambridge
Palin Oxford

If you'd really like to emulate C-style for loops over numbers, you can do so using the range() function:

>>> range(4)
[0, 1, 2, 3]
>>> for i in range(len(cheeses)):
...     print cheeses[i]
...
swiss
gruyere
cheddar
stilton
roquefort
brie

You shouldn't normally need to do this, given that lists, dictionaries, and most objects have built-in iterators.
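
If you do need the index alongside each element, the built-in enumerate() function is one way to get both without range(len(...)). A minimal sketch, using a shortened stand-in for the cheese list:

```python
# A shortened stand-in for the cheeses list (an assumption for this example).
cheeses = ['swiss', 'gruyere', 'cheddar']

numbered = []
# enumerate() yields (index, element) tuples, unpacked here into i and cheese.
for i, cheese in enumerate(cheeses):
    numbered.append('%i: %s' % (i, cheese))

print(numbered)   # ['0: swiss', '1: gruyere', '2: cheddar']
```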

List Comprehensions

Here's a common coding task: iterate over some list, perform some action on each element of that list, and store the results in a new list. Here's an example:

>>> cheeselen=[]
>>> for c in cheeses:
...     cheeselen.append(len(c))
...
>>> cheeselen
[5, 7, 7, 7, 9, 4]

It turns out that there is a way to do this in just one line of code using list comprehensions:

>>> cheeselen=[len(c) for c in cheeses]
>>> cheeselen
[5, 7, 7, 7, 9, 4]

But wait, there's more! Suppose you only want to add items to the list if they meet a certain condition, say if the item begins with the letter s. Well here's the long way:

>>> scheeselen=[]
>>> for c in cheeses:
...     if c[0]=='s':
...             scheeselen.append(len(c))
...
>>> scheeselen
[5, 7]

It turns out you can even add a conditional to list comprehensions. So the 4 lines of code become this:

>>> scheeselen=[len(c) for c in cheeses if c[0]=="s"]
>>> scheeselen
[5, 7]

While this may seem a bit esoteric, list comprehensions are incredibly useful when coding quickly. It's worth getting used to them.
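
The expression at the front of a comprehension can be anything, not just len(). Here is a sketch that transforms and filters in one step; the particular transformation (upper-casing names longer than five letters) is just an illustration.

```python
cheeses = ['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie']

# Keep only names longer than 5 letters, and upper-case the ones we keep.
long_upper = [c.upper() for c in cheeses if len(c) > 5]
print(long_upper)   # ['GRUYERE', 'CHEDDAR', 'STILTON', 'ROQUEFORT']

# The result elements can also be tuples, e.g. (name, length) pairs.
pairs = [(c, len(c)) for c in cheeses if c[0] == 's']
print(pairs)        # [('swiss', 5), ('stilton', 7)]
```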

String Manipulation

Strings in Python have a number of useful built-in methods. Two of the most useful are split() and join(). split() breaks up a string at each occurrence of the separator passed to the method and places the pieces into a list.

>>> x="apple,orange,banana,mango"
>>> fruits=x.split(",")
>>> fruits
['apple', 'orange', 'banana', 'mango']

Called with no argument, split() splits on runs of whitespace and discards them. So for instance:

>>> "Hi there how are you      doing".split()
['Hi', 'there', 'how', 'are', 'you', 'doing']

The inverse of split is join(). Because join() is a method for strings, you invoke it after the string and pass in a list that needs gluing together. Suppose we want to rejoin our fruits list, this time with semicolons:

>>> fruits
['apple', 'orange', 'banana', 'mango']
>>> semifruits=";".join(fruits)
>>> semifruits
'apple;orange;banana;mango'
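
Putting the two together, here is a sketch of a common round trip: split a comma-separated record, change one field, and join it back up. The record and the change made to it are invented for the example.

```python
# A made-up record in name,count,kind form.
line = 'apple,34,fruit'

# split() returns a list of strings, which can be unpacked into three names.
name, count, kind = line.split(',')
count = int(count)           # the pieces are strings, so convert before doing math

# join() needs strings, so convert the number back with str().
rebuilt = ';'.join([name, str(count + 1), kind])
print(rebuilt)               # apple;35;fruit
```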

File I/O in Python

File I/O is pretty straightforward. If you have a text file, you use the built-in open() function. The easiest way to read a file line-by-line is to use the file object's built-in iterator. Suppose we want to read in the file fruitveg.csv:

apple,34,fruit
pear,3,fruit
lettuce,4,veg
potato,15,veg
mango,22,fruit

We could simply print the file out one line at a time:

>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line
...
Here is a line: apple,34,fruit

Here is a line: pear,3,fruit

Here is a line: lettuce,4,veg

Here is a line: potato,15,veg

Here is a line: mango,22,fruit

We end up with an extra blank line after each line of output because each line read from the file already ends with a newline character, and the print statement adds another. To fix that, we can use the string method strip():

>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line.strip()
...
Here is a line: apple,34,fruit
Here is a line: pear,3,fruit
Here is a line: lettuce,4,veg
Here is a line: potato,15,veg
Here is a line: mango,22,fruit

Suppose we wanted to create a dictionary mapping the fruit name to a list with the other values in each line. Here's what we would need to do:

fvmap={}
for line in open('public_html/data/fruitveg.csv'):
    bits=line.split(',')
    fvmap[bits[0]]=[bits[1],bits[2].strip()]

>>> fvmap
{'lettuce': ['4', 'veg'], 'mango': ['22', 'fruit'], 'pear': ['3', 'fruit'], 'apple': ['34', 'fruit'], 'potato': ['15', 'veg']}

At-home exercise: How can you modify this code to work for any length list, not only two-element lists?

To write to a file, you also use the open() function, but this time include an additional mode parameter, 'w'. For instance, to swap the second and third fields of each line from fruitveg.csv:

f=open('public_html/data/fruitveg2.csv','w')
for k in fvmap:
    f.write('%s,%s,%s\n'%(k,fvmap[k][1],fvmap[k][0]))

f.close()

You can also append to the end of an existing file by passing 'a' as the mode instead of 'w'.
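
Here is a self-contained sketch of writing, appending, and reading back. It uses a scratch file in a temporary directory (the path and file name are assumptions, not the lecture's data files) and the with statement, which closes the file automatically so no explicit close() is needed.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'fruitveg_demo.csv')

# 'w' creates the file (or empties it if it already exists).
with open(path, 'w') as f:
    f.write('apple,34,fruit\n')
    f.write('pear,3,fruit\n')

# 'a' adds to the end of the existing file instead of overwriting it.
with open(path, 'a') as f:
    f.write('lettuce,4,veg\n')

# With no mode given, open() reads; strip() drops each trailing newline.
with open(path) as f:
    lines = [line.strip() for line in f]

print(lines)   # ['apple,34,fruit', 'pear,3,fruit', 'lettuce,4,veg']
```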

Variables in Python

Variables are created when they are assigned; no declaration is needed. They are also implicitly typed; there is no need to specify whether it's an int, string, float, etc.

>>> x=4
>>> y=4.2
>>> z='hamster'
>>> type(x)
<type 'int'>
>>> type(y)
<type 'float'>
>>> type(z)
<type 'str'>

For scoping, if the variable is created inside a function, the variable's scope is local to that function. If the variable is created outside a function, it is global (to that file/module).

x=3
def computeFun():
    x=4
    return "the value of fun is %i" % x

def computeFun2():
    return "the value of fun is %i" % x

>>> computeFun()
'the value of fun is 4'
>>> computeFun2()
'the value of fun is 3'

Function code can refer to (get values from) a global variable that already exists (see computeFun2() above). However, if the function wants to modify an existing global variable, then it must be explicitly referred to in the function using the global keyword. Otherwise, the interpreter would allocate a local variable just for the function:

y=1
def computeLife():
    y=42
    return "the meaning of life is %i" % y

def computeLife2():
    global y
    y=42
    return "the meaning of life is %i" % y

>>> computeLife()
'the meaning of life is 42'
>>> y
1
>>> computeLife2()
'the meaning of life is 42'
>>> y
42
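
One consequence worth seeing for yourself: if a function assigns to a name anywhere in its body, Python treats that name as local throughout the function, so reading it before the assignment fails. The sketch below uses a made-up counter to show both the working and the failing version.

```python
hits = 0   # a global counter (hypothetical example)

def record_hit():
    global hits          # declare that we mean the module-level hits
    hits = hits + 1
    return hits

def broken_record_hit():
    # No global declaration: the assignment makes hits local to this
    # function, so the read on the right-hand side has nothing to read.
    hits = hits + 1
    return hits

record_hit()
record_hit()
print(hits)              # 2

try:
    broken_record_hit()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'
print(outcome)           # UnboundLocalError
```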

Executable Python Scripts and Modules

You can put a bunch of Python code, including function definitions and such, into a file and run it. Look at now_v1.py:

#!/usr/bin/python

from datetime import datetime

now = datetime.now()
print now.strftime("%Y-%m-%d")  # Use the internet standard

You can run it from the shell as follows:

python now_v1.py
2012-01-31

Sometimes it is more convenient to turn the file into an executable script (if running Linux or Mac OS X):

chmod a+rx now_v1.py
./now_v1.py
2012-01-31

Modules in Python

We've also seen that we can import functions and other useful stuff from files into Python. It's smart to write your Python code so that it can be used that way, invoked from other Python code. Look at now_v2.py:

#!/usr/bin/python

from datetime import datetime

def now():
    """Returns a string for the current day in internet format:  YYYY-MM-DD"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d")  # Use the internet standard

Here's how we could use it:

>>> import now_v2
>>> now_v2.now()
'2011-02-17'
>>> print now_v2.now()
2011-02-17

But now it doesn't work as a shell script. Can we do both? Yes, there's a trick! Look at now_v3.py:

#!/usr/bin/python

from datetime import datetime

def now():
    """Returns a string for the current day in internet format:  YYYY-MM-DD"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d")  # Use the internet standard

# the following code is only executed if this file is invoked from the
# command line as a script, rather than loaded as a module.

if __name__ == '__main__':
    print now()

And as a script:

chmod a+rx now_v3.py
./now_v3.py
2011-02-17
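
The same pattern works for any file you want to use both ways. A common convention (not something these notes require) is to put the script behavior in a function called main(). The module below is a made-up example, not one of the course files.

```python
def greet(name):
    """Return a greeting for the given name."""
    return 'Hello, %s!' % name

def main():
    # Whatever the file should do when run as a script goes here.
    print(greet('world'))

# __name__ is '__main__' only when the file is run directly;
# when the file is imported, __name__ is the module's own name,
# so main() is not called and nothing is printed.
if __name__ == '__main__':
    main()
```

Importing this file gives you greet() without any output, while running it from the shell prints the greeting.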

PyDoc: Self-documenting Scripts

To get the documentation on a Python module, including one you write yourself, you can use the pydoc shell command:

pydoc mathfuns

prints the documentation for mathfuns right to your screen. (You can also set up pydoc as a web server, which is very cool.)

Of course, the documentation that Pydoc gives you comes from the author of the module, and when you write Python code, you shoulder the responsibility of documenting what you create.

Give every function a meaningful documentation string. Write the kind of documentation you'd like to read if you wanted to know how to use the function. The string goes (in triple-quotes, which allows for multiple lines) as the first element of the function definition.

def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42
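
The documentation string is stored on the function object itself, which is how tools like pydoc and the interactive help() function find it. A quick sketch of reading it back (the docstring wording here is just an example):

```python
def computeLife():
    """Return the meaning of life as an integer."""
    return 42

# The docstring is available as the __doc__ attribute of the function.
doc = computeLife.__doc__
print(doc)   # Return the meaning of life as an integer.
```

In the interpreter, help(computeLife) displays the same text in a formatted page.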

Some additional guidelines and information:

Documenting Functions
Docstring conventions

Python Semantics

Python is a language that is byte-compiled and interpreted on the fly, like Perl, PHP, JavaScript (but not Java) and many other scripting languages. It is dynamically typed, with types on objects rather than on variables, and it is strict about those types: it will not silently convert a string to a number for you. It is mostly lexically scoped. Objects are allocated from the heap; almost nothing is allocated on the stack, so you can return objects from functions.
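
To see what "types on objects rather than on variables" means in practice, here is a small sketch. The variable names are arbitrary; the point is that the same name can refer to objects of different types over time, while each object keeps its own type.

```python
x = 4
kind_before = type(x).__name__    # 'int': the object 4 is an int

x = 'hamster'                     # the same name now refers to a str object
kind_after = type(x).__name__     # 'str'

# The objects themselves keep their types: mixing a str and an int
# in + raises TypeError rather than converting one to the other.
try:
    result = 'spam' + 1
except TypeError:
    result = 'TypeError'

print(kind_before)   # int
print(kind_after)    # str
print(result)        # TypeError
```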

Versions of Python

Python is in active development, and new versions come out regularly. As of this writing (2013), the latest stable release in the 2.x series is version 2.7.3.

There is also Python version 3. Version 3 substantially modifies the language and is not backwards-compatible with version 2. This means packages developed for version 2 won't work. Consequently, almost no one actually uses Version 3, since much of Python's value is the packages developed by the open-source community. We use Python version 2 in this class.

Significant portions of these notes were adapted from those used by Scott Anderson in CS304 at Wellesley College. Consequently, these notes are also released under a Creative Commons License.