Python
Note: you can download the Python code referenced in this lecture from http://lyle.smu.edu/~tylerm/courses/cse3353/code/l3.zip . The direct links in the text may not work due to a server misconfiguration.
Installation
The version of Python we'll use in this class is 2.7.3. You can download and install from python.org. Note: do NOT install version 3.3.x. Python version 3 includes a substantial overhaul of the language, and most libraries are not compatible so very few people actually use that version.
Supplemental Readings
Optional (If you're new to Python and want to learn more than what is in these notes, then these are good resources.)
- Think Python, by Olin professor Allen Downey, Chapters 3 and 4. The entire book is available to read online, and offers easy-to-read explanations of Python concepts we will cover.
- The Official Python Tutorial Sections 1--5
- Python quick reference - Very terse overview of features of Python as distinguished from other languages.
Why Python?
We already know Java and C++. Why learn Python?
- Python has far less overhead than Java/C++ for the programmer.
- Python is closer to pseudocode on the English-pseudocode-code spectrum than Java or C/C++, but it actually executes!
- Python is handy for data manipulation and transformation, and anything "quick and dirty."
- Python is very powerful, thanks to all those extension modules.
Python is a very likable language. Some things that I like about it:
- Its syntax, while bizarre compared to conventional languages, is spare and almost beautiful. It will take some getting used to, but it's quite readable, even for a beginner. Once you're used to it, you may never want to go back to a conventional language!
- It has a lot of powerful, dynamic features: dynamic creation of objects, functions, and pretty much anything.
- It has a lot of powerful packages.
- It is easily portable.
- It has a read-eval-print-loop which makes experimentation and playing a joy.
- Its object-oriented programming is good but optional.
Python as an Environment
One of the best ways to learn Python is to type expressions into the interpreter. Just run the command python from your Linux or Mac shell, or from a Python command line in Windows, and start typing expressions.
[tmoore@minnow ~] python
Python 2.7 (r27:82500, Sep 16 2010, 18:03:06)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3
>>> b = 4
>>> import math
>>> c = math.sqrt(a*a+b*b)
>>> c
5.0
To exit, invoke the quit() function or type a control-D.
Python as a Language
Python code looks like no other code that I'm familiar with. It's a complete departure from the C family of languages.
# Like shell scripts, it uses # as an end-of-line comment character
import math # packages are "loaded" by the import statement
a = 3 # no need to declare types; very dynamic
b = 4
c = math.sqrt(a*a+b*b)
So far, not too bad. Let's see some syntax:
if a == b:
    print 'a and b are the same'
else:
    print 'a and b differ'
print "let's go on"
Hmmm. Where are the parens and braces? Gone! Python knows that the else section is over because the indentation ends. That's right, the indentation has syntactic meaning in Python. So, the following two programs are different in Python.
#first code block
i = 0
while i < 10:
    i += 1
print i
#second code block
i = 0
while i < 10:
    i += 1
    print i
The first code block prints only the final value of i (10), because the print statement comes after the loop. The second prints every number from 1 to 10, because the print statement is inside the loop.
Functions in Python
Function declarations in Python are simple: a name, a formal argument list (no data types), and a body. The end of the body is, as expected, signaled by the end of indentation.
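The mathfuns.py file itself is not reproduced in these notes, so here is only a rough sketch of what two of its functions might look like. The names hypo and gcd come from the interpreter sessions in this section; the bodies are assumptions, not the actual course file.

```python
import math

def hypo(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a * a + b * b)

def gcd(a, b):
    """Return the greatest common divisor of two positive integers.

    Assumed implementation: the subtraction form of Euclid's algorithm.
    """
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a

print(hypo(5, 12))
print(gcd(30, 50))
```

Note that each body ends where the indentation returns to the left margin; there is no closing brace or end keyword.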
You can try these by downloading the mathfuns.py python file, importing the contents into python, and running the functions:
>>> import mathfuns
>>> mathfuns.hypo(5,12)
13.0
>>> mathfuns.gcd(55,89)
1
You can avoid the filename (which is also the name of the module) by importing particular members or all members:
>>> from math import sqrt
>>> sqrt(9)
3.0
>>> from mathfuns import *
>>> fibonacci(100) # too long!
>>> gcd(30,50)
a is 30 and b is 50
a is 30 and b is 20
a is 10 and b is 20
a is 10 and b is 10
10
Data types in Python
All the examples so far have been numeric, for no good reason but that numbers don't need much introduction. Let's look at some more interesting datatypes. To play with these test values, download this sampledata.py file.
Strings
Strings pretty much work as you expect. You can concatenate them with the + operator. You can take their length. You can print them.
>>> from sampledata import *
>>> x+x
'spam, spam, '
>>> x+x+y+' and '+x
'spam, spam, eggs,  and spam, '
>>> x+x+y+'and '+x
'spam, spam, eggs, and spam, '
>>> len(x)
6
>>> len(x+y)
12
>>> print(x+y)
spam, eggs,
You can substitute stuff into them like the C printf statement, using % and a letter code to indicate the type and format. Here are the most common codes:
Code | Data type | Example expression | Result |
---|---|---|---|
%s | String | 'Hi %s' % ('Tyler') | 'Hi Tyler' |
%i | Integer | '%i+%i' % (1,2) | '1+2' |
%f | Float | '%f+%i' % (1.1,2) | '1.100000+2' |
%.nf | Float with n decimal places | '%.2f+%i' % (1.1,2) | '1.10+2' |
For complete reference see the Python documentation.
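As a small illustration of combining several codes in one string, here is a sketch with made-up values (name, count, and price are hypothetical, not from sampledata.py). It is written so that it should run unchanged under both Python 2.7 and Python 3.

```python
# Hypothetical values, chosen only for illustration.
name = 'brie'
count = 3
price = 4.5

# A tuple on the right-hand side supplies one value per code, in order.
line = '%s: %i wheels at $%.2f each' % (name, count, price)
print(line)

# A literal percent sign is written as %%.
pct = '%i%% done' % 50
print(pct)
```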
Lists
Lists are denoted with square brackets with commas between the elements. You can index them numerically, and extract sub-lists. You can append stuff onto the end (actually, either end). You can store into them. Lists are one of the most useful and commonly used data structures in Python.
>>> from sampledata import *
>>> len(cheeses)
6
>>> cheeses[0]
'swiss'
>>> cheeses[1:3]
['gruyere', 'cheddar']
>>> cheeses[1:4]
['gruyere', 'cheddar', 'stilton']
>>> cheeses.append('gouda')
>>> cheeses
['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']
>>> cheeses[0] = 'emmentaler'
>>> cheeses
['emmentaler', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']
The append shows how to invoke a method on a list, and that lists are mutable, unlike tuples.
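To illustrate adding to either end of a list, here is a minimal sketch using a small stand-in list rather than the one in sampledata.py. It relies only on the standard list methods append(), insert(), and pop().

```python
# A small stand-in list; not the list from sampledata.py.
cheeses = ['swiss', 'gruyere', 'cheddar']

cheeses.append('gouda')       # add to the end
cheeses.insert(0, 'edam')     # add to the front (index 0)
print(cheeses)

last = cheeses.pop()          # remove and return the last element
first = cheeses.pop(0)        # remove and return the first element
print(cheeses)
```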
Tuples
Tuples are just like lists, except that they use parentheses instead of square brackets and they are immutable.
>>> from sampledata import *
>>> len(troupe)
6
>>> troupe[0] = 'Homer' # won't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>
Dictionaries
Like all civilized languages, Python has hashtables built-in, except that Python calls them dictionaries (like Smalltalk). Dictionaries are another fundamental data structure in Python that you will use frequently.
>>> from sampledata import *
>>> uni
{'Jones': 'Oxford', 'Gilliam': 'Occidental', 'Cleese': 'Cambridge', 'Chapman': 'Cambridge', 'Idle': 'Cambridge', 'Palin': 'Oxford'}
>>> uni['Palin']
'Oxford'
>>> uni['Palin'] = 'Oxford University'
>>> uni['Palin']
'Oxford University'
>>> uni.keys()
['Jones', 'Gilliam', 'Cleese', 'Chapman', 'Idle', 'Palin']
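A few other everyday dictionary operations, sketched with a two-entry stand-in for uni (the contents here are abbreviated for illustration). The in operator and the get() method are standard, but they are not otherwise covered in these notes.

```python
# Abbreviated stand-in for the uni dictionary in sampledata.py.
uni = {'Palin': 'Oxford', 'Cleese': 'Cambridge'}

uni['Gilliam'] = 'Occidental'         # assigning to a new key adds an entry
has_gilliam = 'Gilliam' in uni        # membership tests look at the keys
has_homer = 'Homer' in uni

# uni['Homer'] would raise KeyError; get() returns a default instead.
homer_uni = uni.get('Homer', 'unknown')

print(has_gilliam)
print(has_homer)
print(homer_uni)
print(len(uni))
```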
Iteration in Python
We've seen the usual while loop above, which works much as it does in other languages. The Python for loop, however, is very powerful, due to its natural integration with iterators. You've seen iterators in Java, but the implementation in Python is much cleaner.
Essentially, if you pass a list or dictionary into a for loop, it will iterate over the list's values or the dictionary's keys.
>>> from sampledata import *
>>> for cheese in cheeses:
...     print '%s is tasty' % (cheese)
...
swiss is tasty
gruyere is tasty
cheddar is tasty
stilton is tasty
roquefort is tasty
brie is tasty
Similarly for dictionaries:
>>> for name in uni:
...     print name, uni[name]
...
Jones Oxford
Gilliam Occidental
Cleese Cambridge
Chapman Cambridge
Idle Cambridge
Palin Oxford
There's even a built-in dictionary method that will give you the key-value pairs directly as tuples:
>>> for k, v in uni.iteritems():
...     print k, v
...
Jones Oxford
Gilliam Occidental
Cleese Cambridge
Chapman Cambridge
Idle Cambridge
Palin Oxford University
If you'd really like to emulate C-style for loops over numbers, you can, using the range() function:
>>> range(4)
[0, 1, 2, 3]
>>> for i in range(len(cheeses)):
...     print cheeses[i]
...
swiss
gruyere
cheddar
stilton
roquefort
brie
You shouldn't normally need to do this, given that lists, dictionaries, and most objects have built-in iterators.
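When the position is needed alongside the value, one option is the built-in enumerate() function, which yields (index, item) pairs. It is not covered elsewhere in these notes, so treat this as a suggestion rather than part of the lecture. The list below is a stand-in for illustration.

```python
# Stand-in list for illustration.
cheeses = ['swiss', 'gruyere', 'cheddar']

labels = []
for i, cheese in enumerate(cheeses):
    # i is the position, cheese is the element at that position
    labels.append('%i: %s' % (i, cheese))

print(labels)
```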
List Comprehensions
Here's a common coding task: iterate over some list, perform some action on each element of that list, and store the results in a new list. Here's an example:
>>> cheeselen=[]
>>> for c in cheeses:
...     cheeselen.append(len(c))
...
>>> cheeselen
[5, 7, 7, 7, 9, 4]
It turns out that there is a way to do this in just one line of code using list comprehensions:
>>> cheeselen=[len(c) for c in cheeses]
>>> cheeselen
[5, 7, 7, 7, 9, 4]
But wait, there's more! Suppose you only want to add items to the list if they meet a certain condition, say if the item begins with the letter s. Well here's the long way:
>>> scheeselen=[]
>>> for c in cheeses:
...     if c[0]=='s':
...         scheeselen.append(len(c))
...
>>> scheeselen
[5, 7]
It turns out you can even add a conditional to list comprehensions. So the 4 lines of code become this:
>>> scheeselen=[len(c) for c in cheeses if c[0]=="s"]
>>> scheeselen
[5, 7]
While this may seem a bit esoteric, list comprehensions are incredibly useful when coding quickly. It's worth getting used to them.
String Manipulation
Strings in Python have a number of useful built-in methods. Two of the most useful are split() and join(). split() breaks up a string at each occurrence of the separator passed to it and places the pieces into a list.
>>> x="apple,orange,banana,mango"
>>> fruits=x.split(",")
>>> fruits
['apple', 'orange', 'banana', 'mango']
Called with no argument, split() splits on runs of whitespace and discards the whitespace. So for instance:
>>> "Hi there how are you doing".split()
['Hi', 'there', 'how', 'are', 'you', 'doing']
The inverse of split() is join(). Because join() is a method of strings, you invoke it on the separator string and pass in the list that needs gluing together. Suppose we want to rejoin our fruits list, this time with semicolons:
>>> fruits
['apple', 'orange', 'banana', 'mango']
>>> semifruits=";".join(fruits)
>>> semifruits
'apple;orange;banana;mango'
File I/O in Python
File I/O is pretty straightforward. If you have a text file, you use the built-in open() function. The easiest way to read a file line-by-line is to use the file's built-in iterator. Suppose we want to read in the file fruitveg.csv:
apple,34,fruit
pear,3,fruit
lettuce,4,veg
potato,15,veg
mango,22,fruit
We could simply print the file out one line at a time:
>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line
...
Here is a line: apple,34,fruit

Here is a line: pear,3,fruit

Here is a line: lettuce,4,veg

Here is a line: potato,15,veg

Here is a line: mango,22,fruit
We end up with an extra blank line after each line of output because there is a newline character at the end of each line read from the file, and the print statement adds another one. To fix that, we can use the string method strip():
>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line.strip()
...
Here is a line: apple,34,fruit
Here is a line: pear,3,fruit
Here is a line: lettuce,4,veg
Here is a line: potato,15,veg
Here is a line: mango,22,fruit
Suppose we wanted to create a dictionary mapping the fruit name to a list with the other values in each line. Here's what we would need to do:
fvmap={}
for line in open('public_html/data/fruitveg.csv'):
    bits=line.split(',')
    fvmap[bits[0]]=[bits[1],bits[2].strip()]
>>> fvmap
{'lettuce': ['4', 'veg'], 'mango': ['22', 'fruit'], 'pear': ['3', 'fruit'], 'apple': ['34', 'fruit'], 'potato': ['15', 'veg']}
At-home exercise: How can you modify this code to work for any length list, not only two-element lists?
To write to a file, you also use the open() function, but this time pass a second parameter, the mode 'w'. For instance, to write the data back out with the second and third columns of fruitveg.csv swapped:
f=open('public_html/data/fruitveg2.csv','w')
for k in fvmap:
    f.write('%s,%s,%s\n'%(k,fvmap[k][1],fvmap[k][0]))
f.close()
You can also append to the end of an existing file by using the 'a' mode instead of 'w'.
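The difference between the two modes can be seen in a short sketch. The file name fruitveg_demo.csv and the temporary directory are hypothetical, chosen so that nothing real is overwritten.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'fruitveg_demo.csv')

f = open(path, 'w')          # 'w' creates the file, or empties an existing one
f.write('apple,34,fruit\n')
f.close()

f = open(path, 'a')          # 'a' keeps the existing contents and adds to the end
f.write('pear,3,fruit\n')
f.close()

# Read the file back to confirm that both lines are present.
f = open(path)
lines = [line.strip() for line in f]
f.close()
print(lines)
```

Had the second open() used 'w' instead of 'a', only the pear line would remain.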
Variables in Python
Variables are created when they are assigned; no declaration is needed. They are also implicitly typed; there is no need to specify whether it's an int, string, float, etc.
>>> x=4
>>> y=4.2
>>> z='hamster'
>>> type(x)
<type 'int'>
>>> type(y)
<type 'float'>
>>> type(z)
<type 'str'>
For scoping, if the variable is created inside a function, the variable's scope is local to that function. If the variable is created outside a function, it is global (to that file/module).
x=3
def computeFun():
    x=4
    return "the value of fun is %i" % x
def computeFun2():
    return "the value of fun is %i" % x
>>> computeFun()
'the value of fun is 4'
>>> computeFun2()
'the value of fun is 3'
Function code can refer to (get values from) a global variable that already exists (see computeFun2() above). However, if the function wants to modify an existing global variable, then it must be explicitly referred to in the function using the global keyword. Otherwise, the interpreter would allocate a local variable just for the function:
y=1
def computeLife():
    y=42
    return "the meaning of life is %i" % y
def computeLife2():
    global y
    y=42
    return "the meaning of life is %i" % y
>>> computeLife()
'the meaning of life is 42'
>>> y
1
>>> computeLife2()
'the meaning of life is 42'
>>> y
42
Executable Python Scripts and Modules
You can put a bunch of Python code, including function definitions and such, into a file and run it. Look at now_v1.py:
#!/usr/bin/python
from datetime import datetime
now = datetime.now()
print now.strftime("%Y-%m-%d") # Use the internet standard
You can run it from the shell as follows:
python now_v1.py
2012-01-31
Sometimes it is more convenient to turn the file into an executable script (if running Linux or Mac OS X):
chmod a+rx now_v1.py
./now_v1.py
2012-01-31
Modules in Python
We've also seen that we can import functions and other useful stuff from files into Python. It's smart to write your Python code so that it can be used that way, with its functions invoked from other Python code. Look at now_v2.py:
#!/usr/bin/python
from datetime import datetime
def now():
    """Returns a string for the current day in internet format: YYYY-MM-DD"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d") # Use the internet standard
Here's how we could use it:
>>> import now_v2
>>> now_v2.now()
'2011-02-17'
>>> print now_v2.now()
2011-02-17
But now it doesn't work as a shell script. Can we do both? Yes, there's a trick! Look at now_v3.py:
#!/usr/bin/python
from datetime import datetime
def now():
    """Returns a string for the current day in internet format: YYYY-MM-DD"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d") # Use the internet standard
# the following code is only executed if this file is invoked from the
# command line as a script, rather than loaded as a module.
if __name__ == '__main__':
    print now()
And as a script:
chmod a+rx now_v3.py
./now_v3.py
2011-02-17
PyDoc: Self-documenting Scripts
To get the documentation on a Python module, including one you write yourself, you can use the pydoc shell command:
pydoc mathfuns
This prints the documentation for mathfuns right to your screen. (You can also set up pydoc as a web server, which is very cool.)
Of course, the documentation that Pydoc gives you comes from the author of the module, and when you write Python code, you shoulder the responsibility of documenting what you create.
Give every function a meaningful documentation string. Write the kind of documentation you'd like to read if you wanted to know how to use the function. The string goes (in triple-quotes, which allows for multiple lines) as the first element of the function definition.
def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42
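The documentation string is stored on the function object itself, which is where tools such as pydoc find it. A minimal sketch, repeating the function so that the example is self-contained:

```python
def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42

# The docstring is available as the function's __doc__ attribute.
doc = computeLife.__doc__
print(doc)
```

In the interactive interpreter, help(computeLife) displays the same text.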
Some additional guidelines and information:
- Documenting Functions
- Docstring conventions
Python Semantics
Python is a language that is byte-compiled and interpreted on the fly, like Perl, PHP, JavaScript (but not Java) and many other scripting languages. It is dynamically typed, with types on objects rather than on variables. It is mostly lexically scoped. Objects are allocated from the heap; almost nothing is allocated on the stack, so you can return objects from functions.
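Two of these points can be seen in a short sketch: the type belongs to the object a name currently refers to, and an object created inside a function survives after the function returns. The function name make_list is made up for illustration.

```python
x = 4
kind_before = type(x).__name__   # the name x refers to an int object
x = 'hamster'                    # the same name now refers to a str object
kind_after = type(x).__name__
print(kind_before + ' then ' + kind_after)

def make_list():
    """Build a list inside the function and hand it back to the caller."""
    result = [1, 2, 3]
    return result                # the list outlives the function call

numbers = make_list()
numbers.append(4)                # the returned object is still fully usable
print(numbers)
```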
Versions of Python
Python is in active development, and new versions come out regularly. As of this writing (2013), the latest stable release is version 2.7.3.
There is also Python version 3. Version 3 substantially modifies the language and is not backwards-compatible with version 2. This means packages developed for version 2 won't work. Consequently, almost no one actually uses Version 3, since much of Python's value is the packages developed by the open-source community. We use Python version 2 in this class.
Significant portions of these notes were adapted from those used by Scott Anderson in CS304 at Wellesley College. Consequently, these notes are also released under a Creative Commons License.