Python Data Structures Tutorial
Data structures not only provide the means to store and organize data, they also facilitate the four fundamental operations: insertion, update, deletion and retrieval of data. Some data structures are better at some of these four functions than other (what I mean by better will be clarified below), so a programmer must choose the data structures that are appropriate for the task at hand.
This tutorial presents the fundmentals of the built-in data structures in Python: list, tuple, dictionary, and set.
The tutorial is organized as follows:
Preliminary
- Introduction shows how to create these data structures and their printed forms, which help you recognize them at a glance.
- What can be stored? shows that these data structures are capable of storing heterogenous and homogenous data types
Fundamental operations
The presentation of the four fundamental operations is broken up into the following components:
- Mutability explains the concept of whether a data structure can be modified after its creation.
- Indexing operations and data retrieval explains the concept of an element’s position and shows how to get data from a data structure.
- Insertion, update, and deletion shows how to perform these operations.
- Usage convention lists some of the conventional ways in which the data structures presented in this post are used.
Pythonic
- Iterating through a data structure explains the use of these data structures as iterables.
- Building sequence, one item at a time shows how to use list comprehension and generator expression.
Preliminary
Introduction
The following code shows how to create a list, a tuple, a dictionary, and a set.
list_of_numbers = [1, 2, 3]
print list_of_numbers
print type(list_of_numbers)
[1, 2, 3]
<type 'list'>
tuple_of_numbers = (1, 2, 3)
print tuple_of_numbers
print type(tuple_of_numbers)
(1, 2, 3)
<type 'tuple'>
set_of_numbers = {1, 2, 3}
print set_of_numbers
print type(set_of_numbers)
set([1, 2, 3])
<type 'set'>
my_dictionary = {'a':1, 'b':2, 'c':3}
print my_dictionary
print type(my_dictionary)
{'a': 1, 'c': 3, 'b': 2}
<type 'dict'>
Recap
- A list’s content is enclosed in brackets,
[]
. - A tuple’s content is enclosed in parentheses,
()
. - A set’s content in enclosed in braces,
{}
. But when printed out, it is presented asset([<set's content>])
. Note how this differs from the printed forms of list and tuple. - A dictionary’s content is also enclosed in braces, but the elements of a dictionary are key:value pairs.
What can be stored?
list_of_letters = ['a', 'b', 'c']
print list_of_letters
mixed_list = ['a', 1, 1.5, False]
print mixed_list
['a', 'b', 'c']
['a', 1, 1.5, False]
tuple_of_letters = ('a', 'b', 'c')
print tuple_of_letters
mixed_tuple = ('a', 1, 1.5, False)
print mixed_tuple
('a', 'b', 'c')
('a', 1, 1.5, False)
set_of_letters = {'a', 'b', 'c'}
print set_of_letters
mixed_set = {'a', 1, 1.5, False}
print mixed_set
set(['a', 'c', 'b'])
set([False, 1.5, 'a', 1])
mixed_dictionary = {1:'a', 'b':2, 1.5:0}
print mixed_dictionary
{1.5: 0, 1: 'a', 'b': 2}
Recap
- List, tuple, and set can store data of one type, or of many types. Above, we have seen them storing just numbers, just letters, and a mix of numbers and letters.
- Dictionary keys can be of many data types, and likewise their values. Above, we saw
mixed_dictionary
hasint, str, float
keys, andstr
andint
values. - The data stored in list, tuple, set, and dictionary don’t have to be just numbers and letters, or built-in data types. You can store data types that you create yourself (called user-defined data type). More on this later.
Fundamental operations
The four fundamental operations can be categorized into modifying operations and non-modifying operations. Insertion, update, and deletion are modifying operations - they change a data structure. Retrieval is non-modifying - it doesn’t change anything.
Mutability
A data structure is mutable if it can be modified after its creation. Otherwise, it is immutable.
List and dictionary are mutable. Set and tuple are immutable.
Because insertion, update, and deletion are modifying operations, they can be used only on list and dictionary, but not on set and tuple. In other words, set and tuple can’t be changed after their creation.
Retrieving data from a data structure does not modify the data structure itself, so you can retrieve from all of list, tuple, set, and dictionary.
Retrieval is presented first, because (1) it can be used on all four data structures, and (2) it involves indexing operations, which is also used in insertion, update, and deletion.
Indexing operations and data retrieval
list
and tuple
are ordered data structures. This means there exists an order of the elements in the data structures. Let’s look at this through a couple of examples
print list_of_numbers
print list_of_numbers[0]
print list_of_numbers[1]
print list_of_numbers[2]
[1, 2, 3]
1
2
3
print tuple_of_numbers
print tuple_of_numbers[0]
print tuple_of_numbers[1]
print tuple_of_numbers[2]
(1, 2, 3)
1
2
3
Note from the above examples that the number inside the brackets specifies the position of the data element. This number is called an index, and it starts from 0. The 0-index specifies the position of the first element.
set
and dict
are unordered data structures. The elements stored in them are like ice cubes in a bucket - there is no order to them. Hence, there is no such thing as element’s indexed position, so we can’t index a set
or a dict
.
You cannot retrieve data from a set directly. One of the ways to do so is to convert the set to a list, then use the list indexing operation as shown above.
Retrieving data from a dict
is usually accomplished by using its keys.
The following demonstrates how to retrieve data from a set
and a dict
.
print list(set_of_numbers)[0]
print list(set_of_numbers)[1]
print list(set_of_numbers)[2]
print '\n'
print my_dictionary['a']
print my_dictionary['b']
print my_dictionary['c']
1
2
3
1
2
3
Recap
list
andtuple
are ordered,set
anddict
are unordered.list
anddict
are mutable,tuple
andset
are immutable.- In an ordered data structure, the order of the elements is the order in which they were listed from left to right at the time of creation.
- To retrieve a single element from a
list
or atuple
, specify the element’s position inside brackets. The position is called an index, and the index begins from 0, not 1. The 0-index specifies the position of the first element. - Retrieving data from a set can be accomplished by converting the set to a list.
- Retrieving data from a dictionary is accomplished specifying the key corresponding to the desired value, in brackets.
We now move on to the modifying operations.
Insertion, update, and deletion on list
# append to the the end of the list
print 'current list_of_numbers: {}'.format(list_of_numbers)
list_of_numbers.append(100)
print 'after appending 100 to the end: {}'.format(list_of_numbers)
# insert to a specific position
list_of_numbers.insert(0, 10)
print 'after inserting 10 at index 0: {}'.format(list_of_numbers)
# deletion
del list_of_numbers[0]
print 'after delete the element at index 0: {}'.format(list_of_numbers)
# update element at a specific position
list_of_numbers[0] = 50
print 'after updating element at index 0 to 50: {}'.format(list_of_numbers)
current list_of_numbers: [1, 2, 3]
after appending 100 to the end: [1, 2, 3, 100]
after inserting 10 at index 0: [10, 1, 2, 3, 100]
after delete the element at index 0: [1, 2, 3, 100]
after updating element at index 0 to 50: [50, 2, 3, 100]
Recap
mylist.append(element)
appendselement
to the right end ofmylist
.mylist.insert(position index, element)
insertselement
intomylist
atposition index
.del mylist[position index]
deletes the element atposition index
frommylist
.mylist[position index] = value
update the element atposition index
inmylist
to value.
Insertion, update, and deletion on dict
# Insert a new key:value entry
print 'current my_dictionary:\n\t {}'.format(my_dictionary)
my_dictionary['d'] = 4
print 'after inserting the pair d:4 into my_dictionary:\n\t {}'.format(my_dictionary)
# Update the value of a key
my_dictionary['d'] = 5
print "after changing the value corresponding to the key 'd' to 5:\n\t {}".format(my_dictionary)
# Delete a key:value entry
del my_dictionary['d']
print "after deleting d:5 from my_dictionary:\n\t {}".format(my_dictionary)
current my_dictionary:
{'a': 1, 'c': 3, 'b': 2}
after inserting the pair d:4 into my_dictionary:
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
after changing the value corresponding to the key 'd' to 5:
{'a': 1, 'c': 3, 'b': 2, 'd': 5}
after deleting d:5 from my_dictionary:
{'a': 1, 'c': 3, 'b': 2}
Recap
- Insertion, deletion and update on a
dict
all use the syntaxmy_dictionary[key]
. - Insertion and update syntaxes look alike. If
key
already exists, the value on the right hand side replaces the currentvalue
of thekey
. Otherwise, a newkey:value
pair is inserted inmy_dictionary
. - Deletion on
dict
usesdel
like deletion onlist
.
Usage conventions
The following conventions should be followed:
- Use list
for homogeneous data (data of the same type), tuple
for heterogenous data (data of different types), set
for unique data items, and dict
when there is a natural correspondence that can be expressed as key:value pairs.
- For example, list
can be used to store a collection of prices, so you can take their average later on, tuple
for a person’s personal information, such as name (string
), age (int
), height (double
). set
for a collection of unique students’ names in a class, and dict
for a gradebook with key being students’ names and values being their grades.
Pythonic
Iterating through a data structure
It is often useful to iterate through a data structure, i.e. get the stored items one at a time. For example, given a dict
gradebook, you may want to get only the students’ names whose grades correspond to letter A. This requires going through the name:grade pairs in the dict
one at a time and check whether the grade passes the threshold for an A.
The following code demonstrates iterating through a list
, a tuple
, a set
, and a dict
. Note the similarities and differences in the syntax.
list_of_numbers = [1, 2, 3, 4]
print 'iterating through a list'
for number in list_of_numbers:
print number
tuple_of_letters = ('a', 'b', 'c', 'd')
print '\niterating through a tuple'
for letter in tuple_of_letters:
print letter
set_of_letters = {'a', 'b', 'c', 'd'}
print '\niterating through a set'
for letter in set_of_letters:
print letter
grade_book = {'John':76, 'Kate':89, 'Peter':95}
print '\niterating through a dictionary'
for name, grade in grade_book.iteritems():
print "{}'s grade is {}".format(name, grade)
iterating through a list
1
2
3
4
iterating through a tuple
a
b
c
d
iterating through a set
a
c
b
d
iterating through a dictionary
John's grade is 76
Kate's grade is 89
Peter's grade is 95
Recap
-
The basic syntax for iterating through
list
,tuple
, andset
ispython for item in data_structure: <do something with the item>
- The syntax for iterating through
dict
is almost the same, exceptdict
must be converted to an iterable collection by callingiteritems()
on thedict
. - The order in which the items are fetched are the same order in which they are listed in a
list
or atuple
. Sinceset
anddict
are unordered data structures, this order can change depending on the internal storage of the data items (more on this later).
Building a sequence, one item at a time
list comprehension (or listcomp) and generator expression (genexp) are Pythonic ways of building sequences while iterating through another sequence. The most crucial differences between listcomp and genexp are:
- listcomp is used only to build a list
, whereas genexp can be used to build other types of sequences, such as tuple
.
- listcomp creates a complete list
, while genexp creates an object that “yields” the data items one at a time - this saves memory.
# multiplies all elements in a list of number by a constant
a = [1, 2, 3]
b = [num * 2 for num in a]
print a
print b
print type(b)
[1, 2, 3]
[2, 4, 6]
<type 'list'>
I’ll digress to address a subtle point here. You may ask: why can I do the above with a * 2
. Let’s see what happens if we do that.
a * 2
[1, 2, 3, 1, 2, 3]
The *
operator creates a new list
containing two copies of the original list
. In fact, it does not make sense to write a * 2
and expect the a
’s elements multiplied by 2. This behavior is completely consistent once you recall that list
can contain heterogeneous data. Multiply by a constant a list containing both numbers and letters wouldn’t make sense, would it?
# build a tuple containing elements in a, each multiplied by 2
a = (1, 2, 3)
b = (num * 2 for num in a)
print a
print type(a)
print b
print type(b)
(1, 2, 3)
<type 'tuple'>
<generator object <genexpr> at 0x10463a820>
<type 'generator'>
Above we see that a
is a tuple, and b
is a generator
object - b
does not store data but simply is a means to iterate through the sequence created by code in parentheses on line 3. To get the data from b
, call next(b)
.
print next(b)
print next(b)
print next(b)
print next(b)
2
4
6
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-18-17570dfc8ec9> in <module>()
2 print next(b)
3 print next(b)
----> 4 print next(b)
StopIteration:
Once the last item (number 6) has been yielded, b
is “empty”. Calling next(b)
therefore raises a StopIteration
Exception. This behavior is consistent with the Iterator protocol. I will explain the Iterator protocol in a future post.
To actually build a tuple, we need to put the generator expression inside tuple()
, as follows:
a = (1, 2, 3)
b = tuple(num * 2 for num in a)
print a
print type(a)
print b
print type(b)
(1, 2, 3)
<type 'tuple'>
(2, 4, 6)
<type 'tuple'>