Python Data Structures Tutorial

I present the Python's built-in data structures that are most commonly used and provide an organizational frame for this knowledge to help you build more complex data structures.

Data structures not only provide the means to store and organize data, they also facilitate the four fundamental operations: insertion, update, deletion and retrieval of data. Some data structures are better at some of these four functions than other (what I mean by better will be clarified below), so a programmer must choose the data structures that are appropriate for the task at hand.

This tutorial presents the fundmentals of the built-in data structures in Python: list, tuple, dictionary, and set.

The tutorial is organized as follows:

Preliminary

  • Introduction shows how to create these data structures and their printed forms, which help you recognize them at a glance.
  • What can be stored? shows that these data structures are capable of storing heterogenous and homogenous data types

Fundamental operations

The presentation of the four fundamental operations is broken up into the following components:

  • Mutability explains the concept of whether a data structure can be modified after its creation.
  • Indexing operations and data retrieval explains the concept of an element’s position and shows how to get data from a data structure.
  • Insertion, update, and deletion shows how to perform these operations.
  • Usage convention lists some of the conventional ways in which the data structures presented in this post are used.

Pythonic

  • Iterating through a data structure explains the use of these data structures as iterables.
  • Building sequence, one item at a time shows how to use list comprehension and generator expression.

Preliminary

Introduction

The following code shows how to create a list, a tuple, a dictionary, and a set.

list_of_numbers = [1, 2, 3]
print list_of_numbers
print type(list_of_numbers)
[1, 2, 3]
<type 'list'>
tuple_of_numbers = (1, 2, 3)
print tuple_of_numbers
print type(tuple_of_numbers)
(1, 2, 3)
<type 'tuple'>
set_of_numbers = {1, 2, 3}
print set_of_numbers
print type(set_of_numbers)
set([1, 2, 3])
<type 'set'>
my_dictionary = {'a':1, 'b':2, 'c':3}
print my_dictionary
print type(my_dictionary)
{'a': 1, 'c': 3, 'b': 2}
<type 'dict'>

Recap

  • A list’s content is enclosed in brackets, [].
  • A tuple’s content is enclosed in parentheses, ().
  • A set’s content in enclosed in braces, {}. But when printed out, it is presented as set([<set's content>]). Note how this differs from the printed forms of list and tuple.
  • A dictionary’s content is also enclosed in braces, but the elements of a dictionary are key:value pairs.

What can be stored?

list_of_letters = ['a', 'b', 'c']
print list_of_letters
mixed_list = ['a', 1, 1.5, False]
print mixed_list
['a', 'b', 'c']
['a', 1, 1.5, False]
tuple_of_letters = ('a', 'b', 'c')
print tuple_of_letters
mixed_tuple = ('a', 1, 1.5, False)
print mixed_tuple
('a', 'b', 'c')
('a', 1, 1.5, False)
set_of_letters = {'a', 'b', 'c'}
print set_of_letters
mixed_set = {'a', 1, 1.5, False}
print mixed_set
set(['a', 'c', 'b'])
set([False, 1.5, 'a', 1])
mixed_dictionary = {1:'a', 'b':2, 1.5:0}
print mixed_dictionary
{1.5: 0, 1: 'a', 'b': 2}

Recap

  • List, tuple, and set can store data of one type, or of many types. Above, we have seen them storing just numbers, just letters, and a mix of numbers and letters.
  • Dictionary keys can be of many data types, and likewise their values. Above, we saw mixed_dictionary has int, str, float keys, and str and int values.
  • The data stored in list, tuple, set, and dictionary don’t have to be just numbers and letters, or built-in data types. You can store data types that you create yourself (called user-defined data type). More on this later.

Fundamental operations

The four fundamental operations can be categorized into modifying operations and non-modifying operations. Insertion, update, and deletion are modifying operations - they change a data structure. Retrieval is non-modifying - it doesn’t change anything.

Mutability

A data structure is mutable if it can be modified after its creation. Otherwise, it is immutable.

List and dictionary are mutable. Set and tuple are immutable.

Because insertion, update, and deletion are modifying operations, they can be used only on list and dictionary, but not on set and tuple. In other words, set and tuple can’t be changed after their creation.

Retrieving data from a data structure does not modify the data structure itself, so you can retrieve from all of list, tuple, set, and dictionary.

Retrieval is presented first, because (1) it can be used on all four data structures, and (2) it involves indexing operations, which is also used in insertion, update, and deletion.

Indexing operations and data retrieval

list and tuple are ordered data structures. This means there exists an order of the elements in the data structures. Let’s look at this through a couple of examples

print list_of_numbers
print list_of_numbers[0]
print list_of_numbers[1]
print list_of_numbers[2]
[1, 2, 3]
1
2
3
print tuple_of_numbers
print tuple_of_numbers[0]
print tuple_of_numbers[1]
print tuple_of_numbers[2]
(1, 2, 3)
1
2
3

Note from the above examples that the number inside the brackets specifies the position of the data element. This number is called an index, and it starts from 0. The 0-index specifies the position of the first element.

set and dict are unordered data structures. The elements stored in them are like ice cubes in a bucket - there is no order to them. Hence, there is no such thing as element’s indexed position, so we can’t index a set or a dict.

You cannot retrieve data from a set directly. One of the ways to do so is to convert the set to a list, then use the list indexing operation as shown above.

Retrieving data from a dict is usually accomplished by using its keys.

The following demonstrates how to retrieve data from a set and a dict.

print list(set_of_numbers)[0]
print list(set_of_numbers)[1]
print list(set_of_numbers)[2]
print '\n'
print my_dictionary['a']
print my_dictionary['b']
print my_dictionary['c']
1
2
3


1
2
3

Recap

  • list and tuple are ordered, set and dict are unordered. list and dict are mutable, tuple and set are immutable.
  • In an ordered data structure, the order of the elements is the order in which they were listed from left to right at the time of creation.
  • To retrieve a single element from a list or a tuple, specify the element’s position inside brackets. The position is called an index, and the index begins from 0, not 1. The 0-index specifies the position of the first element.
  • Retrieving data from a set can be accomplished by converting the set to a list.
  • Retrieving data from a dictionary is accomplished specifying the key corresponding to the desired value, in brackets.

We now move on to the modifying operations.

Insertion, update, and deletion on list

# append to the the end of the list
print 'current list_of_numbers: {}'.format(list_of_numbers)
list_of_numbers.append(100)
print 'after appending 100 to the end: {}'.format(list_of_numbers)

# insert to a specific position
list_of_numbers.insert(0, 10)
print 'after inserting 10 at index 0: {}'.format(list_of_numbers)

# deletion
del list_of_numbers[0]
print 'after delete the element at index 0: {}'.format(list_of_numbers)

# update element at a specific position
list_of_numbers[0] = 50
print 'after updating element at index 0 to 50: {}'.format(list_of_numbers)
current list_of_numbers: [1, 2, 3]
after appending 100 to the end: [1, 2, 3, 100]
after inserting 10 at index 0: [10, 1, 2, 3, 100]
after delete the element at index 0: [1, 2, 3, 100]
after updating element at index 0 to 50: [50, 2, 3, 100]

Recap

  • mylist.append(element) appends element to the right end of mylist.
  • mylist.insert(position index, element) inserts element into mylist at position index.
  • del mylist[position index] deletes the element at position index from mylist.
  • mylist[position index] = value update the element at position index in mylist to value.

Insertion, update, and deletion on dict

# Insert a new key:value entry
print 'current my_dictionary:\n\t {}'.format(my_dictionary)
my_dictionary['d'] = 4
print 'after inserting the pair d:4 into my_dictionary:\n\t {}'.format(my_dictionary)

# Update the value of a key
my_dictionary['d'] = 5
print "after changing the value corresponding to the key 'd' to 5:\n\t {}".format(my_dictionary)

# Delete a key:value entry
del my_dictionary['d']
print "after deleting d:5 from my_dictionary:\n\t {}".format(my_dictionary)
current my_dictionary:
	 {'a': 1, 'c': 3, 'b': 2}
after inserting the pair d:4 into my_dictionary:
	 {'a': 1, 'c': 3, 'b': 2, 'd': 4}
after changing the value corresponding to the key 'd' to 5:
	 {'a': 1, 'c': 3, 'b': 2, 'd': 5}
after deleting d:5 from my_dictionary:
	 {'a': 1, 'c': 3, 'b': 2}

Recap

  • Insertion, deletion and update on a dict all use the syntax my_dictionary[key].
  • Insertion and update syntaxes look alike. If key already exists, the value on the right hand side replaces the current value of the key. Otherwise, a new key:value pair is inserted in my_dictionary.
  • Deletion on dict uses del like deletion on list.

Usage conventions

The following conventions should be followed: - Use list for homogeneous data (data of the same type), tuple for heterogenous data (data of different types), set for unique data items, and dict when there is a natural correspondence that can be expressed as key:value pairs. - For example, list can be used to store a collection of prices, so you can take their average later on, tuple for a person’s personal information, such as name (string), age (int), height (double). set for a collection of unique students’ names in a class, and dict for a gradebook with key being students’ names and values being their grades.

Pythonic

Iterating through a data structure

It is often useful to iterate through a data structure, i.e. get the stored items one at a time. For example, given a dict gradebook, you may want to get only the students’ names whose grades correspond to letter A. This requires going through the name:grade pairs in the dict one at a time and check whether the grade passes the threshold for an A.

The following code demonstrates iterating through a list, a tuple, a set, and a dict. Note the similarities and differences in the syntax.

list_of_numbers = [1, 2, 3, 4]
print 'iterating through a list'
for number in list_of_numbers:
    print number

tuple_of_letters = ('a', 'b', 'c', 'd')
print '\niterating through a tuple'
for letter in tuple_of_letters:
    print letter

set_of_letters = {'a', 'b', 'c', 'd'}
print '\niterating through a set'
for letter in set_of_letters:
    print letter

grade_book = {'John':76, 'Kate':89, 'Peter':95}
print '\niterating through a dictionary'
for name, grade in grade_book.iteritems():
    print "{}'s grade is {}".format(name, grade)
iterating through a list
1
2
3
4

iterating through a tuple
a
b
c
d

iterating through a set
a
c
b
d

iterating through a dictionary
John's grade is 76
Kate's grade is 89
Peter's grade is 95

Recap

  • The basic syntax for iterating through list, tuple, and set is python for item in data_structure: <do something with the item>

  • The syntax for iterating through dict is almost the same, except dict must be converted to an iterable collection by calling iteritems() on the dict.
  • The order in which the items are fetched are the same order in which they are listed in a list or a tuple. Since set and dict are unordered data structures, this order can change depending on the internal storage of the data items (more on this later).

Building a sequence, one item at a time

list comprehension (or listcomp) and generator expression (genexp) are Pythonic ways of building sequences while iterating through another sequence. The most crucial differences between listcomp and genexp are: - listcomp is used only to build a list, whereas genexp can be used to build other types of sequences, such as tuple. - listcomp creates a complete list, while genexp creates an object that “yields” the data items one at a time - this saves memory.

# multiplies all elements in a list of number by a constant
a = [1, 2, 3]
b = [num * 2 for num in a]
print a
print b
print type(b)
[1, 2, 3]
[2, 4, 6]
<type 'list'>

I’ll digress to address a subtle point here. You may ask: why can I do the above with a * 2. Let’s see what happens if we do that.

a * 2
[1, 2, 3, 1, 2, 3]

The * operator creates a new list containing two copies of the original list. In fact, it does not make sense to write a * 2 and expect the a’s elements multiplied by 2. This behavior is completely consistent once you recall that list can contain heterogeneous data. Multiply by a constant a list containing both numbers and letters wouldn’t make sense, would it?

# build a tuple containing elements in a, each multiplied by 2
a = (1, 2, 3)
b = (num * 2 for num in a)
print a
print type(a)
print b
print type(b)
(1, 2, 3)
<type 'tuple'>
<generator object <genexpr> at 0x10463a820>
<type 'generator'>

Above we see that a is a tuple, and b is a generator object - b does not store data but simply is a means to iterate through the sequence created by code in parentheses on line 3. To get the data from b, call next(b).

print next(b)
print next(b)
print next(b)
print next(b)
2
4
6



---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

<ipython-input-18-17570dfc8ec9> in <module>()
      2 print next(b)
      3 print next(b)
----> 4 print next(b)


StopIteration:

Once the last item (number 6) has been yielded, b is “empty”. Calling next(b) therefore raises a StopIteration Exception. This behavior is consistent with the Iterator protocol. I will explain the Iterator protocol in a future post.

To actually build a tuple, we need to put the generator expression inside tuple(), as follows:

a = (1, 2, 3)
b = tuple(num * 2 for num in a)
print a
print type(a)
print b
print type(b)
(1, 2, 3)
<type 'tuple'>
(2, 4, 6)
<type 'tuple'>