Implementing Iteration

Agenda

  1. Review: Iteration
  2. Details: iterables, iterators, iter, and next
  3. Implementing iterators with classes
  4. Implementing iterators with generators and yield

1. Review: Iteration

Iteration simply refers to the process of accessing — one by one — the items stored in some container. The order of the items, and whether or not the iteration is comprehensive, depends on the container.

In Python, we typically perform iteration using the for loop.

In [1]:
# e.g., iterating over a list
l = [2**x for x in range(10)]
for n in l:
    print(n)
1
2
4
8
16
32
64
128
256
512
In [2]:
# e.g., iterating over the key-value pairs in a dictionary
d = {x:2**x for x in range(10)}
for k,v in d.items():
    print(k, '=>', v)
0 => 1
1 => 2
2 => 4
3 => 8
4 => 16
5 => 32
6 => 64
7 => 128
8 => 256
9 => 512

2. Details: iterables, iterators, iter, and next

We can iterate over anything that is iterable. Intuitively, if something can be used as the source of items in a for loop, it is iterable.

But how does a for loop really work? (Review time!)

In [3]:
l = [2**x for x in range(10)]
In [4]:
for x in l:
    print(x)
1
2
4
8
16
32
64
128
256
512
In [5]:
it = l.__iter__()
In [6]:
type(it)
Out[6]:
list_iterator

An iterable's __iter__ method returns an iterator. This is a separate object (from the container itself) that can be used to iterate over the container's contents.

In [7]:
it.__next__() # re-evaluate this cell with ctrl-enter
Out[7]:
1
In [8]:
l.__next__() # note: lists are not iterators!
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-01f476f2762c> in <module>()
----> 1 l.__next__() # note: lists are not iterators!

AttributeError: 'list' object has no attribute '__next__'

An iterator's __next__ method returns the "next" item from the associated container. It raises a StopIteration exception when there are no more items.

In [9]:
# putting it all together
it = l.__iter__()
while True:
    try:
        print(it.__next__())
    except StopIteration:
        break
1
2
4
8
16
32
64
128
256
512

Instead of calling __iter__ and __next__ directly, we would typically use the global iter and next functions.

In [10]:
it = iter(l)
while True:
    try:
        print(next(it))
    except StopIteration:
        break
1
2
4
8
16
32
64
128
256
512

Of course, we typically would just write:

In [11]:
for x in l:
    print(x)
1
2
4
8
16
32
64
128
256
512

Note we can use multiple iterators concurrently — each one keeps track of its own progress through the underlying container.

In [12]:
it1 = iter(l)
it2 = iter(l)
it3 = iter(l)
next(it2)
next(it3)
next(it3)
while True:
    try:
        print(next(it1), next(it2), next(it3))
    except StopIteration:
        break
1 2 4
2 4 8
4 8 16
8 16 32
16 32 64
32 64 128
64 128 256
128 256 512

An iterators behavior is not always well defined if the container is modified during the process of iteration:

In [13]:
d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
it = iter(d)
print(next(it))
d['d'] = 'dogs'
print(next(it))
a
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-2b685c971ad4> in <module>()
      3 print(next(it))
      4 d['d'] = 'dogs'
----> 5 print(next(it))

RuntimeError: dictionary changed size during iteration
In [14]:
d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
for k in d:
    print(k)
    d['d'] = 'dogs'
a
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-bd2fad690fd7> in <module>()
      1 d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
----> 2 for k in d:
      3     print(k)
      4     d['d'] = 'dogs'

RuntimeError: dictionary changed size during iteration

Another detail: iterators are themselves also iterables.

In [15]:
l = [2**x for x in range(10)]
it = iter(l)
for x in it:
    print(x)
1
2
4
8
16
32
64
128
256
512

... which means that iterator objects must also implement the __iter__ method (in addition to __next__)

3. Implementing iterators with classes

In [20]:
class MyIterator:
    def __init__(self, max):
        self.max = max
        self.curr = 0
        
    # the following methods are required for iterator objects
    
    def __next__(self):
        if self.curr < self.max:
            self.curr += 1
            return self.curr
        else:
            raise StopIteration
    
    def __iter__(self):
        return self
In [21]:
it = MyIterator(10)
In [26]:
next(it)
Out[26]:
5
In [27]:
it = MyIterator(10)
while True:
    try:
        print(next(it))
    except StopIteration:
        break
1
2
3
4
5
6
7
8
9
10
In [28]:
it = MyIterator(10)
for i in it:
    print(i)
1
2
3
4
5
6
7
8
9
10

For a container type, we need to implement an __iter__ method that returns an iterator.

In [29]:
class ArrayList:
    def __init__(self):
        self.data = []
        
    def append(self, val):
        self.data.append(None)
        self.data[len(self.data)-1] = val
        
    def __iter__(self):
        class ArrayListIterator:
            def __init__(self, lst):
                self.idx = 0
                self.lst = lst
            
            def __iter__(self):
                return self
            
            def __next__(self):
                if self.idx < len(self.lst.data):
                    self.idx += 1
                    return self.lst.data[self.idx-1]
                else:
                    raise StopIteration()
        return ArrayListIterator(self)
In [30]:
l = ArrayList()
for x in range(10):
    l.append(2**x)
In [31]:
it = iter(l)
In [32]:
type(it)
Out[32]:
__main__.ArrayList.__iter__.<locals>.ArrayListIterator
In [38]:
next(it)
Out[38]:
32
In [39]:
for x in l:
    print(x)
1
2
4
8
16
32
64
128
256
512

4. Implementing iterators with generators and yield

What's a "generator"?

In [40]:
g = (2**x for x in range(10)) # NOT a list comprehension!
In [41]:
type(g)
Out[41]:
generator
In [42]:
len(g)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-baf0f4fee012> in <module>()
----> 1 len(g)

TypeError: object of type 'generator' has no len()
In [43]:
# try to re-evaluate this cell
for x in g:
    print(x)
1
2
4
8
16
32
64
128
256
512

A generator object is another type of iterator. We can use list comprehension like syntax to create a generator, but what we get is NOT a list!

The generator object doesn't actually get populated with all the values the comprehension indicates (as a list would). Rather, it lazily computes the values as we need them (when its __next__ method is called).

In [44]:
%timeit -n 10000 [x for x in range(10**3)]
10000 loops, best of 3: 38.1 µs per loop
In [45]:
%timeit -n 10000 (x for x in range(10**10))
10000 loops, best of 3: 1.01 µs per loop

If a function takes an iterable, we can pass it a generator. This can be much more memory-efficient (and possibly more time-efficient) than passing the function a list with the same items (as would be returned by the generator).

In [46]:
sum([x for x in range(10**3)])
Out[46]:
499500
In [47]:
sum(x for x in range(10**3))
Out[47]:
499500
In [48]:
%timeit 1024 in [2**x for x in range(100)]
10000 loops, best of 3: 47.6 µs per loop
In [49]:
%timeit 1024 in (2**x for x in range(100))
100000 loops, best of 3: 5.23 µs per loop

A generator object can also be created by calling a generator function. A generator function is any function that contains the yield keyword.

In [50]:
def foo():
    yield
In [51]:
foo()
Out[51]:
<generator object foo at 0x10ee40620>
In [52]:
g = foo()
In [53]:
next(g)
In [54]:
def foo():
    yield 1
In [55]:
g = foo()
In [56]:
next(g)
Out[56]:
1
In [57]:
def foo():
    print('foo was called')
    yield 1
In [58]:
g = foo()
In [59]:
next(g)
foo was called
Out[59]:
1

The body of a generator function is not actually run to yield a value when it is first called! It is only run for the first time when next is invoked on the returned generator object.

In [60]:
def foo():
    print('L1')
    yield 1
    print('L2')
    yield 2
    print('L3')
    yield 3
    print('L4')
In [61]:
g = foo()
In [62]:
next(g)
L1
Out[62]:
1

Each next call on a generator object will run the body of the generator function up to the next yield statement (if any). The value specified in the yield statement will be the return value of next. If there is no remaining yield statement, the function will run until it exits and a StopIteration exception will be raised.

... which is just what we need to support iteration with a for loop!

In [63]:
for x in foo():
    print(x)
L1
1
L2
2
L3
3
L4
In [64]:
def foo():
    for x in range(10):
        yield 2**x
In [65]:
for x in foo():
    print(x)
1
2
4
8
16
32
64
128
256
512

For our previous container type:

In [66]:
class ArrayList:
    def __init__(self):
        self.data = []
        
    def append(self, val):
        self.data.append(None)
        self.data[len(self.data)-1] = val
        
    def __iter__(self):
        for i in range(len(self.data)):
            yield self.data[i]
In [67]:
l = ArrayList()
for x in range(10):
    l.append(2**x)
In [68]:
for x in l:
    print(x)
1
2
4
8
16
32
64
128
256
512