Iteration simply refers to the process of accessing — one by one — the items stored in some container. The order of the items, and whether or not the iteration is comprehensive, depends on the container.
In Python, we typically perform iteration using the for
loop.
# e.g., iterating over a list
l = [2**x for x in range(10)]
for n in l:
print(n)
# e.g., iterating over the key-value pairs in a dictionary
d = {x:2**x for x in range(10)}
for k,v in d.items():
print(k, '=>', v)
iter
, and next
¶We can iterate over anything that is iterable. Intuitively, if something can be used as the source of items in a for
loop, it is iterable.
But how does a for
loop really work? (Review time!)
l = [2**x for x in range(10)]
for x in l:
print(x)
it = l.__iter__()
type(it)
An iterable's __iter__
method returns an iterator. This is a separate object (from the container itself) that can be used to iterate over the container's contents.
it.__next__() # re-evaluate this cell with ctrl-enter
l.__next__() # note: lists are not iterators!
An iterator's __next__
method returns the "next" item from the associated container. It raises a StopIteration exception when there are no more items.
# putting it all together
it = l.__iter__()
while True:
try:
print(it.__next__())
except StopIteration:
break
Instead of calling __iter__
and __next__
directly, we would typically use the global iter
and next
functions.
it = iter(l)
while True:
try:
print(next(it))
except StopIteration:
break
Of course, we typically would just write:
for x in l:
print(x)
Note we can use multiple iterators concurrently — each one keeps track of its own progress through the underlying container.
it1 = iter(l)
it2 = iter(l)
it3 = iter(l)
next(it2)
next(it3)
next(it3)
while True:
try:
print(next(it1), next(it2), next(it3))
except StopIteration:
break
An iterators behavior is not always well defined if the container is modified during the process of iteration:
d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
it = iter(d)
print(next(it))
d['d'] = 'dogs'
print(next(it))
d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
for k in d:
print(k)
d['d'] = 'dogs'
Another detail: iterators are themselves also iterables.
l = [2**x for x in range(10)]
it = iter(l)
for x in it:
print(x)
... which means that iterator objects must also implement the __iter__
method (in addition to __next__
)
class MyIterator:
def __init__(self, max):
self.max = max
self.curr = 0
# the following methods are required for iterator objects
def __next__(self):
if self.curr < self.max:
self.curr += 1
return self.curr
else:
raise StopIteration
def __iter__(self):
return self
it = MyIterator(10)
next(it)
it = MyIterator(10)
while True:
try:
print(next(it))
except StopIteration:
break
it = MyIterator(10)
for i in it:
print(i)
For a container type, we need to implement an __iter__
method that returns an iterator.
class ArrayList:
def __init__(self):
self.data = []
def append(self, val):
self.data.append(None)
self.data[len(self.data)-1] = val
def __iter__(self):
class ArrayListIterator:
def __init__(self, lst):
self.idx = 0
self.lst = lst
def __iter__(self):
return self
def __next__(self):
if self.idx < len(self.lst.data):
self.idx += 1
return self.lst.data[self.idx-1]
else:
raise StopIteration()
return ArrayListIterator(self)
l = ArrayList()
for x in range(10):
l.append(2**x)
it = iter(l)
type(it)
next(it)
for x in l:
print(x)
yield
¶What's a "generator"?
g = (2**x for x in range(10)) # NOT a list comprehension!
type(g)
len(g)
# try to re-evaluate this cell
for x in g:
print(x)
A generator object is another type of iterator. We can use list comprehension like syntax to create a generator, but what we get is NOT a list!
The generator object doesn't actually get populated with all the values the comprehension indicates (as a list would). Rather, it lazily computes the values as we need them (when its __next__
method is called).
%timeit -n 10000 [x for x in range(10**3)]
%timeit -n 10000 (x for x in range(10**10))
If a function takes an iterable, we can pass it a generator. This can be much more memory-efficient (and possibly more time-efficient) than passing the function a list with the same items (as would be returned by the generator).
sum([x for x in range(10**3)])
sum(x for x in range(10**3))
%timeit 1024 in [2**x for x in range(100)]
%timeit 1024 in (2**x for x in range(100))
A generator object can also be created by calling a generator function. A generator function is any function that contains the yield
keyword.
def foo():
yield
foo()
g = foo()
next(g)
def foo():
yield 1
g = foo()
next(g)
def foo():
print('foo was called')
yield 1
g = foo()
next(g)
The body of a generator function is not actually run to yield a value when it is first called! It is only run for the first time when next
is invoked on the returned generator object.
def foo():
print('L1')
yield 1
print('L2')
yield 2
print('L3')
yield 3
print('L4')
g = foo()
next(g)
Each next
call on a generator object will run the body of the generator function up to the next yield
statement (if any). The value specified in the yield
statement will be the return value of next
. If there is no remaining yield
statement, the function will run until it exits and a StopIteration
exception will be raised.
... which is just what we need to support iteration with a for
loop!
for x in foo():
print(x)
def foo():
for x in range(10):
yield 2**x
for x in foo():
print(x)
For our previous container type:
class ArrayList:
def __init__(self):
self.data = []
def append(self, val):
self.data.append(None)
self.data[len(self.data)-1] = val
def __iter__(self):
for i in range(len(self.data)):
yield self.data[i]
l = ArrayList()
for x in range(10):
l.append(2**x)
for x in l:
print(x)