Iteration simply refers to the process of accessing — one by one — the items stored in some container. The order of the items, and whether or not the iteration is comprehensive, depends on the container.
In Python, we typically perform iteration using the for loop.
# e.g., iterating over a list
l = [2**x for x in range(10)]
for n in l:
print(n)
# e.g., iterating over the key-value pairs in a dictionary
d = {x:2**x for x in range(10)}
for k,v in d.items():
print(k, '=>', v)
iter, and next¶We can iterate over anything that is iterable. Intuitively, if something can be used as the source of items in a for loop, it is iterable.
But how does a for loop really work? (Review time!)
l = [2**x for x in range(10)]
for x in l:
print(x)
it = l.__iter__()
type(it)
An iterable's __iter__ method returns an iterator. This is a separate object (from the container itself) that can be used to iterate over the container's contents.
it.__next__() # re-evaluate this cell with ctrl-enter
l.__next__() # note: lists are not iterators!
An iterator's __next__ method returns the "next" item from the associated container. It raises a StopIteration exception when there are no more items.
# putting it all together
it = l.__iter__()
while True:
try:
print(it.__next__())
except StopIteration:
break
Instead of calling __iter__ and __next__ directly, we would typically use the global iter and next functions.
it = iter(l)
while True:
try:
print(next(it))
except StopIteration:
break
Of course, we typically would just write:
for x in l:
print(x)
Note we can use multiple iterators concurrently — each one keeps track of its own progress through the underlying container.
it1 = iter(l)
it2 = iter(l)
it3 = iter(l)
next(it2)
next(it3)
next(it3)
while True:
try:
print(next(it1), next(it2), next(it3))
except StopIteration:
break
An iterators behavior is not always well defined if the container is modified during the process of iteration:
d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
it = iter(d)
print(next(it))
d['d'] = 'dogs'
print(next(it))
d = {'a': 'apples', 'b': 'bananas', 'c': 'cats'}
for k in d:
print(k)
d['d'] = 'dogs'
Another detail: iterators are themselves also iterables.
l = [2**x for x in range(10)]
it = iter(l)
for x in it:
print(x)
... which means that iterator objects must also implement the __iter__ method (in addition to __next__)
class MyIterator:
def __init__(self, max):
self.max = max
self.curr = 0
# the following methods are required for iterator objects
def __next__(self):
if self.curr < self.max:
self.curr += 1
return self.curr
else:
raise StopIteration
def __iter__(self):
return self
it = MyIterator(10)
next(it)
it = MyIterator(10)
while True:
try:
print(next(it))
except StopIteration:
break
it = MyIterator(10)
for i in it:
print(i)
For a container type, we need to implement an __iter__ method that returns an iterator.
class ArrayList:
def __init__(self):
self.data = []
def append(self, val):
self.data.append(None)
self.data[len(self.data)-1] = val
def __iter__(self):
class ArrayListIterator:
def __init__(self, lst):
self.idx = 0
self.lst = lst
def __iter__(self):
return self
def __next__(self):
if self.idx < len(self.lst.data):
self.idx += 1
return self.lst.data[self.idx-1]
else:
raise StopIteration()
return ArrayListIterator(self)
l = ArrayList()
for x in range(10):
l.append(2**x)
it = iter(l)
type(it)
next(it)
for x in l:
print(x)
yield¶What's a "generator"?
g = (2**x for x in range(10)) # NOT a list comprehension!
type(g)
len(g)
# try to re-evaluate this cell
for x in g:
print(x)
A generator object is another type of iterator. We can use list comprehension like syntax to create a generator, but what we get is NOT a list!
The generator object doesn't actually get populated with all the values the comprehension indicates (as a list would). Rather, it lazily computes the values as we need them (when its __next__ method is called).
%timeit -n 10000 [x for x in range(10**3)]
%timeit -n 10000 (x for x in range(10**10))
If a function takes an iterable, we can pass it a generator. This can be much more memory-efficient (and possibly more time-efficient) than passing the function a list with the same items (as would be returned by the generator).
sum([x for x in range(10**3)])
sum(x for x in range(10**3))
%timeit 1024 in [2**x for x in range(100)]
%timeit 1024 in (2**x for x in range(100))
A generator object can also be created by calling a generator function. A generator function is any function that contains the yield keyword.
def foo():
yield
foo()
g = foo()
next(g)
def foo():
yield 1
g = foo()
next(g)
def foo():
print('foo was called')
yield 1
g = foo()
next(g)
The body of a generator function is not actually run to yield a value when it is first called! It is only run for the first time when next is invoked on the returned generator object.
def foo():
print('L1')
yield 1
print('L2')
yield 2
print('L3')
yield 3
print('L4')
g = foo()
next(g)
Each next call on a generator object will run the body of the generator function up to the next yield statement (if any). The value specified in the yield statement will be the return value of next. If there is no remaining yield statement, the function will run until it exits and a StopIteration exception will be raised.
... which is just what we need to support iteration with a for loop!
for x in foo():
print(x)
def foo():
for x in range(10):
yield 2**x
for x in foo():
print(x)
For our previous container type:
class ArrayList:
def __init__(self):
self.data = []
def append(self, val):
self.data.append(None)
self.data[len(self.data)-1] = val
def __iter__(self):
for i in range(len(self.data)):
yield self.data[i]
l = ArrayList()
for x in range(10):
l.append(2**x)
for x in l:
print(x)