Generators, iterables, and iterators are some of the most used tools in Python. However, we don't often stop to think about how they work or how we can develop our own generators and iterables. Once you learn what you can do with them, you can expand your toolbox and make your code much more efficient and Pythonic.
Iterables
One of the very first things you learn about Python is that if you have a list, for example, you can go through all the elements with a convenient syntax:
>>> var = [1, 2, 3]
>>> for element in var:
...     print(element)
...
1
2
3
Even if you have worked with Python for a long time, you may take this syntax for granted, without stopping to think much about it. At some point you may find that it also works with tuples or dictionaries:
>>> var = {'a': 1, 'b': 2, 'c': 3}
>>> for key in var:
...     print(key)
...
a
b
c
The same syntax works with strings:
>>> var = 'abc'
>>> for c in var:
...     print(c)
...
a
b
c
As you can imagine, an iterable is an object in Python that allows going through its elements one by one. It seems that almost any data type that groups information together is iterable. So, the next logical step is to ask whether we can define our own iterable.
The __getitem__ approach
As you may know by now, to define our own types, we define classes. We could do something like this:
class Arr:
    def __init__(self):
        self.a = 0
        self.b = 1
        self.c = 2
And to use it:
>>> a = Arr()
>>> print(a.a)
0
>>> print(a.b)
1
This shows, very briefly, how to get started with classes. It is a simple example in which we store three different values and then print two of them. However, if we try the following, we get an error:
>>> for item in a:
...     print(item)
...
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'Arr' object is not iterable
Indeed, we can't iterate over our object. Python just doesn't know which element comes first and which one comes later. Therefore, we need to tell it ourselves, as the following example shows:
class Arr:
    def __init__(self):
        self.a = 0
        self.b = 1
        self.c = 2
    def __getitem__(self, item):
        if item == 0:
            return self.a
        elif item == 1:
            return self.b
        elif item == 2:
            return self.c
        raise StopIteration
I know it is very convoluted, but let's see how it works:
>>> a = Arr()
>>> print(a[0])
0
>>> print(a[1])
1
>>> for element in a:
...     print(element)
...
0
1
2
Now you see that by implementing the __getitem__ method, we can access the attributes of our object as if it were a list or a tuple, just by doing a[0], a[1]. Once you start pulling on this thread, there are many paths you can follow and many questions you can ask yourself. At this stage, it may be worth checking what duck typing is and how you can work with it.
You can see that we also raise a StopIteration exception if we go beyond the first three elements. This exception is the one Python uses to signal that an iteration has finished. In the for loop, we get all the elements we wanted without any problems. However, if we try the following, the behavior isn't what we would expect:
>>> a[3]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 14, in __getitem__
StopIteration
If it had been a list, we would expect to get an IndexError instead of a StopIteration exception. Let's see another example, one that allows us to construct an iterable from data rather than from just three hard-coded elements. We can develop an object called Sentence, which goes word by word within a for loop, like this:
class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')
    def __getitem__(self, item):
        return self.words[item]
It is a simple example in which we take some text and split it at the spaces. The result is stored in the attribute words. The __getitem__ method in this case just returns the appropriate item from the list of words. We can use it as follows:
>>> s = Sentence('This is some form of text that I want to explore')
>>> for w in s:
...     print(w)
...
This
is
some
form
of
text
that
I
want
to
explore
You can very quickly see that the behavior is what we were expecting. Of course, there are some limitations, such as not taking punctuation into account. However, we see that the for loop stops without problems once the list of words runs out of elements. It stops because self.words[item] raises an IndexError, which the for loop interprets as the end of the sequence. On the other hand, if we try something like s[100], we get the expected IndexError.
Note
The class Sentence can be greatly improved if, for instance, we implement a __len__ method. However, this goes beyond the scope of this article.
The only requirement to create an iterable object such as the ones shown above is to have a __getitem__ method that accesses elements using 0-based indexes. The first element is s[0], and so forth. If you are relying on a list, as in the Sentence example, there are no problems. If you are developing something like our Arr example, you need to make sure that 0 retrieves the first element.
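As a minimal sketch of this requirement, here is a variant of Arr that raises IndexError instead of StopIteration once the index runs past the last element. The assumption is that we want out-of-range access to behave as it does with a list; the for loop also treats IndexError as the end of the sequence, so iteration keeps working.

```python
class Arr:
    """Variant of the Arr example that signals the end with IndexError."""

    def __init__(self):
        self.a = 0
        self.b = 1
        self.c = 2

    def __getitem__(self, item):
        values = (self.a, self.b, self.c)
        if not isinstance(item, int) or not 0 <= item < len(values):
            # Same exception a list raises for an out-of-range index
            raise IndexError('Arr index out of range')
        return values[item]


arr = Arr()
collected = [element for element in arr]  # indexes 0, 1, 2, then IndexError
print(collected)
```

With this version, arr[3] fails with an IndexError, which is what someone used to lists would expect.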
Creating elements on the fly
In the examples above, we could go through the specified elements within a for loop. However, the elements we were iterating over were specified when the class was instantiated. This is not necessary. Let's see what it looks like if we create a class that can generate random numbers for as long as we want:
import random
class RandomNumbers:
    def __getitem__(self, item):
        return random.randint(0, 10)
And we can use it as follows:
>>> r = RandomNumbers()
>>> for a in r:
...     print(a)
...     if a == 10:
...         break
...
4
1
5
2
5
7
2
7
9
9
3
10
Pay attention to the fact that we are stopping the loop as soon as the RandomNumbers object generates a 10. If we didn't do this, the for loop would run forever. You may, for example, use a timer to stop after a certain time, stop after a certain number of iterations, or simply let it run forever. The choice is yours. You can also limit the generation of values in the class itself:
def __getitem__(self, item):
    num = random.randint(0, 10)
    if num == 10:
        raise StopIteration
    return num
And then we don't need to break the for loop; it naturally stops once a 10 is generated within the class.
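Stopping after a certain number of iterations, one of the options mentioned above, can be sketched with itertools.islice from the standard library. This is only one possible approach, and it uses the first version of RandomNumbers, the one that never stops by itself.

```python
import itertools
import random


class RandomNumbers:
    def __getitem__(self, item):
        return random.randint(0, 10)


# Take at most five values from the otherwise endless iterable
first_five = list(itertools.islice(RandomNumbers(), 5))
print(first_five)
```

The class itself stays untouched; the limit lives in the code that consumes the values.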
A situation where you may think about applying this method is, for example, if you are acquiring a signal from a device that generates sequential values. It may be frames acquired from a camera, or analog values read by a simple device one after the other. I will use these techniques in a later tutorial focusing on interfacing with devices.
Another example of an iterable that generates values only when requested is an open file. Python does not load all the contents into memory; instead, we can iterate over each line. This is how it looks with one of the articles of this website:
>>> with open('39_use_arduino_with_python.md', 'r') as f:
...     for line in f:
...         print(line)
...
Using Python to communicate with an Arduino
===========================================
...
This is what allows you to work with files that are much bigger than what can be held in memory. You can also quickly check it by looking at the sizes:
>>> import sys
>>> with open('39_use_arduino_with_python.md', 'r') as f:
...     print(sys.getsizeof(f))
...     size = 0
...     for line in f:
...         size += sys.getsizeof(line)
...     print(size)
...
216
56402
You can see that the size of the f variable is much smaller than the total size of the contents of the file added up. It is, of course, not an accurate method to know the size of a file. Still, it can give you an idea of the areas in which you can start considering iterables instead of loading a wealth of information into the memory of the computer.
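To try the same idea without depending on a particular article file, the following sketch writes a small temporary file (its name and contents are made up for the example) and then goes through it line by line, keeping only a running count and the length of the longest line.

```python
import os
import tempfile

# Build a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    path = tmp.name

n_lines = 0
longest = 0
with open(path, 'r') as f:
    for line in f:  # lines are read as the loop asks for them
        n_lines += 1
        longest = max(longest, len(line.rstrip('\n')))

os.remove(path)
print(n_lines, longest)
```

Nothing in the loop keeps the previous lines around, which is why the same pattern scales to files that would not fit in memory as a single list.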
Iterators
In the section above, we have seen that we can create an iterable object just by defining an appropriate __getitem__ method. We have also seen that several objects with which we have probably already interacted, such as lists or files, are iterables. However, something is happening under the hood that is worth discussing: iterators.
The difference between an iterable and an iterator is very subtle. Perhaps it can be thought of as similar to the difference between a class and an object. We know we can iterate over a list, or over a custom object. But how is Python dealing with that flow? We can, very quickly, see how it works with a simple list:
>>> var = ['a', 1, 0.1]
>>> it = iter(var)
>>> next(it)
'a'
>>> next(it)
1
>>> next(it)
0.1
>>> next(it)
Traceback (most recent call last):
File "<input>", line 1, in <module>
StopIteration
In the code above, we have constructed an iterator using the built-in function iter. This function takes an iterable and returns an iterator. The iterator is the object that understands what next means. The iterator is itself iterable:
>>> it = iter(var)
>>> for element in it:
...     print(element)
...
a
1
0.1
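The relation between iter, the list, and its iterator can be checked directly. This short sketch only uses built-ins; the variable names ending in _result are made up for the example.

```python
var = ['a', 1, 0.1]
it = iter(var)

iterator_result = iter(it) is it        # an iterator returns itself from iter()
list_result = iter(var) is var          # a list hands out a separate object
fresh_result = iter(var) is iter(var)   # and a new one on every call

print(iterator_result, list_result, fresh_result)
```

This is why a for loop can take either the list or the iterator: in both cases it first calls iter and then keeps calling next on whatever comes back.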
If we would like more control over the process, Python gives it to us. An iterator must implement two methods: __next__ and __iter__. If we want our class to use a specific iterator, then the class should also specify an __iter__ method that returns it. Let's see what would happen with the Sentence class. To make it more obvious, we will have it iterate through the words in backward order. First, we define the iterator:
class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index -= 1
        return word
    def __iter__(self):
        return self
The code above is relatively clear. You only have to keep in mind that the only way of determining the next value is by keeping track of the index of the current word. Now that we have our iterator, we must update the original Sentence class:
class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')
    def __iter__(self):
        return SentenceIterator(self.words)
And now comes the fun part:
>>> text = "This is a text to test if our iterator returns values backward"
>>> s = Sentence(text)
>>> s[0]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'Sentence' object does not support indexing
Indeed, we can't access the elements of our Sentence object because we haven't specified the appropriate __getitem__. However, the for loop will work:
>>> for w in s:
...     print(w)
...
This
backward
values
returns
iterator
our
if
test
to
text
a
is
This
You see, the order in which it prints is backward, as expected. Only the first word is out of place, because we start at index 0, which points to the same word as the last valid negative index, -12. You can find a solution to this issue if you want; it just requires a bit of thinking.
The example above also works with iter:
>>> it = iter(s)
>>> next(it)
'This'
>>> next(it)
'backward'
It is important to note that once the iterator is exhausted, there is nothing else we can do with it. At the same time, we can always go through the elements in the Sentence object within a for loop.
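A minimal sketch of what exhaustion means in practice, using a plain list in place of Sentence to keep it short:

```python
words = ['This', 'is', 'a', 'text']
it = iter(words)

first_pass = list(it)    # consumes the iterator
second_pass = list(it)   # nothing left: the iterator is exhausted
again = list(words)      # the iterable hands out a fresh iterator

print(first_pass)
print(second_pass)
print(again)
```

The second pass comes back empty, while going through the list itself works as many times as we like.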
The only missing part now is adding the ability to access the words by index but in the proper order. If we develop a __getitem__, what do you expect to happen to the for loop?
class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')
    def __iter__(self):
        return SentenceIterator(self.words)
    def __getitem__(self, item):
        return self.words[item]
And we use it as follows:
>>> s = Sentence(text)
>>> s[0]
'This'
>>> s[1]
'is'
>>> for w in s:
...     print(w)
...
This
backward
values
returns
iterator
our
if
test
to
text
a
is
This
So now you see, we can access the elements through their index, and we can iterate over them in backward order.
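The repeated This at the start of the backward output can be avoided in several ways. One possible sketch, not necessarily the solution the exercise has in mind, starts the index at the last word and counts down to 0 instead of relying on negative indexes:

```python
class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = len(words) - 1  # start at the last word

    def __next__(self):
        if self.index < 0:
            # We went past the first word, so there is nothing left
            raise StopIteration
        word = self.words[self.index]
        self.index -= 1
        return word

    def __iter__(self):
        return self


class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __iter__(self):
        return SentenceIterator(self.words)

    def __getitem__(self, item):
        return self.words[item]


s = Sentence('This is a text to test')
backward = list(s)
print(backward)
print(s[0])
```

Indexing still follows the original order, while iteration goes from the last word to the first without visiting any word twice.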
Can Sentence have a __next__ method?
In principle, you can also make Sentence an iterator by adding a __next__ method. But this may be a terrible idea. Remember that iterators work until they exhaust themselves, and they maintain an internal index. If you use Sentence as both an iterable and an iterator, you will run into problems every time you encounter a double loop.
Note
Splitting the iterator and the iterable is a good idea. You can read a bit more about it in the book Fluent Python, by Luciano Ramalho. Chapter 14 is entirely dedicated to this topic.
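To make the double-loop problem concrete, here is a small sketch with a hypothetical class, SelfIteratingSentence, that acts as its own iterator. The name and the three-word text are made up for the example.

```python
class SelfIteratingSentence:
    """Hypothetical class that is both the iterable and its own iterator."""

    def __init__(self, text):
        self.words = text.split(' ')
        self.index = 0

    def __iter__(self):
        return self  # every loop shares the same index

    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        word = self.words[self.index]
        self.index += 1
        return word


s = SelfIteratingSentence('a b c')
pairs = [(w, c) for w in s for c in s]
print(pairs)
```

Instead of the nine combinations one would expect from two nested loops over three words, only two pairs come out: the inner loop consumes the shared index, so the outer loop finds nothing left after its first step.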
For people working with scientific instruments, splitting the iterable and the iterator can have benefits. It is easy to imagine scenarios where you want an iterator that behaves in different ways, depending on a parameter. For example, imagine you are reading from a camera, and want to achieve something like:
>>> for frame in camera:
...     analyze(frame)
Very simple, you analyze every frame generated by a camera. However, you may be in a situation where an external trigger generates frames. Therefore you want to attach the timestamp to each frame, or it may be that the camera acquires a finite number of frames at a given framerate. In principle, it could be the iterator that takes care of it.
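A rough sketch of how that could look is shown below. Everything here is hypothetical: the Camera class, the triggered parameter, and both iterators are invented for illustration, and the frames are plain integers standing in for real image data.

```python
import time


class FrameIterator:
    """Hands out a finite number of simulated frames (plain integers)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= self.n_frames:
            raise StopIteration
        frame = self.index  # stand-in for real image data
        self.index += 1
        return frame


class TimestampedFrameIterator(FrameIterator):
    """Same frames, but each one is paired with the time it was produced."""

    def __next__(self):
        frame = super().__next__()
        return time.time(), frame


class Camera:
    """Hypothetical camera; the triggered flag selects the iterator."""

    def __init__(self, n_frames, triggered=False):
        self.n_frames = n_frames
        self.triggered = triggered

    def __iter__(self):
        if self.triggered:
            return TimestampedFrameIterator(self.n_frames)
        return FrameIterator(self.n_frames)


for frame in Camera(3):
    print(frame)

for timestamp, frame in Camera(3, triggered=True):
    print(timestamp, frame)
```

The loop that consumes the frames looks the same in both cases; only the iterator chosen inside __iter__ changes.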
In many cases, however, we don't need to define a separate iterator; we can use generators directly in our __iter__ definition. That is the focus of the next section.
Generators: Using the yield keyword
The last important topic to cover is generators, which, in my opinion, are still under-utilized even though they open new doors. A classic example is generating an infinite sequence of numbers. We know that it is impossible to store an infinite list of numbers in memory. Still, if we know the spacing between them, we can always know what number comes next. This is the core idea behind generators. Let's start with a straightforward example to understand the mechanics:
def make_numbers():
    print('Making a number')
    yield 1
    print('Making a new number')
    yield 2
    print('Making the last number')
    yield 3
The first thing that should grab your attention in the code above is that we define a function with the keyword def, but instead of a return, we use a yield statement. The prints are there for you to understand the flow. Another interesting aspect is that all three yield statements are reached, one after the other, while a normal function stops at the first return it executes. And now, to use it, we can do the following:
>>> a = make_numbers()
>>> print(a)
<generator object make_numbers at 0x7f65a8ea8ed0>
When we invoke the function make_numbers, nothing is printed. It means that no code in the body is running yet. Now, we can ask for the next element of the generator:
>>> next(a)
Making a number
1
Now you see the print statement is running, because we get the Making a number string on the screen, and we also get the first number. We can keep going:
>>> next(a)
Making a new number
2
>>> next(a)
Making the last number
3
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
And there you have it: once we run out of yield statements, StopIteration is raised. It is the same exception we saw earlier when developing our iterables. This exception shows us that we can use a generator in the same way we have used an iterable before. For example, we can use it inside a for loop:
>>> b = make_numbers()
>>> for i in b:
...     print(i)
...
Making a number
1
Making a new number
2
Making the last number
3
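What the for loop does with the generator can be spelled out by hand. The sketch below is a simplified picture of the loop, without the print statements of the original function: it asks for the next value until StopIteration shows up and then quietly stops.

```python
def make_numbers():
    yield 1
    yield 2
    yield 3


collected = []
it = iter(make_numbers())  # a generator is its own iterator
while True:
    try:
        value = next(it)
    except StopIteration:
        break  # the for loop handles this exception for us
    collected.append(value)

print(collected)
```

This is why we never see the StopIteration traceback when looping, even though the exception is raised every time a generator finishes.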
We can expand the generators to do a plethora of things. For example, let's see how to generate a flexible number of equally spaced integers:
def make_numbers(start, stop, step):
    i = start
    while i <= stop:
        yield i
        i += step
And based on what we have done before, we can do, for example:
>>> for i in make_numbers(1, 20, 2):
...     print(i)
1
3
5
7
9
11
13
15
17
19
What is very important to note here is that every number is generated only when it is needed, hence the name generators. Let's see what happens with the amount of memory used by our variables:
>>> import sys
>>> z = make_numbers(0, 1000000000, 1)
>>> sys.getsizeof(z)
128
The generator z produces the numbers from 0 to 1 billion in steps of 1. However, if we check the amount of memory used, we get just 128 bytes (the exact figure depends on the Python version). This is remarkable, and it is possible thanks to the generator syntax.
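For comparison, the sketch below builds the same numbers once as a generator and once as a list. It uses a smaller range so that the list fits comfortably in memory, and it prints the sizes without assuming specific values, since they vary between Python versions.

```python
import sys


def make_numbers(start, stop, step):
    i = start
    while i <= stop:
        yield i
        i += step


as_generator = make_numbers(0, 100_000, 1)
as_list = list(make_numbers(0, 100_000, 1))

# getsizeof reports the container only, not the integers stored in the list
print(sys.getsizeof(as_generator))
print(sys.getsizeof(as_list))
```

Even though getsizeof ignores the integers themselves, the list is already far larger than the generator, and it keeps growing with the range while the generator does not.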
Generators in classes
One remaining pattern is mixing generators and iterables. We can rewrite the Sentence class to allow us to loop through its elements:
class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')
    def __iter__(self):
        for word in self.words:
            yield word
And we can simply use it as before:
>>> text = "This is a text to test our iterator"
>>> s = Sentence(text)
>>> for w in s:
...     print(w)
This
is
a
text
to
test
our
iterator
Notice that we didn't define an explicit iterator such as SentenceIterator, and we are not defining the __next__ method. However, we can still iterate through our sentence. The example is so basic that maybe we can't see its usefulness. Let's adapt it slightly to be able to iterate through each word of a file. One option would be to read the entire file and split it into words, making a huge list. This can consume too much memory if the file is large, and therefore it is not practical. But nothing prevents us from reading line by line:
class WordsFromFile:
    def __init__(self, filename):
        self.filename = filename
    def __iter__(self):
        with open(self.filename, 'r') as f:
            for line in f:
                words = line.split(' ')
                for word in words:
                    yield word
For example, we could use it like this:
>>> words = WordsFromFile('22_Step_by_step_qt.rst.md')
>>> for w in words:
...     print(w)
What you can see from the example is that when we instantiate the WordsFromFile class, we only store the filename; we don't open any file. However, when we iterate through the object, we open the file, and instead of reading it all into memory, we iterate through each line. It is something that Python allows us to do with files. Then, we split each line into words, just as we did before, and we yield each word.
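A self-contained variant of the same idea is sketched below, so it can be run without the article file. Two things differ from the class above and are my own substitutions: it calls split() without arguments, so the newline at the end of each line is not kept as part of the last word, and it uses yield from as a shorthand for the inner loop. The temporary file and its contents are made up for the example.

```python
import os
import tempfile


class WordsFromFile:
    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, 'r') as f:
            for line in f:
                # split() without arguments also drops the trailing newline
                yield from line.split()


with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one two\nthree\n')
    path = tmp.name

words = WordsFromFile(path)
found = list(words)
print(found)
os.remove(path)
```

The file is opened only while list is consuming the words and is closed again when the generator finishes.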
Another exciting result that arises from using a generator as we just did is that we can nest loops, for example:
>>> for w in words:
...     for c in words:
...         print(w, c)
It is a ridiculous example, but we would get an output like the following:
considerations very
considerations simple
considerations way.
considerations You
considerations could
considerations find
considerations better
considerations solutions,
considerations of
considerations course,
considerations but
considerations this
If we had defined an index in the class and used it to grab the following element, the nesting would not have worked. You can go ahead and try it by yourself.
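The reason nesting works with a generator-based __iter__ can be seen in a small sketch that reuses the Sentence class with a two-word text: every loop calls __iter__ again and receives its own, independent generator.

```python
class Sentence:
    def __init__(self, text):
        self.words = text.split(' ')

    def __iter__(self):
        for word in self.words:
            yield word


s = Sentence('a b')
pairs = [(w, c) for w in s for c in s]
independent = iter(s) is not iter(s)  # each call builds a new generator

print(pairs)
print(independent)
```

All four combinations appear because the inner loop never touches the position of the outer one.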
Generators, Iterators, Iterables?
It may happen that after reading the entire guide, you are still struggling to understand the difference between generators, iterators, and iterables. Iterables are the easiest to separate because the term refers to objects over which it is possible to iterate. Without the formality, if given an element of your object you know how to get the next element, then your object is iterable.
To iterate through objects, however, Python relies on a second kind of object, called an iterator. Iterators are responsible for keeping track of the position and for knowing when you have reached the end. Generators are, in essence, the same, but they are written with the yield syntax.
There are some subtleties here: formally, every generator is an iterator, but not every iterator is a generator. Still, the distinction is useful when you want to transmit a clear message of what your code is doing. An iterator can be thought of as a class that goes through given elements. For example, a file has lines, and an iterator would go through all those lines. A generator, on the other hand, can generate values on demand. For example, we could generate numbers one after the other while they are being requested.
The main difference would be that an iterator just returns values as they are, while generators can modify values on the fly. Is this a strict distinction? I would claim that it is not, and the gray area is so large that it doesn't make much sense to stress about the specifics. However, I do like making the distinction if I have to explain my code to someone. I do believe that the words generator and iterator make a big difference when understanding the logic.
For example, if I am acquiring images from a camera, I would say I am using a generator. The images are not available beforehand, nor do I know what they are going to look like. On the other hand, if I am reviewing the frames of data stored on the hard drive, I would try to use the word iterator. The data is already there, and I am not altering it.
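For the formal side of the distinction, the abstract base classes in collections.abc can be used to ask Python directly what an object is. The helper count_to is a made-up generator function for this sketch.

```python
from collections.abc import Generator, Iterable, Iterator


def count_to(n):
    """Hypothetical generator function yielding 1, 2, ..., n."""
    for i in range(1, n + 1):
        yield i


numbers = [1, 2, 3]
list_iterator = iter(numbers)
gen = count_to(3)

# (is it an Iterable?, is it an Iterator?, is it a Generator?)
list_kind = (isinstance(numbers, Iterable),
             isinstance(numbers, Iterator),
             isinstance(numbers, Generator))
iterator_kind = (isinstance(list_iterator, Iterable),
                 isinstance(list_iterator, Iterator),
                 isinstance(list_iterator, Generator))
generator_kind = (isinstance(gen, Iterable),
                  isinstance(gen, Iterator),
                  isinstance(gen, Generator))

print(list_kind)
print(iterator_kind)
print(generator_kind)
```

The list is only an iterable, the list iterator is an iterable and an iterator, and the generator is all three. One caveat: the Iterable check looks for an __iter__ method, so a class that only defines __getitem__ is not recognized by isinstance even though a for loop accepts it.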
Conclusions
Iterators and generators are two tools that are very handy to keep in your repertoire of strategies when planning code. They are especially handy when you are dealing with streams of data that may exceed the memory you have available. In the past, we have seen how to create your own classes that support context managers; with generators and iterators, you can now support loops and iterations as well.
Understanding generators is essential not only to be able to add them to your code but also to understand the logic behind other packages. If you quickly search around, you will find that Django uses plenty of generators in its code base. Once you understand generators, they will give you a hint as to how the developers expected you to use their code.