Importing is not only a matter of using external libraries; it also allows you to keep your code clean and organized. In this tutorial, we are going to discuss everything from the very basics of importing to complex topics such as lazy loading of modules in packages.
Introduction to importing
In Python it is important to distinguish between modules and packages in order to communicate clearly. Modules are, in essence, files with a .py extension. They can define variables, functions, and classes, and they can also run code. Packages are collections of modules in a hierarchical structure, which in the end means organizing the module files in folders.
If we have a file called module.py, we can simply use the following syntax to import it:
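```python
import module
```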
We can also use modules that are bundled with Python, such as `sys`:
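```python
import sys
```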
In this case, `sys` is a module that belongs to the Python Standard Library. It provides functions to interact with the interpreter itself. For example, we can use it to find out whether any arguments were passed while executing a script.
We can create a file, test_argv.py, with the following code:
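```python
import sys

# sys.argv holds the list of arguments passed to the script
print(sys.argv)
```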
And we run it to see its output:
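```bash
# any arguments after the script name end up in sys.argv
$ python test_argv.py -b
['test_argv.py', '-b']
```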
Python comes bundled with plenty of libraries for different tasks. You can find them all in the documentation of the Python Standard Library.
Importing the entire `sys` module may not be what we want, since we are only using one of its elements. In this case, we can be selective with the importing procedure while keeping the same output:
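```python
from sys import argv

print(argv)
```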
Using the full import or just a selection of what we need depends to a great extent on personal preferences, and on how different packages were designed.
Importing *
In the examples above, we have either imported one entire module or just one element from `sys`. To import more elements from the same module, we can specify them on the same line:
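```python
from sys import argv, path
```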
We can import as many things as we need from the same module. To avoid lines becoming too long and hard to read, it is possible to stack the items vertically. For example:
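```python
from sys import (argv,
                 path,
                 exit)
```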
Note that in this case we must use a pair of parentheses `(` and `)` to make a clear list of imports. It is also possible to import all the elements from a module by using a `*` in the statement, for example:
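```python
from sys import *
```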
However, this is a highly discouraged practice. When we import things without control, some functions may get overwritten without us even realizing it. The code becomes harder to read and understand. Let's see it with the following example:
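```python
from time import *
from asyncio import *

print('Here')
sleep(1)  # this is asyncio.sleep, imported last, not time.sleep
print('After')
```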
Most programmers will be familiar with the `sleep` function from the `time` module, which halts the execution of a program for a given number of seconds. If we run the script, however, we will notice that there is no delay between the lines `'Here'` and `'After'`. This can be puzzling at first, and for larger projects it can indeed become daunting. In this case, both `time` and `asyncio` define a function `sleep`, and they behave in very different ways. If in such a simple and compact example the risks of `*` imports already become evident, it is easy to understand why almost all developers avoid using the `*` in their programs.
The case of `time` and `asyncio` is special, because both of them belong to the Python Standard Library and are very well documented. However, not all packages are as organized and clean, and it becomes harder and harder to remember all the modules and functions they define. Moreover, some names are so handy (like `sleep`) that it is easy to find them defined in different packages and for many purposes.
The script above becomes much clearer with the following syntax:
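```python
import time
import asyncio

print('Here')
time.sleep(1)
print('After')
```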
There is no doubt about what is going on, nor about where problems may be arising, even if we have never used the `asyncio` library before.
Importing As
Sometimes we face the situation in which we want to import specific functions from different modules, but their names are the same. For example, both `time` and `asyncio` define `sleep`, and we are interested in using both. To avoid a name clash when importing, Python allows us to change the name of what we are importing by doing the following:
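```python
# the new names are up to us
from time import sleep as time_sleep
from asyncio import sleep as asyncio_sleep
```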
In this way, we can use either the `sleep` from `asyncio` or the one from `time`, avoiding name clashes. Of course, this flexibility must be taken seriously, because it can also generate unreadable code. The following code would become very hard to understand for anybody reading it:
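```python
# the names now suggest the opposite of what each function does
from time import sleep as asyncio_sleep
from asyncio import sleep as time_sleep
```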
The use of `import ... as` is not only practical; in some cases it is the de facto way of working. For example, `numpy`, `pandas`, and `matplotlib` are almost always imported in the same way:
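```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```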
The three lines above are ubiquitous in many scientific programs. They are so common that code editors such as PyCharm may suggest importing numpy if they see a line that includes something like `np.`. Changing the name of the import can be useful not only to prevent name clashes, but also to shorten the notation. Instead of doing:
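```python
import numpy

data = numpy.array([1, 2, 3])
```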
We simply have:
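```python
import numpy as np

data = np.array([1, 2, 3])
```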
Different packages have different shortcuts. For example, `PyQtGraph` is normally shortened as `pg`, and different fields certainly use different abbreviations. Importing numpy as `np` or pandas as `pd` is not mandatory. However, since it is what the community does, it will make the code much more readable.
The use of this notation is so widespread that, for example, even on Stack Overflow numpy is used as `np` without even showing the import statement.
Importing your code
So far, we have seen how to import packages and modules developed by other people. Importing, however, is also a great tool to structure the different parts of our own code. It makes a program easier to maintain and to collaborate on. Therefore, sooner or later we are going to find ourselves importing our own code. We can start simple and slowly build up the complexity.
We can create a file first.py, with the following code:
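```python
def first_function():
    # the body is just an example; any code would do
    print('This is the first function')
```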
In a file called second.py, we can add the following code:
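```python
from first import first_function

first_function()
```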
And we run it:
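```bash
$ python second.py
This is the first function
```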
That is as easy as it gets. We define a function in a file, but we use it in another file. Once we have many files, it becomes handier to start creating some structure to organize the program. We can create a folder called package_a, and we add a new file, called third.py. The folder structure is like this:
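```
.
├── first.py
├── second.py
└── package_a
    └── third.py
```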
In third.py we create a new function, appropriately called `third_function`:
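```python
def third_function():
    print('This is the third function')
```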
The examples are very basic, but they already start to show some patterns and caveats of the importing procedure. If we want to use the new function from second.py, we need to import it:
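```python
from first import first_function
from package_a.third import third_function

first_function()
third_function()
```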
If we run the code, we'll get the following output:
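```bash
$ python second.py
This is the first function
This is the third function
```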
Pay attention to the notation we used to import the `third_function`. We specified the folder, in this case `package_a`, and then we referred to the file after a dot. This is the way in which the hierarchy of folders and files is transformed into packages and modules in Python: we replace the folder separators by a `.` and strip the `.py` extension, ending up with `package_a.third`.
The use of the __init__ file
Most likely our code will not be isolated from other projects. When we install packages, they will have dependencies, and very quickly we lose track of what is actually installed. We can understand where the problem arises with a very simple example. We can assume that we have numpy already installed, but we are not aware of that. If we create a new folder called numpy, with a file called sleep.py, the folder structure will end up looking like this:
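```
.
├── first.py
├── second.py
├── numpy
│   └── sleep.py
└── package_a
    └── third.py
```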
And in the file sleep.py, we can add the following function:
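```python
def sleep():
    # our own sleep, unrelated to time.sleep or the real numpy
    print('Sleeping...')
```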
In the same way we did before, we can update second.py to use the new function `sleep`:
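```python
from numpy.sleep import sleep

sleep()
```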
Of course the code above will raise many alarms, but the best is to run it to see what happens:
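```bash
$ python second.py
Traceback (most recent call last):
  ...
ModuleNotFoundError: No module named 'numpy.sleep'
```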
The biggest challenge in this case is that the exception is utterly hard to understand. It is telling us that Python tried to look for a module called `sleep` in the numpy package; if we open the folder numpy, we do find the module sleep. Therefore, there must be something else going on. In this example, it is clear that Python is looking for the module sleep within the official numpy package and not in our folder.
The quick solution to this problem is to create an empty file called __init__.py in our numpy folder:
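```
.
├── first.py
├── second.py
├── numpy
│   ├── __init__.py
│   └── sleep.py
└── package_a
    └── third.py
```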
We can run the code without problems this time:
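```bash
$ python second.py
Sleeping...
```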
It is important to understand what is going on, and not just throw quick solutions at the code to see whether they overcome the immediate problem. The crux is to know how Python looks for packages on the computer. The topic is complex, and Python allows a great deal of customization. The official documentation sheds some light on the matter, but it is much easier to digest once we have some experience.
In short, Python first looks at what we are trying to import and checks whether it belongs to the standard library. If the folder had been called `time` instead of `numpy`, the behavior would have been different, and adding the __init__.py file wouldn't have made a difference. Once Python knows the module is not in the standard library, it checks for external modules. First, it searches the current directory; then, it moves to the directories where packages are installed, for example, where numpy ends up after running `pip install numpy`.
This raises a very interesting question: why did our code fail in the first attempt, and start working only after adding the __init__.py file? In order for Python to consider that a folder is a package, it must contain an __init__.py file. This is by design, exactly to prevent unintended name clashes unless we explicitly want them.
If Python does not find the package within the local or default installation directories, it moves on to look for namespace packages, that is, folders without an __init__.py file. That is why `package_a` works even though we never defined an __init__.py file for it.
Bear in mind that once Python finds the package, it won't keep searching. Once it finds numpy in our local folder, it won't look for another numpy elsewhere. Therefore, we can't mix modules from different packages with the same name.
The PATH list of directories
A useful thing to do is to check the directories Python uses to look for modules and packages. We can see it by running the following code:
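```python
import sys

print(sys.path)
```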
That will list something between 4 and 6 different folders. Most of them are quite logical: where Python is installed, the virtual environment folders, etc.
Adding Folders to the Path
An easy way of extending the capabilities of Python is to add folders to the list where it looks for packages. The first option is to do it at runtime: we can simply append a directory to the `sys.path` variable. To add the current directory to the list, we can do the following:
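```python
import os
import sys

# append the current working directory to the list of search paths
sys.path.append(os.getcwd())
```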
We can add any directory, not only the current one. In this approach, we modify the system path only while the program runs. If we run two different programs, each will have its own path.
Another option is to modify a variable in the operating system itself. This has the advantage that it can be made permanent, and all programs will share the same information. For our application, we have to modify the PYTHONPATH environment variable. Environment variables are available on every operating system, but how to set and modify them varies.
On Linux or Mac, the command to set these variables is `export`. We can do the following:
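```bash
export PYTHONPATH="$PYTHONPATH:/home/user"
echo $PYTHONPATH
```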
The first line appends the folder `/home/user` to the variable `PYTHONPATH`. Note that we have used `:` as a directory separator.
On Windows, we need to right-click on "Computer" and select "Properties". Under "Advanced System Settings" there is the option "Environment Variables". If `PYTHONPATH` exists, we can modify it; if it does not exist, we can create it by clicking on "New". Bear in mind that on Windows you have to use `;` to separate directories, since `:` is part of the folder path (e.g.: `C:\Users\Test\...`).
We can check whether the modifications to the system environment variables worked by running the same code:
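```python
import sys

print(sys.path)
```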
Adding information to the Python path is a great way of developing a structure on your own computer, with code in different folders, etc. However, it is important to note that it also makes the code harder to maintain. The environment variables on one computer are not the same as on another, and Python may be loading legacy code from an obscure place on the computer. On the other hand, environment variables are very useful in contexts like a web server, where the definitions can be loaded before running a program.
As a quick side note, it is worth mentioning that Python allows us to read environment variables at runtime:
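```python
import os

print(os.environ.get('PYTHONPATH'))
```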
Note that on Windows, the changes to environment variables are permanent, but on Linux and Mac we need to follow extra steps if we want them to stay.
PYTHONPATH and Virtual Environments
When we work with virtual environments, we can modify environment variables when we activate or deactivate them. This works seamlessly on Linux and Mac, but Windows users may require some tinkering to adapt the examples below.
If we inspect the activate script (located in the folder venv/bin), we can get inspiration from what is done with the `PATH` variable, for example. The first step is to store the old variable before modifying it; then we append whatever we want. When we deactivate the virtual environment, we set the old variable back.
Virtual environments have three hooks to achieve this behavior. Next to the activate script, we can also see three files called postactivate, postdeactivate and predeactivate. We can modify postactivate, which should be empty, and add the following:
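```bash
# keep a copy of the original value; the variable name is our choice
export PYTHONPATH_OLD="$PYTHONPATH"
export PYTHONPATH="$PYTHONPATH:/home/user"
```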
Next time we activate the virtual environment, the directory `/home/user` will be added to the `PYTHONPATH`. It is good practice to go back to the original version of the Python path once we deactivate the environment. We can do it directly in the predeactivate file:
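```bash
export PYTHONPATH="$PYTHONPATH_OLD"
unset PYTHONPATH_OLD
```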
We set the variable to the status it had before activating, and we remove the extra variable we created. Note that if we don't deactivate the environment but simply close the terminal, the changes to the `PYTHONPATH` won't be undone. The predeactivate script is important if you switch from one environment to another and keep using the same terminal.
PYTHONPATH and PyCharm
Users of PyCharm (and most other IDEs will be similar) can change the environment variables directly from within the program. If we open the Run menu and select Edit Configurations, we will be presented with the following screen:
![PyCharm Run/Debug Configurations dialog](PyCharm_config.png)
Among the options we can see "Add content roots to PYTHONPATH". This is what makes the imports work out of the box in PyCharm, while running the same code directly from the terminal may give you some issues. We can also edit the environment variables by clicking on the small icon to the right of the field "Environment variables".
Keeping an eye on the environment variables can prevent problems in the long run, especially if, for example, two developers share the same computer. Although strange in many settings, lab computers are normally shared between people, and the software can be edited by multiple users. Perhaps one sets environment variables pointing to specific paths which are not what the second person is expecting.
Absolute Imports
In the examples of the previous sections, we imported a function downstream in the file system. This means that the function was inside a folder next to the main script file. However, we should also study what happens if we want to import from a sibling package. Imagine we have the following situation:
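```
.
├── start.py
├── pkg_a
│   ├── __init__.py
│   └── mod_a.py
└── pkg_b
    ├── __init__.py
    └── mod_b.py
```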
We have a start file in the top-level directory and two packages, pkg_a and pkg_b. Each one has its own __init__ file. The question is how we can access the contents of mod_a from within mod_b. From the start file, the import procedure is easy:
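```python
from pkg_a import mod_a
```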
We can create some dummy code in order to have a concrete example. First, in the file mod_a, we can create a function:
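```python
def function_a():
    print('This is function_a')
```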
We can then use it from the start file as follows:
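```python
from pkg_a.mod_a import function_a

function_a()
```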
If we want to use the same function within the mod_b, the first thing we can try is to simply copy the same line. Thus, in mod_b we can try:
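```python
from pkg_a.mod_a import function_a

function_a()
```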
To make it complete, we can trigger it directly from within the start file:
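```python
# importing mod_b runs its module-level code
from pkg_b import mod_b
```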
If we run it, we will get the output we were expecting:
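```bash
$ python start.py
This is function_a
```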
However, and this is a very big HOWEVER, sometimes we don't want to run start. Instead, we want to run mod_b directly. If we try to run it, the following happens:
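```bash
$ python pkg_b/mod_b.py
Traceback (most recent call last):
  File "pkg_b/mod_b.py", line 1, in <module>
    from pkg_a.mod_a import function_a
ModuleNotFoundError: No module named 'pkg_a'
```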
And here we start to realize the headaches that importing in Python can generate as soon as the program gets a bit more sophisticated. In the end, the error was expected. When we run `python mod_b.py`, Python will try to find `pkg_a` in the same folder, and not one level up. When we trigger start there is no problem, because from that directory both `pkg_a` and `pkg_b` are visible.
The same problem will appear if we trigger python from any other location in the computer:
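```bash
# the paths here are just an example
$ cd /tmp
$ python /home/user/project/pkg_b/mod_b.py
ModuleNotFoundError: No module named 'pkg_a'
```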
What we did in the examples above is called absolute imports. It means that we specify the full path to the module we want to import. What we have to remember is that the folder from which we trigger Python is the first place where the program looks for modules; then it goes through the paths stored in `sys.path`. If we want the code to work, we need to be sure that Python knows where pkg_a and pkg_b are stored.
The proper way would be to include the folder in the PYTHONPATH environment variable, as we explained earlier. A dirtier way would be to append the folder at runtime; we can add the following lines to mod_b.py:
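```python
import os
import sys

# full path to the folder one level above this file
BASE_PATH = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_PATH)

from pkg_a.mod_a import function_a

function_a()
```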
This is very similar to what we have done earlier. It is important to highlight that `BASE_PATH` is defined as the full path to the folder one level above where the current file (mod_b.py) is. Note also that we need to append to `sys.path` before we try to import `pkg_a`, or it will fail in the same way it did before.
If we think for a second about this approach, we quickly notice that it has several drawbacks. The most obvious one is that we would have to add those lines to every single file we are working with. Another problem is that we are adding a folder that contains many packages, which can give rise to collisions. Imagine we are theoretical physicists working on string theory and we develop a module called `string`. The code would look like this:
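```python
import os
import sys

BASE_PATH = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_PATH)

import string  # our own string module, or the one from the standard library?
```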
And it will give us problems, because `string` belongs to Python's standard library.
Therefore, it is always better to develop projects in their own folder, even if that forces a bit of name repetition. In the very simple case we are dealing with here, the structure would be something like this:
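```
code
├── pkg_a
│   ├── start.py
│   └── pkg_a
│       ├── __init__.py
│       └── mod_a.py
└── pkg_b
    ├── start.py
    └── pkg_b
        ├── __init__.py
        └── mod_b.py
```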
In the folder tree above we have a base folder code. Inside there are two packages, a and b. Although the names of the folders repeat (we have pkg_a twice and pkg_b twice, for example), there are several advantages to working in this way. The most important one is granularity: we can add `code/pkg_a` or `code/pkg_b` to the `PYTHONPATH` independently of each other. Having that control is always better than getting blanket results.
The most important thing to remember is that in Python, absolute is relative. While importing, we are not specifying a path on the file system, but rather an import path. Therefore, the imports are always relative to the PYTHONPATH, even if they are called absolute.
Relative Imports
Another option for importing modules is to define the relative path. We can continue building on the example from the previous section. Imagine we have a folder structure like this, with a second pkg_a nested inside pkg_b:
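```
code
├── start.py
├── pkg_a
│   ├── __init__.py
│   └── mod_a.py
└── pkg_b
    ├── __init__.py
    ├── mod_b.py
    └── pkg_a
        ├── __init__.py
        └── mod_c.py
```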
Let's assume that each `mod_X.py` defines a function called `function_X` (where X is the letter of the file). Each function simply prints its own name. It should be clear that if we want to import `function_c` from mod_c, the start.py file should look like this:
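```python
from pkg_b.pkg_a.mod_c import function_c

function_c()
```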
The situation becomes more interesting when we want to import `function_a` in mod_b. It is important to pay attention, because there are two different `pkg_a` defined in our program. If we add the following to mod_b:
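```python
from pkg_a.mod_c import function_c

function_c()
```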
It would work, regardless of how we run the script:
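```bash
# sys.path starts at the script's own folder, so the nested pkg_a is found
$ python pkg_b/mod_b.py
This is function_c
$ cd pkg_b
$ python mod_b.py
This is function_c
```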
But this is not what we wanted! We want `function_a` from mod_a. If we, however, add the following to mod_b:
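```python
from pkg_a.mod_a import function_a

function_a()
```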
We would get the following error:
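```bash
$ python pkg_b/mod_b.py
Traceback (most recent call last):
  ...
ModuleNotFoundError: No module named 'pkg_a.mod_a'
```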
This is where relative imports become very handy. From mod_b, the package we want to import from is one folder up. To indicate that, we can use the `..` notation in Python:
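```python
from ..pkg_a.mod_a import function_a

function_a()
```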
Generally speaking, a single `.` means in this directory, while two dots mean going one level up, etc. However, if we run the file directly, there will be problems. We get an error like the following (the exact message can vary between Python versions):
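```bash
$ python pkg_b/mod_b.py
Traceback (most recent call last):
  ...
ImportError: attempted relative import with no known parent package
```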
It doesn't matter if we change folders; if we move one level up, we will get the same problem:
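```bash
$ cd ..
$ python code/pkg_b/mod_b.py
Traceback (most recent call last):
  ...
ImportError: attempted relative import with no known parent package
```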
At some point, this becomes nerve-wracking. It doesn't matter if we add folders to the PATH, create __init__.py files, etc. It all boils down to the fact that we are not treating our files as part of a package when we run them. To instruct Python to run the file as part of a package, we can use the `-m` flag. Note that for this to work, code itself needs an __init__.py file; otherwise Python may pick up the standard library module that is also called code:
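```bash
# run from the folder that contains code; note the missing .py
$ python -m code.pkg_b.mod_b
This is function_a
```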
Bear in mind that the only way of running the code like this is if Python knows where to find the folder code. And this brings us back to the discussion of the PYTHONPATH variable. If we are in the folder that contains code and run Python from there, we won't see any problems. If we are in any other folder, however, Python will follow the usual rules to try to understand where code is.
There is one more important detail to discuss with relative imports. We can imagine that mod_c has the following code:
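```python
# mod_b defines function_b, following the naming convention above
from ..mod_b import function_b

function_b()
```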
Since mod_c is deeper in the tree, we can try to run it in different ways:
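```bash
$ python -m code.pkg_b.pkg_a.mod_c
$ cd code
$ python -m pkg_b.pkg_a.mod_c
```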
However, the second option is going to fail: mod_c is importing mod_b, which in turn is importing mod_a, so Python needs to be able to go all the way up to the root folder code. When we plan our code, we should therefore be mindful not only of how we write it, but also of how the program is meant to be used.
The last detail to cover is that we can't mix relative and absolute imports. For example, the following won't work:
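```python
from ..pkg_a.mod_a import function_a
from pkg_b.pkg_a.mod_c import function_c
```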
We will get the following error:
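```bash
$ python -m code.pkg_b.mod_b
Traceback (most recent call last):
  ...
ModuleNotFoundError: No module named 'pkg_b'
```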
When we decide to run our code as a module (using the `-m` flag), the absolute imports are resolved starting from the folder where we invoke Python, and from there no top-level `pkg_b` is visible. One way of solving the problem would be to change the absolute import to the following:
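```python
from .pkg_a.mod_c import function_c
```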
In this way it becomes clear that we are importing from the `pkg_a` which is in the same folder as mod_b.
Mixing Absolute and Relative
It is possible to mix relative and absolute imports; there are no secrets to it. As a sketch, assuming we run it as `python -m pkg_b.mod_b` from inside code (so that the top-level pkg_a is visible), we can change mod_b.py like this:
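```python
from pkg_a.mod_a import function_a    # absolute, resolved through sys.path
from .pkg_a.mod_c import function_c   # relative to this file's package

# this assumes mod_c no longer imports mod_b back,
# which would create a circular import
function_a()
function_c()
```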
Mixing relative and absolute is definitely a possibility. The question, as almost always, is why would we do it. The fact that we can does not mean we should.
Absolute or Relative: Conclusions
Deciding whether we want to use absolute imports or relative imports is basically up to the taste of the developer or the rules established by the group. If we are developing a package that has a lot of sub-packages and modules with several layers of nesting, using absolute imports can make the code clearer. For example, this is how the same import would look in the two different cases:
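```python
from pkg_b.pkg_a.mod_c import function_c   # absolute
from .pkg_a.mod_c import function_c        # relative, from within pkg_b
```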
For some people the first line is much clearer: there is no doubt about what we are importing. But it can get tiresome to type the entire path all the time. However, it is important to consider that typing less is not the only factor at play here.
If we are planning on allowing some files to run directly, we should be mindful of the requirements for relative imports. If we have many modules with similar names, sometimes the explicit path makes the code much clearer. It is really up to the developer to have enough sensitivity to decide whether the absolute import or the relative import makes the code clearer and the execution easier.
The example code for this article can be found on GitHub.