Importing modules in Python - best practice

I am new to Python and want to expand on the skills I learned using R.
In R I tend to load a bunch of libraries, sometimes resulting in function name conflicts.
What is best practice in Python? I have seen several variations and I do not see the difference between them:
import pandas, from pandas import *, and from pandas import DataFrame.
What are the differences between the first two, and should I just import what I need?
Also, what would be the worst consequences for someone writing small programs to process data and compute simple statistics?
UPDATE
I found this excellent guide. It explains everything.

Disadvantage of each form
When reading other people's code (and those people use very
different importing styles), I noticed the following problems with
each of the styles:
import modulewithaverylongname will clutter the code further down
with the long module name (e.g. concurrent.futures or django.contrib.auth.backends) and decrease readability in those places.
from module import * gives me no chance to see syntactically that,
for instance, classA and classB come from the same module and
have a lot to do with each other.
It makes reading the code hard.
(The fact that names from such an import
may shadow names from an earlier import is the least part of that problem.)
from module import classA, classB, functionC, constantD, functionE
overloads my short-term memory with too many names
that I mentally need to assign to module in order to
coherently understand the code.
import modulewithaverylongname as mwvln is sometimes insufficiently
mnemonic to me.
A suitable compromise
Based on the above observations, I have developed the following
style in my own code:
import module is the preferred style if the module name is short
as for example most of the packages in the standard library.
It is also the preferred style if I need to use names from the module in
only two or three places in my own module;
clarity trumps brevity then ("Readability counts").
import longername as ln is the preferred style in almost every
other case.
For instance, I might import django.contrib.auth.backends as djcab.
By definition of criterion 1 above, the abbreviation will be used
frequently and is therefore sufficiently easy to memorize.
Only these two styles are fully pythonic as per the
"Explicit is better than implicit." rule.
from module import xx still occurs sometimes in my code.
I use it in cases where even the as format appears exaggerated,
the most famous example being from datetime import datetime
(but if I need more elements, I will import datetime as dt).
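As a minimal sketch of those two preferred styles (plus the datetime exception), using only standard-library modules for illustration:
import os                              # short stdlib name: plain import is clearest
import xml.etree.ElementTree as ET     # long dotted name: a short, conventional alias
from datetime import datetime          # the one exception where even 'as' feels exaggerated

tree = ET.fromstring("<root><item/></root>")
print(os.path.basename("/tmp/report.csv"))
print(datetime.now().year)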

import pandas imports the pandas module under the pandas namespace, so you would need to call objects within pandas using pandas.foo.
from pandas import * imports all objects from the pandas module into your current namespace, so you would call objects within pandas using only foo. Keep in mind this could have unexpected consequences if there are any naming conflicts between your current namespace and the pandas namespace.
from pandas import DataFrame is the same as above, but only imports DataFrame (instead of everything) into your current namespace.
In my opinion the first is generally best practice, as it keeps the different modules nicely compartmentalized in your code.
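For example (assuming pandas is installed), the three forms differ only in how the DataFrame name is reached:
import pandas
df1 = pandas.DataFrame({"x": [1, 2]})   # fully qualified: clearly comes from pandas

from pandas import DataFrame
df2 = DataFrame({"x": [1, 2]})          # only DataFrame was pulled into this namespace

from pandas import *                    # pulls in everything pandas exports
df3 = DataFrame({"x": [1, 2]})          # works, but the origin of DataFrame is no longer obvious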

Here are some recommendations from PEP8 Style Guide.
Imports should usually be on separate lines, e.g.:
Yes: import os
import sys
No: import sys, os
but it is okay to write:
from subprocess import Popen, PIPE
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Imports should be grouped in the following order:
standard library imports
related third party imports
local application/library specific imports
You should put a blank line between each group of imports.
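For instance, a module header following that grouping might look like this (numpy stands in for any third-party package, and the local import is a hypothetical placeholder):
# standard library imports
import os
import sys

# related third-party imports (assumes numpy is installed)
import numpy as np

# local application/library-specific imports would form the third group, e.g.
# from myproject import config   # "myproject" is a hypothetical local package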
Absolute imports are recommended
They are more readable and make debugging easier by giving better error messages in case you mess up the import system.
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example
or explicit relative imports
from . import sibling
from .sibling import example
Implicit relative imports should never be used; they have been removed in Python 3.
No: from ..grand_parent_package import uncle_package
Wildcard imports ( from <module> import * ) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.
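A small illustration of the problem, using only standard-library modules: a later star import can silently shadow names from an earlier one, and the reader can no longer tell where a name comes from.
from os.path import *    # brings in join, split, ... for the current platform
from ntpath import *     # also exports join, split, ... and now shadows the earlier ones

print(join("a", "b"))    # on Linux/macOS this now prints "a\b" -- probably not what was intended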
Some recommendations about lazy imports, from the Python speed/performance tips page.
Import Statement Overhead
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
Below is a scenario explained on that page:
>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
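The snippet above is old Python 2 code (string.lower no longer exists in Python 3). A rough Python 3 equivalent of the same measurement might look like this, with string.capwords chosen simply because it is still a module-level function; the absolute numbers will vary by machine, but doit1 should again come out noticeably slower:
import timeit

def doit1():
    import string                 # the import statement is re-executed on every call
    string.capwords('python')

import string

def doit2():
    string.capwords('python')     # the import cost was paid once, at module load time

print(timeit.timeit(doit1, number=100000))
print(timeit.timeit(doit2, number=100000))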

In general it is better to do explicit imports.
As in:
import pandas
frame = pandas.DataFrame()
Or:
from pandas import DataFrame
frame = DataFrame()
Another option in Python, when you have conflicting names, is import x as y:
from pandas import DataFrame as PDataFrame
from bears import DataFrame as BDataFrame
frame1 = PDataFrame()
frame2 = BDataFrame()

from A import B
essentially equals the following three statements:
import A
B = A.B
del A
That's all there is to it.
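One practical consequence, shown with a standard-library module: only the imported name ends up bound in your namespace, even though the module itself has been fully loaded behind the scenes.
from os import path

print(path.sep)              # works: the name "path" is bound here
try:
    os                       # the name "os" was never bound (the "del A" step above)
except NameError:
    print("'os' is not defined in this namespace")

import sys
print('os' in sys.modules)   # True: the module object itself is still loaded and cached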

They are all suitable in different contexts (which is why they are all available). There's no deep guiding principle, other than generic motherhood statements around clarity, maintainability and simplicity. Some examples from my own code:
import sys, os, re, itertools avoids name collisions and provides a very succinct way to import a bunch of standard modules.
from math import * lets me write sin(x) instead of math.sin(x) in math-heavy code. This gets a bit dicey when I also import numpy, which doubles up on some of these, but it doesn't overly concern me, since they are generally the same functions anyway. Also, I tend to follow the numpy documentation — import numpy as np — which sidesteps the issue entirely.
I favour from PIL import Image, ImageDraw just because that's the way the PIL documentation presents its examples.

Related

Python: Importing multiple methods from external module

With very common Python modules, I find that importing using the from .. import statement greatly increases the readability of my code, since I can reference methods by name without the dot notation. However, in some modules, the methods I require are nested differently, e.g in os:
from os.path import join
from os import listdir, getcwd
Why doesn't from os import path.join, listdir, getcwd work? What would be a "pythonic" way to import all the methods I need in a more succinct manner?
The opinion on whether from <module> import <identifier> is Pythonic itself is quite split - it hides away the origin of a method, so it's not easy to figure out where a certain variable/function comes from just by perusing the code. On the other hand, it reduces verbosity, which some people consider Pythonic even though it's not specifically mandated. Either way, Pythonic is as elusive a term as you're going to get, and more often than not it means "the way I think Python code should look like", backed up by several PEPs and obscure mailing list posts while conveniently omitting the ones that go against one's notion of Pythonic.
from os import path.join doesn't work because os defines the os.path module (by directly writing to sys.modules of all things), it's not an identifier in the os module itself. path, however, is an identifier in the os module pointing to the os.path module so you can do from os import path or from os.path import join.
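To make that concrete: path is an ordinary attribute of the os module that happens to point at the os.path module, so both of the following work, while the dotted form inside from ... import is a syntax error.
from os import path           # works: "path" is a name defined in the os module
from os.path import join      # works: "join" is a name defined in os.path
print(path.join("a", "b") == join("a", "b"))   # True: both reach the same function
# from os import path.join    # SyntaxError: only plain names are allowed after "import"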
Finally, succinct and Pythonic are not synonyms; in fact PEP 8, for example, prescribes using multiple lines for multiple imports even though you can succinctly write import <module1>, <module2>, <module3> .... It does say that it's OK to import multiple identifiers like that, though, but keep in mind that os and os.path are two different modules, so based on PEP 8 they shouldn't be on the same line and should therefore be written as:
from os import <identifier_1>, <identifier_2>
from os.path import <identifier_3>, <identifier_4>
Now, I wouldn't go as far as claiming that this is Pythonic, but it makes the most sense based on PEP 8, at least to me.

Is there any way to speed up an import?

I have a CLI application that requires sympy. The speed of the CLI application matters - it's used a lot in a user feedback loop.
However, simply doing import sympy takes a full second. This gets incredibly annoying in a tight feedback loop. Is there any way to 'preload' or optimize a module when a script is run again without a change to the module?
Obviously sympy does a lot when being imported. It could be initialization of internal data structures or similar. You could call this a flaw in the design of the sympy library.
Your only choice in this case would be to avoid redoing this initialization.
I assume that you find this behavior annoying because you intend to do it often. I propose to avoid doing it often. A way to achieve this could be to create a server which is started just once, imports sympy upon its startup, and then offers a service (via interprocess communication) which allows you to do whatever you want to do with sympy.
If this could be an option for you, I could elaborate on how to do this.
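For what it's worth, a very small sketch of that idea could look like the following: a long-running process pays the sympy import cost once and then evaluates expressions sent to it over a local TCP socket. The line-based protocol, the port number and the use of sympy.sympify/simplify here are illustrative assumptions, not a production design.
import socketserver
import sympy                      # the expensive import happens once, at server startup

class SympyHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # assumed protocol: one expression per line in, one result per line out
        expr = self.rfile.readline().decode().strip()
        try:
            result = sympy.simplify(sympy.sympify(expr))
            self.wfile.write((str(result) + "\n").encode())
        except Exception as exc:
            self.wfile.write(("error: %s\n" % exc).encode())

if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 8765), SympyHandler) as server:
        server.serve_forever()
A fast CLI front end would then just connect to the socket, send its expression and read the answer back, without ever importing sympy itself.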
I took a look at what happens when you run import sympy, and it imports all of sympy.
https://github.com/sympy/sympy/blob/master/sympy/__init__.py
If you are only using certain parts of sympy, then only import those parts that you need.
It would be nice if you could do this:
import sympy.sets
But (as you point out) that imports sympy and then sets.
One solution is to write your own importer. You can do this with the help of the imp module.
import imp
sets = imp.load_module("sets", open("sympy/sets/__init__.py"), "sympy/sets/__init__.py", ('.py', 'U', 1))
But, even that may not optimize enough. Taking a look at sympy/sets/__init__.py I see that it does this:
from .sets import (Set, Interval, Union, EmptySet, FiniteSet, ProductSet,
                   Intersection, imageset, Complement, SymmetricDifference)
from .fancysets import TransformationSet, ImageSet, Range, ComplexRegion
from .contains import Contains
from .conditionset import ConditionSet
Maybe you can import only the sets module from the sympy.sets package?
import imp
sets = imp.load_module("sets", open("sympy/sets/sets.py"), "sympy/sets/sets.py", ('.py', 'U', 1))
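Note that imp is deprecated in modern Python; importlib can load a single source file in much the same spirit. Whether this actually saves time depends on what the file itself imports -- sympy's submodules pull in much of the core package regardless -- so treat this as a sketch of the mechanism, with an assumed install path.
import importlib.util

spec = importlib.util.spec_from_file_location("sets", "sympy/sets/sets.py")
sets = importlib.util.module_from_spec(spec)
spec.loader.exec_module(sets)     # runs only that file's top-level code (plus whatever it imports)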
You should test whether importing only the modules that you are using in the code improves the loading time, i.e.:
from sympy import mod1, mod2, mod3
vs
import sympy
You should read these previous questions:
Python import X or from X import Y? (performance)
improving speed of Python module import
'import module' vs. 'from module import function'

Python import modules in another file

I'm currently refactoring a project (formerly one big file) into several separate Python files, each of which runs a specific part of my application.
E.g., GUIthread.py runs the GUI, Computethread.py does some maths, and so on.
Each thread includes the use of functions from imported modules like math, time, numpy, etc etc.
I already have a file globalClasses.py containing class definitions for my datatypes etc, which each .py file imports at the start, as per the recommendation here: http://effbot.org/pyfaq/how-do-i-share-global-variables-across-modules.htm . This is working well.
What I would like to do is have all my 3rdparty module imports in the globals file as well, so that I can write, for example, import math once but have all of my project files able to use math functions.
Questions:
1. Is this possible?
2. Is it a good idea/is it good Python practice?
My current solution is just to put
import math
import time
import numpy
...
(plus imports for all the other modules I'm using as well)
at the top of every file in my project... But that doesn't seem very tidy, and it's easy to forget to move a dependency's import statement when moving code-chunks from file to file...
Yeah, I guess there is a more elegant way of doing this which will save redundant lines of code. Suppose you want to import some modules, say math, time and numpy. You can create a file, say importing_modules.py, and import the various modules there as from module_name import *, so importing_modules.py may look something like this:
importing_modules.py
from math import *
from numpy import *
from time import *
main.py
from importing_modules import *
# Now you can call the names from those modules directly
print(sqrt(25))  # sqrt() can be called directly, in place of math.sqrt()
The other answer shows how what you want is (sort of) possible, but didn't address your second question about good practice.
Using import * is almost invariably considered bad practice. See "Why is import * bad?" and "Importing * from a package" from the docs.
Remember from PEP 20 that explicit is better than implicit. With explicit, specific imports (e.g. from math import sqrt) in every module, there is never confusion about from where a name came, your module's namespace includes only what it needs, and bugs are prevented.
The downside of having to write a couple import statements per module does not outweigh the potential problems introduced by trying to get around writing them.

Is there anything bad about having multiple imports on one line?

When I'm programming in Python and I need to import multiple modules, I usually do it like this:
import random, time, matplotlib, cheese, doge
Then when I read over other people's code, this is what I see:
import random
import time
import matplotlib
import cheese
import doge
Why is this? Is there any difference between the two styles?
The practice of one import per line is standardized in PEP8, and following a common standard is reason enough to do as others do. Following a common standard follows the Principle of Least Astonishment, making it easier for people familiar with the standard to read and modify your code.
Even if you don't care about PEP8, though, one import per line makes your code more maintainable.
Imports are easier to skim/read:
It's easier to see that you are getting a fred in import fred than in import barney, betty, wilma, fred, bambam, pebbles
Imports are easier to locate:
Searching for "import fred" will find import fred and import fred, wilma, pebbles, but will not find import barney, fred
Imports are easier to edit:
Inserting and removing an entire line is fast in most editors.
There is only one module per line, so you don't have to search in the line to find the thing you wish to edit - it's at the end.
Relocating an import inside a module is just moving a whole line.
Copying one of several imports to another Python module is a copy-paste of a line,
rather than that copy-paste followed by trimming off the other imports you don't want.
Imports are easier to maintain:
Each changed module has its own line in the change-set - you don't have to read a line to figure out which module or modules changed.
Missing and added modules affect the line count of the file and of the change-set.
Typos are easier to pick out and correct on visual skim of the change-set.
One import per line would be a good idea even if it weren't the standard. Since it is the standard, it's doubly the best way to go.
As per PEP-8 (The Style Guide for Python Code)
Imports should usually be on separate lines, e.g.:
Yes: import os
import sys
No: import sys, os
It's okay to say this though:
from subprocess import Popen, PIPE
To answer your question - both would work fine, but one is not conformant with the PEP8 guidelines.
I don't like to follow rules blindly without a valid reason. As PEP 20 (the Zen of Python) states, "Readability counts".
PEP 8's "one import per line" works from a general perspective. Although I respect his (i.e. Guido's) opinion, I wouldn't always strictly follow this convention.
The exception to this rule is when the number of lines of code is smaller than the number of imported modules, e.g. two lines of code but four module imports.
This is more readable: (in my opinion)
import os, sys, math, time
def add_special():
    return time.time() + math.floor(math.pow(sys.api_version + os.getpid(), 2))
instead of this
import os
import sys
import math
import time
def add_special():
    return time.time() + math.floor(math.pow(sys.api_version + os.getpid(), 2))
But readability is a matter that differs for each individual.
PEP-8, the official Python style guide, mandates that one package or module should be imported per line.
It is considered good style, and generally standardization makes programs easy to read. I don't think there are substantial differences under the hood to worry about, if that's what you're asking.
Those two examples are functionally equivalent. However, PEP 8, the official style-guide for Python, has a section here that condemns the practice of placing multiple imports on one line:
Imports should usually be on separate lines, e.g.:
Yes: import os
import sys
No: import sys, os
It's okay to say this though:
from subprocess import Popen, PIPE
Thus, many Python programmers place only one import per line in order to follow this guideline.

What are good rules of thumb for Python imports?

I am a little confused by the multitude of ways in which you can import modules in Python.
import X
import X as Y
from A import B
I have been reading up about scoping and namespaces, but I would like some practical advice on what is the best strategy, under which circumstances and why. Should imports happen at a module level or a method/function level? In the __init__.py or in the module code itself?
My question is not really answered by "Python packages - import by class, not file" although it is obviously related.
In production code in our company, we try to follow the following rules.
We place imports at the beginning of the file, right after the main file's docstring, e.g.:
"""
Registry related functionality.
"""
import wx
# ...
Now, if we import a class that is one of few in the imported module, we import the name directly, so that in the code we only have to use the last part, e.g.:
from RegistryController import RegistryController
from ui.windows.lists import ListCtrl, DynamicListCtrl
There are modules, however, that contain dozens of classes, e.g. a list of all possible exceptions. Then we import the module itself and reference it in the code:
from main.core import Exceptions
# ...
raise Exceptions.FileNotFound()
We use the import X as Y as rarely as possible, because it makes searching for usage of a particular module or class difficult. Sometimes, however, you have to use it if you wish to import two classes that have the same name, but exist in different modules, e.g.:
from Queue import Queue
from main.core.MessageQueue import Queue as MessageQueue
As a general rule, we don't do imports inside methods -- they simply make code slower and less readable. Some may find this a good way to easily resolve the cyclic imports problem, but a better solution is code reorganization.
Let me just paste a part of a conversation on the django-dev mailing list started by Guido van Rossum:
[...]
For example, it's part of the Google Python style guides[1] that all
imports must import a module, not a class or function from that
module. There are way more classes and functions than there are
modules, so recalling where a particular thing comes from is much
easier if it is prefixed with a module name. Often multiple modules
happen to define things with the same name -- so a reader of the code
doesn't have to go back to the top of the file to see from which
module a given name is imported.
Source: http://groups.google.com/group/django-developers/browse_thread/thread/78975372cdfb7d1a
1: http://code.google.com/p/soc/wiki/PythonStyleGuide#Module_and_package_imports
I would normally use import X on module level. If you only need a single object from a module, use from X import Y.
Only use import X as Y in case you're otherwise confronted with a name clash.
I only use imports on function level to import stuff I need when the module is used as the main module, like:
def main():
    import sys
    if len(sys.argv) > 1:
        pass
HTH
Someone above said that
from X import A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P
is equivalent to
import X
They are not quite the same: import X lets you see later rebindings of X.A through the attribute X.A, while from X import ... binds local names to the objects as they are at import time. With from X import A..P you do not see updates if those names are later rebound in X, and if you rebind them yourself you only change your own names; X does not see your modifications.
If A-P are functions, you will rarely notice the difference.
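To see the difference, imagine a tiny hypothetical module config.py containing the single line value = 1 (the module name and contents are invented for illustration):
import config
from config import value

config.value = 2       # rebinds the attribute on the module object
print(config.value)    # 2 -- the change is visible through the module
print(value)           # 1 -- this name was bound at import time and is not updated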
Others have covered most of the ground here but I just wanted to add one case where I will use import X as Y (temporarily), when I'm trying out a new version of a class or module.
So if we were migrating to a new implementation of a module, but didn't want to cut the whole code base over at once, we might write a xyz_new module and do this in the source files that we had migrated:
import xyz_new as xyz
Then, once we cut over the entire code base, we'd just replace the xyz module with xyz_new and change all of the imports back to
import xyz
DON'T do this:
from X import *
unless you are absolutely sure that you will use each and every thing in that module. And even then, you should probably reconsider using a different approach.
Other than that, it's just a matter of style.
from X import Y
is good and saves you lots of typing. I tend to use that when I'm using something in it fairly frequently. But if you're importing a lot from that module, you could end up with an import statement that looks like this:
from X import A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P
You get the idea. That's when imports like
import X
become useful. Either that or if I'm not really using anything in X very frequently.
I generally try to use the regular import modulename, unless the module name is long or used often.
For example, I would do..
from BeautifulSoup import BeautifulStoneSoup as BSS
..so I can do soup = BSS(html) instead of BeautifulSoup.BeautifulStoneSoup(html)
Or..
from xmpp import XmppClientBase
..instead of importing the whole of xmpp when I only use the XmppClientBase
Using import x as y is handy if you want to shorten very long names, or to prevent clobbering an existing import/variable/class/method (something you should try to avoid completely, but it's not always possible)
Say I want to run a main() function from another script, but I already have a main() function..
from my_other_module import main as other_module_main
..wouldn't replace my main function with my_other_module's main
Oh, one thing - don't do from x import * - it makes your code very hard to understand, as you cannot easily see where a method came from (from x import *; from y import *; my_func() - where is my_func defined?)
In all cases, you could just do import modulename and then do modulename.subthing1.subthing2.method("test")...
The from x import y as z stuff is purely for convenience - use it whenever it'll make your code easier to read or write!
When you have a well-written library, which is sometimes the case in Python, you ought to just import it and use it as is. A well-written library tends to take on a life and language of its own, resulting in pleasant-to-read code where you rarely reference the library. When a library is well written, you ought not need renaming or anything else too often.
import gat
node = gat.Node()
child = node.children()
Sometimes it's not possible to write it this way, or you want to pull individual names out of the library you imported.
from gat import Node, SubNode
node = Node()
child = SubNode(node)
Sometimes you do this for a lot of things; if your import line overflows 80 columns, it's a good idea to do this:
from gat import (
    Node, SubNode, TopNode, SuperNode, CoolNode,
    PowerNode, UpNode
)
The best strategy is to keep all of these imports at the top of the file, preferably ordered alphabetically, import statements first, then from ... import statements.
Now let me tell you why this is the best convention.
Python could perfectly well have had an automatic import, which would look among the main imports for a value when it can't be found in the global namespace. But this is not a good idea, and I'll explain shortly why. Aside from being more complicated to implement than a simple import, it would mean programmers wouldn't think as much about dependencies, and finding out where things were imported from would have to be done some other way than just looking at the imports.
The need to find out dependencies is one reason why people hate "from ... import *". Some cases where you need to do it exist, though, for example OpenGL wrappers.
So the import statements are actually valuable as a definition of the program's dependencies, and that is the way you should exploit them: from them you can quickly check where some unfamiliar function is imported from.
The import X as Y is useful if you have different implementations of the same module/class.
With some nested try: import ... except ImportError: blocks you can hide the implementation choice from your code. See the lxml etree import example:
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")
I'm with Jason on not using
from X import *
But in my case (I'm not an expert programmer, so my code does not follow the coding style too well) I usually keep a file in my programs with all the constants, like program version, authors, error messages and all that stuff, so the file is just definitions. Then I make the import:
from const import *
That saves me a lot of time. But it's the only file that has that import, and that's because everything inside that file is just variable definitions.
Doing that kind of import in a file with classes and definitions might be useful, but when you have to read that code you spend lots of time locating functions and classes.
