I was writing some code in Python when I suddenly became curious regarding blank line conventions for import statements.
I'm aware of the import conventions specified in the PEP 8 style guide, and of its blank-line conventions as well. However, I am curious whether there is a convention or unwritten rule for blank lines among import statements.
For example, I usually like to put a blank line between the three categories that are specified in PEP 8 (i.e. standard library imports, related third party imports, local application/library specific imports), but I've also noticed that many people tend not to do so. My Pylint installation even emits a warning whenever I put a blank line.
I personally felt that this added a bit of clarity as to what "category" each imported library falls into. Is there a sort of convention that I should be following?
Thanks in advance.
Instead of a blank line, use a comment line between the imports that states which category they fall into.
It brings more clarity, and no warnings or errors will be raised.
Yes. The convention is to separate the sections. http://github.com/timothycrosley/isort can help.
The sections might look like this.
from __future__ import absolute_import

import os
import sys

from third_party import (lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8,
                         lib9, lib10, lib11, lib12, lib13, lib14, lib15)

from my_lib import Object, Object2, Object3
Alternatively, another popular, but not universal, convention is to only import modules, not classes or functions, as suggested in the Google Python Style Guide.
from __future__ import absolute_import

import os
import sys

import third_party.module1
import third_party.module2

import my_lib
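To make the section idea concrete, here is a rough sketch (not how isort itself works) that sorts the top-level imports of a source string into the groups discussed above. The function name classify_imports and the local_packages parameter are made up for illustration, and sys.stdlib_module_names is only available on Python 3.10+, so on older versions the standard library group stays empty.

```python
import ast
import sys

# sys.stdlib_module_names exists on Python 3.10+; fall back to an empty set.
STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def classify_imports(source, local_packages=()):
    """Sort the top-level imports of `source` into PEP 8 style groups.

    Hypothetical helper: `local_packages` lists the top-level names that
    should count as local application imports.
    """
    groups = {"future": [], "stdlib": [], "third_party": [], "local": []}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
            relative = False
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or "."]
            relative = node.level > 0  # "from . import x" style
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if relative or top in local_packages:
                groups["local"].append(name)
            elif top == "__future__":
                groups["future"].append(name)
            elif top in STDLIB:
                groups["stdlib"].append(name)
            else:
                groups["third_party"].append(name)
    return groups


example = (
    "from __future__ import absolute_import\n"
    "import os\n"
    "import sys\n"
    "import third_party.module1\n"
    "import my_lib\n"
)
groups = classify_imports(example, local_packages={"my_lib"})
print(groups)
```

The source is only parsed, never executed, so the placeholder modules third_party and my_lib do not need to exist.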
In a Python project I would like to gather imports into a single file called common_imports.py in order to reduce the number of import statements in my Python files.
Instead of writing
file1.py
import foo
import bar
import baz
[...]
file2.py
import foo
import bar
import baz
[...]
I would like to write
file1.py
from common_imports import *
[...]
file2.py
from common_imports import *
[...]
common_imports.py
import foo
import bar
import baz
However, this gives me a lot of pylint false positives.
I can disable pylint warnings in the common_imports.py file by adding a pylint disable comment. I can also disable the wildcard-import warning. Unfortunately, I can disable the unused-import warning only globally, not specifically for the names imported from common_imports.py.
Does anybody have an idea how to get pylint on track?
Summarising my comments above into a proper answer:
TL;DR:
While the reusable code motive is commendable, it's not fit for purpose here. Listen to the linter, and save your hard-earned respect among your colleagues. :-)
Pythonic Viewpoint:
Don't
Why? Python convention, in all its organisational glory and documented structure, states that if you use a library in a module, import it in the module. Plain and simple.
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
-- PEP8 - Imports
At a lower level, the import system checks the sys.modules dict, which caches every module that has been loaded, and only loads a library if it hasn't been imported already. So from an efficiency point of view, there is no gain.
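A small sketch of that caching behaviour, using the standard library's json module purely as a stand-in for "a library" (the alias json_again is made up for the demonstration):

```python
import sys

import json  # first import: the module is loaded and cached in sys.modules

cached = sys.modules["json"]

import json as json_again  # repeat import: served from the cache, not reloaded

# Both names refer to the very same module object.
print(json_again is cached)
```

So repeating the import in every module that needs it costs only a dictionary lookup after the first time.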
Maintainer's Viewpoint:
Don't
Why? If (when) the code is changed / optimised in a module, thus alleviating the need for a specific import … "remind me where I look to find where that library is imported? Oh ya, here. But this other module needs that import, but not this new library I’m using to optimise this code. Where should I import that? Ugh!!!"
You've lost the hard-earned respect of following maintainers.
I'm currently refactoring a project (formerly one big file) into several separate Python files, each of which runs a specific part of my application.
E.g., GUIthread.py runs the GUI, Computethread.py does some maths, etc.
Each thread uses functions from imported modules like math, time, numpy, etc.
I already have a file globalClasses.py containing class definitions for my datatypes etc., which each .py file imports at the start, as per the recommendation here: http://effbot.org/pyfaq/how-do-i-share-global-variables-across-modules.htm . This is working well.
What I would like to do is have all my third-party module imports in the globals file as well, so that I can write, for example, import math once but have all of my project files able to use math functions.
Questions:
1. Is this possible?
2. Is it a good idea/is it good Python practice?
My current solution is just to put
import math
import time
import numpy
...
(plus imports for all the other modules I'm using as well)
at the top of every file in my project... But that doesn't seem very tidy, and it's easy to forget to move a dependency's import statement when moving code-chunks from file to file...
Yes, I think there is a more elegant way of doing this that saves redundant lines of code. Suppose you want to import the modules math, time, and numpy. You can create a file, say importing_modules.py, and import the various modules in it as from module_name import *. The file may then look something like this:
importing_modules.py
from math import *
from numpy import *
from time import *
main.py
from importing_modules import *
# Now you can call the functions of those modules directly
print(sqrt(25))  # sqrt() can be called directly instead of math.sqrt() or importing_modules.math.sqrt()
The other answer shows how what you want is (sort of) possible, but didn't address your second question about good practice.
Using import * is almost invariably considered bad practice. See "Why is import * bad?" and "Importing * from a package" from the docs.
Remember from PEP 20 that explicit is better than implicit. With explicit, specific imports (e.g. from math import sqrt) in every module, there is never confusion about from where a name came, your module's namespace includes only what it needs, and bugs are prevented.
The downside of having to write a couple import statements per module does not outweigh the potential problems introduced by trying to get around writing them.
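To illustrate the kind of bug this prevents, here is a minimal sketch using two standard library modules that both export sqrt. The wildcard imports are run through exec into a throwaway dict (an assumption made only so the resulting namespace can be inspected); in a real module the same shadowing happens silently at the top of the file.

```python
import cmath
import math

namespace = {}

# First wildcard import: sqrt is math.sqrt.
exec("from math import *", namespace)
first_sqrt = namespace["sqrt"]

# Second wildcard import silently replaces sqrt with cmath.sqrt.
exec("from cmath import *", namespace)
second_sqrt = namespace["sqrt"]

print(first_sqrt is math.sqrt)
print(second_sqrt is cmath.sqrt)

# math.sqrt(-1) would raise ValueError; the shadowing cmath.sqrt returns 1j.
print(second_sqrt(-1))
```

Nothing in the calling code hints at which sqrt is in effect; it depends entirely on the order of the wildcard imports.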
When I'm programming in Python and I need to import multiple modules, I usually do it like this:
import random, time, matplotlib, cheese, doge
Then when I read over other people's code, this is what I see:
import random
import time
import matplotlib
import cheese
import doge
Why is this? Is there any difference between the two styles?
The practice of one import per line is standardized in PEP8, and following a common standard is reason enough to do as others do. Following a common standard follows the Principle of Least Astonishment, making it easier for people familiar with the standard to read and modify your code.
Even if you don't care about PEP8, though, one import per line makes your code more maintainable.
Imports are easier to skim/read:
It's easier to see that you are getting a fred in import fred than in import barney, betty, wilma, fred, bambam, pebbles
Imports are easier to locate:
Searching for "import fred" will find import fred and import fred, wilma, pebbles, but will not find import barney, fred
Imports are easier to edit:
Inserting and removing an entire line is fast in most editors.
There is only one module per line, so you don't have to search in the line to find the thing you wish to edit - it's at the end.
Relocating an import inside a module is just moving a whole line.
Copying one of several imports to another Python module is a copy-paste of a line,
rather than that copy-paste followed by trimming off the other imports you don't want.
Imports are easier to maintain:
Each changed module has its own line in the change-set - you don't have to read a line to figure out which module or modules changed.
Missing and added modules affect the line count on the file and in the change-set.
Typos are easier to pick out and correct on visual skim of the change-set.
One import per line would be a good idea even if it weren't the standard. Since it is the standard, it's doubly the best way to go.
As per PEP-8 (The Style Guide for Python Code)
Imports should usually be on separate lines, e.g.:
Yes: import os
     import sys
No:  import sys, os
It's okay to say this though:
from subprocess import Popen, PIPE
To answer your question - both would work fine, but one is not conformant with the PEP8 guidelines.
I don't like to follow rules blindly without a valid reason. PEP 20 (the Zen of Python) states that "Readability counts".
PEP 8's "one import per line" works as a general rule. Although I respect Guido's opinion, I wouldn't strictly follow this convention all the time.
My only exception to this rule is when there are fewer lines of code than module imports, e.g. 2 lines of code but 4 module imports.
This is more readable: (in my opinion)
import os, sys, math, time

def add_special():
    return time.time() + math.floor(math.pow(sys.api_version + os.getpid(), 2))
instead of this
import os
import sys
import math
import time

def add_special():
    return time.time() + math.floor(math.pow(sys.api_version + os.getpid(), 2))
But readability is subjective and differs from person to person.
PEP 8, the official Python style guide, recommends importing one package or module per line.
It is considered good style, and generally standardization makes programs easy to read. I don't think there are substantial differences under the hood to worry about, if that's what you're asking.
Those two examples are functionally equivalent. However, PEP 8, the official style guide for Python, has a section that discourages placing multiple imports on one line:
Imports should usually be on separate lines, e.g.:
Yes: import os
     import sys
No:  import sys, os
It's okay to say this though:
from subprocess import Popen, PIPE
Thus, many Python programmers place only one import per line in order to follow this guideline.
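If you want to convince yourself that the two styles really are equivalent, here is a quick check. The helper bound_modules is a made-up name; it runs an import statement in a fresh namespace and reports what got bound.

```python
def bound_modules(statement):
    """Run an import statement in an empty namespace and return what it bound."""
    namespace = {}
    exec(statement, namespace)
    # exec() adds __builtins__ to the namespace; it is not part of the import.
    return {name: value for name, value in namespace.items() if name != "__builtins__"}


one_line = bound_modules("import os, sys")
separate = bound_modules("import os\nimport sys")

# Same names, and each name refers to the same module object.
print(sorted(one_line) == sorted(separate))
print(all(one_line[name] is separate[name] for name in one_line))
```

The difference between the two forms is purely one of style and maintainability.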
According to PEP 8:
Imports should be grouped in the following order:
standard library imports
related third party imports
local application/library specific imports
You should put a blank line between each group of imports.
But it does not mention __future__ imports. Should __future__ imports be grouped together with standard library imports or kept separate from them?
So, which is more preferred:
from __future__ import absolute_import
import sys
import os.path

from .submod import xyz
or:
from __future__ import absolute_import

import sys
import os.path

from .submod import xyz
I personally separate them. A __future__ import isn't just binding a name like other imports, it changes the meaning of the language. With things like from __future__ import division the module will likely run fine both with and without the import, but give different (wrong) results at places that have nothing telling me to go look at names imported if I want to know more about where they come from. __future__ imports should stand out as much as possible.
Also, I generally sort imports within a group alphabetically (no particularly good reason for doing that; I just find it has some very small benefits to diffs and merging branches), and __future__ imports have to be first, so I put them in their own group.
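The "have to be first" part is enforced by the compiler, not just by convention. A minimal sketch that checks this with compile() on two source strings (the helper name compiles is made up for the example):

```python
def compiles(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


future_first = "from __future__ import absolute_import\n\nimport os\nimport sys\n"
future_late = "import os\nimport sys\n\nfrom __future__ import absolute_import\n"

print(compiles(future_first))  # the __future__ import leads the file
print(compiles(future_late))   # the __future__ import follows other imports
```

The second source is rejected at compile time, which is one more reason a __future__ import naturally ends up in a group of its own at the top.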
I am new to Python as I want to expand skills that I learned using R.
In R I tend to load a bunch of libraries, sometimes resulting in function name conflicts.
What is best practice in Python? I have seen some specific variations that I do not see a difference between:
import pandas, from pandas import *, and from pandas import DataFrame
What are the differences between the first two, and should I just import what I need?
Also, what would be the worst consequences for someone making small programs to process data and compute simple statistics?
UPDATE
I found this excellent guide. It explains everything.
Disadvantage of each form
When reading other people's code (and those people use very
different importing styles), I noticed the following problems with
each of the styles:
import modulewithaverylongname will clutter the code further down
with the long module name (e.g. concurrent.futures or django.contrib.auth.backends) and decrease readability in those places.
from module import * gives me no chance to see syntactically that,
for instance, classA and classB come from the same module and
have a lot to do with each other.
It makes reading the code hard.
(That names from such an import
may shadow names from an earlier import is the least part of that problem.)
from module import classA, classB, functionC, constantD, functionE
overloads my short-term memory with too many names
that I mentally need to assign to module in order to
coherently understand the code.
import modulewithaverylongname as mwvln is sometimes insufficiently
mnemonic to me.
A suitable compromise
Based on the above observations, I have developed the following
style in my own code:
import module is the preferred style if the module name is short
as for example most of the packages in the standard library.
It is also the preferred style if I need to use names from the module in
only two or three places in my own module;
clarity trumps brevity then ("Readability counts").
import longername as ln is the preferred style in almost every
other case.
For instance, I might import django.contrib.auth.backends as djcab.
By definition of criterion 1 above, the abbreviation will be used
frequently and is therefore sufficiently easy to memorize.
Only these two styles are fully pythonic as per the
"Explicit is better than implicit." rule.
from module import xx still occurs sometimes in my code.
I use it in cases where even the as format appears exaggerated,
the most famous example being from datetime import datetime
(but if I need more elements, I will import datetime as dt).
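As a rough illustration of that compromise, using only standard library modules (the abbreviation cf and the function square_roots are my own choices for the example, not taken from the guide):

```python
import concurrent.futures as cf  # long name, used often: abbreviate it
import datetime as dt            # more than one element needed: import the module
import math                      # short name: plain import


def square_roots(values):
    """Compute square roots in a small thread pool, just to exercise the alias."""
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(math.sqrt, values))


print(square_roots([4, 9, 16]))
print(dt.date(2020, 1, 31).isoformat())
```

Every name used further down still shows which module it came from, which is the point of preferring these two styles.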
import pandas imports the pandas module under the pandas namespace, so you would need to call objects within pandas using pandas.foo.
from pandas import * imports all objects from the pandas module into your current namespace, so you would call objects within pandas using only foo. Keep in mind this could have unexpected consequences if there are any naming conflicts between your current namespace and the pandas namespace.
from pandas import DataFrame is the same as above, but only imports DataFrame (instead of everything) into your current namespace.
In my opinion the first is generally best practice, as it keeps the different modules nicely compartmentalized in your code.
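To see the difference between the three forms, here is a small sketch that lists the names each one adds to a namespace. It uses the standard library's json module in place of pandas so that it runs without extra installs; names_bound_by is a made-up helper.

```python
import json


def names_bound_by(statement):
    """Run an import statement in an empty namespace and list the new names."""
    namespace = {}
    exec(statement, namespace)
    namespace.pop("__builtins__", None)  # added by exec(), not by the import
    return sorted(namespace)


print(names_bound_by("import json"))             # only the module name
print(names_bound_by("from json import dumps"))  # only the requested name
print(names_bound_by("from json import *"))      # everything the module exports
```

With a large library such as pandas, the third form adds a long list of names, which is where the naming conflicts come from.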
Here are some recommendations from PEP8 Style Guide.
Imports should usually be on separate lines, e.g.:
Yes: import os
     import sys
No:  import sys, os
but it is okay to write
from subprocess import Popen, PIPE
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Imports should be grouped in the following order:
standard library imports
related third party imports
local application/library specific imports
You should put a blank line between each group of imports.
Absolute imports are recommended
They are more readable and make debugging easier by giving better error messages if you misconfigure the import system.
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example
or explicit relative imports
from . import sibling
from .sibling import example
Implicit relative imports should never be used and have been removed in Python 3.
No: import sibling  # inside mypkg, relying on this to find mypkg.sibling
Wildcard imports ( from <module> import * ) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.
Some recommendations about lazy imports from python speed performance tips.
Import Statement Overhead
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
The scenario below is explained on that page (it is a Python 2 session; string.lower() was removed in Python 3):
>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
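For anyone who wants to repeat the measurement on Python 3, here is a rough equivalent. It swaps in string.capwords() because string.lower() no longer exists, and the function names are my own; the timings depend on your machine, so none are quoted here.

```python
import timeit


def with_local_import():
    import string  # executed on every call (a sys.modules lookup each time)
    return string.capwords("python")


import string


def with_module_import():
    return string.capwords("python")  # uses the module-level import


local_time = timeit.timeit(with_local_import, number=10000)
module_time = timeit.timeit(with_module_import, number=10000)

print("import inside function:", local_time)
print("import at module level:", module_time)
```

Both functions return the same result; only the cost of re-executing the import statement on each call differs.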
In general it is better to do explicit imports.
As in:
import pandas
frame = pandas.DataFrame()
Or:
from pandas import DataFrame
frame = DataFrame()
Another option in Python, when you have conflicting names, is import x as y:
from pandas import DataFrame as PDataFrame
from bears import DataFrame as BDataFrame
frame1 = PDataFrame()
frame2 = BDataFrame()
from A import B
is essentially equivalent to the following three statements
import A
B = A.B
del A
That's all there is to it.
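A quick check of that equivalence, with math and sqrt standing in for A and B (the name manual_sqrt is made up for the example). One small difference from the three-statement version is that from A import B never binds the name A in the first place, so there is nothing to delete.

```python
import math

from math import sqrt

# The "manual" version of the same binding.
manual_sqrt = math.sqrt
print(sqrt is manual_sqrt)

# In a fresh namespace, only the imported name appears, not the module name.
namespace = {}
exec("from math import sqrt", namespace)
print("sqrt" in namespace)
print("math" in namespace)
```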
They are all suitable in different contexts (which is why they are all available). There's no deep guiding principle, other than generic motherhood statements around clarity, maintainability and simplicity. Some examples from my own code:
import sys, os, re, itertools avoids name collisions and provides a very succinct way to import a bunch of standard modules.
from math import * lets me write sin(x) instead of math.sin(x) in math-heavy code. This gets a bit dicey when I also import numpy, which doubles up on some of these, but it doesn't overly concern me, since they are generally the same functions anyway. Also, I tend to follow the numpy documentation — import numpy as np — which sidesteps the issue entirely.
I favour from PIL import Image, ImageDraw just because that's the way the PIL documentation presents its examples.