Conventions for 'import ... as' - python

Typically, one uses import numpy as np to import the module numpy.
Are there general conventions for naming?
What about other modules, in particular ones from scientific computing like scipy, sympy and pylab, or submodules like scipy.sparse?

SciPy recommends import scipy as sp in its documentation, though personally I find that rather useless since it only gives you access to re-exported NumPy functionality, not anything that SciPy adds to that. I find myself doing import scipy.sparse as sp much more often, but then I use that module heavily. Other common shorthands include:
import matplotlib as mpl
import matplotlib.pyplot as plt
import networkx as nx
You might encounter more of these as you start using more libraries. There's no registry or anything for these shorthands, and you're free to invent new ones as you see fit. There's also no general convention, except that import lln as library_with_a_long_name obviously won't occur very often.
Aside from these shorthands, there's a habit among Python 2.x programmers to do things like
# Try to import the C implementation of StringIO; if that doesn't work
# (e.g. in IronPython or Jython), import the pure Python version.
# Make sure the imported module is called StringIO locally.
try:
    import cStringIO as StringIO
except ImportError:
    import StringIO
Python 3.x is putting an end to this, though, because it no longer ships separate C modules such as cStringIO and cPickle; the standard io and pickle modules use their C implementations automatically when available.
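In Python 3 the same intent needs no fallback. A minimal sketch, assuming CPython 3.x, where io.StringIO is implemented in C and pickle uses its _pickle accelerator automatically when it is available:

```python
import io
import pickle

# io.StringIO replaces both StringIO.StringIO and cStringIO.StringIO.
buf = io.StringIO()
buf.write("hello")
print(buf.getvalue())  # hello

# pickle picks up its C accelerator transparently, so there is no
# cPickle to try first.
restored = pickle.loads(pickle.dumps({"a": 1}))
print(restored)  # {'a': 1}
```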

Related

Python - Importing packages by running a script

I have a script which is importing lots of packages, including import numpy as np.
I have lots of scripts which need to import all of these packages (including some of my own). To make my life easier, I have a file called mysetup.py in my path that imports all the packages. It includes the statement "import numpy as np" inside a function called import_my_stuff.
I run "main.py", which contains the following:
from mysetup import *
import_my_stuff()
print(np.pi)
"mysetup.py":
def import_my_stuff():
    import numpy as np
    return
However, I am unable to use numpy in "main.py"; this code fails. Any suggestions as to why?
The problem you are facing is a consequence of a very important feature of Python: namespaces.
https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces
https://realpython.com/python-namespaces-scope/
Basically, when you do the numpy import inside the import_my_stuff function, you bind the name np in the function's local namespace (its scope, if you prefer). That name disappears when the function returns, so main.py never sees it.
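The scoping rule can be seen without numpy. This sketch uses the standard math module as a stand-in for numpy (an assumption made only so it runs anywhere):

```python
def import_my_stuff():
    # The import binds the name "m" in this function's local namespace only.
    import math as m
    return "m" in locals()


inside = import_my_stuff()
outside = "m" in globals()  # the module-level namespace never saw the name
print(inside, outside)  # True False
```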
To solve your issue (the way you are doing it; not the only way), simply import everything at the module top level, without a function wrapping the imports:
mysetup.py:
import numpy as np
# other modules...
main.py:
from mysetup import *
print(np.pi)
Imports inside functions are not the best idea here, because the imported names stay local to the function.
But you can just put whatever imports you need in the top-level code of mysetup.py:
import numpy as np
and then np will be available when you do from mysetup import *:
from mysetup import *
print(np.pi)
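To see the top-level version work in a single file, the sketch below builds a stand-in mysetup module in memory with types.ModuleType instead of a real mysetup.py on the path (an assumption for self-containment), again using math in place of numpy:

```python
import sys
import types

# Stand-in for mysetup.py: its top-level code imports a module under an alias.
mysetup = types.ModuleType("mysetup")
exec("import math as m", mysetup.__dict__)
sys.modules["mysetup"] = mysetup  # lets "from mysetup import ..." find it

from mysetup import *  # copies the public top-level names, including m

print(m.pi)  # 3.141592653589793
```

If mysetup.py defines __all__, the alias has to be listed there too, because from mysetup import * then copies only the names in __all__.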

how to only import module if necessary and only once

I have a class which can be plotted using matplotlib, but it can (and will) also be used without plotting it.
I would like to only import matplotlib if necessary, i.e. if the plot method is called on an instance of the class, but at the same time I would like to only import matplotlib once, if at all.
Currently what I do is:
class Cheese:
    def plot(self):
        from matplotlib import pyplot as plt
        # *plot some cheese*
...but I suspect that this may lead to importing multiple times.
I can think of lots of ways to accomplish only importing once, but they are not pretty.
What is a pretty and "pythonic" way of doing this?
I don't mean for this to be "opinion based", so let me clarify what I mean by "pretty":
using the fewest lines of code.
most readable
most efficient
least error-prone
etc.
If a module is already loaded, it won't be loaded again; you will just get a reference to it. If you don't plan to use the module at runtime and just want to satisfy the type checker, you can do the following:
# imports
# import whatever you need locally
from typing import TYPE_CHECKING
if TYPE_CHECKING:  # False at runtime
    from matplotlib import pyplot as plt
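A sketch of how such a guarded import is then used in annotations. It assumes from __future__ import annotations, so that annotations are not evaluated at runtime, and uses decimal.Decimal as a stand-in for an expensive module such as matplotlib.pyplot:

```python
from __future__ import annotations  # annotations are kept as strings, not evaluated

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # False at runtime; static type checkers treat it as True
    from decimal import Decimal  # stand-in for e.g. matplotlib.pyplot


def describe(value: Decimal) -> str:
    # Decimal is never looked up at runtime, so the guarded import above
    # costs nothing when the program actually runs.
    return f"value={value}"


print(describe(3))  # value=3
```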
Optional import in Python:
try:
    import something
    import_something = True
except ImportError:
    import something_else
    import_something = False
Conditional import in Python:
if condition:
    import something
    # something library related code
else:
    pass  # code without the library
Import used by only one function:
def foo():
    import some_library_to_use_only_inside_foo
TL;DR: Python already does this for you, for free.
Python's import machinery executes a module only once, even if it is imported multiple times, even from different files (docs); later imports just fetch the cached module from sys.modules.
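A small sketch of that caching, using the standard json module as a stand-in for matplotlib.pyplot (an assumption; any module behaves the same). The import statement inside the method runs on every call, but only the first one executes the module; later ones find it in sys.modules:

```python
import sys


class Cheese:
    def plot(self):
        # Executes json's code at most once per process; afterwards this is
        # just a sys.modules lookup plus a local name binding.
        import json
        return json


c = Cheese()
first = c.plot()
second = c.plot()
print(first is second)               # True
print(first is sys.modules["json"])  # True
```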
The most pythonic way to import something is to do so at the beginning of the file, unless you have special needs, such as importing different modules depending on some condition, e.g. the platform (Windows, Linux).
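A sketch of such a platform-dependent import, assuming the goal is low-level console access: msvcrt exists only on Windows and termios only on POSIX systems, so each branch imports whichever one the platform provides:

```python
import sys

if sys.platform == "win32":
    import msvcrt as console_io   # Windows-only standard module
else:
    import termios as console_io  # POSIX-only standard module (Linux, macOS)

print(console_io.__name__)
```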

What's the purpose of the file "pylab.py"

I looked at the file "pylab.py" in matplotlib's directory and found that it contains a large number of imports, and then defines a single variable "bytes" on the last line. Here are the last several lines of this file:
from numpy.fft import *
from numpy.random import *
from numpy.linalg import *
import numpy as np
import numpy.ma as ma
# don't let numpy's datetime hide stdlib
import datetime
# This is needed, or bytes will be numpy.random.bytes from
# "from numpy.random import *" above
bytes = six.moves.builtins.bytes
I wonder what the purpose of such a file is when it only defines a seemingly useless variable. Relatedly, what's the purpose of writing code like from matplotlib import pylab?
The matplotlib docs say:
pylab is a convenience module that bulk imports matplotlib.pyplot (for plotting) and numpy (for mathematics and working with arrays) in a single name space. Although many examples use pylab, it is no longer recommended.
So, for example, you can do:
>>> from pylab import *
and you have imported all the names imported by pylab into your local namespace. This is convenient when using the interactive shell.
Additionally, pylab imports datetime and rebinds bytes. This is because the from numpy.foo import * statements bring in numpy objects named bytes and datetime, which are not the same as the standard Python objects with these names, so they need to be overridden with the standard versions.
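The shadowing and the fix can be reproduced without numpy. This sketch registers a hypothetical fake_random module that, like numpy.random per the pylab comment, exports its own bytes:

```python
import builtins
import sys
import types

# Hypothetical stand-in for numpy.random with its own public name "bytes".
fake_random = types.ModuleType("fake_random")
fake_random.bytes = lambda n: b"\x00" * n
sys.modules["fake_random"] = fake_random

from fake_random import *  # rebinds bytes in this namespace

shadowed = bytes is not builtins.bytes
print(shadowed)  # True: the builtin is now hidden

bytes = builtins.bytes  # what pylab does via six.moves.builtins.bytes
print(bytes is builtins.bytes)  # True
```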
The practice of importing names into a module just so other modules can import them from there instead of the original module is not unusual. For example, given this module:
$ cat foo/__init__.py
from .bar import *
from .baz.quux import *
from .spam import eggs
Other modules can do from foo import eggs rather than from foo.spam import eggs. Apart from the convenience of less typing, this approach hides the internal structure of the foo package from its clients. As long as they import from the top level module they need not be concerned that the internal structure of the package may change over time. This is a form of the facade design pattern.
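The facade idea can be tried out with a throwaway package. This sketch writes a hypothetical foo package with a spam submodule to a temporary directory and puts it on sys.path; __init__.py uses an explicit relative import, which Python 3 requires for sibling modules inside a package:

```python
import importlib
import pathlib
import sys
import tempfile

# Build foo/spam.py and foo/__init__.py in a temporary directory.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "foo"
pkg.mkdir()
(pkg / "spam.py").write_text("def eggs():\n    return 'eggs from foo.spam'\n")
(pkg / "__init__.py").write_text("from .spam import eggs\n")  # the facade

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make the new directory visible to the finders

from foo import eggs  # clients import from the top level...
import foo.spam       # ...even though eggs lives in foo.spam

print(eggs())                 # eggs from foo.spam
print(eggs is foo.spam.eggs)  # True: one function, two access paths
```

Clients now depend only on foo.eggs, so spam.py could be split or renamed later as long as __init__.py keeps re-exporting eggs.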

PEP 8 and deferred import

I am working on a large Python program which makes use of a multitude of modules depending on command-line options, in particular, numpy. We have recently found a need to run this on a small embedded module which precludes the use of numpy. From our perspective, this is easy enough (just don't use the problematic command line options.)
However, following PEP 8, our import numpy is at the beginning of each module that might need it, and the program will crash due to numpy not being installed. The straightforward solution is to move import numpy from the top of the file to the functions that need it. The question is, "How bad is this"?
(An alternative solution is to wrap import numpy in a try .. except. Is this better?)
Here is a best practice pattern to check if a module is installed and make code branch depending on it.
# GOOD
import pkg_resources
try:
    pkg_resources.get_distribution('numpy')
except pkg_resources.DistributionNotFound:
    HAS_NUMPY = False
else:
    HAS_NUMPY = True
    # You can also import numpy here unless you want to import it inside the function
Do this in every module that has a soft dependency on numpy. More information is in the Plone CMS coding conventions.
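pkg_resources is part of setuptools and is deprecated in favour of importlib.metadata and importlib.resources. A minimal standard-library sketch of the same soft-dependency check uses importlib.util.find_spec, which locates a top-level module without executing it. Note that it checks the importable module name (e.g. PIL), not the distribution name (e.g. Pillow) that get_distribution expects. The mean() function is a hypothetical example:

```python
import importlib.util

# find_spec returns None if the top-level module cannot be found;
# it does not execute the module.
HAS_NUMPY = importlib.util.find_spec("numpy") is not None


def mean(values):
    """Hypothetical helper with an optional numpy fast path."""
    if HAS_NUMPY:
        import numpy as np  # deferred import, only reached when numpy exists
        return float(np.mean(values))
    return sum(values) / len(values)  # pure-Python fallback


print(mean([1, 2, 3]))  # 2.0
```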
Another idiom which I've seen is to import the module as None if unavailable:
try:
    import numpy as np
except ImportError:
    np = None
Or, as in the other answer, you can use pkg_resources.get_distribution as shown above rather than try/except (see the blog post linked from the Plone docs).
That way, you can guard each use of numpy with an if block:
if np:
    pass  # do something with numpy
else:
    pass  # do something in vanilla Python
The key is to make sure your CI tests cover both environments, with and without numpy (and if you measure coverage, this should count both blocks as covered).

Importing modules in Python - best practice

I am new to Python and want to expand the skills I learned using R.
In R I tend to load a bunch of libraries, sometimes resulting in function name conflicts.
What is best practice in Python? I have seen some specific variations and do not see the difference between
import pandas, from pandas import *, and from pandas import DataFrame
What are the differences between the first two, and should I just import what I need?
Also, what would be the worst consequences for someone making small programs to process data and compute simple statistics?
UPDATE
I found this excellent guide. It explains everything.
Disadvantages of each form
When reading other people's code (and those people use very
different importing styles), I noticed the following problems with
each of the styles:
import modulewithaverylongname will clutter the code further down
with the long module name (e.g. concurrent.futures or django.contrib.auth.backends) and decrease readability in those places.
from module import * gives me no chance to see syntactically that,
for instance, classA and classB come from the same module and
have a lot to do with each other.
It makes reading the code hard.
(That names from such an import
may shadow names from an earlier import is the least part of that problem.)
from module import classA, classB, functionC, constantD, functionE
overloads my short-term memory with too many names
that I mentally need to assign to module in order to
coherently understand the code.
import modulewithaverylongname as mwvln is sometimes insufficiently
mnemonic to me.
A suitable compromise
Based on the above observations, I have developed the following
style in my own code:
import module is the preferred style if the module name is short
as for example most of the packages in the standard library.
It is also the preferred style if I need to use names from the module in
only two or three places in my own module;
clarity trumps brevity then ("Readability counts").
import longername as ln is the preferred style in almost every
other case.
For instance, I might import django.contrib.auth.backends as djcab.
By definition of criterion 1 above, the abbreviation will be used
frequently and is therefore sufficiently easy to memorize.
Only these two styles are fully pythonic as per the
"Explicit is better than implicit." rule.
from module import xx still occurs sometimes in my code.
I use it in cases where even the as format appears exaggerated,
the most famous example being from datetime import datetime
(but if I need more elements, I will import datetime as dt).
import pandas imports the pandas module under the pandas namespace, so you would need to call objects within pandas using pandas.foo.
from pandas import * imports all public objects from the pandas module into your current namespace, so you would call objects within pandas using only foo. Keep in mind this could have unexpected consequences if there are any naming conflicts between your current namespace and the pandas namespace.
from pandas import DataFrame is the same as above, but only imports DataFrame (instead of everything) into your current namespace.
In my opinion the first is generally best practice, as it keeps the different modules nicely compartmentalized in your code.
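The naming-conflict risk is easy to reproduce with two standard modules that both define sqrt. A short sketch:

```python
import cmath
import math

from math import *   # binds sqrt = math.sqrt (among many other names)
from cmath import *  # silently rebinds sqrt = cmath.sqrt

print(sqrt is cmath.sqrt)  # True
print(sqrt(4))             # (2+0j)

# Module-qualified names keep each origin visible:
print(math.sqrt(4), cmath.sqrt(4))  # 2.0 (2+0j)
```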
Here are some recommendations from the PEP 8 style guide.
Imports should usually be on separate lines, e.g.:
Yes: import os
     import sys
No:  import sys, os
but it is okay to say:
from subprocess import Popen, PIPE
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Imports should be grouped in the following order:
standard library imports
related third party imports
local application/library specific imports
You should put a blank line between each group of imports.
Absolute imports are recommended.
They are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured:
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example
Explicit relative imports are an acceptable alternative:
from . import sibling
from .sibling import example
Implicit relative imports should never be used and have been removed in Python 3. For example, inside mypkg, Python 2 would resolve the following to mypkg.sibling:
No: import sibling
Wildcard imports ( from <module> import * ) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.
Some recommendations about lazy imports from the Python Speed/Performance Tips page on the Python wiki:
Import Statement Overhead
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
The page gives the following scenario (Python 2 code; string.lower() no longer exists in Python 3):
>>> def doit1():
... import string
... string.lower('Python')
...
>>> import string
>>> def doit2():
... string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
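A sketch of the same comparison for Python 3, using string.capwords() as a stand-in for any module-level function. The absolute numbers depend on the machine and interpreter version, so none are shown:

```python
import string
import timeit


def doit1():
    import string  # the import statement runs on every call (a sys.modules lookup)
    return string.capwords("python speed")


def doit2():
    return string.capwords("python speed")  # uses the module-level name


t1 = timeit.timeit(doit1, number=100_000)
t2 = timeit.timeit(doit2, number=100_000)
print(f"import inside function: {t1:.3f}s, module-level import: {t2:.3f}s")
```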
In general, it is better to use explicit imports.
As in:
import pandas
frame = pandas.DataFrame()
Or:
from pandas import DataFrame
frame = DataFrame()
Another option in Python, when you have conflicting names, is import x as y:
from pandas import DataFrame as PDataFrame
from bears import DataFrame as BDataFrame
frame1 = PDataFrame()
frame2 = BDataFrame()
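A runnable variant of the same idea with two standard-library modules, both of which define a function called loads:

```python
import pickle
from json import loads as json_loads
from pickle import loads as pickle_loads

print(json_loads('{"n": 1}'))              # {'n': 1}
print(pickle_loads(pickle.dumps([1, 2])))  # [1, 2]
```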
from A import B
is essentially equivalent to the following three statements:
import A
B = A.B
del A
That's all there is to it.
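A quick check of that equivalence with collections.OrderedDict. One caveat the shorthand glosses over: when B is a submodule of package A that has not been imported yet, from A import B imports it, while the three-statement version can raise AttributeError on A.B.

```python
from collections import OrderedDict as via_from

# The three-statement spelling from the answer:
import collections
via_manual = collections.OrderedDict
del collections  # removes only the name; the module stays in sys.modules

print(via_from is via_manual)      # True
print("collections" in globals())  # False
```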
They are all suitable in different contexts (which is why they are all available). There's no deep guiding principle, other than generic motherhood statements around clarity, maintainability and simplicity. Some examples from my own code:
import sys, os, re, itertools avoids name collisions and provides a very succinct way to import a bunch of standard modules.
from math import * lets me write sin(x) instead of math.sin(x) in math-heavy code. This gets a bit dicey when I also import numpy, which doubles up on some of these, but it doesn't overly concern me, since they are generally the same functions anyway. Also, I tend to follow the numpy documentation — import numpy as np — which sidesteps the issue entirely.
I favour from PIL import Image, ImageDraw just because that's the way the PIL documentation presents its examples.
