I want to create a module that I will share with others, but I am quite new to this and am having issues with the final step of tidying it up for others' use. Imagine it is called something like my_module.py and looks like this:
import pandas as pd
def function_1(a, b):
    return a * b

def function_2(c, d):
    return pd.DataFrame(data=c, columns=d)
I want this to be able to be imported by someone else, so that they can use the underlying functions like:
my_module.function_1(a=5,b=2)
and so on. However, if I do import my_module, then my_module.pd also appears in the autocomplete (that is, the pandas import that my_module.py made).
This seems like terrible practice to me. So, what is the correct way to load these imports?
Ideally, this would be shareable so that someone could install it the way they would install a stats module. I'm fine if the solution is just some kind of check that makes sure things are imported in a certain way.
There's nothing inherently wrong with what you are doing. Your module requires pandas, so you must import it, and PEP 8 specifies that imports should go at the top of the file, not nested within the functions. Doing this adds pandas as an attribute of my_module once my_module is imported. Because you are building upon pandas, you cannot just share your module on its own: you also need to share pandas, or check that your users already have a sufficient version of pandas installed.
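If you do package it up for sharing, a minimal setuptools sketch can declare that dependency so that pip installs pandas for your users (the package name and version bound below are just placeholders):
# setup.py: minimal packaging sketch; name and version bound are illustrative
from setuptools import setup

setup(
    name='my_module',
    version='0.1.0',
    py_modules=['my_module'],          # a single-file module
    install_requires=['pandas>=1.0'],  # pip will install pandas if it is missing
)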
Still, it might be overkill to import the entire pandas library when you have a single function that only uses the DataFrame class. In that case you can do:
from pandas import DataFrame
def function_1(a, b):
    return a * b

def function_2(c, d):
    return DataFrame(data=c, columns=d)
Now my_module will only have the .DataFrame class attached to it, not the entire pandas library. If you do wind up using more and more of the pandas library in your module, then importing separate parts is more of a nuisance, so just import pandas.
And to use pandas itself as an example: it is built upon numpy. Underlying every DataFrame is a numpy.ndarray, so even if you never noticed it, numpy is there (older pandas versions even exposed it directly as pd.np):
import pandas as pd
pd.np?
Type: module
String form: <module 'numpy' from 'c:\\program files\\python36\\lib\\site-packages\\numpy\\__init__.py'>
File: c:\program files\python36\lib\site-packages\numpy\__init__.py
Docstring:
NumPy
=====
You can make the pandas attribute much harder to reach, but you will need to reorganize how you distribute your library. Let's say you want to share a library called MyLibrary, composed of several modules that we put in a modules folder. Each module has its own functions (with names that should not overlap), and we import them in a separate Python script, api.py. The layout would be:
MyLibrary/
    __init__.py
    modules/
        MyModule1.py
        api.py
where we have the files:
__init__.py
from MyLibrary.modules.api import *
api.py
from MyLibrary.modules.MyModule1 import function_1, function_2
MyModule1.py
import pandas as pd
def function_1(a, b):
    return a * b

def function_2(c, d):
    return pd.DataFrame(data=c, columns=d)
Now we have access to the functions, but pd is no longer there:
import MyLibrary
MyLibrary.function_2([1], ['a'])
# a
#0 1
MyLibrary.pd
#AttributeError: module 'MyLibrary' has no attribute 'pd'
To be fair, pd is still there, it's just hidden away much further down, at MyLibrary.modules.MyModule1.pd. But then again, pandas has numpy everywhere: it's in pd.core.reshape.concat.np, pd.core.reshape.merge.np, pd.core.common.np and really almost every file; you cannot avoid it.
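To illustrate, with the layout above the nested attribute is still reachable if someone goes looking for it:
import MyLibrary

# not on the top-level package any more, but still there on the submodule
MyLibrary.modules.MyModule1.pd.DataFrame(data=[1], columns=['a'])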
Related
I have a module where a number of different functions use random numbers or random choices.
I am trying to use mock and patch to inject pre-chosen values in place of these random selections, but I can't understand an error I am receiving.
In the function I am testing, I use
np.random.randint
when I use the code
from unittest import mock
import random
mocked_random_int = lambda : 7
with mock.patch('np.random.randint', mocked_random_int):
I get the error message "No module named np". However, numpy is imported as np, and other functions are calling it just fine.
Even more perplexing: if I edit the code above to remove the 'np' at the front, it does what I want:
with mock.patch('random.randint', mocked_random_int):
But I want to understand why the code works without the np. Thank you!
There is a difference between a module or package name and the variable it is assigned to in any given namespace. A simple import
import numpy
tells Python to check its list of already-imported modules, import numpy if necessary, and bind the module to the variable numpy.
import numpy as np
is almost the same, except that the module is bound to the variable np. It's still the same numpy package; you've just aliased it differently.
mock.patch will import and patch the module regardless of whether you've already imported it, but you need to give it the real module name, not your own module's alias for it.
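A minimal sketch of what that looks like in practice (the roll function here is a made-up stand-in for the code under test):
from unittest import mock
import numpy as np

def roll():
    # code under test: uses numpy through the local alias np
    return np.random.randint(1, 10)

# patch by the real module path 'numpy.random.randint', not the alias 'np'
with mock.patch('numpy.random.randint', lambda *args, **kwargs: 7):
    print(roll())  # prints 7, the mocked value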
I'm making a Python package to run analyses with pandas, and I use pandas objects in most files in the package. How should I handle those imports so the dependencies are usable inside the package but don't clutter the namespace for a user? Say I have this directory structure:
MyThing/
    MyThing/
        __init__.py
        apis.py
        MyClass.py
where MyClass.py provides a class I will instantiate to process data in memory and apis.py has interfaces to local and remote databases. As a demonstration, say __init__.py contains
from MyThing.MyClass import MyClass
from MyThing.apis import DBInterface
the contents of MyClass.py are
class MyClass:
    def __init__(self):
        pass
and apis.py is
import pandas as pd
class DBInterface:
    def __init__(self):
        pass
With complete code I expect the use case to look something like this
import MyThing as mt
# get some data
interface = mt.DBInterface()
some_data = interface.query(parameters)
# load it into MyThing
instance = mt.MyClass(some_data)
# add new data from another source
instance.read(filename)
# make some fancy products
instance.magic(parameters)
# update the database
interface.update_db(instance)
The concern I have is that dir(mt.apis) shows everything I've imported, meaning I can do things like make a pandas DataFrame with df = mt.apis.pd.DataFrame(). Is this how it's supposed to work? Should I be using import differently so the namespace isn't cluttered with dependencies? Should I design the package differently so the dependencies aren't available when I import MyThing?
What you are doing is fine and is how it's supposed to work; I wouldn't advise trying hard to hide your pandas import.
The solution to this df = mt.apis.pd.DataFrame() is: don't do that.
If there is a function or variable within MyThing.apis that you don't want others to use, you can prefix it with a single underscore (e.g. _foo). By convention this is understood to be for "internal use", and such names are not imported when you do from MyThing.apis import *. See the naming-conventions section of the PEP 8 style guide for more information.
If you'd like to be more explicit about what your module exports, you can define __all__ = ['foo', 'bar']. This also means that if someone does from MyThing.apis import * (which is generally ill-advised anyway), they will only get foo and bar; but again, treat this as a mere suggestion, just like the leading-underscore convention.
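A small sketch of how both conventions might look together in apis.py (the names here are placeholders, not part of the real package):
# apis.py: illustrative only
import pandas as _pd            # underscore alias: marked as internal use

__all__ = ['DBInterface']       # the only name exported by a star-import

class DBInterface:
    def query(self, parameters):
        # placeholder body that just returns an empty DataFrame
        return _pd.DataFrame()

def _build_url(host):           # leading underscore: not part of the public API
    return 'db://' + host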
Suppose I have code like this in a module a.py:
import numpy as np
def sqrt(x):
    return np.sqrt(x)
And I have a module b.py written like this:
import a
print(a.sqrt(25))
print(a.np.sqrt(25))
The code runs fine, and when using autocomplete in most IDEs I find that a.np is accessible. I want to make a.np private, so that only code inside a can see that variable.
I don't want b to be able to access a.np.
What is a good approach to make this possible?
Why do I want a.np to be inaccessible? Because I don't want it to show up in the autocomplete when I type a. and press Tab in Jupyter Lab. With so many imports in my module, it obscures what the module itself can do.
The solution is the same as for "protected" attributes and methods in a class (names defined in a module are, at runtime, attributes of the module object): prefix those names with a single leading underscore, i.e.
import numpy as _np
def sqrt(x):
    return _np.sqrt(x)
Note that this will NOT prevent someone from using a._np.sqrt(x), but at least it makes it quite clear that they are using a protected attribute.
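A quick check of what this changes in practice (a sketch, using the a.py above):
# b.py: checking what a.py exposes after the rename
import a

print(a.sqrt(25))        # 5.0, the public function works as before
print('_np' in dir(a))   # True: still reachable, but flagged as internal
# a._np.sqrt(25) also still works; the underscore is a convention, not a wall,
# but Jupyter's tab completion skips underscore-prefixed names unless you
# type the underscore first, which is what the question is after.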
I see two approaches here:
The more user-friendly solution: change the alias names to underscored ones:
import numpy as _np
...
This will not prevent anyone from importing it, but it tells users that these are implementation details and that they should not depend on them.
The solution I prefer: do nothing and leave it as it is; use semver and bump versions accordingly.
I'm a beginner to python, and have written a module that looks something like this:
# filename: mymodule.py
import numpy as np
from datetime import datetime
def a():
    ...<stuff>...

def b():
    ...<stuff>...
The consensus in this thread (and in agreement with PEP 8) seems to be that import statements should go at the top of the file. However, now when I import mymodule, running dir(mymodule) shows that the objects np and datetime are part of mymodule, which, offhand, seems inefficient and "sloppy". It seems one way to keep only the classes and defs would be some kind of conditional deletion via dynamic iteration over globals() (which, after trying and failing for a bit, seems really elusive), or just using the del keyword on everything.
The main question: can I do this, and can I do it dynamically rather than explicitly? Don't the defs work independently, regardless of whether the header imports remain attached to the module? Otherwise from <x> import <y> would break every time, I would think.
This question was marked as a duplicate. However, the duplicate question deals with whole modules, while my question asks how to import parts of a module.
I know that I can import certain parts of a module by using from mymodule import myfunction. But what if I want to import several things whose names I only have as a list of strings? For example:
import_things = ['thing1', 'thing2', 'thing3']
from mymodule import import_things
I did ask this question, but it looks like I need to use the trick (if there is one) above for my code as well.
Any help is appreciated.
import importlib
base_module = importlib.import_module('mymodule')
imported_things = {thing: getattr(base_module, thing) for thing in import_things}
imported_things['thing1']() # Function call
If you want the imported things to be usable as global names, then do:
globals().update(imported_things)
thing1() # function call
To remove the imports, you can do
del thing1
or
del globals()['thing1']
Another useful operation is reloading a module after its source has changed:
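For example, reusing the base_module object from above:
importlib.reload(base_module)  # re-executes mymodule and returns the reloaded module object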