Making util file not accessible in python - python

I am building a python library. The functions I want available for users are in stemmer.py. Stemmer.py uses stemmerutil.py
I was wondering whether there is a way to make stemmerutil.py not accessible to users.

If you want to hide implementation details from your users, there are two routes that you can go. The first uses conventions to signal what is and isn't part of the public API, and the other is a hack.
The convention for declaring an API within a python library is to add all classes/functions/names that should be exposed into an __all__-list of the topmost __init__.py. It doesn't do that many useful things, its main purpose nowadays is a symbolic "please use this and nothing else". Yours would probably look somewhat like this:
urdu/urdu/__init__.py
from urdu.stemmer import Foo, Bar, Baz
__all__ = [Foo, Bar, Baz]
To emphasize the point, you can also give all definitions within stemmerUtil.py an underscore before their name, e.g. def privateFunc(): ... becomes def _privateFunc(): ...
But you can also just hide the code from the interpreter by making it a resource instead of a module within the package and loading it dynamically. This is a hack, and probably a bad idea, but it is technically possible.
First, you rename stemmerUtil.py to just stemmerUtil - now it is no longer a python module and can't be imported with the import keyword. Next, update this line in stemmer.py
import stemmerUtil
with
import importlib.util
import importlib.resources
# in python3.7 and lower, this is importlib_resources and needs to be installed first
stemmer_util_spec = importlib.util.spec_from_loader("stemmerUtil", loader=None)
stemmerUtil = importlib.util.module_from_spec(stemmer_util_spec)
with importlib.resources.path("urdu", "stemmerUtil") as stemmer_util_path:
with open(stemmer_util_path) as stemmer_util_file:
stemmer_util_code = stemmer_util_file.read()
exec(stemmer_util_code, stemmerUtil.__dict__)
After running this code, you can use the stemmerUtil module as if you had imported it, but it is invisible to anyone who installed your package - unless they run this exact code as well.
But as I said, if you just want to communicate to your users which part of your package is the public API, the first solution is vastly preferable.

Related

Make imported modules private to other modules

Suppose I have a code like this in module a.py
import numpy as np
def sqrt(x):
return np.sqrt(x)
And I have a module b.py written like this:
import a
print(a.sqrt(25))
print(a.np.sqrt(25))
I will see that the code runs fine and that when using autocomplete in most IDEs, I found that a.np is accessible. I want to make a.np private so that only a code can see that variable.
I don't want b to be able to access a.np.
What is a good approach to make this possible?
Why do I want a.np to be inaccessible? Because I want it to not show in the autocomplete when I type a. and press Tab in Jupyter Lab. It hides what the modules can do because there are so many imports that I use in my module.
The solution is the same as for "protected" attributes / methods in a class (names defined in a module are actually - at runtime - attributes of the module object): prefix those names with a single leading underscore, ie
import numpy as _np
def sqrt(x):
return _np.sqrt(x)
Note that this will NOT prevent someone to use a._np.sqrt(x), but at least it makes it quite clear that he is using a protected attribute.
I see 2 approaches here:
more user-friendly solution: change alias names to "underscored" ones
import numpy as _np
...
this will not prevent from importing it, but it will say to user that this are implementation details and one should not depend on them.
preferred-by-me solution: do nothing, leave it as it is, use semver and bump versions accordingly.

Private functions in python

Is it possible to avoid importing a file with from file import function?
Someone told me i would need to put an underscore as prefix, like: _function, but isn't working.
I'm using Python 2.6 because of a legacy code.
There are ways you can prevent the import, but they're generally hacks and you want to avoid them. The normal method is to just use the underscore:
def _function():
pass
Then, when you import,
from my_module import *
You'll notice that _function is not imported because it begins with an underscore. However, you can always do this:
# In Python culture, this is considered rude
from my_module import _function
But you're not supposed to do that. Just don't do that, and you'll be fine. Python's attitude is that we're all adults. There are a lot of other things you're not supposed to do which are far worse, like
import my_module
# Remove the definition for _function!
del my_module._function
There is no privacy in Python. There are only conventions governing what external code should consider publicly accessible and usable.
Importing a module for the first time, triggers the creation of a module object and the execution of all top-level code in the module. The module object contains the global namespace with the result of that code having run.
Because Python is dynamic you can always introspect the module namespace; you can see all names defined, all objects those names reference, and you can access and alter everything. It doesn't matter here if those names start with underscores or not.
So the only reason you use a leading _ underscore for a name, is to document that the name is internal to the implementation of the module, and that external code should not rely on that name existing in a future version. The from module import * syntax will ignore such names for that reason alone. But you can't prevent a determined programmer from accessing such a name anyway. They simply do so at their own risk, it is not your responsibility to keep them from that access.
If you have functions or other objects that are only needed to initialise the module, you are of course free to delete those names at the end.

Circular & nested imports in python

I'm having some real headaches right now trying to figure out how to import stuff properly. I had my application structured like so:
main.py
util_functions.py
widgets/
- __init__.py
- chooser.py
- controller.py
I would always run my applications from the root directory, so most of my imports would be something like this
from util_functions import *
from widgets.chooser import *
from widgets.controller import *
# ...
And my widgets/__init__.py was setup like this:
from widgets.chooser import Chooser
from widgets.controller import MainPanel, Switch, Lever
__all__ = [
'Chooser', 'MainPanel', 'Switch', 'Lever',
]
It was working all fine, except that widgets/controller.py was getting kind of lengthy, and I wanted it to split it up into multiple files:
main.py
util_functions.py
widgets/
- __init__.py
- chooser.py
- controller/
- __init__.py
- mainpanel.py
- switch.py
- lever.py
One of issues is that the Switch and Lever classes have static members where each class needs to access the other one. Using imports with the from ___ import ___ syntax that created circular imports. So when I tried to run my re-factored application, everything broke at the imports.
My question is this: How can I fix my imports so I can have this nice project structure? I cannot remove the static dependencies of Switch and Lever on each other.
This is covered in the official Python FAQ under How can I have modules that mutually import each other.
As the FAQ makes clear, there's no silvery bullet that magically fixes the problem. The options described in the FAQ (with a little more detail than is in the FAQ) are:
Never put anything at the top level except classes, functions, and variables initialized with constants or builtins, never from spam import anything, and then the circular import problems usually don't arise. Clean and simple, but there are cases where you can't follow those rules.
Refactor the modules to move the imports into the middle of the module, where each module defines the things that need to be exported before importing the other module. This can means splitting classes into two parts, an "interface" class that can go above the line, and an "implementation" subclass that goes below the line.
Refactor the modules in a similar way, but move the "export" code (with the "interface" classes) into a separate module, instead of moving them above the imports. Then each implementation module can import all of the interface modules. This has the same effect as the previous one, with the advantage that your code is idiomatic, and more readable by both humans and automated tools that expect imports at the top of a module, but the disadvantage that you have more modules.
As the FAQ notes, "These solutions are not mutually exclusive." In particular, you can try to move as much top-level code as possible into function bodies, replace as many from spam import … statements with import spam as is reasonable… and then, if you still have circular dependencies, resolve them by refactoring into import-free export code above the line or in a separate module.
With the generalities out of the way, let's look at your specific problem.
Your switch.Switch and lever.Lever classes have "static members where each class needs to access the other one". I assume by this you mean they have class attributes that are initialized using class attributes or class or static methods from the other class?
Following the first solution, you could change things so that these values are initialized after import time. Let's assume your code looked like this:
class Lever:
switch_stuff = Switch.do_stuff()
# ...
You could change that to:
class Lever:
#classmethod
def init_class(cls):
cls.switch_stuff = Switch.do_stuff()
Now, in the __init__.py, right after this:
from lever import Lever
from switch import Switch
… you add:
Lever.init_class()
Switch.init_class()
That's the trick: you're resolving the ambiguous initialization order by making the initialization explicit, and picking an explicit order.
Alternatively, following the second or third solution, you could split Lever up into Lever and LeverImpl. Then you do this (whether as separate lever.py and leverimpl.py files, or as one file with the imports in the middle):
class Lever:
#classmethod
def get_switch_stuff(cls):
return cls.switch_stuff
from switch import Swift
class LeverImpl(Lever):
switch_stuff = Switch.do_stuff()
Now you don't need any kind of init_class method. Of course you do need to change the attribute to a method—but if you don't like that, with a bit of work, you can always change it into a "class #property" (either by writing a custom descriptor, or by using #property in a metaclass).
Note that you don't actually need to fix both classes to resolve the circularity, just one. In theory, it's cleaner to fix both, but in practice, if the fixes are ugly, it may be better to just fix the one that's less ugly to fix and leave the dependency in the opposite direction alone.

python unittest faulty module

I have a module that I need to test in python.
I'm using the unittest framework but I ran into a problem.
The module has some method definitions, one of which is used when it's imported (readConfiguration) like so:
.
.
.
def readConfiguration(file = "default.xml"):
# do some reading from xml
readConfiguration()
This is a problem because when I try to import the module it also tries to run the "readConfiguration" method which fails the module and the program (a configuration file does not exist in the test environment).
I'd like to be able to test the module independent of any configuration files.
I didn't write the module and it cannot be re-factored.
I know I can include a dummy configuration file but I'm looking for a "cleaner", more elegant, solution.
As commenters have already pointed out, imports should never have side effects, so try to get the module changed if at all possible.
If you really, absolutely, cannot do this, there might be another way: let readConfiguration() be called, but stub out its dependencies. For instance, if it uses the builtin open() function, you could mock that, as demonstrated in the mock documentation:
>>> mock = MagicMock(return_value=sentinel.file_handle)
>>> with patch('builtins.open', mock):
... import the_broken_module
... # do your testing here
Replace sentinel.file_handle with StringIO("<contents of mock config file>") if you need to supply actual content.
It's brittle as it depends on the implementation of readConfiguration(), but if there really is no other way, it might be useful as a last resort.

Python includes, module scope issue

I'm working on my first significant Python project and I'm having trouble with scope issues and executing code in included files. Previously my experience is with PHP.
What I would like to do is have one single file that sets up a number of configuration variables, which would then be used throughout the code. Also, I want to make certain functions and classes available globally. For example, the main file would include a single other file, and that file would load a bunch of commonly used functions (each in its own file) and a configuration file. Within those loaded files, I also want to be able to access the functions and configuration variables. What I don't want to do, is to have to put the entire routine at the beginning of each (included) file to include all of the rest. Also, these included files are in various sub-directories, which is making it much harder to import them (especially if I have to re-import in every single file).
Anyway I'm looking for general advice on the best way to structure the code to achieve what I want.
Thanks!
In python, it is a common practice to have a bunch of modules that implement various functions and then have one single module that is the point-of-access to all the functions. This is basically the facade pattern.
An example: say you're writing a package foo, which includes the bar, baz, and moo modules.
~/project/foo
~/project/foo/__init__.py
~/project/foo/bar.py
~/project/foo/baz.py
~/project/foo/moo.py
~/project/foo/config.py
What you would usually do is write __init__.py like this:
from foo.bar import func1, func2
from foo.baz import func3, constant1
from foo.moo import func1 as moofunc1
from foo.config import *
Now, when you want to use the functions you just do
import foo
foo.func1()
print foo.constant1
# assuming config defines a config1 variable
print foo.config1
If you wanted, you could arrange your code so that you only need to write
import foo
At the top of every module, and then access everything through foo (which you should probably name "globals" or something to that effect). If you don't like namespaces, you could even do
from foo import *
and have everything as global, but this is really not recommended. Remember: namespaces are one honking great idea!
This is a two-step process:
In your module globals.py import the items from wherever.
In all of your other modules, do "from globals import *"
This brings all of those names into the current module's namespace.
Now, having told you how to do this, let me suggest that you don't. First of all, you are loading up the local namespace with a bunch of "magically defined" entities. This violates precept 2 of the Zen of Python, "Explicit is better than implicit." Instead of "from foo import *", try using "import foo" and then saying "foo.some_value". If you want to use the shorter names, use "from foo import mumble, snort". Either of these methods directly exposes the actual use of the module foo.py. Using the globals.py method is just a little too magic. The primary exception to this is in an __init__.py where you are hiding some internal aspects of a package.
Globals are also semi-evil in that it can be very difficult to figure out who is modifying (or corrupting) them. If you have well-defined routines for getting/setting globals, then debugging them can be much simpler.
I know that PHP has this "everything is one, big, happy namespace" concept, but it's really just an artifact of poor language design.
As far as I know program-wide global variables/functions/classes/etc. does not exist in Python, everything is "confined" in some module (namespace). So if you want some functions or classes to be used in many parts of your code one solution is creating some modules like: "globFunCl" (defining/importing from elsewhere everything you want to be "global") and "config" (containing configuration variables) and importing those everywhere you need them. If you don't like idea of using nested namespaces you can use:
from globFunCl import *
This way you'll "hide" namespaces (making names look like "globals").
I'm not sure what you mean by not wanting to "put the entire routine at the beginning of each (included) file to include all of the rest", I'm afraid you can't really escape from this. Check out the Python Packages though, they should make it easier for you.
This depends a bit on how you want to package things up. You can either think in terms of files or modules. The latter is "more pythonic", and enables you to decide exactly which items (and they can be anything with a name: classes, functions, variables, etc.) you want to make visible.
The basic rule is that for any file or module you import, anything directly in its namespace can be accessed. So if myfile.py contains definitions def myfun(...): and class myclass(...) as well as myvar = ... then you can access them from another file by
import myfile
y = myfile.myfun(...)
x = myfile.myvar
or
from myfile import myfun, myvar, myclass
Crucially, anything at the top level of myfile is accessible, including imports. So if myfile contains from foo import bar, then myfile.bar is also available.

Categories