How to make variable(s) global across different modules in Python?

I have a small project where I need to define and use quite a large number of variables.
Obviously, I can make a configuration file where I set all of their values. I have made a plain Python file where I assign values:
value_a = 'something'
value_b = 'something'
value_c = 5.0
and I call the file conf.py. When I do from conf import *, all the variables are initialized with their values.
Nevertheless, the project contains several modules, each with its own functions, and I want all the values from conf.py to be known in every function of every module.
Obviously, I can do from conf import * in every module and/or import conf in every function, but is that the best way to initialize the variables?

Using a module as you describe is a viable way to set up configuration values for a script, but there are a few reasons you might be better off with something else.
A few others in the comments have pointed out that import * is frowned upon because it clutters the root namespace with lots of variable names, making it much easier to run into accidental name conflicts. Keeping the values under the module name (e.g. conf.varname) helps from an organizational standpoint, both for keeping track of names and for preventing conflicts.
If you plan to distribute code that requires configuration, using a .py module opens your code up to arbitrary execution of anything that gets typed into that file. This is where formats like .ini, .json, and .cfg are very useful. As an added bonus, using a common format (like JSON) makes the configuration easy to port to other languages, in case a colleague uses a different language but needs to work on the same project. Off the top of my head, Python's standard library includes parsers for .xml, .json, and .ini files.
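For example, a minimal sketch of reading settings from a JSON file (the file name and keys here are just illustrative):

import json

# Parse plain data instead of executing code; a malformed file raises
# an error instead of running arbitrary statements.
with open("conf.json") as conf_file:
    conf = json.load(conf_file)

print(conf["value_a"])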

In my opinion, a solution I would implement for this is to create a conf.py, as you said, and define the global variables in it as dictionary structures, so they are properly organized and easy to use in the modules that import them.
For example:
globals = {
    'value_a': 'something a',
    'value_b': 'something b',
    'value_c': '5.0',
    'allowed_platforms': {
        'windows': 'value 1',
        'os x': 'value 2',
        'linux': 'value 3',
    },
    'hosts': ['host a', 'host b', 'host c'],
    ...
}
You should avoid the from some_module import * statement, because it can put a lot of names into the namespace and it is not explicit about what is being imported. By doing from your_package.conf import globals at the top of each module, you can use the values without explicitly importing every single variable and without importing the entire module. I prefer that solution. It could be even better if you use a JSON file to store the info for those global variables and then read and deserialize it in the conf.py module before conf is imported by the modules that need it.
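A rough sketch of that last idea (assuming a settings.json file shipped next to conf.py; both file names are illustrative):

# conf.py
import json
import os

# Read the values once at import time; other modules then just do
# `from your_package.conf import globals`.
# (Note: this name shadows the built-in globals() function inside conf.py.)
with open(os.path.join(os.path.dirname(__file__), 'settings.json')) as f:
    globals = json.load(f)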

I generally agree with @Aaron. What he outlined is very general / portable and safe.
Since import * is an antipattern, you could easily do import config and then reference its values like config.varname.
I think it's fine to use .py files when needed. Aaron's point is good, but as long as the config is controlled by the person running the app, there's no security issue. The main reason to allow .py files is when some of the config items need to be derived from other config items, or looked up / loaded at run time. If there's no need for that (config is 100% flat and static) then .json or another flat file approach as Aaron mentioned would be best.
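For instance, a hypothetical config.py where some values are derived from others, which a flat .json or .ini file cannot express:

# config.py
import os

# base value
base_dir = os.environ.get("APP_HOME", "/opt/myapp")

# derived values -- the reason a .py config can be worth it
data_dir = os.path.join(base_dir, "data")
log_file = os.path.join(base_dir, "logs", "app.log")
workers = max(2, os.cpu_count() or 2)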

Related

Making util file not accessible in python

I am building a Python library. The functions I want available for users are in stemmer.py. Stemmer.py uses stemmerutil.py.
I was wondering whether there is a way to make stemmerutil.py not accessible to users.
If you want to hide implementation details from your users, there are two routes that you can go. The first uses conventions to signal what is and isn't part of the public API, and the other is a hack.
The convention for declaring an API within a Python library is to add all classes/functions/names that should be exposed to an __all__ list in the topmost __init__.py. It doesn't do many useful things on its own; its main purpose nowadays is as a symbolic "please use this and nothing else". Yours would probably look somewhat like this:
urdu/urdu/__init__.py
from urdu.stemmer import Foo, Bar, Baz
__all__ = ["Foo", "Bar", "Baz"]  # __all__ entries are the names as strings
To emphasize the point, you can also give all definitions within stemmerUtil.py an underscore before their name, e.g. def privateFunc(): ... becomes def _privateFunc(): ...
But you can also just hide the code from the interpreter by making it a resource instead of a module within the package and loading it dynamically. This is a hack, and probably a bad idea, but it is technically possible.
First, you rename stemmerUtil.py to just stemmerUtil - now it is no longer a Python module and can't be imported with the import keyword. Next, replace this line in stemmer.py:
import stemmerUtil
with
import importlib.util
import importlib.resources
# in Python 3.6 and lower, importlib.resources doesn't exist yet;
# install and use the importlib_resources backport instead

# Build an empty module object, then execute the resource's source code into it
stemmer_util_spec = importlib.util.spec_from_loader("stemmerUtil", loader=None)
stemmerUtil = importlib.util.module_from_spec(stemmer_util_spec)
with importlib.resources.path("urdu", "stemmerUtil") as stemmer_util_path:
    with open(stemmer_util_path) as stemmer_util_file:
        stemmer_util_code = stemmer_util_file.read()
exec(stemmer_util_code, stemmerUtil.__dict__)
After running this code, you can use the stemmerUtil module as if you had imported it, but it is invisible to anyone who installed your package - unless they run this exact code as well.
But as I said, if you just want to communicate to your users which part of your package is the public API, the first solution is vastly preferable.

Best way to import several classes

I have defined several classes in a single Python file. My wish is to create a library with these. I would ideally like to import the library in such a way that I can use the classes without a prefix (i.e. just myclass() as opposed to mylibrary.myclass()), if "prefix" is what you can call it; I am not entirely sure, as I am a beginner.
What is the proper way to achieve this, or otherwise the best result? Define all the classes in __init__.py? Define them all in a single file as I currently have, like AllMyClasses.py? Or should I have a separate file for every class in the library directory, like FirstClass.py, SecondClass.py, etc.?
I realize this is a question that should be easy enough to google, but since I am still quite new to Python and programming in general, I haven't quite figured out what the correct keywords are for a problem in this context (such as my uncertainty about "prefix").
More information can be found in the tutorial on modules (single files) and packages (a directory with an __init__.py file) on the Python site.
The suggested way (according to the style guide) is to spell out each class import specifically.
from my_module import MyClass1, MyClass2
object1 = MyClass1()
object2 = MyClass2()
While you can also shorten the module name:
import my_module as mo
object = mo.MyClass1()
Using from my_module import * is best avoided, as it can be confusing (even though it is the conventional way for a few packages, like tkinter).
If it's for your personal use, you can just put all your classes Class1, Class2, ... in a myFile.py and to use them call import myFile (without the .py extension)
import myFile
myVar1 = myFile.Class1()
myVar2 = myFile.Class2()
from within another script. If you want to be able to use the classes without the file name prefix, import the file like this:
from myFile import *
Note that the file you want to import should be in a directory where Python can find it (the same where the script is running or a directory in PYTHONPATH).
The __init__.py is needed if you want to create a Python package for distribution. Here are the instructions: Distributing Python Modules
EDIT after checking Python's style guide, PEP 8, on imports:
Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.
So in this example you should have used
from myFile import Class1, Class2

Recursively populating __all__ in __init__.py

I'm using the following code to populate __all__ in my module's __init__.py, and I was wondering if there is a more efficient way. Any ideas?
import fnmatch
import os
__all__ = []
for root, dirnames, filenames in os.walk(os.path.dirname(__file__)):
    root = root[len(os.path.dirname(__file__)):]
    for filename in fnmatch.filter(filenames, "*.py"):
        __all__.append(os.path.join(root, filename[:-3]))
You probably shouldn't be doing this: The default behaviour of import is quite flexible. If you don't want a module (or any other variable) to be automatically exported, give it a name that starts with _ and python won't export it. That's the standard python way, and reinventing the wheel is considered unpythonic. Also, don't forget that other things besides modules may need exporting; once you set __all__, you'll need to find and export them as well.
Still, you ask how to best generate a list of your exportable modules. Since you can't export what's not present, I'd just check what modules of your own are known to your main module:
import os
import sys

basedir = os.path.dirname(__file__)
for m in sys.modules:
    if m in locals() and not m.startswith('_'):  # Only export regular names
        mod = locals()[m]
        if '__file__' in mod.__dict__ and mod.__file__.startswith(basedir):
            print(m)
sys.modules includes the names of every module that python has loaded, including many that have not been exported to your main module-- so we check if they're in locals().
This is faster than scanning your filesystem, and more robust than assuming that every .py file in your directory tree will somehow end up as a top-level submodule. Naturally you should run this code near the end of your __init__.py, when everything has been loaded.
I work with a few complex packages that have sub-packages and sub-modules. I like to control this on a module by module basis. I use a simple package called auto-all which makes it easy (full disclosure - I am the author).
https://pypi.org/project/auto-all/
Here's an example:
from auto_all import start_all, end_all
# Define some internal stuff
start_all(globals())
# Define some external stuff
end_all(globals())
The reason I use this approach is mainly because of imports. As mentioned by alexis, you can implicitly make things private by prefixing object names with an underscore, however this can get messy or just impractical for imported objects. Consider the following code:
from pyspark.sql.session import SparkSession
If this appears in your module then you will be implicitly making SparkSession available to be accessed from outside the module. The alternative is to prefix all imported items with underscores, for example:
from pyspark.sql.session import SparkSession as _SparkSession
This also isn't ideal, so manually managing __all__ is the only way (that I'm aware of) to control what you make externally available.
You can do this by explicitly setting the contents of the __all__ variable (which is the pythonic way), but that becomes tedious when managing a large number of objects, and it can also lead to issues if a developer adds a new object and forgets to expose it by adding it to __all__. This kind of thing can slip through code reviews. Using simple helper functions to manage the variable's contents makes this much easier.
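For comparison, the fully explicit version that the helper functions replace might look like this (names are illustrative):

from pyspark.sql.session import SparkSession  # internal; would leak without __all__

def _helper():
    ...

def run_job():
    ...

# Only the names listed here are picked up by `from module import *`.
__all__ = ['run_job']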

Populating namespace within a module before loading it

I designed a configuration mechanism in Python, where certain objects can operate in special ways to define problems in our domain.
The user specifies the problem by using these objects in a "config-file" manner. For instance:
# run configuration
CASES = [
    ('Case 1', Item('item1') + Item('item2') + Item('item3')),
    ('Case 2', Item('item1') + Item('item4')),
]
DATA = {
    'Case 1': {'Piece 1': 'path 1'},
    'Case 2': {'Piece 1': 'path 2'},
}
The Item objects are, of course, defined in a specific module. In order to use them you have to issue an import statement: from models import Item (of course, my actual imports are more complex, not a single one).
I would like the user to simply write the configuration as presented, without having to import anything (users can very easily forget this).
I thought of reading the file as text, creating a secondary text file with all the appropriate imports at the top, writing that out, and importing that file instead, but this seems clumsy.
Any advice?
Edit:
The workflow of my system is somewhat similar to Django's, in that the user defines the "settings" in a Python file and runs a script which imports that settings file and does things with it. That is where I would like this functionality: to tell Python "given this namespace (where Item means something in particular), the user will provide a script - execute it and hand me the result so that I can spawn the different runs".
From the eval help:
>>> help(eval)
Help on built-in function eval in module __builtin__:

eval(...)
    eval(source[, globals[, locals]]) -> value

    Evaluate the source in the context of globals and locals.
    The source may be a string representing a Python expression
    or a code object as returned by compile().
    The globals must be a dictionary and locals can be any mapping,
    defaulting to the current globals and locals.
    If only globals is given, locals defaults to it.
That is, you can pass in an arbitrary dictionary to use as the namespace for an eval call.
with open(source) as f:
    # a config file full of assignments is a suite of statements, so use
    # exec(), which accepts the same globals/locals arguments as eval()
    exec(f.read(), globals(), {'Item': Item})
Why have you decided that the user needs to write their configuration file in pure Python? There are many simple human-writable languages you could use instead. Have a look at ConfigParser, for instance, which reads basic configuration files of the sort Windows uses.
[cases]
case 1: item1 + item2 + item3
case 2: item1 + item4
[data]
case 1: piece1 - path1
case 2: piece1 - path2
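Reading that file back in is straightforward; here's a sketch using the Python 3 module name configparser (section and file names as assumed above):

import configparser

parser = configparser.ConfigParser()
parser.read("problem.cfg")

# Values come back as plain strings, e.g. 'item1 + item2 + item3',
# which your code can then parse into Item objects.
for name, expression in parser.items("cases"):
    print(name, "->", expression)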
1) The first thing that comes to mind is to offer the user the generation of your config file. How so?
You can add an argument to the script that launches your application:
$ python application_run.py --generate_settings
This will generate a config file with a skeleton of the imports that the user should not have to add every time, something like this:
import sys
from models import Item
# Complete the information here please !!!
CASES = []
DATA = {}
2) A second way is to use execfile() (Python 2; in Python 3, exec(open('settings.py').read()) is the equivalent). For this, you can create a script that reads settings.py:
root_settings.py
# All imports are defined here.
from models import Item
...
execfile('settings.py')
Now, to read the settings file info, just import root_settings; all the variables defined in settings.py are then available in the root_settings namespace.

Python includes, module scope issue

I'm working on my first significant Python project and I'm having trouble with scope issues and executing code in included files. My previous experience is with PHP.
What I would like to do is have one single file that sets up a number of configuration variables, which would then be used throughout the code. Also, I want to make certain functions and classes available globally. For example, the main file would include a single other file, and that file would load a bunch of commonly used functions (each in its own file) and a configuration file. Within those loaded files, I also want to be able to access the functions and configuration variables. What I don't want to do, is to have to put the entire routine at the beginning of each (included) file to include all of the rest. Also, these included files are in various sub-directories, which is making it much harder to import them (especially if I have to re-import in every single file).
Anyway I'm looking for general advice on the best way to structure the code to achieve what I want.
Thanks!
In python, it is a common practice to have a bunch of modules that implement various functions and then have one single module that is the point-of-access to all the functions. This is basically the facade pattern.
An example: say you're writing a package foo, which includes the bar, baz, and moo modules.
~/project/foo
~/project/foo/__init__.py
~/project/foo/bar.py
~/project/foo/baz.py
~/project/foo/moo.py
~/project/foo/config.py
What you would usually do is write __init__.py like this:
from foo.bar import func1, func2
from foo.baz import func3, constant1
from foo.moo import func1 as moofunc1
from foo.config import *
Now, when you want to use the functions you just do
import foo
foo.func1()
print(foo.constant1)
# assuming config defines a config1 variable
print(foo.config1)
If you wanted, you could arrange your code so that you only need to write
import foo
At the top of every module, and then access everything through foo (which you should probably name "globals" or something to that effect). If you don't like namespaces, you could even do
from foo import *
and have everything as global, but this is really not recommended. Remember: namespaces are one honking great idea!
This is a two-step process:
In your module globals.py import the items from wherever.
In all of your other modules, do "from globals import *"
This brings all of those names into the current module's namespace.
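A sketch of those two steps (the config and helpers module names are hypothetical):

# globals.py -- step 1: gather the shared names in one place
from config import value_a, value_b
from helpers import common_func

# any other module -- step 2: pull everything in
from globals import *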
Now, having told you how to do this, let me suggest that you don't. First of all, you are loading up the local namespace with a bunch of "magically defined" entities. This violates precept 2 of the Zen of Python, "Explicit is better than implicit." Instead of "from foo import *", try using "import foo" and then saying "foo.some_value". If you want to use the shorter names, use "from foo import mumble, snort". Either of these methods directly exposes the actual use of the module foo.py. Using the globals.py method is just a little too magic. The primary exception to this is in an __init__.py where you are hiding some internal aspects of a package.
Globals are also semi-evil in that it can be very difficult to figure out who is modifying (or corrupting) them. If you have well-defined routines for getting/setting globals, then debugging them can be much simpler.
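A minimal sketch of such a chokepoint module (all names are hypothetical):

# settings.py -- every read and write goes through one place,
# so a single log line or breakpoint catches all access
_values = {}

def get_setting(name, default=None):
    return _values.get(name, default)

def set_setting(name, value):
    _values[name] = value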
I know that PHP has this "everything is one, big, happy namespace" concept, but it's really just an artifact of poor language design.
As far as I know, program-wide global variables/functions/classes/etc. do not exist in Python; everything is "confined" to some module (namespace). So if you want some functions or classes to be used in many parts of your code, one solution is to create modules like "globFunCl" (defining/importing from elsewhere everything you want to be "global") and "config" (containing configuration variables) and import those everywhere you need them. If you don't like the idea of using nested namespaces you can use:
from globFunCl import *
This way you'll "hide" namespaces (making names look like "globals").
I'm not sure what you mean by not wanting to "put the entire routine at the beginning of each (included) file to include all of the rest" - I'm afraid you can't really escape that. Check out Python packages though; they should make things easier for you.
This depends a bit on how you want to package things up. You can either think in terms of files or modules. The latter is "more pythonic", and enables you to decide exactly which items (and they can be anything with a name: classes, functions, variables, etc.) you want to make visible.
The basic rule is that for any file or module you import, anything directly in its namespace can be accessed. So if myfile.py contains definitions def myfun(...): and class myclass(...) as well as myvar = ... then you can access them from another file by
import myfile
y = myfile.myfun(...)
x = myfile.myvar
or
from myfile import myfun, myvar, myclass
Crucially, anything at the top level of myfile is accessible, including imports. So if myfile contains from foo import bar, then myfile.bar is also available.
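A tiny illustration of that last point (the module names are hypothetical):

# myfile.py
from foo import bar  # a top-level name like any other

# elsewhere
import myfile
myfile.bar()  # works: the import made bar an attribute of myfile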
