There are two Python scripts: master.py and to_be_imported.py
Here is the master.py:
import os
os.foo = 12345
import to_be_imported
And here is the to_be_imported.py:
import os
if hasattr(os, 'foo'):
print 'os hasattr foo: %s'%os.foo
Now when I run master.py I get this:
os hasattr foo: 12345
indicating that the imported module to_be_imported.py picks up the variable declared inside the process that imported it (master.py).
While it works fine I would like to know why it works and also to make sure it is a safe practice.
If a module is already imported, subsequent imports to the module uses the cached version of the module. Even if you reference it via different names as in the following case
import os as a
import os as b
Both refer to the same os module that was imported the first time. So it is obvious that the variable assigned to a module will be shared.
You can verify it using the built-in python function id()
Nothing is a bad idea per se, but you must remember few things:
Modules are objects in Python. They are loaded only once and added to sys.modules. These objects can also be added attributes like regular objects (with no messy implementation of setattr).
Since they are objects, but not instantiable ones, you must consider them as singletons (they are singletons, after all), and you must consider the disadvantages and benefits of such model:
a. Singletons are only one object. Are you sure that accessing their attributes is concurrency-safe?
b. Modules are global objects. Are you sure you can track the whole behavior and access to their members? Are you sure you will be able to debug errors there?
Is the code something you will work with others?
While no idea is better than other, good practices tell us that using global variables is not well-seen, specially if we have a team to work with. On the other hand: if your code is concurrent and/or reentrant, avoid using global variables or relying on module attributes. OTOH you will have no problem assigning attributes like that. They will last for the life of your script execution.
This is not the place to chose the best alternative. Depending on how you state your problem, you can ask it either on programmers or codereview. You can chose many variants to share state without using global variables in modules, like passing those variables inside a state back and forth across arguments, or learning and using OOP. But, again, this site is no scope for that.
Related
I'm trying to dynamically load modules as explained here.
I have written a script that requires some modules that may not be installed by default on some systems (such as requests). The script code assumes that a regular import has been done (it uses requests.get).
If I use the code in the link above, to import requests I would have to use:
requests=importlib.import_module('requests')
But this leads to a lot of code duplication since I have several modules. I can't use that in a loop since the variable name must change with the imported module.
I have found that I can use:
for m in list_of_modules:
locals()[m]=importlib.import_module(m)
And everything happens as if I had done regular import's.
(of course the real code catches exceptions...).
So the question is how valid/risky this is? Too good to be true or not? (Python 2.7 if that makes a difference)
It is explicitely invalid. Doc of Python 2.7.15 says of locals() function:
The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
locals() is a way for the program to know the list of variables in a function block. It is not a way to create local variables.
If you really need something like that, you can either use a local map, rely on the sys.modules map which is updated by import_module, or update the globals() map. Anyway, once a module was loaded, it exists (through the sys.module map) for the whole program, so it does not really make sense to store its reference in a local symbol table.
So if you really need to import a dynamically builded list of modules, I would do:
for m in list_of_modules:
globals()[m]=importlib.import_module(m)
I created a module named util that provides classes and functions I often use in Python.
Some of them need imported features. What are the pros and the cons of importing needed things inside class/function definition? Is it better than import at the beginning of a module file? Is it a good idea?
It's the most common style to put every import at the top of the file. PEP 8 recommends it, which is a good reason to do it to start with. But that's not a whim, it has advantages (although not critical enough to make everything else a crime). It allows finding all imports at a glance, as opposed to looking through the whole file. It also ensures everything is imported before any other code (which may depend on some imports) is executed. NameErrors are usually easy to resolve, but they can be annoying.
There's no (significant) namespace pollution to be avoided by keeping the module in a smaller scope, since all you add is the actual module (no, import * doesn't count and probably shouldn't be used anyway). Inside functions, you'd import again on every call (not really harmful since everything is imported once, but uncalled for).
PEP8, the Python style guide, states that:
Imports are always put at the top of
the file, just after any module
comments and docstrings, and before module globals and constants.
Of course this is no hard and fast rule, and imports can go anywhere you want them to. But putting them at the top is the best way to go about it. You can of course import within functions or a class.
But note you cannot do this:
def foo():
from os import *
Because:
SyntaxWarning: import * only allowed at module level
Like flying sheep's answer, I agree that the others are right, but I put imports in other places like in __init__() routines and function calls when I am DEVELOPING code. After my class or function has been tested and proven to work with the import inside of it, I normally give it its own module with the import following PEP8 guidelines. I do this because sometimes I forget to delete imports after refactoring code or removing old code with bad ideas. By keeping the imports inside the class or function under development, I am specifying its dependencies should I want to copy it elsewhere or promote it to its own module...
Only move imports into a local scope, such as inside a function definition, if it’s necessary to solve a problem such as avoiding a circular import or are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules.
https://docs.python.org/3/faq/programming.html#what-are-the-best-practices-for-using-import-in-a-module
I believe that it's best practice (according to some PEP's) that you keep import statements at the beginning of a module. You can add import statements to an __init__.py file, which will import those module to all modules inside the package.
So...it's certainly something you can do the way you're doing it, but it's discouraged and actually unnecessary.
While the other answers are mostly right, there is a reason why python allows this.
It is not smart to import redundant stuff which isn’t needed. So, if you want to e.g. parse XML into an element tree, but don’t want to use the slow builtin XML parser if lxml is available, you would need to check this the moment you need to invoke the parser.
And instead of memorizing the availability of lxml at the beginning, I would prefer to try importing and using lxml, except it’s not there, in which case I’d fallback to the builtin xml module.
I have a library that interfaces with an external tool and exposes some basic keywords to use from robotframework; This library is implemented as a python package, and I would like to implement extended functionality that implements complex logic, and exposes more keywords, within modules of this package. The package is given test case scope, but I'm not entirely sure how this works. If I suggest a few ways I have thought of, could someone with a bit more knowledge let me know where I'm on the right track, and where I'm barking up the wrong tree...
Use an instance variable - if the scope is such that the python interpreter will see the package as imported by the current test case (i.e this is treated as a separate package in different test cases rather than a separate instance of the same package), then on initialisation I could set a global variable INSTANCE to self and then from another module within the package, import INSTANCE and use it.
Use an instance dictionary - if the scope is such that all imports see the package as the same, I could use robot.running.context to set a dictionary key such that there is an item in the instance dictionary for each context where the package has been imported - this would then mean that I could use the same context variable as a lookup key in the modules that are based on this. (The disadvantage of this one is that it will prevent garbage collection until the package itself is out of scope, and relies on it being in scope persistently.)
A context variable that I am as of yet unaware of that will give me the instance that is in scope. The docs are fairly difficult to search, so it's fully possible that there is something that I'm missing that will make this trivial. Also just as good would be something that allowed me to call the keywords that are in scope.
Some excellent possibility I haven't considered....
So can anyone help?
Credit for this goes to Kevin O. from the robotframework user group, but essentially the magic lives in robot.libraries.BuiltIn.BuiltIn().get_library_instance(library_name) which can be used like this:
from robot.libraries.BuiltIn import BuiltIn
class SeleniumTestLibrary(object):
def element_should_be_really_visible(self):
s2l = BuiltIn().get_library_instance('Selenium2Library')
element = s2l._element_find(locator, True, False)
It sounds like you are talking about monkeypatching the imported code, so that other modules which import that package will also see your runtime modifications. (Correct me if I'm wrong; there are a couple of bits in your question that I'm not quite following)
For simple package imports, this should work:
import my_package
def method_override():
return "Foo"
my_package.some_method = method_override
my_package, in this case, refers to the imported module, and is not just a local name, so other modules will see the overridden method.
This won't work in cases where other code has already done
from my_package import some_method
Since in that case, some_method is a local name in the place it is imported. If you replace the method elsewhere, that change won't be seen.
If this is happening, then you either need to change the source to import the entire module, or patch a little bit deeper, by replacing method internals:
import my_package
def method_override():
return "Foo"
my_package.some_method.func_code = method_override.func_code
At that point, it doesn't matter how the method was imported in any other module; the code object associated with the method has been replaced, and your new code will run rather than the original.
The only thing to worry about in that case is that the module is imported from the same path in every case. The Python interpreter will try to reuse existing modules, rather than re-import and re-initialize them, whenever they are imported from the same path.
However, if your python path is set up to contain two directories, say: '/foo' and '/foo/bar', then these two imports
from foo.bar import baz
and
from bar import baz
would end up loading the module twice, and defining two versions of any objects (methods, classes, etc) in the module. If that happens, then patching one will not affect the other.
If you need to guard against that case, then you may have to traverse sys.modules, looking for the imported package, and patching each version that you find. This, of course, will only work if all of the other imports have already happened, you can't do that pre-emptively (without writing an import hook, but that's another level deeper again :) )
Are you sure you can't just fork the original package and extend it directly? That would be much easier :)
I am a Python newbie coming from a C++ background. While I know it's not Pythonic to try to find a matching concept using my old C++ knowledge, I think this question is still a general question to ask:
Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units, thus a global/static variable depending on another one in different compilation units might be initialized earlier than its dependency counterparts, and when dependant started to use the services provided by the dependency object, we would have undefined behavior. Here I don't want to go too deep on how C++ solves this problem. :)
On the Python world, I do see uses of global variables, even across different .py files, and one typycal usage case I saw was: initialize one global object in one .py file, and on other .py files, the code just fearlessly start using the global object, assuming that it must have been initialized somewhere else, which under C++ is definitely unaccept by myself, due to the problem I specified above.
I am not sure if the above use case is common practice in Python (Pythonic), and how does Python solve this kind of global variable initialization order problem in general?
Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units,
I think that statement highlights a key difference between Python and C++: in Python, there is no such thing as different compilation units. What I mean by that is, in C++ (as you know), two different source files might be compiled completely independently from each other, and thus if you compare a line in file A and a line in file B, there is nothing to tell you which will get placed first in the program. It's kind of like the situation with multiple threads: you cannot say whether a particular statement in thread 1 will be executed before or after a particular statement in thread 2. You could say C++ programs are compiled in parallel.
In contrast, in Python, execution begins at the top of one file and proceeds in a well-defined order through each statement in the file, branching out to other files at the points where they are imported. In fact, you could almost think of the import directive as an #include, and in that way you could identify the order of execution of all the lines of code in all the source files in the program. (Well, it's a little more complicated than that, since a module only really gets executed the first time it's imported, and for other reasons.) If C++ programs are compiled in parallel, Python programs are interpreted serially.
Your question also touches on the deeper meaning of modules in Python. A Python module - which is everything that is in a single .py file - is an actual object. Everything declared at "global" scope in a single source file is actually an attribute of that module object. There is no true global scope in Python. (Python programmers often say "global" and in fact there is a global keyword in the language, but it always really refers to the top level of the current module.) I could see that being a bit of a strange concept to get used to coming from a C++ background. It took some getting used to for me, coming from Java, and in this respect Java is a lot more similar to Python than C++ is. (There is also no global scope in Java)
I will mention that in Python it is perfectly normal to use a variable without having any idea whether it has been initialized/defined or not. Well, maybe not normal, but at least acceptable under appropriate circumstances. In Python, trying to use an undefined variable raises a NameError; you don't get arbitrary behavior as you might in C or C++, so you can easily handle the situation. You may see this pattern:
try:
duck.quack()
except NameError:
pass
which does nothing if duck does not exist. Actually, what you'll more commonly see is
try:
duck.quack()
except AttributeError:
pass
which does nothing if duck does not have a method named quack. (AttributeError is the kind of error you get when you try to access an attribute of an object, but the object does not have any attribute by that name.) This is what passes for a type check in Python: we figure that if all we need the duck to do is quack, we can just ask it to quack, and if it does, we don't care whether it's really a duck or not. (It's called duck typing ;-)
Python import executes new Python modules from beginning to end. Subsequent imports only result in a copy of the existing reference in sys.modules, even if still in the middle of importing the module due to a circular import. Module attributes ("global variables" are actually at the module scope) that have been initialized before the circular import will exist.
main.py:
import a
a.py:
var1 = 'foo'
import b
var2 = 'bar'
b.py:
import a
print a.var1 # works
print a.var2 # fails
I'm writing an application in Python, and I've got a number of universal variables (such as the reference to the main window, the user settings, and the list of active items in the UI) which have to be accessible from all parts of the program1. I only just realized I've named the module globals.py and I'm importing the object which contains those variables with a from globals import globals statement at the top of my files.
Obviously, this works, but I'm a little leery about naming my global object the same as the Python builtin. Unfortunately, I can't think of a much better naming convention for it. global and all are also Python builtins, universal seems imprecise, state isn't really the right idea. I'm leaning towards static or env, although both have a specific meaning in computer terms which suggests a different concept.
So, what (in Python) would you call the module which contains variables global to all your other modules?
1 I realize I could pass these (or the single object containing them) as a variable into every other function I call. This ends up being infeasible, not just because it makes the startup code and function signatures really ugly.
I would try to avoid such a global container module altogether, and instead put these variables into their own modules, which can then be imported from all parts of the system.
For example, the main window would probably go into a variable in main.py. User settings could go into usersettings.py which would provide functions to view and change the settings.
If another part of the system needs to access the user settings, that's a simple matter of:
from usersettings import get_setting, set_setting
...
# Do stuff with settings
A similar approach could probably be used for other stuff that needs to be globally accessible. This leads to clearer separation of concerns and more testable code, since you can test modules in isolation without depending on the globals module all the time.
I'd call it env. There's little risk that someone will confuse it with os.environ (especially if you organize your code so that you can call it myapp.environ).
I'd also make everything exposed by myapp.environ a property of a class, so that I can put breakpoints in the setter when the day comes that I need to.
`config` or `settings`
top? top_level?
from globals import Globals
This will fix the conflict and also follows PEP 8 recommendations.
Also, in other cases like this, Roget's Thesaurus is your friend. I always keep a copy nearby.
global is a keyword, not a built-in. 'globals' is not a keyword, but is a built-in function. It can be assigned to, but is bad practice. Code checkers like pylint and pychecker can catch these accidental assignments. How about config?