I noticed that when my object contains an explicit reference to a module, pickling the object fails.
However, if I store a reference to a function from that module in my object instead, it can be pickled and unpickled successfully.
How come Python can pickle functions, but not modules?
Because nobody has coded support for it. C-level types (and even modules written in Python are instances of a C-level type) need pickle support to be coded explicitly.
It's not very easy to determine what should be pickled if a module is allowed to be pickled; importing the same name on the other side would seem simple, but if you're actually trying to pickle the module itself, the worry would be that you want to pickle module state as well. It's even more confusing if the module is a C extension module, where module state may not even be exposed to Python itself, only used internally at the C layer.
Usually you want specific things from a module, not the whole module (which is usually not referenced as state, just imported at the top level). Since the benefits of supporting pickling for modules are limited and the semantics are unclear, they haven't bothered to implement it.
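To make the difference concrete, here is a minimal sketch. It uses os and os.path.join purely as stand-ins, and the "module_name" key is a made-up convention for this example, not part of the pickle API. A function is pickled by reference (its module and name), a module object is rejected, and one possible workaround is to store the module's name and re-import it after loading.

```python
import importlib
import os
import pickle

# A function is pickled by reference: only its module and qualified name are stored.
restored_func = pickle.loads(pickle.dumps(os.path.join))

# A module object has no pickle support, so dumps() raises.
try:
    pickle.dumps(os)
    module_pickled = True
except (TypeError, pickle.PicklingError):
    module_pickled = False

# Possible workaround (an assumption of this sketch): keep the module's name
# in the pickled state and import it again on the other side.
state = {"module_name": os.__name__}
loaded_state = pickle.loads(pickle.dumps(state))
restored_module = importlib.import_module(loaded_state["module_name"])
```

Note that this only restores "the module with that name"; any state that was attached to the module object at pickling time is not carried over.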
There are two Python scripts: master.py and to_be_imported.py
Here is the master.py:
import os
os.foo = 12345
import to_be_imported
And here is the to_be_imported.py:
import os
if hasattr(os, 'foo'):
    print('os hasattr foo: %s' % os.foo)
Now when I run master.py I get this:
os hasattr foo: 12345
indicating that the imported module to_be_imported.py picks up the variable declared inside the process that imported it (master.py).
While it works fine I would like to know why it works and also to make sure it is a safe practice.
If a module has already been imported, subsequent imports of it use the cached module object. This holds even if you bind it to different names, as in the following case:
import os as a
import os as b
Both names refer to the same os module object that was imported the first time, so an attribute assigned to the module is shared.
You can verify this with the built-in function id().
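A short sketch of that check, reusing the foo attribute from the question (the attribute is removed again at the end so the os module is left untouched):

```python
import os as a
import os as b
import sys

# Both names are bound to the one module object cached in sys.modules.
same_object = id(a) == id(b) == id(sys.modules["os"])

# An attribute set through one name is visible through the other.
a.foo = 12345
seen_through_b = b.foo

# Clean up the ad-hoc attribute.
del a.foo
```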
Nothing here is a bad idea per se, but you must remember a few things:
Modules are objects in Python. They are loaded only once and added to sys.modules. You can add attributes to them as you would to regular objects (with no messy implementation of setattr).
Since they are objects, but not instantiable ones, you must treat them as singletons (they are singletons, after all), and weigh the disadvantages and benefits of that model:
a. Singletons are only one object. Are you sure that accessing their attributes is concurrency-safe?
b. Modules are global objects. Are you sure you can track the whole behavior and access to their members? Are you sure you will be able to debug errors there?
Is the code something you will work with others?
While no approach is inherently better than another, good practice says that global variables are frowned upon, especially when you work in a team. If your code is concurrent and/or reentrant, avoid using global variables or relying on module attributes. Otherwise, you will have no problem assigning attributes like that; they will last for the life of your script's execution.
This is not the place to choose the best alternative. Depending on how you state your problem, you can ask on either programmers or codereview. There are many ways to share state without using global variables in modules, such as passing the state explicitly through function arguments, or learning and using OOP. But, again, that is out of scope for this site.
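As one illustration of passing state through arguments instead of attaching it to os, here is a small sketch. AppState and report are hypothetical names invented for this example:

```python
class AppState(object):
    """Hypothetical container for the value that was stored on os.foo."""

    def __init__(self, foo):
        self.foo = foo


def report(state):
    # The function receives the state explicitly instead of reading a module attribute.
    return "state has foo: %s" % state.foo


state = AppState(foo=12345)
message = report(state)
```

The dependency on the shared value is now visible in the function signature, which makes it easier to trace and to test.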
I am looking to make a global function in python 3.3 like the print function. In particular we have embedded python in our own application and we want to expose a simple 'debug(value)' global function, available to any script. It is possible for us to do this by attaching the function to a module, however, for convenience it would be easier for it to be global like 'print(value)'.
How do you declare a global function that becomes available to any python file without imports, or is this a black box in python? Is it possible to do from the C side binding?
This is almost always a bad idea, but if you really want to do it…
If you print out or otherwise inspect the print function, you'll see it's in the module builtins. That's your clue. So, you can do this:
debugmodule.py:
import builtins
def debug(value):
    print(value)  # placeholder body; put the real debug logic here
builtins.debug = debug
Now, after an import debugmodule, any other module can just call debug instead of debugmodule.debug.
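The following self-contained sketch shows the effect in CPython. The body of debug is a placeholder, and exec with a fresh dictionary stands in for "some other module" that never imported anything:

```python
import builtins


def debug(value):
    """Placeholder implementation; a real one would log somewhere useful."""
    return "DEBUG: {!r}".format(value)


# Inject the function into the builtins module.
builtins.debug = debug

# Code run in a fresh namespace has no 'debug' global of its own,
# so the name is found through the builtins fallback.
namespace = {}
exec("result = debug(42)", namespace)
result = namespace["result"]

# Remove the injected name again so the example leaves no trace.
del builtins.debug
```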
Is it possible to do from the C side binding?
In CPython, a C extension module can basically do the same thing that a pure Python module does. Or, even more simply, write a _debugmodule.so in C, then a debugmodule.py that imports it and copies debug into builtins.
If you're embedding CPython, you can do this just by injecting the function into the builtins module before starting the script/interactive shell/whatever, or at any later time.
While this definitely works, it's not entirely clear whether it is actually guaranteed to work. The docs say:
As an implementation detail, most modules have the name __builtins__ made available as part of their globals. The value of __builtins__ is normally either this module or the value of this module’s __dict__ attribute. Since this is an implementation detail, it may not be used by alternate implementations of Python.
And, at least in CPython, it's actually that __builtins__ module or dict that gets searched as part of the lookup chain, not the builtins module. So, it might be possible that another implementation could look things up in __builtins__ like CPython does, but at the same time not make builtins (or user modifications to it) automatically available in __builtins__, in which case this wouldn't work. (Since CPython is the only 3.x implementation available so far, it's hard to speculate…)
If this doesn't work on some future Python 3.x implementation, the only option I can think of is to get your function injected into each module, instead of into builtins. You could do that with a PEP-302 import hook (which is a lot easier in 3.3 than it was when PEP 302 was written… read The import system for details).
In 2.x, instead of a module builtins that automatically injects things into a magic module __builtins__, there's just a magic module __builtin__ (notice the missing s). You may or may not have to import it (so you might as well, to be safe). And you may or may not be able to change it. But it works in (at least) CPython and PyPy.
So, what's the right way to do it? Simple: instead of import debugmodule, just from debugmodule import debug in all of your other modules. That way it ends up being a module-level global in every module that needs it.
Basically, I have a long running process where I would like to be able to unimport modules and recover memory via the gc. I've read about deleting modules How do I unload (reload) a Python module? and it seems like there are still dangling references that block gc.
However, what if I import and use the module only inside a namespace. In other words, something like this:
ns = {}
exec somecode in ns
Then I would cleanup sys.modules inside the namespace and finish off by deleting the namespace itself.
Would that free up the memory for reuse in CPython?
If not, then is it possible to access some part of the Python C API using ctypes, to accomplish this?
The important part of the end result is that memory is released, so that a process running for weeks or months can reliably unimport a module without reloading it. Of course, it is entirely possible that any given module would be loaded and unloaded many times during that time period. I am assuming that a module could create a large number of objects while it is loaded, and that the normal cleanup (sys.modules and del) would leave those objects in memory forever.
Jochen: Yes, I could work around this in a number of ways but I am interested in exploring the limits of Python.
To un-import a module you will need to ensure that you've removed all references to the module. That means you have to delete the references from all modules that imported it, delete the reference from sys.modules, delete any references to any functions or classes defined in that module and delete all references to objects that are instances of classes defined in the module.
In almost all situations this is more effort than it is worth to recover what is a comparatively small amount of memory. If you really want to try this, gc.get_referrers() might be useful: you can delete all but one known reference to the module and then trace back to find what else still references it.
If what you really want is to avoid memory leaks, your best bet is probably to arrange for importing the module once, in the normal way, with sys.modules in its usual state. No matter how many times the module is later imported, it will not take any more memory, since the import machinery will just keep returning the same module.
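Here is a rough sketch of that tracing approach. It builds a throwaway module in memory (scratch_module, load_scratch_module and holder are names made up for this example) and uses a weak reference only to observe when the module object is really freed:

```python
import gc
import sys
import types
import weakref


def load_scratch_module(name):
    # Build a module in memory so the example needs no file on disk.
    module = types.ModuleType(name)
    exec("payload = list(range(1000))", module.__dict__)
    sys.modules[name] = module
    return module


scratch = load_scratch_module("scratch_module")
probe = weakref.ref(scratch)       # lets us observe when the module dies
holder = {"leftover": scratch}     # a forgotten reference somewhere else

# The usual cleanup: drop the sys.modules entry and our own name.
del sys.modules["scratch_module"]
del scratch
gc.collect()
still_alive = probe() is not None  # the module survives because of holder

# Ask the garbage collector who still refers to the module.
found_holder = any(ref is holder for ref in gc.get_referrers(probe()))

# Once the last reference is gone, the module can be freed.
holder.clear()
gc.collect()
released = probe() is None
```

In a real program the leftover references are usually spread over other modules' globals, instances and closures, which is what makes this approach laborious.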
If for some reason this still doesn't suit (say, modules are being created dynamically and only need to be used once), exec certainly isn't the solution. You should consider using an alternative execution model, perhaps forking new processes.
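As a sketch of the separate-process idea, the snippet below runs a piece of code in a child interpreter with subprocess. CHILD_CODE is a trivial stand-in for "import and use the memory-hungry module"; everything the child allocated is returned to the operating system when it exits.

```python
import subprocess
import sys

# Stand-in for code that imports and uses a large module.
CHILD_CODE = "print(sum(range(100)))"

completed = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,  # raise if the child exits with an error
)

# Only the printed result crosses the process boundary.
result = completed.stdout.strip()
```

The trade-off is that results have to be passed back explicitly (here through standard output), so this fits best when the module's work can be expressed as a self-contained task.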
I was wondering whether objects serialized using CPython's cPickle are readable by using IronPython's cPickle; the objects in question do not require any modules outside of the built-ins that both Cpython and IronPython include. Thank you!
If you use the default protocol (0) which is text based, then things should work. I'm not sure what will happen if you use a higher protocol. It's very easy to test this ...
It will work because when you unpickle objects during load() it will use the current definitions of whatever classes you have defined now, not back when the objects were pickled.
IronPython is simply Python with the standard library implemented in C# so that everything emits IL. Both the CPython and the IronPython pickle modules have the same functionality, except one is implemented in C and the other in C#.
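For reference, selecting the text-based protocol looks like the sketch below. The record is an arbitrary example made of built-in types only, and the round trip here happens within a single interpreter, so it illustrates the API rather than proving cross-implementation compatibility.

```python
import pickle

# Arbitrary example data made only of built-in types.
record = {"name": "example", "values": [1, 2, 3], "ratio": 0.5}

# protocol=0 selects the original text-based pickle format.
payload = pickle.dumps(record, protocol=0)

# In the scenario from the question, loads() would run under the other implementation.
restored = pickle.loads(payload)
```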
I am a Python newbie coming from a C++ background. While I know it's not Pythonic to try to find a matching concept using my old C++ knowledge, I think this question is still a general question to ask:
In C++, there is a well-known problem called the global/static variable initialization order fiasco. Because C++ cannot decide which global/static variable is initialized first across compilation units, a global/static variable that depends on one in a different compilation unit might be initialized before its dependency, and when the dependent starts to use the services provided by the dependency, we get undefined behavior. I don't want to go too deep here into how C++ solves this problem. :)
In the Python world, I do see uses of global variables, even across different .py files. One typical use case I saw was: initialize a global object in one .py file, and in other .py files the code just fearlessly starts using the global object, assuming that it must have been initialized somewhere else. In C++ I would find that definitely unacceptable, because of the problem described above.
I am not sure if the above use case is common practice in Python (Pythonic), and how does Python solve this kind of global variable initialization order problem in general?
Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units,
I think that statement highlights a key difference between Python and C++: in Python, there is no such thing as different compilation units. What I mean by that is, in C++ (as you know), two different source files might be compiled completely independently from each other, and thus if you compare a line in file A and a line in file B, there is nothing to tell you which will get placed first in the program. It's kind of like the situation with multiple threads: you cannot say whether a particular statement in thread 1 will be executed before or after a particular statement in thread 2. You could say C++ programs are compiled in parallel.
In contrast, in Python, execution begins at the top of one file and proceeds in a well-defined order through each statement in the file, branching out to other files at the points where they are imported. In fact, you could almost think of the import directive as an #include, and in that way you could identify the order of execution of all the lines of code in all the source files in the program. (Well, it's a little more complicated than that, since a module only really gets executed the first time it's imported, and for other reasons.) If C++ programs are compiled in parallel, Python programs are interpreted serially.
Your question also touches on the deeper meaning of modules in Python. A Python module - which is everything that is in a single .py file - is an actual object. Everything declared at "global" scope in a single source file is actually an attribute of that module object. There is no true global scope in Python. (Python programmers often say "global" and in fact there is a global keyword in the language, but it always really refers to the top level of the current module.) I could see that being a bit of a strange concept to get used to coming from a C++ background. It took some getting used to for me, coming from Java, and in this respect Java is a lot more similar to Python than C++ is. (There is also no global scope in Java)
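A quick way to see that "globals" are really module attributes is to look at a function from any module. The sketch below uses json from the standard library as an arbitrary example:

```python
import json

# The "global" namespace that json.dumps sees is the json module's own
# attribute dictionary, not some program-wide scope.
globals_are_module_dict = json.dumps.__globals__ is vars(json)

# Names defined at the top level of the module are simply its attributes.
dumps_is_attribute = vars(json)["dumps"] is json.dumps
```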
I will mention that in Python it is perfectly normal to use a variable without having any idea whether it has been initialized/defined or not. Well, maybe not normal, but at least acceptable under appropriate circumstances. In Python, trying to use an undefined variable raises a NameError; you don't get arbitrary behavior as you might in C or C++, so you can easily handle the situation. You may see this pattern:
try:
    duck.quack()
except NameError:
    pass
which does nothing if duck does not exist. Actually, what you'll more commonly see is
try:
    duck.quack()
except AttributeError:
    pass
which does nothing if duck does not have a method named quack. (AttributeError is the kind of error you get when you try to access an attribute of an object, but the object does not have any attribute by that name.) This is what passes for a type check in Python: we figure that if all we need the duck to do is quack, we can just ask it to quack, and if it does, we don't care whether it's really a duck or not. (It's called duck typing ;-)
Python import executes new Python modules from beginning to end. Subsequent imports only result in a copy of the existing reference in sys.modules, even if still in the middle of importing the module due to a circular import. Module attributes ("global variables" are actually at the module scope) that have been initialized before the circular import will exist.
main.py:
import a
a.py:
var1 = 'foo'
import b
var2 = 'bar'
b.py:
import a
print(a.var1)  # works: var1 was assigned before a imported b
print(a.var2)  # fails with AttributeError: var2 has not been assigned yet
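The same situation can be reproduced in one runnable script. This is a sketch under a few assumptions: the modules are renamed circ_a and circ_b to avoid clashing with real modules, they are written to a temporary directory, and getattr with a default replaces the failing print so the import can finish.

```python
import importlib
import sys
import tempfile
from pathlib import Path

A_SOURCE = """
var1 = 'foo'
import circ_b
var2 = 'bar'
"""

B_SOURCE = """
import circ_a  # circular: circ_a is only partially initialized at this point
seen_var1 = circ_a.var1
seen_var2 = getattr(circ_a, 'var2', '<not yet defined>')
"""

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "circ_a.py").write_text(A_SOURCE)
    Path(tmp, "circ_b.py").write_text(B_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        circ_a = importlib.import_module("circ_a")
        circ_b = sys.modules["circ_b"]
    finally:
        sys.path.remove(tmp)

# What circ_b observed while circ_a was still in the middle of its import.
seen_during_import = (circ_b.seen_var1, circ_b.seen_var2)
# Once the import of circ_a has finished, var2 exists.
final_var2 = circ_a.var2
```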