i'm stucked at a point and i can't progress, sorry for this silly question. I searched a lot for that but i couldn't know what i am missing. Please help me.
I studied modules and classes in python. Now i want to make some operations using python and apt. I'm studying from : http://apt.alioth.debian.org/python-apt-doc/library/apt.cache.html
However, i couldn't understand, the module is apt.cache, as shown in the top of the page. I expected that object should be created by writing apt.cache.Cache(), but object is created by writing apt.Cache(), as shown below. Why?
import apt
import apt.progress
# First of all, open the cache
cache = apt.Cache()
# Now, lets update the package list
cache.update()
# We need to re-open the cache because it needs to read the package list
cache.open(None)
# Now we can do the same as 'apt-get upgrade' does
cache.upgrade()
# or we can play 'apt-get dist-upgrade'
cache.upgrade(True)
# Q: Why does nothing happen?
# A: You forgot to call commit()!
cache.commit(apt.progress.TextFetchProgress(),
apt.progress.InstallProgress())
Second similar question is about below code, Cache class is imported from module apt.cache. I expected that object would be created by writing apt.cache.Cache(), but it is created by writing apt.Cache(). Why?
>>> from apt.cache import FilteredCache, Cache, MarkedChangesFilter
>>> cache = apt.Cache()
>>> changed = apt.FilteredCache(cache)
>>> changed.set_filter(MarkedChangesFilter())
>>> print len(changed) == len(cache.get_changes()) # Both need to have same length
True
Thanks in advance
If you look at the __init__.py file of the apt package, you see the line:
__all__ = ['Cache', 'Cdrom', 'Package']
The python documentation says:
The import statement uses the following convention: if a package’s
__ init__.py code defines a list named all, it is taken to be the list of module names that should be imported when from package import
* is encountered.
That's the reason why you can use apt.Cache()
For the second part of your question you can import directly the Cache class with
from apt.cache import Cache
cache = Cache()
You can also import the Cache class with
import apt
cache = apt.Cache() //because of the __all__ variable in __init__.py
cache = apt.cache.Cache() //because it's a fully qualified name
Related
I have issues understanding some subtleties of the Python import system. I have condensed my doubts around a minimal example and a number of concrete and related questions detailed below.
I have defined a package in a folder called modules, whose content is an __init__.py and two regular modules, one with general functionality for the package and other with the definitions for the end user. The content is as simple as:
init.py
from .base import *
from .implementation import *
base.py
class FactoryClass():
registry = {}
#classmethod
def add_to_registry(cls, newclass):
cls.registry[newclass.__name__] = newclass
#classmethod
def getobject(cls, classname, *args, **kwargs):
return cls.registry[classname](*args, **kwargs)
class BaseClass():
def hello(self):
print(f"Hello from instance of class {type(self).__name__}")
implementation.py
from .base import BaseClass, FactoryClass
class First(BaseClass):
pass
class Second(BaseClass):
pass
FactoryClass.add_to_registry(First)
FactoryClass.add_to_registry(Second)
The user of the package will use it as:
import modules
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second")
a.hello()
b.hello()
This works. The problem comes because I'm developing this, and my workflow includes adding functionality in implementation.py and then continaully test it by reloading the module. But I can not understand/predict what module I have to reload to have the functions updated. I'm making changes that have no effect and it drives me crazy (until yesterday I was working on a large .py file with all code lumped together, so I had none of these problems).
Here are some test I have done, and I'd like to understand what's happening and why.
First, I start commenting out all mentions to Second class in implementation.py (to pretend it was not yet developed):
from importlib import reload
import modules
modules.base.FactoryClass is modules.FactoryClass # returns True
modules.FactoryClass.registry # just First class is in registry
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second") # raises KeyError as expected
This code and its output is pretty clear. The only thing that puzzles me is why there is a modules.base module at all (I did not import it!). Further, it is redundant as their classes point to the same objects. Why importing modules also imports modules.base and modules.implementation as separate but essentially identical objects?
Now things become interesting as I comment out, i.e. I finish developing Second, and I'd like to test it without having to restart the Python session. I have tried 3 different reloads:
reload (modules)
This does absolutely nothing. I'd expect some sort of recursivity, but as I have found in many other threats, this is the expected behavior.
Now I try to manually reload one of those "unexpected" modules:
reload (modules.implementation)
modules.base.FactoryClass is modules.FactoryClass # True
modules.FactoryClass.registry # First and Second
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second") # Works as expected
This seems to be the right way to go. It updates the module contents as expected and the new functionality is usable. What puzzles me is why modules.FactoryClass has been updated (its registry) despite the fact that I did not reload the modules.base module. I'd expect this function to stay "outdated".
Finally, and starting from the just freshly uncommented version, I have tried
reload (modules.base)
modules.base.FactoryClass is modules.FactoryClass # False
modules.FactoryClass.registry # just First class is in registry
modules.base.FactoryClass.registry # empty
a = modules.base.FactoryClass.getobject("First")
b = modules.base.FactoryClass.getobject("Second") # raises KeyError
This is very odd. modules.FactoryClass is outdated (Second is unknown). modules.base.Factory is empty. Why are now modules.FactoryClass and modules.base.FactoryClass different objects?
Could someone explain why the three different versions of reload a package have so different behaviour?
You are confused about how the Python import system works, so I strongly recommend you read the corresponding documentations : the import system and importlib.reload.
A foreword : code hot-reloading in Python is tricky. I recommend to not do that if it is not required. You have seen it yourself : bugs are very tricky.
Then to your questions :
Why importing modules also imports modules.base and modules.implementation as separate but essentially identical objects?
As #Kemp answered as a comment (and I upvoted), imports are transitive. When you import a, Python will parse/compile/execute the corresponding library file. If the module does import b then Python will do it again for the b library file, and again and again. You don't see it, but when your program starts there is already a lot of things that have been imported.
Given this file :
print("nothing else")
When I set my debugger to pause before executing the print line, if I look into sys.modules I already have 338 different libraries imported : builtins (where print came from), sys, itertools, enum, json, ...
Understand that "no visible import statement" does not mean "nothing have been imported".
When you execute import a, Python will start by checking its sys.modules cache to determine if the library have already been read from disk, parsed, compiled and executed into a module object. If this library was not yet imported during this program, then Python will take the time to do all that. But because it is slow, Python optimize with a cache.
The result is a module object, that gets bind the current namespace, so that you can access it.
We can summerize it like that :
def import_library(name: str) -> Module:
if name not in sys.modules:
# cache miss
filepath = locate_library(name)
bytecode = compile_library(filepath)
module = execute(bytecode)
sys.modules[name] = module
# in any case, at this point, the module is available
return sys.modules[name]
You are thus confusing module objects with variables.
In any module you can declare variables with whatever name (but allowed by Python's grammar). And some of them will reference modules.
here is an example :
# file: main.py
import lib # create a variable named `lib` referencing the `lib` library
import lib as horse # create a variable named `horse` referencing the `lib` library
print(lib.a.number) # 14
print(horse.a.number) # 14
print(lib is horse) # True
print(lib.a.sublib.__name__) # lib.sublib
import lib.sublib
from lib import sublib
import lib.sublib as lib_sublib
print((lib.sublib is sublib, sublib is lib_sublib, lib.a.zebra is sublib)) # (True, True, True)
import sys
print(sys.modules["lib"] is lib) # True
print(sys.modules["lib.sublib"] is sublib) # True
print(lib.sublib.package_color) # blue
print(lib.sublib.color) # AttributeError: module 'lib.sublib' has no attribute 'color'
# file: lib/__init__.py
from . import a
# file: lib/a.py
from . import sublib
from . import sublib as zebra
number = 14
# file: lib/sublib/__init__.py
from .b import color as package_color
# file: lib/sublib/b.py
color = "blue"
Python offers a lot of flexibility about how to import things, what to expose, how to access. But I admit it is confusing.
Also take a look at the role of __all__ in __init__.py. Given that, you should now understand your question on subpackage naming/visibility.
reload (modules) This does absolutely nothing. I'd expect some sort of recursivity, but as I have found in many other threats, this is the expected behavior.
Given what I explained, can you now understand what it does ? And why what it does is not what you want it to do ?
Because what you want is to get modules.implementations hot-reloaded, but you ask for modules.
>>> from importlib import reload
>>> import lib
>>> lib.sublib.package_color
'blue'
>>> # I edit the "b.py" file so that the color is "red"
>>> old_lib = lib
>>> new_lib = reload(lib)
>>> lib is new_lib, lib is old_lib
(True, True)
>>> lib.sublib.package_color
'blue'
>>> lib.sublib.b.color
'red'
>>> import sys
>>> sys.modules["lib.sublib.b"].color
'red'
First, the top-level reload did not work, because what the file only did was import sublib, which hit the cache, so nothing really gets done.
You have to reload the actual module for its content to takes effect. But it does not work magically : it will create new objects (module-level definitions) and put them into the same module object, but it can't update references that may exist on the preceding module's content. That is why we see a "blue" even after the module has been reloaded : the package_color is a reference to the first version's color variable, it does not get updated when the module is reloaded. This is dangerous : there may be different copies of similar things lying around.
why modules.FactoryClass has been updated
You are reloading modules.implementation in this case. What happens is that it reloads the whole file to populate the module object, I highlighted the perceived effects :
from .base import BaseClass, FactoryClass # relative library "base" already in the `sys.modules` cache
class First(BaseClass): # was already defined, redefined
pass
class Second(BaseClass): # was not defined yed, created
pass
FactoryClass.add_to_registry(First) # overwritting the registry for "First" with the redefinition of class `First`
FactoryClass.add_to_registry(Second) # registering for "Second" the definition of class `Second`
You can see it another way :
>>> import modules
>>> from importlib import reload
>>> First_before = modules.implementation.First
>>> reload(modules.implementation)
<module 'modules.implementation' from 'C:\\PycharmProjects\\stack_overflow\\68069397\\modules\\implementation.py'>
>>> First_after = modules.implementation.First
>>> First_before_reload is First_after_reload
False
When you are reloading a module, Python will re-execute all the code, in a way that may be different than the previous time(s). Here, each time you are doing FactoryClass.add_to_registry so the FactoryClass gets updated with the (re)definitions.
Why are now modules.FactoryClass and modules.base.FactoryClass different objects?
Because you reloaded modules.base, creating a new FactoryClass class object, but the from .base import BaseClass, FactoryClass does not get reloaded, so it is still using the class object from before the reload.
Because you reloaded, you got yourselves copy of everything. The problem is that you still have lingering references to versions of before the reload.
I hope it answers your questions.
Import is not easy, but reloading is notably tricky.
If you truly desire to reload your code, then you will have to take extra extra extra care to correctly re-import everything, in the correct order (if such an order exist, if there is not too much side-effects). You need a proper script to update everything correctly, and in case it does not work perfectly, you will frequently have horrible, sad and mind-bending bugs.
But if you prefer keep your sanity, if your workflow does not require you to reload parts of the program, just close it and restart it. There is nothing wrong with that. It is the simple solution. The sane solution.
TL;DR: don't use importlib.reload if you don't know how it works exactly and understand the risks.
tl;dr
I create a dummy module module in /tmp, insert its path do sys.path, import the dummy module, del it when it's not needed anymore, remove it sys.path and I am still able to import it. Why?
Full question
Consider this pytest fixture code and its comments:
def test_function(temp_dir):
with open(os.path.join(temp_dir, 'dum.py'), 'w') as f:
f.write("""
class A:
\"\"\"
test docstring
\"\"\"
test: int
test_2: str = "asdf"
class B(A):
pass
""")
import sys
sys.path.insert(1, temp_dir)
assert temp_dir in sys.path # assertion passes
# noinspection PyUnresolvedReferences
import dum # import successful
yield dum
# teardown executed after each usage of the fixture
del dum # deletion successful
sys.path.remove(temp_dir) # removing from path successful
assert temp_dir not in sys.path # assertion passes
with pytest.raises(ModuleNotFoundError): # fails - importing dum is successful
import dum
Why? How do I actually remove it? It happens even when I force sys.path to be empty by executing sys.path = []. There is no dum.py anywhere in the project directory. Am I missing something? Is Python caching the imported modules even after their deletion?
After you import a module once, the next time you want to import it, Python first checks the collection of already imported modules, and uses the module that was already loaded. Therefore, the second time you import a module, sys.path is not even consulted.
This is essential, if you think of it: suppose you are directly importing some modules, and each of them uses, some other modules, and so on. If a module had to be retrieved from disk and parsed, every time an import statement was encountered, you would be spending and awful lot of time just waiting for modules to load.
It is also worth noting that the second time a module A imports a module B, nothing really happens. There are ways to explicitly reload an already-loaded module, but usually, you do not want to do that (web servers with live update are a notable exception).
Great and detailed information was provided by Amitai Irron in his answer, so go there to understand why. To answer the question of how to actually remove it (and not be able to import it again), you need to execute not only all the deletions for the original question but also:
del sys.modules['dum']
This removes the module from sys.modules which is keeping all the cached modules. Thanks to 0x5453 for pointing me to sys.modules.
I have a main file that imports a class from another file as such:
from pybrain.rl.environments.HoldemTask import HoldemTask.
When I change HoldemTask.py, the changes are not reflected in the main file. The only workaround I have found is to run Pybrain's
python setup.py install
Can I reload the module or something? Reload() doesn't seem to work.
First off: python setup.py install generally makes a copy of the code it is installing, so if you're finding that you need to run that before changes take effect, chances are that for development you should be adjusting your PYTHONPATH or sys.path so that your relevant imports come directly from the source tree rather than from the Python site-packages library. You can quickly check which file your code is importing by putting this on the top of the main file when you run it:
from pybrain.rl.environments import HoldemTask # module object, not class
print(HoldemTask.__file__)
Secondly, in general it is far better to restart a Python process when making code changes to ensure that they come into effect. If you really need to get changes to show up without a restart, read on.
Reloading a module in Python only affects future imports. For a reload to work in-process, you have to replace the imported class object after the reload. For example, in the context of the "main file" performing the import you listed (inside a class method or function is fine):
# we need a module object to reload(), not the class inside it
from import pybrain.rl.environments import HoldemTask as HoldemTask_module
reload(HoldemTask_module)
# we then need to replace the old class object with the reloaded one
# in the main file's module-wide (aka "global") namespace
global HoldemTask
HoldemTask = HoldemTask_module.HoldemTask
One final caveat here is that any existing HoldemTask objects will continue to use the old code, because they embed in themselves a reference to the pre-reload class object. The only way for an in-process reload to be complete is if the code is specifically written to throw away every instance of anything it made based on the original module.
well, as the title say, i got a group of import, all import a class, all in the same folder as the script running it:
from lvl import lvl
from get import get
from image import image
from video import vid
from video import MLStripper
from system import system
from setting import setting
from listsearch import lists
python3 doesn't have reload iirc but there is imp.reload() but it doesn't seem to work,
it just throw a error saying it not a module (it a class so it doesn't work)
after every little edit in those class that are imported, i would need to restart the script
isn't there a way to reload/reimport the class to show the effect of the edit without needing to start the script or rewriting most of the script so that imp.reload() works?
python3, linux (but prefer if it also work on window)
edit1:
example: if i use:
import system
system.system.temp()
it return:
65°C
if i change it to show °F and reload it using imp.reload
imp.reload(system)
system.system.temp()
it return:
149°F
so, it works BUT if i use
import system as _system
from system import system
system.temp()
it return:
65°C
then i change it to show °F and reload it using imp.reload
imp.reload(_system)
from system import system
system.temp()
it still return
65°C
BUT again, if i call it this way:
_system.system.temp()
it return
149°F
idk why it that but it is cause it happen in a while loop?
edit2:
file name: system.py:
before changing for test:
class system:
def temp():
temperature = open("/sys/class/thermal/thermal_zone0/temp","r").read()
temperature = temperature[:2]
return(temperature+"°C")
after changing for test:
class system:
def temp():
temperature = open("/sys/class/thermal/thermal_zone0/temp","r").read()
temperature = temperature[:2]
temperature = str(9.0 / 5.0 * int(temperature) + 32).split(".")[0]
return(temperature+"°C")
You can only reload a module:
The argument must be a module object, so it must have been successfully imported before.
In your case, you don't have any reference to the module object. You will have to import it, even if you don't want to use it for anything else, just for the sake of calling reload later.
Also, after the reload, you have re-import the names:
Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.
When you do from foo import bar, that bar is a "name external to the module", so you have to rebind it explicitly.
If you think about it, it has to work this way. There's no way reload can enumerate all objects whose definition is dependent on the previous version of the module to update them. Even if it could, there could be infinite cycles. And what would happen if the new version didn't even have a definition for a class that was in the old one? Or if the class were defined dynamically?
Looking at it another way, from foo import bar is very similar to import foo; bar = foo.bar. bar is a name in your namespace, not foo's namespace, so reload(foo) will not touch it; you need to copy the new foo.bar over again.
The easy way to solve all of these problems is to just repeat all of your from foo import bar lines after the reload.
For simple cases:
import video
from video import MLStripper
# ... later
imp.reload(video)
from video import MLStripper
However, most of your examples have an obvious naming conflict: once you from video import video, you can no longer reload(video). So, you need another reference to the video module object.
Python keeps one around for you, which you can use:
imp.reload(sys.modules['video'])
from video import MLStripper
Or, alternatively, you can use an as clause, or just an = assignment, to give it whatever name you want.
import video as _video
from video import video
# ... later
imp.reload(_video)
from video import video
From your comments and your edited question, it sounds like you have a further problem. Let's use one of the simple non-colliding cases to discuss it.
I believe you're actually doing something like this:
import video
from video import MLStripper
stripper = MLStripper('foo")
# ... later
imp.reload(video)
from video import MLStripper
The first line will successfully reload the video module, and the second will copy its MLStripper class into your globals, so any new MLStripper instances you created will be of the new type.
But that doesn't affect any existing MLStripper instances, like stripper.
Just like MLStripper was, stripper is one of those "names external to the module". But it's actually even worse. In order to adjust it, reload would have to figure out what its state would have been, had the new version of the code been in effect from the time it was created. It should be obvious that this is an unsolvable problem.
If you know the instances you want to patch up, you can deal with them in effectively the same way you dealt with the classes: just create them again:
imp.reload(video)
from video import MLStripper
stripper = MLStripper('foo")
If that's not good enough, there are three hacky possibilities that may be what you want:
Monkeypatch the methods, attributes, etc. into the instance(s) and their __class__(es).
Patch the instances' __class__ attribute directly, so anything that was inherited from the class will now be inherited from the new class.
Serialize the instances with pickle before the reload, then deserialize after.
For very simple cases, all three of these will work. For more complex cases, you will have to understand what you're doing.
Note that you can wrap a lot of this stuff up in a function, but you have to understand how locals and globals work (and how import and reload work) or you're going to end up confusing yourself.
A simpler solution is to just create "dump all state" and "load all state" functions. Then you can dump everything, quit, relaunch, and restore. The Python tutorial and the ipython docs both describe a few different ways to do this in place of using reload; it's probably worth going back and rereading those.
Access the module through sys.modules, reload that, then reassign the imported names:
imp.reload(sys.modules['lvl'])
from lvl import lvl
imp.reload(sys.modules['get'])
from get import get
etc.
All the from something import name syntax does is import something then bind name to the same object something.name refers to.
By using sys.modules you don't have to explicitly import the module again, and can reach the new definitions of the objects for rebinding after reloading.
I need to make a copy of a socket module to be able to use it and to have one more socket module monkey-patched and use it differently.
Is this possible?
I mean to really copy a module, namely to get the same result at runtime as if I've copied socketmodule.c, changed the initsocket() function to initmy_socket(), and installed it as my_socket extension.
You can always do tricks like importing a module then deleting it from sys.modules or trying to copy a module. However, Python already provides what you want in its Standard Library.
import imp # Standard module to do such things you want to.
# We can import any module including standard ones:
os1=imp.load_module('os1', *imp.find_module('os'))
# Here is another one:
os2=imp.load_module('os2', *imp.find_module('os'))
# This returns True:
id(os1)!=id(os2)
Python3.3+
imp.load_module is deprecated in python3.3+, and recommends the use of importlib
#!/usr/bin/env python3
import sys
import importlib.util
SPEC_OS = importlib.util.find_spec('os')
os1 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os1)
sys.modules['os1'] = os1
os2 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os2)
sys.modules['os2'] = os2
del SPEC_OS
assert os1 is not os2, \
"Module `os` instancing failed"
Here, we import the same module twice but as completely different module objects. If you check sys.modules, you can see two names you entered as first parameters to load_module calls. Take a look at the documentation for details.
UPDATE:
To make the main difference of this approach obvious, I want to make this clearer: When you import the same module this way, you will have both versions globally accessible for every other module you import in runtime, which is exactly what the questioner needs as I understood.
Below is another example to emphasize this point.
These two statements do exactly the same thing:
import my_socket_module as socket_imported
socket_imported = imp.load_module('my_socket_module',
*imp.find_module('my_socket_module')
)
On second line, we repeat 'my_socket_module' string twice and that is how import statement works; but these two strings are, in fact, used for two different reasons.
Second occurrence as we passed it to find_module is used as the file name that will be found on the system. The first occurrence of the string as we passed it to load_module method is used as system-wide identifier of the loaded module.
So, we can use different names for these which means we can make it work exactly like we copied the python source file for the module and loaded it.
socket = imp.load_module('socket_original', *imp.find_module('my_socket_module'))
socket_monkey = imp.load_module('socket_patched',*imp.find_module('my_socket_module'))
def alternative_implementation(blah, blah):
return 'Happiness'
socket_monkey.original_function = alternative_implementation
import my_sub_module
Then in my_sub_module, I can import 'socket_patched' which does not exist on system! Here we are in my_sub_module.py.
import socket_patched
socket_patched.original_function('foo', 'bar')
# This call brings us 'Happiness'
This is pretty disgusting, but this might suffice:
import sys
# if socket was already imported, get rid of it and save a copy
save = sys.modules.pop('socket', None)
# import socket again (it's not in sys.modules, so it will be reimported)
import socket as mysock
if save is None:
# if we didn't have a saved copy, remove my version of 'socket'
del sys.modules['socket']
else:
# if we did have a saved copy overwrite my socket with the original
sys.modules['socket'] = save
Here's some code that creates a new module with the functions and variables of the old:
def copymodule(old):
new = type(old)(old.__name__, old.__doc__)
new.__dict__.update(old.__dict__)
return new
Note that this does a fairly shallow copy of the module. The dictionary is newly created, so basic monkey patching will work, but any mutables in the original module will be shared between the two.
Edit: According to the comment, a deep copy is needed. I tried messing around with monkey-patching the copy module to support deep copies of modules, but that didn't work. Next I tried importing the module twice, but since modules are cached in sys.modules, that gave me the same module twice. Finally, the solution I hit upon was removing the modules from sys.modules after importing it the first time, then importing it again.
from imp import find_module, load_module
from sys import modules
def loadtwice(name, path=None):
"""Import two copies of a module.
The name and path arguments are as for `find_module` in the `imp` module.
Note that future imports of the module will return the same object as
the second of the two returned by this function.
"""
startingmods = modules.copy()
foundmod = find_module(name, path)
mod1 = load_module(name, *foundmod)
newmods = set(modules) - set(startingmods)
for m in newmods:
del modules[m]
mod2 = load_module(name, *foundmod)
return mod1, mod2
Physically copy the socket module to socket_monkey and go from there? I don't feel you need any "clever" work-around... but I might well be over simplifying!