I'm trying to understand how the importlib.reload function actually behaves. I'll give a boiled-down example:
import importlib
import sys
from pathlib import Path
import gc
def write_dummy_class(return_value):
    target = Path(__file__).parent / 'test_reload_import.py'
    target.write_text(
        "class Dummy:\n"
        f"    var = {return_value}\n"
        "    def run(self):\n"
        "        print(f'Dummy.run(self) >> self.var = {id(self.var):x}')\n"
        "        return self.var\n"
    )
write_dummy_class(1)
from test_reload_import import Dummy
print(f'id Dummy: {id(Dummy):x}')
print(Dummy.run)
assert Dummy().run() == 1, "Initial one failed??"
write_dummy_class(2)
old_module = sys.modules["test_reload_import"]
old_dummy = old_module.Dummy # Keep a reference alive
print(f'Reloading, old module: {id(old_module):x}')
new_module = importlib.reload(old_module)
print(f'Reloaded, new module: {id(new_module):x}')
print(f'id new Dummy: {id(new_module.Dummy):x}')
print(f'id old Dummy: {id(old_dummy):x}')
print(f'id Dummy: {id(new_module.Dummy):x}')
print(new_module.Dummy.run)
new_run = new_module.Dummy().run()
assert new_run == 2, f'Dummy.run() returned {new_run} instead of 2.'
This is the output:
id Dummy: 1dd320c0fa0
<function Dummy.run at 0x000001DD325CC700>
Dummy.run(self) >> self.var = 1dd31d06930
Reloading, old module: 1dd325c7950
Reloaded, new module: 1dd325c7950
id new Dummy: 1dd320c30d0
id old Dummy: 1dd320c0fa0
<function Dummy.run at 0x000001DD325CC790>
Dummy.run(self) >> self.var = 1dd31d06930
Traceback (most recent call last):
File "test_reload.py", line 240, in <module>
assert new_run == 2, f'Dummy.run() returned {new_run} instead of 2.'
AssertionError: Dummy.run() returned 1 instead of 2.
Observations:
- Reloading a module returns the same memory address for the module as before.
- Objects do get reloaded inside the module (the Dummy class has another id).
- But what is baffling to me is that the memory address of the class variable Dummy.var still points to the OLD one.
Could someone explain this last bit to me? How is it that the class is reloaded, but the class variables are not? Isn't the code re-interpreted? If so, shouldn't var be re-evaluated as well, and so end up at another memory address?
Which leads me to my next question: what is also not reloaded?
BTW, I know that small integers are interned and share memory addresses in Python. That is not what is at play here: since I'm changing the class variable from a 1 to a 2, it should get another memory address, or, if it kept the same address, that address should hold a different value.
But after reloading the class, the memory address of the class variable isn't updated somehow, which baffles me, and which leads me to wonder what other objects exhibit the same behavior.
(Python version: 3.9.9)
Oh, and one very strange thing is that this script works perfectly fine when running under "Debug" in PyCharm. But with "Run"... it breaks at the second assert.
Thanks a lot!
This is an import system bug, Python issue 31772. If a source file is updated quickly without changing the file length, the import system won't realize that it has changed: the cached bytecode is validated against the source's size and modification time, and the timestamp check has limited granularity.
Your importlib.reload call is re-executing stale bytecode instead of re-reading the file. That's why var isn't updated to the new value: the import system is still using the old bytecode compiled from the var = 1 version of the file. This is likely also why the script works under PyCharm's "Debug": the debugger slows execution enough that the rewrite lands on a different timestamp, so the cache is correctly invalidated.
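One possible workaround, sketched under the assumption that write_dummy_class from the question is in scope (the helper name here is hypothetical): push the file's modification time forward after each write so the cached bytecode is treated as stale.

import os
from pathlib import Path

def write_dummy_class_bumped(return_value):
    # Hypothetical variant of write_dummy_class: write the file, then bump
    # its mtime so the import system cannot mistake it for the old version.
    write_dummy_class(return_value)
    target = Path(__file__).parent / 'test_reload_import.py'
    st = os.stat(target)
    # +2 seconds clears even whole-second timestamp granularity.
    os.utime(target, (st.st_atime, st.st_mtime + 2))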
I have a function like below.
# in retrieve_data.py
import os
import logging

def create_output_csv_file_path_and_name(output_folder='outputs') -> str:
    """
    Creates an output folder in the project root if it doesn't already exist.
    Then returns the path and name of the output CSV file, which will be used
    to write the data.
    """
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        logging.info(f"New folder created for output file: {output_folder}")
    return os.path.join(output_folder, 'results.csv')
I also created a unit test file like below.
# in test_retrieve_data.py
import unittest
from unittest.mock import patch

import retrieve_data


class OutputCSVFilePathAndNameCreationTest(unittest.TestCase):
    @patch('path.to.retrieve_data.os.path.exists')
    @patch('path.to.retrieve_data.os.makedirs')
    def test_create_output_csv_file_path_and_name_calls_exists_and_makedirs_once_when_output_folder_is_not_created_yet(
        self,
        os_path_exists_mock,
        os_makedirs_mock
    ):
        os_path_exists_mock.return_value = False
        retrieve_data.create_output_csv_file_path_and_name()
        os_path_exists_mock.assert_called_once()
        os_makedirs_mock.assert_called_once()
But when I run the above unit test, I get the following error.
    def assert_called_once(self):
        """assert that the mock was called only once.
        """
        if not self.call_count == 1:
            msg = ("Expected '%s' to have been called once. Called %s times.%s"
                   % (self._mock_name or 'mock',
                      self.call_count,
                      self._calls_repr()))
            raise AssertionError(msg)

AssertionError: Expected 'makedirs' to have been called once. Called 0 times.
I tried poking around with pdb.set_trace() in create_output_csv_file_path_and_name, and I'm sure it is receiving a mocked object for os.path.exists(), but the code never gets past the os.path.exists(output_folder) check (output_folder already exists in the program folder, but I don't use it for unit testing and want to leave it alone). What could I possibly be doing wrong here in mocking os.path.exists() and os.makedirs()? Thank you in advance for your answers!
You have the arguments to your test function reversed. When you have stacked decorators, like:
#patch("retrieve_data.os.path.exists")
#patch("retrieve_data.os.makedirs")
def test_create_output_csv_file_path_...():
They apply bottom to top, so you need to write:
#patch("retrieve_data.os.path.exists")
#patch("retrieve_data.os.makedirs")
def test_create_output_csv_file_path_and_name_calls_exists_and_makedirs_once_when_output_folder_is_not_created_yet(
self, os_makedirs_mock, os_path_exists_mock
):
With this change, if I have this in retrieve_data.py:
import os
import logging

def create_output_csv_file_path_and_name(output_folder='outputs') -> str:
    """
    Creates an output folder in the project root if it doesn't already exist.
    Then returns the path and name of the output CSV file, which will be used
    to write the data.
    """
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        logging.info(f"New folder created for output file: {output_folder}")
    return os.path.join(output_folder, 'results.csv')
And this is test_retrieve_data.py:
import unittest
from unittest.mock import patch

import retrieve_data


class OutputCSVFilePathAndNameCreationTest(unittest.TestCase):
    @patch("retrieve_data.os.path.exists")
    @patch("retrieve_data.os.makedirs")
    def test_create_output_csv_file_path_and_name_calls_exists_and_makedirs_once_when_output_folder_is_not_created_yet(
        self, os_makedirs_mock, os_path_exists_mock
    ):
        os_path_exists_mock.return_value = False
        retrieve_data.create_output_csv_file_path_and_name()
        os_path_exists_mock.assert_called_once()
        os_makedirs_mock.assert_called_once()
Then the tests run successfully:
$ python -m unittest -v
test_create_output_csv_file_path_and_name_calls_exists_and_makedirs_once_when_output_folder_is_not_created_yet (test_retrieve_data.OutputCSVFilePathAndNameCreationTest.test_create_output_csv_file_path_and_name_calls_exists_and_makedirs_once_when_output_folder_is_not_created_yet) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.001s
OK
Update: I wanted to leave a comment on the diagnostics I performed here, because I didn't initially spot the reversed arguments either, but the problem became immediately apparent when I added a breakpoint() at the beginning of the test and printed out the values of the mocks:
(Pdb) p os_path_exists_mock
<MagicMock name='makedirs' id='140113966613456'>
(Pdb) p os_makedirs_mock
<MagicMock name='exists' id='140113966621072'>
The fact that the names were swapped made the underlying problem easy to spot.
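As a standalone illustration of the ordering rule (a minimal sketch, patching plain os rather than the module under test): the decorator closest to the function is applied first, so its mock arrives as the first argument.

from unittest.mock import patch

@patch("os.path.exists")   # outermost: applied last, maps to the LAST mock argument
@patch("os.makedirs")      # innermost: applied first, maps to the FIRST mock argument
def demo(makedirs_mock, exists_mock):
    print(makedirs_mock)   # <MagicMock name='makedirs' ...>
    print(exists_mock)     # <MagicMock name='exists' ...>

demo()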
Ok, so I am having a weird one. I am running Python in a SideFX Hython (their custom build) implementation that is using PDG. The only real difference between Hython and vanilla Python is some internal functions for handling geometry data and compiled nodes, which shouldn't be an issue even though they are being used.
The way the code runs, I am generating a list of files from the disk, which creates PDG work items. Those work items are then processed in parallel by PDG. Here is the code for that:
import importlib.util
import pdg
import os
from pdg.processor import PyProcessor
import json

class CustomProcessor(PyProcessor):
    def __init__(self, node):
        PyProcessor.__init__(self, node)
        self.extractor_module = 'GeoExtractor'

    def onGenerate(self, item_holder, upstream_items, generation_type):
        for upstream_item in upstream_items:
            new_item = item_holder.addWorkItem(parent=upstream_item, inProcess=True)
        return pdg.result.Success

    def onCookTask(self, work_item):
        spec = importlib.util.spec_from_file_location("callback", "Geo2Custom.py")
        GE = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(GE)
        GE.convert(f"{work_item.attribValue('directory')}/{work_item.attribValue('filename')}{work_item.attribValue('extension')}", work_item.index, f'FRAME {work_item.index}', self.extractor_module)
        return pdg.result.Success

def bulk_convert(path_pattern, extractor_module='GeoExtractor'):
    type_registry = pdg.TypeRegistry.types()
    try:
        type_registry.registerNode(CustomProcessor, pdg.nodeType.Processor, name="customprocessor", label="Custom Processor", category="Custom")
    except Exception:
        pass
    whereItWorks = pdg.GraphContext("testBed")
    whatWorks = whereItWorks.addScheduler("localscheduler")
    whatWorks.setWorkingDir(os.getcwd(), '$HIP')
    whereItWorks.setValues(f'{whatWorks.name}', {'maxprocsmenu': -1, 'tempdirmenu': 0, 'verbose': 1})
    findem = whereItWorks.addNode("filepattern")
    whereItWorks.setValue(f'{findem.name}', 'pattern', path_pattern, 0)
    generic = whereItWorks.addNode("genericgenerator")
    whereItWorks.setValue(generic.name, 'itemcount', 4, 0)
    custom = whereItWorks.addNode("customprocessor")
    custom.extractor_module = extractor_module
    node1 = [findem]
    node2 = [custom] * len(node1)
    for n1, n2 in zip(node1, node2):
        whereItWorks.connect(f'{n1.name}.output', f'{n2.name}.input')
        n2.cook(True)
        for node in whereItWorks.graph.nodes():
            node.dirty(False)
        whereItWorks.disconnect(f'{n1.name}.output', f'{n2.name}.input')
    print("FULLY DONE")
And this is Geo2Custom.py, the callback each work item loads:

import os
import hou
import traceback
import CustomWriter
import importlib

def convert(filename, frame_id, marker, extractor_module='GeoExtractor'):
    Extractor = importlib.__import__(extractor_module)
    base, ext = os.path.splitext(filename)
    if ext == '.sc':
        base = os.path.splitext(base)[0]
    dest_file = base + ".custom"
    geo = hou.Geometry()
    geo.loadFromFile(filename)
    try:
        frame = Extractor.extract_geometry(geo, frame_id)
    except Exception as e:
        print(f'F{frame_id} Geometry extraction failed: {traceback.format_exc()}.')
        return None
    print(f'F{frame_id} Geometry extracted. Writing file {dest_file}.')
    try:
        CustomWriter.write_frame(frame, dest_file)
    except Exception as e:
        print(f'F{frame_id} writing failed: {e}.')
    print(marker + " SUCCESS")
The onCookTask code is run when the work item is processed.
Inside of the GeoExtractor.py program I am importing the geometry file defined by the work item, then converting it into a couple of Pandas dataframes to collate and process the massive volumes of data quickly; the result is then passed to a custom set of functions for writing binary files to disk from the Pandas data.
Everything appears to run flawlessly, until I check my output binaries and see that they escalate in file size much more than they should, indicating that either something is being shared between instances or not cleared from memory, and that subsequent loads of the extractor code are appending to dataframes that share the same names.
I have run the GeoExtractor code sequentially, with the Python instance closing between each file conversion, using the exact same code, and the files are fine, growing only very slowly as the geometry data volume grows. So the issue has to lie somewhere in the parallelization of it using PDG and calling the GeoExtractor.py code over and over for each work item.
I have contemplated moving the importlib stuff into the class's __init__(), leaving only the call to the member function in onCookTask(). Maybe even going so far as to pass a unique variable for each work item, which GeoExtractor would use to create a closure of the internal functions so they are unique instances in memory. A sketch of that idea is below.
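A minimal sketch of that isolation idea (hypothetical helper, untested in Hython): loading the callback under a unique module name per work item guarantees each load populates a fresh module object with its own globals, so no module-level dataframes can be shared between items.

import importlib.util

def load_isolated(path, unique_id):
    # Hypothetical helper: a distinct module name per work item means each
    # exec_module call fills a brand-new module object with its own globals.
    spec = importlib.util.spec_from_file_location(f"callback_{unique_id}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

Note that a module created this way is not registered in sys.modules, but anything GeoExtractor itself imports at module level is still cached and shared process-wide.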
I tried to do a stripped-down version of GeoExtractor, and since I'm not sure where the leak is, I just ended up pulling out comments with proprietary or superfluous information and changing some custom library names, but the file ended up kinda long, so I am including a pastebin: https://pastebin.com/4HHS8D2W
As for CustomGeometry and CustomWriter, there is no working form of either of those libraries that would be NDA-safe, so unfortunately they have to stay blackboxed. CustomGeometry is a handful of container classes that organize all of the data coming out of the geometry, and the writer is a formatter/writer for the binary format we are utilizing. I am hoping the issue isn't in either of them.
Edit 1: I fixed an issue in the example code.
Edit 2: Added larger examples.
The directory structure is like so:
AppCenter/
    main.pyw
    _apps/
        __init__.py
        TabularApp.py
        UserAdministrationApp.py
        RegisterApp.py
        FnAdminApp.py
    PyUi/
The contents of __init__.py:
import sys
sys.path.insert(1, '.')

__all__ = ['TabularApp',
           'UserAdministrationApp',
           'RegisterApp',
           'FnAdminApp']
The problems pop up:
- when main.pyw tries to from _apps import *;
- in UserAdministrationApp.py, where I am trying to dynamically add tooltips to some QListWidget items like so:
for app in self.__APPS__:
    app_icon = str(os.path.join(app_icons, f"{app}.png")).replace('\\', '/')
    icon = QIcon(app_icon)
    if app != self.__class__:
        ttip_txt = eval(f'_apps.{app}.__doc__')
    else:
        ttip_txt = self.__doc__
    item = QListWidgetItem(icon, app)
    item.setText(app)
    item.setToolTip(ttip_txt)
    wdg.addItem(item)
The self.__APPS__ is just a copy of _apps.__all__.
The first problem I encountered was an AttributeError saying module x has no attribute y at ttip_txt = eval(f'_apps.{app}.__doc__'). I resolved this with from _apps import * in the UserAdministrationApp module. At that point I had renamed the module for testing purposes and everything worked, but when I changed the name back to UserAdministrationApp.py I got another AttributeError saying module _apps has no attribute UserAdministrationApp.
Questions
I tried reading the Python import docs, but nothing in them really spoke to me.
I am sensing it has something to do with the script trying to import itself.
But I am still intrigued by these questions:
- Why did the import fail in the first case, when I have import _apps?
- Why in the second case does it not at least see itself and then produce an ImportError instead of an AttributeError?
- What is the optimal way to handle these types of situations?
Okay, I found a solution, and though I think it is a bit dirty and not in the best style, it works.
First
Remove the from _apps import * and use just from _apps import __all__.
Then
In the initialization of the main class of the UserAdministrationApp module, import in a loop, skipping self.__class__.__name__:
self.__APPS__ = _apps.__all__
self.class_name = self.__class__.__name__
for app in self.__APPS__:
    if self.class_name != app:
        exec(f'import _apps.{app}')
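Incidentally, the same loop can be written without exec (a sketch, assuming it runs in the same method as above): importlib.import_module imports the submodule and, as a documented side effect, binds it as an attribute of the _apps package, which is exactly what makes _apps.{app}.__doc__ resolve afterwards.

import importlib
import _apps

for app in self.__APPS__:
    if self.class_name != app:
        # Importing "_apps.<app>" also sets the "<app>" attribute on _apps.
        importlib.import_module(f'_apps.{app}')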
Finally
for app in self.__APPS__:
    app_icon = str(os.path.join(app_icons, f"{app}.png")).replace('\\', '/')
    icon = QIcon(app_icon)
    if app != self.class_name:
        ttip_txt = eval(f'_apps.{app}.__doc__')
    else:
        ttip_txt = self.__doc__
Having found the solution, I would still like to hear why the error occurred in the first place, for educational purposes.
So if anybody at any time glances over this and knows the answer, you are more than welcome.
I'm developing a PyQt application. This application is able to load some data from a database and then run various analyses on these data. All of this works. But as the analyses can be quite complicated, and as I will not be the only user, I had to develop a system with user-defined scripts.
Basically, there's a text editor where the user can program his own small Python script (with functions). Then the user can save the script or execute it by loading the file as a module (within the application).
Here is a simplified version of my application.
The core of the application is in My_apps.py, and the plugins are in the same folder, e.g. Plugin_A.py.
This is the code of My_apps.py:
import sys, os

class Analysis(object):
    def __init__(self):
        print "I'm the core of the application, I do some analysis etc..."

    def Analyze_Stuff(self):
        self.Amplitudes_1 = [1, 2, 3, 1, 2, 3]

class Plugins(object):
    def __init__(self):
        newpath = "C:\Users\Antoine.Valera.NEUROSECRETION\Desktop\Model"  # where the file is
        sys.path.append(newpath)
        Plugin_List = []
        for module in os.listdir(newpath):
            if os.path.splitext(module)[1] == ".py":
                module = module.replace(".py", "")
                Plugin_List.append(module)
        for plugin in Plugin_List:
            a = __import__(plugin)
            setattr(self, plugin, a)

    def Execute_a_Plugin(self):
        Plugins.Plugin_A.External_Function(self)

if __name__ == "__main__":
    Analysis = Analysis()
    Plugins = Plugins()
    Plugins.Execute_a_Plugin()
And here is an example of the code of Plugin_A.py:
def External_Function(self):
    Analysis.Analyze_Stuff()
    print Analysis.Amplitudes_1
Why do I get:
Traceback (most recent call last):
File "C:\Users\Antoine.Valera.NEUROSECRETION\Desktop\Model\My_Apps.py", line 46, in <module>
Plugins.Execute_a_Plugin()
File "C:\Users\Antoine.Valera.NEUROSECRETION\Desktop\Model\My_Apps.py", line 37, in Execute_a_Plugin
Plugins.Plugin_A.External_Function(self)
File "C:\Users\Antoine.Valera.NEUROSECRETION\Desktop\Model\Plugin_A.py", line 8, in External_Function
Analysis.Analyze_Stuff()
NameError: global name 'Analysis' is not defined
But if I add the following two lines in place of Plugins.Execute_a_Plugin():
Analysis.Analyze_Stuff()
print Analysis.Amplitudes_1
then it works.
How can I indicate to every dynamically loaded plugin that it has to use the variables/objects already existing in Analysis? Why can't I print Analysis.Amplitudes_1 from within the plugin?
Thank you!!
The error message seems perfectly clear: the name "Analysis" doesn't exist in the namespace of the Plugin_A module you imported, and so External_Function cannot access it.
When you import the Plugin_A module, it doesn't get access to the names in the namespace of the importing module, My_apps. So it cannot "see" the instance of the Analysis class that you created there.
A simple solution to this is to change the signature of External_Function (and other related functions), so that it can take an instance of the Analysis class:
Plugin_A.py:
def External_Function(self, analysis):
    analysis.Analyze_Stuff()
    print analysis.Amplitudes_1
My_apps.py:
...

    def Execute_a_Plugin(self):
        plugins.Plugin_A.External_Function(self, analysis)

if __name__ == "__main__":
    analysis = Analysis()
    plugins = Plugins()
    plugins.Execute_a_Plugin()
Note that I have altered the naming so that the instance names don't shadow the class names.
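A related design option (a sketch in the same Python 2 style, not the poster's code, and assuming a one-argument External_Function): inject the Analysis instance into Plugins once, so plugins never have to reach for module-level names at all.

class Plugins(object):
    def __init__(self, analysis):
        self.analysis = analysis  # shared application state for every plugin
        # ... plugin discovery exactly as before ...

    def Execute_a_Plugin(self):
        # Assumes Plugin_A.External_Function takes just the analysis instance.
        self.Plugin_A.External_Function(self.analysis)

if __name__ == "__main__":
    analysis = Analysis()
    plugins = Plugins(analysis)
    plugins.Execute_a_Plugin()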
You have to import your module. Add the following at the top of Plugin_A.py:
from My_apps import Analysis
A = Analysis()
A.Analyze_Stuff()
print A.Amplitudes_1
I have a Python module that I've made that contains regular function definitions as well as classes. For some reason, when I call the constructor and pass a value, it's not updating the instance variable.
My module is called VODControl (VODControl.py). The class I have declared inside the module is called DRMPath. The DRMPath class has two instance variables: logfile and results. logfile is a string and results is a dictionary.
My constructor looks like this:
def __init__(self, file):
    self.logilfe = file
    self.results['GbE1'] = ""
    self.results['GbE2'] = ""
    self.results['NetCrypt'] = ""
    self.results['QAM'] = ""
When I import it from my other Python script, I do:
import VODControl
The call I use is the following:
d = VODControl.DRMPath('/tmp/adk.log')
However, when I print the value of the logfile instance variable, it isn't updated with what I passed to the constructor:
print d.logfile
After printing, it's still an empty string. What gives?
self.logilfe = file is not the same as self.logfile = file. In addition, it is likely returning None, not an empty string.
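For completeness, a corrected sketch of the constructor (assuming results is meant to be per-instance state rather than a shared class-level dictionary):

class DRMPath(object):
    def __init__(self, file):
        self.logfile = file  # fixed: 'logfile', not 'logilfe'
        # Creating the dict here keeps results per-instance instead of
        # mutating a dictionary shared by every DRMPath object.
        self.results = {'GbE1': "", 'GbE2': "", 'NetCrypt': "", 'QAM': ""}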