Python class, data structure and proper architecture

I'm writing a program which requests user information from different services, combines it in various ways, manages some state, and does some Slack interaction.
All my Python projects become problematic at a certain size: imports start to become recursive, and passing data around becomes annoying.
A quick example of a problem I just came across can be shown with this simple case. I have a main module (here A) which creates the main objects (singletons).
These objects need to call functions from each other, so I use the main module as a connector. In this example I don't understand why, when B is created, the list it requests from A is None. The getter function is not necessarily the way I'll go, but it helped in another situation. Do you have any tips, or reading to point me to, on how to structure middle-sized Python programs? Thanks!
Module A (A.py):
import B

some_list = None
b = None

def get_list():
    return some_list

if __name__ == "__main__":
    some_list = [1, 2, 3]
    b = B.B()
    print(b.my_list)
And module B (B.py):
from A import get_list

class B:
    def __init__(self):
        self.my_list = get_list().map(lambda v: v * 2)  # CRASH HERE!

You have two copies of the main module now, each a separate entry in sys.modules:
The initial Python script, started from the command-line, is always called __main__.
You imported the A.py file as the A module. This is separate from the __main__ module.
Yes, the same source file provided both modules, but Python sees them as distinct.
As a result, that second copy does not have the if __name__ == '__main__': block executed, because the __name__ variable is set to 'A' instead. As such, A.some_list and A.b remain set to None; you wanted __main__.some_list and __main__.b instead.
Don't put code in your main entry point that other modules need to import to have access to. Pass in such dependencies, or have them managed by a separate module that both the main module and other modules can import.
You could, for example, pass in the function to the B() class:
b = B.B(get_list)
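A minimal sketch of that approach (assuming B stores and calls whatever getter it is given; the parameter name list_getter is illustrative):

# B.py -- no longer imports A, so the circular import disappears
class B:
    def __init__(self, list_getter):
        # The getter is only called here, after __main__ has set some_list.
        self.my_list = [v * 2 for v in list_getter()]

Because the getter is injected from __main__, it reads __main__.some_list, which the entry-point block has already set to [1, 2, 3].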


Why are different objects created when using globals in a file vs importing them?

Below is a simple code example that may help to explain my question.
file_1.py
from functools import lru_cache
from file_2 import add_stuff, add_stats

@lru_cache()
def add(x, y):
    return x + y

if __name__ == "__main__":
    add(1, 2)
    add(1, 2)
    add(3, 4)
    print(add.cache_info)
    print(add.cache_info())
    add_stuff(1, 2)
    add_stuff(3, 4)
    add_stats()
file_2.py
def add_stuff(x, y):
    from file_1 import add
    add(x, y)

def add_stats():
    from file_1 import add
    print(add.cache_info)
    print(add.cache_info())
And the output looks like this:
<built-in method cache_info of functools._lru_cache_wrapper object at 0x017E9E48>
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
<built-in method cache_info of functools._lru_cache_wrapper object at 0x017E9D40>
CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)
When I use the function inside the file it was defined in, the function object is different from the one other files get when they import it. This means that for things like lru_cache, if you didn't realize this, you could be populating two caches inside your process if you don't keep the cached functions in a separate file from where they are used.
My question is, is this a python gotcha to look out for? Or is there documentation somewhere that I just never read that explains this more in depth? I looked at the lru_cache documentation, and this was not called out there as anything to be aware of.
When I use the function inside of the file it was defined in, the function object is different from when another file imports it.
Yes; there are two separate caches, because each is decorating a separate function object. The reason there are two separate function objects is because there are two separate modules created from the same source code.
One of these modules was created by from file_1 import add, which causes a module to be cached in sys.modules under the key 'file_1', with a __name__ attribute of 'file_1'. (Subsequent imports will look this module up in the cache.)
The other one is created by running file_1.py as the main script. This causes a module with a __name__ attribute of __main__ to be created.
This is why and how the if __name__ == '__main__': trick works. The global variables available to a module - i.e., what you get by using globals() - come from attributes of the module object. Top-level scripts are also represented with module objects - they just aren't imported using import (although they are created using much of the same machinery, and cached; a '__main__' key will appear in sys.modules). That's where the information comes from, and thus why __name__ exists as a global variable in normal circumstances.
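This is easy to observe directly (a small check, run as a script):

import sys
print(__name__)                     # __main__
print(sys.modules['__main__'])      # the module object for this same file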
is this a python gotcha to look out for? Or is there documentation somewhere that I just never read that explains this more in depth? I looked at the lru_cache documentation, and this was not called out there as anything to be aware of.
It isn't explained in the lru_cache documentation because it isn't lru_cache's fault. It would happen with any decorator. In fact, it would happen with any code that makes the separate identity of the function objects relevant. For example, if we create this module_example.py:
def example():
    print(example.module)

example.module = __name__

if __name__ == '__main__':
    example()
(The recurrence of __name__ should make it obvious what is going on - although, of course, we could just use __name__ directly in the function.)
Now we test the code in interactive mode - run it, use the global function, and then import the module and use the imported function:
$ python -i module_example.py
__main__
>>> example()
__main__
>>> import module_example
>>> module_example.example()
module_example
>>> quit()
$
This is only a gotcha insofar as expecting a module to work both as the top-level script and as something importable imposes some design considerations. Normally, if the code is intended to be imported, the "driver" block (if any) will just do an informal test, or offer a simple, one-off UI for the module's functionality that doesn't care about consistency with an imported-module version of the same code.
Put another way: the real problem here is a circular import. file_1 is indirectly importing itself to get at its own functionality, and it only "works" because of the implicit renaming of the module to __main__ the first time.
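A common way to sidestep the whole issue (a sketch, reusing the file names from the question) is to keep all importable code in file_1.py and start the program from a separate, trivial entry-point script, so only one module object is ever created from file_1.py:

# main.py -- thin entry point; file_1 is now only ever imported
from file_1 import add
from file_2 import add_stuff, add_stats

add(1, 2)
add(1, 2)
add_stuff(3, 4)
add_stats()   # reports the single shared cache: hits=1, misses=2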

accessing and changing module level variable [duplicate]

I've run into a bit of a wall importing modules in a Python script. I'll do my best to describe the error, why I run into it, and why I'm trying this particular approach to solve my problem (which I will describe in a second):
Let's suppose I have a module in which I've defined some utility functions/classes, which refer to entities defined in the namespace into which this auxiliary module will be imported (let "a" be such an entity):
module1:
def f():
    print a
And then I have the main program, where "a" is defined, into which I want to import those utilities:
import module1

a = 3
module1.f()
Executing the program will trigger the following error:
Traceback (most recent call last):
  File "Z:\Python\main.py", line 10, in <module>
    module1.f()
  File "Z:\Python\module1.py", line 3, in f
    print a
NameError: global name 'a' is not defined
Similar questions have been asked in the past (two days ago, d'uh) and several solutions have been suggested, but I don't really think they fit my requirements. Here's my particular context:
I'm trying to make a Python program which connects to a MySQL database server and displays/modifies data with a GUI. For cleanliness' sake, I've defined the bunch of auxiliary/utility MySQL-related functions in a separate file. However, they all share a common variable, which I had originally defined inside the utilities module: the cursor object from the MySQLdb module.
I later realised that the cursor object (which is used to communicate with the db server) should be defined in the main module, so that both the main module and anything imported into it can access that object.
End result would be something like this:
utilities_module.py:
def utility_1(args):
    # code which references a variable named "cur"
    ...

def utility_n(args):
    # etcetera
    ...
And my main module:
program.py:
import MySQLdb, Tkinter

db = MySQLdb.connect(...)  # blahblah
cur = db.cursor()          # cur is defined!
from utilities_module import *
And then, as soon as I try to call any of the utilities functions, it triggers the aforementioned "global name not defined" error.
A particular suggestion was to have a "from program import cur" statement in the utilities file, such as this:
utilities_module.py:
from program import cur
#rest of function definitions
program.py:
import Tkinter, MySQLdb

db = MySQLdb.connect(...)  # blahblah
cur = db.cursor()          # cur is defined!
from utilities_module import *
But that's cyclic import or something like that and, bottom line, it crashes too. So my question is:
How in hell can I make the "cur" object, defined in the main module, visible to those auxiliary functions which are imported into it?
Thanks for your time and my deepest apologies if the solution has been posted elsewhere. I just can't find the answer myself and I've got no more tricks in my book.
Globals in Python are global to a module, not across all modules. (Many people are confused by this, because in, say, C, a global is the same across all implementation files unless you explicitly make it static.)
There are different ways to solve this, depending on your actual use case.
Before even going down this path, ask yourself whether this really needs to be global. Maybe you really want a class, with f as an instance method, rather than just a free function? Then you could do something like this:
import module1
thingy1 = module1.Thingy(a=3)
thingy1.f()
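For that to work, module1 would define something along these lines (a sketch; the name Thingy just matches the call above):

# module1.py
class Thingy(object):
    def __init__(self, a):
        self.a = a          # explicit instance state instead of a global
    def f(self):
        print(self.a)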
If you really do want a global, but it's just there to be used by module1, set it in that module.
import module1
module1.a=3
module1.f()
On the other hand, if a is shared by a whole lot of modules, put it somewhere else, and have everyone import it:
import shared_stuff
import module1
shared_stuff.a = 3
module1.f()
… and, in module1.py:
import shared_stuff
def f():
    print shared_stuff.a
Don't use a from import unless the variable is intended to be a constant. from shared_stuff import a would create a new a variable initialized to whatever shared_stuff.a referred to at the time of the import, and this new a variable would not be affected by assignments to shared_stuff.a.
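A quick demonstration of that pitfall (a sketch, assuming shared_stuff.py contains a = 0):

import shared_stuff
from shared_stuff import a   # binds a new name to the current value of shared_stuff.a

shared_stuff.a = 3
print(shared_stuff.a)        # 3 -- reads the live module attribute
print(a)                     # 0 -- the from-imported copy is unaffected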
Or, in the rare case that you really do need it to be truly global everywhere, like a builtin, add it to the builtin module. The exact details differ between Python 2.x and 3.x (in 2.x the module is named __builtin__, without an "s"); in 3.x, it works like this:
import builtins
import module1
builtins.a = 3
module1.f()
As a workaround, you could consider setting environment variables in the outer layer, like this.
main.py:
import os
os.environ['MYVAL'] = str(myintvariable)
mymodule.py:
import os
myval = None
if 'MYVAL' in os.environ:
    myval = os.environ['MYVAL']
As an extra precaution, handle the case when MYVAL is not defined inside the module.
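Also note that environment variables can only hold strings, so a numeric value has to be converted back on the receiving side, e.g.:

myval = int(os.environ['MYVAL']) if 'MYVAL' in os.environ else None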
This post is just an observation about Python behaviour that I encountered. Maybe the advice you read above doesn't work for you if you did the same thing I did below.
Namely, I have a module which contains global/shared variables (as suggested above):

#sharedstuff.py
globaltimes_randomnode = []
globalist_randomnode = []

Then I had the main module which imports the shared stuff with:
import sharedstuff as shared
and some other modules that actually populate these arrays. These are called by the main module. When exiting these other modules I can clearly see that the arrays are populated. But when reading them back in the main module, they were empty. This was rather strange to me (well, I am new to Python). However, when I changed the way I import sharedstuff.py in the main module to:
from sharedstuff import *
it worked (the arrays were populated).
Just sayin'
A function uses the globals of the module it's defined in. Instead of setting a = 3, for example, you should be setting module1.a = 3. So, if you want cur available as a global in utilities_module, set utilities_module.cur.
A better solution: don't use globals. Pass the variables you need into the functions that need it, or create a class to bundle all the data together, and pass it when initializing the instance.
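For the cursor case, such a class might look like this (a sketch; the class name and query are illustrative, not from the original code):

class Utilities(object):
    def __init__(self, cur):
        self.cur = cur       # the cursor is explicit state, not a global
    def utility_1(self, user_id):
        self.cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
        return self.cur.fetchall()

# program.py
# db = MySQLdb.connect(...); utils = Utilities(db.cursor())
# rows = utils.utility_1(42)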
The easiest solution to this particular problem would have been to add another function within the module that would have stored the cursor in a variable global to the module. Then all the other functions could use it as well.
module1:
cursor = None

def setCursor(cur):
    global cursor
    cursor = cur

def method(some, args):
    do_stuff(cursor, some, args)
main program:
import module1

cursor = get_a_cursor()
module1.setCursor(cursor)
module1.method("some", "args")
Since globals are module-specific, you can add the following function to each imported module, and then use it to:

add individual variables (in dictionary format) to that module's globals
transfer your main module's globals to it

addglobals = lambda x: globals().update(x)
Then, to pass on the current globals, all you need is:
import module
module.addglobals(globals())
Since I haven't seen it in the answers above, I thought I would add my simple workaround, which is just to add a global_dict argument to the function requiring the calling module's globals, and then pass the dict in when calling; e.g.:

# external_module.py
def imported_function(global_dict=None):
    print(global_dict["a"])

# calling_module.py
a = 12
from external_module import imported_function
imported_function(global_dict=globals())   # prints 12
The OOP way of doing this would be to make your module a class instead of a set of unbound methods. Then you could use __init__ or a setter method to set the variables from the caller for use in the module methods.
Update
To test the theory, I created a module and put it on pypi. It all worked perfectly.
pip install superglobals
Short answer
This works fine in Python 2 or 3:
import inspect

def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals
save as superglobals.py and employ in another module thusly:
from superglobals import *
superglobals()['var'] = value
Extended Answer
You can add some extra functions to make things more attractive.
def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals

def getglobal(key, default=None):
    """
    getglobal(key[, default]) -> value

    Return the value for key if key is in the global dictionary, else default.
    """
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals.get(key, default)

def setglobal(key, value):
    _globals = superglobals()
    _globals[key] = value

def defaultglobal(key, value):
    """
    defaultglobal(key, value)

    Set the value of global variable `key` if it is not otherwise set.
    """
    _globals = superglobals()
    if key not in _globals:
        _globals[key] = value
Then use thusly:
from superglobals import *
setglobal('test', 123)
defaultglobal('test', 456)
assert getglobal('test') == 123
Justification
The "python purity league" answers that litter this question are perfectly correct, but in some environments (such as IDAPython), which are basically single-threaded with a large globally instantiated API, it just doesn't matter as much.
It's still bad form and a bad practice to encourage, but sometimes it's just easier. Especially when the code you are writing isn't going to have a very long life.

Module namespace initialisation before execution

I'm trying to dynamically update code during runtime by reloading modules using importlib.reload. However, I need a specific module variable to be set before the module's code is executed. I could easily set it as an attribute after reloading but each module would have already executed its code (e.g., defined its default arguments).
A simple example:
# module.py
def do():
    try:
        print(a)
    except NameError:
        print('failed')
# main.py
import module
module.do() # prints failed
module.a = 'succeeded'
module.do() # prints succeeded
The desired pseudocode:
import_module_without_executing_code module
module.initialise(a = 'succeeded')
module.do()
Is there a way to control module namespace initialisation (like with classes using metaclasses)?
It's not usually a good idea to use reload other than for interactive debugging. For example, it can easily create situations where two objects of type module.A are not the same type.
What you want is execfile. Pass a globals dictionary (you don't need an explicit locals dictionary) to keep each execution isolated; anything you store in it ahead of time acts exactly like the "pre-set" variables you want. If you do want to have a "real" module interface change, you can have a wrapper module that calls (or just holds as an attribute) the most recently loaded function from your changing file.
Of course, since you're using Python 3, you'll have to use one of the replacements for execfile.
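For instance, a rough Python 3 stand-in for execfile (the helper name exec_file is mine, not a standard function):

def exec_file(path, globals_dict):
    with open(path) as f:
        code = compile(f.read(), path, 'exec')
    exec(code, globals_dict)

namespace = {'a': 'succeeded'}   # the "pre-set" variables
exec_file('module.py', namespace)
namespace['do']()                # prints: succeeded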
Strictly speaking, I don't believe there is a way to do what you're describing in Python natively. However, assuming you own the module you're trying to import, a common approach with Python modules that need some initializing input is to use an init function.
If all you need is for some internal variables to be set, like a in your example above, that's easy: just declare some module-global variables and set them in your init function:
Demo: https://repl.it/MyK0
Module:
## mymodule.py
a = None

def do():
    print(a)

def init(_a):
    global a
    a = _a
Main:
## main.py
import mymodule
mymodule.init(123)
mymodule.do()
mymodule.init('foo')
mymodule.do()
Output:
123
foo
Where things can get trickier is if you need to actually redefine some functions because some dynamic internal something is dependent on the input you give. Here's one solution, borrowed from https://stackoverflow.com/a/1676860. Basically, the idea is to grab a reference to the current module by using the magic variable __name__ to index into the system module dictionary, sys.modules, and then define or overwrite the functions that need it. We can define the functions locally as inner functions, then add them to the module:
Demo: https://repl.it/MyHT/2
Module:
## mymodule.py
import sys

def init(a):
    current_module = sys.modules[__name__]

    def _do():
        try:
            print(a)
        except NameError:
            print('failed')

    current_module.do = _do
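Used like this (a sketch following the pattern above):

## main.py
import mymodule

mymodule.init('succeeded')
mymodule.do()   # prints: succeeded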

executing python code from string loaded into a module

I found the following code snippet that I can't seem to make work for my scenario (or any scenario at all):
def load(code):
    # Delete all local variables
    globals()['code'] = code
    del locals()['code']

    # Run the code
    exec(globals()['code'])

    # Delete any global variables we've added
    del globals()['load']
    del globals()['code']

    # Copy k so we can use it
    if 'k' in locals():
        globals()['k'] = locals()['k']
        del locals()['k']

    # Copy the rest of the variables
    for k in locals().keys():
        globals()[k] = locals()[k]
I created a file called "dynamic_module" and put this code in it, which I then used to try to execute the following code which is a placeholder for some dynamically created string I would like to execute.
import random
import datetime

class MyClass(object):
    def main(self, a, b):
        r = random.Random(datetime.datetime.now().microsecond)
        a = r.randint(a, b)
        return a
Then I tried executing the following:
import dynamic_module
dynamic_module.load(code_string)
return_value = dynamic_module.MyClass().main(1,100)
When this runs it should return a random number between 1 and 100. However, I can't seem to get the initial snippet I found to work for even the simplest of code strings. I think part of my confusion in doing this is that I may misunderstand how globals and locals work and therefore how to properly fix the problems I'm encountering. I need the code string to use its own imports and variables and not have access to the ones where it is being run from, which is the reason I am going through this somewhat over-complicated method.
You should not be using the code you found. It has several big problems, not least that most of it doesn't actually do anything (locals() is a proxy; deleting from it has no effect on the actual locals; it puts any code you execute in the same shared globals; etc.).
Use the accepted answer in that post instead; recast as a function, it becomes:

import imp

def load_module_from_string(code, name='dynamic_module'):
    module = imp.new_module(name)
    exec(code, module.__dict__)
    return module
then just use that:
dynamic_module = load_module_from_string(code_string)
return_value = dynamic_module.MyClass().main(1, 100)
The function produces a new, clean module object.
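On Python 3.4 and later, where the imp module is deprecated, the same thing can be written with types.ModuleType (a sketch):

import types

def load_module_from_string(code, name='dynamic_module'):
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module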
In general, this is not how you should dynamically import and use external modules. You should be using __import__ within your function to do this. Here's a simple example that worked for me:
import numpy as np  # needed for the plotting example

plt = __import__('matplotlib.pyplot', fromlist=['plt'])
plt.plot(np.arange(5), np.arange(5))
plt.show()
I imagine that for your specific application (loading from code string) it would be much easier to save the dynamically generated code string to a file (in a folder containing an __init__.py file) and then to call it using __import__. Then you could access all variables and functions of the code as parts of the imported module.
Unless I'm missing something?

Find variables defined in other module (python)

I have a module testing system in Python where individual modules call something like:
import unittest

class Hello(object):
    _DOC_ATTR = { 'greeting': '''
        a greeting message.

        >>> h = Hello()
        >>> h.greeting = 'hi there'
        >>> h.greeting
        'hi there'
        ''' }

    def __init__(self):
        self.greeting = "hello"

class Test(unittest.TestCase):
    pass  # tests here

if __name__ == '__main__':
    import tester
    tester.test(Test)
inside tester, I run the tests in Test along with a doctest on "__main__". This works great and has worked fine for a long time. Our specialized _DOC_ATTR dictionary documents individual attributes on the function when we build into Sphinx. However, doctests within this dictionary are not called. What I would like to do is within tester.test() to run doctests on the values in each class's _DOC_ATTR as well.
The problem that I'm having is trying to find a way, within tester.test(), to figure out all the variables (specifically classes) defined in __main__. I've tried looking at relevant places in the traceback module to no avail. I thought that because I was passing in a class from __main__, namely __main__.Test, I'd be able to use Test.__module__ to get access to the local variables there, but I can't figure out how to do it.
I would rather not need to alter the call to tester.test(Test) since it's used in hundreds of modules and I've trained all the programmers working on the project to follow this paradigm. Thanks for any help!
I think that I may have found the answer:
import inspect

stacks = inspect.stack()
if len(stacks) > 1:
    outerFrame = stacks[1][0]
else:
    outerFrame = stacks[0][0]

localVariables = outerFrame.f_locals
for lv in list(localVariables.keys()):
    lvk = localVariables[lv]
    if inspect.isclass(lvk):
        docattr = getattr(lvk, '_DOC_ATTR', None)
        if docattr is not None:
            pass  # ... do something with docattr ...
Another solution: since we are passing the "Test" class in, and since for it to run there needs to be a "runTest" function defined, one could also use the func_globals attribute of that function. Note that it cannot be a function inherited from a superclass, such as __init__, so this may have limited applicability for wider use cases.
import inspect

localVariables = Test.runTest.func_globals
for lv in list(localVariables.keys()):
    lvk = localVariables[lv]
    if inspect.isclass(lvk):
        pass  # ... etc.
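Note that func_globals is the Python 2 spelling; on Python 3 the same mapping is exposed as __globals__:

localVariables = Test.runTest.__globals__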
