Structuring a program: classes and functions in Python

I'm writing a program that uses genetic techniques to evolve equations.
I want to be able to submit the function 'mainfunc' to the Parallel Python 'submit' function.
The function 'mainfunc' calls two or three methods defined in the Utility class.
They instantiate other classes and call various methods.
I think what I want is all of it in one NAMESPACE.
So I've instantiated some (maybe it should be all) of the classes inside the function 'mainfunc'.
I call the Utility method 'generate()'. If we were to follow its chain of execution,
it would involve all of the classes and methods in the code.
Now, the equations are stored in a tree. Each time a tree is generated, mutated or crossbred, the nodes need to be given a new key so they can be accessed from a dictionary attribute of the tree. The class 'KeySeq' generates these keys.
In Parallel Python, I'm going to send multiple instances of 'mainfunc' to the 'submit' function of PP. Each has to be able to access 'KeySeq'. It would be nice if they all accessed the same instance of KeySeq so that none of the nodes on the returned trees had the same key, but I could get around that if necessary.
So: my question is about stuffing EVERYTHING into mainfunc.
Thanks
(Edit) If I don't include everything in mainfunc, I have to try to tell PP about dependent functions, etc. by passing various arguments in various places. I'm trying to avoid that.
(late Edit) If ks.next() is called inside the generate() function, it raises the error 'NameError: global name 'ks' is not defined'.
class KeySeq:
    "Iterator to produce sequential integers for keys in dict"
    def __init__(self, data=0):
        self.data = data
    def __iter__(self):
        return self
    def next(self):
        self.data = self.data + 1
        return self.data

class One:
    'some code'

class Two:
    'some code'

class Three:
    'some code'

class Utilities:
    def generate(self, x):
        '___________'
    def obfuscate(self, y):
        '___________'
    def ruminate(self, z):
        '__________'

def mainfunc(z):
    ks = KeySeq()
    one = One()
    two = Two()
    three = Three()
    utilities = Utilities()
    list_of_interest = utilities.generate(5)
    return list_of_interest

result = mainfunc(params)

It's fine to structure your program that way. A lot of command line utilities follow the same pattern:
# imports, utilities, other functions

def main(arg):
    # ...

if __name__ == '__main__':
    import sys
    main(sys.argv[1])
That way you can call the main function from another module by importing it, or you can run it from the command line.

If you want all of the instances of mainfunc to use the same KeySeq object, you can use the default parameter value trick:
def mainfunc(ks=KeySeq()):
    key = ks.next()
As long as you don't actually pass in a value of ks, all calls to mainfunc will use the instance of KeySeq that was created when the function was defined.
Here's why, in case you don't know: A function is an object. It has attributes. One of its attributes is named func_defaults; it's a tuple containing the default values of all of the arguments in its signature that have defaults. When you call a function and don't provide a value for an argument that has a default, the function retrieves the value from func_defaults. So when you call mainfunc without providing a value for ks, it gets the KeySeq() instance out of the func_defaults tuple. Which, for that instance of mainfunc, is always the same KeySeq instance.
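To see the mechanism at work, here is a tiny demonstration (Python 2, matching the code above, with a return added so the effect is visible; in Python 3 the attribute is called __defaults__ and the iterator method is __next__):

def mainfunc(ks=KeySeq()):
    return ks.next()

print(mainfunc.func_defaults)   # (<KeySeq instance>,) -- built once, at def time
print(mainfunc())               # 1
print(mainfunc())               # 2 -- the same KeySeq instance is reused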
Now, you say that you're going to send "multiple instances of mainfunc to the submit function of PP." Do you really mean multiple instances? If so, the mechanism I'm describing won't work.
But it's tricky to create multiple instances of a function (and the code you've posted doesn't do so). For example, this function does return a new instance of g every time it's called:
>>> def f():
...     def g(x=[]):
...         return x
...     return g
...
>>> g1 = f()
>>> g2 = f()
>>> g1().append('a')
>>> g2().append('b')
>>> g1()
['a']
>>> g2()
['b']
If I call g() with no argument, it returns the default value (initially an empty list) from its func_defaults tuple. Since g1 and g2 are different instances of the g function, their default value for the x argument is also a different instance, which the above demonstrates.
If you'd like to make this more explicit than using a tricky side-effect of default values, here's another way to do it:
def mainfunc():
    if not hasattr(mainfunc, "ks"):
        setattr(mainfunc, "ks", KeySeq())
    key = mainfunc.ks.next()
Finally, a super important point that the code you've posted overlooks: If you're going to be doing parallel processing on shared data, the code that touches that data needs to implement locking. Look at the callback.py example in the Parallel Python documentation and see how locking is used in the Sum class, and why.
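As a minimal sketch of that pattern (this is not the callback.py code itself, just the standard library's threading.Lock applied to the key generator):

import threading

class SharedKeySeq:
    def __init__(self):
        self.data = 0
        self.lock = threading.Lock()

    def next(self):
        # Serialize access so no two workers ever receive the same key
        with self.lock:
            self.data += 1
            return self.data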

Your concept of classes in Python is not sound, I think. Perhaps it would be a good idea to review the basics. This link will help:
Python Basics - Classes

Related

how to verify all but selected functions from the same module were not called?

I have a simple module (no classes, just utility functions) where a function foo() calls a number of functions from the same module, like this:
def get_config(args):
    ...
    return config_dictionary

def get_objects(args):
    ...
    return list_of_objects

def foo(no_run=False):
    config = get_config(...)
    if no_run:
        return XYZ
    objs = get_objects(config)
    for obj in objs:
        obj.work()
    ...  # a number of other functions from the same module are called
Is it possible to use Python Mockito to verify that get_config() was the last function called from my module in foo()? (for certain arguments)
Currently this is verified in this way:
spy2(mymodule.get_config)
spy2(mymodule.get_objects)
assert foo(no_run=True) == XYZ
verify(mymodule).get_config(...)
# Assumes that get_objects() is the first function to be called
# in foo() after the configuration is retrieved.
verify(mymodule, times=0).get_objects(...)
Perhaps something like generating the spy() and verify() calls dynamically? Rewriting the module into a class and stubbing the whole class?
Basically, I do not like the assumption baked into the test: the code in foo() could be reordered and the test would still pass.
That's not your real code, so it often won't describe the real problem you actually have. If, for example, you expect that a function is not called at all, like get_objects in your case, then why begin with spy2 in the first place? expect(<module>, times=0).<fn>(...) reads better in that case, and a subsequent verify is not needed.
There is also verifyNoMoreInteractions(<module>) and inorder.verify testing. But all of this is guesswork, since you don't say how XYZ is computed. (Basically: why spy2(get_config) and not a when call here? That is, why call the original implementation instead of mocking the answer?)
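A rough sketch of those suggestions, assuming the mockito-python API (the stubbed return value {} is an assumption, since the question doesn't show what get_config returns):

from mockito import expect, verifyNoMoreInteractions, unstub
import mymodule

# get_config must be called and is stubbed with a dummy return value;
# get_objects must not be called at all.
expect(mymodule).get_config(...).thenReturn({})
expect(mymodule, times=0).get_objects(...)

assert mymodule.foo(no_run=True) == XYZ

verifyNoMoreInteractions(mymodule)
unstub()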

Python structuring 2 functions with same dependencies

Issue: I have 2 functions that both require the same nested functions to operate so they're currently copy-pasted into each function. These functions cannot be combined as the second function relies on calling the first function twice. Unnesting the functions would result in the addition of too many parameters.
Question: Is it better to run the nested functions in the first function and append their values to an object to be fed into the 2nd function, or is it better to copy and paste the nested functions?
Example:
def func_A(thing):
    def sub_func_A(thing):
        thing += 1
        return thing
    return sub_func_A(thing)

def func_B(thing):
    def sub_func_B(thing):
        thing += 1
        return thing
    val_A, val_B = func_A(5), func_A(5)
    return sub_func_B(val_A), sub_func_B(val_B)
Imagine these functions couldn't be combined, and that the nested function relied on so many parameters that moving it outside and calling it would be too cluttered.
The "better option" depends on a few factors -:
The type of optimization you want to achieve.
The time taken by the functions to execute.
If the type of optimization to be achieved here is based on the time taken to execute the second function in the two cases, then it depends on the time taken for the nested function to fully execute, if that time is less than the time taken to store it's output when it's first called by the first function then its better copy pasting them.
While, if the time taken by the nested function to execute is more than the time taken to store it's output, then its a better option to execute it first time and then store it's output for future use.
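One concrete way to get "execute once, store the output" is memoization, which the answer doesn't name explicitly; functools.lru_cache (Python 3) gives that effect with one decorator. A sketch:

from functools import lru_cache

@lru_cache(maxsize=None)
def shared_helper(thing):
    return thing + 1   # stand-in for the expensive nested logic

def func_A(thing):
    return shared_helper(thing)

def func_B(thing):
    val_A, val_B = func_A(5), func_A(5)   # helper body runs only once for 5
    return shared_helper(val_A), shared_helper(val_B)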
Further, as mentioned by @DarylG in the comments, a class-based approach can also be used, wherein the nested function (the subfunction) becomes a private method (accessible only from within the class), while the two functions (func_A and func_B) remain public and can be used from outside as well. In code it might look something like this:
class MyClass:
    def __init__(self, ...):
        ...

    def __subfunc(self, thing):
        # PRIVATE SUBFUNC
        thing += 1
        return thing

    def func_A(self, thing):
        # PUBLIC FUNC A
        return self.__subfunc(thing)

    def func_B(self, thing):
        # PUBLIC FUNC B
        val_A, val_B = self.func_A(5), self.func_A(5)
        return self.__subfunc(val_A), self.__subfunc(val_B)
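Assuming __init__ takes no required arguments, usage would look like:

mc = MyClass()
print(mc.func_A(4))   # 5
print(mc.func_B(0))   # (7, 7): func_A(5) twice, then __subfunc on each result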

Unpacking Sympy variables from dictionary

I am making a program to do some calculations for my Microeconomics class. Since there are several ways of working depending on the problem I am given, I have created a class. The class parses a utility function and a 'mode' from the command line and calls one function or another depending on the mode.
Since every function uses the same variables, I initialize them in __init__():
self.x = x = Symbol('x')   # variables are initialized
self.y = y = Symbol('y')
self.Px, self.Py, self.m = Px, Py, m = Symbol('Px'), Symbol('Py'), Symbol('m')
I need a local definition to successfully process the function. Once the function is created through sympify(), I save it as an instance variable:
self.function = sympify(args.U)
Now I need to pass the variables x, y, Px, Py, m to the different functions. This is where I have the problem. Since I want local definitions, I could simply write x = self.x (and so on for every variable), but I would need to repeat this in every piece of code, which isn't really sustainable. Another option is to pass all the variables as arguments.
But since I'm using a dictionary to choose which function to call depending on the mode, this would mean passing the same arguments to every function, whether it uses them or not.
So I have decided to create a dictionary such as:
variables = {   # a dictionary of variables
    'x': self.x,
    'y': self.y,
    'Px': self.Px,
    'Py': self.Py,
    'm': self.m,
}
This dictionary is created after I declare the variables as sympy Symbols. What I would like is to pass this dictionary in unpacked form to every function. This way I would only need **kwargs as an argument, and each function could use the variables it wants.
What I want is something like this:
a = 3
args = {'a': a}

def f(**kwargs):
    return a + 1

f(**args)
This returns 4. However, when I pass my dictionary as an argument, I get an error about undefined 'x' or 'y' variables. It can't be a scope issue, because all the variables have been initialized for the whole instance.
Here is my code calling the function:
self.approaches[self.identification][0](**self.variables)

def default(self, **kwargs):
    solutions = dict()
    self.MRS = S(self.function.diff(x) / self.function.diff(y))   # This line provokes the exception
What's my error?
PS: Some information may be unclear. English is not my main language. Apologies in advance.
Unfortunately, Python doesn't quite work like that. When you use **kwargs, the only variable this assigns is the variable kwargs, which is a dictionary of the keyword arguments. In general, there's no easy way to inject names into a function's local namespace, because of the way locals namespaces work. There are ways to do it, but they are fairly hacky.
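A two-line demonstration of the point; **kwargs binds only the name kwargs, never the keys inside it:

def f(**kwargs):
    print(kwargs)   # {'x': 1}
    # print(x)      # would raise NameError: no local 'x' is created

f(x=1)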
The easiest way to make the variables available without having to define them each time is to define them at the module level. Generally speaking, this is somewhat bad practice (it really does belong on the class), but since SymPy Symbols are immutable and defined entirely by their name (and assumptions if you set any), it's just fine to set
Px, Py, m = symbols("Px Py m")
at the module level (i.e., above your class definition), because even if some other function defines its own Symbol("Px"), SymPy will consider it equal to the Px you defined from before.
In general, you can play somewhat fast and loose with immutable objects in this way (and all SymPy objects are immutable) because it doesn't really matter if an immutable object gets replaced with a second, equal object. It would matter, if, say, you had a list (a mutable container) because it would make a big difference if it were defined on the module level vs. the class level vs. the instance level.
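Here is a minimal sketch of that arrangement (the Problem class and default method are illustrative names, not the asker's actual code):

from sympy import symbols, sympify, S

x, y = symbols('x y')   # module level, above the class definition

class Problem:
    def __init__(self, expr_string):
        self.function = sympify(expr_string)

    def default(self):
        # x and y resolve to the module-level Symbols; the Symbols that
        # sympify() creates compare equal to them.
        return S(self.function.diff(x) / self.function.diff(y))

print(Problem('x**2 * y').default())   # 2*y/x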

Call function from class without declaring name object

We have a Tree, each node is an object.
The tree supports three operations: add(x), getmin(), getmax().
The tree works perfectly; for example, if I write
a = Heap()
a.add(5)
a.add(15)
a.add(20)
a.getmin()
a.getmax()
the stack looks like this: [5, 15, 20]. Now if I call getmin(), it prints min element = 5 and the stack will look like [15, 20], and so on.
Now comes the problem:
the professor asked us to submit two files, which are already created: main.py and minmaxqueue.py.
main.py starts with from minmaxqueue import add, getmin, getmax, and then it already has a list of function calls of the kind
add(5)
add(15)
add(20)
getmin()
getmax()
In order to make my script work, I had to do a = Heap() and then always call a.add(x). Since the TAs are going to run the script from a common file, I can't modify main.py so that it creates an object a = Heap(). It should run directly with add(5) and not with a.add(5).
Is there a way to fix this?
You can modify your module to create a global Heap instance, and define functions that forward everything to that global instance. Like this:
class Heap(object):
    # all of your existing code

_heap = Heap()

def add(n):
    return _heap.add(n)

def getmin():
    return _heap.getmin()

def getmax():
    return _heap.getmax()
Or, slightly more briefly:
_heap = Heap()

add = _heap.add
getmin = _heap.getmin
getmax = _heap.getmax
If you look at the standard library, there are modules that do exactly this, like random. If you want to create multiple Random instances, you can; if you don't care about doing that, you can just call random.choice and it works on the hidden global instance.
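For example, both styles work with random:

import random

print(random.choice('abc'))   # module-level function, hidden global Random instance
rng = random.Random(42)       # or create your own instance
print(rng.choice('abc'))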
Of course for Random it makes sense; for Heap, it's a lot more questionable. But if that's what the professor demands, what can you do?
You can use this function to do that more quickly:
def make_attrs_global(obj):
    for attr in dir(obj):
        if not attr.startswith('__'):
            globals()[attr] = getattr(obj, attr)
It makes all attributes of obj defined in global scope.
Just put this code at the end of your minmaxqueue.py file:
a = Heap()
make_attrs_global(a)
Now you should be able to call add directly, without the a. prefix. This is ugly, but well...

How to avoid excessive parameter passing?

I am developing a medium-size program in Python spread across 5 modules. The program accepts command line arguments using OptionParser in the main module, e.g. main.py. These options are later used to determine how methods in other modules behave (e.g. a.py, b.py). As I extend the user's ability to customise the behaviour of the program, I find that I end up requiring a user-defined parameter in a method in a.py that is not directly called by main.py, but is instead called by another method in a.py:
main.py:

import a

p = some_command_line_argument_value
a.meth1(p)

a.py:

def meth1(p):
    # some code
    res = meth2(p)
    # some more code w/ res

def meth2(p):
    # do something with p
    pass
This excessive parameter passing seems wasteful and wrong, but as hard as I try I cannot think of a design pattern that solves this problem. While I had some formal CS education (a minor in CS during my B.Sc.), I've only really come to appreciate good coding practices since I started using Python. Please help me become a better programmer!
Create objects of types relevant to your program, and store the command line options relevant to each in them. Example:
import WidgetFrobnosticator

f = WidgetFrobnosticator()
f.allow_concave_widgets = option_allow_concave_widgets
f.respect_weasel_pins = option_respect_weasel_pins
# Now the methods of WidgetFrobnosticator have access to your command-line
# parameters, in a way that's not dependent on the input format.

import PlatypusFactory

p = PlatypusFactory()
p.allow_parthenogenesis = option_allow_parthenogenesis
p.max_population = option_max_population
# The platypus factory knows about its own options, but not those of the
# WidgetFrobnosticator or vice versa. This makes each class easier to read
# and implement.
Maybe you should organize your code more into classes and objects? As I was writing this, Jimmy showed a class-instance based answer, so here is a pure class-based answer. This would be most useful if you only ever wanted a single behavior; if there is any chance at all you might want different defaults some of the time, you should use ordinary object-oriented programming in Python, i.e. pass around class instances with the property p set in the instance, not the class.
class Aclass(object):
    p = None

    @classmethod
    def init_p(cls, value):
        cls.p = value   # assign to the class attribute, not a local name

    @classmethod
    def meth1(cls):
        # some code
        res = cls.meth2()
        # some more code w/ res

    @classmethod
    def meth2(cls):
        # do something with cls.p
        pass

from a import Aclass as ac

ac.init_p(some_command_line_argument_value)
ac.meth1()
ac.meth2()
If "a" is a real object and not just a set of independent helper methods, you can create an "p" member variable in "a" and set it when you instantiate an "a" object. Then your main class will not need to pass "p" into meth1 and meth2 once "a" has been instantiated.
[Caution: my answer isn't specific to Python.]
I remember that Code Complete called this kind of parameter a "tramp parameter". Googling for "tramp parameter" doesn't return many results, however.
Some alternatives to tramp parameters might include:
Put the data in a global variable
Put the data in a static variable of a class (similar to global data)
Put the data in an instance variable of a class
Pseudo-global variable: hidden behind a singleton, or some dependency injection mechanism
Personally, I don't mind a tramp parameter as long as there's no more than one; i.e. your example is OK for me, but I wouldn't like ...
import a
p1 = some_command_line_argument_value
p2 = another_command_line_argument_value
p3 = a_further_command_line_argument_value
a.meth1(p1, p2, p3)
... instead I'd prefer ...
import a
p = several_command_line_argument_values
a.meth1(p)
... because if meth2 decides that it wants more data than before, I'd prefer if it could extract this extra data from the original parameter which it's already being passed, so that I don't need to edit meth1.
With objects, parameter lists should normally be very small, since most appropriate information is a property of the object itself. The standard way to handle this is to configure the object properties and then call the appropriate methods of that object. In this case set p as an attribute of a. Your meth2 should also complain if p is not set.
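A minimal sketch of that configure-then-call pattern (the class and attribute names are illustrative):

class A(object):
    def __init__(self):
        self.p = None

    def meth1(self):
        # some code
        res = self.meth2()   # p no longer needs to be passed along
        return res

    def meth2(self):
        if self.p is None:
            raise ValueError("p is not set")   # complain if unconfigured
        return self.p * 2    # stand-in for "do something with p"

a = A()
a.p = some_command_line_argument_value   # configure once
result = a.meth1()                       # then call the methods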
Your example is reminiscent of the code smell Message Chains. You may find the corresponding refactoring, Hide Delegate, informative.
