I have a program that must continuously create thousands of objects from a class that has about 12–14 methods. Will the fact that they are instances of a complex class cause a performance hit compared with creating a simpler object like a list or dictionary, or even an instance of a class with fewer methods?
Some details about my situation:
I have a bunch of “text” objects that continuously create and refresh “prints” of their contents. The print objects have many methods but only a handful of attributes. The print objects can’t be contained within the text objects because the text objects need to be “reusable” and make multiple independent copies of their prints, so that rules out just swapping out the print objects’ attributes on refresh.
Am I better off:
1. Continuously creating the new print objects, with all their methods, as the application refreshes?
2. Unraveling the class and turning the print objects into simple structs, with the methods becoming independent functions that take the objects as arguments?
I assume the answer depends on whether there is a large cost to generating new objects with all the methods included, versus having to import the independent functions wherever they would otherwise have been called as methods.
It doesn't matter how complex the class is; when you create an instance, you only store a reference to the class with the instance. All methods are accessed via this one reference.
No, it should not make a difference.
Consider that when you do the following:
a = Foo()
a.bar()
The call to the bar method is in fact translated under the covers to:
Foo.bar(a)
I.e. bar is "static" under the class definition: there exists only one instance of the function, shared by all instances of Foo. Looked at this way, it suggests that no, there will be no significant impact from the number of methods. The methods are created once, when the class definition is first executed, not each time you create an object.
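You can see this sharing directly; here is a minimal sketch (reusing the Foo name from above, with a trivial bar added):
import Foo example:

class Foo(object):
    def bar(self):
        pass

a = Foo()
b = Foo()

# Both bound methods wrap the same single function object,
# which lives in the class's __dict__, not on the instances:
assert a.bar.__func__ is b.bar.__func__
assert a.bar.__func__ is Foo.__dict__['bar']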
I did some testing.
I have the following function:
import time

def call_1000000_times(f):
    start = time.time()
    for i in xrange(1000000):  # Python 2 xrange; use range() on Python 3
        f(a=i, b=10000 - i)
    return time.time() - start
As you can see, this function takes another function, calls it 1000000 times, and returns how long that took, in seconds.
I also created two classes:
A small class:
class X(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
And a rather large one:
class Y(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def foo(self): pass
    def bar(self): pass
    def baz(self): pass
    def anothermethod(self):
        pass
    @classmethod
    def hey_a_class_method(cls, argument):
        pass
    def thisclassdoeswaytoomuch(self): pass
    def thisclassisbecomingbloated(self): pass
    def almostattheendoftheclass(self): pass
    def imgonnahaveacouplemore(self): pass
    def somanymethodssssss(self): pass
    def should_i_add_more(self): pass
    def yes_just_a_couple(self): pass
    def just_for_lolz(self): pass
    def coming_up_with_good_method_names_is_hard(self): pass
The results:
>>> call_1000000_times(dict)
0.2680389881134033
>>> call_1000000_times(X)
0.6771988868713379
>>> call_1000000_times(Y)
0.6260080337524414
As you can see, the difference between a large class and a small class is very small, with the large class even being faster in this case. I assume that if you ran this function multiple times with the same type, and averaged the numbers, they'd be even closer, but it's 3AM and I need sleep, so I'm not going to set that up right now.
On the other hand, just calling dict was about 2.5x faster, so if your bottleneck is instantiation, this might be a place to optimize things.
Be wary of premature optimization, though. Classes, by holding data and code together, can make your code easier to understand and build upon (functional programming lovers, this is not the place for an argument). It might be a good idea to use the Python profiler or other performance-measuring tools to find out what parts of your code are slowing it down.
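For instance, a minimal cProfile sketch (main() here is a hypothetical stand-in for your refresh loop):

import cProfile
import pstats

def main():
    # hypothetical entry point standing in for your refresh loop
    pass

cProfile.run('main()', 'stats.out')
pstats.Stats('stats.out').sort_stats('cumulative').print_stats(10)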
Related
I have a question about multiple class inheritance in Python. I think I have already implemented it correctly; however, it is somehow not in line with my usual understanding of inheritance (specifically, the usage of super()), and I am not really sure whether this could lead to errors or to certain attributes not being updated, etc.
So, let me try to describe the basic problem clearly:
I have three classes Base, First and Second
Both First and Second need to inherit from Base
Second also inherits from First
Base is an external module that has certain base methods needed for First and Second to function correctly
First is a base class for Second; it contains methods that I would otherwise have to repeatedly write out in Second
Second is the actual class that I use. It implements additional methods and attributes. Second's design may vary a lot, so I want to be able to change it flexibly without having all the code from First written out in Second.
The most important point about Second however is the following: As visible below, in Second's init, I firstly want to inherit from Base and perform some operations that require methods from Base. Then, after that, I would like to launch the operations in the init of First, which manipulate some of the parameters that are instantiated in Second. For that, I inherit from First at the end of Second's init-body.
You can see how the variable a is manipulated throughout the initialization of Second. The current behavior is what I want, but the structure of my code looks somewhat weird, which is why I am asking.
Why the hell do I want to do this? Think of the First class as having many methods and also performing many operations (on parameters from Second) in its init body. I don't want to have all these methods in the body of Second and all these operations in the init of Second (here, the parameter a). First is a class that will rarely change, so it is better for clarity and compactness to move it out to another file, at least in my opinion ^^. Also, due to the sequence of calls in Second's init, I did not find another way to realize it.
Now the code:
class Base():
    def __init__(self):
        pass
    def base_method1(self):
        print('Base Method 1')
    def base_method2(self):
        pass
    # ...

class First(Base):
    def __init__(self):
        super().__init__()
        print('Add in Init')
        self.first_method1()
    def first_method1(self):
        self.a += 1.5
    def first_method2(self):
        pass
    # ...

class Second(First):
    def __init__(self, a):
        # set parameters
        self.a = a
        # initialize the Base class
        Base.__init__(self)
        # some operations that rely on Base methods
        self.base_method1()
        print(self.a)
        # initialize First and perform the operations in First's init
        # that must follow AFTER the body of Second's init
        First.__init__(self)
        print(self.a)
        # check whether Second has inherited the method(s) from First
        print('Add by calling method')
        self.first_method1()
        print(self.a)
sec = Second(0)
The output of the statement sec = Second(0) prints:
Base Method 1
0
Add in Init
1.5
Add by calling method
3.0
I hope it is more or less clear; if not, I am glad to clarify!
Thanks, I appreciate any comment!
Best, JZ
So, the basic problem here is that you are trying something for which multiple inheritance is not a solution by itself; however, there are ways to structure your code so that it works.
When using multiple inheritance properly in Python, each method (outside the root base class) should call super().method() exactly once, and you should not call a specific ancestor's version by hardcoding it, as you do with Base.__init__() in Second. For a start, with this design as is, Base.__init__() will run twice each time you instantiate Second.
The main problem in your assumptions lies in
I firstly want to inherit from Base and perform some operations that require methods from Base. Then, after that, I would like to launch the operations in the init of First, which manipulate some of the parameters that are instantiated in Second. For that, I inherit from First at the end of Second's init-body.
So, as you may have noticed, if you call super(), the method in First will run before the method in Base. Python linearizes all ancestor classes under multiple inheritance into a predictable, deterministic order in which each ancestor appears exactly once, used for finding attributes and calling methods (the "MRO", or Method Resolution Order). Given your arrangement, the only possible order is Second -> First -> Base; and if you tried to force Base ahead of First by writing class Second(Base, First), Python would refuse to create the class at all, because no consistent MRO would exist.
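A quick sketch of both facts, with toy stand-ins for the three classes:

class Base: pass
class First(Base): pass
class Second(First): pass

print(Second.__mro__)
# (<class '__main__.Second'>, <class '__main__.First'>,
#  <class '__main__.Base'>, <class 'object'>)

try:
    class Broken(Base, First):  # tries to put Base ahead of First
        pass
except TypeError as exc:
    print(exc)  # Cannot create a consistent method resolution order (MRO) ...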
Now, if you really want to perform initialization of certain parts in Base prior to running the initialization in First, that is where "more than multiple inheritance" comes into play: you have to define hook-method "slots" in your class. That is a design decision you make yourself, with nothing ready-made in the language: agree on a convention of methods that Base.__init__ will call at various stages of the initialization, so that they are activated at the proper time.
This way you could even skip having a First.__init__ method at all: just pick methods you know Base.__init__ will call on the child, and override those as you need.
The language itself uses this strategy when it specifies both a __new__ and an __init__ method: each one runs at a different stage of the initialization of a new instance.
Note that by doing it this way, you don't even need multiple inheritance for this case:
class Base():
    def __init__(self):
        # initial step
        self.base_method1()
        ...
        # first initialization step:
        self.init_1()
        # intermediate common initialization
        ...
        # second initialization step
        self.init_2()

    def init_1(self):
        pass

    def init_2(self):
        pass

    def base_method1(self):
        pass
    # ...
class First(Base):
    # __init__ actually unneeded here:
    # def __init__(self):
    #     super().__init__()

    def init_1(self):
        print('Add in Init')
        self.first_method1()
        return super().init_1()

    # init_2 not used

    def first_method1(self):
        self.a += 1.5

    def first_method2(self):
        pass
    # ...
class Second(First):
    def __init__(self, a):
        # set parameters
        self.a = a
        # some operations that rely on Base methods
        super().__init__()  # will call Base.__init__, which calls "base_method1"
        # if you need access to self.a _before_ it is changed by First, implement
        # the "init_1" slot in this Second class
        # at this point, the code in First that updates the attribute has already run
        print(self.a)

    def init_1(self):
        # this part of the code is called by Base.__init__
        # _before_ First.init_1 is executed:
        print("Initial value of self.a", self.a)
        # delegate to the "init_1" stage on First:
        return super().init_1()
sec = Second(0)
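With this arrangement, sec = Second(0) should print (assuming the sketch above, with init_2 left as a no-op):

Initial value of self.a 0
Add in Init
1.5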
The part of my code that I need to parallelize is something like this:
for ClassInstance in ClassInstancesList:
    ClassInstance.set_attributes(arguments)
The method set_attributes has no return value; it just sets attributes on the class instance.
I tried using multiprocessing and concurrent.futures, but both of them make a copy of the class instance, which is not what I want.
The fixes that I saw (returning self; returning all the attributes and using another method to set them; or using multiprocessing.Value) would either make copies of a large number of lists of lists or force me to change the methods in my class in a way that would make it very difficult to read. (set_attributes actually calls various methods set_attribute_A, set_attribute_B, etc.)
In my case the threads can be completely independent.
EDIT: Here is my attempt at a minimal reproducible example:
class Object:
    def __init__(self, initial_attributes):
        self.attributes1 = initial_attributes

    def update(self, attributes):
        self.attributes1.append(attributes)

    def set_attributes2(self, args):
        # computations based on attributes1 and args; in the real code many other
        # similar private methods are called
        self._set_attribute(args)

def detect_and_fill_Objects(args):
    ObjectList = detect(args)  # other function which initializes instances and updates them
    # at this point, the object instances only have attributes1 set
    # this following loop is the one that I want to parallelize, the one that
    # sets attributes2
    for obj in ObjectList:
        obj.set_attributes2(args)
When I ran the code using multiprocessing there was a great speed-up, but all the computations were lost because they were done on copies of the instances and not on the instances themselves. I therefore believe a decent speed-up could be obtained?
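(For reference, here is a minimal sketch of the copy behavior described above; Obj is a hypothetical stand-in for the real class:)

import multiprocessing

class Obj:
    def __init__(self):
        self.value = 0

    def set_attributes(self):
        self.value = 42  # this mutation happens in the worker's copy

def work(obj):
    obj.set_attributes()

if __name__ == '__main__':
    obj = Obj()
    p = multiprocessing.Process(target=work, args=(obj,))
    p.start()
    p.join()
    print(obj.value)  # still 0: the child process mutated a pickled/forked copy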
Here's my problem:
I have a class. And I have two objects of that class: ObjectOne and ObjectTwo
I'd like my class to have certain methods for ObjectOne and different methods for ObjectTwo.
I'd also like to choose those methods from a variety depending on some condition.
and of course, I need to call the methods I have 'from the outside code'
Here is how I see the solution myself (just logic, no code):
I make a default class. And I make a list of functions defined somewhere.
IF 'some condition' is True, I construct a child class that takes one of those functions and adds it to the class as a method. Otherwise I add some default set of methods. Then I make ObjectOne from this child class.
The question is: can I do that at all? And how do I do that? And how do I call such a method once it is added? They would all surely be named differently...
I do not ask for a piece of working code here. If you could give me a hint on where to look or maybe a certain topic to learn, this would do just fine!
PS: In case you wonder, the context is this: I am making a simple game prototype, and my objects represent two game units (characters) that fight each other automatically. Something like an auto-chess. Each unit may have unique abilities and therefore should act (make decisions on the battlefield) depending on the abilities it has. At first I tried to make a unified decision-making routine that would include all possible abilities at once (such as: if hasDoubleStrike else if... etc). But it turned out to be a very complex task, because there are tens of abilities overall, each unit may have any two, so the number of combinations is... vast. So, now I am trying to distribute this logic over separate units: each one would 'know' only of its own two abilities.
I mean, I believe this is what would generally be referred to as a bad idea, but... you could pass an argument into the class's constructor and then define the behavior/existence of a function depending on that condition. Like so:
class foo():
    def __init__(self, condition):
        if condition:
            self.func = lambda: print('baz')
        else:
            self.func = lambda: print('bar')

if __name__ == '__main__':
    obj1 = foo(True)
    obj2 = foo(False)
    obj1.func()
    obj2.func()
Outputs:
baz
bar
You'd likely be better off just having different classes or setting up some sort of class hierarchy.
So in the end the best solution was the classical factory method and factory class, like this:
import abc
import Actions  # a module that works as a library of standard actions

def make_creature(some_params):
    creature_factory = CreatureFactory()
    tempCreature = creature_factory.make_creature(some_params)
    return tempCreature

class CreatureFactory:
    def make_creature(self, some_params):
        ...
        if "foo" in some_params:
            return fooChildCreature()

class ParentCreature(metaclass=abc.ABCMeta):
    someStaticParams = 'abc'

    @abc.abstractmethod
    def decisionMaking(self):
        pass

class fooChildCreature(ParentCreature):
    def decisionMaking(self):
        Actions.foo_action()
        Actions.bar_action()
        # some creature-specific decision making here that calls
        # static functions from Actions

NewCreature = make_creature(some_params)  # some_params defined elsewhere
This is not ideal; it still requires a lot of manual work to define the decision making for the various kinds of creatures, but it is still WAY better than anything else. Thank you very much for this advice.
Concretely, I have a user-defined class like this:
class Foo(object):
    def __init__(self, bar):
        self.bar = bar

    def bind(self):
        val = self.bar
        do_something(val)
I need to:
1) be able to call on the class (not an instance of the class) to recover all the self.xxx attributes defined within the class.
For an instance of a class, this can be done with f = Foo('') followed by f.__dict__. Is there a way of doing it for a class rather than an instance? If yes, how? I would expect Foo.__dict__ to return {'bar': None}, but it doesn't work that way.
2) be able to access all the self.xxx parameters used by a particular function of a class. For instance, I would like to do Foo.bind.__selfparams__ and receive ['bar'] in return. Is there a way of doing this?
This is something that is quite hard to do in a dynamic language, assuming I understand correctly what you're trying to do. Essentially it means going over all the instances in existence for the class and collecting all the attributes set on them. While not infeasible, I would question the practicality of such an approach, both from a design and from a performance point of view.
More specifically, you speak of "all the self.xxx attributes defined within the class", but these things are not defined at all, at least not in any single place; they "evolve" as more and more instances of the class are brought to life. Now, I'm not saying all your instances set different attributes, but they might, and in order to have a reliable generic solution you'd literally have to keep track of everything the instances have done to themselves. So unless you have a static-analysis approach in mind, I don't see a clean and efficient way of achieving it (and even static analysis is generally of little help in a dynamic language).
A trivial example to prove my point:
SOME_CONSTANT = 42  # hypothetical module-level constant, added so the example runs

class Foo(object):
    def __init__(self):
        # statically analysable
        self.bla = 3
        # still analysable, but more difficult
        if SOME_CONSTANT > 123:
            self.x = 123
        else:
            self.y = 321

    def do_something(self):
        import random
        setattr(self, "attr%s" % random.randint(1, 100),
                "hello, world of dynamic languages!")

foo = Foo()
foo2 = Foo()
# only `bla` plus `x` or `y` exist so far
foo2.do_something()
# now there's an attribute with a random name out there;
# to detect it, we'd have to get hold of every Foo instance in existence
# and individually inspect every attribute on each of them
And even if you were to iterate over all instances in existence, you'd only get a snapshot of what you're interested in, not all possible attributes.
1) This is not possible: the class doesn't have those attributes, just functions that set them, so there is nothing to retrieve.
2) This is only possible with deep AST inspection. Foo.bind.__code__ (func_code on Python 2) would normally have the names you want under co_freevars, but you're looking the attributes up on self, so they are not free variables. You would have to decompile the bytecode from __code__.co_code to an AST and then walk said AST.
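For what it's worth, a rough sketch of a source-level variant (my own assumption, using inspect and ast rather than bytecode decompilation; it only catches literal self.<name> accesses and requires the source to be available to inspect.getsource):

import ast
import inspect
import textwrap

def self_attribute_names(func):
    # parse the function's source and collect every literal `self.<name>` access
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    names = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == 'self'):
            names.add(node.attr)
    return sorted(names)

print(self_attribute_names(Foo.bind))  # ['bar'], for the Foo defined in the question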
This is a bad idea. Whatever you're doing, find a different way of doing it.
To do this, you need some way to find all the instances of your class. One way is to have the class itself keep track of its instances. Unfortunately, keeping an ordinary reference to every instance on the class means those instances can never be garbage-collected. Fortunately, Python has weakref: a weak reference lets you reach the object but does not count toward its reference count, so the instances can be garbage-collected as usual.
A good place to update the list of instances is in your __init__() method. You could also do it in __new__() if you find the separation of concerns a little cleaner.
import weakref

class Foo(object):
    _instances = []

    def __init__(self, value):
        self.value = value
        cls = type(self)
        # register a weak reference whose callback removes the entry
        # from the list once the instance is garbage-collected
        cls._instances.append(weakref.ref(self, cls._instances.remove))

    @classmethod
    def iterinstances(cls):
        "Returns an iterator over all instances of the class."
        return (ref() for ref in cls._instances)

    @classmethod
    def iterattrs(cls, attr, default=None):
        "Returns an iterator over a named attribute of all instances of the class."
        return (getattr(ref(), attr, default) for ref in cls._instances)
Now you can do this:
f1, f2, f3 = Foo(1), Foo(2), Foo(3)
for v in Foo.iterattrs("value"):
    print v,  # prints 1 2 3 (Python 2 print; use print(v) on Python 3)
I am, for the record, with those who think this is generally a bad idea and/or not really what you want. In particular, instances may live longer than you expect depending on where you pass them and what that code does with them, so you may not always have the instances you think you have. (Some of this may even happen implicitly.) It is generally better to be explicit about this: rather than having the various instances of your class be stored in random variables all over your code (and libraries), have their primary repository be a list or other container, and access them from there. Then you can easily iterate over them and get whatever attributes you want. However, there may be use cases for something like this and it's possible to code it up, so I did.
I have a class that will make multiple instances. What's the difference between making a method and calling that method, versus making a class plus a separate function and then using that function on the class's instances? Does the first cost more memory because the method is "instantiated"?
Example:
class myclass:
    def __init__(self):
        self.a = 0

    def mymethod(self):
        print self.a

inst1 = myclass()
inst1.mymethod()
versus:
class myclass:
    def __init__(self):
        self.a = 0

def myfunction(instance):
    print instance.a

inst1 = myclass()
myfunction(inst1)
Methods are really just functions that always receive a class instance as their first parameter (and happen to be declared within the scope of a class). The code of a method is shared across all instances, so you won't be "instantiating" a new method every time you make a class instance.
So, they really are equivalent; use whichever is the clearest expression of your intent (readability counts!). If you are writing a function that always takes an instance of a specific class as an argument, it is probably most clearly expressed as a method. If the function can operate on many different kinds of classes, it may be clearest as a function.
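A minimal sketch of that equivalence (Python 3 syntax):

class MyClass(object):
    def __init__(self):
        self.a = 0

    def mymethod(self):
        print(self.a)

inst = MyClass()
inst.mymethod()         # the usual method call
MyClass.mymethod(inst)  # exactly equivalent: the instance is passed explicitly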