The part of my code that I need to parallelize is something like this:
for ClassInstance in ClassInstancesList:
    ClassInstance.set_attributes(arguments)
The "set_attributes" method returns nothing; it just sets attributes on the instance.
I tried using multiprocessing and concurrent.futures, but both of them work on a copy of each class instance, which is not what I want.
The fixes I have seen (returning self, returning all the attributes and setting them with another method, or using multiprocessing.Value) would either copy a large number of lists of lists or force me to restructure my class in a way that makes it very difficult to read (set_attributes actually calls various methods set_attribute_A, set_attribute_B, etc.).
In my case the parallel tasks are completely independent of one another.
EDIT: Here is my attempt at a minimal reproducible example:
class Object:
    def __init__(self, initial_attributes):
        self.attributes1 = initial_attributes

    def update(self, attributes):
        self.attributes1.append(attributes)

    def set_attributes2(self, args):
        # Computations based on attributes1 and args; in the real code many
        # other similar private methods are called
        self._set_attribute(args)
def detect_and_fill_Objects(args):
    ObjectList = detect(args)  # other function which initializes instances and updates them
    # At this point, the instances only have attributes1 set.
    # The following loop is the one I want to parallelize, the one that
    # sets attributes2:
    for obj in ObjectList:
        obj.set_attributes2(args)
When I ran the code with multiprocessing there was a great speed-up, but all the computations were lost because they were done on copies of the instances rather than on the instances themselves. That speed-up makes me believe a decent gain is achievable if I can get the results back.
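For reference, one common workaround is to have each worker return its mutated copy and then rebind the list to those copies. Here is a minimal sketch along those lines, assuming the instances and args are picklable (the names follow the example above):

from concurrent.futures import ProcessPoolExecutor
from functools import partial

def _set_and_return(obj, args):
    # Runs in a worker process on a copy of obj; the mutated copy is sent back.
    obj.set_attributes2(args)
    return obj

def detect_and_fill_Objects(args):
    ObjectList = detect(args)
    with ProcessPoolExecutor() as executor:
        # Replace the originals with the mutated copies returned by the workers.
        ObjectList[:] = executor.map(partial(_set_and_return, args=args), ObjectList)

This still copies each instance across the process boundary once in each direction, but it keeps set_attributes2 and the rest of the class untouched.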
I have a class that looks like the following
class A:
    communicate = set()

    def __init__(self):
        pass

    ...

    def _some_func(self):
        ...some logic...
        self.communicate.add(some_var)
The communicate variable is shared among the instances of the class. I use it to give instances of this class a convenient way to communicate with one another (they need some mild orchestration, and I don't want to force the calling object to serve as an intermediary for this communication).

However, I realized this causes problems when I run my tests. Since the Python interpreter is the same throughout all the tests, I don't get a "fresh" A class for each test, so the communicate set becomes the aggregate of all objects added to it (in normal usage this is exactly what I want, but for testing I don't want interactions between my tests). Furthermore, down the line this will also cause problems in my code execution if I want to loop over my whole process multiple times, because I won't have a way of resetting this class variable.
I know I can fix this issue where it occurs by having the creator of the A objects do something like
A.communicate = set()
before it creates and uses any instances of A. However, I don't really love this because it forces my caller to know some details about the communication pathways of the A objects, and I don't want that coupling. Is there a better way to reset the communicate class variable? Perhaps some method I could call on the class itself rather than on an instance, like A.new_batch(), that would perform this reset? Or is there a better way I'm not familiar with?
Edit:
I added a class method like
class A:
    communicate = set()

    def __init__(self):
        pass

    ...

    @classmethod
    def new_batch(cls):
        cls.communicate = set()

    def _some_func(self):
        ...some logic...
        self.communicate.add(some_var)
and this works with the caller running A.new_batch(). Is this the way it should be constructed and called, or is there a better practice here?
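If the main pain point is test isolation, the reset can also be hidden in a test fixture so production callers never see it. A minimal sketch assuming pytest (the module name is hypothetical):

import pytest
from mymodule import A  # wherever A is defined

@pytest.fixture(autouse=True)
def fresh_communicate():
    # Reset the shared set before every test so tests can't leak into each other.
    A.new_batch()
    yield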
I’m building a class that extends the list data structure in Python, called a Partitional. I’m adding a few methods that I find myself using frequently when dividing a list into partitions.
The class is initialized with a (nullable) list, which exists as an attribute on the class.
class Partitional(list):
    """Extends the list data type. Adds methods for dividing a list into
    partition sets and returning data about those partition sets."""

    def __init__(self, source_list: list = None):
        super().__init__()
        # A default of None (rather than []) avoids sharing one mutable
        # default list across all instances.
        self.source_list: list = source_list if source_list is not None else []
        self.n: int = len(self.source_list)
        ...
I want to be able to reliably replace list instances with Partitional instances without violating Liskov substitution. So for list’s methods, I wrote methods on the Partitional class that operate on self.source_list, e.g.
...
def remove(self, matched_item):
    self.source_list.remove(matched_item)
    self.__init__(self.source_list)

def pop(self, *args):
    popped_item = self.source_list.pop(*args)
    self.__init__(self.source_list)
    return popped_item

def clear(self):
    self.source_list.clear()
    self.__init__(self.source_list)
...
(the __init__ call is there because the Partitional class builds some internal attributes based on self.source_list when it’s initialized, so these need to be rebuilt if source_list changes.)
And I also want Python’s built-in methods that take a list as an argument to work with a Partitional instance, so I set to work writing method overrides for those as well, e.g.
...
def __len__(self):
    return len(self.source_list)

def __iter__(self):
    # There is no __enumerate__ hook; enumerate() simply iterates,
    # so delegating __iter__ covers it.
    return iter(self.source_list)
...
The relevant built-in methods are a finite set for any given Python version, but... is there not a simpler way to do this?
My question:
Is there a way to write a class such that, if an instance of that class is used as the argument for a function, the class provides an attribute to the function instead, by default?
That way I’d only need to override this default behaviour for a subset of built-in methods.
So for example, if a use case involving a list instance looks like this:
example_list: list = [1, 2, 3, 4, 5]
length = len(example_list)
we substitute a Partitional instance built from the same list:
example_list: list = [1, 2, 3, 4, 5]
example_partitional = Partitional(example_list)
length = len(example_partitional)
and what’s “actually” happening is this:
length = len(example_partitional.source_list)
i.e.
length = len([1, 2, 3, 4, 5])
Other notes:
In working on this, I’ve realized that there are two broad categories of Liskov substitution violation possible:
Inherent violation, where the structure of the child class will make it incompatible with any use case where the child class is used in place of the parent class, e.g. if you override some fundamental property or structure of the parent.
Context-dependent violation, where, for any given piece of software, you are fine so long as you never use the child class in a way that would violate Liskov substitution. E.g., you override a method on the parent class that changes how a built-in function acts when it takes an instance of the class as an argument, but you never use that built-in function with the class instance in your system. Or in any system that depends on your system. Or... (you can see how relying on this caveat is not foolproof.)
What I’m looking to do is come up with a technique that will protect against both categories of violation, without having to worry about use cases and context.
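For what it's worth, one way to avoid most of the overrides entirely is to let the list base class own the data, so every built-in that works on a list works on a Partitional for free, and only the mutating methods need to trigger a recompute. A rough sketch under that assumption (the _rebuild name is hypothetical):

class Partitional(list):
    """Extends list; derived partition attributes are recomputed after mutation."""
    def __init__(self, source_list=None):
        super().__init__(source_list or [])
        self._rebuild()

    def _rebuild(self):
        # Recompute derived attributes from the list's own contents.
        self.n = len(self)

    def remove(self, matched_item):
        super().remove(matched_item)
        self._rebuild()

    def pop(self, *args):
        popped_item = super().pop(*args)
        self._rebuild()
        return popped_item

With the data stored in self rather than in self.source_list, len(), iteration, slicing, and equality all behave like the underlying list without any delegation.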
I have a question regarding return conventions in Python. I have a class with some data attributes, and several functions that perform analysis on the data and then store the results as results attributes (please see the simplified implementation below). Since the analysis functions mainly update the results attribute, my question is: what is the best practice in terms of return statements? Should I avoid updating class attributes inside the method (as process1 does), and instead just return the data and use it to update the results attribute (as with process2)?
Thanks,
Kamran
class Analysis(object):
    def __init__(self, data):
        self.data = data
        self.results = None

    def process1(self):
        self.results = [i**2 for i in self.data]

    def process2(self):
        return [i**2 for i in self.data]
a = Analysis([1, 2, 3])
a.process1()
a.results = a.process2()
It all depends on the use case.
First of all, you are not changing class attributes there; you are changing instance attributes.
Python: Difference between class and instance attributes
Secondly, if you plan to share the results of your process among the various instances of the class, then you can use class variables.
Third, if you are asking about instance variables, then it depends on your design choice.
Besides that, I think this line is unnecessary:
a.results = a.process2()
With process1, the assignment is already part of the object itself; you can keep process2 for when you only need the computed values, for example to display them.
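A third option, if the computation should happen at most once per instance, is a cached property, which avoids both the manual assignment and the mutation-inside-method question. A sketch assuming Python 3.8+ for functools.cached_property:

from functools import cached_property

class Analysis:
    def __init__(self, data):
        self.data = data

    @cached_property
    def results(self):
        # Computed on first access, then cached on the instance.
        return [i ** 2 for i in self.data]

a = Analysis([1, 2, 3])
print(a.results)  # [1, 4, 9]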
I need to combine classes from two separate Python modules (similar in purpose, but with different methods) into a single class, so that the methods can be accessed from the same object in a natural way, both in code and for automatic documentation generation.
I am currently accomplishing the former, but not the latter, with the following code (this is not verbatim, as I can't share my actual source, but nothing differs here in a way that would affect the discussion).
Basically, I am creating the new class inside a function, deriving it from the two parent classes with type(), and returning an instance of it.
def combine(argone, argtwo):
    """
    Combine classes.
    """
    _combined_arg = "some_string_%s_%s" % (argone, argtwo)
    _temp = type('Temp', (ModuleOne, ModuleTwo), dict())
    # The two classes have identical constructor signatures, so the combined
    # class can be instantiated with our combined arg.
    instance = _temp(_combined_arg)
    # Return the object instantiated from the combined class.
    return instance
This approach works fine for producing an object that lets me call methods from either of the original classes, but my IDE can't auto-complete the method names, nor can documentation generators (like pdoc) produce any documentation beyond our combine() function.
This process is necessary because we are generating code off of other code (descriptive, I know, sorry!) and it isn't practical to combine the modules upstream (i.e., by hand).
Any ideas?
Thank you in advance!!!
ADDENDUM:
What I can say about what we are doing here is that we're combining client methods generated from REST API endpoints that happen to be split into two non-overlapping namespaces, for practical reasons that aren't important to this discussion. That's why simply dropping the methods from ModuleTwo into ModuleOne would be all that needs doing.
If there are suggestions on an automatable, clean way to do this before shipping either module, I am definitely open to hearing them. Not having to do this work at all would be far preferable. Thanks!
There is no need for combine to define a new class every time it is called.
class CombinedAPI(APIOne, APITwo):
    @classmethod
    def combine(cls, arg_one, arg_two):
        arg = "some_string_%s_%s" % (arg_one, arg_two)
        return cls(arg)

obj = CombinedAPI.combine(foo, bar)
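Because CombinedAPI is an ordinary class defined at import time, an IDE can resolve its inherited methods through the MRO, and documentation generators like pdoc should pick them up as well, which the dynamically built type('Temp', ...) class prevented.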
I have a program that must continuously create thousands of objects from a class that has about 12–14 methods. Will the fact that they come from a complex class cause a performance hit compared with creating a simpler object like a list or dictionary, or even another object with fewer methods?
Some details about my situation:
I have a bunch of “text” objects that continuously create and refresh “prints” of their contents. The print objects have many methods but only a handful of attributes. The print objects can’t be contained within the text objects because the text objects need to be “reusable” and make multiple independent copies of their prints, so that rules out just swapping out the print objects’ attributes on refresh.
Am I better off:
1. Continuously creating the new print objects, with all their methods, as the application refreshes?
2. Unraveling the class, turning the print objects into simple structs, and turning the methods into independent functions that take the structs as arguments?
I assume the answer depends on whether there is a large cost associated with generating new objects that carry all those methods, versus having to import the independent functions wherever they would have been called as object methods.
It doesn't matter how complex the class is; when you create an instance, you only store a reference to the class with the instance. All methods are accessed via this one reference.
No, it should not make a difference.
Consider that when you do the following:
a = Foo()
a.bar()
The call to the bar method is in fact translated under the covers to:
Foo.bar(a)
I.e., bar lives on the class definition, and only one instance of the function exists. Looked at this way, it suggests that no, there will be no significant impact from the number of methods: the methods are created once, when the class statement runs, not each time you create an object.
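You can verify that the instance itself stores no methods:

class Foo:
    def bar(self):
        pass

a = Foo()
print('bar' in a.__dict__)        # False: the instance dict holds only data
print(a.bar.__func__ is Foo.bar)  # True: all instances share one function object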
I did some testing.
I have the following function:
import time

def call_1000000_times(f):
    start = time.time()
    for i in range(1000000):
        f(a=i, b=10000 - i)
    return time.time() - start
As you can see, this function takes another function, calls it 1000000 times, and returns how long that took, in seconds.
I also created two classes:
A small class:
class X(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
And a rather large one:
class Y(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def foo(self): pass
    def bar(self): pass
    def baz(self): pass

    def anothermethod(self):
        pass

    @classmethod
    def hey_a_class_method(cls, argument):
        pass

    def thisclassdoeswaytoomuch(self): pass
    def thisclassisbecomingbloated(self): pass
    def almostattheendoftheclass(self): pass
    def imgonnahaveacouplemore(self): pass
    def somanymethodssssss(self): pass
    def should_i_add_more(self): pass
    def yes_just_a_couple(self): pass
    def just_for_lolz(self): pass
    def coming_up_with_good_method_names_is_hard(self): pass
The results:
>>> call_1000000_times(dict)
0.2680389881134033
>>> call_1000000_times(X)
0.6771988868713379
>>> call_1000000_times(Y)
0.6260080337524414
As you can see, the difference between a large class and a small class is very small, with the large class even being faster in this case. I assume that if you ran this function multiple times with the same type, and averaged the numbers, they'd be even closer, but it's 3AM and I need sleep, so I'm not going to set that up right now.
On the other hand, just calling dict was about 2.5x faster, so if your bottleneck is instantiation, this might be a place to optimize things.
Be wary of premature optimization though. Classes, by holding data and code together, can make your code easier to understand and build upon (functional programming lovers, this is not the place for an argument). It might be a good idea to use the python profiler or other performance measuring tools to find out what parts of your code are slowing it down.
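For the profiler suggestion, the standard library's cProfile is enough to see whether instantiation is actually the bottleneck. A sketch (main is a stand-in for your entry point):

import cProfile
import pstats

cProfile.run("main()", "stats.out")  # profile a run of your program
pstats.Stats("stats.out").sort_stats("cumulative").print_stats(10)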