How to reassign a class method after calling another method? - python

I am currently working on a Python project with a couple class methods that are each called tens of thousands of times. One of the issues with these methods is that they rely on data being populated via another method first, so I want to be able to raise an error if the functions are called prior to populating the data.
And before anyone asks, I opted to separate the data population stage from the class constructor. This is because the data population (and processing) is intensive and I want to manage it separately from the constructor.
Simple (inefficient) implementation
A simple implementation of this might look like:
class DataNotPopulatedError(Exception):
    ...

class Unblocker1:
    def __init__(self):
        self.data = None
        self._is_populated = False

    def populate_data(self, data):
        self.data = data
        self._is_populated = True

    # It will make sense later why this is its own method
    def _do_something(self):
        print("Data is:", self.data)

    def do_something(self):
        if not self._is_populated:
            raise DataNotPopulatedError
        return self._do_something()

unblocker1 = Unblocker1()

# Raise an error (we haven't populated the data yet)
unblocker1.do_something()

# Don't raise an error (we populated the data first)
unblocker1.populate_data([1, 2, 3])
unblocker1.do_something()
My goal
Because the hypothetical do_something() method is called tens (or hundreds) of thousands of times, I would think those extra checks to make sure that the data has been populated would start to add up.
While I may be barking up the wrong tree, my first thought for improving the efficiency of the function was to dynamically reassign the method after the data is populated. I.e., when the class is first created, the do_something() method would point to another function that only raises a DataNotPopulatedError. The populate_data() method would then both populate the data and "unblock" do_something() by dynamically reassigning it back to the function as written.
I figure the cleanest way to implement something like this would be using a decorator.
Hypothetical usage
I have no idea how to implement the technique described above; however, I did create a hypothetical usage with the inefficient method from before. Given the goal implementation, there might need to be two decorators: one to block the functions, and one to unblock them.
import functools

def blocked2(attr, raises):
    def _blocked2(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Assumes `args[0]` is `self`
            # If `self.<attr>` is falsy, raise `raises`; otherwise call `func()`
            if not getattr(args[0], attr):
                raise raises
            return func(*args, **kwargs)
        return wrapper
    return _blocked2

class Unblocker2:
    def __init__(self):
        self.data = None
        self._is_populated = False

    def populate_data(self, data):
        self.data = data
        self._is_populated = True

    @blocked2("_is_populated", DataNotPopulatedError)
    def do_something(self):
        print("Data is:", self.data)
I've been having a hard time explaining what I am attempting to do, so I am open to other suggestions to accomplish a similar goal (and potentially better titles for the post). There is a decent chance I am taking the complete wrong approach here; that's just part of learning. If there is a better way of doing what I am trying to do, I am all ears!

What you are trying to do does not seem especially difficult. I suspect you are overcomplicating the task a bit. Assuming you are willing to respect your own private methods, you can do something like
class Unblocker2:
    def __init__(self):
        self.data = None

    def populate_data(self, data):
        self.data = data
        self.do_something = self._do_something_populated

    def do_something(self):
        raise DataNotPopulatedError('Data not populated yet')

    def _do_something_populated(self):
        print("Data is:", self.data)
Since methods are non-data descriptors, assigning a bound method to the instance attribute do_something will shadow the class attribute. That way, instances that have data populated can avoid making a check with the minimum of redundancy.
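A quick demonstration of the shadowing behavior (reusing the DataNotPopulatedError class from the question):

```python
class DataNotPopulatedError(Exception):
    pass

class Unblocker2:
    def __init__(self):
        self.data = None

    def populate_data(self, data):
        self.data = data
        # Assigning the bound method onto the instance shadows the
        # class-level do_something for this instance only
        self.do_something = self._do_something_populated

    def do_something(self):
        raise DataNotPopulatedError("Data not populated yet")

    def _do_something_populated(self):
        print("Data is:", self.data)

u = Unblocker2()
try:
    u.do_something()                  # class attribute wins before population
except DataNotPopulatedError:
    print("blocked")

u.populate_data([1, 2, 3])
u.do_something()                      # prints: Data is: [1, 2, 3]
print("do_something" in vars(u))      # True: instance attribute now shadows
```

After populate_data() runs, the check is gone entirely: lookups on the instance find the bound `_do_something_populated` before ever reaching the class.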
That being said, profile your code before going off and optimizing it. You'd be amazed at which parts take the longest.

Related

Patching in Python

I have a python file, say python_file_a.py:

def load_content():
    dir = "/down/model/"
    model = Model(model_dir=dir)
    return model

model = load_content()

def invoke(req):
    return model.execute(req)

and test_python_file_a.py:

@patch("module.python_file_a.load_content")
@patch("module.python_file_a.model", Mock(spec=Model))
def test_invoke():
    from module.python_file_a import model, invoke
    model.execute = Mock(return_value="Some response")
    invoke("some request")
This is still trying to load the actual model from the path "/down/model/" in the test. What is the correct way of patching so that the load_content function is mocked in the test?
Without knowing more about what your code does or how it's used it's hard to say exactly, but in this case (and in many cases) the correct approach is to not hard-code values as local variables in functions. Change your load_content() function to take an argument:

def load_content(dirname):
    ...

or even give it a default value:

def load_content(dirname="/default/path"):
    ...
For the test don't use the model instance instantiated at module level (arguably you should not be doing this in the first place, but again it depends on what you're trying to do).
Update: Upon closer inspection, the problem really seems to stem from instantiating a module-global instance at import time. Try to avoid doing that and use lazy instantiation instead:

model = None

Then, if you really must write a function that accesses the global variable:

def invoke(req):
    global model
    if model is None:
        model = load_content()
    return model.execute(req)
Alternatively you can use a PEP 562 module-level __getattr__ function.
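A sketch of the PEP 562 approach (Python 3.7+). Normally the `__getattr__` lives in the module's own .py file; here the module object is built dynamically just to keep the example self-contained, and `_load_content` is a hypothetical stand-in for the expensive Model(...) construction:

```python
import sys
import types

def _load_content():
    # Hypothetical placeholder for Model(model_dir=...)
    return "loaded-model"

mod = types.ModuleType("model_module")
mod._model = None

def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails
    if name == "model":
        if mod._model is None:
            mod._model = _load_content()
        return mod._model
    raise AttributeError(name)

mod.__getattr__ = _module_getattr
sys.modules["model_module"] = mod

import model_module
print(model_module.model)  # the model is built on first access, not at import
```

Because the expensive construction happens on first attribute access, a test can patch `load_content` (or assign `model_module._model`) before anything touches `model`.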
Or write a class instead of putting everything at module-level.
import functools

class ModelInvoker:
    def __init__(self, dirname='/path/to/content'):
        self.dirname = dirname

    @functools.cached_property
    def model(self):
        return load_content(self.dirname)

    def invoke(self, req):
        return self.model.execute(req)
Many other approaches to this depending on your use case. But finding some form of encapsulation is what you need if you want to be able to easily mock and replace parts of some code, and not execute code unnecessarily at import time.

Is it conventional for a member function that updates object state to also return a value?

I have a question regarding return conventions in Python. I have a class that has some data attributes, and several functions that perform some analysis on the data and then store the results as results attributes (please see the simplified implementation below). Since the analysis functions mainly update the results attribute, my question is: what is the best practice in terms of return statements? Should I avoid updating class attributes inside the function (as in process1), and instead just return the data and use that to update the results attribute (as in process2)?
Thanks,
Kamran
class Analysis(object):
    def __init__(self, data):
        self.data = data
        self.results = None

    def process1(self):
        self.results = [i**2 for i in self.data]

    def process2(self):
        return [i**2 for i in self.data]

a = Analysis([1, 2, 3])
a.process1()
a.results = a.process2()
It all depends on the use case.
First of all, you are not changing class attributes there; you are changing instance attributes. See:
Python: Difference between class and instance attributes
Secondly, if you plan to share the results of your processing among the various instances of the class, you can use class variables.
Third, if you are asking about instance variables, it comes down to design choice.
Besides that, this line is unnecessary:
a.results = a.process2()
It moves an assignment the object could handle itself out to the caller; you can call process2() directly wherever you need the results.
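A common middle ground, sketched here with the question's Analysis class, is a method that both stores the results on the instance and returns them, so callers can use either style:

```python
class Analysis:
    def __init__(self, data):
        self.data = data
        self.results = None

    def process(self):
        # Store the results on the instance *and* return them,
        # so callers can pick whichever style they prefer
        self.results = [i ** 2 for i in self.data]
        return self.results

a = Analysis([1, 2, 3])
returned = a.process()
print(returned)   # [1, 4, 9]
print(a.results)  # [1, 4, 9]
```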

Design pattern for transformation of array data in Python script

I currently have a Python class with many methods each of which performs a transformation of time series data (primarily arrays). Each method must be called in a specific order, since the inputs of each function specifically rely on the outputs of the previous. Hence my code structure looks like the following:
class Algorithm:
    def __init__(self, data1, data2):
        self.data1 = data1
        self.data2 = data2

    def perform_transformation1(self):
        ...  # perform action on self.data1

    def perform_transformation2(self):
        ...  # perform action on self.data1

etc.
At the bottom of the script, I instantiate the class and then call each method on the instance in order, procedurally. Using object-oriented programming in this case seems wrong to me.
My aims are to re-write my script in a way such that the inputs of each method are not dependent on the outputs of the preceding method, hence giving me the ability to decide whether or not to perform certain methods.
What design pattern should I be using for this purpose, and does this move more towards functional programming?
class Algorithm:
    @staticmethod
    def perform_transformation_a(data):
        return 350 * data

    @staticmethod
    def perform_transformation_b(data):
        return 400 * data

def perform_transformations(data):
    transformations = (Algorithm.perform_transformation_a,
                       Algorithm.perform_transformation_b)
    for transformation in transformations:
        data = transformation(data)
    return data
You could have the Algorithm class just be a collection of pure functions. And then have the client code (perform_transformations here) be the logic of ordering which transformations to apply. This way the Algorithm class is only responsible for algorithms and client worries about ordering the functions.
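The same client-side ordering can also be written as a fold over a sequence of pure functions; the function names below are hypothetical stand-ins for the real transformations:

```python
from functools import reduce

def scale_a(data):
    return 350 * data

def scale_b(data):
    return 400 * data

def apply_pipeline(data, steps):
    # Thread the data through each transformation in order
    return reduce(lambda acc, step: step(acc), steps, data)

print(apply_pipeline(1, [scale_a, scale_b]))  # 350, then 400 * 350 = 140000
```

Reordering, dropping, or adding a step is then just editing the `steps` list, which is exactly the flexibility the question asks for.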
One option is to define your methods like this:
def perform_transformation(self, data=None):
    if data is None:
        perform action on self.data
    else:
        perform action on data
This way, you have the ability to call them at any time you want.
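Making that pattern concrete with a hypothetical doubling transformation:

```python
class Algorithm:
    def __init__(self, data):
        self.data = data

    def perform_transformation(self, data=None):
        # Operate on the argument if one is given, else on stored state
        target = self.data if data is None else data
        return [x * 2 for x in target]

alg = Algorithm([1, 2, 3])
print(alg.perform_transformation())          # uses self.data -> [2, 4, 6]
print(alg.perform_transformation([10, 20]))  # uses the argument -> [20, 40]
```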

python: will a class with many methods take longer to initialize?

I have a program that must continuously create thousands of objects off of a class that has about 12–14 methods. Will the fact that they are of a complex class cause a performance hit over creating a simpler object like a list or dictionary, or even another object with fewer methods?
Some details about my situation:
I have a bunch of “text” objects that continuously create and refresh “prints” of their contents. The print objects have many methods but only a handful of attributes. The print objects can’t be contained within the text objects because the text objects need to be “reusable” and make multiple independent copies of their prints, so that rules out just swapping out the print objects’ attributes on refresh.
Am I better off,
Continuously creating the new print objects with all their methods as the application refreshes?
Unraveling the class and turning the print objects into simple structs and the methods into independent functions that take the objects as arguments?
This I assume would depend on whether or not there is a large cost associated with generating new objects with all the methods included in them, versus having to import all the independent functions to wherever they would have been called as object methods.
It doesn't matter how complex the class is; when you create an instance, you only store a reference to the class with the instance. All methods are accessed via this one reference.
No, it should not make a difference.
Consider that when you do the following:
a = Foo()
a.bar()
The call to the bar method is in fact translated under the covers to:
Foo.bar(a)
I.e., bar is "static" under the class definition, and there exists only one instance of the function. Looked at this way, it suggests that no, there will be no significant impact from the number of methods: the functions are created once, when the class definition is executed, not each time you create an object.
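You can verify that all instances share the single function object stored on the class:

```python
class Foo:
    def bar(self):
        return "hi"

a = Foo()
b = Foo()

# Bound methods are created on attribute access, but they all wrap the
# same underlying function object stored once on the class
print(a.bar.__func__ is Foo.bar)         # True
print(a.bar.__func__ is b.bar.__func__)  # True
print(Foo.bar(a) == a.bar())             # True
```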
I did some testing.
I have the following function:
import time

def call_1000000_times(f):
    start = time.time()
    for i in xrange(1000000):  # Python 2; use range() on Python 3
        f(a=i, b=10000 - i)
    return time.time() - start
As you can see, this function takes another function, calls it 1000000 times, and returns how long that took, in seconds.
I also created two classes:
A small class:
class X(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
And a rather large one:
class Y(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def foo(self): pass
    def bar(self): pass
    def baz(self): pass

    def anothermethod(self):
        pass

    @classmethod
    def hey_a_class_method(cls, argument):
        pass

    def thisclassdoeswaytoomuch(self): pass
    def thisclassisbecomingbloated(self): pass
    def almostattheendoftheclass(self): pass
    def imgonnahaveacouplemore(self): pass
    def somanymethodssssss(self): pass
    def should_i_add_more(self): pass
    def yes_just_a_couple(self): pass
    def just_for_lolz(self): pass
    def coming_up_with_good_method_names_is_hard(self): pass
The results:
>>> call_1000000_times(dict)
0.2680389881134033
>>> call_1000000_times(X)
0.6771988868713379
>>> call_1000000_times(Y)
0.6260080337524414
As you can see, the difference between a large class and a small class is very small, with the large class even being faster in this case. I assume that if you ran this function multiple times with the same type, and averaged the numbers, they'd be even closer, but it's 3AM and I need sleep, so I'm not going to set that up right now.
On the other hand, just calling dict was about 2.5x faster, so if your bottleneck is instantiation, this might be a place to optimize things.
Be wary of premature optimization though. Classes, by holding data and code together, can make your code easier to understand and build upon (functional programming lovers, this is not the place for an argument). It might be a good idea to use the python profiler or other performance measuring tools to find out what parts of your code are slowing it down.
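A quick way to measure instantiation cost in isolation is timeit; a sketch (Python 3, actual numbers will vary by machine):

```python
import timeit

class Small:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Passing callables avoids having to set up a statement namespace string
t_dict = timeit.timeit(lambda: dict(a=1, b=2), number=100_000)
t_cls = timeit.timeit(lambda: Small(1, 2), number=100_000)
print(f"dict: {t_dict:.3f}s  class: {t_cls:.3f}s")
```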

Return self in class method - is this good approach?

I have a class like below
class TestClass(object):
    def __init__(self, data):
        self.data = data

    def method_a(self, data):
        self.data += data / 2
        return self

    def method_b(self, data):
        self.data += data
        return self

    def method_c(self, data):
        self.data -= data
        return self
Every method returns self. I wrote it that way to be able to call a few methods in a chain, i.e. object.method_a(10).method_b(12).method_c(11). I was told that return self in a method doesn't return the current object, but creates a new one. Is that really how it works? Is it good practice to use return self in Python methods?
Is it really how it works?
No, it won't create a new object, it will return the same instance. You can check it by using is keyword, which checks if two objects are the same:
>>> t = TestClass(3)
>>> c = t.method_a(4)
>>> t is c
True
Is it good practice to use return self in python methods?
Yes, it's often used to allow chaining.
Returning self does not create a new object.
I'm sure some people will tell you that method chaining is bad. I don't agree that it is necessarily bad. Writing an API that is designed for method chaining is a choice, and if you decide to do it you need to make sure everything works as people expect. But if you decide to do it, then returning self is often the right thing to do. (Sometimes, you may want to return a newly-created object instead. It depends whether you want the chained methods to cumulatively modify the original object or not.)
return self does not create a new object, you were told wrong.
If you want to build an API that supports chaining, returning self is a fine practice. You may want to build a new object instead for a chaining API, but that's not a requirement either.
See Python class method chaining for how the SQLAlchemy library handles chaining with new instances per change. Creating a new object when mutating objects in a chain allows you to re-use partial transformations as a starting point for several new objects with different chained transformations applied.
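A sketch of that alternative: each chained call returns a new instance instead of mutating self, so intermediate results can be reused (class and method names here are hypothetical):

```python
class Series:
    def __init__(self, data):
        self.data = data

    def add(self, n):
        # Return a fresh instance; self is left untouched
        return Series([x + n for x in self.data])

    def scale(self, n):
        return Series([x * n for x in self.data])

base = Series([1, 2, 3])
shifted = base.add(10)        # reusable intermediate
print(shifted.scale(2).data)  # [22, 24, 26]
print(shifted.scale(3).data)  # [33, 36, 39]
print(base.data)              # [1, 2, 3] -- original unchanged
```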
