Object generator pattern - python

I have a class that represents a pretty complex object. The objects can be created in several ways: by incremental building, by parsing text strings in different formats, and by analyzing binary files. So far my strategy has been as follows:
Have the constructor (__init__, in my case) initialize all the internal variables to None
Supply different member functions to populate the object
Have those functions return the new, modified object to the caller so we can do sd = SuperDuper().fromString(s)
For example:
class SuperDuper:
    def __init__(self):
        self.var1 = None
        self.var2 = None
        self.varN = None

    ## Generators
    def fromStringFormat1(self, s):
        # parse the string
        return self

    def fromStringFormat2(self, s):
        # parse the string
        return self

    def fromAnotherLogic(self, *params):
        # parse params
        return self

    ## Modifiers (for incremental work)
    def addThis(self, p):
        pass

    def addThat(self, p):
        pass

    def removeTheOtherOne(self, p):
        pass
The problem is that the class becomes very large. Unfortunately I am not familiar with OOP design patterns, but I assume that there is a more elegant solution to this problem. Is taking the generator functions out of the class (so that fromString(self, s) becomes superDuperFromString(s)) a good idea?

What might be a better idea in your case is dependency injection and inversion of control. The idea is to create another class that holds all of the settings you are parsing out of these different sources; subclasses then define the method that actually does the parsing. When you instantiate the main class, you pass it an instance of the settings class:
class Settings(object):
    var1 = None
    var2 = None
    var3 = None

    def configure_superduper(self, superduper):
        superduper.var1 = self.var1
        # etc.

class FromString(Settings):
    def __init__(self, string):
        # parse the string and set var1, etc.
        pass

class SuperDuper(object):
    def __init__(self, settings):            # dependency injection
        settings.configure_superduper(self)  # inversion of control
        # other initialization stuff

sup = SuperDuper(FromString(some_string))
Doing it this way has the advantage of adhering more closely to the single responsibility principle which says that a class should only have one (likely to occur) reason to change. If you change the way you're storing any of these strings, then the class has to change. Here, we're isolating that into one simple, separate class for each source of data.
If, on the other hand, you think the data being stored is more likely to change than the way it's stored, you might want to go with class methods as Ignacio is suggesting. This scheme is (slightly) more complicated and doesn't really buy you much in that case, because when the data changes you have to change two classes instead of one. Of course it doesn't really hurt much either, because you'll only have to change one more assignment.

I don't believe it would be, since those all relate directly to the class regardless.
What I would do is make the constructor take arguments to initialize the fields (defaulting to None of course), then turn all the from*() methods into classmethods that construct new objects and return them.
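A minimal sketch of that approach (the comma-split parse is only a stand-in for the real format-specific logic):
    class SuperDuper:
        def __init__(self, var1=None, var2=None, varN=None):
            self.var1 = var1
            self.var2 = var2
            self.varN = varN

        @classmethod
        def fromStringFormat1(cls, s):
            # placeholder parse: real code would extract the fields from s
            var1, var2, varN = s.split(',')
            return cls(var1, var2, varN)

    sd = SuperDuper.fromStringFormat1("a,b,c")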

I don't think it is bad design to have conversion/creation methods inside the class. You could always move them to a separate class, and then you would have a Simple Factory, which is a very lightweight design pattern.
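If you did move them out, the Simple Factory is just a separate class whose only job is to build SuperDuper instances; a rough sketch against the SuperDuper class from the question (the parsing body is a stub):
    class SuperDuperFactory:
        @staticmethod
        def from_string_format1(s):
            sd = SuperDuper()
            # parse s and populate sd.var1, sd.var2, ... here
            return sd

    sd = SuperDuperFactory.from_string_format1("raw text in format 1")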
I'd keep them in the class though :)

Have those functions return the new, modified object to the caller so we can do sd = SuperDuper().fromString(s)
Rarely is this a good idea. While some Python library classes do this, it's not the best approach.
Generally, you want to do this.
class SuperDuper( object ):
    def __init__(self, var1=None, var2=None, var3=None):
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3

    def addThis(self, p):
        pass

    def addThat(self, p):
        pass

    def removeTheOtherOne(self, p):
        pass

class ParseString( object ):
    def __init__( self, someString ):
        pass

    def superDuper( self ):
        pass

class ParseString_Format1( ParseString ):
    pass

class ParseString_Format2( ParseString ):
    pass

def parse_format1( string ):
    parser = ParseString_Format1( string )
    return parser.superDuper()

def parse_format2( string ):
    parser = ParseString_Format2( string )
    return parser.superDuper()

def fromAnotherLogic( **kw ):
    return SuperDuper( **kw )
There are two unrelated responsibilities: the object and the string representations of the object.
Do Not Conflate Objects and String Representations.
Objects and Parsing must be kept separate. After all, the compiler is not part of the code that's produced. An XML parser and the Document Object Model are generally separate objects.
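With that split, callers build objects through the module-level functions instead of mutating a half-built instance (using the names defined above; the parser bodies are still stubs):
    sd = parse_format1( "raw text in format 1" )
    sd2 = fromAnotherLogic( var1=1, var2=2 )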


Python class use shared state for function?

I am trying to improve my python code and have started using classes to group related methods and variables.
What is the best practice when using a function that is able to access the variables that are initialized in the class? Should I just access the variable in the function? Or explicitly pass the variable to make it clear that I am relying on it?
I've created two examples to show what I mean by this question. Which method is preferred?
import re

# method 1
class UploadForm(object):
    def __init__(self, form_data):
        self.file_name = form_data.get('file_name')

    def validate(self):
        agency_name = self.extract_agency_name(self.file_name)

    @staticmethod
    def extract_agency_name(file_name):
        pattern = re.search(r'^[CFS]Y\d{4} (.+?)[.](?:xls|csv)$', file_name, re.I)
        if pattern:
            agency_name = pattern.group(1)
            return agency_name

# method 2
class UploadForm(object):
    def __init__(self, form_data):
        self.file_name = form_data.get('file_name')

    def validate(self):
        agency_name = self.extract_agency_name()

    def extract_agency_name(self):
        pattern = re.search(r'^[CFS]Y\d{4} (.+?)[.](?:xls|csv)$', self.file_name, re.I)
        if pattern:
            agency_name = pattern.group(1)
            return agency_name
For the reasons below, method 2 is preferred.
A member variable should be accessed via self.
By using self, you make it clear that you are referencing the file_name variable of the same object.
Decorators can add overhead.
Decorators are wrappers around a method or a variable.
Passing more arguments consumes more memory.
Each argument takes up memory.

Change or override str from subclass method

I had a problem with overriding str inside my inherited class. Is there a way to do something similar?
class Sentence(str):
    def input(self, msg):
        """Extend allow to hold one changing object for various strings."""
        self = Sentence(input(msg))

    def simplify(self):
        self = self.lower()
        self.strip()
I want to change the string contained in that class, for various uses. Is there a way to do this? I have tried many things from Stack Overflow, and none of them helped me.
Here is an explanation of what I want to do:
In __init__, I initialize the Sentence class:
self.sentence = Sentence("")
In the main loop, where the user can change the Sentence:
self.sentence.input("Your input:")
After that, I want to simplify the string for the algorithm:
self.sentence.simplify()
And that's all; after that I want to use self.sentence like a string.
But in both methods:
def input(self, msg):
    """Extend allow to hold one changing object for various strings."""
    self = Sentence(input(msg))

def simplify(self):
    self = self.lower()
    self.strip()
String wasn't changed.
Due to the optimizations languages such as Python perform on strings (i.e. they are immutable, so the same string can be reused), I don't think it's good practice to inherit from str. Instead, you could write a class that wraps the string:
class Sentence:
    def __init__(self, msg: str):
        self.msg = msg

    def simplify(self):
        self.msg = self.msg.lower().strip()
This way you can improve your implementation if, for example, you are changing the string too often and run into performance problems.
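Rebinding self inside a method only changes a local name and never affects the caller's object, which is why the original methods had no visible effect; mutating self.msg, as above, does work:
    s = Sentence("  Hello World  ")
    s.simplify()
    print(s.msg)   # prints "hello world"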

What would be more pythonic solution to this problem?

I have the following structure for my classes.
class foo(object):
    def __call__(self, param1):
        pass

class bar(object):
    def __call__(self, param1, param2):
        pass
I have many classes of this type, and I am using these callable classes as follows.
classes = [foo(), bar()]
for C in classes:
    res = C(param1)
    '''here I want to put a condition: if the class takes 1 argument, just pass 1
    parameter, otherwise pass two.'''
I have thought of one pattern like this.
class abc():
    def __init__(self):
        self.param1 = 'xyz'
        self.param2 = 'pqr'

    def something(self, classes):  # classes = [foo(), bar()]
        for C in classes:
            if C.__class__.__name__ in ['bar']:
                res = C(self.param1, self.param2)
            else:
                res = C(self.param2)
But in the above solution I have to maintain a list of classes which take two arguments, and as I add more classes to the file this will become messy.
I don't know whether this is the correct (Pythonic) way to do it.
One more idea I have in mind is to check how many arguments the class takes: if it's 2, pass an additional argument, otherwise pass 1 argument. I have looked at this solution: How can I find the number of arguments of a Python function? But I am not confident that this is the best-suited solution to my problem.
A few things about this:
There are only two types of classes in my use case: one that takes 1 argument and one that takes 2.
Both classes take the same first argument, so param1 is the same argument I am passing in both cases. For the class with two required parameters, I am passing an additional argument (param2) containing some data.
PS: Any help or new ideas for this problem are appreciated.
UPD: Updated the code.
Basically, you want to use polymorphism on your object's __call__() method, but you have an issue with your callables signature not being the same.
The plain simple answer to this is: you can only use polymorphism on compatible types, which in this case means that your callables MUST have compatible signatures.
Hopefully, there's a quick and easy way to solve this: just modify your method signatures so they accept varargs and kwargs:
class Foo(object):
    def __call__(self, param1, *args, **kw):
        pass

class Bar(object):
    def __call__(self, param1, param2, *args, **kw):
        pass
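With the widened signatures, the original loop can pass both parameters to every callable; Foo simply absorbs the extra one through *args (a minimal sketch with placeholder values):
    param1, param2 = 'xyz', 'pqr'
    for c in [Foo(), Bar()]:
        res = c(param1, param2)  # Foo ignores param2, Bar uses it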
For the case where you can't change the callable's signature, there's still a workaround: use a lambda as proxy:
def func1(y, z):
    pass

def func2(x):
    pass

callables = [func1, lambda y, z: func2(y)]
for c in callables:
    c(42, 1138)
Note that this last example is actually known as the adapter pattern.
Unrelated: this:
if C.__class__.__name__ in ['bar']:
is an inefficient and convoluted way to write:
if C.__class__.__name__ == 'bar':
which is itself an inefficient, convoluted AND brittle way to write:
if type(C) is bar:
which, by itself, is a possible design smell (there are legit use cases for checking the exact type of an object, but most often this is really a design issue).

Preprocessing data in Python before passing it to class constructor

Is it good style to create a separate method, in which I preprocess data, before I pass it to the constructor (in case the preprocessing is cumbersome), like so:
class C():
    def __init__(self, input, more_input):
        self.value = self.prepare_value(input, more_input)

    def prepare_value(self, input, more_input):
        # here I actually do some nontrivial stuff, over many lines
        # for brevity I'm illustrating just a short, one-line operation
        value = (input + more_input)/2
        return value

print(C(10, 33).value)  # has value 21.5
If you wanted to do it like this, then I'd suggest two things.
Make the prepare_value() method a static method by decorating it with the @staticmethod decorator. Since it's not making any changes to the instance of the class itself, just returning a value, you shouldn't make it an instance method. Hence, @staticmethod.
Signify that the method should only be used internally by using the name _prepare_value(). This doesn't actually make it private, but it's a well recognized convention to say to other developers (i.e. future you) "this method isn't designed to be used externally".
Overall my suggestion would be:
class C():
    def __init__(self, input, more_input):
        self.value = self._prepare_value(input, more_input)

    @staticmethod
    def _prepare_value(input, more_input):
        value = (input + more_input)/2
        return value
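Usage stays exactly as in the question:
    print(C(10, 33).value)  # 21.5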

Is it bad to store all instances of a class in a class field?

I was wondering if there is anything wrong (from a OOP point of view) in doing something like this:
class Foobar:
    foobars = {}

    def __init__(self, name, something):
        self.name = name
        self.something = something
        Foobar.foobars[name] = self

Foobar('first', 42)
Foobar('second', 77)

for name in Foobar.foobars:
    print name, Foobar.foobars[name]
EDIT: this is the actual piece of code I'm using right now
from threading import Event

class Task:
    ADDED, WAITING_FOR_DEPS, READY, IN_EXECUTION, DONE = range(5)
    tasks = {}

    def __init__(self, name, dep_names, job, ins, outs, uptodate, where):
        self.name = name
        self.dep_names = [dep_names] if isinstance(dep_names, str) else dep_names
        self.job = job
        self.where = where
        self.done = Event()
        self.status = Task.ADDED
        self.jobs = []
        # other stuff...
        Task.tasks[name] = self

    def set_done(self):
        self.done.set()
        self.status = Task.DONE

    def wait_for_deps(self):
        self.status = Task.WAITING_FOR_DEPS
        for dep_name in self.dep_names:
            Task.tasks[dep_name].done.wait()
        self.status = Task.READY

    def add_jobs_to_queues(self):
        jobs = self.jobs
        # a lot of stuff I trimmed here
        for w in self.where:
            Queue.queues[w].put(jobs)
        self.status = Task.IN_EXECUTION

    def wait_for_jobs(self):
        for j in self.jobs:
            j.wait()
#[...]
As you can see I need to access the dictionary with all the instances in
the wait_for_deps method. Would it make more sense to have a global variable
instead of a class field? I could be using a wrong approach here, maybe that
stuff shouldn't even be in a method, but it made sense to me (I'm new to OOP).
Yes. It's bad. It conflates the instance with the collection of instances.
Collections are one thing.
The instances which are collected are unrelated.
Also, class-level variables which get updated confuse some of us. Yes, we can eventually reason on what's going on, but the Standard Expectation™ is that state change applies to objects, not classes.
class Foobar_Collection( dict ):
    def __init__( self, *arg, **kw ):
        super( Foobar_Collection, self ).__init__( *arg, **kw )

    def foobar( self, *arg, **kw ):
        fb = Foobar( *arg, **kw )
        self[fb.name] = fb
        return fb

class Foobar( object ):
    def __init__( self, name, something ):
        self.name = name
        self.something = something

fc = Foobar_Collection()
fc.foobar( 'first', 42 )
fc.foobar( 'second', 77 )

for name in fc:
    print name, fc[name]

That's more typical.
In your example, the wait_for_deps is simply a method of the task collection, not the individual task. You don't need globals.
You need to refactor.
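Applied to the Task example from the question, the refactoring might look roughly like this (a sketch; TaskCollection is a hypothetical name, and only the dependency-waiting part is shown):
    class TaskCollection(dict):
        """Owns all tasks; dependency lookups live here, not on each Task."""
        def add(self, task):
            self[task.name] = task
            return task

        def wait_for_deps(self, task):
            task.status = Task.WAITING_FOR_DEPS
            for dep_name in task.dep_names:
                self[dep_name].done.wait()
            task.status = Task.READY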
I don't suppose that there's anything wrong with this, but I don't really see how this would be sensible. Why would you need to keep a global variable (in the class, of all places) that holds references to all the instances? The client could just as easily implement this himself if he just kept a list of his instances. All in all, it seems a little hackish and unnecessary, so I'd recommend that you don't do it.
If you're more specific about what you're trying to do, perhaps we can find a better solution.
This is NOT cohesive, and not very functional either; you want to strive to keep your objects as far from the 'data-bucket' mindset as possible. The static object collection is not really going to gain you anything. You need to think about WHY you need all the objects in the collection, and think about creating a second class whose responsibility is to manage, and be queried for, all the Foobars in the system.
Why would you want to do this?
There are several problems with this code. The first is that you have to take care of deleting instances -- there will always be a reference to each Foobar instance left in Foobar.foobars, so the garbage collector will never garbage collect them. The second problem is that it won't work with copy and pickle.
But apart from the technical problems, it feels like a wrong design. The purpose of object instances is hiding state, and you make them see each other.
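If a registry really is needed, one common mitigation for the garbage-collection problem mentioned above (not something this answer suggests) is to hold the instances weakly, so the registry no longer keeps them alive:
    import weakref

    class Foobar:
        # values are held weakly, so instances can still be garbage collected
        foobars = weakref.WeakValueDictionary()

        def __init__(self, name, something):
            self.name = name
            self.something = something
            Foobar.foobars[name] = self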
From an OOP point of view there's nothing wrong with it. A class is an instance of a metaclass, and any instance can hold any kind of data in it.
However, from an efficiency point of view, if you don't eventually clean up the foobars dict in a long-running Python program, you have a potential memory leak.
No one has mentioned the potential problem this might have if you later derive a subclass from Foobar, which could matter if the base class's __init__() function is called from the derived class's __init__(): specifically, whether you want all the subclass instances to be stored in the same place as those of the base class -- which of course depends on why you're doing this.
It's a solvable problem but something to consider, and perhaps to code for, up front in the base class.
I needed multiple Jinja environments in an app engine application:
import threading
from google.appengine.api import memcache        # App Engine memcache
from jinja2 import MemcachedBytecodeCache

class JinjaEnv(object):
    """ Jinja environment / loader instance per env_name """
    _env_lock = threading.Lock()
    with _env_lock:
        _jinja_envs = dict()  # instances of this class

    def __init__(self, env_name):
        self.jinja_loader = .....  # init jinja loader
        self.client_cache = memcache.Client()
        self.jinja_bcc = MemcachedBytecodeCache(self.client_cache, prefix='jinja2/bcc_%s/' % env_name, timeout=3600)
        self.jinja_env = self.jinja_loader(self.jinja_bcc, env_name)

    @classmethod
    def get_env(cls, env_name):
        with cls._env_lock:
            if env_name not in cls._jinja_envs:
                cls._jinja_envs[env_name] = JinjaEnv(env_name)  # new env
            return cls._jinja_envs[env_name].jinja_env

    @classmethod
    def flush_env(cls, env_name):
        with cls._env_lock:
            if env_name not in cls._jinja_envs:
                self = cls._jinja_envs[env_name] = JinjaEnv(env_name)  # new env
            else:
                self = cls._jinja_envs[env_name]
            self.client_cache.flush_all()
            self.jinja_env = self.jinja_loader(self.jinja_bcc, env_name)
            return self.jinja_env
Used like:
template = JinjaEnv.get_env('example_env').get_template('example_template')
