Class with too many parameters: better design strategy? - python

I am working with models of neurons. One class I am designing is a cell class which is a topological description of a neuron (several compartments connected together). It has many parameters but they are all relevant, for example:
number of axon segments, apical bifurcations, somatic length, somatic diameter, apical length, branching randomness, branching length and so on... there are about 15 parameters in total!
I can set all these to some default value, but my class looks crazy with several lines of parameters. This kind of thing must happen occasionally to other people too. Is there some obviously better way to design this, or am I doing the right thing?
UPDATE:
As some of you have asked, I have attached my code for the class. As you can see, this class has a huge number of parameters (>15), but they are all used and are necessary to define the topology of a cell. The problem essentially is that the physical object they create is very complex. I have attached an image representation of objects produced by this class. How would experienced programmers do this differently to avoid so many parameters in the definition?
class LayerV(__Cell):
    def __init__(self,somatic_dendrites=10,oblique_dendrites=10,
                 somatic_bifibs=3,apical_bifibs=10,oblique_bifibs=3,
                 L_sigma=0.0,apical_branch_prob=1.0,
                 somatic_branch_prob=1.0,oblique_branch_prob=1.0,
                 soma_L=30,soma_d=25,axon_segs=5,myelin_L=100,
                 apical_sec1_L=200,oblique_sec1_L=40,somadend_sec1_L=60,
                 ldecf=0.98):
        import random
        import math
        #make the main regions:
        axon=Axon(n_axon_seg=axon_segs)
        soma=Soma(diam=soma_d,length=soma_L)
        main_apical_dendrite=DendriticTree(bifibs=
                apical_bifibs,first_sec_L=apical_sec1_L,
                L_sigma=L_sigma,L_decrease_factor=ldecf,
                first_sec_d=9,branch_prob=apical_branch_prob)
        #make the somatic dendrites
        somatic_dends=self.dendrite_list(num_dends=somatic_dendrites,
                bifibs=somatic_bifibs,first_sec_L=somadend_sec1_L,
                first_sec_d=1.5,L_sigma=L_sigma,
                branch_prob=somatic_branch_prob,L_decrease_factor=ldecf)
        #make oblique dendrites:
        oblique_dends=self.dendrite_list(num_dends=oblique_dendrites,
                bifibs=oblique_bifibs,first_sec_L=oblique_sec1_L,
                first_sec_d=1.5,L_sigma=L_sigma,
                branch_prob=oblique_branch_prob,L_decrease_factor=ldecf)
        #connect axon to soma:
        axon_section=axon.get_connecting_section()
        self.soma_body=soma.body
        soma.connect(axon_section,region_end=1)
        #connect apical dendrite to soma:
        apical_dendrite_firstsec=main_apical_dendrite.get_connecting_section()
        soma.connect(apical_dendrite_firstsec,region_end=0)
        #connect oblique dendrites to apical first section:
        for dendrite in oblique_dends:
            apical_location=math.exp(-5*random.random()) #for now connecting randomly but need to do this on some linspace
            apsec=dendrite.get_connecting_section()
            apsec.connect(apical_dendrite_firstsec,apical_location,0)
        #connect dendrites to soma:
        for dend in somatic_dends:
            dendsec=dend.get_connecting_section()
            soma.connect(dendsec,region_end=random.random()) #for now connecting randomly but need to do this on some linspace
        #assign public sections
        self.axon_iseg=axon.iseg
        self.axon_hill=axon.hill
        self.axon_nodes=axon.nodes
        self.axon_myelin=axon.myelin
        self.axon_sections=[axon.hill]+[axon.iseg]+axon.nodes+axon.myelin
        self.soma_sections=[soma.body]
        self.apical_dendrites=main_apical_dendrite.all_sections+self.seclist(oblique_dends)
        self.somatic_dendrites=self.seclist(somatic_dends)
        self.dendrites=self.apical_dendrites+self.somatic_dendrites
        self.all_sections=self.axon_sections+[self.soma_sections]+self.dendrites

UPDATE: This approach may suit your specific case, but it definitely has its downsides; see "Is kwargs an antipattern?"
Try this approach:
class Neuron(object):
    def __init__(self, **kwargs):
        prop_defaults = {
            "num_axon_segments": 0,
            "apical_bifibrications": "fancy default",
            ...
        }
        # items() replaces the Python 2-only iteritems()
        for (prop, default) in prop_defaults.items():
            setattr(self, prop, kwargs.get(prop, default))
You can then create a Neuron like this:
n = Neuron(apical_bifibrications="special value")

I'd say there is nothing wrong with this approach - if you need 15 parameters to model something, you need 15 parameters. And if there's no suitable default value, you have to pass in all 15 parameters when creating an object. Otherwise, you could just set the default and change it later via a setter or directly.
Another approach is to create subclasses for certain common kinds of neurons (in your example) and provide good defaults for certain values, or derive the values from other parameters.
Or you could encapsulate parts of the neuron in separate classes and reuse these parts for the actual neurons you model. I.e., you could write separate classes for modeling a synapse, an axon, the soma, etc.
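A minimal sketch of the subclassing idea, assuming a cut-down, hypothetical Cell base class with only three of the question's parameters. The names Cell and SmallPyramidal and the 0.8 ratio are made up for illustration and are not from the question's code:

```python
class Cell:
    """Hypothetical, simplified stand-in for the question's cell class."""

    def __init__(self, soma_L=30, soma_d=25, axon_segs=5):
        self.soma_L = soma_L
        self.soma_d = soma_d
        self.axon_segs = axon_segs


class SmallPyramidal(Cell):
    """A common variant: different default length, diameter derived from it."""

    def __init__(self, soma_L=20, **overrides):
        # Derive the diameter from the length unless the caller overrides it.
        # The 0.8 ratio is an arbitrary illustrative value.
        overrides.setdefault("soma_d", soma_L * 0.8)
        super().__init__(soma_L=soma_L, **overrides)


cell = SmallPyramidal(axon_segs=3)
print(cell.soma_L, cell.soma_d, cell.axon_segs)
```

Callers of the subclass only spell out what differs from the variant's defaults, while the base class keeps the full parameter list in one place.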

You could perhaps use a Python "dict" object?
http://docs.python.org/tutorial/datastructures.html#dictionaries

Having so many parameters suggests that the class is probably doing too many things.
I suggest dividing your class into several classes, each of which takes some of your parameters. That way each class is simpler and won't take so many parameters.
Without knowing more about your code, I can't say exactly how you should split it up.

Looks like you could cut down the number of arguments by constructing objects such as Axon, Soma and DendriticTree outside of the LayerV constructor, and passing those objects instead.
Some of the parameters are only used in constructing e.g. DendriticTree, others are used in other places as well, so the problem is not as clear-cut, but I would definitely try that approach.

Could you supply some example code of what you are working on? It would help to get an idea of what you are doing and get help to you sooner.
If it's just the arguments you are passing to the class that make it long, you don't have to put it all in __init__. You can set the parameters after you create the instance, or pass a dictionary/class full of the parameters as an argument.
class MyClass(object):
    def __init__(self, **kwargs):
        # Defaults must live on self; as plain locals, hasattr() below would never match.
        self.arg1 = None
        self.arg2 = None
        self.arg3 = None
        for (key, value) in kwargs.items():
            # Only accept keyword arguments that name a known attribute.
            if hasattr(self, key):
                setattr(self, key, value)
if __name__ == "__main__":
    a_class = MyClass()
    a_class.arg1 = "A string"
    a_class.arg2 = 105
    a_class.arg3 = ["List", 100, 50.4]
    b_class = MyClass(arg1 = "Astring", arg2 = 105, arg3 = ["List", 100, 50.4])

After looking over your code and realizing I have no idea how any of those parameters relate to each other (solely because of my lack of knowledge of neuroscience), I would point you to a very good book on object-oriented design. Building Skills in Object Oriented Design by Steven F. Lott is an excellent read and I think it would help you, and anyone else, in laying out object-oriented programs.
It is released under a Creative Commons license, so it is free for you to use. Here is a link to it in PDF format: http://homepage.mac.com/s_lott/books/oodesign/build-python/latex/BuildingSkillsinOODesign.pdf
I think your problem boils down to the overall design of your classes. Sometimes, though very rarely, you need a whole lot of arguments to initialize, and most of the responses here have detailed other ways of initialization, but in a lot of cases you can break the class up into smaller classes that are easier to handle and less cumbersome.

This is similar to the other solutions that iterate through a default dictionary, but it uses a more compact notation:
class MyClass(object):
    def __init__(self, **kwargs):
        self.__dict__.update(dict(
            arg1=123,
            arg2=345,
            arg3=678,
        ), **kwargs)

Can you give a more detailed use case? If there are similarities within groups of objects, a prototype pattern might help.
Do you have a lot of cases where one population of neurons is just like another except for some difference (i.e. rather than having a small number of discrete classes, you have a large number of classes that differ slightly from each other)?
Python is a class-based language, but just as you can simulate class-based programming in a prototype-based language like JavaScript, you can simulate prototypes by giving your class a clone method that creates a new object and populates its instance variables from the parent. Write the clone method so that keyword parameters passed to it override the "inherited" parameters, so you can call it with something like:
new_neuron = old_neuron.clone( branching_length=n1, branching_randomness=r2 )
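One way such a clone method could look, as a rough sketch. The Neuron class and its two attributes are hypothetical placeholders, and a shallow copy is assumed to be enough:

```python
import copy


class Neuron:
    """Hypothetical class; the attribute names are illustrative only."""

    def __init__(self, branching_length=10.0, branching_randomness=0.1):
        self.branching_length = branching_length
        self.branching_randomness = branching_randomness

    def clone(self, **overrides):
        """Return a copy of this neuron with selected attributes replaced."""
        unknown = set(overrides) - set(vars(self))
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        new = copy.copy(self)  # shallow copy; use copy.deepcopy for nested state
        vars(new).update(overrides)
        return new


old_neuron = Neuron()
new_neuron = old_neuron.clone(branching_length=25.0)
```

Rejecting unknown keywords is a design choice here: it catches misspelled parameter names that a plain attribute update would silently accept.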

I have never had to deal with this situation, or this topic. Your description implies to me that you may find, as you develop the design, that there are a number of additional classes that will become relevant - compartment is the most obvious. If these do emerge as classes in their own right, it is probable that some of your parameters become parameters of these additional classes.

You could create a class for your parameters.
Instead of passing a bunch of parameters, you pass one object.
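A small sketch of what that could look like with the standard-library dataclasses module. CellParams and Cell are hypothetical names, and only a handful of the question's parameters are shown:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CellParams:
    """Hypothetical parameter object; fields mirror a few LayerV arguments."""
    somatic_dendrites: int = 10
    oblique_dendrites: int = 10
    soma_L: float = 30
    soma_d: float = 25
    axon_segs: int = 5


class Cell:
    def __init__(self, params=None):
        # Fall back to the defaults when no parameter object is given.
        self.params = params if params is not None else CellParams()


default_cell = Cell()
long_axon = replace(CellParams(), axon_segs=12)  # copy with one field changed
custom_cell = Cell(long_axon)
```

The constructor signature shrinks to a single argument, while each parameter keeps its name, default and type hint in one place.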

In my opinion, in your case the easy solution is to pass higher-order objects as parameters.
For example, in your __init__ you have a DendriticTree that uses several arguments from your main class LayerV:
main_apical_dendrite = DendriticTree(
    bifibs=apical_bifibs,
    first_sec_L=apical_sec1_L,
    L_sigma=L_sigma,
    L_decrease_factor=ldecf,
    first_sec_d=9,
    branch_prob=apical_branch_prob
)
Instead of passing these 6 arguments to your LayerV you would pass the DendriticTree object directly (thus saving 5 arguments).
You probably want to have these values accessible everywhere, so you will have to save this DendriticTree:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite, ...):
        self.main_apical_dendrite = main_apical_dendrite
If you want to have a default value too, you can have:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite=None, ...):
        self.main_apical_dendrite = main_apical_dendrite or DendriticTree()
This way you delegate the decision of what the default DendriticTree should be to the class dedicated to that matter, instead of keeping this logic in the higher-order class LayerV.
Finally, when you need to access the apical_bifibs you used to pass to LayerV you just access it via self.main_apical_dendrite.bifibs.
In general, even if the class you are creating is not a clear composition of several classes, your goal is to find a logical way to split your parameters, not only to make your code cleaner but mostly to help people understand what these parameters will be used for. In the extreme cases where you can't split them, I think it's totally OK to have a class with that many parameters. If there is no clear way to split the arguments, you'll probably end up with something even less clear than a list of 15 arguments.
If you feel that creating a class to group parameters together is overkill, you can simply use collections.namedtuple, which can have default values.
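For illustration, a namedtuple with defaults might look like this (the defaults argument needs Python 3.7 or later). DendriteParams is a hypothetical name; the field names and values are borrowed from the question's DendriticTree call:

```python
from collections import namedtuple

# `defaults` is matched against the rightmost fields; here every field gets one.
DendriteParams = namedtuple(
    "DendriteParams",
    ["bifibs", "first_sec_L", "first_sec_d", "branch_prob"],
    defaults=[10, 200, 9, 1.0],
)

apical = DendriteParams()
# _replace returns a new tuple with the given fields changed.
oblique = apical._replace(bifibs=3, first_sec_L=40, first_sec_d=1.5)
```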

I want to reiterate what a number of people have said: there's nothing wrong with that number of parameters, especially when it comes to scientific computing/programming.
Take, for example, sklearn's KMeans++ clustering implementation, which has 11 parameters you can init with. There are numerous examples like that, and nothing is wrong with them.

I would say there is nothing wrong with it, as long as you are sure you need those params. If you really want to make it more readable, I would recommend the following style.
I wouldn't call it a best practice; it just makes it easy for others to see what is required for this object and what is optional.
class LayerV(__Cell):
    # author: {name, url} who made this info
    def __init__(self, no_default_params, some_necessary_params):
        self.necessary_param = some_necessary_params
        self.no_default_param = no_default_params
        self.something_else = "default"
        self.some_option = "default"
    def b_option(self, value):
        self.some_option = value
        return self
    def b_else(self, value):
        self.something_else = value
        return self
I think the benefits of this style are:
You can easily see in the __init__ method which params are required.
Unlike with a setter, you don't need two lines to construct the object if you need to set an optional value.
The disadvantage is that you end up with more methods in your class than before.
Sample:
la = LayerV("no_default", "necessary").b_else("sample_else")
After all, if you have a lot of "necessary" and "no_default" params, always ask yourself whether this class (or method) is doing too many things.
If the answer is no, just go ahead.

Related

Avoiding global variables but also too many function arguments (Python)

Let's say I have a python module that has a lot of functions that rely on each other, processing each others results. There's lots of cohesion.
That means I'll be passing back and forth a lot of arguments. Either that, or I'd be using global variables.
What are best practices to deal with such a situation? Things that come to mind would be replacing those parameters with dictionaries. But I don't necessarily like how that changes the function signature to something less expressive. Or I can wrap everything into a class. But that feels like I'm cheating and using "pseudo"-global variables?
I'm asking specifically for how to deal with this in Python but I understand that many of those things would apply to other languages as well.
I don't have a specific code example right now; it's just something that came to mind when I was thinking about this issue.
Examples could be: You have a function that calculates something. In the process, a lot of auxiliary stuff is calculated. Your processing routines need access to this auxiliary stuff, and you don't want to just re-compute it.
This is a very generic question so it is hard to be specific. What you seem to be describing is a bunch of inter-related functions that share data. That pattern is usually implemented as an Object.
Instead of a bunch of functions, create a class with a lot of methods. For the common data, use attributes. Set the attributes, then call the methods. The methods can refer to the attributes without them being explicitly passed as parameters.
As RobertB said, an object seems the clearest way. Could be as simple as:
import math
class myInfo:
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y
        self.messWithDist()  # sets self.dist
    def messWithDist(self):
        self.dist = math.sqrt(self.x*self.x + self.y*self.y)
blob = myInfo(3,4)
blob.messWithDist()
print(blob.dist)
blob.x = 5
blob.y = 7
blob.messWithDist()
print(blob.dist)
If some of the functions shouldn't really be part of such an object, you can just define them as (non-member, independent) functions, and pass the blob as one parameter. For example, by un-indenting the def of messWithDist, then calling as messWithDist(blob) instead of blob.messWithDist().

A idiom or design pattern for class template?

My code base is in Python. Let's say I have a fairly generic class called Report. It takes a large number of parameters
class Report(object):
    def __init__(self, title, data_source, columns, format, ...many more...):
There are many, many instantiations of Report, and they are not entirely unrelated. Many reports share a similar set of parameters and differ only in minor ways, such as having the same data_source and columns but a different title.
Rather than duplicating the parameters, I would like some programming construct that makes this structure easier to express. I'm trying to identify an idiom or design pattern for this.
If a subcategory of report needs some extra processing code, a subclass seems to be a good choice. Say we have a subcategory ExpenseReport.
class ExpenseReport(Report):
    def __init__(self, title, ... a small number of parameters ...):
        # some parameters are fixed, while others are specific to this instance
        super(ExpenseReport,self).__init__(
            title,
            EXPENSE_DATA_SOURCE,
            EXPENSE_COLUMNS,
            EXPENSE_FORMAT,
            ... a small number of parameters...)
    def processing(self):
        ... extra processing specific to ExpenseReport ...
But in a lot of cases, the subcategory merely fixes some parameters without any extra processing. That could easily be done with a partial function.
ExpenseReport = functools.partial(Report,
    data_source = EXPENSE_DATA_SOURCE,
    columns = EXPENSE_COLUMNS,
    format = EXPENSE_FORMAT,
)
And in some cases there isn't any difference at all. We simply need 2 copies of the same object to be used in different environments, such as being embedded in different pages.
expense_report = Report("Total Expense", EXPENSE_DATA_SOURCE, ...)
page1.add(expense_report)
...
page2.add(clone(expense_report))
In my code base, an ugly technique is used. Because we need 2 separate instances, one for each page, and because we don't want to duplicate the code with the long list of parameters that creates the report, we just clone (deepcopy in Python) the report for page 2. Not only is the need for cloning not apparent, but neglecting to clone the object and sharing one instance instead creates a lot of hidden problems and subtle bugs in our system.
Is there any guidance for this situation? Subclass, partial function or some other idiom? I want this construct to be light and transparent. I'm slightly wary of subclassing because it is likely to result in a jungle of subclasses, and it tempts programmers to add special processing code like what I have in ExpenseReport. If there is such a need, I would rather analyze the code to see if it can be generalized and pushed up to the Report layer, so that Report becomes more expressive without needing special processing in lower layers.
Additional Info
We do use keyword parameters. The problem is more in how to manage and organize the instantiations. We have a large number of instantiations with common patterns:
expense_report = Report("Expense", data_source=EXPENSE, ..other common pattern..)
expense_report_usd = Report("USD Expense", data_source=EXPENSE, format=USD, ..other common pattern..)
expense_report_euro = Report("Euro Expense", data_source=EXPENSE, format=EURO, ..other common pattern..)
...
lot more reports
...
page1.add(expense_report_usd)
page2.add(expense_report_usd) # oops, page1 and page2 shared the same instance?!
...
lots of pages
...
Why don't you just use keyword arguments and collect them all into a dict:
class Report(object):
    def __init__(self, **params):
        self.params = params
        ...
I see no reason why you shouldn't just use a partial function.
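A runnable sketch of the partial-function idea, assuming a cut-down Report with three parameters. The EXPENSE value and the name ExpenseReportUSD are placeholders, not from the question's code base:

```python
import functools


class Report:
    """Hypothetical, cut-down Report with three parameters."""

    def __init__(self, title, data_source, format="plain"):
        self.title = title
        self.data_source = data_source
        self.format = format


# Placeholder value standing in for the question's EXPENSE constant.
EXPENSE = "expense-db"

# The partial fixes the shared keyword arguments once.
ExpenseReportUSD = functools.partial(Report, data_source=EXPENSE, format="USD")

# Each call builds a brand-new Report, so the pages do not share an instance.
page1_report = ExpenseReportUSD("USD Expense")
page2_report = ExpenseReportUSD("USD Expense")
```

Because the partial is called once per page, this also sidesteps the shared-instance problem that the deepcopy workaround was meant to solve.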
If your main problem is common arguments in class constructors, a possible solution is to write something like:
common_arguments = dict(arg=value, another_arg=another_value, ...)
expense_report = Report("Expense", data_source=EXPENSE, **common_arguments)
args_for_shared_usd_instance = dict(title="USD Expense", data_source=EXPENSE, format=USD)
args_for_shared_usd_instance.update(common_arguments)
expense_report_usd = Report(**args_for_shared_usd_instance)
page1.add(Report(**args_for_shared_usd_instance))
page2.add(Report(**args_for_shared_usd_instance))
Better naming could make this more convenient. Maybe there is a better design solution.
I found some information myself.
I. curry -- associating parameters with a function « Python recipes « ActiveState Code
http://code.activestate.com/recipes/52549-curry-associating-parameters-with-a-function/
See the entire discussion. Nick Perkins' comment on 'Lightweight' subclasses is similar to what I've described.
II. PEP 309 -- Partial Function Application
http://www.python.org/dev/peps/pep-0309/
The question is quite old, but this might still help someone who stumbles onto it...
I made a small library called classical to simplify class inheritance cases like this (Python 3 only).
Simple example:
from classical.descriptors import ArgumentedSubclass
class CustomReport(Report):
    Expense = ArgumentedSubclass(data_source=EXPENSE, **OTHER_EXPENSE_KWARGS)
    Usd = ArgumentedSubclass(format=USD)
    Euro = ArgumentedSubclass(format=EURO)
    PatternN = ArgumentedSubclass(**PATTERN_N_KWARGS)
    PatternM = ArgumentedSubclass(**PATTERN_M_KWARGS)
# Now you can chain these in any combination (and with additional arguments):
my_report_1 = CustomReport.Expense.Usd(**kwargs)
my_report_2 = CustomReport.Expense.Euro(**kwargs)
my_report_3 = CustomReport.Expense.PatternM.PatternN(**kwargs)
In this example it's not really necessary to separate Report and CustomReport classes, but might be a good idea to keep the original class "clean".
Hope this helps :)

Should I extract values from Python dictionaries into object attributes?

I have a Python class that is initialized with a dictionary of settings, like this:
def __init__(self, settings):
    self._settings = settings
The settings dictionary contains 50-100 different parameters that are used quite a lot in other methods:
def MakeTea(self):
    tea = Tea()
    if self._settings['use_sugar']:
        tea.sugar_spoons = self._settings['spoons_of_sugar']
    return tea
What I want to know is whether it makes sense to preload all the params into instance attributes like this:
def __init__(self, settings):
    self._use_sugar = settings['use_sugar']
    self._spoons_of_sugar = settings['spoons_of_sugar']
and use these attributes instead of looking up dictionary values every time I need them:
def MakeTea(self):
    tea = Tea()
    if self._use_sugar:
        tea.sugar_spoons = self._spoons_of_sugar
    return tea
Now, I am fairly new to Python and have worked mostly with compiled languages, where it really is a no-brainer: access to instance fields will be much faster than looking up values in any kind of hashtable-based structure. However, with Python being interpreted, I'm not sure that I'll see any significant performance gain, because at the moment I have almost no knowledge of how the Python interpreter works. For all I know, using an attribute name in code may itself involve some internal dictionaries of identifiers in the interpreted environment, so I would gain nothing.
So, the question: are there any significant performance benefits in extracting values from dictionary and putting them in instance attributes? Are there any other benefits or downsides of doing it? What's the good practice?
I strongly believe that this is an engineering decision rather than premature optimization. Also, I'm just curious and trying to write decent Python code, so the question seems valid to me whether I actually need those milliseconds or not.
You're comparing attribute access (self.setting) with attribute access (self.settings) plus a dictionary lookup (settings['setting']). Instance attributes are normally stored in a dictionary (the instance's __dict__), so the problem reduces to two dictionary lookups vs. one. One lookup will be faster.
A simpler and faster way to copy an initialization dict than the one in the other answer is:
class Foobar(object):
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)
However, I wouldn't do this for optimization purposes. It's both premature optimization (you don't know that you have a speed problem, or what your bottleneck is) and a micro-optimization (making an O(n^2) algorithm O(n) will make more of a difference than removing an O(1) dictionary lookup from the original algorithm).
If somewhere, you're accessing one of these settings many, many times, just create a local reference to it, rather than polluting the namespace of Foobar instances with tons of settings.
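As a rough illustration of the local-reference idea: the class below keeps the settings dict as in the question, while the total_sugar method and its loop are invented purely for the example.

```python
class Foobar:
    def __init__(self, settings):
        self._settings = settings

    def total_sugar(self, cups):
        # Look the setting up once, then reuse the local name inside the loop.
        spoons = self._settings["spoons_of_sugar"]
        total = 0
        for _ in range(cups):
            total += spoons
        return total


foo = Foobar({"spoons_of_sugar": 2})
print(foo.total_sugar(3))
```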
These are two reasonable designs to consider, but you shouldn't choose one or the other for performance reasons. Instead of either one, I would probably create another object:
class Settings(object):
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)
class Foobar(object):
    def __init__(self, init_dict):
        self.settings = Settings(init_dict)
just because I think self.settings.setting is nicer than self.settings['setting'] and it still keeps things organized.
This is a good use for a collections.namedtuple, if you know in advance what all the setting names are.
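A short sketch of that, using the two setting names from the question and assuming the full list of names is known up front:

```python
from collections import namedtuple

# Field names must be fixed in advance; these two come from the question.
Settings = namedtuple("Settings", ["use_sugar", "spoons_of_sugar"])

raw = {"use_sugar": True, "spoons_of_sugar": 2}
settings = Settings(**raw)  # raises TypeError on a missing or unknown key

if settings.use_sugar:
    print(settings.spoons_of_sugar)
```

The result is immutable, and a misspelled setting name fails at construction or access time instead of surfacing later as a KeyError.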
If you put them into the instance attributes then you'll be looking up your instance dictionary... so in the end you're just gonna be doing the same thing. So no real performance gain or loss.
Example:
>>> class Foobar(object):
...     def __init__(self, init_dict):
...         for arg in init_dict:
...             self.__setattr__(arg, init_dict[arg])
...
>>> foo = Foobar({'foobar': 'barfoo', 'shroobniz': 'foo'})
>>> print(foo.__dict__)
{'foobar': 'barfoo', 'shroobniz': 'foo'}
So if python looks up foo.__dict__ or foo._settings doesn't really make a difference.

Methods which return values vs methods which directly set attributes in Python

Which of the following classes would demonstrate the best way to set an instance attribute? Should they be used interchangeably based on the situation?
class Eggs(object):
    def __init__(self):
        self.load_spam()
    def load_spam(self):
        # Lots of code here
        self.spam = 5
or
class Eggs(object):
    def __init__(self):
        self.spam = self.load_spam()
    def load_spam(self):
        # Lots of code here
        return 5
I would prefer the second method.
Here's why:
Procedures with side effects tend to introduce temporal coupling. Simply put, changing the order in which you execute these procedures might break your code. Returning values and passing them to other methods in need of them makes inter-method communication explicit and thus easier to reason about and hard to forget/get in the wrong order.
Also returning a value makes it easier to test your method. With a return value, you can treat the enclosing object as a black box and ignore the internals of the object, which is generally a good thing. It makes your test code more robust.
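To make the testing point concrete, here is a minimal sketch based on the second Eggs variant. The test function name is made up, and a plain assert stands in for whatever test framework you use:

```python
class Eggs:
    def __init__(self):
        self.spam = self.load_spam()

    def load_spam(self):
        # Stand-in for the "lots of code" in the question.
        return 5


def test_load_spam():
    # No need to inspect instance state: the result is the return value.
    eggs = Eggs()
    assert eggs.load_spam() == 5


test_load_spam()
```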
I would indeed choose depending on the situation. If in doubt, I would choose the second version, because it's more explicit and load_spam has no (or at least fewer) side effects. Fewer side effects usually lead to code which is easier to maintain and easier to understand. As you know, there's no rule without exception. But that's how I would approach the problem.
If you are setting instance attributes the first method is more Pythonic. If you are calculating intermediate results then function calls are fine. Note that the second method is not only not Pythonic, it's misleading -- it's called load_spam, but it doesn't!

Have well-defined, narrowly-focused classes ... now how do I get anything done in my program?

I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[(Ad,Ac),(Ad,Ah),(Ad,As),(Ac,Ah),(Ac,As),(Ah,As)]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translator), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polymorphism come into play; but, there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a singleton (or Borg) design pattern, writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or rather whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes after due consideration you'll unearth some different reason for preferring a class, and that's all right too, but the point is, don't just pick a class over a module w/functions "by reflex", without critically thinking about it!).
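As a sketch of what such module-level "verbs" could look like for the poker example: the function names and the toy data below are hypothetical, and the evaluator is passed in as a plain callable rather than being the question's Evaluator class.

```python
def translate_hands(hands, card_values):
    """Map each (card, card) tuple to a tuple of numeric values."""
    return [tuple(card_values[card] for card in hand) for hand in hands]


def evaluate(hand_range, card_values, get_equities):
    """Run the whole pipeline; `get_equities` is any callable evaluator."""
    return get_equities(translate_hands(hand_range, card_values))


# Hypothetical data: two hands and a toy evaluator that just counts them.
values = {"As": 52, "Ad": 51, "Ah": 50}
hands = [("Ad", "Ah"), ("Ad", "As")]
result = evaluate(hands, values, get_equities=len)
```

A shell UI and a GUI can both call evaluate directly, so no coordinating class is needed until some shared state actually shows up.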
You could create a Poker class that ties these all together and initialize all of that stuff in the __init__() method:
class Poker(object):
    def __init__(self, user_input=HandRange()):
        self.user_input = user_input
        self.translation = Translation.translateList(user_input)
        # pass the translated list on (the original referenced an undefined x)
        self.evaluator = Evaluator.getEquities(self.translation)
        # and so on...
p = Poker()
# etc, etc...
