Python officially recognizes namespaces as a "honking great idea" that we should "do more of". One nice thing about namespaces is their hierarchical presentation that organizes code into related parts. Is there an elegant way to organize python class methods into related parts, much as hierarchical namespaces are organized — especially for the purposes of tab-completion?
Some of my python classes cannot be split up into smaller classes, but have many methods attached to them (easily over a hundred). I also find (and my code's users tell me) that the easiest way to find useful methods is to use tab-completion. But with so many methods, this becomes unwieldy, as an enormous list of options is presented — and usually organized alphabetically, which means that closely related methods may be located in completely different parts of this massive list.
Typically, there are very distinct groups of closely related methods. For example, I have one class in which almost all of the methods fall into one of four groups:
io
statistics
transformations
symmetries
And the io group might have read and write subgroups, where there are different options for the file type to read or write, and then some additional methods involved in looking at the metadata for example. To a small extent, I can address this problem using underscores in my method names. For example, I might have methods like
myobject.io_read_from_csv
myobject.io_write_to_csv
This helps with the classification, but is ugly and still leads to unwieldy tab-completion lists. I would prefer it if the first tab-completion list just had the four options listed above, then when one of those options is selected, additional options would be presented with the next tab.
For a slightly more concrete example, here's a partial list of the hierarchy that I have in mind for my class:
myobject.io
myobject.io.read
myobject.io.read.csv
myobject.io.read.h5
myobject.io.read.npy
myobject.io.write
myobject.io.write.csv
myobject.io.write.h5
myobject.io.write.npy
myobject.io.parameters
myobject.io.parameters.from_csv_header
myobject.io.parameters.from_h5_attributes
...
...
myobject.statistics
myobject.statistics.max
myobject.statistics.max_time
myobject.statistics.norm
...
myobject.transformations
myobject.transformations.rotation
myobject.transformations.boost
myobject.transformations.spatial_translation
myobject.transformations.time_translation
myobject.transformations.supertranslation
...
myobject.symmetries
myobject.symmetries.parity
myobject.symmetries.parity.conjugate
myobject.symmetries.parity.symmetric_part
myobject.symmetries.parity.antisymmetric_part
myobject.symmetries.parity.violation
myobject.symmetries.parity.violation_normalized
myobject.symmetries.xreflection
myobject.symmetries.xreflection.conjugate
myobject.symmetries.xreflection.symmetric_part
...
...
...
One way I can imagine solving this problem is to create classes like IO, Statistics, etc., within my main MyClass class whose sole purpose is to store a reference to myobject and provide the methods that it needs. The main class would then have @property methods that just return the instances of those lower-level classes, for which tab-completion should then work. Does this make sense? Would it work at all to provide tab-completion in ipython, for example? Would this lead to circular-reference problems? Is there a better way?
It looks like my naive suggestion of defining classes within the class does indeed work with ipython's tab-completion and without any circularity problems.
Here's the proof-of-concept code:
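class A(object):
    class _B(object):
        def __init__(self, a):
            # Keep a reference to the owning A instance.
            self._owner = a
        def calculate(self, y):
            return y * self._owner.x
    def __init__(self, x):
        self.x = x
        # _B is a class attribute of A, so it is reached through self (or A).
        self._b = self._B(self)
    @property
    def b(self):
        return self._b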
(In fact, it would be even simpler if I used self.b = self._B(self), and I could skip the property, but I like this because it impedes overwriting b from outside the class. Plus this shows that this more complicated case still works.)
So if I create a = A(1.2), for example, I can hit a.<TAB> and get b as the completion, then a.b.<TAB> suggests calculate as the completion. I haven't run into any problems with this structure in my brief tests so far, and the changes to my code aren't very big: just adding ._owner into a lot of the methods' code.
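As a rough sketch of how the same idea might extend to the deeper hierarchy from the question (myobject.io.read.csv, myobject.statistics.max), the helper classes can themselves hold further helper objects. The names _Namespace, _Read, _IO and _Statistics, and the toy CSV parsing, are assumptions made for illustration and are not part of the original code.

```python
class _Namespace(object):
    """Groups related methods and keeps a reference to the owning object."""
    def __init__(self, owner):
        self._owner = owner


class _Read(_Namespace):
    def csv(self, text):
        # Toy parser: stores comma-separated numbers on the owner.
        self._owner.data = [float(value) for value in text.split(",")]
        return self._owner


class _IO(_Namespace):
    def __init__(self, owner):
        super(_IO, self).__init__(owner)
        # Second level of the hierarchy: myobject.io.read
        self.read = _Read(owner)


class _Statistics(_Namespace):
    def max(self):
        return max(self._owner.data)


class MyClass(object):
    def __init__(self):
        self.data = []
        self._io = _IO(self)
        self._statistics = _Statistics(self)

    @property
    def io(self):
        return self._io

    @property
    def statistics(self):
        return self._statistics


myobject = MyClass()
myobject.io.read.csv("1.5,4.0,2.5")
print(myobject.statistics.max())  # 4.0
# Public names a completer would see at the first level:
print([name for name in dir(myobject) if not name.startswith("_")])
```

The owner and its helper objects do refer to each other, but CPython's cyclic garbage collector can reclaim such cycles; if that ever became a concern, storing a weakref.ref to the owner would be one alternative.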
So, I am implementing a data tree in Python that represents mathematical expressions, much like this binary expression tree.
Each node represents an operation (+, *, exp(), ...) and each leaf represents a number or variable. Therefore I created a module Expression.py, that contains a parent class Node and child classes for mathematical operations.
Now, as the project becomes more complex, I am implementing more and more node types to cover more operations, and each child class of Node is starting to have quite a lot of methods for tasks such as term simplification.
So far, I have implemented all of these child nodes in the Expression.py file. But it is by now a 500+ line file and I am not even done yet. I tried to split it up by putting each child class into its own file (Java style) and merging all of them into one package, which would match my understanding of correct structure. But this implementation is giving me problems, as the different modules like Addition.py and Multiplication.py still reference each other. E.g. an Addition object's method might return a Multiplication object and vice versa.
My question is: How do you structure such a project? How to structure many related child classes that reference each other besides putting them in a single huge file?
If I arrange them in a package, how would I import them properly? And how would I reference them properly?
Edit:
Ok, let me be more specific, this is some sample code:
class Node():
    def __init__(self):
        pass
    def derive(self):
        pass
class Sine(Node):
    def __init__(self, arg):
        self.arg = arg
    def derive(self):
        return Cosine(self.arg)
class Cosine(Node):
    def __init__(self, arg):
        self.arg = arg
    def derive(self):
        return Multiplication(Num(-1), Sine(self.arg))
class Multiplication:
    ...
As you see, the classes Sine and Cosine have a circular dependency that I cannot (knowingly) split into two separate files. Still, I do not want to put thousands of lines of child classes into one file. This is only sample code; the actual classes consist of many more lines.
As you don't show any code, it is only possible to give a general answer. First of all, between putting everything in one file and each class in a single file, you have the option to put groups of classes in modules. I doubt that all your classes depend on each other. So you need to get a better understanding of the structure of your dependencies. If you understand the structure, Python will offer you the tools to express this structure in code. Mixins, decorators, meta classes, ... are powerful tools to express complex structures without violating DRY. So my advice would be: Try to understand your structure better!
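One common way to split mutually dependent classes across modules is to defer one side of the import until the method is called. The sketch below writes a small throwaway package to a temporary directory so that it runs as a single script; the package name exprdemo, the module names, and the Negate class (standing in for Multiplication(Num(-1), ...)) are all assumptions made for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package "exprdemo"; each entry is the source of one module.
SOURCES = {
    "__init__.py": "",
    "base.py": (
        "class Node(object):\n"
        "    def __init__(self, arg):\n"
        "        self.arg = arg\n"
        "\n"
        "class Negate(Node):\n"
        "    pass\n"
    ),
    "sine.py": (
        "from .base import Node\n"
        "\n"
        "class Sine(Node):\n"
        "    def derive(self):\n"
        "        # Deferred import: resolved when derive() is called,\n"
        "        # not when sine.py is loaded.\n"
        "        from .cosine import Cosine\n"
        "        return Cosine(self.arg)\n"
    ),
    "cosine.py": (
        "from .base import Node, Negate\n"
        "from .sine import Sine  # safe: sine.py has no top-level import of cosine\n"
        "\n"
        "class Cosine(Node):\n"
        "    def derive(self):\n"
        "        return Negate(Sine(self.arg))\n"
    ),
}

# Write the package to a temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
package_dir = root / "exprdemo"
package_dir.mkdir()
for filename, text in SOURCES.items():
    (package_dir / filename).write_text(text)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

sine = importlib.import_module("exprdemo.sine")
cosine = importlib.import_module("exprdemo.cosine")

d1 = sine.Sine("x").derive()   # a Cosine instance
d2 = d1.derive()               # a Negate wrapping a Sine
print(type(d1).__name__, type(d2).__name__, type(d2.arg).__name__)
```

Only one direction of the cycle needs the deferred import; the other module can keep its import at the top of the file.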
My code base is in Python. Let's say I have a fairly generic class called Report. It takes a large number of parameters
class Report(object):
    def __init__(self, title, data_source, columns, format, ...many more...):
There are many, many instantiations of Report, and they are not entirely unrelated. Many reports share a similar set of parameters and differ only in minor ways, such as having the same data_source and columns but a different title.
Rather than duplicating the parameters, some programming construct could be applied to make expressing this structure easier. I'm looking for help identifying an idiom or design pattern for this.
If a subcategory of report needs some extra processing code, a subclass seems to be a good choice. Say we have a subcategory called ExpenseReport.
class ExpenseReport(Report):
    def __init__(self, title, ... a small number of parameters ...):
        # some parameters are fixed, while others are specific to this instance
        super(ExpenseReport, self).__init__(
            title,
            EXPENSE_DATA_SOURCE,
            EXPENSE_COLUMNS,
            EXPENSE_FORMAT,
            ... a small number of parameters...)
    def processing(self):
        ... extra processing specific to ExpenseReport ...
But in a lot of cases, the subcategory merely fixes some parameters without any extra processing. That could easily be done with a partial function.
ExpenseReport = functools.partial(Report,
    data_source=EXPENSE_DATA_SOURCE,
    columns=EXPENSE_COLUMNS,
    format=EXPENSE_FORMAT,
)
And in some cases, there isn't even any difference. We simply need two copies of the same object to be used in different environments, such as being embedded in different pages.
expense_report = Report("Total Expense", EXPENSE_DATA_SOURCE, ...)
page1.add(expense_report)
...
page2.add(clone(expense_report))
In my code base, an ugly technique is used. Because we need two separate instances, one for each page, and because we don't want to duplicate the long parameter list that creates the report, we just clone (deepcopy in Python) the report for page 2. Not only is the need for cloning not apparent, but neglecting to clone the object and sharing one instance instead creates a lot of hidden problems and subtle bugs in our system.
Is there any guidance for this situation? Subclass, partial function or some other idiom? My desire is for this construct to be light and transparent. I'm slightly wary of subclassing because it is likely to result in a jungle of subclasses, and it induces programmers to add special processing code like what I have in ExpenseReport. If there is such a need, I would rather analyze the code to see whether it can be generalized and pushed to the Report layer, so that Report becomes more expressive without needing special processing in lower layers.
Additional Info
We do use keyword parameters. The problem is more in how to manage and organize the instantiations. We have a large number of instantiations with common patterns:
expense_report = Report("Expense", data_source=EXPENSE, ..other common pattern..)
expense_report_usd = Report("USD Expense", data_source=EXPENSE, format=USD, ..other common pattern..)
expense_report_euro = Report("Euro Expense", data_source=EXPENSE, format=EURO, ..other common pattern..)
...
lot more reports
...
page1.add(expense_report_usd)
page2.add(expense_report_usd) # oops, page1 and page2 shared the same instance?!
...
lots of pages
...
Why don't you just use keyword arguments and collect them all into a dict:
class Report(object):
    def __init__(self, **params):
        self.params = params
        ...
I see no reason why you shouldn't just use a partial function.
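A minimal sketch of the partial-function approach, assuming a cut-down Report with only four parameters and placeholder constants (the real values are not shown in the question). Because each partial is a factory rather than an instance, every page gets its own Report without any cloning.

```python
import functools


class Report(object):
    # Stand-in for the Report class in the question; only four parameters are modeled.
    def __init__(self, title, data_source=None, columns=None, format=None):
        self.title = title
        self.data_source = data_source
        self.columns = columns
        self.format = format


# Placeholder values; the real constants are not shown in the question.
EXPENSE_DATA_SOURCE = "expense_db"
EXPENSE_COLUMNS = ["date", "amount"]

# Each partial is a factory, not a Report; calling it builds a new instance.
ExpenseReport = functools.partial(
    Report, data_source=EXPENSE_DATA_SOURCE, columns=EXPENSE_COLUMNS)
UsdExpenseReport = functools.partial(ExpenseReport, "USD Expense", format="USD")

page1_report = UsdExpenseReport()
page2_report = UsdExpenseReport()
print(page1_report is page2_report)             # False
print(page2_report.title, page2_report.format)  # USD Expense USD
```

One caveat: the objects bound into the partial (such as the columns list here) are still shared by every report it creates, so this only avoids the shared-state bugs if reports do not mutate their arguments.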
If your main problem is the common arguments in class constructors, a possible solution is to write something like:
common_arguments = dict(arg=value, another_arg=another_value, ...)
expense_report = Report("Expense", data_source=EXPENSE, **common_arguments)
args_for_shared_usd_instance = dict(title="USD Expense", data_source=EXPENSE, format=USD)
args_for_shared_usd_instance.update(common_arguments)
expense_report_usd = Report(**args_for_shared_usd_instance)
page1.add(Report(**args_for_shared_usd_instance))
page2.add(Report(**args_for_shared_usd_instance))
Better naming can make this more convenient. There may also be a better design solution.
I found some information myself.
I. curry -- associating parameters with a function « Python recipes « ActiveState Code
http://code.activestate.com/recipes/52549-curry-associating-parameters-with-a-function/
See the entire discussion. Nick Perkins' comment on 'Lightweight' subclasses is similar to what I've described.
II. PEP 309 -- Partial Function Application
http://www.python.org/dev/peps/pep-0309/
The question is quite old, but this might still help someone who stumbles onto it...
I made a small library called classical to simplify class inheritance cases like this (Python 3 only).
Simple example:
from classical.descriptors import ArgumentedSubclass
class CustomReport(Report):
    Expense = ArgumentedSubclass(data_source=EXPENSE, **OTHER_EXPENSE_KWARGS)
    Usd = ArgumentedSubclass(format=USD)
    Euro = ArgumentedSubclass(format=EURO)
    PatternN = ArgumentedSubclass(**PATTERN_N_KWARGS)
    PatternM = ArgumentedSubclass(**PATTERN_M_KWARGS)
# Now you can chain these in any combination (and with additional arguments):
my_report_1 = CustomReport.Expense.Usd(**kwargs)
my_report_2 = CustomReport.Expense.Euro(**kwargs)
my_report_3 = CustomReport.Expense.PatternM.PatternN(**kwargs)
In this example it's not really necessary to separate Report and CustomReport classes, but might be a good idea to keep the original class "clean".
Hope this helps :)
I have a Python class that is initialized with a dictionary of settings, like this:
def __init__(self, settings):
    self._settings = settings
Settings dictionary contains 50-100 different parameters that are used quite a lot in other methods:
def MakeTea(self):
    tea = Tea()
    if self._settings['use_sugar']:
        tea.sugar_spoons = self._settings['spoons_of_sugar']
    return tea
What I want to know is whether it makes sense to preload all the params into instance attributes like this:
def __init__(self, settings):
    self._use_sugar = settings['use_sugar']
    self._spoons_of_sugar = settings['spoons_of_sugar']
and use these attributes instead of looking up dictionary values every time I need them:
def MakeTea(self):
    tea = Tea()
    if self._use_sugar:
        tea.sugar_spoons = self._spoons_of_sugar
    return tea
Now, I am fairly new to Python and I have worked mostly with compiled languages, where it really is a no-brainer: access to instance fields will be much faster than looking up values from any kind of hashtable-based structure. However, with Python being interpreted and all, I'm not sure that I'll see any significant performance gain, because at the moment I have almost no knowledge of how the Python interpreter works. For all I know, using an attribute name in code may involve some internal dictionaries of identifiers in the interpreted environment, so I gain nothing.
So, the question: are there any significant performance benefits in extracting values from dictionary and putting them in instance attributes? Are there any other benefits or downsides of doing it? What's the good practice?
I strongly believe that this is an engineering decision rather than premature optimization. Also, I'm just curious and trying to write decent Python code, so the question seems valid to me whether I actually need those milliseconds or not.
You're comparing attribute access (self.setting) with attribute access (self.settings) plus a dictionary lookup (settings['setting']). Classes are actually implemented as dictionaries, so the problem reduces to two dictionary lookups vs. one. One lookup will be faster.
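For anyone who wants to measure the two access patterns rather than reason about them, a small timeit sketch along these lines could be used. The class names WithDict and WithAttributes are made up for the comparison, and the absolute numbers will vary by machine and Python version.

```python
import timeit


class WithDict(object):
    def __init__(self, settings):
        self._settings = settings


class WithAttributes(object):
    def __init__(self, settings):
        self._use_sugar = settings["use_sugar"]


settings = {"use_sugar": True, "spoons_of_sugar": 2}
with_dict = WithDict(settings)
with_attributes = WithAttributes(settings)

# Time many repetitions of each access pattern.
dict_time = timeit.timeit(lambda: with_dict._settings["use_sugar"], number=200000)
attr_time = timeit.timeit(lambda: with_attributes._use_sugar, number=200000)

print("attribute + dict lookup: %.4f s" % dict_time)
print("attribute only:          %.4f s" % attr_time)
```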
A simpler and faster way to copy an initialization dict than the one in the other answer is:
class Foobar(object):
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)
However, I wouldn't do this for optimization purposes. It's both premature optimization (you don't know that you have a speed problem, or what your bottleneck is) and a micro-optimization (making an O(n^2) algorithm O(n) will make more of a difference than removing an O(1) dictionary lookup from the original algorithm).
If somewhere, you're accessing one of these settings many, many times, just create a local reference to it, rather than polluting the namespace of Foobar instances with tons of settings.
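The local-reference idea might look like the following sketch, where TeaMaker and total_sugar are hypothetical names used only to show the pattern.

```python
class TeaMaker(object):
    def __init__(self, settings):
        self._settings = settings

    def total_sugar(self, cups):
        # Look the setting up once, then reuse the local name inside the loop.
        spoons = self._settings["spoons_of_sugar"]
        total = 0
        for _ in range(cups):
            total += spoons
        return total


maker = TeaMaker({"use_sugar": True, "spoons_of_sugar": 2})
print(maker.total_sugar(3))  # 6
```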
These are two reasonable designs to consider, but you shouldn't choose one or the other for performance reasons. Instead of either one, I would probably create another object:
class Settings(object):
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)
class Foobar(object):
    def __init__(self, init_dict):
        self.settings = Settings(init_dict)
just because I think self.settings.setting is nicer than self.settings['setting'] and it still keeps things organized.
This is a good use for a collections.namedtuple, if you know in advance what all the setting names are.
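A short sketch of the namedtuple variant, using the two setting names from the question and assuming the full list of names is known in advance:

```python
from collections import namedtuple

# Field names must be known up front; these two come from the question.
Settings = namedtuple("Settings", ["use_sugar", "spoons_of_sugar"])


class Foobar(object):
    def __init__(self, init_dict):
        # Raises TypeError if a key is missing or unexpected.
        self.settings = Settings(**init_dict)


foo = Foobar({"use_sugar": True, "spoons_of_sugar": 2})
print(foo.settings.spoons_of_sugar)  # 2

try:
    foo.settings.use_sugar = False
except AttributeError:
    print("settings are read-only")
```

A side effect of this choice is that the settings become immutable and misspelled keys fail at construction time rather than at first use.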
If you put them into the instance attributes then you'll be looking up your instance dictionary... so in the end you're just gonna be doing the same thing. So no real performance gain or loss.
Example:
>>> class Foobar(object):
...     def __init__(self, init_dict):
...         for arg in init_dict:
...             self.__setattr__(arg, init_dict[arg])
...
>>> foo = Foobar({'foobar': 'barfoo', 'shroobniz': 'foo'})
>>> print(foo.__dict__)
{'foobar': 'barfoo', 'shroobniz': 'foo'}
So whether Python looks up foo.__dict__ or foo._settings doesn't really make a difference.
I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[(Ad,Ac),(Ad,Ah),(Ad,As),(Ac,Ah),(Ac,As),(Ah,As)]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translation), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polymorphism come into play; but, there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a singleton (or Borg) design pattern, writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or rather whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes after due consideration you'll unearth some different reason for preferring a class, and that's all right too, but the point is, don't just pick a class over a module w/functions "by reflex", without critically thinking about it!).
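As one possible reading of the "verbs as module-level functions" advice, the assembly line from the question could be wrapped in a single function that either a shell UI or a GUI calls. Everything below is a simplified stand-in: get_hands only handles pocket pairs, the translation codes other than 'As' and 'Ad' are invented for the demo, and len is used in place of a real equity calculator.

```python
from itertools import combinations

SUITS = "dchs"


def get_hands(hand_range):
    """Expand a pocket pair such as "AA" into its six suit combinations."""
    rank = hand_range[0]
    return [(rank + a, rank + b) for a, b in combinations(SUITS, 2)]


def translate(hands, translation):
    """Map each card name to the evaluator's numeric code."""
    return [tuple(translation[card] for card in hand) for hand in hands]


def evaluate_range(hand_range, translation, get_equities):
    """The 'verb': run the whole assembly line for one user input."""
    return get_equities(translate(get_hands(hand_range), translation))


# 'As' and 'Ad' follow the question; the other two codes are made up for the demo.
TRANSLATION = {"As": 52, "Ad": 51, "Ah": 50, "Ac": 49}

# len stands in for a real equity calculator, which is out of scope here.
print(evaluate_range("AA", TRANSLATION, len))  # 6
```

Both front ends would then import and call evaluate_range, and no encompassing class is needed unless state or multiple instances enter the picture.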
You could create a Poker class that ties these all together and initialize all of that stuff in the __init__() method:
class Poker(object):
    def __init__(self, user_input=HandRange()):
        self.user_input = user_input
        self.translation = Translation.translateList(user_input)
        # Feed the translated list to the evaluator.
        self.evaluator = Evaluator.getEquities(self.translation)
        # and so on...
p = Poker()
# etc, etc...
This question is in continuation to my previous question, in which I asked about passing around an ElementTree.
I need to read the XML files only and to solve this, I decided to create a global ElementTree and then parse it wherever required.
My question is:
Is this an acceptable practice? I have heard that global variables are bad. If I don't make it global, it was suggested that I make a class. But do I really need to create a class? What benefits would I get from that approach? Note that I would be handling only one ElementTree instance per run, and the operations are read-only. If I don't use a class, how and where do I declare that ElementTree so that it is available globally? (Note that I would be importing this module.)
Please answer this question in the respect that I am a beginner to development, and at this stage I can't figure out whether to use a class or just go with the functional style programming approach.
There are a few reasons that global variables are bad. First, it gets you in the habit of declaring global variables which is not good practice, though in some cases globals make sense -- PI, for instance. Globals also create problems when you on purpose or accidentally re-use the name locally. Or worse, when you think you're using the name locally but in reality you're assigning a new value to the global variable. This particular problem is language dependent, and python handles it differently in different cases.
class A:
    def __init__(self):
        self.name = 'hi'
x = 3
a = A()
def foo():
    a.name = 'Bedevere'  # mutates the global object a
    x = 9                # creates a new local x; the global x is untouched
foo()
print(x, a.name)  # outputs 3 Bedevere
The benefit of creating a class and passing an instance of it around is that you get defined, consistent behavior, especially since you should be calling the object's methods, which operate on the instance itself.
class Knights:
    def __init__(self, name='Bedevere'):
        self.name = name
    def knight(self):
        self.name = 'Sir ' + self.name
    def speak(self):
        print(self.name + ":", "Run away!")
class FerociousRabbit:
    def __init__(self):
        self.death = "awaits you with sharp pointy teeth!"
    def speak(self):
        print("Squeeeeeeee!")
def cave(thing):
    # Works for any object that has a speak() method.
    thing.speak()
    if isinstance(thing, Knights):
        thing.knight()
def scene():
    k = Knights()
    k2 = Knights('Launcelot')
    b = FerociousRabbit()
    for i in (b, k, k2):
        cave(i)
This example illustrates a few good principles. First, the strength of Python when calling functions: FerociousRabbit and Knights are two different classes, but they have the same method, speak(). In other languages, in order to do something like this, they would at least have to share the same base class. The reason you would want to do this is that it allows you to write a function (cave) that can operate on any class that has a speak() method. You could create any other class with that method and pass an instance of it to the cave function:
class Tim:
    def speak(self):
        print("Death awaits you with sharp pointy teeth!")
So in your case, when dealing with an ElementTree, say that sometime down the road you also need to start parsing an Apache log. Well, if you wrote a purely function-based program, you're basically hosed. You can modify and extend your current program, but if you wrote your functions well, you could just add a new class to the mix and (technically) everything will be peachy keen.
Pragmatically, is your code expected to grow? Even though people herald OOP as the right way, I found that sometimes it's better to weigh cost:benefit(s) whenever you refactor a piece of code. If you are looking to grow this, then OOP is a better option in that you can extend and customise any future use case, while saving yourself from unnecessary time wasted in code maintenance. Otherwise, if it ain't broken, don't fix it, IMHO.
I generally find myself regretting it when I give in to the temptation to give a module, for example, a load_file() method that sets a global that the module's other functions can then use to find the file they're supposed to be talking about. It makes testing far more difficult, for example, and as soon as I need two XML files there is a problem. Plus, every single function needs to check whether the file's there and give an error if it's not.
If I want to be functional, I simply therefore have every function take the XML file as an argument.
If I want to be object oriented, I'll have a MyXMLFile class whose methods can just look at self.xmlfile or whatever.
The two approaches are more or less equivalent when there's just one single thing, like a file, to be passed around; but when the number of things in the "state" becomes larger than a few, then I find classes simpler because I can stick all of those things in the class.
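To make the comparison concrete, here is a small sketch of both styles using the standard library's xml.etree.ElementTree. The sample document, the book_titles helper, and the attribute name self.root are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Inline sample document so the sketch is self-contained.
SAMPLE = "<library><book title='A'/><book title='B'/></library>"


# Functional style: every function takes the parsed root element as an argument.
def book_titles(root):
    return [book.get("title") for book in root.iter("book")]


# Object-oriented style: the root is stored once and methods read it from self.
class MyXMLFile(object):
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def book_titles(self):
        return [book.get("title") for book in self.root.iter("book")]


root = ET.fromstring(SAMPLE)
print(book_titles(root))                # ['A', 'B']
print(MyXMLFile(SAMPLE).book_titles())  # ['A', 'B']
```

Neither version needs a module-level global, and both can be tested with a second document simply by passing different input.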
(Am I answering your question? I'm still a bit vague on what kind of answer you want.)