Is it possible to create unique instances that have the same input? - python

I am working on code in Python that creates Compound objects (as in chemical compounds) that are composed of Bond and Element objects. These Element objects are created with some inputs about them (name, symbol, atomic number, atomic mass, etc.). I want to populate an array with Element objects that are unique, so I can do something to one and leave the rest unchanged, but they should all carry the information for a 'Hydrogen' element.
This question, "Python creating multiple instances for a single object/class", leads me to believe that I should create sub-classes of Element, i.e. a Hydrogen object, a Carbon object, etc.
Is this doable without creating sub-classes, and if so how?

Design your object model based on making the concepts make sense, not based on what seems easiest to implement.
If, in your application, hydrogen atoms are a different type of thing than oxygen atoms, then you want to have a Hydrogen class and an Oxygen class, both probably subclasses of an Element class.*
If, on the other hand, there's nothing special about hydrogen or oxygen (e.g., if you don't want to distinguish between, say, oxygen and sulfur, since they both have the same valence), then you don't want subclasses.
Either way, you can create multiple instances. It's just a matter of whether you do it like this:
atoms = [Hydrogen(), Hydrogen(), Oxygen(), Oxygen()]
… or this:
atoms = [Element(1), Element(1), Element(-2), Element(-2)]
If your instances take a lot of arguments, and you want a lot of instances with the same arguments, repeating yourself like this can be a bad thing. But you can use a loop—either an explicit statement, or comprehension—to make it better:
for _ in range(50):
    atoms.append(Element(group=16, valence=2, number=16, weight=32.066))
… or:
atoms.extend(Element(group=16, valence=2, number=16, weight=32.066)
             for _ in range(50))
* Of course you may even want further subclasses, e.g., to distinguish Oxygen-16, Oxygen-17, Oxygen-18, or maybe even different mixtures, like the 99.762% Oxygen-16 with small amounts of -18 and tiny bits of the others that's standard in Earth's atmosphere, vs. the different mixture that was common millions of years ago…
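To check that instances built from identical arguments really are independent, a small sketch can help. The Element class below is a hypothetical stand-in (its constructor parameters are assumed from the question, not taken from the asker's code). It also shows one way this can go wrong: multiplying a one-element list repeats the reference, not the object.

```python
class Element:
    """Hypothetical stand-in for the asker's Element class."""

    def __init__(self, name, symbol, number, mass):
        self.name = name
        self.symbol = symbol
        self.number = number
        self.mass = mass


# Each pass through the comprehension calls Element(...) again,
# so every entry is a separate object with the same starting data.
hydrogens = [Element("Hydrogen", "H", 1, 1.008) for _ in range(3)]
hydrogens[0].mass = 2.014  # change one atom only

# Pitfall: list multiplication repeats ONE object three times.
aliased = [Element("Hydrogen", "H", 1, 1.008)] * 3
aliased[0].mass = 2.014  # every entry now reports 2.014
```

After this runs, hydrogens[1] and hydrogens[2] still carry the original mass, while all three entries of aliased changed together.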

Related

Methods vs. Properties

I'm working on the Python API for our physics simulation package (Mechanica, https://mechanica.readthedocs.io), and mulling over whether to use properties or methods.
Python has an established convention that the objects in a dictionary are accessible through an items() method, i.e.
[i for i in d.items()]
I'm trying to adhere to this established convention for our objects, but it's sometimes awkward. For example, in our simulator, we have:
C.items() # get all the members of this type
n = a.neighbors() # get all the neighbors of an object
c = find_some_cluster() # some function to find a cluster
c.items() # get all the items in this list
b = m.bind(pot, a, b) # create a bond between two objects
b.energy() # gets the current energy of the bond
b.half_life # gets / sets the half life
b.delete() # deletes the bond
b.items()[0], b.items()[1] # gets the pair of objects that this bond acts on.
b.dissociation_energy # bond breaking threshold
To access one of the objects that a bond is between, you currently have to call
b.items()[0]
I think that's awkward, and perhaps items would be better as a property. I just don't know, though, because making it a property goes against some of the established Python convention. Python itself is pretty inconsistent: some things are stand-alone functions, e.g. len(a) for the length of a list, but most other things are methods on objects.
On our bond object, some things are methods, like energy(), but others are properties, like half_life. I set these up this way because half_life is actually a stateful property of the object, but energy() is a computed thing. I'm not sure if this makes the most sense to the end user, though.
What do you think our items should be: a method or a property? Is there any good rule for when something should be a method or a property?
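For comparison, here is a minimal sketch of how the two spellings look side by side in plain Python. The Bond class, its attributes, and the energy formula are all made up for illustration and are not Mechanica's API. It only shows that @property lets a value be read without call parentheses, and that a read-only property returning a tuple makes b.parts[0] possible.

```python
class Bond:
    """Hypothetical bond between two objects; not Mechanica's real class."""

    def __init__(self, a, b, half_life):
        self._parts = (a, b)
        self.half_life = half_life  # plain stored state, read/write

    @property
    def parts(self):
        # Read-only view of the pair this bond acts on.
        return self._parts

    def energy(self):
        # Placeholder computation; a method signals "work happens here".
        return 0.5 * self.half_life


bond = Bond("particle_a", "particle_b", half_life=10.0)
first = bond.parts[0]      # property: no call parentheses
current = bond.energy()    # method: explicit call
```

Because parts has no setter, assigning to bond.parts raises AttributeError, which may or may not be what an API wants.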

__repr__ for (large) composite objects

I would like to have informative representations for my composite objects (i.e., objects composed of other, potentially composite, objects). However, because my code fundamentally deals with high-precision numbers (please don't ask me why I don't just use doubles), I end up with representations like you see here: http://pastebin.com/jpLgAfxC. Would it be better to just stick with the default __repr__?
Whether to have a verbose repr depends on what you want to accomplish. For complex or composite objects, I know which I'd prefer of the following:
Point(x=1.12, y=2.2, z=-1.9)
<__main__.Point object at 0x103011890>
They both tell me what type the object is, but only the first is clear about all of the (relevant) values involved, and avoids low-level information that is only relevant on the rarest of occasions.
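A repr in the first style takes only a few lines. This is a minimal sketch, assuming a Point class with three plain numeric attributes (the class body is not given in the original answer):

```python
class Point:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):
        # Use the class name dynamically so subclasses print correctly.
        return "{0}(x={1!r}, y={2!r}, z={3!r})".format(
            type(self).__name__, self.x, self.y, self.z)


p = Point(1.12, 2.2, -1.9)
text = repr(p)  # "Point(x=1.12, y=2.2, z=-1.9)"
```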
I like to see the real values. But, yours is a special case, given that your values are so frightfully humongous:
72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73237863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851
Numbers like that cannot be useful for most development or debugging purposes. I'm sure there are times you need the full serialization, to send to and from files, for example. But those have to be fairly rare, no? I can't imagine you really remember all 309 digits, or can determine on visual inspection whether the above number is the same as the one below:
72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73327863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851
They're not the same. But unless you're Spock or The Terminator, you wouldn't know that from a quick glance. (And actually, I've made it easier here, length-wrapping to avoid having to horizontally scroll.)
So I would recommend (massively) shortening their representation, to make the output more tractable. This is like printing out the entire chapter text every time you want to print a Chapter object. Overkill.
Instead, try something much shorter and easier to work with. Truncation and/or ellipsis are useful. e.g.
72401...59851
7240131710...
You can use the object id as well. If your high-precision type is HP, then:
HP(0x103011890)
At least then you will be able to tell them apart. One ugliness of using object ids, however, is that objects can be logically equivalent, but if you create multiple objects with the same logical value, they'd have different ids, thus appear different when they are not. You can get around that by creating your own short hash function. There's a bit of an art to hashing, but for reprs, even something simple would work. E.g.:
import binascii, struct
def shorthash(s):
    """
    Given a Python value, produce a short alphanumeric hash that
    helps identify it for debugging purposes. A riff on
    http://stackoverflow.com/a/2511059/240490
    Enhanced to remove trailing boilerplate, and to work
    on either Python 2 or Python 3.
    """
    # 'q' is a signed 8-byte integer, so the hash fits on every platform
    # ('l' is only 4 bytes on some systems).
    hashbytes = binascii.b2a_base64(struct.pack('q', hash(s)))
    return hashbytes.decode('utf-8').rstrip().rstrip("=")
Then define your repr in the high-precision class:
def __repr__(self):
    clsname = self.__class__.__name__
    return '{0}({1})'.format(clsname, shorthash(self.value))
Where self.value is whatever local attribute, property, or method creates the multi-hundred-digit value. If you're subclassing int, this could be just self.
This gets you to:
HP(Tea+5MY0WwA)
The two massive, almost identical numbers above? Using this scheme, they render out to:
HP(XhkG0358Fx4)
HP(27CdIG5elhQ)
Which are obviously different. You can combine this with a bit of a value representation. E.g. a few alternatives:
HP(~7.24013e308 # XhkG0358Fx4)
HP(dig='72401...59851', ndigits=309, hash='XhkG0358Fx4')
You'll find these shorter values more useful in debugging contexts. You can, of course, keep around a method or property (e.g. .value, .digits, or .alldigits) for those cases in which you need every last bit, but define the common case as something more easily consumed.
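Putting the truncation and the short hash together might look like the sketch below. HP here is a hypothetical type that simply subclasses int, and the 'q' struct format (a signed 8-byte integer) is my choice for portability. The sketch assumes non-negative values, since a leading minus sign would be counted as a digit.

```python
import binascii
import struct


def shorthash(value):
    # Pack the hash as a signed 8-byte integer, then base64-encode it.
    packed = struct.pack("q", hash(value))
    return binascii.b2a_base64(packed).decode("ascii").rstrip().rstrip("=")


class HP(int):
    """Hypothetical high-precision type; here just a subclass of int."""

    def __repr__(self):
        digits = int.__repr__(self)  # avoid recursing into this method
        if len(digits) > 12:
            shown = digits[:5] + "..." + digits[-5:]
        else:
            shown = digits
        return "HP(dig={0!r}, ndigits={1}, hash={2!r})".format(
            shown, len(digits), shorthash(int(self)))


# Two 309-digit values that share their first and last five digits.
a = HP(10**308 + 10**150 + 12345)
b = HP(10**308 + 2 * 10**150 + 12345)
```

Here repr(a) and repr(b) show the same truncated digits and length, so only the hash field tells them apart.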
Thank you to Demian for the pointer to https://docs.python.org/2/reference/datamodel.html#object.repr, specifically:
This is typically used for debugging, so it is important that the
representation is information-rich and unambiguous.
http://pastebin.com/jpLgAfxC is probably the best possible __repr__ in this case.

Python - Class variables vs dictionary of values

Let's say I've got a class which represents an object that has many properties (simple data types like strings and integers). Should they be represented as instance variables, or would the more "pythonic" approach be to put them into a dictionary?
For example:
class FruitBasket:
    def __init__(self, apples, oranges, bananas, pears):  # number of apples, oranges etc...
        self.apples = apples
        self.oranges = oranges
        self.bananas = bananas
        self.pears = pears
class FruitBasket:
    def __init__(self, fruits):  # fruits is a dictionary
        self.fruits = fruits
My general philosophy is to use attributes if the set of items is more or less fixed, and use a dictionary if the set may change on an ad-hoc basis. If your FruitBasket is specifically made to contain apples, oranges, bananas and pears, then use attributes. If it may contain any random assortment of other things (e.g., you might sometimes throw in a pineapple or a raspberry), use a dictionary.
One reason can be seen even in your example code. If you use attributes, you have to specify each one literally in the code (e.g., self.pears). Moreover, you often wind up doing what you did here, where you explicitly pass each item as an argument to __init__. This obviously won't work if you later decide to add new fruits. You could keep adding more arguments to __init__, but that quickly becomes unwieldy.
In addition, if you have a fixed set of items, you'll probably be accessing them individually. That is, if you know you only have apples, oranges, bananas, and pears, you can directly access them by name as you did here (self.apples, self.oranges, etc.). If you don't know ahead of time what fruits may be in the basket, you can't know what names to use a priori, so you'll typically process them by iterating over them. It is very easy to iterate over the items of a dictionary. By contrast, iterating over the attributes of an object is fraught with peril, since you can't easily distinguish the attributes that contain data that the object is "about" (e.g., self.pears) from those that pertain to the structure of the object itself (e.g., self.__init__, self.basketColor, self.basketSize, etc.).
In short, if you don't know ahead of time what will be in the basket, you'll want to iterate over its contents, and if you want to iterate over something's contents, it's best to use a type designed for containment (like a list or dict), because these types cleanly separate the container from its contents.
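As a rough illustration of that separation, here is a sketch of a dictionary-backed basket. The add and total methods and the color attribute are made up for the example. The contents live in one dict, while attributes describe the basket itself.

```python
class FruitBasket:
    """Sketch of a basket whose contents are not known ahead of time."""

    def __init__(self, fruits=None):
        # Copy the mapping so callers cannot mutate our state by accident.
        self.fruits = dict(fruits or {})
        self.color = "brown"  # structural attribute, kept apart from contents

    def add(self, name, count=1):
        self.fruits[name] = self.fruits.get(name, 0) + count

    def total(self):
        # Iterating over the contents never touches self.color.
        return sum(self.fruits.values())


basket = FruitBasket({"apples": 3, "pears": 2})
basket.add("pineapple")  # a fruit nobody planned for
names = sorted(basket.fruits)
```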
It depends on what you are going to do with it. A dictionary is more flexible, as it is easier to expand with new fruits, and you can iterate to get them all. Representing the fruits as members saves you some typing, but you have to hard-code all the accesses to them.
A middle ground exists, and it is to use the pattern to synchronize both. Here there is some discussion about how to implement it.
And, before you write another class, remember: Stop writing classes.
You probably want to use a dictionary or you can use the new python 3.4 enum! If it should be an enum. https://docs.python.org/3/library/enum.html
from enum import Enum
animal = Enum('Animal', 'ant bee cat dog')
animal.ant
This and very similar questions have been asked many times before, and there are various AttrDict implementations out there.
However, you should ask yourself if you have any reason at all not to use a dict. If you don't, then the pythonic thing to do is to use a dict, obviously. A class with no methods should probably not be a class at all. You should also consider the fact that not all valid dict keys are valid attribute names.

Python: iterating through a list of objects within a list of objects

I've made two classes called House and Window. I then made a list containing four Houses. Each instance of House has a list of Windows. I'm trying to iterate over the windows in each house and print its ID. However, I seem to get some odd results :S I'd greatly appreciate any help.
#!/usr/bin/env python
# Minimal house class
class House:
    ID = ""
    window_list = []
# Minimal window class
class Window:
    ID = ""
# List of houses
house_list = []
# Number of windows to build into each of the four houses
windows_per_house = [1, 3, 2, 1]
# Build the houses
for new_house in range(0, len(windows_per_house)):
    # Append the new house to the house list
    house_list.append(House())
    # Give the new house an ID
    house_list[new_house].ID = str(new_house)
    # For each new house build some windows
    for new_window in range(0, windows_per_house[new_house]):
        # Append window to house's window list
        house_list[new_house].window_list.append(Window())
        # Give the window an ID
        house_list[new_house].window_list[new_window].ID = str(new_window)
#Iterate through the windows of each house, printing house and window IDs.
for house in house_list:
    print("House: " + house.ID)
    for window in house.window_list:
        print(" Window: " + window.ID)
####################
# Desired output:
#
# House: 0
# Window: 0
# House: 1
# Window: 0
# Window: 1
# Window: 2
# House: 2
# Window: 0
# Window: 1
# House: 3
# Window: 0
####################
Currently you are using class attributes instead of instance attributes. Try changing your class definitions to the following:
class House:
    def __init__(self):
        self.ID = ""
        self.window_list = []
class Window:
    def __init__(self):
        self.ID = ""
The way your code is now, all instances of House are sharing the same window_list.
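The sharing is easy to reproduce in isolation. In this sketch, SharedHouse and OwnHouse are hypothetical names used only to put the two variants next to each other:

```python
class SharedHouse:
    window_list = []  # one list, stored on the class


class OwnHouse:
    def __init__(self):
        self.window_list = []  # a fresh list per instance


a, b = SharedHouse(), SharedHouse()
a.window_list.append("w0")
shared_seen_by_b = list(b.window_list)  # b sees a's window

c, d = OwnHouse(), OwnHouse()
c.window_list.append("w0")
own_seen_by_d = list(d.window_list)  # d's list is untouched
```

With the class attribute, a.window_list and b.window_list are the same list object, which is why every house in the question ends up listing every window.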
Here's the updated code.
# Minimal house class
class House:
    def __init__(self, id):
        self.ID = id
        self.window_list = []
# Minimal window class
class Window:
    ID = ""
# List of houses
house_list = []
# Number of windows to build into each of the four houses
windows_per_house = [1, 3, 2, 1]
# Build the houses
for new_house in range(len(windows_per_house)):
    # Append the new house to the house list
    house_list.append(House(str(new_house)))
    # For each new house build some windows
    for new_window in range(windows_per_house[new_house]):
        # Append window to house's window list
        house_list[new_house].window_list.append(Window())
        # Give the window an ID
        house_list[new_house].window_list[new_window].ID = str(new_window)
#Iterate through the windows of each house, printing house and window IDs.
for house in house_list:
    print("House: " + house.ID)
    for window in house.window_list:
        print(" Window: " + window.ID)
The actual problem is that window_list is a class attribute holding a mutable list, so when the different instances use it, they end up sharing the same one. By moving window_list into __init__, each instance gets its own.
C++, Java, C# etc. have this really strange behaviour regarding instance variables, whereby data (members, or fields, depending on which culture you belong to) that's described within a class {} block belongs to instances, while functions (well, methods, but C++ programmers seem to hate that term and say "member functions" instead) described within the same block belong to the class itself. Strange, and confusing, when you actually think about it.
A lot of people don't think about it; they just accept it and move on. But it actually causes confusion for a lot of beginners, who assume that everything within the block belongs to the instances. This leads to bizarre (to experienced programmers) questions and concerns about the per-instance overhead of these methods, and trouble wrapping their heads around the whole "vtable" implementation concept. (Of course, it's mostly the teachers' collective fault for failing to explain that vtables are just one implementation, and for failing to make clear distinctions between classes and instances in the first place.)
Python doesn't have this confusion. Since in Python, functions (including methods) are objects, it would be bizarrely inconsistent for the compiler to make a distinction like that. So, what happens in Python is what you should intuitively expect: everything within the class indented block belongs to the class itself. And, yes, Python classes are themselves objects as well (which gives a place to put those class attributes), and you don't have to jump through standard library hoops to use them reflectively. (The absence of manifest typing is quite liberating here.)
So how, I hear you protest, do we actually add any data to the instances? Well, by default, Python doesn't restrict you from adding anything to any instance. It doesn't even require you to make different instances of the same class contain the same attributes. And it certainly doesn't pre-allocate a single block of memory to contain all the object's attributes. (It would only be able to contain references, anyway, given that Python is a pure reference-semantics language, with no C# style value types or Java style primitives.)
But obviously, it's a good idea to do things that way, so the usual convention is "add all the data at the time that the instance is constructed, and then don't add any more (or delete any) attributes".
"When it's constructed"? Python doesn't really have constructors in the C++/Java/C# sense, because this absence of "reserved space" means there's no real benefit to considering "initialization" as a separate task from ordinary assignment - except of course the benefit of initialization being something that automatically happens to a new object.
So, in Python, our closest equivalent is the magic __init__ method that is automatically called upon newly-created instances of the class. (There is another magic method called __new__, which behaves more like a constructor, in the sense that it's responsible for the actual creation of the object. However, in nearly every case we just want to delegate to the base object __new__, which calls some built-in logic to basically give us a little pointer-ball that can serve as an object, and point it to a class definition. So there's no real point in worrying about __new__ in almost every case. It's really more analogous to overloading the operator new for a class in C++.) In the body of this method (there are no C++-style initialization lists, because there is no pre-reserved data to initialize), we set initial values for attributes (and possibly do other work), based on the parameters we're given.
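The order of the two magic methods can be observed directly. This is a small sketch with a made-up Traced class that records each call and delegates creation to the base object type, as described above:

```python
calls = []


class Traced(object):
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # Delegate the actual creation to the base object type.
        return super(Traced, cls).__new__(cls)

    def __init__(self, label):
        calls.append("__init__")
        self.label = label  # ordinary assignment creates the attribute


t = Traced("demo")
# calls is now ["__new__", "__init__"]
```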
Now, if we want to be a little bit neater about things, or efficiency is a real concern, there is another trick up our sleeves: we can use the magic __slots__ attribute of the class to specify the names of the attributes that instances are allowed to have. This is a list of strings, nothing fancy. However, this still doesn't pre-initialize anything; an instance doesn't have an attribute until you assign it. This just prevents you from adding attributes with other names. You can even still delete attributes from an object whose class has specified __slots__. All that happens is that the instances are given a different internal structure, to optimize memory usage and attribute lookup.
The __slots__ usage requires that we derive from the built-in object type, which we should do anyway (although we aren't required in Python 2.x, this is intended only for backwards-compatibility purposes).
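The three claims about __slots__ (nothing is pre-initialized, other names are rejected, deletion still works) can be checked with a short sketch. The Window class here is a minimal example of my own, and colour is an arbitrary attribute name chosen to be outside the slots list:

```python
class Window(object):
    __slots__ = ["ID"]


w = Window()

# No attribute exists until it is assigned.
try:
    w.ID
    had_id_before = True
except AttributeError:
    had_id_before = False

w.ID = "0"

# Names outside __slots__ are rejected.
try:
    w.colour = "red"
    accepted_other_name = True
except AttributeError:
    accepted_other_name = False

del w.ID  # deleting a slot attribute is still allowed
```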
Ok, so now we can make the code work. But how do we make it right for Python?
First off, just as with any other language, constantly commenting to explain already-self-explanatory things is a bad idea. It distracts the user, and doesn't really help you as a learner of the language, either. You're supposed to know what a class definition looks like, and if you need a comment to tell you that a class definition is a class definition, then reading the code comments isn't the kind of help you need.
With this whole "duck typing" thing, it's poor form to include data type names in variable (or attribute) names. You're probably protesting, "but how am I supposed to keep track of the type otherwise, without the manifest type declaration"? Don't. The code that uses your list of windows doesn't care that your list of windows is a list of windows. It just cares that it can iterate over the list of windows, and thus obtain values that can be used in certain ways that are associated with windows. That's how duck typing works: stop thinking about what the object is, and worry about what it can do.
You'll notice in the code below that I put the string conversion code into the House and Window constructors themselves. This serves as a primitive form of type-checking, and also makes sure that we can't forget to do the conversion. If someone tries to create a House with an ID that can't even be converted to a string, then it will raise an exception. Easier to ask for forgiveness than permission, after all. (Note that you actually have to go out of your way a bit in Python to create
As for the actual iteration... in Python, we iterate by actually iterating over the objects in a container. Java and C# have this concept as well, and you can get at it with the C++ standard library too (although a lot of people don't bother). We don't iterate over indices, because it's a useless and distracting indirection. We don't need to number our "windows_per_house" values in order to use them; we just need to look at each value in turn.
How about the ID numbers, I hear you ask? Simple. Python provides us with a function called 'enumerate', which gives us (index, element) pairs given an input sequence of elements. It's clean, it lets us be explicit about our need for the index to solve the problem (and the purpose of the index), and it's a built-in that doesn't need to be interpreted like the rest of the Python code, so it doesn't incur all that much overhead. (It is also lazy: it yields the pairs one at a time rather than building a list, so memory is not a concern.)
But even then, iterating to create each house, and then manually appending each one to an initially-empty list, is too low-level. Python knows how to construct a list of values; we don't need to tell it how. (And as a bonus, we typically get better performance by letting it do that part itself, since the actual looping logic can now be done internally, in native C.) We instead describe what we want in the list, with a list comprehension. We don't have to walk through the steps of "take each window-count in turn, make the corresponding house, and add it to the list", because we can say "a list of houses with the corresponding window-count for each window-count in this input list" directly. That's arguably clunkier in English, but much cleaner in a programming language like Python, because you can skip a bunch of the little words, and you don't have to expend effort to describe the initial list, or the act of appending the finished houses to the list. You don't describe the process at all, just the result. Made-to-order.
Finally, as a general programming concept, it makes sense, whenever possible, to delay the construction of an object until we have everything ready that's needed for that object's existence. "Two-phase construction" is ugly. So we make the windows for a house first, and then the house (using those windows). With list comprehensions, this is simple: we just nest the list comprehensions.
class House(object):
    __slots__ = ['ID', 'windows']
    def __init__(self, id, windows):
        self.ID = str(id)
        self.windows = windows
class Window(object):
    __slots__ = ['ID']
    def __init__(self, id):
        self.ID = str(id)
windows_per_house = [1, 3, 2, 1]
# Build the houses.
houses = [
    House(house_id, [Window(window_id) for window_id in range(window_count)])
    for house_id, window_count in enumerate(windows_per_house)
]
# See how elegant the list comprehensions are?
# If you didn't quite follow the logic there, please try **not**
# to imagine the implicitly-defined process as you trace through it.
# (Pink elephants, I know, I know.) Just understand what is described.
# And now we can iterate and print just as before.
for house in houses:
    print("House: " + house.ID)
    for window in house.windows:
        print(" Window: " + window.ID)
Apart from some indentation errors, you're assigning the IDs and window_lists to the class and not the instances.
You want something like
class House():
    def __init__(self, ID):
        self.ID = ID
        self.window_list = []
etc.
Then, you can do house_list.append(House(str(new_house))) and so on.

Have well-defined, narrowly-focused classes ... now how do I get anything done in my program?

I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[(Ad,Ac),(Ad,Ah),(Ad,As),(Ac,Ah),(Ac,As),(Ah,As)]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translator), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polymorphism come into play; but, there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a singleton (or Borg) design pattern, writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or rather whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes after due consideration you'll unearth some different reason for preferring a class, and that's all right too, but the point is, don't just pick a class over a module w/functions "by reflex", without critically thinking about it!).
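To make the "module with functions" option concrete, here is a rough sketch of the first two stages of the pipeline as plain functions. Everything in it is an assumption for illustration: the function names, the restriction to pocket pairs, and the card numbering (only 'As': 52 and 'Ad': 51 appear in the question).

```python
from itertools import combinations

SUITS = "dchs"  # order chosen to match the sample output in the question


def get_hands(pair):
    """Expand a pocket pair such as "AA" into its specific two-card hands."""
    rank = pair[0]
    cards = [rank + suit for suit in SUITS]
    return list(combinations(cards, 2))


def translate(hands, card_values):
    """Map each card name to the number a given evaluator expects."""
    return [tuple(card_values[card] for card in hand) for hand in hands]


def evaluate_range(pair, card_values):
    # The "assembly line" is just a function that names the workflow.
    return translate(get_hands(pair), card_values)


# Only 'As': 52 and 'Ad': 51 come from the question; the rest is made up.
values = {"As": 52, "Ad": 51, "Ah": 50, "Ac": 49}
result = evaluate_range("AA", values)
```

A shell UI and a GUI could both call evaluate_range directly, with no extra class needed until there is state to keep between calls.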
You could create a Poker class that ties these all together and initialize all of that stuff in the __init__() method:
class Poker(object):
    def __init__(self, user_input=HandRange()):
        self.user_input = user_input
        self.translation = Translation.translateList(user_input)
        # Feed the translated hands to the evaluator
        self.evaluator = Evaluator.getEquities(self.translation)
        # and so on...
p = Poker()
# etc, etc...
