Related
Tl;dr is bold-faced text.
I'm working with an image dataset that comes with boolean "one-hot" image annotations (Celeba to be specific). The annotations encode facial features like bald, male, young. Now I want to make a custom one-hot list (to test my GAN model). I want to provide a literate interface. I.e., rather than specifying features[12]=True knowing that 12 - counting from zero - corresponds to the male feature, I want something like features[male]=True or features.male=True.
Suppose the header of my .txt file is
Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Chubby Male Wearing_Necktie Young
and I want to codify Young, Bald, and Chubby. The expected output is
[ 0. 0. 0. 1. 0. 1. 0. 0. 1.]
since Bald is the fourth entry of the header, Chubby is the sixth, and so on. What is the clearest way to do this without expecting a user to know Bald is the fourth entry, etc.?
I'm looking for a Pythonic way, not necessarily the fastest way.
Ideal Features
In rough order of importance:
A way to accomplish my stated goal that is already standard in the Python community will take precedence.
A user/programmer should not need to count to an attribute in the .txt header. This is the point of what I'm trying to design.
A user should not be expected to have non-standard libraries like aenum.
A user/programmer should not need to reference the .txt header for attribute names/available attributes. One example: if a user wants to specify the gender attribute but does not know whether to use male or female, it should be easy to find out.
A user/programmer should be able to find out the available attributes via documentation (ideally generated by Sphinx api-doc). That is, the point 4 should be possible reading as little code as possible. Attribute exposure with dir() sufficiently satisfies this point.
The programmer should find the indexing tool natural. Specifically, zero-indexing should be preferred over subtracting from one-indexing.
Between two otherwise completely identical solutions, one with better performance would win.
Examples:
I'm going to compare and contrast the ways that immediately came to my mind. All examples use:
import numpy as np
header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
"Bald Bangs Chubby Male Wearing_Necktie Young")
NUM_CLASSES = len(header.split()) # 9
1: Dict Comprehension
Obviously we could use a dictionary to accomplish this:
binary_label = np.zeros([NUM_CLASSES])
classes = {head: idx for (idx, head) in enumerate(header.split())}
binary_label[[classes["Young"], classes["Bald"], classes["Chubby"]]] = True
print(binary_label)
For what it's worth, this has the fewest lines of code and is the only one that doesn't rely on a standard library over builtins. As for negatives, it isn't exactly self-documenting. To see the available options, you must print(classes.keys()) - it's not exposed with dir(). This borders on not satisfying feature 5 because it requires a user to know classes is a dict to exposure features AFAIK.
2: Enum:
Since I'm learning C++ right now, Enum is the first thing that came to mind:
import enum
binary_label = np.zeros([NUM_CLASSES])
Classes = enum.IntEnum("Classes", header)
features = [Classes.Young, Classes.Bald, Classes.Chubby]
zero_idx_feats = [feat-1 for feat in features]
binary_label[zero_idx_feats] = True
print(binary_label)
This gives dot notation and the image options are exposed with dir(Classes). However, enum uses one-indexing by default (the reason is documented). The work-around makes me feel like enum is not the Pythonic way to do this, and entirely fails to satisfy feature 6.
3: Named Tuple
Here's another one out of the standard Python library:
import collections
binary_label = np.zeros([NUM_CLASSES])
clss = collections.namedtuple(
"Classes", header)._make(range(NUM_CLASSES))
binary_label[[clss.Young, clss.Bald, clss.Chubby]] = True
print(binary_label)
Using namedtuple, we again get dot notation and self-documentation with dir(clss). But, the namedtuple class is heavier than enum. By this I mean, namedtuple has functionality I do not need. This solution appears to be a leader among my examples, but I do not know if it satisfies feature 1 or if an alternative could "win" via feature 7.
4: Custom Enum
I could really break my back:
binary_label = np.zeros([NUM_CLASSES])
class Classes(enum.IntEnum):
Arched_Eyebrows = 0
Attractive = 1
Bags_Under_Eyes = 2
Bald = 3
Bangs = 4
Chubby = 5
Male = 6
Wearing_Necktie = 7
Young = 8
binary_label[
[Classes.Young, Classes.Bald, Classes.Chubby]] = True
print(binary_label)
This has all the advantages of Ex. 2. But, it comes with obvious the obvious drawbacks. I have to write out all the features (there's 40 in the real dataset) just to zero-index! Sure, this is how to make an enum in C++ (AFAIK), but it shouldn't be necessary in Python. This is a slight failure on feature 6.
Summary
There are many ways to accomplish literate zero-indexing in Python. Would you provide a code snippet of how you would accomplish what I'm after and tell me why your way is right?
(edit:) Or explain why one of my examples is the right tool for the job?
Status Update:
I'm not ready to accept an answer yet in case anyone wants to address the following feedback/update, or any new solution appears. Maybe another 24 hours? All the responses have been helpful, so I upvoted everyone's so far. You may want to look over this repo I'm using to test solutions. Feel free to tell me if my following remarks are (in)accurate or unfair:
zero-enum:
Oddly, Sphinx documents this incorrectly (one-indexed in docs), but it does document it! I suppose that "issue" doesn't fail any ideal feature.
dotdict:
I feel that Map is overkill, but dotdict is acceptable. Thanks to both answerers that got this solution working with dir(). However, it doesn't appear that it "works seamlessly" with Sphinx.
Numpy record:
As written, this solution takes significantly longer than the other solutions. It comes in at 10x slower than a namedtuple (fastest behind pure dict) and 7x slower than standard IntEnum (slowest behind numpy record). That's not drastic at current scale, nor a priority, but a quick Google search indicates np.in1d is in fact slow. Let's stick with
_label = np.zeros([NUM_CLASSES])
_label[[header_rec[key].item() for key in ["Young", "Bald", "Chubby"]]] = True
unless I've implemented something wrong in the linked repo. This brings the execution speed into a range that compares with the other solutions. Again, no Sphinx.
namedtuple (and rassar's critiques)
I'm not convinced of your enum critique. It seems to me that you believe I'm approaching the problem wrong. It's fine to call me out on that, but I don't see how using the namedtuple is fundamentally different from "Enum [which] will provide separate values for each constant." Have I misunderstood you?
Regardless, namedtuple appears in Sphinx (correctly numbered, for what it's worth). On the Ideal Features list, this chalks up identically to zero-enum and profiles ahead of zero-enum.
Accepted Rationale
I accepted the zero-enum answer because the answer gave me the best challenger for namedtuple. By my standards, namedtuple is marginally the best solution. But salparadise wrote the answer that helped me feel confident in that assessment. Thanks to all who answered.
How about a factory function to create a zero indexed IntEnum since that is the object that suits your needs, and Enum provides flexibility in construction:
from enum import IntEnum
def zero_indexed_enum(name, items):
# splits on space, so it won't take any iterable. Easy to change depending on need.
return IntEnum(name, ((item, value) for value, item in enumerate(items.split())))
Then:
In [43]: header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
...: "Bald Bangs Chubby Male Wearing_Necktie Young")
In [44]: Classes = zero_indexed_enum('Classes', header)
In [45]: list(Classes)
Out[45]:
[<Classes.Arched_Eyebrows: 0>,
<Classes.Attractive: 1>,
<Classes.Bags_Under_Eyes: 2>,
<Classes.Bald: 3>,
<Classes.Bangs: 4>,
<Classes.Chubby: 5>,
<Classes.Male: 6>,
<Classes.Wearing_Necktie: 7>,
<Classes.Young: 8>]
You can use a custom class which I like to call as DotMap or as mentioned here is this SO discussion as Map:
https://stackoverflow.com/a/32107024/2598661 (Map, longer complete version)
https://stackoverflow.com/a/23689767/2598661 (dotdict, shorter lighter version)
About Map:
It has the features of a dictionary since the input to a Map/DotMap is a dict. You can access attributes using features['male'].
Additionally you can access the attributes using dot i.e. features.male and the attributes will be exposed when you do dir(features).
It is only as heavy as it needs to be in order to enable the dot functionality.
Unlike namedtuple you don't need to pre-define it and you can add and remove keys willy nilly.
The Map function described in the SO question is not Python3 compatible because it uses iteritems(). Just replace it with items() instead.
About dotdict:
dotdict provides the same advantages of Map with the exception that it does not override the dir() method therefore you will not be able to obtain the attributes for documentation. #SigmaPiEpsilon has provided a fix for this here.
It uses the dict.get method instead of dict.__getitem__ therefore it will return None instead of throwing KeyError when you are access attributes that don't exist.
It does not recursively apply dotdict-iness to nested dicts therefore you won't be able to use features.foo.bar.
Here's the updated version of dotdict which solves the first two issues:
class dotdict(dict):
__getattr__ = dict.__getitem__ # __getitem__ instead of get
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
def __dir__(self): # by #SigmaPiEpsilon for documentation
return self.keys()
Update
Map and dotdict don't have the same behavior as pointed out by #SigmaPiEpsilon so I added separate descriptions for both.
Of your examples, 3 is the most pythonic answer to your question.
1, as you said, does not even answer your question, since the names are not explicit.
2 uses enums, which though being in the standard library are not pythonic and generally not used in these scenarios in Python.
(Edit): In this case you only really need two different constants - the target values and the other ones. An Enum will provide separate values for each constant, which is not what the goal of your program is and seems to be a roundabout way of approaching the problem.
4 is just not maintainable if a client wants to add options, and even as it is it's painstaking work.
3 uses well-known classes from the standard library in a readable and succinct way. Also, it does not have any drawbacks, as it is perfectly explicit. Being too "heavy" doesn't matter if you don't care about performance, and anyway the lag will be unnoticeable with your input size.
Your requirements if I understand correctly can be divided into two parts:
Access the position of header elements in the .txt by name in the most pythonic way possible and with minimum external dependencies
Enable dot access to the data structure containing the names of the headers to be able to call dir() and setup easy interface with Sphinx
Pure Python Way (no external dependencies)
The most pythonic way to solve the problem is of course the method using dictionaries (dictionaries are at the heart of python). Searching a dictionary through key is also much faster than other methods. The only problem is this prevents dot access. Another answer mentions the Map and dotdict as alternatives. dotdict is simpler but it only enable dot access, it will not help in the documentation aspect with dir() since dir() calls the __dir__() method which is not overridden in these cases. Hence it will only return the attributes of Python dict and not the header names. See below:
>>> class dotdict(dict):
... __getattr__ = dict.get
... __setattr__ = dict.__setitem__
... __delattr__ = dict.__delitem__
...
>>> somedict = {'a' : 1, 'b': 2, 'c' : 3}
>>> somedotdict = dotdict(somedict)
>>> somedotdict.a
1
>>> 'a' in dir(somedotdict)
False
There are two options to get around this problem.
Option 1: Override the __dir__() method like below. But this will only work when you call dir() on the instances of the class. To make the changes apply for the class itself you have to create a metaclass for the class. See here
#add this to dotdict
def __dir__(self):
return self.keys()
>>> somedotdictdir = dotdictdir(somedict)
>>> somedotdictdir.a
1
>>> dir(somedotdictdir)
['a', 'b', 'c']
Option 2: A second option which makes it much closer to user-defined object with attributes is to update the __dict__ attribute of the created object. This is what Map also uses. A normal python dict does not have this attribute. If you add this then you can call dir() to get attributes/keys and also all the additional methods/attributes of python dict. If you just want the stored attribute and values you can use vars(somedotdictdir) which is also useful for documentation.
class dotdictdir(dict):
def __init__(self, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
self.__dict__.update({k : v for k,v in self.items()})
def __setitem__(self, key, value):
dict.__setitem__(self, key, value)
self.__dict__.update({key : value})
__getattr__ = dict.get #replace with dict.__getitem__ if want raise error on missing key access
__setattr__ = __setitem__
__delattr__ = dict.__delitem__
>>> somedotdictdir = dotdictdir(somedict)
>>> somedotdictdir
{'a': 3, 'c': 6, 'b': 4}
>>> vars(somedotdictdir)
{'a': 3, 'c': 6, 'b': 4}
>>> 'a' in dir(somedotdictdir)
True
Numpy way
Another option will be to use a numpy record array which allows dot access. I noticed in your code you are already using numpy. In this case too __dir__() has to be overrridden to get the attributes. This may result in faster operations (not tested) for data with lots of other numeric values.
>>> headers = "Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Chubby Male Wearing_Necktie Young".split()
>>> header_rec = np.array([tuple(range(len(headers)))], dtype = zip(headers, [int]*len(headers)))
>>> header_rec.dtype.names
('Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Chubby', 'Male', 'Wearing_Necktie', 'Young')
>>> np.in1d(header_rec.item(), [header_rec[key].item() for key in ["Young", "Bald", "Chubby"]]).astype(int)
array([0, 0, 0, 1, 0, 1, 0, 0, 1])
In Python 3, you will need to use dtype=list(zip(headers, [int]*len(headers))) since zip became its own object.
I wrote a quick script to scrape various data about mixed martial arts fights and their associated odds.
Originally, the data was a tuple with the first entry being the name of a fighter (string) and the second being their odds (float). The script later accessed this data, and I defined two constants, FIGHTER = 0 and ODDS = 1 so that I later use fight_data[FIGHTER] or fight_data[ODDS].
Since the data is immutable, a tuple made sense, and by defining constants my reasoning was that my IDE/Editor could catch typos as opposed to using a string index for a dictionary.
FIGHTER = 0
ODDS = 1
fight_data = get_data()
def process_data(fight_data):
do_something(fight_data[FIGHTER])
do_something(fight_data[ODDS])
What are the other alternatives? I thought of making a FightData class, but the data is strictly a value object with two small elements.
class FightData(object):
fighter = None
odds = None
def __init__(self, fighter, odds):
self.fighter = fighter
self.odds = odds
fight_data = get_data()
def process_data(data):
do_something(fight_data.fighter)
do_something(fight_data.odds)
In addition, I realized I could use a dictionary, and have fight_data["fighter"] but that seems both ugly and unnecessary to me.
Which one of these alternatives is the best?
Python is a "multi-paradigm" language, so in my opinion, either the procedural approach or the object-oriented approach is valid and Pythonic. For this use-case, with such a limited amount of data, I don't think you need to worry too much.
However, if you're going down the OOP route, I would define your class to be called Fighter and give it attributes called name and odds, and then do_something with the entire Fighter instance:
class Fighter(object):
def __init__(self, name, odds):
self.name = name
self.odds = odds
fighters = get_data()
# for example:
for fighter in fighters:
do_something(fighter)
These are my thoughts... unless you have serious performance issues or efficiency metrics you're trying to achieve, I would use a dict instead of a tuple. Just because the data is immutable doesn't mean you have to use a tuple. And IMO it looks cleaner and is easier to read. Using magic numbers like:
FIGHTER = 1
ODDS = 0
as index markers makes the code harder to understand. And a class is a bit overkill. But if you use a dict your code will look something like:
fight_data = get_data()
def process_data(fight_data):
do_something(fight_data['fighter'])
do_something(fight_data['odds'])
I just got rid of two lines of code, and now we don't have to use any magic variables to reference data. It's much easier to see exactly what you're doing without having to worry about FIGHTER and ODDS.
Don't use variables if you really don't have to. FIGHTER and ODDS really aren't necessary, that's why we have dicts.
Simple pieces of immutable data that you want to reference by field-name sounds like the perfect usecase for a namedtuple.
The SO question/answer in the above link gives a great explanation, but in summary: namedtuples are easily defined, memory-efficient immutable data structures that support data access via attribute reference much like Python Classes, but also fully support tuple operations as well.
from collections import namedtuple
#Defining the form of the namedtuple is much more lightweight than Classes
FightData = namedtuple("FightData", "fighter odds")
#You instantiate a namedtuple much like you would a class instance
fight_data1 = FightData("Andy Hug", 0.8)
#Fields can be referenced by name
print fight_data1.fighter
print fight_data1.odds
#Or by index just like a normal tuple
print fight_data1[0], fight_data1[1]
#They're tuples, so can be iterated over as well
for data in fight_data1:
print data
This is a question about a clean, pythonic way to juggle some different instance methods.
I have a class that operates a little differently depending on certain inputs. The differences didn't seem big enough to justify producing entirely new classes. I have to interface the class with one of several data "providers". I thought I was being smart when I introduced a dictionary:
self.interface_tools={'TYPE_A':{ ... various ..., 'data_supplier':self.current_data},
'TYPE_B':{ ... various ..., 'data_supplier':self.predicted_data} }
Then, as part of the class initialization, I have an input "source_name" and I do ...
# ... various ....
self.data_supplier = self.interface_tools[source_name]['data_supplier']
self.current_data and self.predicted_data need the same input parameter, so when it comes time to call the method, I don't have to distinguish them. I can just call
new_data = self.data_supplier(param1)
But now I need to interface with a new data source -- call it "TYPE_C" -- and it needs more input parameters. There are ways to do this, but nothing I can think of is very clean. For instance, I could just add the new parameters to the old data_suppliers and never use them, so then the call would look like
new_data = self.data_supplier(param1,param2,param3)
But I don't like that. I could add an if block
if self.data_source != 'TYPE_C':
new_data = self.data_supplie(param1)
else:
new_data = self.data_c_supplier(param1,param2,param3)
but avoiding if blocks like this was exactly what I was trying to do in the first place with that dictionary I came up with.
So the upshot is: I have a few "data_supplier" routines. Now that my project has expanded, they have different input lists. But I want my class to be able to treat them all the same to the extent possible. Any ideas? Thanks.
Sounds like your functions could be making use of variable length argument lists.
That said, you could also just make subclasses. They're fairly cheap to make, and would solve your problem here. This is pretty much the case they were designed for.
You could make all your data_suppliers accept a single argument and make it a dictionary or a list or even a NamedTuple.
I need to make a league table for a project. There has to be 3 files,2 files consist of 1 class and the last file is for running a program. I have done all of the parts but when I call a method to add a team, the program adds the name but it does not insert it into the list of teams(which should do). When I try to display the items in the list, the program displays an error message instead of showing the actual team.
How can I fix it?Any help would be appreciated. :)
A few things here:
When I try to display the items in the list, the program displays: team.Team object at 0x000000000332A978 insted of showing the actual team.
The default display for a user class is something like <team.Team object at 0x000000000332A978>. If you want it to display something different, you have to tell Python what you want to display. There are two separate functions for this: __repr__ and __str__. The idea is that the first is a representation for the programmer, the second for the user. If you don't need two different representations, just define __repr__ and it'll use that whenever it needs __str__.
So, a really simple way to fix this is to add this to the Team class:
def __repr__(self):
return 'Team("{}")'.format(self._name)
Now, if you call league.addTeam('Dodgers'), then print(l._table), you'll get [Team("Dodgers")] instead of [<team.Team object at 0x000000000332A978>].
Meanwhile, these two methods are probably not what you want:
def removeTeam(self,team):
self._table.remove(team)
def returnPosition(self,team):
return self._table.index(team)
These will remove or find a team given the Team object—not the name, or even a new Team created from the name, but a reference to the exact same object stored in the _table. This is not all that useful, and you seem to want to call them with just names.
There are two ways to fix this: You could change Team so that it compares by name instead of by object identity, by adding this method to the class:
def __eq__(self, other):
return self._name == other._name
What this means is that if you say Team('Giants') == Team('Giants'), it will now be true instead of False. Even if the first team is in a different league, and has a different W-L record, and so on (e.g., like the baseball "Giants" from San Francisco vs. the football "Giants" from New York), as far as Python is concerned, they're now the same team. Of course if that's not what you want, you can write any other __eq__ function that seems more appropriate.
Anyway, if you do this, the index and remove functions will now be able to find any Team with the same name, instead of just the exact same team, so:
def removeTeam(self,team_name):
self._table.remove(Team(team_name))
def returnPosition(self,team_name):
return self._table.index(Team(team_name))
If you go this way, you might want to consider defining all of the comparison methods, so you can, e.g., sort a list of teams, and they sort by name.
Or you could change these methods so they don't work based on equality, e.g., by redefining them like this:
def removeTeam(self,team_name):
self._table = [team for team in self._table if team._name != team_name]
def returnPosition(self,team_name):
return [team._name for team in self._table].index(team_name)
To understand how these work, if you're not used to reading list comprehensions, turn each one back into the equivalent loop:
self._table = [team for team in self._table if team._name != team_name]
temp = []
for team in self._table:
if team._name != team_name:
temp.append(team)
self._table = temp
If you step through this, temp ends up with a list of every team in the table, except the one you wanted to remove, and then you replace the old self._table with the new filtered one. (Another way to write the same idea is with filter, if you know that function.)
It's usually better to create a new filtered list than to modify a list in-place. Sometimes there are performance reasons not do this, and sometimes it ends up being very complex and hard to understand, but it's usually both faster and simpler to reason about. Also, modifying lists in place leads to problems like this:
for i, value in enumerate(mylist):
if value == value_to_remove:
del mylist[i]
Play with this for a while, and you'll see that it doesn't actually work. Understanding why is a bit complicated, and you probably don't want to learn that until later. The usual trick to solve the problem is to iterate over a copy of the list… but once you're doing that, you've now got the worst of filtering and the worst of deleting-in-place at the same time.
The second function may be a little too clever, but let's look at it:
def returnPosition(self,team_name):
return [team._name for team in self._table].index(team_name)
First, I'm creating a list like the original one, but it's a list of just the names instead of the team objects. Again, let's decompose the list comprehension:
temp = []
for team in self._table:
temp.append(team._name)
Or try to translate it into English: This is a list of the team name of every team in the table.
Now, because this is a list of team names, I can use index(team_name) and it will find it. And, because the two lists have the same shape, I know that this is the right index to use in the original team list as well.
A much simpler solution would be to change _tables from a list of Teams into a dict mapping names to Teams. This is probably the most Pythonic solution—it looks a lot simpler than writing list comprehensions to do simple operations. (It's also probably the most efficient, but that's hardly relevant unless you have some truly gigantic leagues.) And then you don't even need returnPosition for anything. To do that:
def __init__(self):
self._table={}
def addTeam(self,name):
self._table[name]=Team(name)
def removeTeam(self,team_name):
del self._table[team_name]
def returnPosition(self,team_name):
return team_name
def updateLeague(self,team1_name1,team_name2,score1,score2):
if score1>score2:
self._table[team_name1].win()
self._table[team_name2].loss()
elif score1==score2:
self._table[team_name1].draw()
self._table[team_name2].draw()
elif score1<score2:
self._table[team_name1].loss()
self._table[team_name2].win()
Note that I've defined returnPosition to just return the team name itself as the position. If you think about it, dict keys are used exactly the same way as list indices, so this means any code someone wrote for the "old" API that required returnPosition will still work with the "new" API. (I probably wouldn't try to sell this to a teacher who assigned a problem that required us to use returnPosition, but for a real-life library where I wanted to make it easier for my 1.3 users to migrate to 2.0, I probably would.)
This only requires a few other changes. In displayList and saveList, you iterate over self._table.values() rather than self._table; in loadList, you change self._table.append(team) to self._table[a] = team. Speaking of loadList: You might want to consider renaming those local variables from a, b, c, and d to name, wins, losses, and draws.
A few other comments:
As kreativitea says in the comments, you should not create "private" variables and then add do-nothing accessor methods in Python. It's just more boilerplate that hides the real code, and one more thing you can get wrong with a silly typo that you'll spend hours debugging one day. Just have members named name, wins, losses, etc., and access them directly. (If someone told you that this is bad style because it doesn't let you replace the implementation in the future without changing the interface, that's only true in Java and C++, not in Python. If you ever need to replace the implementation, just read up on #property.)
You don't need print("""""")—and it's very easy to accidentally miscount the number of " characters. (Especially since some IDEs will actually be confused by this and think the multi-line string never ends.) Just do print().
You've got the same ending condition both in the while loop (while x!="q":) and in an internal break. You don't need it in both places. Either change it to while True:, or get rid of the break (just make options("q") do print("Goodbye"), so you don't need to special-case it at all inside the loop).
Whenever you have a long chain of elif statements, think about whether you can turn it into a dict of short functions. I'm not sure it's a good idea in this case, but it's always worth thinking about and making the explicit decision.
The last idea would look something like this:
def addTeam():
name=input("Enter the name of the team:")
l.addTeam(name)
def removeTeam():
teamToRemove=input("Enter the name of the team you want to remove:")
l.removeTeam(teamToRemove)
def recordGame():
team1=input("What is the name of the team?")
ans1=int(input("Enter the number of goals for the first team:"))
team2=input("What is the name of the team?")
ans2=int(input("Enter the number of goals for the second time:"))
l.updateLeague(team1,team2,ans1,ans2)
optionsdict = {
"a": addTeam,
"d": l.displayList,
"s": l.saveList,
"l": l.loadList,
"r": removeTeam,
"rec": recordGame,
}
def options(x):
func = optionsdict.get(x)
if func:
func()
As I said, I'm not sure it's actually clearer in this case, but it's worth considering.
I've made two classes called House and Window. I then made a list containing four Houses. Each instance of House has a list of Windows. I'm trying to iterate over the windows in each house and print it's ID. However, I seem to get some odd results :S I'd greatly appreciate any help.
#!/usr/bin/env python
# Minimal house class
class House:
ID = ""
window_list = []
# Minimal window class
class Window:
ID = ""
# List of houses
house_list = []
# Number of windows to build into each of the four houses
windows_per_house = [1, 3, 2, 1]
# Build the houses
for new_house in range(0, len(windows_per_house)):
# Append the new house to the house list
house_list.append(House())
# Give the new house an ID
house_list[new_house].ID = str(new_house)
# For each new house build some windows
for new_window in range(0, windows_per_house[new_house]):
# Append window to house's window list
house_list[new_house].window_list.append(Window())
# Give the window an ID
house_list[new_house].window_list[new_window].ID = str(new_window)
#Iterate through the windows of each house, printing house and window IDs.
for house in house_list:
print "House: " + house.ID
for window in house.window_list:
print " Window: " + window.ID
####################
# Desired output:
#
# House: 0
# Window: 0
# House: 1
# Window: 0
# Window: 1
# Window: 2
# House: 2
# Window: 0
# Window: 1
# House: 3
# Window: 0
####################
Currently you are using class attributes instead of instance attributes. Try changing your class definitions to the following:
class House:
def __init__(self):
self.ID = ""
self.window_list = []
class Window:
def __init__(self):
self.ID = ""
The way your code is now all instances of House are sharing the same window_list.
Here's the updated code.
# Minimal house class
class House:
def __init__(self, id):
self.ID = id
self.window_list = []
# Minimal window class
class Window:
ID = ""
# List of houses
house_list = []
# Number of windows to build into each of the for houses
windows_per_house = [1, 3, 2, 1]
# Build the houses
for new_house in range(len(windows_per_house)):
# Append the new house to the house list
house_list.append(House(str(new_house)))
# For each new house build some windows
for new_window in range(windows_per_house[new_house]):
# Append window to house's window list
house_list[new_house].window_list.append(Window())
# Give the window an ID
house_list[new_house].window_list[new_window].ID = str(new_window)
#Iterate through the windows of each house, printing house and window IDs.
for house in house_list:
print "House: " + house.ID
for window in house.window_list:
print " Window: " + window.ID
The actual problem is that the window_list attribute is mutable, so when the different instances are using it, they end up sharing the same one. By moving window_list into __init__ each instance gets its own.
C++, Java, C# etc. have this really strange behaviour regarding instance variables, whereby data (members, or fields, depending on which culture you belong to) that's described within a class {} block belongs to instances, while functions (well, methods, but C++ programmers seem to hate that term and say "member functions" instead) described within the same block belong to the class itself. Strange, and confusing, when you actually think about it.
A lot of people don't think about it; they just accept it and move on. But it actually causes confusion for a lot of beginners, who assume that everything within the block belongs to the instances. This leads to bizarre (to experienced programmers) questions and concerns about the per-instance overhead of these methods, and trouble wrapping their heads around the whole "vtable" implementation concept. (Of course, it's mostly the teachers' collective fault for failing to explain that vtables are just one implementation, and for failing to make clear distinctions between classes and instances in the first place.)
Python doesn't have this confusion. Since in Python, functions (including methods) are objects, it would be bizarrely inconsistent for the compiler to make a distinction like that. So, what happens in Python is what you should intuitively expect: everything within the class indented block belongs to the class itself. And, yes, Python classes are themselves objects as well (which gives a place to put those class attributes), and you don't have to jump through standard library hoops to use them reflectively. (The absence of manifest typing is quite liberating here.)
So how, I hear you protest, do we actually add any data to the instances? Well, by default, Python doesn't restrict you from adding anything to any instance. It doesn't even require you to make different instances of the same class contain the same attributes. And it certainly doesn't pre-allocate a single block of memory to contain all the object's attributes. (It would only be able to contain references, anyway, given that Python is a pure reference-semantics language, with no C# style value types or Java style primitives.)
But obviously, it's a good idea to do things that way, so the usual convention is "add all the data at the time that the instance is constructed, and then don't add any more (or delete any) attributes".
"When it's constructed"? Python doesn't really have constructors in the C++/Java/C# sense, because this absence of "reserved space" means there's no real benefit to considering "initialization" as a separate task from ordinary assignment - except of course the benefit of initialization being something that automatically happens to a new object.
So, in Python, our closest equivalent is the magic __init__ method that is automatically called upon newly-created instances of the class. (There is another magic method called __new__, which behaves more like a constructor, in the sense that it's responsible for the actual creation of the object. However, in nearly every case we just want to delegate to the base object __new__, which calls some built-in logic to basically give us a little pointer-ball that can serve as an object, and point it to a class definition. So there's no real point in worrying about __new__ in almost every case. It's really more analogous to overloading the operator new for a class in C++.) In the body of this method (there are no C++-style initialization lists, because there is no pre-reserved data to initialize), we set initial values for attributes (and possibly do other work), based on the parameters we're given.
Now, if we want to be a little bit neater about things, or efficiency is a real concern, there is another trick up our sleeves: we can use the magic __slots__ attribute of the class to specify class attribute names. This is a list of strings, nothing fancy. However, this still doesn't pre-initialize anything; an instance doesn't have an attribute until you assign it. This just prevents you from adding attributes with other names. You can even still delete attributes from an object whose class has specified __slots__. All that happens is that the instances are given a different internal structure, to optimize memory usage and attribute lookup.
The __slots__ usage requires that we derive from the built-in object type, which we should do anyway (although we aren't required in Python 2.x, this is intended only for backwards-compatibility purposes).
Ok, so now we can make the code work. But how do we make it right for Python?
First off, just as with any other language, constantly commenting to explain already-self-explanatory things is a bad idea. It distracts the user, and doesn't really help you as a learner of the language, either. You're supposed to know what a class definition looks like, and if you need a comment to tell you that a class definition is a class definition, then reading the code comments isn't the kind of help you need.
With this whole "duck typing" thing, it's poor form to include data type names in variable (or attribute) names. You're probably protesting, "but how am I supposed to keep track of the type otherwise, without the manifest type declaration"? Don't. The code that uses your list of windows doesn't care that your list of windows is a list of windows. It just cares that it can iterate over the list of windows, and thus obtain values that can be used in certain ways that are associated with windows. That's how duck typing works: stop thinking about what the object is, and worry about what it can do.
You'll notice in the code below that I put the string conversion code into the House and Window constructors themselves. This serves as a primitive form of type-checking, and also makes sure that we can't forget to do the conversion. If someone tries to create a House with an ID that can't even be converted to a string, then it will raise an exception. Easier to ask for forgiveness than permission, after all. (Note that you actually have to go out of your way a bit in Python to create
As for the actual iteration... in Python, we iterate by actually iterating over the objects in a container. Java and C# have this concept as well, and you can get at it with the C++ standard library too (although a lot of people don't bother). We don't iterate over indices, because it's a useless and distracting indirection. We don't need to number our "windows_per_house" values in order to use them; we just need to look at each value in turn.
How about the ID numbers, I hear you ask? Simple. Python provides us with a function called 'enumerate', which gives us (index, element) pairs given an input sequence of elements). It's clean, it lets us be explicit about our need for the index to solve the problem (and the purpose of the index), and it's a built-in that doesn't need to be interpreted like the rest of the Python code, so it doesn't incur all that much overhead. (When memory is a concern, it's possible to use a lazy-evaluation version instead.)
But even then, iterating to create each house, and then manually appending each one to an initially-empty list, is too low-level. Python knows how to construct a list of values; we don't need to tell it how. (And as a bonus, we typically get better performance by letting it do that part itself, since the actual looping logic can now be done internally, in native C.) We instead describe what we want in the list, with a list comprehension. We don't have to walk through the steps of "take each window-count in turn, make the corresponding house, and add it to the list", because we can say "a list of houses with the corresponding window-count for each window-count in this input list" directly. That's arguably clunkier in English, but much cleaner in a programming language like Python, because you can skip a bunch of the little words, and you don't have to expend effort to describe the initial list, or the act of appending the finished houses to the list. You don't describe the process at all, just the result. Made-to-order.
Finally, as a general programming concept, it makes sense, whenever possible, to delay the construction of an object until we have everything ready that's needed for that object's existence. "Two-phase construction" is ugly. So we make the windows for a house first, and then the house (using those windows). With list comprehensions, this is simple: we just nest the list comprehensions.
class House(object):
__slots__ = ['ID', 'windows']
def __init__(self, id, windows):
self.ID = str(id)
self.windows = windows
class Window(object):
__slots__ = ['ID']
def __init__(self, id):
self.ID = str(id)
windows_per_house = [1, 3, 2, 1]
# Build the houses.
houses = [
House(house_id, [Window(window_id) for window_id in range(window_count)])
for house_id, window_count in enumerate(windows_per_house)
]
# See how elegant the list comprehensions are?
# If you didn't quite follow the logic there, please try **not**
# to imagine the implicitly-defined process as you trace through it.
# (Pink elephants, I know, I know.) Just understand what is described.
# And now we can iterate and print just as before.
for house in houses:
print "House: " + house.ID
for window in house.windows:
print " Window: " + window.ID
Apart from some indentation errors, you're assigning the IDs and window_lists to the class and not the instances.
You want something like
class House():
def __init__(self, ID):
self.ID = ID
self.window_list = []
etc.
Then, you can do house_list.append(House(str(newHouse))) and so on.