What is the most Pythonic way to index collection data - python

I wrote a quick script to scrape various data about mixed martial arts fights and their associated odds.
Originally, the data was a tuple with the first entry being the name of a fighter (string) and the second being their odds (float). The script later accessed this data, and I defined two constants, FIGHTER = 0 and ODDS = 1, so that I could later use fight_data[FIGHTER] or fight_data[ODDS].
Since the data is immutable, a tuple made sense, and by defining constants my reasoning was that my IDE/Editor could catch typos as opposed to using a string index for a dictionary.
FIGHTER = 0
ODDS = 1
fight_data = get_data()
def process_data(fight_data):
    do_something(fight_data[FIGHTER])
    do_something(fight_data[ODDS])
What are the other alternatives? I thought of making a FightData class, but the data is strictly a value object with two small elements.
class FightData(object):
    fighter = None
    odds = None
    def __init__(self, fighter, odds):
        self.fighter = fighter
        self.odds = odds
fight_data = get_data()
def process_data(fight_data):
    do_something(fight_data.fighter)
    do_something(fight_data.odds)
In addition, I realized I could use a dictionary, and have fight_data["fighter"] but that seems both ugly and unnecessary to me.
Which one of these alternatives is the best?

Python is a "multi-paradigm" language, so in my opinion, either the procedural approach or the object-oriented approach is valid and Pythonic. For this use-case, with such a limited amount of data, I don't think you need to worry too much.
However, if you're going down the OOP route, I would define your class to be called Fighter and give it attributes called name and odds, and then do_something with the entire Fighter instance:
class Fighter(object):
    def __init__(self, name, odds):
        self.name = name
        self.odds = odds
fighters = get_data()
# for example:
for fighter in fighters:
    do_something(fighter)

These are my thoughts... unless you have serious performance issues or efficiency metrics you're trying to achieve, I would use a dict instead of a tuple. Just because the data is immutable doesn't mean you have to use a tuple. And IMO it looks cleaner and is easier to read. Using magic numbers like:
FIGHTER = 1
ODDS = 0
as index markers makes the code harder to understand. And a class is a bit overkill. But if you use a dict your code will look something like:
fight_data = get_data()
def process_data(fight_data):
    do_something(fight_data['fighter'])
    do_something(fight_data['odds'])
I just got rid of two lines of code, and now we don't have to use any magic variables to reference data. It's much easier to see exactly what you're doing without having to worry about FIGHTER and ODDS.
Don't use variables if you really don't have to. FIGHTER and ODDS really aren't necessary, that's why we have dicts.

Simple pieces of immutable data that you want to reference by field name sound like the perfect use case for a namedtuple.
The SO question/answer in the above link gives a great explanation, but in summary: namedtuples are easily defined, memory-efficient immutable data structures that support data access via attribute reference much like Python Classes, but also fully support tuple operations as well.
from collections import namedtuple
# Defining the form of the namedtuple is much more lightweight than a class
FightData = namedtuple("FightData", "fighter odds")
# You instantiate a namedtuple much like you would a class instance
fight_data1 = FightData("Andy Hug", 0.8)
# Fields can be referenced by name
print(fight_data1.fighter)
print(fight_data1.odds)
# Or by index just like a normal tuple
print(fight_data1[0], fight_data1[1])
# They're tuples, so they can be iterated over as well
for data in fight_data1:
    print(data)
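To make the "immutable, but still a tuple" point concrete, here is a small sketch reusing the FightData definition from above. The variable names other than FightData are just for illustration.

```python
from collections import namedtuple

FightData = namedtuple("FightData", "fighter odds")
fight = FightData("Andy Hug", 0.8)

# Tuple unpacking works as with any other tuple.
fighter, odds = fight

# Fields cannot be reassigned; namedtuple instances are immutable.
try:
    fight.odds = 0.5
    reassigned = True
except AttributeError:
    reassigned = False

# _replace returns a new instance with the given fields changed,
# leaving the original untouched.
updated = fight._replace(odds=0.5)

# _asdict gives a field-name -> value mapping.
as_dict = dict(fight._asdict())
```

If you need to "change" a value, _replace is the usual route, since it hands back a new tuple rather than mutating the old one.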

Related

Best way to make 'n' objects with each with unique data contents?

Firstly, you should know that I am incredibly new to programming, so I will love any detailed explanations.
So what I am attempting to make is a program that basically creates people. This includes unique characteristics such as their name, income, job, etc. And since I planned to make a large number of 'people,' I hoped I could merely state how many people I wanted made, and I would get each of them as an object of a class. To name them I figured I could do 'person1,' 'person2,' and so on. My trouble came when I found out you can't make strings into objects. (Or rather, it is heavily frowned upon.)
After researching I was able to make each person a dictionary, with a key like 'income' and a value like '60000.' However, when it comes to manipulating the data created it seems much better to use classes and methods instead.
Thank you, and sorry if this is bad or if I am overlooking something.
Edit: I realized I could ask this better, how can I instantiate a large number of persons, or how do I make the needed variables to instantiate? I suck at explaining things...
It seems to me that you are asking two distinct questions (correct me if I'm wrong). The first - how should you store your data. The second - how can you do that repeatedly with ease.
There are a couple of ways you can store the data. I don't know your exact usecase so I can't say exactly which one would work best (you mentioned creating objects in your question so I'll use that for further examples)
Objects
class Person(object):
    def __init__(self, name, income):
        self.name = name
        self.income = income
Namedtuples
>>> from collections import namedtuple
>>> a = namedtuple("person", ['name', 'income'])
>>> a
<class '__main__.person'>
>>> ab = a("Dannnnno", 100)
>>> ab
person(name='Dannnnno', income=100)
>>> ab.name
'Dannnnno'
>>> ab.income
100
Dictionaries
someperson = {0: {"name": "Dannnno", "income": 100}}
someotherperson = {1: {"name": "kcd", "income": 100}}
As for creating large numbers of them - either create a class like GroupOfPeople or use a function.
Using the Classes example from above (I assume you could translate the other two examples appropriately)
class GroupOfPeople(object):
    def __init__(self, num_people):
        self.people = [Person("Default", 0) for i in range(num_people)]
####
def MakeLotsOfPeople(num_people):
    return [Person("Default", 0) for i in range(num_people)]
You could then edit those separate Person instances to whatever you want. You could also edit the class/function to accept another input (like a filename perhaps) that stored all of your name/income/etc data.
If you want a dictionary of the group of people just replace the list comprehensions with a dictionary comprehension, like so
{i: Person("Default", 0) for i in range(num_people)}
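As a rough sketch of the "accept another input" idea, the function below builds people from a list of (name, income) pairs instead of defaults. The make_people name and the sample rows are made up for illustration; in practice the rows might come from a file.

```python
class Person(object):
    def __init__(self, name, income):
        self.name = name
        self.income = income

def make_people(rows):
    """Build one Person per (name, income) pair in rows."""
    return [Person(name, income) for name, income in rows]

# Hypothetical sample data standing in for whatever source you use.
rows = [("Alice", 60000), ("Bob", 45000), ("Carol", 72000)]
people = make_people(rows)

# A dict keyed by name, if you need to look people up rather than loop.
people_by_name = {person.name: person for person in people}
```

This way you never need variables called person1, person2 and so on; the list index or the dict key plays that role.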
Look up Object Oriented Programming. This is the concept you are trying to wrap your head around.
http://en.wikipedia.org/wiki/Object-oriented_programming

League table in Python - it doesn't insert teams into the list

I need to make a league table for a project. There have to be 3 files: 2 files each consist of 1 class, and the last file is for running the program. I have done all of the parts, but when I call a method to add a team, the program adds the name but does not insert it into the list of teams (which it should do). When I try to display the items in the list, the program displays an error message instead of showing the actual team.
How can I fix it? Any help would be appreciated. :)
A few things here:
When I try to display the items in the list, the program displays: team.Team object at 0x000000000332A978 instead of showing the actual team.
The default display for a user class is something like <team.Team object at 0x000000000332A978>. If you want it to display something different, you have to tell Python what you want to display. There are two separate functions for this: __repr__ and __str__. The idea is that the first is a representation for the programmer, the second for the user. If you don't need two different representations, just define __repr__ and it'll use that whenever it needs __str__.
So, a really simple way to fix this is to add this to the Team class:
def __repr__(self):
    return 'Team("{}")'.format(self._name)
Now, if you call l.addTeam('Dodgers'), then print(l._table), you'll get [Team("Dodgers")] instead of [<team.Team object at 0x000000000332A978>].
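If you do want two different representations, a minimal sketch looks like this. The Team class here is a cut-down stand-in for yours, with only the _name attribute assumed.

```python
class Team(object):
    def __init__(self, name):
        self._name = name

    def __repr__(self):
        # Programmer-facing form, used by the REPL and inside containers.
        return 'Team("{}")'.format(self._name)

    def __str__(self):
        # User-facing form, used by print() and str().
        return self._name

team = Team("Dodgers")
as_str = str(team)       # what print(team) would show
as_repr = repr(team)
in_list = str([team])    # a list shows its items using __repr__
```

Note that printing a list of teams still goes through __repr__ for each element, which is why defining only __str__ would not have fixed the display of l._table.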
Meanwhile, these two methods are probably not what you want:
def removeTeam(self,team):
    self._table.remove(team)
def returnPosition(self,team):
    return self._table.index(team)
These will remove or find a team given the Team object—not the name, or even a new Team created from the name, but a reference to the exact same object stored in the _table. This is not all that useful, and you seem to want to call them with just names.
There are two ways to fix this: You could change Team so that it compares by name instead of by object identity, by adding this method to the class:
def __eq__(self, other):
    return self._name == other._name
What this means is that if you say Team('Giants') == Team('Giants'), it will now be True instead of False. Even if the first team is in a different league, and has a different W-L record, and so on (e.g., like the baseball "Giants" from San Francisco vs. the football "Giants" from New York), as far as Python is concerned, they're now the same team. Of course if that's not what you want, you can write any other __eq__ function that seems more appropriate.
Anyway, if you do this, the index and remove functions will now be able to find any Team with the same name, instead of just the exact same team, so:
def removeTeam(self,team_name):
    self._table.remove(Team(team_name))
def returnPosition(self,team_name):
    return self._table.index(Team(team_name))
If you go this way, you might want to consider defining all of the comparison methods, so you can, e.g., sort a list of teams, and they sort by name.
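As an aside on defining all of the comparison methods: one way to avoid writing every one by hand is functools.total_ordering, which fills in the rest from __eq__ and one ordering method. This is only a sketch with a stripped-down Team; the __hash__ is included because Python 3 removes the default hash once you define __eq__.

```python
from functools import total_ordering

@total_ordering
class Team(object):
    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        if not isinstance(other, Team):
            return NotImplemented
        return self._name == other._name

    def __lt__(self, other):
        if not isinstance(other, Team):
            return NotImplemented
        return self._name < other._name

    def __hash__(self):
        # Keep hashing consistent with equality-by-name.
        return hash(self._name)

table = [Team("Giants"), Team("Dodgers"), Team("Padres")]
sorted_names = [team._name for team in sorted(table)]
position = table.index(Team("Dodgers"))
```

With this in place, sorted(table) orders teams by name, and index and remove find a team by any Team with the same name.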
Or you could change these methods so they don't work based on equality, e.g., by redefining them like this:
def removeTeam(self,team_name):
    self._table = [team for team in self._table if team._name != team_name]
def returnPosition(self,team_name):
    return [team._name for team in self._table].index(team_name)
To understand how these work, if you're not used to reading list comprehensions, turn each one back into the equivalent loop:
self._table = [team for team in self._table if team._name != team_name]
temp = []
for team in self._table:
    if team._name != team_name:
        temp.append(team)
self._table = temp
If you step through this, temp ends up with a list of every team in the table, except the one you wanted to remove, and then you replace the old self._table with the new filtered one. (Another way to write the same idea is with filter, if you know that function.)
It's usually better to create a new filtered list than to modify a list in-place. Sometimes there are performance reasons not to do this, and sometimes it ends up being very complex and hard to understand, but it's usually both faster and simpler to reason about. Also, modifying lists in place leads to problems like this:
for i, value in enumerate(mylist):
    if value == value_to_remove:
        del mylist[i]
Play with this for a while, and you'll see that it doesn't actually work. Understanding why is a bit complicated, and you probably don't want to learn that until later. The usual trick to solve the problem is to iterate over a copy of the list… but once you're doing that, you've now got the worst of filtering and the worst of deleting-in-place at the same time.
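Here is a small run you can try, using a made-up list with two adjacent matches. It shows the in-place loop leaving one of them behind, next to the filtered version that removes both.

```python
mylist = [1, 2, 2, 3]
value_to_remove = 2

# Deleting while iterating: after the first deletion the remaining items
# shift left by one, so the loop steps over the second 2 without seeing it.
for i, value in enumerate(mylist):
    if value == value_to_remove:
        del mylist[i]

after_in_place = list(mylist)

# Building a new filtered list removes every match.
original = [1, 2, 2, 3]
filtered = [value for value in original if value != value_to_remove]
```

After this runs, after_in_place still contains one 2, while filtered contains none.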
The second function may be a little too clever, but let's look at it:
def returnPosition(self,team_name):
    return [team._name for team in self._table].index(team_name)
First, I'm creating a list like the original one, but it's a list of just the names instead of the team objects. Again, let's decompose the list comprehension:
temp = []
for team in self._table:
    temp.append(team._name)
Or try to translate it into English: This is a list of the team name of every team in the table.
Now, because this is a list of team names, I can use index(team_name) and it will find it. And, because the two lists have the same shape, I know that this is the right index to use in the original team list as well.
A much simpler solution would be to change _table from a list of Teams into a dict mapping names to Teams. This is probably the most Pythonic solution: it looks a lot simpler than writing list comprehensions to do simple operations. (It's also probably the most efficient, but that's hardly relevant unless you have some truly gigantic leagues.) And then you don't even need returnPosition for anything. To do that:
def __init__(self):
    self._table={}
def addTeam(self,name):
    self._table[name]=Team(name)
def removeTeam(self,team_name):
    del self._table[team_name]
def returnPosition(self,team_name):
    return team_name
def updateLeague(self,team_name1,team_name2,score1,score2):
    if score1>score2:
        self._table[team_name1].win()
        self._table[team_name2].loss()
    elif score1==score2:
        self._table[team_name1].draw()
        self._table[team_name2].draw()
    elif score1<score2:
        self._table[team_name1].loss()
        self._table[team_name2].win()
Note that I've defined returnPosition to just return the team name itself as the position. If you think about it, dict keys are used exactly the same way as list indices, so this means any code someone wrote for the "old" API that required returnPosition will still work with the "new" API. (I probably wouldn't try to sell this to a teacher who assigned a problem that required us to use returnPosition, but for a real-life library where I wanted to make it easier for my 1.3 users to migrate to 2.0, I probably would.)
This only requires a few other changes. In displayList and saveList, you iterate over self._table.values() rather than self._table; in loadList, you change self._table.append(team) to self._table[a] = team. Speaking of loadList: You might want to consider renaming those local variables from a, b, c, and d to name, wins, losses, and draws.
A few other comments:
As kreativitea says in the comments, you should not create "private" variables and then add do-nothing accessor methods in Python. It's just more boilerplate that hides the real code, and one more thing you can get wrong with a silly typo that you'll spend hours debugging one day. Just have members named name, wins, losses, etc., and access them directly. (If someone told you that this is bad style because it doesn't let you replace the implementation in the future without changing the interface, that's only true in Java and C++, not in Python. If you ever need to replace the implementation, just read up on @property.)
You don't need print("""""")—and it's very easy to accidentally miscount the number of " characters. (Especially since some IDEs will actually be confused by this and think the multi-line string never ends.) Just do print().
You've got the same ending condition both in the while loop (while x!="q":) and in an internal break. You don't need it in both places. Either change it to while True:, or get rid of the break (just make options("q") do print("Goodbye"), so you don't need to special-case it at all inside the loop).
Whenever you have a long chain of elif statements, think about whether you can turn it into a dict of short functions. I'm not sure it's a good idea in this case, but it's always worth thinking about and making the explicit decision.
The last idea would look something like this:
def addTeam():
    name=input("Enter the name of the team:")
    l.addTeam(name)
def removeTeam():
    teamToRemove=input("Enter the name of the team you want to remove:")
    l.removeTeam(teamToRemove)
def recordGame():
    team1=input("What is the name of the team?")
    ans1=int(input("Enter the number of goals for the first team:"))
    team2=input("What is the name of the team?")
    ans2=int(input("Enter the number of goals for the second team:"))
    l.updateLeague(team1,team2,ans1,ans2)
optionsdict = {
    "a": addTeam,
    "d": l.displayList,
    "s": l.saveList,
    "l": l.loadList,
    "r": removeTeam,
    "rec": recordGame,
}
def options(x):
    func = optionsdict.get(x)
    if func:
        func()
As I said, I'm not sure it's actually clearer in this case, but it's worth considering.
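Going back to the earlier comment about accessors: here is a minimal sketch of how a plain attribute can later be turned into a property without changing how callers use it. The whitespace-stripping rule is a made-up example, not something from your code.

```python
class Team(object):
    def __init__(self, name):
        self.name = name  # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Hypothetical rule added after the fact: strip stray whitespace.
        self._name = value.strip()

team = Team("  Dodgers ")
first_name = team.name

# Callers still read and assign team.name exactly as before.
team.name = " Giants"
```

Code that was written against a plain name attribute keeps working, which is why the do-nothing getters and setters aren't needed up front.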

Should I extract values from Python dictionaries into object attributes?

I have a Python class that is initialized with a dictionary of settings, like this:
def __init__(self, settings):
    self._settings = settings
Settings dictionary contains 50-100 different parameters that are used quite a lot in other methods:
def MakeTea(self):
    tea = Tea()
    if self._settings['use_sugar']:
        tea.sugar_spoons = self._settings['spoons_of_sugar']
    return tea
What I want to know is whether it makes sense to preload all the params into instance attributes like this:
def __init__(self, settings):
    self._use_sugar = settings['use_sugar']
    self._spoons_of_sugar = settings['spoons_of_sugar']
and use these attributes instead of looking up dictionary values every time I need them:
def MakeTea(self):
    tea = Tea()
    if self._use_sugar:
        tea.sugar_spoons = self._spoons_of_sugar
    return tea
Now, I am fairly new to Python and I worked mostly with compiled languages where it really is a no-brainer: access to instance fields will be much faster than looking up values from any kind of hashtable-based structure. However, with Python being interpreted and all, I'm not sure that I'll have any significant performance gain because at the moment I have almost no knowledge of how Python interpreter works. For all I know, using attribute name in code may involve using some internal dictionaries of identifiers in interpreted environment, so I gain nothing.
So, the question: are there any significant performance benefits in extracting values from dictionary and putting them in instance attributes? Are there any other benefits or downsides of doing it? What's the good practice?
I strongly believe that this is an engineering decision rather than premature optimization. Also, I'm just curious and trying to write decent Python code, so the question seems valid to me whether I actually need those milliseconds or not.
You're comparing attribute access (self.setting) with attribute access (self.settings) plus a dictionary lookup (settings['setting']). Classes are actually implemented as dictionaries, so the problem reduces to two dictionary lookups vs. one. One lookup will be faster.
A simpler and faster way to copy an initialization dict than the one in the other answer is:
class Foobar(object):
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)
However, I wouldn't do this for optimization purposes. It's both premature optimization (you don't know that you have a speed problem, or what your bottleneck is) and a micro-optimization (making an O(n^2) algorithm O(n) will make more of a difference than removing an O(1) dictionary lookup from the original algorithm).
If somewhere, you're accessing one of these settings many, many times, just create a local reference to it, rather than polluting the namespace of Foobar instances with tons of settings.
These are two reasonable designs to consider, but you shouldn't choose one or the other for performance reasons. Instead of either one, I would probably create another object:
class Settings(object):
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)
class Foobar(object):
    def __init__(self, init_dict):
        self.settings = Settings(init_dict)
just because I think self.settings.setting is nicer than self.settings['setting'] and it still keeps things organized.
This is a good use for a collections.namedtuple, if you know in advance what all the setting names are.
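A rough sketch of the namedtuple approach, using the two setting names from the question. The Tea class and make_tea function are minimal stand-ins, and a real Settings would list all of your 50-100 fields.

```python
from collections import namedtuple

# Field names taken from the question; extend the list with your own.
Settings = namedtuple("Settings", ["use_sugar", "spoons_of_sugar"])

settings_dict = {"use_sugar": True, "spoons_of_sugar": 2}
settings = Settings(**settings_dict)

class Tea(object):
    sugar_spoons = 0

def make_tea(settings):
    tea = Tea()
    if settings.use_sugar:
        tea.sugar_spoons = settings.spoons_of_sugar
    return tea

tea = make_tea(settings)

# A dict with a missing key fails at construction time, not later on.
try:
    Settings(use_sugar=True)
    missing_ok = True
except TypeError:
    missing_ok = False
```

A side benefit is that a misspelled or missing setting shows up when the Settings object is built, rather than as a KeyError somewhere deep in a method.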
If you put them into the instance attributes then you'll be looking up your instance dictionary... so in the end you're just gonna be doing the same thing. So no real performance gain or loss.
Example:
>>> class Foobar(object):
        def __init__(self, init_dict):
            for arg in init_dict:
                self.__setattr__(arg, init_dict[arg])
>>> foo = Foobar({'foobar': 'barfoo', 'shroobniz': 'foo'})
>>> print(foo.__dict__)
{'foobar': 'barfoo', 'shroobniz': 'foo'}
So whether Python looks up foo.__dict__ or foo._settings doesn't really make a difference.

Python: iterating through a list of objects within a list of objects

I've made two classes called House and Window. I then made a list containing four Houses. Each instance of House has a list of Windows. I'm trying to iterate over the windows in each house and print its ID. However, I seem to get some odd results :S I'd greatly appreciate any help.
#!/usr/bin/env python
# Minimal house class
class House:
    ID = ""
    window_list = []
# Minimal window class
class Window:
    ID = ""
# List of houses
house_list = []
# Number of windows to build into each of the four houses
windows_per_house = [1, 3, 2, 1]
# Build the houses
for new_house in range(0, len(windows_per_house)):
    # Append the new house to the house list
    house_list.append(House())
    # Give the new house an ID
    house_list[new_house].ID = str(new_house)
    # For each new house build some windows
    for new_window in range(0, windows_per_house[new_house]):
        # Append window to house's window list
        house_list[new_house].window_list.append(Window())
        # Give the window an ID
        house_list[new_house].window_list[new_window].ID = str(new_window)
# Iterate through the windows of each house, printing house and window IDs.
for house in house_list:
    print("House: " + house.ID)
    for window in house.window_list:
        print(" Window: " + window.ID)
####################
# Desired output:
#
# House: 0
# Window: 0
# House: 1
# Window: 0
# Window: 1
# Window: 2
# House: 2
# Window: 0
# Window: 1
# House: 3
# Window: 0
####################
Currently you are using class attributes instead of instance attributes. Try changing your class definitions to the following:
class House:
    def __init__(self):
        self.ID = ""
        self.window_list = []
class Window:
    def __init__(self):
        self.ID = ""
The way your code is now all instances of House are sharing the same window_list.
Here's the updated code.
# Minimal house class
class House:
    def __init__(self, id):
        self.ID = id
        self.window_list = []
# Minimal window class
class Window:
    ID = ""
# List of houses
house_list = []
# Number of windows to build into each of the four houses
windows_per_house = [1, 3, 2, 1]
# Build the houses
for new_house in range(len(windows_per_house)):
    # Append the new house to the house list
    house_list.append(House(str(new_house)))
    # For each new house build some windows
    for new_window in range(windows_per_house[new_house]):
        # Append window to house's window list
        house_list[new_house].window_list.append(Window())
        # Give the window an ID
        house_list[new_house].window_list[new_window].ID = str(new_window)
# Iterate through the windows of each house, printing house and window IDs.
for house in house_list:
    print("House: " + house.ID)
    for window in house.window_list:
        print(" Window: " + window.ID)
The actual problem is that the window_list attribute is mutable, so when the different instances are using it, they end up sharing the same one. By moving window_list into __init__ each instance gets its own.
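A stripped-down illustration of the difference, with two hypothetical classes that differ only in where window_list is created:

```python
class SharedHouse(object):
    window_list = []              # one list, stored on the class

class OwnHouse(object):
    def __init__(self):
        self.window_list = []     # a fresh list for each instance

a, b = SharedHouse(), SharedHouse()
a.window_list.append("window")
# Both instances see the same list object.
shared_is_same = a.window_list is b.window_list

c, d = OwnHouse(), OwnHouse()
c.window_list.append("window")
# d keeps its own, still empty, list.
```

Appending through a also changes what b sees, which is exactly the "every house has every window" symptom in the question.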
C++, Java, C# etc. have this really strange behaviour regarding instance variables, whereby data (members, or fields, depending on which culture you belong to) that's described within a class {} block belongs to instances, while functions (well, methods, but C++ programmers seem to hate that term and say "member functions" instead) described within the same block belong to the class itself. Strange, and confusing, when you actually think about it.
A lot of people don't think about it; they just accept it and move on. But it actually causes confusion for a lot of beginners, who assume that everything within the block belongs to the instances. This leads to bizarre (to experienced programmers) questions and concerns about the per-instance overhead of these methods, and trouble wrapping their heads around the whole "vtable" implementation concept. (Of course, it's mostly the teachers' collective fault for failing to explain that vtables are just one implementation, and for failing to make clear distinctions between classes and instances in the first place.)
Python doesn't have this confusion. Since in Python, functions (including methods) are objects, it would be bizarrely inconsistent for the compiler to make a distinction like that. So, what happens in Python is what you should intuitively expect: everything within the class indented block belongs to the class itself. And, yes, Python classes are themselves objects as well (which gives a place to put those class attributes), and you don't have to jump through standard library hoops to use them reflectively. (The absence of manifest typing is quite liberating here.)
So how, I hear you protest, do we actually add any data to the instances? Well, by default, Python doesn't restrict you from adding anything to any instance. It doesn't even require you to make different instances of the same class contain the same attributes. And it certainly doesn't pre-allocate a single block of memory to contain all the object's attributes. (It would only be able to contain references, anyway, given that Python is a pure reference-semantics language, with no C# style value types or Java style primitives.)
But obviously, it's a good idea to do things that way, so the usual convention is "add all the data at the time that the instance is constructed, and then don't add any more (or delete any) attributes".
"When it's constructed"? Python doesn't really have constructors in the C++/Java/C# sense, because this absence of "reserved space" means there's no real benefit to considering "initialization" as a separate task from ordinary assignment - except of course the benefit of initialization being something that automatically happens to a new object.
So, in Python, our closest equivalent is the magic __init__ method that is automatically called upon newly-created instances of the class. (There is another magic method called __new__, which behaves more like a constructor, in the sense that it's responsible for the actual creation of the object. However, in nearly every case we just want to delegate to the base object __new__, which calls some built-in logic to basically give us a little pointer-ball that can serve as an object, and point it to a class definition. So there's no real point in worrying about __new__ in almost every case. It's really more analogous to overloading the operator new for a class in C++.) In the body of this method (there are no C++-style initialization lists, because there is no pre-reserved data to initialize), we set initial values for attributes (and possibly do other work), based on the parameters we're given.
Now, if we want to be a little bit neater about things, or efficiency is a real concern, there is another trick up our sleeves: we can use the magic __slots__ attribute of the class to specify the names of the attributes its instances may have. This is a list of strings, nothing fancy. However, this still doesn't pre-initialize anything; an instance doesn't have an attribute until you assign it. This just prevents you from adding attributes with other names. You can even still delete attributes from an object whose class has specified __slots__. All that happens is that the instances are given a different internal structure, to optimize memory usage and attribute lookup.
The __slots__ usage requires that we derive from the built-in object type, which we should do anyway (although we aren't required in Python 2.x, this is intended only for backwards-compatibility purposes).
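A short sketch of that behaviour, using a Window class with a single slot. The colour attribute is just an arbitrary name that is not in __slots__.

```python
class Window(object):
    __slots__ = ['ID']

    def __init__(self, id):
        self.ID = str(id)

window = Window(3)

# Assigning a name that is not listed in __slots__ fails.
try:
    window.colour = "red"
    extra_allowed = True
except AttributeError:
    extra_allowed = False

# Deleting a slot attribute is still allowed; afterwards it is simply unset.
del window.ID
has_id_after_delete = hasattr(window, "ID")
```

So __slots__ restricts which names can exist on an instance, but it does not guarantee that any of them are currently set.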
Ok, so now we can make the code work. But how do we make it right for Python?
First off, just as with any other language, constantly commenting to explain already-self-explanatory things is a bad idea. It distracts the user, and doesn't really help you as a learner of the language, either. You're supposed to know what a class definition looks like, and if you need a comment to tell you that a class definition is a class definition, then reading the code comments isn't the kind of help you need.
With this whole "duck typing" thing, it's poor form to include data type names in variable (or attribute) names. You're probably protesting, "but how am I supposed to keep track of the type otherwise, without the manifest type declaration"? Don't. The code that uses your list of windows doesn't care that your list of windows is a list of windows. It just cares that it can iterate over the list of windows, and thus obtain values that can be used in certain ways that are associated with windows. That's how duck typing works: stop thinking about what the object is, and worry about what it can do.
You'll notice in the code below that I put the string conversion code into the House and Window constructors themselves. This serves as a primitive form of type-checking, and also makes sure that we can't forget to do the conversion. If someone tries to create a House with an ID that can't even be converted to a string, then it will raise an exception. Easier to ask for forgiveness than permission, after all. (Note that you actually have to go out of your way a bit in Python to create
As for the actual iteration... in Python, we iterate by actually iterating over the objects in a container. Java and C# have this concept as well, and you can get at it with the C++ standard library too (although a lot of people don't bother). We don't iterate over indices, because it's a useless and distracting indirection. We don't need to number our "windows_per_house" values in order to use them; we just need to look at each value in turn.
How about the ID numbers, I hear you ask? Simple. Python provides us with a function called 'enumerate', which gives us (index, element) pairs given an input sequence of elements. It's clean, it lets us be explicit about our need for the index to solve the problem (and the purpose of the index), and it's a built-in that doesn't need to be interpreted like the rest of the Python code, so it doesn't incur all that much overhead. (When memory is a concern, it's possible to use a lazy-evaluation version instead.)
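For instance, with the windows_per_house list from the question, a quick sketch of what enumerate hands back (the message text is made up for illustration):

```python
windows_per_house = [1, 3, 2, 1]

# Each item is an (index, element) pair.
pairs = list(enumerate(windows_per_house))

lines = []
for house_id, window_count in enumerate(windows_per_house):
    lines.append("House {} has {} window(s)".format(house_id, window_count))
```

The loop never touches windows_per_house[i]; the index and the value arrive together.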
But even then, iterating to create each house, and then manually appending each one to an initially-empty list, is too low-level. Python knows how to construct a list of values; we don't need to tell it how. (And as a bonus, we typically get better performance by letting it do that part itself, since the actual looping logic can now be done internally, in native C.) We instead describe what we want in the list, with a list comprehension. We don't have to walk through the steps of "take each window-count in turn, make the corresponding house, and add it to the list", because we can say "a list of houses with the corresponding window-count for each window-count in this input list" directly. That's arguably clunkier in English, but much cleaner in a programming language like Python, because you can skip a bunch of the little words, and you don't have to expend effort to describe the initial list, or the act of appending the finished houses to the list. You don't describe the process at all, just the result. Made-to-order.
Finally, as a general programming concept, it makes sense, whenever possible, to delay the construction of an object until we have everything ready that's needed for that object's existence. "Two-phase construction" is ugly. So we make the windows for a house first, and then the house (using those windows). With list comprehensions, this is simple: we just nest the list comprehensions.
class House(object):
    __slots__ = ['ID', 'windows']
    def __init__(self, id, windows):
        self.ID = str(id)
        self.windows = windows
class Window(object):
    __slots__ = ['ID']
    def __init__(self, id):
        self.ID = str(id)
windows_per_house = [1, 3, 2, 1]
# Build the houses.
houses = [
    House(house_id, [Window(window_id) for window_id in range(window_count)])
    for house_id, window_count in enumerate(windows_per_house)
]
# See how elegant the list comprehensions are?
# If you didn't quite follow the logic there, please try **not**
# to imagine the implicitly-defined process as you trace through it.
# (Pink elephants, I know, I know.) Just understand what is described.
# And now we can iterate and print just as before.
for house in houses:
    print("House: " + house.ID)
    for window in house.windows:
        print(" Window: " + window.ID)
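To make the contrast concrete, here is a minimal sketch (Python 3 syntax) of the explicit append loop that the nested comprehension stands in for. The simplified House and Window classes and the helper names build_with_loops, build_with_comprehension and summarize are made up for illustration; they are not part of the code above.

```python
class Window:
    def __init__(self, id):
        self.ID = str(id)


class House:
    def __init__(self, id, windows):
        self.ID = str(id)
        self.windows = windows


def build_with_loops(windows_per_house):
    # The "process" version: start empty, then append step by step.
    houses = []
    for house_id, window_count in enumerate(windows_per_house):
        windows = []
        for window_id in range(window_count):
            windows.append(Window(window_id))
        houses.append(House(house_id, windows))
    return houses


def build_with_comprehension(windows_per_house):
    # The "result" version: describe the list and let Python build it.
    return [
        House(house_id, [Window(window_id) for window_id in range(window_count)])
        for house_id, window_count in enumerate(windows_per_house)
    ]


def summarize(houses):
    # Reduce the objects to plain data so two results can be compared.
    return [(house.ID, [window.ID for window in house.windows]) for house in houses]
```

Both builders describe the same houses; only the second leaves the bookkeeping (the empty list, the appends) to Python.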
Apart from some indentation errors, you're assigning the IDs and window_lists to the class and not the instances.
You want something like
class House():
    def __init__(self, ID):
        self.ID = ID
        self.window_list = []
etc.
Then, you can do house_list.append(House(str(newHouse))) and so on.
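The difference matters most for mutable values such as lists. Here is a minimal sketch (Python 3 syntax; SharedHouse and the window name "w0" are made up for illustration) of what goes wrong when the list lives on the class, next to the per-instance version:

```python
class SharedHouse:
    # Class attribute: one list object shared by every instance.
    window_list = []

    def __init__(self, ID):
        self.ID = ID


class House:
    def __init__(self, ID):
        self.ID = ID
        # Instance attribute: a fresh list for each house.
        self.window_list = []


a, b = SharedHouse("0"), SharedHouse("1")
a.window_list.append("w0")
shared_result = b.window_list      # b sees a's window too

c, d = House("0"), House("1")
c.window_list.append("w0")
separate_result = d.window_list    # d's list is untouched
```

With the class-level list, every house ends up with every window; with the list created in __init__, each house keeps its own.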

How to create a class from function

I am still struggling to understand classes. I am not certain, but I suspect this function I have created is a good candidate for a class. The function takes a list of dictionaries, identifies the keys and writes out a csv file.
First Q: is this function a good candidate for a class? (I write out a lot of csv files.)
Second Q: if the answer to 1 is yes, how do I do it?
Third Q: how do I use the instances of the class? (Did I say that right?)
import csv
def writeCSV(dictList,outfile):
    maxLine=dictList[0]
    for item in dictList:
        if len(item)>len(maxLine):
            maxLine=item
    dictList.insert(0,dict( (key,key) for key in maxLine.keys()))
    csv_file=open(outfile,'ab')
    writer = csv.DictWriter(csv_file,fieldnames=[key for key in maxLine.keys()],restval='notScanned',dialect='excel')
    for dataLine in dictList:
        writer.writerow(dataLine)
    csv_file.close()
    return
The main idea behind objects is that an object is data plus methods.
Whenever you are thinking about making something an object, you must ask yourself what will be the object's data, and what operations (methods) will you want to perform on that data.
Functions translate more readily to methods than to classes.
So, for instance, if your dictList is data upon which you often call writeCSV,
then perhaps make a dictList object with method writeCSV:
class DictList(object):
    def __init__(self,data):
        self.data=data
    def writeCSV(self,outfile):
        maxLine=self.data[0]
        for item in self.data:
            if len(item)>len(maxLine):
                maxLine=item
        self.data.insert(0,dict( (key,key) for key in maxLine.keys()))
        csv_file=open(outfile,'ab')
        writer = csv.DictWriter(
            csv_file,fieldnames=[key for key in maxLine.keys()],
            restval='notScanned',dialect='excel')
        for dataLine in self.data:
            writer.writerow(dataLine)
        csv_file.close()
Then you could instantiate a DictList object:
dl=DictList([{},{},...])
dl.writeCSV(outfile)
Doing this might make sense if you have more methods that could operate on the same DictList.data. Otherwise, you'd probably be better off sticking with the original function.
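For readers on Python 3, here is a hedged sketch of the same DictList idea. It assumes, as the original does, that the longest dict holds every column name. Two things differ and are my own choices rather than anything from the question: the file is opened in text mode with newline='' (which Python 3's csv module expects), and the header is written with writeheader() instead of being inserted into self.data. In the version above, each call to writeCSV inserts another header row into the data itself.

```python
import csv


class DictList:
    def __init__(self, data):
        self.data = data

    def writeCSV(self, outfile):
        # Use the keys of the longest dict as the columns, as in the original.
        fieldnames = list(max(self.data, key=len))
        # Python 3's csv module wants text mode with newline=''.
        with open(outfile, 'a', newline='') as csv_file:
            writer = csv.DictWriter(
                csv_file, fieldnames=fieldnames,
                restval='notScanned', dialect='excel')
            writer.writeheader()         # header goes to the file, not into self.data
            writer.writerows(self.data)
```

Because the file is still opened in append mode, calling writeCSV twice with the same path writes the header and rows to the file twice; self.data itself is left unchanged.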
For this you first need to understand a little about the concepts of classes, and then follow the next step.
I faced the same problem and followed this LINK; I'm sure you will also be able to move from structured programming to working with classes.
If you want to write a lot of CSV files with the same dictList (is that what you're saying...?), turning the function into a class would let you perform initialization just once, and then write repeatedly from the same initialized instance. E.g., with other minor opts:
class CsvWriter(object):
    def __init__(self, dictList):
        # the longest dict supplies the column names
        self.maxLine = max(dictList, key=len)
        self.dictList = [dict((k,k) for k in self.maxLine)]
        self.dictList.extend(dictList)
    def doWrite(self, outfile):
        csv_file=open(outfile,'ab')
        writer = csv.DictWriter(csv_file,
                                fieldnames=self.maxLine.keys(),
                                restval='notScanned',
                                dialect='excel')
        for dataLine in self.dictList:
            writer.writerow(dataLine)
        csv_file.close()
This seems a dubious use case, but if it does match your desire, then you'd instantiate and use this class as follows...:
cw = CsvWriter(dataList)
for ou in many_outfiles:
    cw.doWrite(ou)
When thinking about making objects, remember this:
Classes have attributes - things that describe different instances of the class differently
Classes have methods - things that the objects do (often involving using their attributes)
Objects and classes are wonderful, but the first thing to keep in mind is that they are not always necessary, or even desirable.
That said, in answer to your first question, this doesn't seem like a particularly good candidate for a class. The only things that differ between the CSV files you're writing are the data and the file you write to, and the only thing you do with them (ie, the only method you would have) is the function you've already written.
Even though the first answer is no, it's still instructive to see how a class is built.
class CSVWriter:
    # this function is called when you create an instance of the class
    # it sets up the initial attributes of the instance
    def __init__(self, dictList, outFile):
        self.dictList = dictList
        self.outFile = outFile
    def writeCSV(self):
        # basically exactly what you have above, except you can use the instance's
        # own variables (ie, self.dictList and self.outFile) instead of the local
        # variables
        pass
For your final question - the first step to using an instance of a class (an individual object, if you will) is to create that instance:
myCSV = CSVWriter(dictList, outFile)
When the object is created, __init__ is called with the arguments you gave it - that allows your object to have its own data. Now you can access any of the attributes or methods that your myCSV object has with the '.' operator:
myCSV.writeCSV()
print "Wrote a file to", myCSV.outFile
One way to think about objects versus functions is that objects are generally nouns (eg, I created a CSVWriter), while functions are verbs (eg, you wrote a function that writes CSV files). If you're just doing something over and over again, without re-using any of the same data, a function by itself is fine. But, if you have lots of related data, and part of it gets changed in the course of the action, classes may be a good idea.
I don't think your writeCSV is in need of a class. Typically a class would be used when you have to update some state (data) and then act on it, maybe with various options,
e.g. if you need to pass your object around so that other functions/methods can add values to it, or your final action/output function has many options, or you think the same data can be processed or acted upon in many ways.
A typical practical case: if you have multiple functions which act on the same data, or a single function whose optional parameter list is getting too long, you may think of converting it into a class.
If in your case you had various options and needed to insert data in increments, you should make it a class.
Usually class name would be noun, so function(verb) writeCSV -> class(noun) CSVWriter
class CSVWriter(object):
    def __init__(self, init-params...):
        self.data = {}
    def addData(self, data):
        self.data.update(data)
    def dumpCSV(self, filePath):
        ...
    def dumpJSON(self, filePath):
        ...
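As a rough illustration of that outline, here is one way the elided methods could be filled in (Python 3 syntax). The original leaves the bodies and the constructor parameters open, so everything inside them is an assumption: I take self.data to be a flat dict, write one "key,value" row per entry for CSV, and drop the constructor parameters entirely.

```python
import csv
import json


class CSVWriter:
    def __init__(self):
        # Assumption: no constructor parameters; start with no data.
        self.data = {}

    def addData(self, data):
        # Merge new key/value pairs into what has been collected so far.
        self.data.update(data)

    def dumpCSV(self, filePath):
        # Assumption: one "key,value" row per entry of the dict.
        with open(filePath, 'w', newline='') as f:
            csv.writer(f).writerows(self.data.items())

    def dumpJSON(self, filePath):
        # The same collected state, written in a second format.
        with open(filePath, 'w') as f:
            json.dump(self.data, f)
```

The point is the shape rather than the details: state is built up in increments through addData, and several output methods then act on that same state.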
I think question 1 is pretty crucial as it goes to the heart of what a class is.
Yes, you can put this function in a class. A class is a set of functions (called methods) and data together in one logical unit. As other posters noted, though, it is probably overkill to have a class with one method.
