While writing state machines to analyze different types of text data, independent of the language used (VBA to process .xls contents using arrays/dictionaries, or PHP/Python to make SQL insert queries out of .csv files), I often ran into the necessity of something like
boolean = False
while %sample statement%:
    x = 'many different things'
    if boolean == False:
        boolean = True
    else:
        %action that DOES depend on contents of x
        that need to do every BUT first time I get to it%
Every time I have to use a construction like this, I can't help feeling like a noob. Dear algorithmic gurus, can you assure me that it's the only way out and there is nothing more elegant? Is there any way to specify that some statement should be "burnt after reading", so that some stupid boolean is not checked on each iteration of the loop?
The only things that come across as slightly "noob" about this style are:
Comparing a boolean variable to True or False. Just write if <var> or if not <var>. (I'll ignore the = vs == as a typo!)
Not giving the boolean variable a good name. I know that here boolean is just a placeholder name, but in general using a name like first_item_seen rather than something generic can make the code a lot more readable:
first_item_seen = False
while [...]:
    [...]
    if first_item_seen:
        [...]
    else:
        first_item_seen = True
Another suggestion that can work in some circumstances is to base the decision on another variable that naturally conveys the same state. For instance, it's relatively common to have a variable that contains None for the first iteration, but contains a value for later iterations (e.g. the result so far); using this can make the code slightly more efficient and often slightly clearer.
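A minimal sketch of that last suggestion, assuming a hypothetical task of joining lines with a separator. The name join_lines and the sample data are illustrative only; the point is that result stays None until the first iteration has run, so it doubles as the "first item seen" flag.

```python
def join_lines(lines, separator=", "):
    """Join lines, adding the separator every time BUT the first."""
    result = None  # None means "no item seen yet"
    for line in lines:
        if result is None:
            result = line                # first iteration: nothing to prepend
        else:
            result += separator + line   # every later iteration
    return result if result is not None else ""


print(join_lines(["a", "b", "c"]))  # a, b, c
```

Whether this reads better than an explicit flag depends on whether such a variable already exists in the loop; inventing one just to avoid the flag gains nothing.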
If I understand your problem correctly, I'd try something like
x = 'many different things'
while %sample statements%:
    x = 'many different things'
    action_that_depends_on_x()
It is almost equivalent; the only difference is that in your version the loop body could be never executed (hence x never being computed, hence no side effects of computing x), in my version it is always computed at least once.
Suppose I have a function like the following:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
         if k[:pos_qu]==selection[:pos_qu]
         and (k[pos_qu+1:]==selection[pos_qu+1:] if pos_qu!=1)
         and k[pos_qu] not in alphabet.values()]
I want to make the second condition, namely k[pos_qu+1:]==selection[pos_qu+1:], dependent on another if statement, if pos_qu!=1. I tried (as shown above) wrapping the two together in parentheses, but Python flags a syntax error at the parentheses.
If I understand your requirement correctly, you only want to check k[pos_qu+1:]==selection[pos_qu+1:] if the condition pos_qu!=1 is also met. You can rephrase that as the following condition:
pos_qu==1 or k[pos_qu+1:]==selection[pos_qu+1:]
Putting this into your comprehension:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
         if k[:pos_qu]==selection[:pos_qu]
         and (pos_qu==1 or k[pos_qu+1:]==selection[pos_qu+1:])
         and k[pos_qu] not in alphabet.values()]
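The rewrite works because "only check B if A holds" is the same as "not A or B". A small runnable sketch, with made-up data: the contents of dict_bigrams, selection and alphabet are assumptions for illustration, and select_bigrams is a hypothetical wrapper that takes them as parameters.

```python
def select_bigrams(dict_bigrams, selection, pos_qu, alphabet):
    """Filter keys; the suffix comparison is skipped when pos_qu == 1."""
    return [(k, v) for (k, v) in dict_bigrams.items()
            if k[:pos_qu] == selection[:pos_qu]
            and (pos_qu == 1 or k[pos_qu + 1:] == selection[pos_qu + 1:])
            and k[pos_qu] not in alphabet.values()]


# Hypothetical sample data, not taken from the question.
dict_bigrams = {"abc": 3, "abd": 1, "xbc": 2, "a?c": 5}
alphabet = {"wildcard": "?"}

# pos_qu == 1: the suffix check is skipped, so "abd" passes too.
print(select_bigrams(dict_bigrams, "a?c", 1, alphabet))

# pos_qu != 1: the suffix check applies, so only keys ending in "bc" pass.
print(select_bigrams(dict_bigrams, "?bc", 0, alphabet))
```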
Whenever you find yourself with a complex list comprehension, trying to figure out how to do something complicated and not knowing how, the answer is usually to break things up. Expression syntax is inherently more limited than full statement (or multi-statement suite) syntax in Python, to prevent you from writing things that you won't be able to read later. Usually, that's a good thing—and, even when it isn't, you're better off going along with it than trying to fight it.
In this case, you've got a trivial comprehension, except for the if clause, which you don't know how to write as an expression. So, I'd turn the condition into a separate function:
def isMyKindOfKey(k):
    … condition here

[(k,v) for (k,v) in dict_bigrams.items() if isMyKindOfKey(k)]
This lets you use full multi-statement syntax for the condition. It also lets you give the condition a name (hopefully something better than isMyKindOfKey); makes the parameters, local values captured by the closure, etc. more explicit; lets you test the function separately or reuse it; etc.
In cases where the loop itself is the non-trivial part (or there's just lots of nesting), it usually makes more sense to break up the entire comprehension into an explicit for loop and append, but I don't think that's necessary here.
It's worth noting that in this case—as in general—this doesn't magically solve your problem, it just gives you more flexibility in doing so. For example, you can use the same transformation from postfix if to infix or that F.J suggests, but you can also leave it as an if, e.g., like this:
def isMyKindOfKey(k):
    retval = k[:pos_qu]==selection[:pos_qu]
    if pos_qu!=1:
        retval = retval and (k[pos_qu+1:]==selection[pos_qu+1:])
    retval = retval and (k[pos_qu] not in alphabet.values())
    return retval
That probably isn't actually the way I'd write this, but you can see how this is a trivial way to transform what's in your head into code, which would be very hard to do in an expression.
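Just change the order:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
         if k[:pos_qu]==selection[:pos_qu]  # evaluated first
         and pos_qu!=1  # if true, continue and evaluate this next
         and (k[pos_qu+1:]==selection[pos_qu+1:])]  # evaluated last, only if pos_qu != 1
Note that and short-circuits, so when pos_qu == 1 the whole condition is False and every key is rejected; use the pos_qu==1 or ... form if the check should merely be skipped in that case. As the comment mentions, this is not a very Pythonic list comprehension and would be much more readable as a standard for loop.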
I need to make a league table for a project. There have to be 3 files: 2 files each consist of 1 class, and the last file is for running the program. I have done all of the parts, but when I call a method to add a team, the program adds the name but does not insert it into the list of teams (which it should do). When I try to display the items in the list, the program displays an error message instead of showing the actual team.
How can I fix it? Any help would be appreciated. :)
A few things here:
When I try to display the items in the list, the program displays: team.Team object at 0x000000000332A978 instead of showing the actual team.
The default display for a user class is something like <team.Team object at 0x000000000332A978>. If you want it to display something different, you have to tell Python what you want to display. There are two separate functions for this: __repr__ and __str__. The idea is that the first is a representation for the programmer, the second for the user. If you don't need two different representations, just define __repr__ and it'll use that whenever it needs __str__.
So, a really simple way to fix this is to add this to the Team class:
def __repr__(self):
    return 'Team("{}")'.format(self._name)
Now, if you call l.addTeam('Dodgers'), then print(l._table), you'll get [Team("Dodgers")] instead of [<team.Team object at 0x000000000332A978>].
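A self-contained sketch of the effect, using a stripped-down stand-in for the Team class (the real class in the question has more fields; only _name is assumed here):

```python
class Team:
    """Minimal stand-in for the question's Team class."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        # Used by print() on a list, by the interactive prompt,
        # and by str() because no __str__ is defined.
        return 'Team("{}")'.format(self._name)


table = [Team("Dodgers"), Team("Giants")]
print(table)          # [Team("Dodgers"), Team("Giants")]
print(str(table[0]))  # Team("Dodgers")
```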
Meanwhile, these two methods are probably not what you want:
def removeTeam(self,team):
    self._table.remove(team)

def returnPosition(self,team):
    return self._table.index(team)
These will remove or find a team given the Team object—not the name, or even a new Team created from the name, but a reference to the exact same object stored in the _table. This is not all that useful, and you seem to want to call them with just names.
There are two ways to fix this: You could change Team so that it compares by name instead of by object identity, by adding this method to the class:
def __eq__(self, other):
    return self._name == other._name
What this means is that if you say Team('Giants') == Team('Giants'), it will now be True instead of False. Even if the first team is in a different league, and has a different W-L record, and so on (e.g., like the baseball "Giants" from San Francisco vs. the football "Giants" from New York), as far as Python is concerned, they're now the same team. Of course, if that's not what you want, you can write any other __eq__ function that seems more appropriate.
Anyway, if you do this, the index and remove functions will now be able to find any Team with the same name, instead of just the exact same team, so:
def removeTeam(self,team_name):
    self._table.remove(Team(team_name))

def returnPosition(self,team_name):
    return self._table.index(Team(team_name))
If you go this way, you might want to consider defining all of the comparison methods, so you can, e.g., sort a list of teams, and they sort by name.
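One way to get all of the comparison methods without writing each by hand is functools.total_ordering, which fills in the rest from __eq__ and one ordering method. The sketch below uses a minimal stand-in Team with only a _name field, which is an assumption. It also defines __hash__, because in Python 3 a class that defines __eq__ without __hash__ becomes unhashable.

```python
import functools


@functools.total_ordering
class Team:
    """Minimal stand-in Team that compares and sorts by name."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return 'Team("{}")'.format(self._name)

    def __eq__(self, other):
        if not isinstance(other, Team):
            return NotImplemented
        return self._name == other._name

    def __lt__(self, other):
        if not isinstance(other, Team):
            return NotImplemented
        return self._name < other._name

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash(self._name)


table = [Team("Giants"), Team("Dodgers"), Team("Mets")]
position = table.index(Team("Dodgers"))  # found by name, not identity
table.remove(Team("Giants"))             # removes the first equal team
ordered = sorted(table)                  # sorts by name via __lt__
print(position, table, ordered)
```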
Or you could change these methods so they don't work based on equality, e.g., by redefining them like this:
def removeTeam(self,team_name):
    self._table = [team for team in self._table if team._name != team_name]

def returnPosition(self,team_name):
    return [team._name for team in self._table].index(team_name)
To understand how these work, if you're not used to reading list comprehensions, turn each one back into the equivalent loop:
self._table = [team for team in self._table if team._name != team_name]
temp = []
for team in self._table:
    if team._name != team_name:
        temp.append(team)
self._table = temp
If you step through this, temp ends up with a list of every team in the table, except the one you wanted to remove, and then you replace the old self._table with the new filtered one. (Another way to write the same idea is with filter, if you know that function.)
It's usually better to create a new filtered list than to modify a list in-place. Sometimes there are performance reasons not to do this, and sometimes it ends up being very complex and hard to understand, but it's usually both faster and simpler to reason about. Also, modifying lists in place leads to problems like this:
for i, value in enumerate(mylist):
    if value == value_to_remove:
        del mylist[i]
Play with this for a while, and you'll see that it doesn't actually work. Understanding why is a bit complicated, and you probably don't want to learn that until later. The usual trick to solve the problem is to iterate over a copy of the list… but once you're doing that, you've now got the worst of filtering and the worst of deleting-in-place at the same time.
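A small experiment that shows the failure, with sample values chosen for illustration. Deleting an item shifts every later item one slot to the left while the loop's index still moves forward, so the item right after a deletion is never examined.

```python
def remove_in_place(items, value_to_remove):
    """The problematic pattern: delete while iterating over the same list."""
    for i, value in enumerate(items):
        if value == value_to_remove:
            del items[i]
    return items


def remove_by_filter(items, value_to_remove):
    """Build a new list instead of mutating the one being iterated."""
    return [value for value in items if value != value_to_remove]


buggy = remove_in_place([1, 2, 2, 3], 2)
clean = remove_by_filter([1, 2, 2, 3], 2)
print(buggy)  # [1, 2, 3]  (the second 2 was skipped)
print(clean)  # [1, 3]
```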
The second function may be a little too clever, but let's look at it:
def returnPosition(self,team_name):
    return [team._name for team in self._table].index(team_name)
First, I'm creating a list like the original one, but it's a list of just the names instead of the team objects. Again, let's decompose the list comprehension:
temp = []
for team in self._table:
    temp.append(team._name)
Or try to translate it into English: This is a list of the team name of every team in the table.
Now, because this is a list of team names, I can use index(team_name) and it will find it. And, because the two lists have the same shape, I know that this is the right index to use in the original team list as well.
A much simpler solution would be to change _table from a list of Teams into a dict mapping names to Teams. This is probably the most Pythonic solution: it looks a lot simpler than writing list comprehensions to do simple operations. (It's also probably the most efficient, but that's hardly relevant unless you have some truly gigantic leagues.) And then you don't even need returnPosition for anything. To do that:
def __init__(self):
    self._table={}

def addTeam(self,name):
    self._table[name]=Team(name)

def removeTeam(self,team_name):
    del self._table[team_name]

def returnPosition(self,team_name):
    return team_name

def updateLeague(self,team_name1,team_name2,score1,score2):
    if score1>score2:
        self._table[team_name1].win()
        self._table[team_name2].loss()
    elif score1==score2:
        self._table[team_name1].draw()
        self._table[team_name2].draw()
    elif score1<score2:
        self._table[team_name1].loss()
        self._table[team_name2].win()
Note that I've defined returnPosition to just return the team name itself as the position. If you think about it, dict keys are used exactly the same way as list indices, so this means any code someone wrote for the "old" API that required returnPosition will still work with the "new" API. (I probably wouldn't try to sell this to a teacher who assigned a problem that required us to use returnPosition, but for a real-life library where I wanted to make it easier for my 1.3 users to migrate to 2.0, I probably would.)
This only requires a few other changes. In displayList and saveList, you iterate over self._table.values() rather than self._table; in loadList, you change self._table.append(team) to self._table[a] = team. Speaking of loadList: You might want to consider renaming those local variables from a, b, c, and d to name, wins, losses, and draws.
A few other comments:
As kreativitea says in the comments, you should not create "private" variables and then add do-nothing accessor methods in Python. It's just more boilerplate that hides the real code, and one more thing you can get wrong with a silly typo that you'll spend hours debugging one day. Just have members named name, wins, losses, etc., and access them directly. (If someone told you that this is bad style because it doesn't let you replace the implementation in the future without changing the interface, that's only true in Java and C++, not in Python. If you ever need to replace the implementation, just read up on @property.)
You don't need print("""""")—and it's very easy to accidentally miscount the number of " characters. (Especially since some IDEs will actually be confused by this and think the multi-line string never ends.) Just do print().
You've got the same ending condition both in the while loop (while x!="q":) and in an internal break. You don't need it in both places. Either change it to while True:, or get rid of the break (just make options("q") do print("Goodbye"), so you don't need to special-case it at all inside the loop).
Whenever you have a long chain of elif statements, think about whether you can turn it into a dict of short functions. I'm not sure it's a good idea in this case, but it's always worth thinking about and making the explicit decision.
The last idea would look something like this:
def addTeam():
    name=input("Enter the name of the team:")
    l.addTeam(name)

def removeTeam():
    teamToRemove=input("Enter the name of the team you want to remove:")
    l.removeTeam(teamToRemove)

def recordGame():
    team1=input("What is the name of the team?")
    ans1=int(input("Enter the number of goals for the first team:"))
    team2=input("What is the name of the team?")
    ans2=int(input("Enter the number of goals for the second team:"))
    l.updateLeague(team1,team2,ans1,ans2)

optionsdict = {
    "a": addTeam,
    "d": l.displayList,
    "s": l.saveList,
    "l": l.loadList,
    "r": removeTeam,
    "rec": recordGame,
}

def options(x):
    func = optionsdict.get(x)
    if func:
        func()
As I said, I'm not sure it's actually clearer in this case, but it's worth considering.
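On the earlier point about plain attributes versus accessor methods: a minimal sketch of how the built-in property decorator lets a plain attribute grow behaviour later without changing the code that uses it. The validation rule (non-empty, stripped names) is a made-up example, not something the question requires.

```python
class Team:
    """Callers keep writing team.name; the storage detail is hidden."""

    def __init__(self, name):
        self.name = name  # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Hypothetical rule added "later": reject blank names.
        if not value.strip():
            raise ValueError("team name must not be empty")
        self._name = value.strip()


team = Team("  Dodgers ")
print(team.name)      # Dodgers
team.name = "Giants"  # same syntax as a plain attribute
print(team.name)      # Giants
```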
I prefer to use long identifiers to keep my code semantically clear, but in the case of repeated references to the same identifier, I'd like for it to "get out of the way" in the current scope. Take this example in Python:
def define_many_mappings_1(self):
    self.define_bidirectional_parameter_mapping("status", "current_status")
    self.define_bidirectional_parameter_mapping("id", "unique_id")
    self.define_bidirectional_parameter_mapping("location", "coordinates")
    #etc...
Let's assume that I really want to stick with this long method name, and that these arguments are always going to be hard-coded.
Implementation 1 feels wrong because most of each line is taken up with a repetition of characters. The lines are also rather long in general, and will exceed 80 characters easily when nested inside of a class definition and/or a try/except block, resulting in ugly line wrapping. Let's try using a for loop:
def define_many_mappings_2(self):
    mappings = [("status", "current_status"),
                ("id", "unique_id"),
                ("location", "coordinates")]
    for mapping in mappings:
        self.define_parameter_mapping(*mapping)
I'm going to lump together all similar iterative techniques under the umbrella of Implementation 2, which has the improvement of separating the "unique" arguments from the "repeated" method name. However, I dislike that this has the effect of placing the arguments before the method they're being passed into, which is confusing. I would prefer to retain the "verb followed by direct object" syntax.
I've found myself using the following as a compromise:
def define_many_mappings_3(self):
    d = self.define_bidirectional_parameter_mapping
    d("status", "current_status")
    d("id", "unique_id")
    d("location", "coordinates")
In Implementation 3, the long method is aliased by an extremely short "abbreviation" variable. I like this approach because it is immediately recognizable as a set of repeated method calls at first glance, while having fewer redundant characters and much shorter lines. The drawback is the use of an extremely short and semantically unclear identifier "d".
What is the most readable solution? Is the usage of an "abbreviation variable" acceptable if it is explicitly assigned from an unabbreviated version in the local scope?
itertools to the rescue again! Try using starmap - here's a simple demo:
list(itertools.starmap(min,[(1,2),(2,2),(3,2)]))
gives
[1, 2, 2]
starmap returns a lazy iterator, so to actually invoke the methods you have to consume it, for example with list.
import itertools

def define_many_mappings_4(self):
    list(itertools.starmap(
        self.define_parameter_mapping,
        [
            ("status", "current_status"),
            ("id", "unique_id"),
            ("location", "coordinates"),
        ]))
Normally I'm not a fan of using a dummy list construction to invoke a sequence of functions, but this arrangement seems to address most of your concerns.
If define_parameter_mapping returns None, then you can replace list with any, and then all of the function calls will get made, and you won't have to construct that dummy list.
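A runnable sketch of the any variant. Here define_parameter_mapping is a hypothetical module-level stand-in that records its calls and returns None, so the effect can be observed.

```python
import itertools

calls = []  # records every call so we can check they all happened


def define_parameter_mapping(source, target):
    """Hypothetical stand-in; returns None like a typical 'define' method."""
    calls.append((source, target))


mappings = [
    ("status", "current_status"),
    ("id", "unique_id"),
    ("location", "coordinates"),
]

# any() keeps pulling items for as long as it sees falsy values, so with
# None results it exhausts the iterator without building a list.
found_truthy = any(itertools.starmap(define_parameter_mapping, mappings))

print(found_truthy)  # False
print(len(calls))    # 3
```

The catch is that any stops at the first truthy result, so this only works while the function is guaranteed to return None or another falsy value.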
I would go with Implementation 2, but it is a close call.
I think #2 and #3 are equally readable. Imagine if you had 100s of mappings... Either way, I cannot tell what the code at the bottom is doing without scrolling to the top. In #2 you are giving a name to the data; in #3, you are giving a name to the function. It's basically a wash.
Changing the data is also a wash, since either way you just add one line in the same pattern as what is already there.
The difference comes if you want to change what you are doing to the data. For example, say you decide to add a debug message for each mapping you define. With #2, you add a statement to the loop, and it is still easy to read. With #3, you have to create a lambda or something. Nothing wrong with lambdas -- I love Lisp as much as anybody -- but I think I would still find #2 easier to read and modify.
But it is a close call, and your taste might be different.
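To make the "add a debug message" scenario concrete, here is a sketch of #2 with one extra statement in the loop. The Mapper class and the behaviour of define_bidirectional_parameter_mapping (storing both directions in a dict) are assumptions made so that the example runs; the question does not show the real implementation.

```python
import logging

logger = logging.getLogger(__name__)


class Mapper:
    """Hypothetical host class for the mapping methods."""

    def __init__(self):
        self.mappings = {}

    def define_bidirectional_parameter_mapping(self, first, second):
        # Assumed behaviour: remember the pair in both directions.
        self.mappings[first] = second
        self.mappings[second] = first

    def define_many_mappings(self):
        mappings = [("status", "current_status"),
                    ("id", "unique_id"),
                    ("location", "coordinates")]
        for first, second in mappings:
            # The one-line change: a debug message for every mapping.
            logger.debug("mapping %s <-> %s", first, second)
            self.define_bidirectional_parameter_mapping(first, second)


mapper = Mapper()
mapper.define_many_mappings()
print(mapper.mappings["status"])     # current_status
print(mapper.mappings["unique_id"])  # id
```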
I think #3 is not bad, although I might pick a slightly longer identifier than d. Often, though, this type of thing becomes data driven, and then you would find yourself using a variation of #2 where you are looping over the result of a database query or something from a config file.
There's no right answer, so you'll get opinions on all sides here, but I would by far prefer to see #2 in any code I was responsible for maintaining.
#1 is verbose, repetitive, and difficult to change (e.g. say you need to call two methods on each pair or add logging -- then you must change every line). But this is often how code evolves, and it is a fairly familiar and harmless pattern.
#3 suffers the same problem as #1, but is slightly more concise at the cost of requiring what is basically a macro and thus new and slightly unfamiliar terms.
#2 is simple and clear. It lays out your mappings in data form, and then iterates them using basic language constructs. To add new mappings, you only need add a line to the array. You might end up loading your mappings from an external file or URL down the line, and that would be an easy change. To change what is done with them, you only need change the body of your for loop (which itself could be made into a separate function if the need arose).
Your complaint of #2 of "object before verb" doesn't bother me at all. In scanning that function, I would basically first assume the verb does what it's supposed to do and focus on the object, which is now clear and immediately visible and maintainable. Only if there were problems would I look at the verb, and it would be immediately evident what it is doing.
for i in vr_world.getNodeNames():
    if i != "_error_":
        World[i] = vr_world.getChild(i)
vr_world.getNodeNames() returns a gigantic list; vr_world.getChild(i) returns a specific type of object.
This is taking a long time to run. Is there any way to make it more efficient? I have seen one-liners for loops before that are supposed to be faster. Ideas?
kaloyan suggests using a generator. Here's why that may help.
If getNodeNames() builds a list, then your loop is basically going over the list twice: once to build it, and once when you iterate over the list.
If getNodeNames() is a generator, then your loop doesn't ever build the list; instead of creating the item and adding it to the list, it creates the item and yields it to the caller.
Whether or not this helps is contingent on a couple of things. First, it has to be possible to implement getNodeNames() as a generator. We don't know anything about the implementation details of that function, so it's not possible to say if that's the case. Next, the number of items you're iterating over needs to be pretty big.
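A sketch of the difference, using stand-in functions because the real vr_world API is not shown. RAW_NAMES, get_node_names_list, get_node_names_lazy and get_child are all hypothetical names for illustration.

```python
RAW_NAMES = ["root", "_error_", "camera", "light"]  # stand-in data


def get_node_names_list():
    """Builds the complete list before the caller sees any item."""
    names = []
    for name in RAW_NAMES:
        names.append(name)
    return names


def get_node_names_lazy():
    """Generator: hands each name to the caller as soon as it is produced."""
    for name in RAW_NAMES:
        yield name


def get_child(name):
    """Stand-in for vr_world.getChild()."""
    return {"node": name}


# The consuming loop looks the same either way; only the producer changes.
world = {name: get_child(name)
         for name in get_node_names_lazy()
         if name != "_error_"}
print(sorted(world))  # ['camera', 'light', 'root']
```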
Of course, none of this will have any effect at all if it turns out that the time-consuming operation in all of this is vr_world.getChild(). That's why you need to profile your code.
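One way to profile is the standard library's cProfile module. The sketch below profiles a stand-in version of the loop; get_node_names and get_child are hypothetical replacements for the vr_world methods, so the numbers it prints say nothing about the real code.

```python
import cProfile
import io
import pstats


def get_node_names():
    """Stand-in for vr_world.getNodeNames()."""
    return ["node%d" % i for i in range(1000)] + ["_error_"]


def get_child(name):
    """Stand-in for vr_world.getChild()."""
    return name.upper()


def build_world():
    world = {}
    for name in get_node_names():
        if name != "_error_":
            world[name] = get_child(name)
    return world


profiler = cProfile.Profile()
profiler.enable()
world = build_world()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

The report lists call counts and time per function, which shows whether the name lookup, the child lookup, or the loop itself dominates.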
I don't think you can make it faster than what you have there. Yes, you can put the whole thing on one line, but that will not make it any faster. The bottleneck is most likely getNodeNames(). If you can make it a generator, you will start populating the World dict with results sooner (if that matters to you), and if you make it filter out the "_error_" values, you will not have to deal with that at a later stage.
World = dict((i, vr_world.getChild(i)) for i in vr_world.getNodeNames() if i != "_error_")
This is a one-liner, but not necessarily much faster than your solution...
Maybe you can use a filter and a map, though I don't know if this would be any faster:
valid = filter(lambda i: i != "_error_", vr_world.getNodeNames())
World = dict(map(lambda i: (i, vr_world.getChild(i)), valid))
Also, as you'll see a lot around here, profile first, and then optimize, otherwise you may be wasting time. You have two functions there, maybe they are the slow parts, not the iteration.