I am searching for the pythonic way to do the following:
I have a list of keys and a list of objects.
For any key, something should be done with the first object that fits to that key.
If no object fits any key, so that nothing has been done at all, something different should be done instead.
I implemented this as follows and it is working properly:
didSomething = False
for key in keys:
    for obj in objects:
        if <obj fits to key>:
            doSomething(obj, key)
            didSomething = True
            break
if not didSomething:
    doSomethingDifferent()
But normally, if there is only one for loop, you don't need such a temporary boolean to check whether something has been done or not. You can use a for-else statement instead. But this does not work with two for loops, does it?
I have the feeling that there should be a better way to do this, but I don't see it. Do you have any ideas, or is there no improvement?
Thank you :)
This doesn't really fit into the for/else paradigm, because you don't want to break the outer loop. So just use a variable to track whether something was done, as in your original code.
Instead of the second loop, use a single expression that finds the first matching object. See Python: Find in list for ways to do this.
didSomething = False
for key in keys:
    found = next((obj for obj in objects if <obj fits to key>), None)
    if found is not None:
        doSomething(found, key)
        didSomething = True
if not didSomething:
    doSomethingDifferent()
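For a concrete picture of the next(...) approach, here is a minimal runnable sketch. The function name handle_first_fits, the fits predicate, and the callback parameters are all made up for illustration; only the next call with a default comes from the answer above. It compares against None rather than relying on truthiness, so a matching object that happens to be falsy (0, an empty string) is still handled. It assumes None itself is never a valid object.

```python
def handle_first_fits(keys, objects, fits, do_something, do_something_different):
    """For each key, handle the first fitting object; fall back if none fit at all.

    `fits(obj, key)` is a hypothetical predicate standing in for <obj fits to key>.
    """
    did_something = False
    for key in keys:
        # next() returns the first match, or None when nothing matches this key.
        found = next((obj for obj in objects if fits(obj, key)), None)
        if found is not None:
            do_something(found, key)
            did_something = True
    if not did_something:
        do_something_different()


# Illustrative usage: an object "fits" a key if it starts with that key.
calls = []
handle_first_fits(
    ["a", "b"],
    ["apple", "avocado", "cherry"],
    lambda obj, key: obj.startswith(key),
    lambda obj, key: calls.append((obj, key)),
    lambda: calls.append("fallback"),
)
print(calls)
```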
Whenever you find yourself needing to break out of a nested loop, it’s usually hard to think through the details, and when you finish figuring it out, the answer is usually just that it’s impossible (or at least only possible with an explicit flag variable or an exception or something else that obscures your logic).
There's an easy answer to that (which I'll include below in case anyone finding this question by search has that problem), but that's not actually your problem. What you want to check is not "did I complete the loop normally", because you always complete the loop normally. What you want to check is "did I do something (in this case, call doSomething) one or more times".
That isn't really about the outer loop, unlike breaking out of the outer loop (which obviously is), so there's no syntax for it. You need to keep track of whether you did something one or more times, and the way you're already doing that is probably the simplest way.
In some cases, you can rearrange things to flatten or invert the loop, so you end up doing one thing with all of the currently-outer values one time and breaking out of that loop, in which case it is about looping again. But if that twists your logic up so much that it's no longer clear what's going on, that's not going to be an improvement. For example:
fits = set()
for key in keys:
    for obj in objects:
        if <obj fits to key>:
            fits.add((obj, key))
for obj, key in fits:
    do_something(obj, key)
if not fits:
    do_something_else()
This can be simplified:
fits = {(obj, key) for key in keys for obj in objects if <obj fits to key>}
for obj, key in fits:
    do_something(obj, key)
if not fits:
    do_something_else()
But, either way, notice that the way I avoided storing a flag saying whether you ever found a fit was by storing a set of all of the fits you found. For some problems, that's an improvement. But if that set could be very large, it's a terrible idea. And if that set just conceptually doesn't mean anything in your problem, it might obscure the logic instead of simplifying it.
If your problem were breaking out of a nested loop (which it isn't, but, again, it might be for someone else who finds this question by search), there's always an easy answer to that: just take the whole nest of loops and refactor it into a function. Then you can break out at any level by just using return. If you didn't return anywhere, the code after the loops will get run, while if you did return, it won't, just like an else.
So:
def fits():
    for key in keys:
        for obj in objects:
            if <obj fits to key>:
                doSomething(obj, key)
                return
    doSomethingDifferent()

fits()
I'm not sure whether breaking out of both loops is what you want. If it is, this does exactly what you want. If not, it doesn't, but then I'm not sure what semantics you were looking for with the else (that is, when it should get run), so I don't know how to explain how to do that.
Once you’ve done this, you may find the abstraction generalizes to more than use in your code, so you can turn the function into something that takes parameters instead of using closure or global variables, and that returns a value or raises instead of calling one of two functions, and so on. But sometimes, this trivial local function is all you need.
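As a rough sketch of that generalization, the function below takes its inputs as parameters and returns a value instead of calling one of two functions. The name find_first_fit and the fits predicate are hypothetical, introduced only for this example.

```python
def find_first_fit(keys, objects, fits):
    """Return the first (obj, key) pair that fits, or None if nothing fits."""
    for key in keys:
        for obj in objects:
            if fits(obj, key):
                return obj, key  # leaves both loops at once
    return None  # reached only when no pair matched


# Illustrative predicate: an object fits a key if the key divides it evenly.
result = find_first_fit([3, 4], [1, 8, 9], lambda obj, key: obj % key == 0)
if result is None:
    print("nothing fits")
else:
    print("first fit:", result)
```

The caller decides what to do with the result, which keeps the search separate from the two actions.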
There's no real way to simplify your code. It is, however, kind of confusing the way it's written. I would actually make it more verbose to make sure it's read properly:
def fit_objects_to_keys(objects, keys):
    for key in keys:
        for obj in objects:
            if <obj fits to key>:
                yield obj, key
                break

none_fit = True
for obj, key in fit_objects_to_keys(objects, keys):
    doSomething(obj, key)
    none_fit = False
if none_fit:
    doSomethingDifferent()
You may be able to simplify it further if you explain what <obj fits to key> actually does.
I agree with the comment that your code is fine as it is - but if you must flatten multiple for-loops into one (so that you can use the 'else' feature, for example, or the number of for-loops is itself variable), this is actually possible:
import itertools

for key, obj in itertools.product(keys, objects):
    if <obj fits to key>:
        doSomething(obj, key)
        break
else:
    doSomethingDifferent()
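A runnable version of the flattened loop might look like the sketch below. The name first_fit_flat and the fits predicate are placeholders for illustration. Note that break here ends the single combined loop, so this stops at the first match overall rather than handling one match per key as in the question's original code.

```python
import itertools


def first_fit_flat(keys, objects, fits):
    """Return the first fitting (obj, key) pair, or None via the for/else branch."""
    for key, obj in itertools.product(keys, objects):
        if fits(obj, key):
            result = (obj, key)
            break
    else:
        # Runs only when the loop finished without hitting `break`.
        result = None
    return result


print(first_fit_flat(["a", "c"], ["apple", "cherry"], lambda o, k: o.startswith(k)))
```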
Related
I am new to Python.
Assume I have a dictionary which holds power supply admin state.
(OK = Turned on. FAIL = Turned off).
There are several ways to write the "get" function:
1st way
is_power_supply_off(dictionary)
    gets the admin state from dictionary.
    returns true if turned off.
    returns false if turned on.
is_power_supply_on(dictionary)
    gets the admin state from dictionary.
    returns true if turned on.
    returns false if turned off.
2nd way
is_power_supply_on_or_off(dictionary, on_or_off)
    gets the admin state from dictionary.
    returns true/false based on the received argument
3rd way
get_power_supply_admin_state(dictionary)
    gets the admin state from dictionary.
    return value.
Then, I can ask in the function which calls the get function
if get_power_supply_admin_state() == turned_on/turned_off...
My questions are:
Which of the above is considered best practice?
If all three ways are OK, and it's just a matter of style, please let me know.
Is the 1st way considered "code duplication"? I am asking this because I can combine the two functions into just one (by adding an argument, as I did in the 2nd way). Still, IMO, the 1st way is more readable than the 2nd way.
I would appreciate it if you could share your thoughts on EACH of the ways I specified.
Thanks in advance!
I would say that the best approach would be to have only an is_power_supply_on function. Then, to test if it is off, you can do not is_power_supply_on(dictionary).
This could even be a lambda (assuming state is the key of the admin state):
is_power_supply_on = lambda mydict: mydict['state'].lower() == 'ok'
The problem with the first approach is that, as you say, it wastes code.
The problem with the second approach is that, at best, you save two characters compared to not (if you use 0 or 1 for on_or_off), and if you use a more idiomatic approach (like on=True or on_or_off="off") you end up using more characters. Further, it results in slower and more complicated code since you need to do an if test.
The problem with the third approach is that in most cases you will also likely be wasting characters compared to just getting the dict value by key manually.
Even though this solution isn't among your proposals, I think the most Pythonic way of creating getters is to use properties. That way, you can find out whether the power supply is on or off, but the user accesses the property as if it were a simple class member:
@property
def state(self):
    # Here, get whether the power supply is on or off
    # and put it in value
    return value
Also, you could create two class constants, PowerSupply.on = True and PowerSupply.off = False, which would make the code easier to understand.
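Put together, the property and the two class constants could look roughly like this. It is only a sketch: the constructor argument and the {"admin": "OK"} dictionary layout are assumptions for illustration, based on the OK/FAIL values described in the question.

```python
class PowerSupply:
    # Class constants suggested above; the lowercase names follow the answer.
    on = True
    off = False

    def __init__(self, status):
        # `status` is assumed to look like {"admin": "OK"} or {"admin": "FAIL"}.
        self._status = status

    @property
    def state(self):
        """True (PowerSupply.on) when the admin state is "OK", else False."""
        return self._status["admin"] == "OK"


ps = PowerSupply({"admin": "FAIL"})
if ps.state == PowerSupply.off:  # accessed like an attribute, no call parentheses
    print("power supply is off")
```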
The general Pythonic style is to not repeat yourself unnecessarily, so the first method definitely seems pointless, because it's actually confusing to follow (you need to notice whether it's on or off).
I'd gravitate most to
get_power_supply_admin_state(dictionary)
    gets the admin state from dictionary
    return value
And, if I'm reading this correctly, you could go even further.
power_supply_on(dictionary)
    return the admin state from dictionary == turned on
This will evaluate to True if it's on and False otherwise, creating the simplest test because then you can run
if power_supply_on(dictionary):
It's more Pythonic to store the dictionary in a class:
class PowerSupply(object):
    def __init__(self):
        self.state = {'admin': 'FAIL'}

    def turn_on(self):
        self.state['admin'] = 'OK'

    def is_on(self):
        return self.state['admin'] == 'OK'
(add more methods as needed)
Then you can use it like this:
ps = PowerSupply()
if not ps.is_on():
    ...  # send an alert!
result = is_power_supply_off(state)
result = is_power_supply_on(state)
result = not is_power_supply_on(state) # alternatively, two functions are certainly not needed
I strongly prefer this kind of naming for the sake of readability. Let's consider the alternatives, not in the function definition but where the function is used.
result = is_power_supply_on_or_off(state, True)
pass
result = is_power_supply_on_or_off(state, False)
pass
if get_power_supply_admin_state(state):
    pass
if not get_power_supply_admin_state(state):
    pass
All of these require knowing what True and False mean in this context, and to be honest, that is not very clear. In many embedded systems 0 means a truthy value. What if this function analyses the output of a system command? There, 0 (a falsy value) indicates a correct state or successful execution. As a result, the intuitive "True means OK" is not always valid. Therefore I strongly advise the first option: precisely named functions.
Obviously, you'll have some kind of private function like _get_power_supply_state_value(). Both functions will call it and interpret its output. But the point is that it will be hidden inside a module that knows what each value means for the power supply state. It is an implementation detail, and API users do not need to know about it.
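One way this layout could look is sketched below. The "admin" key and the module-level constants are assumptions for illustration; the OK/FAIL values come from the question.

```python
# Module-private knowledge of what the raw values mean.
_STATE_ON = "OK"
_STATE_OFF = "FAIL"


def _get_power_supply_state_value(state):
    """Private helper: fetch the raw admin state ("admin" key is assumed)."""
    return state["admin"]


def is_power_supply_on(state):
    return _get_power_supply_state_value(state) == _STATE_ON


def is_power_supply_off(state):
    return _get_power_supply_state_value(state) == _STATE_OFF


print(is_power_supply_on({"admin": "OK"}), is_power_supply_off({"admin": "OK"}))
```

With two explicit functions, an unexpected raw value makes both return False instead of being silently read as "off", which a bare not is_power_supply_on(...) would do.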
Oftentimes I find that, when working with Python sets, the Pythonic way seems to be absent.
For example, doing something like Dijkstra or A*:
openSet, closedSet = set(nodes), set(nodes)
while openSet:
    walkSet, openSet = openSet, set()
    for node in walkSet:
        for dest in node.destinations():
            if dest.weight() < constraint:
                if dest not in closedSet:
                    closedSet.add(dest)
                    openSet.add(dest)
This is a somewhat contrived example; the focus is the last three lines:
if not value in someSet:
    someSet.add(value)
    doAdditionalThings()
Given the Python way of working with, for example, accessing/using values of a dict, I would have expected to be able to do:
try:
    someSet.add(value)
except KeyError:
    continue  # well, that's ok then.
doAdditionalThings()
As a C++ programmer, my skin crawls a bit that I can't even do:
if someSet.add(value):
    # add wasn't blocked by the value already being present
    doAdditionalThings()
Is there a more Pythonic (and if possible more efficient) way to work with this sort of set-as-guard usage?
The add operation is not supposed to also tell you if the item was already in the set; it just makes sure it is in there after you add it. Or put another way, what you want is not "add an item and check if it worked"; you want to first check if the item is there, and if not, then do some special stuff. If all you wanted to do was add the item, you wouldn't do the check at all. There is nothing unpythonic about this pattern:
if item not in someSet:
    someSet.add(item)
    doStuff()
else:
    doOtherStuff()
It is true that the API could have been designed so that .add returned whether the item was already in there, but in my experience that's not a particularly common use case. Part of the point of sets is that you can freely add items without worrying about whether they were already in there (since adding an already-included item has no effect). Also, having .add return None is consistent with the general convention for Python builtin types that methods that mutate their arguments return None. It is really things like dict.setdefault (which gets an item but first adds it if it isn't there) that are the unusual case.
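To make the check-then-add pattern concrete, here is a small self-contained sketch that uses a set as a guard while removing duplicates from a sequence. The function name unique_in_order is made up for this example.

```python
def unique_in_order(values):
    """Return the values with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:  # membership test first...
            seen.add(value)    # ...then add; add() itself returns None
            result.append(value)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))
```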
I am implementing something that uses a dictionary to store data. In addition to the normal data, it also stores some internal data, all prefixed with _. However, I want to isolate the user of the library from this data, since they are normally not concerned with it. Additionally, I need to set a modified flag in my class to track whether the data was modified.
For all interface functions this worked nicely, here are two examples, one with and one without modification. Note that in this case, I do not hide internal data, because it is intentionally demanded as a key:
def __getitem__(self, key):
    return self._data[key]

def __setitem__(self, key, value):
    self.modified = True
    self._data[key] = value
On some functions, e.g. __iter__, I filter out everything that starts with _ before I yield the data.
But a single function causes real problems here: popitem. In its normal behaviour it would just remove an arbitrary item from the dict and return it. However, here comes the problem: without deep internal knowledge, I don't know beforehand which item will be returned. But I know that popitem follows the same rules as items and keys. So I came up with an implementation:
keys = self._data.keys()
for k in keys:
    if k.startswith("_"):
        continue
    v = self._data.pop(k)
    self.modified = True
    return k, v
else:
    raise KeyError('popitem(): dictionary is empty')
This implementation works, but it feels too unpythonic and not at all dynamic or clean. I also struggled with the idea of raising the exception like this: {}.popitem(), which looks totally insane but would at least give me a dynamic way (e.g. if the exception message or type ever changes, I don't have to adjust).
What I am now after is a cleaner and less crazy way to solve this problem. There would be a way of removing the internal data from the dict, but I'd only take this road as a last resort. So do you have any recipes or ideas for this?
Give your objects two dict attributes: self._data and self._internal_data. Then forward all the dict methods to self._data, and you won't have to filter out anything.
edit: Okay, I missed the "last resort" bit at the end. But I suspect that managing two dicts will be far easier than "fixing" every single dict method and operator. :)
Subclass dict rather than wrapping a dictionary. You'll need to implement a lot less stuff.
Store your "internal data" as attributes on the object, not in the dictionary. This way they are easy to get to if you need them, but won't appear in ordinary iteration. If at some point you need to combine them, do that with x = dict(self); x.update(self.__dict__) to create a new dictionary having both sets of values.
If you do want to store your internal data as a dictionary, embed that one. Implement __missing__ on your main object so you can grab items from the internal dictionary if they're not found in the main one.
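A minimal sketch of the subclassing idea follows. The class name TrackedDict and the internal attribute are hypothetical. Keep in mind that dict methods such as update, setdefault, and pop do not go through __setitem__, so each mutating method you care about would need its own override; only two are shown here.

```python
class TrackedDict(dict):
    """dict subclass that keeps internal data out of the mapping itself."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False
        self.internal = {}  # internal data lives on the object, not in the dict

    def __setitem__(self, key, value):
        self.modified = True
        super().__setitem__(key, value)

    def popitem(self):
        item = super().popitem()  # raises the usual KeyError when empty
        self.modified = True
        return item


d = TrackedDict(a=1)
d.internal["_rev"] = 7
print(list(d), d.modified)
```

Because the internal values never enter the mapping, iteration and popitem need no filtering at all.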
Well, the logic's correct, you could reduce it to something like:
self._data.pop(next((key for key in self._data if not key.startswith('_')), 'popitem(): dictionary is empty'))
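So, find the next key in self._data that doesn't start with _; otherwise default to a key that isn't going to match any of the other keys in the dictionary, so that when the pop fails you automatically get the KeyError thrown (with your "error message"). Note that pop returns only the value, so this needs a small extension if you also want the key back, as popitem normally provides.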
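A variant of the same idea that returns a (key, value) pair, as popitem does, might look like this. It is written as a standalone function over a plain dict to stay self-contained; the name pop_public_item and the sentinel are illustrative, and it swaps the "impossible key" default for a sentinel object with an explicit raise.

```python
_MISSING = object()  # sentinel that can never collide with a real key


def pop_public_item(data):
    """Pop and return one (key, value) pair whose key does not start with "_"."""
    key = next((k for k in data if not k.startswith("_")), _MISSING)
    if key is _MISSING:
        raise KeyError("popitem(): dictionary is empty")
    return key, data.pop(key)


store = {"_meta": 1, "x": 2}
print(pop_public_item(store), store)
```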
Suppose I have a function like the following:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
         if k[:pos_qu]==selection[:pos_qu]
         and (k[pos_qu+1:]==selection[pos_qu+1:] if pos_qu!=1)
         and k[pos_qu] not in alphabet.values()]
I want to make the second condition, namely k[pos_qu+1:]==selection[pos_qu+1:], dependent on another if statement, if pos_qu!=1. I tried (as shown above) to include the two together in parentheses, but Python flags a syntax error at the parentheses.
If I understand your requirement correctly, you only want to check k[pos_qu+1:]==selection[pos_qu+1:] if the condition pos_qu!=1 is also met. You can rephrase that as the following condition:
pos_qu==1 or k[pos_qu+1:]==selection[pos_qu+1:]
Putting this into your comprehension:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
         if k[:pos_qu]==selection[:pos_qu]
         and (pos_qu==1 or k[pos_qu+1:]==selection[pos_qu+1:])
         and k[pos_qu] not in alphabet.values()]
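To see the rewritten condition in action, here is a self-contained sketch. The sample data (string keys with a "-" placeholder, a small alphabet dict) is invented for illustration, since the question does not show what dict_bigrams, selection, or alphabet actually contain.

```python
def matching_bigrams(dict_bigrams, selection, pos_qu, alphabet):
    """Filter bigrams; the suffix comparison is skipped when pos_qu == 1."""
    return [
        (k, v)
        for k, v in dict_bigrams.items()
        if k[:pos_qu] == selection[:pos_qu]
        and (pos_qu == 1 or k[pos_qu + 1:] == selection[pos_qu + 1:])
        and k[pos_qu] not in alphabet.values()
    ]


# Made-up sample data, for illustration only.
alphabet = {1: "a", 2: "b", 3: "c", 4: "d"}
sample = {"a-c": 1, "abc": 2, "a-d": 3, "x-c": 4}
print(matching_bigrams(sample, "a?c", 1, alphabet))
```

With pos_qu equal to 1 the suffix is ignored, so "a-d" is kept even though its last character differs from the selection.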
Whenever you find yourself with a complex list comprehension, trying to figure out how to do something complicated and not knowing how, the answer is usually to break things up. Expression syntax is inherently more limited than full statement (or multi-statement suite) syntax in Python, to prevent you from writing things that you won't be able to read later. Usually, that's a good thing—and, even when it isn't, you're better off going along with it than trying to fight it.
In this case, you've got a trivial comprehension, except for the if clause, which you don't know how to write as an expression. So, I'd turn the condition into a separate function:
def isMyKindOfKey(k):
    ...  # condition here

[(k,v) for (k,v) in dict_bigrams.items() if isMyKindOfKey(k)]
This lets you use full multi-statement syntax for the condition. It also lets you give the condition a name (hopefully something better than isMyKindOfKey); makes the parameters, local values captured by the closure, etc. more explicit; lets you test the function separately or reuse it; etc.
In cases where the loop itself is the non-trivial part (or there's just lots of nesting), it usually makes more sense to break up the entire comprehension into an explicit for loop and append, but I don't think that's necessary here.
It's worth noting that in this case—as in general—this doesn't magically solve your problem, it just gives you more flexibility in doing so. For example, you can use the same transformation from postfix if to infix or that F.J suggests, but you can also leave it as an if, e.g., like this:
def isMyKindOfKey(k):
    retval = k[:pos_qu]==selection[:pos_qu]
    if pos_qu!=1:
        retval = retval and (k[pos_qu+1:]==selection[pos_qu+1:])
    retval = retval and (k[pos_qu] not in alphabet.values())
    return retval
That probably isn't actually the way I'd write this, but you can see how this is a trivial way to transform what's in your head into code, which would be very hard to do in an expression.
Just change the order:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
         if k[:pos_qu]==selection[:pos_qu]  # evaluated first
         and pos_qu!=1  # if true, continue and evaluate this next
         and (k[pos_qu+1:]==selection[pos_qu+1:])]  # if pos_qu != 1, lastly evaluate this
As the comment mentions, this is not a very Pythonic list comprehension, and it would be much more readable as a standard for loop.
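For comparison, a plain for loop version could look like the sketch below. It implements the conditional check described in the question (the suffix comparison is skipped when pos_qu is 1), which differs from chaining pos_qu != 1 with and, since that chain rejects every key when pos_qu is 1. The function name and sample data are invented for illustration.

```python
def matching_bigrams_loop(dict_bigrams, selection, pos_qu, alphabet):
    """Loop version of the filter, with one early `continue` per condition."""
    bigrams = []
    for k, v in dict_bigrams.items():
        if k[:pos_qu] != selection[:pos_qu]:
            continue  # prefix differs
        if pos_qu != 1 and k[pos_qu + 1:] != selection[pos_qu + 1:]:
            continue  # suffix differs (only checked when pos_qu != 1)
        if k[pos_qu] in alphabet.values():
            continue  # character at pos_qu is a known letter
        bigrams.append((k, v))
    return bigrams


# Made-up sample data, for illustration only.
letters = {1: "a", 2: "b", 3: "c", 4: "d"}
print(matching_bigrams_loop({"ab-d": 1, "ab-e": 2}, "ab?d", 2, letters))
```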
for i in vr_world.getNodeNames():
    if i != "_error_":
        World[i] = vr_world.getChild(i)
vr_world.getNodeNames() returns a gigantic list, and vr_world.getChild(i) returns a specific type of object.
This is taking a long time to run. Is there any way to make it more efficient? I have seen one-line loops before that are supposed to be faster. Ideas?
kaloyan suggests using a generator. Here's why that may help.
If getNodeNames() builds a list, then your loop is basically going over the list twice: once to build it, and once when you iterate over the list.
If getNodeNames() is a generator, then your loop doesn't ever build the list; instead of creating the item and adding it to the list, it creates the item and yields it to the caller.
Whether or not this helps is contingent on a couple of things. First, it has to be possible to implement getNodeNames() as a generator. We don't know anything about the implementation details of that function, so it's not possible to say if that's the case. Next, the number of items you're iterating over needs to be pretty big.
Of course, none of this will have any effect at all if it turns out that the time-consuming operation in all of this is vr_world.getChild(). That's why you need to profile your code.
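A minimal way to profile the loop with the standard library's cProfile is sketched below. Since vr_world is not available here, the two callables are passed in as parameters and replaced by stand-ins in the usage lines; build_world and profile_build are hypothetical names.

```python
import cProfile
import io
import pstats


def build_world(get_node_names, get_child):
    """Same loop as in the question, with the two calls passed in."""
    world = {}
    for name in get_node_names():
        if name != "_error_":
            world[name] = get_child(name)
    return world


def profile_build(get_node_names, get_child):
    """Run build_world under cProfile and return (world, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    world = build_world(get_node_names, get_child)
    profiler.disable()
    stream = io.StringIO()
    # Show the five entries with the highest cumulative time.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return world, stream.getvalue()


# Stand-ins for vr_world.getNodeNames and vr_world.getChild.
world, report = profile_build(lambda: ["a", "_error_", "b"], str.upper)
print(report)
```

In the real program you would pass vr_world.getNodeNames and vr_world.getChild and read the report to see which of the two dominates.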
I don't think you can make it faster than what you have there. Yes, you can put the whole thing on one line, but that will not make it any faster. The bottleneck is probably getNodeNames(). If you can make it a generator, you will start populating the World dict with results sooner (if that matters to you), and if you make it filter out the "_error_" values, you will not have to deal with that at a later stage.
World = dict((i, vr_world.getChild(i)) for i in vr_world.getNodeNames() if i != "_error_")
This is a one-liner, but not necessarily much faster than your solution...
Maybe you can use filter and map (with zip to pair each name with its child); however, I don't know if this would be any faster:
valid = list(filter(lambda i: i != "_error_", vr_world.getNodeNames()))
World = dict(zip(valid, map(vr_world.getChild, valid)))
Also, as you'll see a lot around here, profile first, and then optimize, otherwise you may be wasting time. You have two functions there, maybe they are the slow parts, not the iteration.