Two different approaches to writing a function (map vs loop) - python

SO busy with some code, and have a function which basically takes dictionary where each value is a list, and returns the key with the largest list.
I wrote the following:
def max_list(dic):
if dic:
l1 = dic.values()
l1 = map(len, l1)
l2 = dic.keys()
return l2[l1.index(max(l1))]
else:
return None
Someone else wrote the following:
def max_list(dic):
result = None
maxValue = 0
for key in dic.keys():
if len(dic[key]) >= maxValue:
result = key
maxValue = len(dic[key])
return result
Which would be the 'correct' way to do this, if there is one. I hope this is not regarded as community wiki (even though the code works), trying to figure which would be the best pattern in terms of the problem.

Another valid option:
maxkey,maxvalue = max(d.items(),key=lambda x: len(x[1]))
Of the two above, I would probably prefer the explicit for loop as you don't generate all sorts of intermediate objects just to throw them away.
As a side note, This solution doesn't work particularly well for empty dicts ... (it raises a ValueError). Since I expect that is an unusual case (rather than the norm), it shouldn't hurt to enclose in a try-except ValueError block.

the most pythonic would be max(dic,key=lambda x:len(dic[x])) ... at least I would think ...
maximizing readability and minimizing lines of code is pythonic ... usually

I think the question you should ask yourself is, what do you think the most important is: code maintainability or computation speed?
As the other answers point out, this problem has a very concise solution using a map. For most people this implementation would probably be more easy to read then the implementation with a loop.
In terms of computational speed, the map solution would be less efficient, but still be in the same Computational Magnitute.
Therefore, I think it is unlikely that the map method would ever have noticeably less performance. I would suggest you to use a profiler after your program is finished, so you can be sure where the real problem lies if your program turns out to run slower than desired.

Related

Zen of Python 'Explicit is better than implicit'

I'm trying to understand what 'implicit' and 'explicit' really means in the context of Python.
a = []
# my understanding is that this is implicit
if not a:
print("list is empty")
# my understanding is that this is explicit
if len(a) == 0:
print("list is empty")
I'm trying to follow the Zen of Python rules, but I'm curious to know if this applies in this situation or if I am over-thinking it?
The two statements have very different semantics. Remember that Python is dynamically typed.
For the case where a = [], both not a and len(a) == 0 are equivalent. A valid alternative might be to check not len(a). In some cases, you may even want to check for both emptiness and listness by doing a == [].
But a can be anything. For example, a = None. The check not a is fine, and will return True. But len(a) == 0 will not be fine at all. Instead you will get TypeError: object of type 'NoneType' has no len(). This is a totally valid option, but the if statements do very different things and you have to pick which one you want.
(Almost) everything has a __bool__ method in Python, but not everything has __len__. You have to decide which one to use based on the situation. Things to consider are:
Have you already verified whether a is a sequence?
Do you need to?
Do you mind if your if statement crashed on non-sequences?
Do you want to handle other falsy objects as if they were empty lists?
Remember that making the code look pretty takes second place to getting the job done correctly.
Though this question is old, I'd like to offer a perspective.
In a dynamic language, my preference would be to always describe the expected type and objective of a variable in order to offer more purpose understanding. Then use the knowledge of the language to be succinct and increase readability where possible (in python, an empty list's boolean result is false). Thus the code:
lst_colours = []
if not lst_colours:
print("list is empty")
Even better to convey meaning is using a variable for very specific checks.
lst_colours = []
b_is_list_empty = not lst_colours
if b_is_list_empty:
print("list is empty")
Checking a list is empty would be a common thing to do several times in a code base. So even better such things in a separate file helper function library. Thus isolating common checks, and reducing code duplication.
lst_colours = []
if b_is_list_empty(lst_colours):
print("list is empty")
def b_is_list_empty (lst):
......
Most importantly, add meaning as much as possible, have an agreed company standard to chose how to tackle the simple things, like variable naming and implicit/explicit code choices.
Try to think of:
if not a:
...
as shorthand for:
if len(a) == 0:
...
I don't think this is a good example of a gotcha with Python's Zen rule of "explicit" over "implicit". This is done rather mostly because of readability. It's not that the second one is bad and the other is good. It's just that the first one is more skillful. If one understands boolean nature of lists in Python, I think you find the first is more readable and readability counts in Python.

How can I improve the runtime of Python implementation of cycle detection in Course Schedule problem?

My aim is to improve the speed of my Python code that has been successfully accepted in a leetcode problem, Course Schedule.
I am aware of the algorithm but even though I am using O(1) data-structures, my runtime is still poor: around 200ms.
My code uses dictionaries and sets:
from collections import defaultdict
class Solution:
def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
course_list = []
pre_req_mapping = defaultdict(list)
visited = set()
stack = set()
def dfs(course):
if course in stack:
return False
stack.add(course)
visited.add(course)
for neighbor in pre_req_mapping.get(course, []):
if neighbor in visited:
no_cycle = dfs(neighbor)
if not no_cycle:
return False
stack.remove(course)
return True
# for course in range(numCourses):
# course_list.append(course)
for pair in prerequisites:
pre_req_mapping[pair[1]].append(pair[0])
for course in range(numCourses):
if course in visited:
continue
no_cycle = dfs(course)
if not no_cycle:
return False
return True
What else can I do to improve the speed?
You are calling dfs() for a given course multiple times.
But its return value won't change.
So we have an opportunity to memoize it.
Change your algorithmic approach (here, to dynamic programming)
for the big win.
It's a space vs time tradeoff.
EDIT:
Hmmm, you are already memoizing most of the computation
with visited, so lru_cache would mostly improve clarity
rather than runtime.
It's just a familiar idiom for caching a result.
It would be helpful to add a # comment citing a reference
for the algorithm you implemented.
This is a very nice expression, with defaulting:
pre_req_mapping.get(course, [])
If you use timeit you may find that the generated bytecode
for an empty tuple () is a tiny bit more efficient than that
for an empty list [], as it involves fewer allocations.
Ok, some style nits follow, unrelated to runtime.
As an aside, youAreMixingCamelCase and_snake_case.
PEP-8 asks you to please stick with just snake_case.
This is a fine choice of identifier name:
for pair in prerequisites:
But instead of the cryptic [0], [1] dereferences,
it would be easier to read a tuple unpack:
for course, prereq in prerequisites:
if not no_cycle: is clumsy.
Consider inverting the meaning of dfs' return value,
or rephrasing the assignment as:
cycle = not dfs(course)
I think that you are doing it in good way, but since Python is an interpreted language, it's normal to have slow runtime compared with compiled languages like C/C++ and Java, especially for large inputs.
Try to write the same code in C/C++ for example and compare the speed between them.

Questions related to performance/efficiency in Python/Django

I have few questions which are bothering me since few days back. I'm a beginner Python/Django Programmer so I just want to clear few things before I dive into real time product development.(for Python 2.7.*)
1) saving value in a variable before using in a function
for x in some_list/tuple:
func(do_something(x))
for x in some_list/tuple:
y = do_something(x)
func(y)
Which one is faster or which one I SHOULD use.
2)Creating a new object of a model in Django
def myview(request):
u = User(username="xyz12",city="TA",name="xyz",...)
u.save()
def myview(request):
d = {'username':"xyz12",'city':"TA",'name':"xyz",...}
u = User(**d)
u.save()
3) creating dictionary
var = Dict(key1=val1,key2=val2,...)
var = {'key1':val1,'key2':val2,...}
4) I know .append() is faster than += but what if I want to append a list's elements to another
a = [1,2,3,],b=[4,5,6]
a += b
or
for i in b:
a.append(i)
This is a very interesting question, but I think you don't ask it for the good reason. The performances gained by such optimisations are negligible, especially if you're working with small number of elements.
On the other hand, what is really important is the ease of reading the code and it's clarity.
def myview(request):
d = {'username':"xyz12",'city':"TA",'name':"xyz",...}
u = User(**d)
u.save()
This code for example isn't "easy" to read and to understand at first sight. It requires to think about it before finding what is actually does. Unless you need the intermediary step, don't do it.
For the 4th point, I'd go for the first solution, way much clearer (and it avoids the function call overhead created by calling the same function in a loop). You could also use more specialised function for better performances such as reduce (see this answer : https://stackoverflow.com/a/11739570/3768672 and this thread as well : What is the fastest way to merge two lists in python?).
The 1st and 3rd points are usually up to what you prefer, as both are really similar and will probably be optimised when compiled to bytecode anyway.
If you really want to optimise more your code, I advise you to go check this out : https://wiki.python.org/moin/PythonSpeed/PerformanceTips
PS : Ultimately, you can still do your own tests. Write two functions doing the exact same things with the two different methods you want to test, measure the execution times of these methods and compare them (be careful, do the tests multiple time to reduce the uncertainties).

More efficient ways of doing this

for i in vr_world.getNodeNames():
if i != "_error_":
World[i] = vr_world.getChild(i)
vr_world.getNodeNames() returns me a gigantic list, vr_world.getChild(i) returns a specific type of object.
This is taking a long time to run, is there anyway to make it more efficient? I have seen one-liners for loops before that are supposed to be faster. Ideas?
kaloyan suggests using a generator. Here's why that may help.
If getNodeNames() builds a list, then your loop is basically going over the list twice: once to build it, and once when you iterate over the list.
If getNodeNames() is a generator, then your loop doesn't ever build the list; instead of creating the item and adding it to the list, it creates the item and yields it to the caller.
Whether or not this helps is contingent on a couple of things. First, it has to be possible to implement getNodeNames() as a generator. We don't know anything about the implementation details of that function, so it's not possible to say if that's the case. Next, the number of items you're iterating over needs to be pretty big.
Of course, none of this will have any effect at all if it turns out that the time-consuming operation in all of this is vr_world.getChild(). That's why you need to profile your code.
I don't think you can make it faster than what you have there. Yes, you can put the whole thing on one line but that will not make it any faster. The bottleneck obviously is getNodeNames(). If you can make it a generator, you will start populating the World dict with results sooner (if that matters to you) and if you make it filter out the "_error_" values, you will not have the deal with that at a later stage.
World = dict((i, vr_world.getChild(i)) for i in vr_world.getNodeNames() if i != "_error_")
This is a one-liner, but not necessarily much faster than your solution...
Maybe you can use a filter and a map, however I don't know if this would be any faster:
valid = filter(lambda i: i != "_error_", vr_world.getNodeNames())
World = map(lambda i: vr_world.getChild(i), valid)
Also, as you'll see a lot around here, profile first, and then optimize, otherwise you may be wasting time. You have two functions there, maybe they are the slow parts, not the iteration.

Python dictionary instead of switch/case

I've recently learned that python doesn't have the switch/case statement. I've been reading about using dictionaries in its stead, like this for example:
values = {
value1: do_some_stuff1,
value2: do_some_stuff2,
valueN: do_some_stuffN,
}
values.get(var, do_default_stuff)()
What I can't figure out is how to apply this to do a range test. So instead of doing some stuff if value1=4 say, doing some stuff if value1<4. So something like this (which I know doesn't work):
values = {
if value1 <val: do_some_stuff1,
if value2 >val: do_some_stuff2,
}
values.get(var, do_default_stuff)()
I've tried doing this with if/elif/else statements. It works fine but it seems to go considerably slower compared to the situation where I don't need the if statements at all (which is maybe something obvious an inevitable). So here's my code with the if/elif/else statement:
if sep_ang(val1,val2,X,Y)>=ROI :
main.removeChild(source)
elif sep_ang(val1,val2,X,Y)<=5.0:
integral=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
index=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
print name,val1,val2,sep_ang(val1,val2,X,Y),integral,index
print >> reg,'fk5;point(',val1,val2,')# point=cross text={',name,'}'
else:
spectrum[0].getElementsByTagName("parameter")[0].setAttribute("free","0") #Integral
spectrum[0].getElementsByTagName("parameter")[1].setAttribute("free","0") #Index
integral=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
index=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
print name,val1,val2,sep_ang(val1,val2,X,Y),integral,index
print >> reg,'fk5;point(',val1,val2,')# point=cross text={',name,'}'
Which takes close to 5 min for checking about 1500 values of the var sep_ang. Where as if I don't want to use setAttribute() to change values in my xml file based on the value of sep_ang, I use this simple if else:
if sep_ang(val1,val2,X,Y)>=ROI :
main.removeChild(source)
else:
print name,val1,val2,ang_sep(val1,val2,X,Y);print >> reg,'fk5;point(',val1,val2,')# point
Which only takes ~30sec. Again I know it's likely that adding that elif statement and changing values of that attribute inevitably increases the execution time of my code by a great deal, I was just curious if there's a way around it.
Edit:
Is the benefit of using bisect as opposed to an if/elif statement in my situation that it can check values over some range quicker than using a bunch of elif statements?
It seems like I'll still need to use elif statements. Like this for example:
range=[10,100]
options='abc'
def func(val)
return options[bisect(range, val)]
if func(val)=a:
do stuff
elif func(val)=b:
do other stuff
else:
do other other stuff
So then my elif statement are only checking against a single value.
Thanks much for the help, it's greatly appreciated.
A dictionary is the wrong structure for this. The bisect examples show an example of this sort of range test.
Whilst the dictionary approach works well for single values, if you want ranges, if ... else if ... else if is probably the simplest approach.
If you're looking for a single value this a good match to a dictionary - since this is what dictionaries are for - but if you're looking for a range it doesn't work. You could do it with a dict using something like:
values = {
lambda x: x < 4: foo,
lambda x: x > 4: bar
}
and then loop through all the key-value pairs in the dictionary, passing your value key and running the value as a function if the key function returns true.
However, this wouldn't give you any benefit over a number of if statements and would be harder to maintain and debug. So don't do it, and just use if instead.
In that case you would use an if/then/else. You cannot do this with a switch, either.
The idea of a switch statement is that you have a value V that you test for identity against N possible outcomes. You can do this with an if-construct - however that would take O(N) runtime on average. The switch gives you constant O(1) every time.
This is obviously not possible for ranges (since they are not easily hashable) and thus you use if-constructs for these cases.
Example
if value1 <val: do_some_stuff1()
elif value2 >val: do_some_stuff2()
Note that this is actually smaller than trying to use a dictionary.
dict is not for doing this (nor is switch!).
A couple posters have suggested a dict with containment functions, but this is not the solution you want at all. It is O(n) (like an if statement), it doesn't really work (because you could have overlapping conditions), is unpredictable (because you do not know what order you will do the loop), and is much less clear than the equivalent if statement. The if statement is probably the way you want to go if you have a short, static-length list of conditions to apply.
If you have tons of conditions or if they could change as a result of your program, you want a different data structure. You could implement a binary tree or keep a sorted list and use the bisect module to find a value associated with the given range.
I don't know of any practicable solution. If you want to go with the guess what it does approach though you could do something like this:
obsure_switch = {
lambda x: 1<x<6 : some_function,
...
}
[action() for condition,action in obscure_switch.iteritems() if condition(var)]
Finally figured out what to do!
So instead of using a bunch of elif statements I did this:
range=[10,100]
options='abc'
def func(val)
choose=str(options[bisect(range,val)])
exec choose+"()"
def a():
do_stuff
def b():
do_other_stuff
def c():
do_other_other stuff
Not only does it work but it goes almost as fast as my original 4 line code where I'm not changing any values of things!

Categories