Questions related to performance/efficiency in Python/Django

I have a few questions that have been bothering me for a few days. I'm a beginner Python/Django programmer, so I just want to clear up a few things before I dive into real product development (for Python 2.7.*).
1) saving value in a variable before using in a function
for x in some_list/tuple:
for x in some_list/tuple:
y = do_something(x)
Which one is faster or which one I SHOULD use.
2)Creating a new object of a model in Django
def myview(request):
    u = User(username="xyz12",city="TA",name="xyz",...)
def myview(request):
    d = {'username':"xyz12",'city':"TA",'name':"xyz",...}
    u = User(**d)
3) creating dictionary
var = dict(key1=val1,key2=val2,...)
var = {'key1':val1,'key2':val2,...}
4) I know .append() is faster than += but what if I want to append a list's elements to another
a = [1,2,3]; b = [4,5,6]
a += b
for i in b:
    a.append(i)

This is a very interesting question, but I don't think you're asking it for the right reason. The performance gained by such optimisations is negligible, especially if you're working with a small number of elements.
On the other hand, what is really important is the ease of reading the code and its clarity.
def myview(request):
    d = {'username':"xyz12",'city':"TA",'name':"xyz",...}
    u = User(**d)
This code, for example, isn't "easy" to read and understand at first sight. You have to think about it before working out what it actually does. Unless you need the intermediate step, don't do it.
For the 4th point, I'd go for the first solution, which is much clearer (and it avoids the overhead of calling the same function repeatedly in a loop). You could also use a more specialised function such as reduce for better performance (see also the thread "What is the fastest way to merge two lists in python?").
The 1st and 3rd points are usually up to what you prefer, as both are really similar and will probably be optimised when compiled to bytecode anyway.
If you really want to optimise more your code, I advise you to go check this out :
PS: Ultimately, you can still do your own tests. Write two functions doing exactly the same thing with the two different methods you want to test, measure their execution times and compare them (be careful to run the tests multiple times to reduce the uncertainty).
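As a rough sketch of such a test, the standard timeit module can run each variant many times and report the best run. The helper names (extend_in_place, append_in_loop, best_time) and the list sizes are made up for this illustration; the numbers you get will depend on your machine and Python version.

```python
import timeit


def extend_in_place(a, b):
    """Add all of b's elements to a with a single += statement."""
    a += b
    return a


def append_in_loop(a, b):
    """Add b's elements to a one at a time with .append()."""
    for item in b:
        a.append(item)
    return a


def best_time(func, repeat=5, number=200):
    """Return the smallest total time over several repeated runs."""
    b = list(range(1000))
    # A fresh target list is created on every call so runs do not affect each other.
    timer = timeit.Timer(lambda: func([1, 2, 3], b))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    print("a += b       : %.6f s" % best_time(extend_in_place))
    print("loop + append: %.6f s" % best_time(append_in_loop))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes running on the machine.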


speed up function based on list comprehension

I'm trying to get the 15 most relevant items for each user, but every function I tried took an eternity (more than 6 hours; I shut it down after that).
I have 418 unique users and 3718 unique items.
The U2tfifd dict also has 418 entries, and there are 32645 words in tfidf_feature_names.
The shape of my interactions_full_df is (40733, 3).
I tried:
def index_tfidf_users(user_id):
    return [users for users in U2tfifd[user_id].flatten().tolist()]
def get_relevant_items(user_id):
    return sorted(zip(tfidf_feature_names, index_tfidf_users(user_id)), key=lambda x: -x[1])[:15]
def get_tfidf_token(user_id):
    return [words for words, values in get_relevant_items(user_id)]
then interactions_full_df["tags"] = interactions_full_df["user_id"].apply(lambda x : get_tfidf_token(x))
def get_tfidf_token(user_id):
    tags = []
    v = sorted(zip(tfidf_feature_names, U2tfifd[user_id].flatten().tolist()), key=lambda x: -x[1])[:15]
    for words, values in v:
        tags.append(words)
    return tags
def get_tfidf_token(user_id):
    v = sorted(zip(tfidf_feature_names, U2tfifd[user_id].flatten().tolist()), key=lambda x: -x[1])[:15]
    tags = [words for words in v]
    return tags
U2tfifd is a dict with keys = user_id, values = an array
There are several things going on which could cause poor performance in your code. The impact of each of these will depend on things like your Python version (2.x or 3.x), your RAM speed, and whatnot. You'll need to experiment and benchmark the various potential improvements yourself.
1. TFIDF Sparsity (~10x speedup depending on sparsity)
One glaring potential problem is that TFIDF naturally returns sparse data (e.g. a paragraph doesn't use anywhere near as many unique words as an entire book), and working with dense structures like numpy arrays is a strange choice when the data is probably zero almost everywhere.
If you'll be doing this same analysis in the future, it might be helpful to make/use a version of TFIDF with sparse array outputs so that when you extract your tokens you can skip over the zero values. This would likely have the secondary benefit of the entire sparse array for each user fitting in the cache and preventing costly RAM access in your sorts and other operations.
It might be worth sparsifying your data anyway. On my potato, a quick benchmark on data which should be similar to yours indicates that the process can be done in ~30s. The process replaces much of the work you're doing with a highly optimized routine coded in C and wrapped for use in Python. The only real cost is the second pass through the non-zero entries, but unless that pass is pretty efficient to begin with you should be better off working with sparse data.
2. Duplicated Efforts and Memoization (~100x speedup)
If U2tfifd has 418 entries and interactions_full_df has 40733 rows then at least 40315 (or 99.0%) of your calls to get_tfidf_token() are wasted since you've already computed the answer. There are tons of memoization decorators out there, but you don't need anything very complicated for your use case.
def memoize(f):
    _cache = {}
    def _f(arg):
        if arg not in _cache:
            _cache[arg] = f(arg)
        return _cache[arg]
    return _f
@memoize
def get_tfidf_token(user_id):
    ...
Breaking this down, the function memoize() returns another function. The behavior of that function is to check a local cache for the expected return value before computing it and storing it if necessary.
The syntax @memoize... is short for something like the following.
def uncached_get_tfidf_token(user_id):
    ...
get_tfidf_token = memoize(uncached_get_tfidf_token)
The @ symbol is used to signify that we want the modified, or decorated, version of get_tfidf_token() instead of the original. Depending on your application, it might be beneficial to chain decorators together.
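To see the effect of memoization on repeated user ids, here is a small self-contained sketch. The function expensive_lookup and the call counter are hypothetical stand-ins for get_tfidf_token(), used only to show how many times the wrapped function actually runs.

```python
import functools


def memoize(f):
    """Cache f's results by argument so each distinct argument is computed once."""
    cache = {}

    @functools.wraps(f)
    def wrapper(arg):
        if arg not in cache:
            cache[arg] = f(arg)
        return cache[arg]

    return wrapper


call_count = {"n": 0}  # counts how often the real work is done


@memoize
def expensive_lookup(user_id):
    """Hypothetical stand-in for an expensive per-user computation."""
    call_count["n"] += 1
    return "tags-for-%s" % user_id


user_ids = [1, 2, 1, 1, 2, 3]  # many rows, few distinct users
results = [expensive_lookup(u) for u in user_ids]
print(call_count["n"])
```

On Python 3.2 and later, functools.lru_cache offers the same kind of caching out of the box.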
3. Vectorized Operations (varying speedup, benchmarking necessary)
Python doesn't really have a notion of primitive types like other languages, and even integers take 24 bytes in memory on my machine. Lists usually aren't packed, so you can incur costly cache misses as you're plowing through them. No matter how little work the CPU is doing for sorting and whatnot, clobbering a whole new chunk of memory to turn your array into a list and only using that brand new, expensive memory once is going to incur a performance hit.
Many of the things you are trying to do have fast (SIMD vectorized, parallelized, memory-efficient, packed memory, and other fun optimizations) numpy equivalents AND avoid unnecessary array copies and type conversions. It seems you're already using numpy anyway, so you won't have any extra imports or dependencies.
As one example, zip() creates another list in memory in Python 2.x and still does unnecessary work in Python 3.x when you really only care about the indices of tfidf_feature_names. To compute those indices, you can use something like the following, which avoids an unnecessary list creation and uses an optimized routine with slightly better asymptotic complexity as an added bonus.
import numpy as np
def get_tfidf_token(user_id):
    temp = U2tfifd[user_id].flatten()
    ind = np.argpartition(temp, len(temp)-15)[-15:]  # indices of the 15 largest values, in no particular order
    return tfidf_feature_names[ind] # works if tfidf_feature_names is a numpy array
    # return [tfidf_feature_names[i] for i in ind] # always works
Depending on the shape of U2tfifd[user_id], you could avoid the costly .flatten() computation by passing an axis argument to np.argsort() and flattening the 15 obtained indices instead.
4. Bonus
The sorted() function supports a reverse argument so that you can avoid extra computations like throwing a negative on every value. Simply use
sorted(..., reverse=True)
Even better, since you really don't care about the sort itself but just the 15 largest values you can get away with
sorted(...)[-15:]
to index the largest 15 instead of reversing the sort and taking the first 15. That doesn't really matter if you're using a better function for the application like np.argpartition(), but it could be helpful in the future.
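If you want the top values without sorting everything and without numpy, the standard library's heapq.nlargest is another option. This is a different technique from the ones named above, shown as a minimal sketch; top_k_indices and the sample data are made up for the illustration.

```python
import heapq


def top_k_indices(scores, k=15):
    """Return the indices of the k largest scores, largest first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


# Hypothetical sample data standing in for one user's TFIDF row.
feature_names = ["apple", "banana", "cherry", "date"]
scores = [0.1, 0.9, 0.0, 0.5]

top = [feature_names[i] for i in top_k_indices(scores, k=2)]
print(top)
```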
You can also avoid some function calls by replacing .apply(lambda x : get_tfidf_token(x)) with .apply(get_tfidf_token) since get_tfidf_token is already a function which has the intended behavior. You don't really need the extra lambda.
As far as I can see though, most additional gains are fairly nitpicky and system-dependent. You can make most things faster with Cython or straight C with enough time for example, but you already have reasonably fast routines which do what you want out of the box. The extra engineering effort probably isn't worth any potential gains.

Most efficient way to determine if an element is in a list

So I have alist = [2,4,5,6,9,10], and b = 6. What is the more efficient way to determine if b is in alist?
if b in alist:
    print " b is in alist"
def split_list(alist,b):
    midpoint = len(alist)/2
    if b<=alist[midpoint]:
        alist =alist[:midpoint]
I thought method number 1 is better because it is only one line of code, but I've read that method 2 is better because it searches from the middle of the list rather than from the beginning.
The difference between the two approaches you have shown is in execution time. If you are sure that your list will always have more than 2 members, then function 2 is better, but not by much.
Here is how it works
Function 1
if b in alist:
    print " b is in alist"
This loops through the elements of the list looking for b and evaluates to True as soon as it finds it. But if your list has 200 members, the time taken can start to matter for your program.
Function 2
def split_list(alist,b):
    midpoint = len(alist)/2
    if b<=alist[midpoint]:
        alist =alist[:midpoint]
This does the same, except that you first test a condition against the midpoint to find out which half b might be in, so you loop over half of the list instead of all of it (this relies on the list being sorted). Note: this is only reasonable when the list has more than about 3 members. So it helps somewhat, but consider a list of 200 elements: is it really that helpful to halve it and still loop 100 times?
No! It still takes significant time!
My suggestion for your case is that if you work with small lists, function 1 is better. If you want to work with huge lists, here are some functions that use built-in list methods to solve your problem. Keep in mind that count() and index() still scan the list element by element, just as in does.
def is_inside(alist,b):
    how_many=alist.count(b) #return the number of times b appears in the list
    if how_many==0:
        return False
    return True
#you can also modify the function in case you want to check if an element appears more than once!
But if you don't need to know how many times an element appears, and a single occurrence is enough, here is another way using a built-in list method:
def is_inside(alist,b):
    try:
        which_position=alist.index(b) #this method raises ValueError if b is not in alist
        return True
    except ValueError:
        return False
So life becomes simpler when you use the built-in methods designed for lists. You should also consider reading up on which data structures perform well when lists get long: deques, stacks, queues, sets.
A good source is the documentation itself.
The expected way to find something in a list in python is using the in keyword. If you have a very large dataset, then you should use a data structure that is designed for efficient lookup, such as a set. Then you can still do a find via in.
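A short sketch of both ideas: a set for fast membership tests on large data, and the standard bisect module as a ready-made binary search when the list is already sorted. The helper name in_sorted_list is made up for this example.

```python
import bisect


def in_sorted_list(sorted_items, target):
    """Binary search: True if target occurs in the already-sorted list."""
    pos = bisect.bisect_left(sorted_items, target)
    return pos < len(sorted_items) and sorted_items[pos] == target


alist = [2, 4, 5, 6, 9, 10]
b = 6

# Plain list membership: simple, scans from the start.
print(b in alist)

# Set membership: build the set once, then each lookup is a hash lookup.
aset = set(alist)
print(b in aset)

# Binary search on the sorted list, without writing the halving logic by hand.
print(in_sorted_list(alist, b))
```

Building the set costs one pass over the data, so it pays off when you do many lookups against the same collection.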

Two different approaches to writing a function (map vs loop)

So, I'm busy with some code and have a function which basically takes a dictionary where each value is a list, and returns the key with the largest list.
I wrote the following:
def max_list(dic):
    if dic:
        l1 = dic.values()
        l1 = map(len, l1)
        l2 = dic.keys()
        return l2[l1.index(max(l1))]
    return None
Someone else wrote the following:
def max_list(dic):
    result = None
    maxValue = 0
    for key in dic.keys():
        if len(dic[key]) >= maxValue:
            result = key
            maxValue = len(dic[key])
    return result
Which would be the 'correct' way to do this, if there is one? I hope this is not regarded as community wiki (even though the code works); I'm trying to figure out which would be the best pattern for this problem.
Another valid option:
maxkey,maxvalue = max(d.items(),key=lambda x: len(x[1]))
Of the two above, I would probably prefer the explicit for loop as you don't generate all sorts of intermediate objects just to throw them away.
As a side note, This solution doesn't work particularly well for empty dicts ... (it raises a ValueError). Since I expect that is an unusual case (rather than the norm), it shouldn't hurt to enclose in a try-except ValueError block.
the most pythonic would be max(dic,key=lambda x:len(dic[x])) ... at least I would think ...
maximizing readability and minimizing lines of code is pythonic ... usually
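Putting the max-with-key idea together with the empty-dict case, a minimal sketch could look like this. It checks for emptiness explicitly rather than catching ValueError, which is just one of the two reasonable choices.

```python
def max_list(dic):
    """Return the key whose list value is longest, or None for an empty dict."""
    if not dic:
        return None
    return max(dic, key=lambda k: len(dic[k]))


example = {"a": [1], "b": [1, 2, 3], "c": [1, 2]}
print(max_list(example))
print(max_list({}))
```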
I think the question you should ask yourself is, what do you think the most important is: code maintainability or computation speed?
As the other answers point out, this problem has a very concise solution using a map. For most people this implementation would probably be easier to read than the implementation with a loop.
In terms of computational speed, the map solution would be less efficient, but still of the same order of magnitude.
Therefore, I think it is unlikely that the map method would ever have noticeably less performance. I would suggest you to use a profiler after your program is finished, so you can be sure where the real problem lies if your program turns out to run slower than desired.

get the flagged bit index in pure python

I have two questions here: one is 'how', and the second is 'does this solution sound OK?'
The thing is this: I have an object with an int value that stores the ids of all the persons that used that object. It's done using a flagging technique (person id is 0-10).
I got to a situation where, if this value is flagged with only one id, I want to get this id.
For the first test I used value & (value-1), which is nice, but as for the second thing, I started to wonder what's the best way to do it (the reason I wonder is that this calculation happens at least 300 times a second in a critical place).
So the 1st way I thought about is using math.log(x,2), but I feel a little uncomfortable with this solution since it involves "hard" math on a value instead of very simple bit operations, and I feel like I'm missing something.
The 2nd way I thought about is to count value>>1 until it reaches 1, but as you can see in the benchmark test, it was just worse.
The 3rd way I implemented is a non-calc way, and was the fastest: it uses a dictionary with all the possible values for ids 0-10.
So like I said before: is there a 'right' way of doing it in pure Python?
Is a dictionary-based solution a "legitimate" solution? (readability/any-other-reason-why-not-to?)
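import math
import time
def find_bit_using_loop(num,_):
    c = 0  # reconstructed: shift counter (line lost in extraction)
    while num!=1:
        num >>= 1  # reconstructed
        c += 1  # reconstructed
    return c
def find_bit_using_dict(num,_):
    return options[num]
def get_bit_idx(num, func):
    t = time.time()  # reconstructed: start of timing
    for i in xrange(100000):
        a = func(num, 2)  # reconstructed
    t = time.time() - t  # reconstructed
    #print a
    return t
options = {}  # reconstructed: maps each single-bit value to its index
for i in xrange(20):
    num = 1 << i  # reconstructed
    options[num] = i  # reconstructed
print "time using log:", get_bit_idx(num, math.log)
print "time using loop:", get_bit_idx(num, find_bit_using_loop)
print "time using dict:", get_bit_idx(num, find_bit_using_dict)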
time using log: 0.0450000762939
time using loop: 0.156999826431
time using dict: 0.0199999809265
(there's a very similar question here: return index of least significant bit in Python , but first, in this case i know there's only 1 flagged bit, and second, i want to keep it a pure python solution)
If you are using Python 2.7 or 3.1 and above, you can use the bit_length method on integers. It returns the number of bits necessary to represent the integer, i.e. one more than the index of the most significant bit:
>>> (1).bit_length()
1
>>> (4).bit_length()
3
>>> (32).bit_length()
6
This is probably the most Pythonic solution as it's part of the standard library. If you find that dict performs better and this is a performance bottleneck, I see no reason not to use it though.
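Combining the single-bit test from the question with bit_length gives a small helper like the one below. The name single_flag_index is made up for this sketch, and returning None for values with zero or several flags is an assumption about what the caller would want.

```python
def single_flag_index(value):
    """Return the index of the only set bit, or None unless exactly one bit is set."""
    # value & (value - 1) clears the lowest set bit; the result is 0 only for powers of two.
    if value <= 0 or value & (value - 1):
        return None
    return value.bit_length() - 1


print(single_flag_index(1))
print(single_flag_index(32))
print(single_flag_index(6))
```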

More efficient ways of doing this

for i in vr_world.getNodeNames():
    if i != "_error_":
        World[i] = vr_world.getChild(i)
vr_world.getNodeNames() returns me a gigantic list, vr_world.getChild(i) returns a specific type of object.
This is taking a long time to run. Is there any way to make it more efficient? I have seen one-liner for loops before that are supposed to be faster. Ideas?
kaloyan suggests using a generator. Here's why that may help.
If getNodeNames() builds a list, then your loop is basically going over the list twice: once to build it, and once when you iterate over the list.
If getNodeNames() is a generator, then your loop doesn't ever build the list; instead of creating the item and adding it to the list, it creates the item and yields it to the caller.
Whether or not this helps is contingent on a couple of things. First, it has to be possible to implement getNodeNames() as a generator. We don't know anything about the implementation details of that function, so it's not possible to say if that's the case. Next, the number of items you're iterating over needs to be pretty big.
Of course, none of this will have any effect at all if it turns out that the time-consuming operation in all of this is vr_world.getChild(). That's why you need to profile your code.
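The difference between the list and generator versions can be sketched as follows. Both get_node_names_list and iter_node_names are hypothetical stand-ins, since nothing is known about how getNodeNames() is really implemented; str.upper stands in for getChild().

```python
def get_node_names_list(count):
    """Hypothetical list version: builds every name before returning."""
    return ["node%d" % i for i in range(count)]


def iter_node_names(count):
    """Hypothetical generator version: yields one name at a time, no list is built."""
    for i in range(count):
        yield "node%d" % i


def build_world(names, get_child):
    """Same loop as in the question, working with any iterable of names."""
    world = {}
    for name in names:
        if name != "_error_":
            world[name] = get_child(name)
    return world


world_from_list = build_world(get_node_names_list(5), str.upper)
world_from_gen = build_world(iter_node_names(5), str.upper)
print(world_from_list == world_from_gen)
```

The consuming loop is identical in both cases; only the producer changes, which is why the switch is cheap to try if you control that function.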
I don't think you can make it faster than what you have there. Yes, you can put the whole thing on one line, but that will not make it any faster. The bottleneck is most likely getNodeNames(). If you can make it a generator, you will start populating the World dict with results sooner (if that matters to you), and if you make it filter out the "_error_" values, you will not have to deal with that at a later stage.
World = dict((i, vr_world.getChild(i)) for i in vr_world.getNodeNames() if i != "_error_")
This is a one-liner, but not necessarily much faster than your solution...
Maybe you can use a filter and a map, however I don't know if this would be any faster:
valid = filter(lambda i: i != "_error_", vr_world.getNodeNames())
World = dict(zip(valid, map(vr_world.getChild, valid)))
Also, as you'll see a lot around here, profile first, and then optimize, otherwise you may be wasting time. You have two functions there, maybe they are the slow parts, not the iteration.
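A minimal profiling sketch with the standard cProfile module is shown below, written for Python 3. FakeWorld is a hypothetical stand-in for vr_world, invented only so the example runs on its own; with the real object, the report would show whether getNodeNames() or getChild() dominates.

```python
import cProfile
import io
import pstats


class FakeWorld(object):
    """Hypothetical stand-in for vr_world, used only for this illustration."""

    def getNodeNames(self):
        return ["a", "_error_", "b", "c"]

    def getChild(self, name):
        # Pretend this lookup is the expensive part.
        return sum(range(10000)), name


def build_world(vr_world):
    world = {}
    for name in vr_world.getNodeNames():
        if name != "_error_":
            world[name] = vr_world.getChild(name)
    return world


profiler = cProfile.Profile()
World = profiler.runcall(build_world, FakeWorld())

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats()
print(report.getvalue())
```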
