cost=0
for i in range(12):
cost=cost+math.pow(float(float(q[i])-float(w[i])),2)
cost=(math.sqrt(cost))
Any faster alternative to this? i am need to improve my entire code so trying to improve each statements performance.
thanking u
In addition to the general optimization remarks that are already made (and to which I subscribe), there is a more "optimized" way of doing what you want: you manipulate arrays of values and combine them mathematically. This is a job for the very useful and widely used NumPy package!
Here is how you would do it:
q_array = numpy.array(q, dtype=float)
w_array = numpy.array(w, dtype=float)
cost = math.sqrt(((q_array-w_array)**2).sum())
(If your arrays q and w already contain floats, you can remove the dtype=float.)
This is almost as fast as it can get, since NumPy's operations are optimized for arrays. It is also much more legible than a loop, because it is both simple and short.
Just a hint, but usually real performance improvements come when you evaluate the code at a function or even higher level.
During a good evaluation, you may find whole blocks that code be thrown away or rewritten to simplify the process.
Profilers are useful AFTER you've cleaned up crufty not-very-legible code. irrespective of whether it's to be run once or N zillion times, you should not write code like that.
Why are you doing float(q[i]) and float(w[i])? What type(s) is/are the elements of q and `w'?
If x and y are floats, then x - y will be a float too, so that's 3 apparently redundant occurrences of float() already.
Calling math.pow() instead of using the ** operator bears the overhead of lookups on 'math' and 'pow'.
Etc etc
See if the following code gives the same answers and reads better and is faster:
costsq = 0.0
for i in xrange(12):
costsq += (q[i] - w[i]) ** 2
cost = math.sqrt(costsq)
After you've tested that and understood why the changes were made, you can apply the lessons to other Python code. Then if you have a lot more array or matrix work to do, consider using numpy.
Assuming q and w contain numbers the conversions to float are not necessary, otherwise you should convert the lists to a usable representation earlier (and separately from your calculation)
Given that your function seems to only be doing the equivalent of this:
cost = sum( (qi-wi)**2 for qi,wi in zip(q[:12],w) ) ** 0.5
Perhaps this form would execute faster.
Related
I have a vector of floats (coming from an operation on an array) and a float value (which is actually an element of the array, but that's unimportant), and I need to find the smallest float out of them all.
I'd love to be able to find the minimum between them in one line in a 'Pythony' way.
MinVec = N[i,:] + N[:,j]
Answer = min(min(MinVec),N[i,j])
Clearly I'm performing two minimisation calls, and I'd love to be able to replace this with one call. Perhaps I could eliminate the vector MinVec as well.
As an aside, this is for a short program in Dynamic Programming.
TIA.
EDIT: My apologies, I didn't specify I was using numpy. The variable N is an array.
You can append the value, then minimize. I'm not sure what the relative time considerations of the two approaches are, though - I wouldn't necessarily assume this is faster:
Answer = min(np.append(MinVec, N[i, j]))
This is the same thing as the answer above but without using numpy.
Answer = min(MinVec.append(N[i, j]))
All,
I am going to compute some feature values using the following python codes. But, because the input sizes are too big, it is very time-consuming. Please help me to optimize the codes.
leaving_volume=len([x for x in pickup_ids if x not in dropoff_ids])
arriving_volume=len([x for x in dropoff_ids if x not in pickup_ids])
transition_volume=len([x for x in dropoff_ids if x in pickup_ids])
union_ids=list(set(pickup_ids + dropoff_ids))
busstop_ids=[x for x in union_ids if self.geoitems[x].fare>0]
busstop_density=np.sum([Util.Geodist(self.geoitems[x].orilat, self.geoitems[x].orilng, self.geoitems[x].destlat, self.geoitems[x].destlng)/(1000*self.geoitems[x].fare) for x in busstop_ids])/len(busstop_ids) if len(busstop_ids) > 0 else 0
busstop_ids=[x for x in union_ids if self.geoitems[x].balance>0]
smartcard_balance=np.sum([self.geoitems[x].balance for x in busstop_ids])/len(busstop_ids) if len(busstop_ids) > 0 else 0
Hi, All,
Here is my revised version. I run this code on my GPS traces data. It is faster.
intersect_ids=set(pickup_ids).intersection( set(dropoff_ids) )
union_ids=list(set(pickup_ids + dropoff_ids))
leaving_ids=set(pickup_ids)-intersect_ids
leaving_volume=len(leaving_ids)
arriving_ids=set(dropoff_ids)-intersect_ids
arriving_volume=len(arriving_ids)
transition_volume=len(intersect_ids)
busstop_density=np.mean([Util.Geodist(self.geoitems[x].orilat, self.geoitems[x].orilng, self.geoitems[x].destlat, self.geoitems[x].destlng)/(1000*self.geoitems[x].fare) for x in union_ids if self.geoitems[x].fare>0])
if not busstop_density > 0:
busstop_density = 0
smartcard_balance=np.mean([self.geoitems[x].balance for x in union_ids if self.geoitems[x].balance>0])
if not smartcard_balance > 0:
smartcard_balance = 0
Many thanks for the help.
Just a few things I noticed, as some Python efficiency trivia:
if x not in dropoff_ids
Checking for membership using the in operator is more efficient on a set than a list. But iterating with for through a list is probably more efficient than on a set. So if you want your first two lines to be as efficient as possible you should have both types of data structure around beforehand.
list(set(pickup_ids + dropoff_ids))
It's more efficient to create your sets before you combine data, rather than creating a long list and constructing a set from it. Luckily you probably already have the set versions around now (see the first comment)!
Above all you need to ask yourself the question:
Is the time I save by constructing extra data structures worth the time it takes to construct them?
Next one:
np.sum([...])
I've been trained by Python to think of constructing a list and then applying a function that theoretically only requires a generator as a code smell. I'm not sure if this applies in numpy, since from what I remember it's not completely straightforward to pull data from a generator and put it in a numpy structure.
It looks like this is just a small fragment of your code. If you're really concerned about efficiency I'd recommend making use of numpy arrays rather than lists, and trying to stick within numpy's built-in data structures and function as much as possible. They are likely more highly optimized for raw data crunching in C than the built-in Python functions.
If you're really, really concerned about efficiency then you should probably be doing this data analysis straight-up in C. Especially if you don't have much more code than what you've presented here it might be pretty easy to translate over.
I can only support what machine yerning wrote in his this post. If you are thinking of switching to numpy so if your variables pickup_ids and dropoff_ids were numpy arrays (which maybe they already are else do:
dropoff_ids = np.array( dropoff_ids, dtype='i' )
pickup_ids = np.array( pickup_ids, dtype='i' )
then you can make use of the functions np.in1d() which will give you a True/False array which you can just sum over to get the total number of True entries.
leaving_volume = (-np.in1d( pickup_ids, dropoff_ids )).sum()
transition_volume= np.in1d( dropoff_ids, pickup_ids).sum()
arriving_volume = (-np.in1d( dropoff_ids, pickup_ids)).sum()
somehow I have the feeling that transition_volume = len(pickup_ids) - arriving_volume but I'm not 100% sure right now.
Another function that could be useful to you is np.unique() if you want to get rid of duplicate entries which in a way will turn your array into a set.
I have to solve many independent constrained linear least squares problems (with both bounds and constrains). So I do it multiple times in loops. For each problem I am looking for x that is min||Cx-d|| and x is bounded (in 0,1) and all x elements must sum to 1.
I am seeking a way of doing it fast, because although each optimization doesn't require a lot of time, I need to include it in a large loop.
For example my Matlab implementation seems like this:
img = imread('test.tif');
C = randi(255,6,4);
x=zeros(size(C,2),1);
tp = zeros(size(C,2),1);
Aeq = ones(1,size(C,2));
beq = 1;
options = optimset('LargeScale','off','Display','off');
A = (-1).*eye(size(C,2));
b = zeros(1,size(C,2));
result = zeros(size(img,1),size(img,2),size(C,2));
for i=1:size(img,1)
for j=1:size(img,2)
for k=1:size(img,3)
tp(k) = img(i,j,k);
end
x = lsqlin(C,tp,A,b,Aeq,beq,[],[],[],options);
for l=1:size(C,2)
result(i,j,l)=x(l);
end
end
end
and for a 500x500 loop it takes about 5 minutes. But my loops are much bigger than this. Any idea is welcome, but I would prefer a Matlab, Python or R solution.
As mentioned in the comments: The used commands are highly optimized, and switching languages is not going to make a world of difference.
First of all go back to what you are trying to achieve, perhaps to achieve that you don't need to solve this many hard problems.
If you come to the conclusion that you do want to do this exact calculation, consider changing the settings, and play around with the algorithm.
From doc lsqlin it appears the two main parameters that determine the speed are:
Number of iterations
Accepted tolerance
If all these things don't help enough, consider some tricks like:
Only solving problems that are sufficiently different
Solving with easier constraints, and rounding the result
Again I have a question concerning large loops.
Suppose I have a function
limits
def limits(a,b):
*evaluate integral with upper and lower limits a and b*
return float result
A and B are simple np.arrays that store my values a and b. Now I want to calculate the integral 300'000^2/2 times because A and B are of the length of 300'000 each and the integral is symmetrical.
In Python I tried several ways like itertools.combinations_with_replacement to create the combinations of A and B and then put them into the integral but that takes huge amount of time and the memory is totally overloaded.
Is there any way, for example transferring the loop in another language, to speed this up?
I would like to run the loop
for i in range(len(A)):
for j in range(len(B)):
np.histogram(limits(A[i],B[j]))
I think histrogramming the return of limits is desirable in order not to store additional arrays that grow squarely.
From what I read python is not really the best choice for this iterative ansatzes.
So would it be reasonable to evaluate this loop in another language within Python, if yes, How to do it. I know there are ways to transfer code, but I have never done it so far.
Thanks for your help.
If you're worried about memory footprint, all you need to do is bin the results as you go in the for loop.
num_bins = 100
bin_upper_limits = np.linspace(-456, 456, num=num_bins-1)
# (last bin has no upper limit, it goes from 456 to infinity)
bin_count = np.zeros(num_bins)
for a in A:
for b in B:
if b<a:
# you said the integral is symmetric, so we can skip these, right?
continue
new_result = limits(a,b)
which_bin = np.digitize([new_result], bin_upper_limits)
bin_count[which_bin] += 1
So nothing large is saved in memory.
As for speed, I imagine that the overwhelming majority of time is spent evaluating limits(a,b). The looping and binning is plenty fast in this case, even in python. To convince yourself of this, try replacing the line new_result = limits(a,b) with new_result = 234. You'll find that the loop runs very fast. (A few minutes on my computer, much much less than the 4 hour figure you quote.) Python does not loop very fast compared to C, but it doesn't matter in this case.
Whatever you do to speed up the limits() call (including implementing it in another language) will speed up the program.
If you change the algorithm, there is vast room for improvement. Let's take an example of what it seems you're doing. Let's say A and B are 0,1,2,3. You're integrating a function over the ranges 0-->0, 0-->1, 1-->1, 1-->2, 0-->2, etc. etc. You're re-doing the same work over and over. If you have integrated 0-->1 and 1-->2, then you can add up those two results to get the integral 0-->2. You don't have to use a fancy integration algorithm, you just have to add two numbers you already know.
Therefore it seems to me that you can compute integrals in all the smallest ranges (0-->1, 1-->2, 2-->3), store the results in an array, and add subsets of the results to get the integral over whatever range you want. If you want this program to run in a few minutes instead of 4 hours, I suggest thinking through an alternative algorithm along those lines.
(Sorry if I'm misunderstanding the problem you're trying to solve.)
I am new to python and my problem is the following:
I have defined a function func(a,b) that return a value, given two input values.
Now I have my data stored in lists or numpy arrays A,Band would like to use func for every combination. (A and B have over one million entries)
ATM i use this snippet:
for p in A:
for k in B:
value = func(p,k)
This takes really really a lot of time.
So i was thinking that maybe something like this:
C=(map(func,zip(A,B)))
But this method only works pairwise... Any ideas?
Thanks for help
First issue
You need to calculate the output of f for many pairs of values. The "standard" way to speed up this kind of loops (calculations) is to make your function f accept (NumPy) arrays as input, and do the calculation on the whole array at once (ie, no looping as seen from Python). Check any NumPy tutorial to get an introduction.
Second issue
If A and B have over a million entries each, there are one trillion combinations. For 64 bits numbers, that means you'll need 7.3 TiB of space just to store the result of your calculation. Do you have enough hard drive to just store the result?
Third issue
If A and B where much smaller, in your particular case you'd be able to do this:
values = f(*meshgrid(A, B))
meshgrid returns the cartesian product of A and B, so it's simply a way to generate the points that have to be evaluated.
Summary
You need to use NumPy effectively to avoid Python loops. (Or if all else fails or they can't easily be vectorized, write those loops in a compiled language, for instance by using Cython)
Working with terabytes of data is hard. Do you really need that much data?
Any solution that calls a function f 1e12 times in a loop is bound to be slow, specially in CPython (which is the default Python implementation. If you're not really sure and you're using NumPy, you're using it too).
suppose, itertools.product does what you need:
from itertools import product
pro = product(A,B)
C = map(lambda x: func(*x), pro)
so far as it is generator it doesn't require additional memory
One million times one million is one trillion. Calling f one trillion times will take a while.
Unless you have a way of reducing the number of values to compute, you can't do better than the above.
If you use NumPy, you should definitely look the np.vectorize function which is designed for this kind of problems...