Speed Up Program Below


I have written this for loop program below where I go through element by element of an array and do some math to those elements. Once the math is calculated it gets stored into another array.
for i in range(0, 1024):
    x[i] = a * data[i] + b * x[i-1] + c * x[i-2]
So in my program a, b, and c are just scalar numbers. Data and x are arrays. Data has an array size 1024 filled with numbers in each element. X is also an array size 1024 but it's filled with all zeros initially. In order to calculate the new elements of x I use the previous two elements of x. Initially the previous two are 0 and 0 since it takes the last two element from the x array of zeros. I multiply the current element of data by a, the last element of x by b, and the second to last element of x by c. Then I add everything up and save it to the current element of x. Then I do the same thing for every element in data and x.
This loop works, but I was wondering if there is a faster way to do it. Maybe using a combination of numpy functions like cumsum or a dot product? Can someone help me make the program faster? Thank you!

The best you can do while keeping the recursive loop:
x = a * data
x[1] += b * x[0]  # x[1] has only one earlier element to draw on
coef = np.array([c, b])
for i in range(2, 1024):
    x[i] += np.dot(coef, x[i-2:i])
Even better, you can solve this recurrence for a closed-form solution and apply it directly without a loop. (This is a basic second-order linear recurrence.)
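One loop-free route, sketched here under the assumption that SciPy is installed (none of the answers mention it): the recurrence has the same form as a second-order recursive (IIR) filter, so scipy.signal.lfilter should be able to evaluate it with zero initial state. The function names below are hypothetical and only serve the comparison.

```python
import numpy as np
from scipy.signal import lfilter


def recurrence_loop(data, a, b, c):
    """Reference implementation: the explicit loop from the question."""
    x = np.zeros(len(data))
    for i in range(len(data)):
        # For i = 0 and i = 1 the negative indices hit elements that are still zero
        x[i] = a * data[i] + b * x[i - 1] + c * x[i - 2]
    return x


def recurrence_lfilter(data, a, b, c):
    """Same recurrence expressed as a filter.

    lfilter(num, den, data) computes
    den[0]*y[n] = num[0]*data[n] - den[1]*y[n-1] - den[2]*y[n-2],
    so den = [1, -b, -c] reproduces y[n] = a*data[n] + b*y[n-1] + c*y[n-2].
    """
    return lfilter([a], [1.0, -b, -c], data)
```

Both functions take the data array and the three scalars, so swapping one for the other needs no other change to the program.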

In general, if you want a program that is fast, Python is not the best option. Python is great for prototyping since it is easy and has a lot of tools, but it is not very computationally efficient in its raw form compared with, for example, C. What I usually do is use Cython, a module for Python that translates your script to C and compiles it to machine code, which can greatly increase the speed of the application.
It lets you declare static types for variables, for example:
cdef double a, b, c
When you use a variable in Python, its type (int, float, string, etc.) has to be checked every single time it is used. In C that is not an issue, since you decide from the start what type the variable has, which reduces the time each operation takes.

I would try to transform the for loop into a list comprehension, which often runs faster than an explicit loop in Python.

Related

What is the fastest way to extract a sub-array from a numpy 2d array?

I would like to know the fastest way to extract a sub array from a very large numpy array.
I have an algorithm that needs to run in real time and I often have to extract a sub array which is very time consuming.
Here is how it is currently done:
array[max(0,y-q):max(0,y+q+1),max(0,x-q):max(0,x+q+1)]
This selects a (2q+1) x (2q+1) array centered on (x, y) from the original one.
In most cases I use q=6.
Is there a way to make it faster?
EDIT:
Here is the code using it
res_1 = np.mean(arr_1[max(0,y-q):max(0,y+q+1),max(0,x-q):max(0,x+q+1)])
res_2 = np.mean(arr_2[max(0,y-q):max(0,y+q+1),max(0,x-q):max(0,x+q+1)])
if (arr_0[y, x] - res_1)>0.035 and (arr_0[y, x] - res_2)>0.035:
    return True
else:
    return False

I did a quick benchmark with the timeit module. It is as I expected: Creating the subarray consumes almost no time. The mean computation takes all the time. Your real question should be, "How do I improve the mean computation here?"
Interestingly enough, mean runs faster in double precision rather than single precision. I guess the mean function always works in double precision internally.
You could also use array.sum() instead of mean. You know the sub-array size N, so with res_1 holding the sum rather than the mean, your comparison could be rewritten as (arr_0[y, x] * N - res_1) > 0.035 * N and so on for res_2. I have not benchmarked this.
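A minimal sketch of that sum-based comparison, assuming the three arrays share one shape. The helper name and its parameters are hypothetical, and like the suggestion itself it has not been benchmarked here.

```python
import numpy as np


def exceeds_local_mean(arr_0, arr_1, arr_2, y, x, q=6, thresh=0.035):
    """Hypothetical helper: compare arr_0[y, x] with window sums instead of means."""
    # Basic slicing returns a view, so extracting the window copies no data
    rows = slice(max(0, y - q), y + q + 1)
    cols = slice(max(0, x - q), x + q + 1)
    win_1 = arr_1[rows, cols]
    win_2 = arr_2[rows, cols]
    n = win_1.size  # fewer than (2q+1)**2 elements when the window is clipped at a border
    centre = arr_0[y, x] * n
    above_1 = centre - win_1.sum() > thresh * n
    above_2 = centre - win_2.sum() > thresh * n
    return bool(above_1 and above_2)
```

Reading the element count from the window itself keeps the comparison valid near the array borders, where the slice is clipped.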

Best way to create a loop for multiplying a matrix by every one of its elements, then summing the results

Very new to Python, so apologies for the lack of vocabulary/knowledge. I would like to know if there is a better way to achieve what the code below does. Using the loop I have made, I can generate and append all of the matrices/arrays formed from multiplying matrix A by each and every element within A. The last line of code then sums all of the elements in this array of arrays and prints out the result I want.
The problem is, when I get to about d = 600, I get SIGKILL errors, due to a lack of memory on my computer.
I have considered the mathematics behind it, which included breaking the summation into parts that dealt with different values of indices, but nothing seems to speed it up significantly.
This may be purely a memory-based issue, but I thought I would ask in case there are any Python/code based tips that could help. The code is as follows:
A = numpy.random.randint(0, 4, size=(d, d))
All = []
for n in range(0, d):
    for m in range(0, d):
        All.append(A*(A[n,m]))
print(numpy.sum(All))
So overall, I achieve the correct result, but due to the large size of the matrices and the number of multiplications, I cannot achieve the required d = 2000 I am looking for without a memory error. Thanks in advance.

You don't need to do looping here and building a new list if all you want is the total sum... what you're doing mathematically comes down to:
total = A.sum() ** 2
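The identity follows because each product A * A[n, m] sums to A[n, m] * A.sum(), and adding that over all n and m gives A.sum() squared. A small check of this, using an illustrative d that is small enough for the brute-force version:

```python
import numpy as np

d = 20  # illustrative size, small enough for the brute-force loop
rng = np.random.default_rng(0)
A = rng.integers(0, 4, size=(d, d))

# Brute force: add up the sum of every A * A[n, m] without storing the products
brute = 0
for n in range(d):
    for m in range(d):
        brute += int((A * A[n, m]).sum())

# Closed form: a single reduction over A, no d*d intermediate matrices
total = int(A.sum()) ** 2
```

The closed form needs memory for A only, so d = 2000 should no longer be a problem.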

Fast way to construct a matrix in Python

I have been browsing through the questions, and could find some help, but I prefer having confirmation by asking it directly. So here is my problem.
I have a (numpy) array u with N entries, from which I want to build a square N x N matrix k. Each matrix element k(i,j) is defined as k(i,j)=exp(-|u_i-u_j|^2).
My first naive way to do it was like this, which is, I believe, Fortran-like:
for i in range(N):
    for j in range(N):
        k[i][j]=np.exp(np.sum(-(u[i]-u[j])**2))
However, this is extremely slow. For N=1000, for example, it is taking around 15 seconds.
My other way to proceed is the following (inspired by other questions/answers):
i, j = np.ogrid[:N,:N]
k = np.exp(np.sum(-(u[i]-u[j])**2,axis=2))
This is way faster, as for N=1000, the result is almost instantaneous.
So I have two questions.
1) Why is the first method so slow, and why is the second one so fast ?
2) Is there a faster way to do it ? For N=10000, it is starting to take quite some time already, so I really don't know if this was the "right" way to do it.
Thank you in advance !
P.S: the matrix is symmetric, so there must also be a way to make the process faster by calculating only the upper half of the matrix, but my question was more related to the way to manipulate arrays, etc.

First, a small remark: there is no need for np.sum if u is one-dimensional, for example u = np.arange(N), which seems to be the case since you wrote that it is of dimension N.
1) First question:
Indexing element by element in Python is slow, so it is best to avoid [] inside loops when you can. In addition, the first version calls np.exp and np.sum once per element, whereas both can operate on whole vectors and matrices. Your second proposal is better because it computes k all at once instead of element by element.
2) Second question:
Yes, there is. Use only numpy functions and no index arrays (around 3 times faster):
k = np.exp(-np.power(np.subtract.outer(u,u),2))
(NB: You can keep **2 instead of np.power, which is a bit faster but has smaller precision)
edit (Take into account that u is an array of tuples)
With tuple data, it's a bit more complicated:
ma = np.subtract.outer(u[:,0],u[:,0])**2
mb = np.subtract.outer(u[:,1],u[:,1])**2
k = np.exp(-np.add(ma, mb))
You have to call np.subtract.outer twice, because calling it once on the full array returns a 4-dimensional array (and computes lots of useless data), whereas u[i]-u[j] returns a 3-dimensional array.
I used np.add instead of np.sum since it keeps the array dimensions.
NB: I checked with
N = 10000
u = np.random.random_sample((N,2))
It returns the same as your proposals (but 1.7 times faster).
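For comparison, a sketch that checks a vectorized version against the double loop from the question. It uses plain broadcasting rather than np.subtract.outer, assumes u has shape (N, D), and the function names are hypothetical.

```python
import numpy as np


def kernel_loop(u):
    """Reference: the element-by-element double loop from the question."""
    n = len(u)
    k = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            k[i, j] = np.exp(-np.sum((u[i] - u[j]) ** 2))
    return k


def kernel_broadcast(u):
    """Pairwise kernel via broadcasting; u has shape (N, D)."""
    diff = u[:, None, :] - u[None, :, :]        # shape (N, N, D)
    return np.exp(-np.sum(diff ** 2, axis=-1))  # shape (N, N)
```

The intermediate diff array holds N*N*D values, so for N = 10000 and D = 2 it needs about 1.6 GB in double precision. Summing one coordinate at a time, as in the answer above, avoids that intermediate.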

Optimizing a nested for loop

I'm trying to avoid using for loops to run my calculations, but I don't know how to do it. I have a matrix w with shape (40,100). Each row holds the position of a wave at a time t. For example, the first row w[0] is the initial condition (as is w[1], for reasons that I will show).
To calculate the elements of the next row I use, for every t and x in the shape range:
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
Where a and b are some constants based on equation solution (it really doesn't matter), a = 2(1-r), b=r, r=(c*(dt/dx))**2. Where c is the wave speed and dt, dx are related to the increment on x and t direction.
Is there any way to avoid a for loop like:
for t in range(1,nt-1):
    for x in range(1,nx-1):
        w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
nt and nx are the shape of w matrix.

I assume you're setting w[:,0] and w[:,-1] beforehand (to some constants?) because I don't see it in the loop.
If so, you can eliminate the for x loop by vectorizing this part of the code:
for t in range(1,nt-1):
    w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]
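A runnable sketch that puts the nested loops and the sliced version side by side so the two can be compared on the same input. The function names are hypothetical. The loop over t has to stay, because each row depends on the two rows before it.

```python
import numpy as np


def wave_nested_loops(w, a, b):
    """Original scheme: explicit loops over time steps and interior points."""
    nt, nx = w.shape
    for t in range(1, nt - 1):
        for x in range(1, nx - 1):
            w[t + 1, x] = a * w[t, x] + b * (w[t, x - 1] + w[t, x + 1]) - w[t - 1, x]
    return w


def wave_sliced(w, a, b):
    """Same update, with the inner loop replaced by whole-row slices."""
    nt, nx = w.shape
    for t in range(1, nt - 1):
        # 1:-1 is the interior, :-2 the left neighbours, 2: the right neighbours
        w[t + 1, 1:-1] = a * w[t, 1:-1] + b * (w[t, :-2] + w[t, 2:]) - w[t - 1, 1:-1]
    return w
```

Both functions modify w in place and leave the boundary columns w[:, 0] and w[:, -1] untouched, so those still have to be set beforehand.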

Not really. If you want to do something for every element in your matrix (which you do), you're going to have to operate on each element in some way or another (most obvious way is with a for loop. Less obvious methods will either perform the same or worse).
If you're trying to avoid loops because loops are slow, know that sometimes loops are necessary to solve a certain kind of problem. However, there are lots of ways to make loops more efficient.
Generally with matrix problems like this where you're looking at the neighboring elements, a good solution is using some kind of dynamic programming or memoization (saving your work so you don't have to repeat calculations frequently). Like, suppose for each element you wanted to take the average of it and all the things around it (this is how blurring images works). Each pixel has 8 neighbors, so the average will be the sum / 9. Well, let's say you save the sums of the columns (save NW + W + SW, N + me + S, NE + E + SE). Well when you go to the next one to the right, just sum the values of your previous middle column, your previous last column, and the values of a new column (the new ones to the right). You just replaced adding 9 numbers with adding 5. In operations that are more complicated than addition, reducing 9 to 5 can mean a huge performance increase.
I looked at what you have to do and I couldn't think of a good way to do something like I just described. But see if you can think of something similar.
Also, remember multiplication is much more expensive than addition. So if you had a loop where, for instance, you had to multiply some number by the loop variable, instead of doing 1x, 2x, 3x, ..., you could do (value last time + x).
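The column-sum idea from the blurring example can be sketched as follows. This illustrates that example only, not the wave update, and the function names are hypothetical. Dividing either result by 9 gives the average.

```python
import numpy as np


def neighbourhood_sum_naive(img):
    """Sum each interior pixel's 3x3 neighbourhood by adding all nine values."""
    h, w = img.shape
    out = np.empty((h - 2, w - 2), dtype=img.dtype)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r - 1, c - 1] = img[r - 1:r + 2, c - 1:c + 2].sum()
    return out


def neighbourhood_sum_columns(img):
    """Same result, but each vertical three-pixel sum is computed once and reused."""
    col = img[:-2] + img[1:-1] + img[2:]            # column sums, shape (h-2, w)
    return col[:, :-2] + col[:, 1:-1] + col[:, 2:]  # add three adjacent column sums
```

Each saved column sum is shared by three neighbouring output pixels, which is where the reduction in additions comes from.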

Allowing for deviations in exact values during matrix multiplication, python

I need to solve this:
Check if AT * n * A = n, where A is the test matrix, AT is the transposed test matrix and n = [[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,-1]].
I don't know how to check for equality due to the numerical errors in the float multiplication. How do I go about doing this?
Current code:
def trans(A):
    n = numpy.matrix([[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,-1]])
    c = numpy.matrix.transpose(A) * n * numpy.matrix(A)
I then tried:
    if c == n:
        return True
I have also tried assigning variables to every element of matrix and then checking that each variable is within certain limits.

Typically, the way that numerical-precision limitations are overcome is by allowing for some epsilon (or error-value) between the actual value and expected value that is still considered 'equal'. For example, I might say that some value a is equal to some value b if they are within plus/minus 0.01. This would be implemented in python as:
def float_equals(a, b, epsilon):
    return abs(a-b)<epsilon
Of course, for matrices entered as lists, this isn't quite so simple. We have to check that every value is within epsilon of its partner. One example solution would be as follows, assuming your matrices are standard python lists:
from itertools import product # need this to generate indexes
def matrix_float_equals(A, B, epsilon):
    return all(abs(A[i][j]-B[i][j])<epsilon for i,j in product(range(len(A)), repeat = 2))
all returns True if and only if every value in an iterable is true (a logical and over the whole sequence). product computes the Cartesian product of its inputs, and the repeat keyword pairs an iterable with itself. Given a range repeated twice, it therefore produces every (i, j) index tuple. Of course, this method of index generation assumes square, equally sized matrices. For non-square matrices you have to get more creative, but the idea is the same.
However, as is typically the way in python, there are libraries that do this kind of thing for you. Numpy's allclose does exactly this; compares two numpy arrays for equality element-wise within some tolerance. If you're working with matrices in python for numeric analysis, numpy is really the way to go, I would get familiar with its basic API.

If a and b are numpy arrays or matrices of the same shape, then you can use allclose:
if numpy.allclose(a, b):  # a is approximately equal to b
    # do something ...
This checks that for all i and all j, |aij - bij| <= εa + εr |bij|, where εa is an absolute tolerance (atol, by default 1e-8) and εr is a relative tolerance (rtol, by default 1e-5). It therefore tolerates the small numerical errors that your calculations introduce.
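Applied to the check from the question, a minimal sketch could look like this. The function name and the boost matrix used as test input are illustrative assumptions, and plain arrays with the @ operator are used in place of numpy.matrix.

```python
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])  # the matrix n from the question


def preserves_metric(A, rtol=1e-5, atol=1e-8):
    """Return True if A.T @ ETA @ A equals ETA within the given tolerances."""
    A = np.asarray(A, dtype=float)
    return bool(np.allclose(A.T @ ETA @ A, ETA, rtol=rtol, atol=atol))


# Illustrative input: a boost along x with speed v, in units where c = 1
v = 0.6
g = 1.0 / np.sqrt(1.0 - v ** 2)
boost = np.array([
    [g, -g * v, 0.0, 0.0],
    [-g * v, g, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

print(preserves_metric(boost))          # a valid boost should pass
print(preserves_metric(2 * np.eye(4)))  # a plain scaling should fail
```

Most entries of n are zero, and for those the relative term vanishes, so atol alone decides how much rounding error is accepted there.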
