I was wondering why I get different results in the two prints. Shouldn't they be the same?
import numpy as np

x = np.array([[1.5, 2], [2.4, 6]])
k = np.copy(x)
for i in range(len(x)):
    for j in range(len(x[i])):
        k[i][j] = 1 / (1 + np.exp(-x[i][j]))
        print("K[i][j]:" + str(k[i][j]))
        print("Value:" + str(1 / (1 + np.exp(-x[i][j]))))
I've just run your code with Python 3 and Python 2, and the results were exactly the same.
Besides, you don't have to loop at all. NumPy arrays let you express many kinds of data processing tasks as concise array expressions that would otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations are often one or two (or more) orders of magnitude faster than their pure-Python equivalents, with the biggest impact in numerical computations.
So, keeping all this in mind, you may rewrite your code as follows:
import numpy as np

x = np.array([[1.5, 2], [2.4, 6]], dtype=np.float64)
k = 1 / (1 + np.exp(-x))
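As a quick sanity check, here is a minimal sketch reusing x and k from the snippets above; exact bitwise equality is not guaranteed in general, so a tolerance-based comparison is used:

# Compare the vectorized sigmoid against the element-wise loop
k_loop = np.copy(x)
for i in range(len(x)):
    for j in range(len(x[i])):
        k_loop[i][j] = 1 / (1 + np.exp(-x[i][j]))
print(np.allclose(k_loop, k))  # True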
When I ran this script, the two prints showed the same results. This was with Python 3.5.2.
K[i][j]:0.817574476194
Value:0.817574476194
K[i][j]:0.880797077978
Value:0.880797077978
K[i][j]:0.916827303506
Value:0.916827303506
K[i][j]:0.997527376843
Value:0.997527376843
I have written a function which takes an N by N array and computes an output array based on it.
Here is what my code looks like:
import numpy as np

def calculate_output(input, N):
    output = np.zeros((N, N))
    for y in range(N):
        for x in range(N):
            val1 = 0 if y - 1 < 0 else output[y - 1][x] + input[y][x]
            val2 = 0 if x - 1 < 0 else output[y][x - 1] + input[y][x]
            output[y][x] = max(val1, val2)
    return output

N = 10000
input = np.reshape(np.random.binomial(1, [0.25] * N * N), (N, N))
output = calculate_output(input, N)
However, this computation is not fast enough and takes about 300 seconds on my machine (compared to 3 seconds when implemented in C++).
Is there any way to improve this without writing a C extension?
I have tried using PyPy, but in this case the code is even slower.
CPython is very slow because it is an interpreter and it clearly cannot compete with C and C++ in such a case. The usual approach to reduce the cost of the interpreter is to avoid loops as much as possible and use few Numpy vectorized calls instead. However in this case, it is barely possible to write an efficient implementation using Numpy vectorized calls.
On the other hand, PyPy is often much better for numerical code because of its JIT compilation. But its Numpy support is not great, mainly because PyPy relies on a reimplementation of Numpy written in Python that is not as good as the native Numpy implementation, and the native implementation cannot run efficiently because of the way Python extension modules are currently handled. To put it shortly, AFAIK, the PyPy JIT cannot optimize Numpy accesses with the native implementation. As a result, the JIT can end up slower than the CPython interpreter in your case.
However, you can speed up the code a lot using the Numba JIT compiler, which has been written for exactly this use case. Moreover, a few optimizations can be applied to speed up the code even more (whatever the programming language used):
conditionals are generally slow; you can hoist them out of the main loops by handling the borders in separate loops
writing zeros into the output matrix first is not required and is actually slower (np.empty is enough)
using direct 2D indexing (output[y, x]) is cleaner and likely a bit faster
integers can be used instead of floating-point numbers, since the output contains only integers and integer arithmetic is faster than the equivalent floating-point operations.
import numpy as np
import numba as nb

@nb.njit(['int32[:,::1](int32[:,::1],int32)', 'int64[:,::1](int64[:,::1],int64)'])
def calculate_output(input, N):
    output = np.empty((N, N), input.dtype)
    # First row: only the left neighbour can exist
    for x in range(0, N):
        val2 = 0 if x - 1 < 0 else output[0, x - 1] + input[0, x]
        output[0, x] = max(0, val2)
    # First column: only the top neighbour exists
    for y in range(1, N):
        val1 = output[y - 1, 0] + input[y, 0]
        output[y, 0] = max(val1, 0)
    # Interior: both neighbours always exist, so no conditionals are needed
    for y in range(1, N):
        for x in range(1, N):
            val1 = output[y - 1, x] + input[y, x]
            val2 = output[y, x - 1] + input[y, x]
            output[y, x] = max(val1, val2)
    return output
The resulting calculate_output call is 730 times faster on my machine.
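A hypothetical usage sketch, mirroring the original driver but with a smaller N for a quick check (the first call includes JIT compilation time):

N = 1000
inp = np.random.binomial(1, 0.25, size=(N, N)).astype(np.int64)
out = calculate_output(inp, N)  # subsequent calls run at compiled speed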
I am trying to maximize computation performance using numpy (i.e., remove the Python for loop). Here is my initial implementation:
import numpy as np

np.random.seed(128)
l = []
for i in range(1000):
    v = np.random.randn(7)
    l.append(np.linalg.norm(v))
l = np.array(l)
l
The above code simply takes the Frobenius norm of a vector of size 7 and appends it to a list. This is repeated 1000 times. To remove the for loop, I construct a matrix of size (1000, 7) and then take the norm of the matrix with axis=1, as shown below.
np.random.seed(128)
v = np.random.randn(1000, 7)
v = np.linalg.norm(v, axis=1)
However, when I check for equality of l to v with np.all(l == v), it outputs False. I don't understand why numpy behaves in such a way. I checked the dtype of the values in v and l, and both are np.float64.
You can read the following issue, where it is said:
numpy in general does not guarantee that semantically equivalent operations like this will produce identical results. Even operations like sum can produce different results depending on memory layout (and this is on purpose -- making them identical all the time would require either big slowdowns or intentionally reducing precision).
So this is where the difference lies: you should not expect identical results, only results that agree up to a tolerance. The simplest way to compare them is the one suggested by Divakar:
np.allclose(l,v)
Another possible option is:
np.array_equal(np.round(l,12),np.round(v,12))
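As an illustration (a minimal sketch using the l and v computed above), the discrepancy is on the order of rounding noise:

diff = np.abs(l - v)
print(diff.max())         # tiny, roughly machine-epsilon sized
print(np.allclose(l, v))  # True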
(Python 2)
My for-loop is this:
vx2 = []
vy2 = []
vz2 = []
for xn in range(0, npoints - 2):
    vx11 = vx1[xn] + .5 * (fxxx_list[xn] + fxxx_list[xn + 1]) * dt
    vy11 = vy1[xn] + .5 * (fxxx_list[xn] + fxxx_list[xn + 1]) * dt
    vz11 = vz1[xn] + .5 * (fxxx_list[xn] + fxxx_list[xn + 1]) * dt
    vx2.append(vx11)
    vy2.append(vy11)
    vz2.append(vz11)
print vx2, vy2, vz2
My prof told me I could speed this up by replacing my for-loops with operations on NumPy arrays, but I found that multiplying non-integers and mixing additions and multiplications in the same NumPy expression was inefficient. Is there an elegant way to write this using NumPy instead of a for-loop?
I've already tried this:
# number of iterations
xn = n1[0:998]
array = np.array(xn)
vxn = vx1[0:998]
vyn = vy1[0:998]
vzn = vz1[0:998]
vvv = np.multiply(dt, fxxx_list)
vx2 = vxn + vvv
vy2 = vyn + vvv
vz2 = vzn + vvv
But I couldn't get my algorithm entirely correct, and as you can see it's kind of a mess and takes just as long as the for-loop.
Try this:
fxxx_list = np.array(fxxx_list)
vx = np.array([vx1, vy1, vz1])  # stack the three velocity components (vz1, not vy2)
vx11, vy11, vz11 = vx + (fxxx_list[:-2] + fxxx_list[1:-1]) / 2 * dt
If the arrays vx1, ... are also of length npoints, then you should use
vx = np.array([vx1, vy1, vz1])[:, :-2]
NOTE: I'm assuming that vx1, ... are all of the same length.
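A minimal self-contained sketch of this approach, with made-up values for npoints, dt, fxxx_list and the velocity arrays (all of these are assumptions, not the asker's actual data):

import numpy as np

npoints = 1000
dt = 0.01
fxxx_list = np.random.randn(npoints)
vx1 = np.random.randn(npoints)
vy1 = np.random.randn(npoints)
vz1 = np.random.randn(npoints)

v = np.array([vx1, vy1, vz1])[:, :npoints - 2]       # shape (3, npoints - 2)
dv = 0.5 * (fxxx_list[:-2] + fxxx_list[1:-1]) * dt   # shape (npoints - 2,)
vx2, vy2, vz2 = v + dv                                # broadcasts dv over the three rows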
I have some code that was originally written in C (by someone else) using C-style malloc arrays. I later converted a lot of it to C++ style, using vector<vector<vector<complex>>> arrays for consistency with the rest of my project. I never timed it, but both methods seemed to be of similar speed.
I recently started a new project in Python, and I wanted to use some of this old code. Not wanting to move data back and forth between projects, I decided to port the old code into Python so that it's all in one project. I naively typed up all of the code in Python syntax, replacing any arrays in the old code with numpy arrays (initialising them like this: array = np.zeros(list((1024, 1024)), dtype=complex)). The code works fine, but it is excruciatingly slow. If I had to guess, I would say it's on the order of 1000 times slower.
Now having looked into it, I see that a lot of people say numpy is very slow for element-wise operations. While I have used some of the numpy functions for common mathematical operations, such as FFTs and matrix multiplication, most of my code involves nested for loops. A lot of it is pretty complicated and doesn't seem to me to be amenable to reducing to simple array operations that are faster in numpy.
So, I'm wondering if there is an alternative to numpy that is faster for these kinds of calculations. The ideal scenario would be a module that I can import with much of the same functionality, so I don't have to rewrite much of my code (i.e., something that can do FFTs and initialises arrays in the same way, etc.), but failing that, I would be happy with something that I could at least use for the more computationally demanding parts of the code, casting back and forth between numpy arrays as needed.
cpython arrays sounded promising, but a lot of benchmarks I've seen don't show enough of a difference in speed for my purposes. To give an idea of the kind of thing I'm talking about, this is one of the methods that is slowing down my code. This is called millions of times, and the vz_at() method contains a lookup table and does some interpolation to give the final return value:
def tra(self, tr, x, y, z_number, i, scalex, idx, rmax2, rminsq):
    M = 1024
    ixo = int(x[i] / scalex)
    iyo = int(y[i] / scalex)
    nx1 = ixo - idx
    nx2 = ixo + idx
    ny1 = iyo - idx
    ny2 = iyo + idx
    for ix in range(nx1, nx2 + 1):
        rx2 = x[i] - float(ix) * scalex
        rx2 = rx2 * rx2
        ixw = ix
        while ixw < 0:
            ixw = ixw + M
        ixw = ixw % M
        for iy in range(ny1, ny2 + 1):
            rsq = y[i] - float(iy) * scalex
            rsq = rx2 + rsq * rsq
            if rsq <= rmax2:
                iyw = iy
                while iyw < 0:
                    iyw = iyw + M
                iyw = iyw % M
                if rsq < rminsq:
                    rsq = rminsq
                vz = P.vz_at(z_number[i], rsq)
                tr[ixw, iyw] += vz
In all, there are a couple of thousand lines of code; this is just a small snippet to give an example. To be clear, a lot of my arrays are 1024x1024x1024 or 1024x1024 and are complex-valued. Others are one-dimensional arrays on the order of a million elements. What's the best way to speed these element-wise operations up?
For information, some of your code can be made more concise and thus a bit more readable. For instance:
array = np.zeros(list((1024, 1024)), dtype=complex)
can be written
array = np.zeros((1024, 1024), dtype=complex)
As you are trying out Python, this is at least a nice benefit :-)
Now, for your problem there are several solutions in the current Python scientific landscape:
Numba is a just-in-time compiler for Python that is dedicated to array processing, achieving good performance where NumPy hits its limits (a short sketch follows this list).
Pros: Little to no modification of your code as you just write plain Python, shows good performance in many situations. Numba should recognize some NumPy operations to avoid a Numba->Python->NumPy slowdown.
Cons: Can be tedious to install and hence to distribute Numba-based code.
Cython is a mix of Python and C to generate compiled functions. You can start from a pure Python file and accelerate the code via type annotations and the use of some "C"-isms.
Pros: stable, widely used, relatively easy to distribute Cython-based code.
Cons: need to rewrite the performance critical code, even if only in part.
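For illustration only, here is a minimal Numba sketch on a made-up nested-loop kernel (the function, data, and sizes are hypothetical, not your actual tra method):

import numpy as np
import numba as nb

@nb.njit  # compiled to native code on the first call
def accumulate(tr, xs, ys, vals, M):
    # Scatter complex values onto a periodically wrapped 2D grid
    for k in range(xs.shape[0]):
        ix = xs[k] % M
        iy = ys[k] % M
        tr[ix, iy] += vals[k]
    return tr

M = 1024
tr = np.zeros((M, M), dtype=np.complex128)
xs = np.random.randint(-10, M + 10, size=100000)
ys = np.random.randint(-10, M + 10, size=100000)
vals = np.random.randn(100000) + 1j * np.random.randn(100000)
accumulate(tr, xs, ys, vals, M)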
As an additional hint, Nicolas Rougier (a French scientist) wrote an online book on many situations where you can make use of NumPy to speed up Python code: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
Hi, I am running scientific computing code using numpy + numba.
I've realized that in-place numpy array addition is very slow... compared to MATLAB.
Here is the MATLAB code:
tic;
% A,B are 2-d matrices, ind may not be distinct
for ii = 1:N
    A(ind(ii),:) = A(ind(ii),:) + B(ii,:);
end
toc;
And here is the numpy code:
s = time.time()
# A, B are numpy.ndarray, ind may not be distinct
for k in xrange(N):
    A[ind[k], :] += B[k, :]
print time.time() - s
The result shows that the numpy code is 10x slower than the MATLAB code... which confuses me a lot.
Moreover, when I pull the addition out of the for loop and just compare a single matrix addition using numpy.add, numpy and MATLAB seem comparable in speed.
One factor I know of is that MATLAB uses a JIT for versions >= 2012a to speed up for loops, but when I tried numba on the Python code, it did not speed things up even a bit. I think this is because numba does not touch the numpy.add function at all, so the performance does not change.
I am guessing that MATLAB does some clever caching in this case, hence it beats numpy dramatically.
Any suggestion on how to speed up numpy ?
Try
A[ind] += B[:N]
i.e. without any loop.
If ind could have duplicate elements, you can use np.add.at:
np.add.at(A, ind, B[:N])
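A small toy demonstration (made-up data) of why np.add.at matters when ind contains duplicates:

import numpy as np

A = np.zeros((3, 2))
B = np.ones((4, 2))
ind = np.array([0, 0, 1, 2])   # index 0 appears twice

A1 = A.copy()
A1[ind] += B                   # buffered: the two updates to row 0 collapse into one
A2 = A.copy()
np.add.at(A2, ind, B)          # unbuffered: both updates to row 0 are applied

print(A1[0])  # [1. 1.]
print(A2[0])  # [2. 2.]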
Here's a version that uses dot (matrix multiplication). It constructs a matrix of 1s and 0s from ind.
def bar(A, B, ind):
    K, M = B.shape
    N, M = A.shape
    I = np.zeros((N, K))
    I[ind, np.arange(K)] = 1
    return A + np.dot(I, B)
For a problem with sizes like K,M,N = 30,14,15 this is about 3x faster. But for larger ones like K,M,N = 300,100,150 it's a bit slower.
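A quick sanity check (toy sizes, random data) that bar() agrees with the np.add.at approach:

import numpy as np

K, M, N = 30, 14, 15
A = np.random.randn(N, M)
B = np.random.randn(K, M)
ind = np.random.randint(0, N, size=K)

expected = A.copy()
np.add.at(expected, ind, B)
print(np.allclose(bar(A, B, ind), expected))  # True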