Here i have a 2 numpy arrays, and a function that will take those arrays as an input, and then do some numpy calculation and return the result. It works as it is but it's slow and i think we can use multiprocessing to make it a bit faster.
Anyway, here's my code :
A = #4 dimensions big numpy array
B = #1 dimension numpy array
def function(A, B):
P = np.einsum("ijkl,ij->kl", A, B)
return P.astype(np.uint8)
result = function(A,B)
I'm still quite new into this Multiprocessing stuff, but think that we're able to make array A and B as a shared memory (maybe using multiprocessing.Array() ??) , and then make multiple processes to compute the function(A, B). But i still can't quite understand how to put all of that in the code.
EDIT:
Alright, so it seems like the approach above doesn't work, but let's try another case, but now, lets say the length of array A is 120, and now i want to use only 3/4 parts of array A from index number 0 to 89 and use all of array B in the Process No.1
And then, i also want to use 3/4 parts of array A but from index number 30 to 119 and use all parts of array B in the Process No.2, will that help? Of course i can make the A array even larger to get it's part computed with even more process where, but the thing is, will this concept works?
Related
My goal is to create an array where each elemet is normal(size={})) of each element of it.
I am trying to oprimize:
it = 2 ** arange(6, 25)
M = zeros(len(it))
for x in range(len(it)):
M[x] = (normal(size=it[x]))
I have these not working so far:
N = zeros(len(it))
it = 2 ** arange(6, 25)
N = (normal(size=it))
Further I tried:
N = (normal(size=it[:]))
Provided my data, I believe that such a manual work, or for loop is really inefficient, so I am trying to come up with vectorized operations.
i receive:
File "mtrand.pyx", line 1335, in numpy.random.mtrand.RandomState.normal
File "common.pyx", line 557, in numpy.random.common.cont
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
you've not been very precise in where these functions are coming from, but I'm guessing that by normal(size=it[:]) you mean:
import numpy as np
it = 2 ** np.arange(6, 25)
np.random.normal(size=it)
which would be telling numpy to create a 19 dimensional array (i.e. len(it)) that contains 6 × 1085 elements (i.e. np.prod(it.astype(float)) as floats because the number overflows an int64). numpy is saying that it can't do that, which seems like a reasonable thing to do.
Numpy doesn't like the "ragged arrays" you're trying to create, neither do most matrix/numeric libraries, hence support is limited!
I'm unsure why you consider that the "loop is really inefficient". You're creating ~33 million of floats from 19 iterations of a simple Python loop. The vast majority of time will be in highly optimised Numpy library code and some tiny (basically unmeasurable) amount of time will be spent evaluating your Python bytecode.
If you really want a one-liner then you can do:
X = [np.random.normal(size=2**i) for i in range(6, 25)]
which makes the split between Numpy and Python worlds more obvious.
Note that on my laptop, the Python code executes in ~5µs while the Numpy code runs for ~800ms. So you're trying to optimise the 0.0006% part!
Note that it's not always a win to use Numpy's vectorization, it only helps with larger arrays, for example the above loop is "faster" than:
X = [np.random.normal(i) for i in 2**np.arange(6, 25)]
4.8 vs 5.1 µs for the Python code, because of the time spent marshalling objects into/out of the Numpy world. Again, none of this matters, just use whichever solution makes your code easier to understand. A few microseconds is nothing compared to seconds.
I have written this for loop program below where I go through element by element of an array and do some math to those elements. Once the math is calculated it gets stored into another array.
for i in range(0, 1024):
x[i] = a * data[i]+ b * x[(i-1)] + c * x[(i-2)]
So in my program a, b, and c are just scalar numbers. Data and x are arrays. Data has an array size 1024 filled with numbers in each element. X is also an array size 1024 but it's filled with all zeros initially. In order to calculate the new elements of x I use the previous two elements of x. Initially the previous two are 0 and 0 since it takes the last two element from the x array of zeros. I multiply the current element of data by a, the last element of x by b, and the second to last element of x by c. Then I add everything up and save it to the current element of x. Then I do the same thing for every element in data and x.
This loop program works but I was wondering if there is a faster way to do it? Maybe using a combination of numpy functions like cumsum or dot product? Can some one help me maybe make the program faster? Thank you!
Best you could do using recursive method:
x = a * data
coef = np.array([c,b])
for i in range(2, 1024):
x[i] += np.dot(coef, x[i-2:i])
But even better, you can solve this recurrence equation to a closed form solution and apply directly without loop. (This is a basic 2nd order linear equation)
In general, if you want a programm that is fast, Python is not the best option. Python is great for prototyping since it is easy and has a lot of tools, however it is not verry computationally efficient in it's raw form if you compare it to for example C. What I usually do is to use Cython, is is a module for python that let's you convert your script to machiene code (as you do with C) which would greatly increase the speed of the appliation.
It let's you type cast the variables for example:
cdef double a, b, c
When you use a variable in Python the variables has to be checked every single time to make sure what type of variable it is (int, double, string etc). In C, that is not an issue since you have to decide from the start what the variable should be, decreasing the time consumption of the operation.
I would try to transform the for loop in a list comprehension which has much faster processing time in python.
I have two real arrays (a and b), and I would like create a complex array (c) which takes the two real arrays as its real and imaginary parts respectively.
The simplest one would be
c = a + b * 1.0j
However, since my data size is quite large, such code is not very efficient.
We can also do the following,
c = np.empty(data_shape)
c.real = a
c.imag = b
I am wondering is there a better way to do that (e.g. using buffer or something)?
Thank you very much!
Since the real and imaginary parts of each element have to be contiguous, you will have to allocate another buffer to interleave the data no matter what. The second method shown in the question is therefore about as efficient as you're likely to get. One alternative would be
np.stack((a, b), axis=-1).view(np.complex).squeeze(-1)
This works for any array shape, not just 1D. It ensures proper interleaving by stacking along the last dimension in C order.
This assumes that your datatype is np.float. If not, either promote to float (e.g. a = a.astype(float)), or possibly change np.complex to something else.
I have been browsing through the questions, and could find some help, but I prefer having confirmation by asking it directly. So here is my problem.
I have an (numpy) array u of dimension N, from which I want to build a square matrix k of dimension N^2. Basically, each matrix element k(i,j) is defined as k(i,j)=exp(-|u_i-u_j|^2).
My first naive way to do it was like this, which is, I believe, Fortran-like:
for i in range(N):
for j in range(N):
k[i][j]=np.exp(np.sum(-(u[i]-u[j])**2))
However, this is extremely slow. For N=1000, for example, it is taking around 15 seconds.
My other way to proceed is the following (inspired by other questions/answers):
i, j = np.ogrid[:N,:N]
k = np.exp(np.sum(-(u[i]-u[j])**2,axis=2))
This is way faster, as for N=1000, the result is almost instantaneous.
So I have two questions.
1) Why is the first method so slow, and why is the second one so fast ?
2) Is there a faster way to do it ? For N=10000, it is starting to take quite some time already, so I really don't know if this was the "right" way to do it.
Thank you in advance !
P.S: the matrix is symmetric, so there must also be a way to make the process faster by calculating only the upper half of the matrix, but my question was more related to the way to manipulate arrays, etc.
First, a small remark, there is no need to use np.sum if u can be re-written as u = np.arange(N). Which seems to be the case since you wrote that it is of dimension N.
1) First question:
Accessing indices in Python is slow, so best is to not use [] if there is a way to not use it. Plus you call multiple times np.exp and np.sum, whereas they can be called for vectors and matrices. So, your second proposal is better since you compute your k all in once, instead of elements by elements.
2) Second question:
Yes there is. You should consider using only numpy functions and not using indices (around 3 times faster):
k = np.exp(-np.power(np.subtract.outer(u,u),2))
(NB: You can keep **2 instead of np.power, which is a bit faster but has smaller precision)
edit (Take into account that u is an array of tuples)
With tuple data, it's a bit more complicated:
ma = np.subtract.outer(u[:,0],u[:,0])**2
mb = np.subtract.outer(u[:,1],u[:,1])**2
k = np.exp(-np.add(ma, mb))
You'll have to use twice np.substract.outer since it will return a 4 dimensions array if you do it in one time (and compute lots of useless data), whereas u[i]-u[j] returns a 3 dimensions array.
I used np.add instead of np.sum since it keep the array dimensions.
NB: I checked with
N = 10000
u = np.random.random_sample((N,2))
I returns the same as your proposals. (But 1.7 times faster)
I have a problem of performances in inizializing a dictionary of 4-D numpy tensor.
I have a list of coefficients names:
cnames = ['CN', 'CM', 'CA', 'CY', 'CLN' ...];
that is not fixed-size (it depends from upper code).
For each coefficient i have to generate a 4-D tensor [nalpha X nmach X nbeta X nalt] of zeros (for preallocation purposes), so I do:
#Number of coefficients
numofc = len(cnames);
final_data = {};
#I have to generate <numofc> 4D matrixes
for i in range(numofc):
final_data[cnames[i]]=n.zeros((nalpha,nmach,nbeta,nalt));
each index is an integer between 10 and 30.
each index is an integer between 100 and 200
This takes like 4 minutes. How can I speed up this? Or am I doing something wrong?
The code you posted should not take 4 minutes to run (unless cnames is extremely large or you have very little RAM and is forced to use swap space).
import numpy as np
cnames = ['CN', 'CM', 'CA', 'CY', 'CLN']*1000
nalpha,nmach,nbeta,nalt = 10,20,30,40
#Number of coefficients
numofc = len(cnames)
final_data = {}
#I have to generate <numofc> 4D matrixes
for i in range(numofc):
final_data[cnames[i]]=np.zeros((nalpha,nmach,nbeta,nalt))
Even if cnames has 5000 elements, it should still take only on the order of a couple seconds:
% time test.py
real 0m4.559s
user 0m0.856s
sys 0m3.328s
The semicolons at the end of statements suggests you have experience in some other language. Be careful of translating commands from that language line-by-line into NumPy/Python. Coding in NumPy as one would in C is a recipe for slowness.
In particular, try to avoid updating elements in an array element-by-element. This works fine in C, but is very slow with Python. NumPy achieves speed by delegating to functions coded in Fortran or Cython or C or C++. By updating arrays element-by-element you are using Python loops which are not as fast.
Instead, try to rephrase your computation in terms of operations on whole arrays (or at least, slices of arrays).
I have probably speculated too much on the cause of the problem. You need to profile your code, and then, if you want more specific help, post the result of the profile plus the problematic code (most helpfully in the form of an SSCCE).