I have 400 2x2 numpy matrices, and I need to sum them all together. Is there a better way to do it than a for loop? Iterating takes a lot of time and memory, and that will only get worse if I end up with more matrices (which might be the case in the future).
Just figured it out. All my matrices were in a list, so I used
np.sum(<list>, axis=0)
And it gives me the resultant 2x2 matrix of the sum of all the 400 matrices!
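For illustration, a minimal version of this approach, with random data standing in for the real matrices just to show the shapes:
import numpy as np

# 400 random 2x2 matrices held in a list, as in my case
matrices = [np.random.rand(2, 2) for _ in range(400)]

# Summing over axis 0 collapses the list into a single 2x2 result
total = np.sum(matrices, axis=0)
print(total.shape)   # (2, 2)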
I'm toying with a problem modelled by a linear system, which can be written as a square block-tridiagonal matrix. These blocks are of size b = 4n+8, and the full matrix is of size Nb; N could be arbitrarily large (reasonably, of course) while n is kept rather small (typically less than 10).
The blocks themselves are sparse, the first diagonal being only identity matrices, and the second diagonals having only n+1 non-zero columns (so 3n+7 columns of zeroes) per block. These columns are contiguous, either zeroes then non-zeroes or the other way around.
Building all these blocks in memory results in a (3N-2) x b x b array that can be turned into a sparse matrix with scipy.sparse.bsr_matrix, then cast to CSR format and trimmed of the excess zeroes. It works nicely, but I'd rather skip this large, mostly zero temporary array altogether (for N = 1e4, n = 5 it stores 5.6 zeroes for every relevant entry!).
I had a look at scipy.sparse.dok_matrix, recommended for slicing and incremental building. Creating my entries fits in a tidy loop, but the process takes ~10 times longer than using bsr_matrix with the unnecessarily dense array, which will be detrimental to future use cases.
It doesn't seem like bsr_matrix can be used directly with scipy sparse matrices as input.
Using bsr_matrix without including the diagonal blocks, then adding a sparse eye greatly reduces the number of zeros (3.5 per relevant entry in my test configuration) and speeds up the process by a third compared to the original solution. Score!
Any clue on things that I could do to further reduce the original imprint of this matrix? The obvious goal being to give me more freedom with the choice of N.
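For reference, here is roughly what the bsr_matrix-plus-sparse-eye construction looks like, with random data standing in for the real off-diagonal blocks and a simplified block layout (sizes are illustrative, not my actual ones):
import numpy as np
import scipy.sparse as sp

N, n = 1000, 5          # illustrative sizes
b = 4 * n + 8

# Off-diagonal blocks only: block-row 0 has one block (column 1), rows 1..N-2
# have two (columns i-1 and i+1), row N-1 has one (column N-2); 2(N-1) total.
off_blocks = np.random.rand(2 * (N - 1), b, b)
indices = np.concatenate([[1]] + [[i - 1, i + 1] for i in range(1, N - 1)] + [[N - 2]])
indptr = np.concatenate([[0], np.cumsum([1] + [2] * (N - 2) + [1])])

A_off = sp.bsr_matrix((off_blocks, indices, indptr), shape=(N * b, N * b)).tocsr()

# The main diagonal is just identity blocks, so add a sparse eye instead of
# storing N dense b x b identity blocks.
A = A_off + sp.eye(N * b, format='csr')
A.eliminate_zeros()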
EDIT
I managed to improve things a tad more by constructing the three block-diagonals separately. Doing so, I need less padding for my 2nd diagonals (n+3 columns instead of 3n+7; 1.3 zeroes per relevant entry), since it splits my original blocks into two vertical blocks (one full of zeroes), and I only need one diagonal in memory at a time, cutting the memory cost in half on top of that. The main diagonal remains constructed with the eye method. The icing on the cake: a 25% speed-up compared to my 3rd bullet point, probably because separating the two 2nd diagonals saves some of the array reshaping needed before calling bsr_matrix. Compared to the original method, for my (N, n) = (1e4, 5) test case that's ~20M zeroes saved when comparing the matrices before trimming. At 128 bits each, it's a decent gain already!
The only possible improvement that I can picture now is building these diagonals separately, without any padding, then inserting columns of zeros (probably via products with block-matrices of identities) and finally adding everything together.
I also read something about using a dict to update an empty dok_matrix, but in my case I think I would need to expand lists of indices and take their Cartesian product to construct the keys, and each element of my blocks would need to be an individual value, since one apparently cannot use slices as dictionary keys.
I ended up implementing the solution I proposed in my last paragraph.
For each 2nd diagonal, I construct a block sparse matrix without any padding, then transform it into a matrix of the proper shape by a right-hand side product with a block matrix whose blocks are identity. I do still need to store zeroes to use bsr_matrix (I first tried scipy.sparse.block_diag, but it was extremely slow), but fewer of them compared to my semi-padding solution: (4n+7)(n+1) vs (4n+8)(n+3); and they can be represented with 8 bits instead of 128. Execution time is increased by ~40%, but I can live with that (and it's still a decrease of 20% compared to the first solution).
I might be missing something here, but for now I'm pretty satisfied with this solution.
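To make the column-insertion trick concrete, here is a simplified sketch for a single block (place_columns is a hypothetical helper name; the real right-hand side factor is block-diagonal with one such selector per block):
import scipy.sparse as sp

def place_columns(B, p, b):
    """Pad the sparse matrix B with zero columns so its k columns land at
    offset p inside a block of width b, via a product with identity blocks."""
    k = B.shape[1]
    S = sp.hstack([sp.csr_matrix((k, p)),           # leading zero columns
                   sp.eye(k, format='csr'),         # the k kept columns
                   sp.csr_matrix((k, b - p - k))],  # trailing zero columns
                  format='csr')
    return B.dot(S)   # same rows as B, now padded to width b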
EDIT
Trimming the zeroes of the RHS matrices before performing the product reduces the execution time by another 30% compared to the previously most efficient solution; all's well that ends well.
Good afternoon everybody. I was loading raw data into numpy arrays and then wanted to apply operations such as base-10 logarithm, combined with "if" conditions, to those arrays. However, the arrays are quite big, so these operations take a long time to complete.
[image: sample of the raw data]
x = [ 20*math.log10(i) if i>0 and 20*math.log10(i)>=-60 else (-(120+20*math.log10(abs(i))) if i<0 and 20*math.log10(abs(i))>=-60 else -60) for i in a3 ]
In the code above, I take "a3", one of the channel arrays extracted from the raw audio data, and build another array, "x", to plot with values from -120 to 0 on the y axis. Furthermore, as you can see, I need to treat the positive elements, the negative elements, and the zeros of the original array separately, with -60 being the value that 0 maps to after the operation. This gives the final plot:
[image: resulting plot, y axis from -120 to 0]
The problem with this code, as I said before, is that it takes approximately 10 seconds to finish, and that is for just one channel; I need to compute 8 channels, so I have to wait approximately 80 seconds.
I wanted to know if there is a faster way to do this. I did find a way to apply numpy.log10 to the whole numpy array, and it computes in less than two seconds:
x = 20*numpy.log10(abs(a3))
But I did not find anything about combining that operation, numpy.log10, with ifs, conditionals, or anything like that. I really need to distinguish the negative and positive original values, and also the 0s, and to transform 0 to -60, making -60 the minimum limit and the reference point, as in the code I showed above.
Note: I already tried to do it with loops, like "for" and "while", but they take even more time than the current method, about 14 seconds each.
Thank you for your responses!!
In general, when posting questions, it's best practice to include a small working example. I know you included a picture of your data, but that is hard for others to use, so it would have been better to just give us a small array of data. This is important, because the solution often depends on the data. For example, all your data is (I think) between -1 and 1, so the log is always negative. If that isn't the case, then your solution might not work.
There is no need to check if i>0 and then apply abs if i is negative. This is exactly what applying abs does in the first place.
As you noticed, we can also use numpy vectorization to avoid the list comprehension. It is usually faster to do something like np.sin(X) than [ np.sin(x) for x in X].
Finally, if you do something like X>0 in numpy, it returns a boolean array saying if each element is >0.
Note that another way to write your list comprehension would be to first take 20*math.log10(abs(i)), replace all values < -60 with -60, and then, anywhere i < 0, flip the data about -60. We can do this with vectorized operations.
-120*(a3<0)+np.sign(a3)*np.maximum(20*np.log10(np.abs(a3)),-60)
This can probably be optimized a bit since a3<0 and np.sign(a3) are doing similar things. That said, I'm pretty sure this is faster than list comprehensions.
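For what it's worth, a roughly equivalent version using np.where (still assuming all values are between -1 and 1) that also sends exact zeros to -60 could look like this:
import numpy as np

with np.errstate(divide='ignore'):                     # silence the log10(0) warning
    db = np.maximum(20 * np.log10(np.abs(a3)), -60)    # dB values clamped at -60
x = np.where(a3 > 0, db, np.where(a3 < 0, -120 - db, -60.0))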
I have a numpy script that is currently running quite slowly. It spends the vast majority of its time performing the following operation inside a loop:
terms=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex,Ey,Ez_av)
res=[np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms]
res=array(res)
Ex[1:Nx-1]=res[1:Nx-1,0]
Ey[1:Nx-1]=res[1:Nx-1,1]
It's the list comprehension that is really slowing this code down.
In this case, Coeff_3 and Coeff_2 are length-1000 lists whose elements are 3x3 numpy matrices, and Ex, Ey, Ez, Curl_x, etc. are all length-1000 numpy arrays.
I realize it might be faster if I did things like setting up a single 3x1000 E vector, but I have to perform a significant amount of averaging of different E vectors between steps, which would make things very unwieldy.
Curiously, however, I perform this operation twice per loop (once for Ex, Ey and once for Ez), and performing the same operation for the Ez's takes almost twice as long:
terms2=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex_av,Ey_av,Ez)
res2=array([np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms2])
Anyone have any idea what's happening? Forgive me if it's anything obvious, I'm very new to Python.
As pointed out in previous comments, use array operations. np.hstack(), np.vstack(), np.outer() and np.inner() are useful here. Your code could become something like this (I'm not sure about your dimensions):
Cxyz = np.vstack((Curl_x,Curl_y,Curl_z))
C2xyz = np.dot(C2, Cxyz)
...
Check the shapes of your results to make sure you translated the problem correctly. Sometimes numexpr can also speed up such tasks significantly with little extra effort.
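For the batched 3x3 products themselves, an np.einsum sketch could replace the list comprehension entirely, assuming the coefficient lists stack cleanly into (1000, 3, 3) arrays (the variable names below are taken from your question):
import numpy as np

C2 = np.asarray(Coeff_2)                              # (1000, 3, 3)
C3 = np.asarray(Coeff_3)                              # (1000, 3, 3)
Cxyz = np.stack([Curl_x, Curl_y, Curl_z], axis=1)     # (1000, 3)
Exyz = np.stack([Ex, Ey, Ez_av], axis=1)              # (1000, 3)

# res[i] = C2[i] @ Cxyz[i] + C3[i] @ Exyz[i], computed for all i in one call
res = np.einsum('nij,nj->ni', C2, Cxyz) + np.einsum('nij,nj->ni', C3, Exyz)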
I have a large matrix (approx. 80,000 X 60,000), and I basically want to scramble all the entries (that is, randomly permute both rows and columns independently).
I believe it'll work if I loop over the columns, and use randperm to randomly permute each column. (Or, I could equally well do rows.) Since this involves a loop with 60K iterations, I'm wondering if anyone can suggest a more efficient option?
I've also been working with numpy/scipy, so if you know of a good option in python, that would be great as well.
Thanks!
Susan
Thanks for all the thoughtful answers! Some more info: the rows of the matrix represent documents, and the data in each row is a vector of tf-idf weights for that document. Each column corresponds to one term in the vocabulary. I'm using pdist to calculate cosine similarities between all pairs of papers. And I want to generate a random set of papers to compare to.
I think that just permuting the columns will work, then, because each paper gets assigned a random set of term frequencies. (Permuting the rows just means reordering the papers.) As Jonathan pointed out, this has the advantage of not making a new copy of the whole matrix, and it sounds like the other options all will.
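On the numpy side, if you have a recent version (1.20 or later), Generator.permuted can apparently do the per-column shuffle in a single in-place call; a sketch, where tfidf stands for the 80,000 x 60,000 array:
import numpy as np

rng = np.random.default_rng()
# Shuffle the entries within each column independently, writing back in place.
rng.permuted(tfidf, axis=0, out=tfidf)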
You should be able to reshape the matrix to a 1 × 4800000000 "array", randperm it, and finally reshape it back to a 80000 × 60000 matrix.
This will require copying the 4.8 billion entries 3 times at worst. This might not be efficient.
EDIT: Actually Matlab automatically uses linear indexing, so the first reshape is not needed. Just
reshape(x(randperm(4800000000)), 80000, 60000)
is enough (thus avoiding one potentially unnecessary copy).
Note that this assumes you have a dense matrix. If you have a sparse matrix, you could extract the values and then randomly reassign indices to them. If there are N nonzero entries, then only 8N copies are needed at worst (3 numbers are required to describe one entry).
I think it would be better to do this:
import numpy as np
flat = matrix.ravel()
np.random.shuffle(flat)
You are basically flattening the matrix into a 1-D array and shuffling that array; since ravel returns a view whenever it can, the shuffle happens directly in the matrix's own memory, so no new matrix has to be constructed.
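One caveat: ravel only gives a view when the matrix is laid out contiguously; a quick check along these lines covers the case where it had to copy:
import numpy as np

flat = matrix.ravel()
np.random.shuffle(flat)
# If ravel had to copy (e.g. a non-contiguous matrix), write the shuffle back.
if not np.shares_memory(flat, matrix):
    matrix[...] = flat.reshape(matrix.shape)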
Both solutions above are great and will work, but I believe both will involve making a completely new copy of the entire matrix in memory while doing the work. Since this is a huge matrix, that's pretty painful. In the case of the MATLAB solution, I think you may end up creating two extra temporary copies, depending on how reshape works internally. I think you were on the right track by operating on columns, but the problem is that it will only scramble along columns. However, I believe if you do randperm along rows after that, you'll end up with a fully permuted matrix. This way you'll only be creating temporary variables that are, at worst, 80,000 by 1. Yes, that's two loops with 60,000 and 80,000 iterations each, but internally that's going to have to happen regardless. The algorithm is going to have to visit each memory location at least twice. You could probably get a more efficient algorithm by writing a C MEX function that operates completely in place, but I assume you'd rather not do that.
I need to diagonalise a very large number of matrices. These matrices are by themselves quite small (say a x a where a <= 10), but due to their sheer number, it takes a lot of time to diagonalise them all using a for loop and the numpy.linalg.eig function. So I wanted to make an array of matrices, i.e. an array of 2D arrays, but unfortunately Python seems to consider this to be a 3-dimensional array, gets confused, and refuses to do the job. So, is there any way to prevent Python from looking at this array of 2D arrays as a 3D array?
Thanks,
A Python novice
EDIT: To be clearer, I'm not interested in this 3D array per se. Since feeding a whole array to a function is generally much faster than using a for loop to feed the elements one by one, I simply tried to put all the matrices I need to diagonalise into one array.
If you have a 3D array like:
a = np.random.normal(size=(20,10,10))
you can then just loop through all 20 of the 10x10 arrays using:
for k in xrange(a.shape[0]):
    b = np.linalg.eig(a[k, :, :])
where you would save b in a more sophisticated way. This may be what you are already doing, but you can't apply np.linalg.eig to a 3D array and have it calculate along a single axis, so you are stuck with the loop unless there is a formalism for combining all of your arrays into a single 2D array. I doubt however that that would be faster than just looping over the individual 2D arrays.
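That said, newer numpy releases (1.8 and later) broadcast the linalg routines over leading dimensions, so the whole stack of matrices can be handled in one call; a minimal sketch:
import numpy as np

a = np.random.normal(size=(20, 10, 10))

# eig broadcasts over the leading axis: one call diagonalises all 20 matrices.
eigvals, eigvecs = np.linalg.eig(a)
print(eigvals.shape)   # (20, 10)
print(eigvecs.shape)   # (20, 10, 10)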