Let's say I have the following NumPy arrays:
import numpy as np
a = np.array([1, 2])
b = np.array([1])
c = np.array([1, 4, 8, 10])
How can I do something like np.vstack((a, b, c)) without getting an error? I know there is a pure Python way, l = [a, b, c], but that's not efficient enough. I'd like to do this with a NumPy method. Do you have any idea? Thanks in advance!
In [863]: a = np.array([1, 2])
In [864]: b = np.array([1])
In [865]: c = np.array([1, 4, 8, 10])
A list of these 3 arrays:
In [866]: ll=[a,b,c]
An object dtype array made from this list:
In [867]: A=np.array(ll)
In [868]: A
Out[868]: array([array([1, 2]), array([1]), array([ 1, 4, 8, 10])], dtype=object)
A, like ll, contains pointers to data objects elsewhere in memory; in terms of memory use the two are equally efficient.
In [870]: id(A[1]),id(b)
Out[870]: (3032501768, 3032501768)
You can perform a limited number of math operations on the elements of A; for example, addition works as one might expect:
In [871]: A+3
Out[871]: array([array([4, 5]), array([4]), array([ 4, 7, 11, 13])], dtype=object)
But there's little to no speed advantage, e.g.
In [876]: timeit [x+3 for x in ll]
100000 loops, best of 3: 9.52 µs per loop
In [877]: timeit A+3
100000 loops, best of 3: 14.6 µs per loop
Other things, like np.max, don't work at all. You have to test this case by case.
More details here: Maintaining numpy subclass inside a container after applying ufunc and other object array questions.
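For instance, here is a small probe of a few operations on the object array A built above (a hedged sketch; the exact exceptions raised depend on the NumPy version):

import numpy as np

a = np.array([1, 2])
b = np.array([1])
c = np.array([1, 4, 8, 10])
A = np.array([a, b, c], dtype=object)

# A + 3 succeeds, while reductions such as np.max or np.sum typically raise
# (broadcast or ambiguous-truth-value errors when the ragged elements are
# compared or added), so each operation has to be probed case by case.
for label, op in [('A + 3', lambda x: x + 3), ('np.max', np.max), ('np.sum', np.sum)]:
    try:
        print('%s -> %s' % (label, op(A)))
    except Exception as exc:
        print('%s failed: %s' % (label, exc))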
To get numpy speed, you need to embed the vectors into an array. Either a 2D array or a 1D array could work. You could make an array of zeros that is large enough to hold all the values, then place each vector in that array. Or you could make a large 1D array and concatenate the vectors end to end.
import numpy as np
a = np.array([1, 2])
b = np.array([1])
c = np.array([1, 4, 8, 10])
# Embed the vectors in a 2D array
A = np.zeros((3, max(a.size, b.size, c.size)))
A[0, :a.size] = a
A[1, :b.size] = b
A[2, :c.size] = c
# 1D array embedding
B = np.zeros(a.size + b.size + c.size)
B[:a.size] = a
B[a.size:(a.size+b.size)] = b
B[(a.size+b.size):] = c
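The same 1D layout can also be produced directly with np.concatenate (a minimal sketch; unlike the zero-initialized B, the result keeps the integer dtype of the inputs):

# Joins the vectors end to end in a single call
B2 = np.concatenate((a, b, c))
# B2 -> array([ 1,  2,  1,  1,  4,  8, 10])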
%timeit A+3
1000000 loops, best of 3: 780 ns per loop
%timeit B+3
1000000 loops, best of 3: 764 ns per loop
This has the advantage of numpy speed, but it involves more coding work and makes the values of your arrays harder to interpret.
Also, to decide whether the 1D or 2D solution is better, it makes sense to think about how you're using the arrays. For example, if the values are Fourier series coefficients, then the 2D array would probably be better. With a 2D array you can keep specific elements of your vectors aligned.
However, I could also imagine applications where concatenating vectors into a single 1D array would make more sense. I hope this was helpful.
Related
Given an x-dataset,
x = np.array([1, 2, 3, 4, 5])
what is the most efficient way to create the NumPy array where each x coordinate is paired with a y-coordinate of value 0? Specifically, I am wondering if there is a way that doesn't require any hard-coding, so that x could vary in length without causing a failure.
As per your problem statement, the following is one way to do it.
# initialize an array of zeros
In [36]: res = np.zeros((2, *x.shape), dtype=x.dtype)
# fill `x` as first row
In [37]: res[0] = x
In [38]: res
Out[38]:
array([[1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0]])
When we initialize the array of zeros, we use 2 for the axis-0 dimension since your requirement is a 2D array with two rows. For the column size we simply take the length from the x array, so nothing is hard-coded. For reasonably large arrays, this approach would be the fastest.
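A minimal alternative sketch, assuming you only need x stacked on top of a row of zeros, is np.vstack with np.zeros_like, which keeps the dtype and length in sync with x automatically (whether it beats preallocation for large x is not guaranteed):

res = np.vstack((x, np.zeros_like(x)))
# array([[1, 2, 3, 4, 5],
#        [0, 0, 0, 0, 0]])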
This sounds simple, and I think I'm overcomplicating this in my mind.
I want to make an array whose elements are generated from two source arrays of the same shape, depending on which element in the source arrays is greater.
To illustrate:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = (insert magic)
>> array([2, 5, 0])
I can't work out how to produce an array3 that combines the elements of array1 and array2 to produce an array where only the greater of the two array1/array2 element values is taken.
Any help would be much appreciated. Thanks.
We could use NumPy's built-in np.maximum, made exactly for this purpose -
np.maximum(array1, array2)
Another way would be to use the NumPy ufunc np.max on a 2D stacked array and max-reduce along the first axis (axis=0) -
np.max([array1,array2],axis=0)
Timings on arrays with 1 million elements -
In [271]: array1 = np.random.randint(0,9,(1000000))
In [272]: array2 = np.random.randint(0,9,(1000000))
In [274]: %timeit np.maximum(array1, array2)
1000 loops, best of 3: 1.25 ms per loop
In [275]: %timeit np.max([array1, array2],axis=0)
100 loops, best of 3: 3.31 ms per loop
# @Eric Duminil's soln1
In [276]: %timeit np.where( array1 > array2, array1, array2)
100 loops, best of 3: 5.15 ms per loop
# @Eric Duminil's soln2
In [277]: magic = lambda x,y : np.where(x > y , x, y)
In [278]: %timeit magic(array1, array2)
100 loops, best of 3: 5.13 ms per loop
Extending to other supporting ufuncs
Similarly, there's np.minimum for finding the element-wise minimum between two arrays of the same or broadcastable shapes. So, to find the element-wise minimum between array1 and array2, we would have:
np.minimum(array1, array2)
For a complete list of ufuncs that support this feature, please refer to the docs and look for the keyword: element-wise. Grepping for those, I got the following ufuncs (a programmatic way to list them is sketched after the list):
add, subtract, multiply, divide, logaddexp, logaddexp2, true_divide,
floor_divide, power, remainder, mod, fmod, divmod, heaviside, gcd,
lcm, arctan2, hypot, bitwise_and, bitwise_or, bitwise_xor, left_shift,
right_shift, greater, greater_equal, less, less_equal, not_equal,
equal, logical_and, logical_or, logical_xor, maximum, minimum, fmax,
fmin, copysign, nextafter, ldexp
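Rather than grepping the documentation, a minimal sketch for enumerating the two-input ufuncs exposed by your NumPy build programmatically:

import numpy as np

# Collect every top-level ufunc that takes exactly two input operands
binary_ufuncs = sorted(name for name in dir(np)
                       if isinstance(getattr(np, name), np.ufunc)
                       and getattr(np, name).nin == 2)
print(binary_ufuncs)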
If your condition ever becomes more complex, you could use np.where:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = np.where( array1 > array2, array1, array2)
# array([2, 5, 0])
You could replace array1 > array2 with any condition. If all you want is the maximum, go with @Divakar's answer.
And just for fun:
magic = lambda x,y : np.where(x > y , x, y)
magic(array1, array2)
# array([2, 5, 0])
I am working on a Python project and making use of NumPy. I frequently have to compute Kronecker products of matrices by the identity matrix. These are a pretty big bottleneck in my code, so I would like to optimize them. There are two kinds of products I have to take. The first one is:
np.kron(np.eye(N), A)
This one is pretty easy to optimize by simply using scipy.linalg.block_diag. The product is equivalent to:
la.block_diag(*[A]*N)
Which is about 10 times faster. However, I am unsure how to optimize the second kind of product:
np.kron(A, np.eye(N))
Is there a similar trick I can use?
One approach would be to initialize a 4D output array and then assign values into it from A. Such an assignment broadcasts values, and this is where we get the efficiency in NumPy.
Thus, a solution would be like so -
# Get shape of A
m,n = A.shape
# Initialize output array as 4D
out = np.zeros((m,N,n,N))
# Get range array for indexing into the second and fourth axes
r = np.arange(N)
# Index into the second and fourth axes and selecting all elements along
# the rest to assign values from A. The values are broadcasted.
out[:,r,:,r] = A
# Finally reshape back to 2D
out.shape = (m*N,n*N)
Put as a function -
def kron_A_N(A, N):  # Simulates np.kron(A, np.eye(N))
    m,n = A.shape
    out = np.zeros((m,N,n,N),dtype=A.dtype)
    r = np.arange(N)
    out[:,r,:,r] = A
    out.shape = (m*N,n*N)
    return out
To simulate np.kron(np.eye(N), A), simply swap the roles of the first and second axes, and similarly of the third and fourth axes -
def kron_N_A(A, N):  # Simulates np.kron(np.eye(N), A)
    m,n = A.shape
    out = np.zeros((N,m,N,n),dtype=A.dtype)
    r = np.arange(N)
    out[r,:,r,:] = A
    out.shape = (m*N,n*N)
    return out
Timings -
In [174]: N = 100
...: A = np.random.rand(100,100)
...:
In [175]: np.allclose(np.kron(A, np.eye(N)), kron_A_N(A,N))
Out[175]: True
In [176]: %timeit np.kron(A, np.eye(N))
1 loops, best of 3: 458 ms per loop
In [177]: %timeit kron_A_N(A, N)
10 loops, best of 3: 58.4 ms per loop
In [178]: 458/58.4
Out[178]: 7.842465753424658
I am interested in a calculation over a large NumPy array. I have a large array A which contains a bunch of numbers, and I want to calculate the sum of different combinations of these numbers. The structure of the data is as follows:
A = np.random.uniform(0,1, (3743, 1388, 3))
Combinations = np.random.randint(0,3, (306,3))
Final_Product = np.array([ np.sum( A*cb, axis=2) for cb in Combinations])
My question is whether there is a more elegant and memory-efficient way to calculate this. I find it frustrating to work with np.dot() when a 3D array is involved.
If it helps, the shape of Final_Product ideally should be (3743, 306, 1388). Currently Final_Product has the shape (306, 3743, 1388), so I can just transpose the axes to get there.
np.dot() won't give you the desired output unless you involve extra step(s) that would probably include reshaping. Here's one vectorized approach using np.einsum to do it in one shot without any extra memory overhead -
Final_Product = np.einsum('ijk,lk->lij',A,Combinations)
For completeness, here's with np.dot and reshaping as discussed earlier -
M,N,R = A.shape
Final_Product = A.reshape(-1,R).dot(Combinations.T).T.reshape(-1,M,N)
Runtime tests and output verification -
In [138]: # Inputs ( smaller version of those listed in question )
...: A = np.random.uniform(0,1, (374, 138, 3))
...: Combinations = np.random.randint(0,3, (30,3))
...:
In [139]: %timeit np.array([ np.sum( A*cb, axis=2) for cb in Combinations])
1 loops, best of 3: 324 ms per loop
In [140]: %timeit np.einsum('ijk,lk->lij',A,Combinations)
10 loops, best of 3: 32 ms per loop
In [141]: M,N,R = A.shape
In [142]: %timeit A.reshape(-1,R).dot(Combinations.T).T.reshape(-1,M,N)
100 loops, best of 3: 15.6 ms per loop
In [143]: Final_Product =np.array([np.sum( A*cb, axis=2) for cb in Combinations])
...: Final_Product2 = np.einsum('ijk,lk->lij',A,Combinations)
...: M,N,R = A.shape
...: Final_Product3 = A.reshape(-1,R).dot(Combinations.T).T.reshape(-1,M,N)
...:
In [144]: print np.allclose(Final_Product,Final_Product2)
True
In [145]: print np.allclose(Final_Product,Final_Product3)
True
Instead of dot you could use tensordot. Your current method is equivalent to:
np.tensordot(A, Combinations, [2, 1]).transpose(2, 0, 1)
Note the transpose at the end to put the axes in the correct order.
Like dot, the tensordot function can call down to the fast BLAS/LAPACK libraries (if you have them installed) and so should perform well for large arrays.
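As a quick sanity check (a sketch with small, arbitrary shapes), the tensordot expression should match the original loop:

import numpy as np

A = np.random.uniform(0, 1, (37, 13, 3))
Combinations = np.random.randint(0, 3, (6, 3))

ref = np.array([np.sum(A*cb, axis=2) for cb in Combinations])
alt = np.tensordot(A, Combinations, [2, 1]).transpose(2, 0, 1)
print(np.allclose(ref, alt))  # expected: True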
I need to compute the trace of a matrix across all its diagonals. That is, for an nxm matrix, the operation should produce n+m-1 'traces'. Here is an example program:
import numpy as np
A=np.arange(12).reshape(3,4)
def function_1(A):
    output=np.zeros(A.shape[0]+A.shape[1]-1)
    for i in range(A.shape[0]+A.shape[1]-1):
        output[i]=np.trace(A,A.shape[1]-1-i)
    return output
A
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
function_1(A)
array([ 3., 9., 18., 15., 13., 8.])
My hope is to find a way to replace the loop in the program, since I need to do this computation many times on very large matrices. One avenue that looks promising is to use numpy.einsum, but I can't quite figure out how to do it. Alternatively, I have looked into rewriting the problem entirely with loops in Cython:
%load_ext cythonmagic

%%cython
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def function_2(long [:,:] A):
    cdef int n=A.shape[0]
    cdef int m=A.shape[1]
    cdef long [::1] output = np.empty(n+m-1,dtype=np.int64)
    cdef size_t l1
    cdef int i,j, k1
    cdef long out
    it_list1=range(m)
    it_list2=range(m,m+n-1)
    for l1 in range(len(it_list1)):
        k1=it_list1[l1]
        i=0
        j=m-1-k1
        out=0
        while (i<n)&(j<m):
            out+=A[i,j]
            i+=1
            j+=1
        output[k1]=out
    for l1 in range(len(it_list2)):
        k1=it_list2[l1]
        i=k1-m+1
        j=0
        out=0
        while (i<n)&(j<m):
            out+=A[i,j]
            i+=1
            j+=1
        output[k1]=out
    return np.array(output)
The Cython program outperforms the version that loops over np.trace:
%timeit function_1(A)
10000 loops, best of 3: 62.7 µs per loop
%timeit function_2(A)
100000 loops, best of 3: 9.66 µs per loop
So, basically, I want to get feedback on whether there is a more efficient way to use numpy/scipy routines, or whether I have already achieved the fastest way using Cython.
If you want to stay away from Cython, building a diagonal index array and using np.bincount may do the trick:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> rows, cols = a.shape
>>> rows_arr = np.arange(rows)
>>> cols_arr = np.arange(cols)
>>> diag_idx = rows_arr[:, None] - (cols_arr - (cols - 1))
>>> diag_idx
array([[3, 2, 1, 0],
       [4, 3, 2, 1],
       [5, 4, 3, 2]])
>>> np.bincount(diag_idx.ravel(), weights=a.ravel())
array([ 3., 9., 18., 15., 13., 8.])
By my timings, for your example input, it is 4x faster than your original pure Python method. So I don't think it is going to be faster than your Cython code, but you may want to time it.
If your matrix shape is sufficiently far away from being square, i.e. if it is tall or wide, then you can use stride tricks efficiently to do this. You can use stride tricks in any case, but it may not be super memory efficient if the matrix is near square.
What you need to do is create a new array view on the same data, constructed so that the step from one row to the next also increments the column. This is achieved by changing the strides of the array.
The problem that one needs to take care of lies at the borders of the array, where one needs to zero-pad. If the array is far from being square, this does not matter. If it is square, then we need twice the size of the array to pad.
If you do not need the smaller traces at the edges, then you do not need to zero-pad.
Here goes (assuming more columns than rows, but easily adapted):
import numpy as np
from numpy.lib.stride_tricks import as_strided
A = np.arange(30).reshape(3, 10)
A_embedded = np.hstack([np.zeros([3, 2]), A, np.zeros([3, 2])])
A = A_embedded[:, 2:-2] # We are now sure that the memory around A is padded with 0, but actually we never really need A again
new_strides = (A.strides[0] + A.strides[1], A.strides[1])
B = as_strided(A_embedded, shape=A_embedded[:, :-2].shape, strides=new_strides)
traces = B.sum(0)
print A
print B
print traces
In order to conform with the output you show in your example, you need to reverse it (see @larsmans' comment):
traces = traces[::-1]
This is a specific example with concrete numbers. If this is useful to your use case, I can turn it into a general function.
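For reference, a sketch of what such a general function might look like (not from the original answer; it zero-pads by rows-1 columns on each side and returns the traces in the same order as function_1 above):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def all_traces_strided(M):
    # Pad with rows-1 zero columns on each side so every diagonal has full length
    rows, cols = M.shape
    padded = np.hstack([np.zeros((rows, rows - 1), M.dtype),
                        M,
                        np.zeros((rows, rows - 1), M.dtype)])
    s0, s1 = padded.strides
    # View in which stepping down one row also steps one column to the right,
    # so each column of the view holds one diagonal of M
    view = as_strided(padded, shape=(rows, rows + cols - 1), strides=(s0 + s1, s1))
    return view.sum(0)[::-1]

print(all_traces_strided(np.arange(12).reshape(3, 4)))  # expected: [ 3  9 18 15 13  8]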
Here's an improved version of your Cython function.
Honestly, this is how I'd do it if Cython is an option.
import numpy as np
from libc.stdint cimport int64_t as i64
from cython cimport boundscheck, wraparound
@boundscheck(False)
@wraparound(False)
def all_trace_int64(i64[:,::1] A):
    cdef:
        int i,j
        i64[:] t = np.zeros(A.shape[0] + A.shape[1] - 1, dtype=np.int64)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            t[A.shape[0]-i+j-1] += A[i,j]
    return np.array(t)
This will be significantly faster than the version you give in your question because it iterates over the array in the order in which it is stored in memory.
For small arrays, the two approaches are nearly the same, though this one is marginally faster on my machine.
I wrote this function so that it requires a C-contiguous array.
If you have a Fortran contiguous array, transpose it, then reverse the order of the output.
This does return the answers in the opposite order from the function shown in your example, so you will need to reverse the order of the array if the order is particularly important.
You may also improve performance by compiling with heavier optimizations.
For example, you could build your Cython code in the IPython notebook with additional compiler flags by replacing
%%cython
with something like
%%cython -c=-O3 -c=-march=native -c=-funroll-loops -f
Edit:
When doing this, you will also want to make sure that your values aren't generated by an outer product. If your values come from an outer product, this operation can be combined with the outer product into a single call to np.convolve.
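For example, if A happens to be an outer product, A = np.outer(u, v), the per-diagonal sums can be read off directly from u and v with a single convolution, without ever forming A (a hedged sketch with made-up vectors; the result is ordered from the lowest diagonal offset to the highest):

import numpy as np

u = np.array([1, 2, 5])
v = np.array([1, 3, 2, 4])
A = np.outer(u, v)

# Reference: one trace per diagonal, lowest offset first
ref = np.array([np.trace(A, k) for k in range(-len(u) + 1, len(v))])

# Same sums via a single convolution of the reversed u with v
alt = np.convolve(u[::-1], v)
print(np.allclose(ref, alt))  # expected: True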
This is competitive if the array is large:
def f5(A):
    rows, cols = A.shape
    N = rows + cols - 1
    out = np.zeros(N, A.dtype)
    for idx in range(rows):
        out[N-idx-cols:N-idx] += A[idx]
    return out[::-1]
Although it uses a Python loop, it's faster than the bincount solution (for large arrays, on my system).
This method does have high sensitivity to the array column/row ratio, because this ratio determines how much looping is done in Python relative to Numpy.
As @Jaime pointed out, it's efficient to iterate over the smallest dimension, e.g.:
def f6(A):
    rows, cols = A.shape
    N = rows + cols - 1
    out = np.zeros(N, A.dtype)
    if rows > cols:
        for idx in range(cols):
            out[N-idx-rows:N-idx] += A[:, idx]
    else:
        for idx in range(rows):
            out[N-idx-cols:N-idx] += A[idx]
        # only this row-wise branch needs reversing; the column-wise branch
        # above already accumulates in the final order
        out = out[::-1]
    return out
But it should be noted that for larger array sizes (e.g. 100000 x 500 on my system) accessing the array row by row as in the first code I posted could still be faster, probably because of how the array is laid out in the RAM
(it's faster to fetch contiguous chunks than spread out bits).
This can be done by (slightly abusively) using scipy.sparse.dia_matrix in two ways, one sparser than the other.
The first one, yielding the exact result, uses the dia_matrix stored data vector
import numpy as np
from scipy.sparse import dia_matrix
A = np.arange(30).reshape(3, 10)
traces = dia_matrix(A).data.sum(1)[::-1]
A less memory-intensive method would be to work the other way round:
import numpy as np
from scipy.sparse import dia_matrix
A = np.arange(30).reshape(3, 10)
A_dia = dia_matrix((A, range(len(A))), shape=(A.shape[1],) * 2)
traces = np.array(A_dia.sum(1)).ravel()[::-1]
Note however, that two entries are missing in this solution. This may be correctible in a smart way, but I am not sure yet.
@moarningsun found the solution:
rows, cols = A.shape
A_dia = dia_matrix((A, np.arange(rows)), shape=(cols,)*2)
traces1 = A_dia.sum(1).A.ravel()
A_dia = dia_matrix((A, np.arange(-rows+1, 1)), shape=(rows,)*2)
traces2 = A_dia.sum(1).A.ravel()
traces = np.concatenate((traces1[::-1], traces2[-2::-1]))
np.trace does what you want:
import numpy as np
A = np.array([[ 0, 1, 2, 3],
              [ 4, 5, 6, 7],
              [ 8, 9, 10, 11]])
rows, cols = A.shape
[np.trace(A, k) for k in range(-rows+1, cols)]
Edit: Changed np.sum(np.diag()) to np.trace() according to the suggestion from @user2357112.
Use the numpy array trace method:
import numpy as np
A = np.array([[ 0, 1, 2, 3],
              [ 4, 5, 6, 7],
              [ 8, 9, 10, 11]])
A.trace()
returns:
15
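The offset argument of trace selects which diagonal is summed, so the other diagonal sums from the question are available the same way (a minimal sketch):

A.trace(offset=1)   # 18, the first super-diagonal
A.trace(offset=-1)  # 13, the first sub-diagonal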