I have four numpy arrays like:
X1 = array([[1, 2], [2, 0]])
X2 = array([[3, 1], [2, 2]])
I1 = array([[1], [1]])
I2 = array([[1], [1]])
And I'm doing:
Y = array([[I1, X1],
           [I2, X2]])
To get:
Y = array([[1, 1, 2],
           [1, 2, 0],
           [1, 3, 1],
           [1, 2, 2]])
As in this example, I have large matrices, where X1 and X2 are n x d matrices.
Is there an efficient way in Python to build the matrix Y?
I am aware of the iterative approach, but I am looking for an efficient way to accomplish this.
Here, Y is a 2n x (d+1) matrix, and I1 and I2 are n x 1 columns of ones.
How about the following:
In [1]: import numpy as np
In [2]: X1 = np.array([[1,2],[2,0]])
In [3]: X2 = np.array([[3,1],[2,2]])
In [4]: I1 = np.array([[1],[1]])
In [5]: I2 = np.array([[4],[4]])
In [7]: Y = np.vstack((np.hstack((I1,X1)),np.hstack((I2,X2))))
In [8]: Y
Out[8]:
array([[1, 1, 2],
       [1, 2, 0],
       [4, 3, 1],
       [4, 2, 2]])
Alternatively you could create an empty array of the appropriate size and fill it using the appropriate slices. This would avoid making intermediate arrays.
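For instance, a minimal sketch of that preallocate-and-fill approach (assuming the n x d blocks X1, X2 and the n x 1 columns I1, I2 from the question):
import numpy as np
n, d = X1.shape
Y = np.empty((2 * n, d + 1), dtype=X1.dtype)
Y[:n, :1] = I1   # top-left block
Y[:n, 1:] = X1   # top-right block
Y[n:, :1] = I2   # bottom-left block
Y[n:, 1:] = X2   # bottom-right block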
You need numpy.bmat
In [4]: A = np.mat('1 ; 1 ')
In [5]: B = np.mat('2 2; 2 2')
In [6]: C = np.mat('3 ; 5')
In [7]: D = np.mat('7 8; 9 0')
In [8]: np.bmat([[A,B],[C,D]])
Out[8]:
matrix([[1, 2, 2],
        [1, 2, 2],
        [3, 7, 8],
        [5, 9, 0]])
For a numpy array, this page suggests the syntax may be of the form
vstack([hstack([a, b]),
        hstack([c, d])])
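On reasonably recent NumPy versions (1.13 and later), np.block assembles block matrices from plain ndarrays in the same spirit, so the above can also be written as:
np.block([[a, b],
          [c, d]])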
Related
I'm looking for an efficient way to get a row-wise intersection of two two-dimensional numpy ndarrays. There is only one intersecting value per row. For example:
[[1, 2],        [[0, 1],
 [3, 4]]    ∩    [0, 3]]    ->    [1, 3]
In the best case zeros should be ignored:
[[1, 2, 0],        [[0, 1, 0],
 [3, 4, 0]]    ∩    [0, 3, 0]]    ->    [1, 3]
My solution:
import numpy as np
arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[0, 1],
                 [0, 3]])
arr3 = np.empty(len(arr1))
for i in range(len(arr1)):
    arr3[i] = np.intersect1d(arr1[i], arr2[i])
print(arr3)
# [ 1.  3.]
I have about 1 million rows, so vectorized operations are strongly preferred. You are welcome to use other Python packages.
You can use np.apply_along_axis.
I wrote a solution that pads the result to the row length of arr1.
I didn't test the efficiency.
import numpy as np

def intersect1d_padded(x):
    # x holds one row of arr1 followed by the matching row of arr2
    x, y = np.split(x, 2)
    padded_intersection = -1 * np.ones(x.shape, dtype=int)
    intersection = np.intersect1d(x, y)
    padded_intersection[:intersection.shape[0]] = intersection
    return padded_intersection

def rowwise_intersection(a, b):
    return np.apply_along_axis(intersect1d_padded,
                               1, np.concatenate((a, b), axis=1))
result = rowwise_intersection(arr1, arr2)
>>> array([[ 1, -1],
           [ 3, -1]])
If you know you have only one element in the intersection, you can use
result = rowwise_intersection(arr1, arr2)[:, 0]
>>> array([1, 3])
You can also modify intersect1d_padded to return a scalar with the intersection value.
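For example, a small sketch of that scalar-returning variant, using np, arr1, and arr2 from above (it assumes, as in the question, exactly one common element per row):
def intersect1d_scalar(x):
    # split the concatenated row back into the two original rows
    a, b = np.split(x, 2)
    # assumes exactly one element in the intersection
    return np.intersect1d(a, b)[0]

result = np.apply_along_axis(intersect1d_scalar, 1,
                             np.concatenate((arr1, arr2), axis=1))
# array([1, 3])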
I don't know of an elegant way to do it in numpy, but a simple list comprehension can do the trick:
[list(set(_x).intersection(set(_y)).difference({0})) for _x, _y in zip(arr1, arr2)]
For example, I have a numpy array like this
a =
array([[1, 2, 3],
       [4, 3, 2]])
and an index array like this, giving the positions of the max values
max_idx =
array([[0, 2],
       [1, 0]])
How can I access those positions at the same time, to modify them?
Something like a[max_idx] = 0, to get the following
array([[1, 2, 0],
       [0, 3, 2]])
Simply use subscripted-indexing -
a[max_idx[:,0],max_idx[:,1]] = 0
If you are working with higher dimensional arrays and don't want to type out slices of max_idx for each axis, you can use linear-indexing to assign zeros, like so -
a.ravel()[np.ravel_multi_index(max_idx.T,a.shape)] = 0
Sample run -
In [28]: a
Out[28]:
array([[1, 2, 3],
       [4, 3, 2]])
In [29]: max_idx
Out[29]:
array([[0, 2],
       [1, 0]])
In [30]: a[max_idx[:,0],max_idx[:,1]] = 0
In [31]: a
Out[31]:
array([[1, 2, 0],
       [0, 3, 2]])
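For completeness, a small sketch of the linear-indexing variant, starting again from the original a:
a = np.array([[1, 2, 3],
              [4, 3, 2]])
a.ravel()[np.ravel_multi_index(max_idx.T, a.shape)] = 0
# array([[1, 2, 0],
#        [0, 3, 2]])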
NumPy supports advanced indexing like this:
a[b[:, 0], b[:, 1]] = 0
The code above fits your requirement.
If a has more than two dimensions (so b has more columns), a better way is:
a[tuple(np.split(b, b.shape[1], axis=1))] = 0
np.split breaks the index array into its columns; wrap the pieces in a tuple, since indexing with a plain list of arrays is deprecated in newer NumPy.
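A small sketch of the np.split form on example values (here b plays the role of max_idx from the question):
import numpy as np
a = np.array([[1, 2, 3],
              [4, 3, 2]])
b = np.array([[0, 2],
              [1, 0]])
rows, cols = np.split(b, 2, axis=1)   # two (2, 1) column arrays
a[rows, cols] = 0                     # same positions as a[b[:, 0], b[:, 1]]
# array([[1, 2, 0],
#        [0, 3, 2]])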
Assume we have two matrices:
x = np.random.randint(10, size=(2, 3, 3))
idx = np.random.randint(3, size=(2, 3))
The question is how to access the elements of x using idx, in the same way as:
dim1 = x[0, range(0,3), idx[0]] # slicing x[0] using idx[0]
dim2 = x[1, range(0,3), idx[1]]
res = np.vstack((dim1, dim2))
Is there a neat way to do this?
You can just index it directly; the only requirement is that the shapes of the indexer arrays broadcast against one another. That's what the .reshape calls are for:
x[np.array([0, 1]).reshape(idx.shape[0], -1),
  np.array([0, 1, 2]).reshape(-1, idx.shape[1]),
  idx]
Out[29]:
array([[ 0.10786251,  0.2527514 ,  0.11305823],
       [ 0.67264076,  0.80958292,  0.07703623]])
Here's another way to do it with reshaping -
x.reshape(-1,x.shape[2])[np.arange(idx.size),idx.ravel()].reshape(idx.shape)
Sample run -
In [2]: x
Out[2]:
array([[[5, 0, 9],
        [3, 0, 7],
        [7, 1, 2]],

       [[5, 3, 5],
        [8, 6, 1],
        [7, 0, 9]]])
In [3]: idx
Out[3]:
array([[2, 1, 2],
       [1, 2, 0]])
In [4]: x.reshape(-1,x.shape[2])[np.arange(idx.size),idx.ravel()].reshape(idx.shape)
Out[4]:
array([[9, 0, 2],
       [3, 1, 7]])
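As a hedged alternative on NumPy 1.15 and later, np.take_along_axis expresses the same gather without the manual reshaping:
np.take_along_axis(x, idx[..., None], axis=2)[..., 0]
# array([[9, 0, 2],
#        [3, 1, 7]])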
I have a 2D NumPy array and I hope to expand its size on both dimensions by copying the bottom row and right column.
For example, from 2x2:
[[0, 1],
 [2, 3]]
to 4x4:
[[0, 1, 1, 1],
 [2, 3, 3, 3],
 [2, 3, 3, 3],
 [2, 3, 3, 3]]
What's the best way to do it?
Thanks.
Here, the hstack and vstack functions can come in handy. For example,
In [16]: p = array(([0,1], [2,3]))
In [20]: vstack((p, p[-1], p[-1]))
Out[20]:
array([[0, 1],
       [2, 3],
       [2, 3],
       [2, 3]])
And remembering that p.T is the transpose, you can do something like the following:
In [16]: p = array(([0,1], [2,3]))
In [22]: p = vstack((p, p[-1], p[-1]))
In [25]: p = vstack((p.T, p.T[-1], p.T[-1])).T
In [26]: p
Out[26]:
array([[0, 1, 1, 1],
       [2, 3, 3, 3],
       [2, 3, 3, 3],
       [2, 3, 3, 3]])
So the 2 lines of code should do it...
Make an empty array and copy whatever rows, columns you want into it.
import numpy as np

def expand(a, new_shape):
    x, y = a.shape
    r = np.empty(new_shape, a.dtype)
    r[:x, :y] = a                # original block
    r[x:, :y] = a[-1:, :]        # repeat the bottom row
    r[:x, y:] = a[:, -1:]        # repeat the right column
    r[x:, y:] = a[-1, -1]        # fill the corner with the bottom-right element
    return r
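For example, with the 2x2 array from the question:
a = np.array([[0, 1],
              [2, 3]])
expand(a, (4, 4))
# array([[0, 1, 1, 1],
#        [2, 3, 3, 3],
#        [2, 3, 3, 3],
#        [2, 3, 3, 3]])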
I have two numpy matrices (or sparse equivalents) like:
>>> A = numpy.array([[1,0,2],[3,0,0],[4,5,0],[0,2,2]])
>>> A
array([[1, 0, 2],
       [3, 0, 0],
       [4, 5, 0],
       [0, 2, 2]])
>>> B = numpy.array([[2,3],[3,4],[5,0]])
>>> B
array([[2, 3],
       [3, 4],
       [5, 0]])
>>> C = mean_dot_product(A, B)
>>> C
array([[ 6. ,  3. ],
       [ 6. ,  9. ],
       [11.5, 16. ],
       [ 8. ,  8. ]])
where C[i, j] = sum over k of A[i,k] * B[k,j], divided by the number of k for which A[i,k] * B[k,j] is nonzero.
Is there a fast way to perform this operation in numpy?
A non-ideal solution is:
>>> maskA = A > 0
>>> maskB = B > 0
>>> maskA.dtype=numpy.uint8
>>> maskB.dtype=numpy.uint8
>>> D = replace_zeros_with_ones(numpy.dot(maskA,maskB))
>>> C = numpy.dot(A,B) / D
Anyone have a better algorithm?
Further, if A or B is a sparse matrix, making it dense (replacing zeros with ones) makes memory usage explode!
Why do you need replace_zeros_with_ones? I deleted that line, ran your code, and got the right result.
You can do this in one line if all the numbers are non-negative:
np.dot(A, B)/np.dot(np.sign(A), np.sign(B))
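A quick hedged check with the arrays from the question: np.sign turns every nonzero entry into 1, so the denominator counts, for each output entry, how many nonzero products contributed (this assumes no count is zero, otherwise you divide by zero, which is what replace_zeros_with_ones guarded against):
import numpy as np

A = np.array([[1, 0, 2], [3, 0, 0], [4, 5, 0], [0, 2, 2]])
B = np.array([[2, 3], [3, 4], [5, 0]])

counts = np.dot(np.sign(A), np.sign(B))   # number of nonzero products per entry
C = np.dot(A, B) / counts
# array([[ 6. ,  3. ],
#        [ 6. ,  9. ],
#        [11.5, 16. ],
#        [ 8. ,  8. ]])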