I'm new to Python and programming in general.
I want to create a function that multiplies two np.array objects of the same shape element-wise and returns the sum of the products as a scalar, for example:
matrix_1 = np.array([[1, 1], [0, 1], [1, 0]])
matrix_2 = np.array([[1, 2], [1, 1], [0, 0]])
I want to get 4 as output: (1 * 1) + (1 * 2) + (0 * 1) + (1 * 1) + (1 * 0) + (0 * 0).
Thanks!
Multiply two matrices element-wise
Sum all the elements
multiplied_matrix = np.multiply(matrix_1,matrix_2)
sum_of_elements = np.sum(multiplied_matrix)
print(sum_of_elements) # 4
Or in one shot:
print(np.sum(np.multiply(matrix_1, matrix_2))) # 4
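Since the question asks for a function, the one-liner above can be wrapped as follows. This is only a minimal sketch: the name elementwise_product_sum and the explicit shape check are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def elementwise_product_sum(m1, m2):
    """Multiply two same-shaped arrays element-wise and sum the products."""
    m1 = np.asarray(m1)
    m2 = np.asarray(m2)
    # Refuse mismatched shapes instead of silently broadcasting (an assumption
    # about what the asker wants, since the question says "same size")
    if m1.shape != m2.shape:
        raise ValueError(f"shape mismatch: {m1.shape} vs {m2.shape}")
    return np.sum(np.multiply(m1, m2))

matrix_1 = np.array([[1, 1], [0, 1], [1, 0]])
matrix_2 = np.array([[1, 2], [1, 1], [0, 0]])
print(elementwise_product_sum(matrix_1, matrix_2))  # 4
```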
You can use np.multiply() to multiply the two arrays element-wise, then call .sum() on the resulting matrix. The result can therefore be calculated with:
np.multiply(matrix_1, matrix_2).sum()
For your sample matrices, this gives:
>>> matrix_1 = np.array([[1, 1], [0, 1], [1, 0]])
>>> matrix_2 = np.array([[1, 2], [1, 1], [0, 0]])
>>> np.multiply(matrix_1, matrix_2)
array([[1, 2],
[0, 1],
[0, 0]])
>>> np.multiply(matrix_1, matrix_2).sum()
4
There are several ways to compute this quantity (the Frobenius inner product) using numpy, e.g.
np.sum(A * B)
np.dot(A.flatten(), B.flatten())
np.trace(np.dot(A, B.T))
np.einsum('ij,ij', A, B)
One recommended way is numpy.einsum, since it applies not only to matrices but also to multiway arrays (i.e., tensors).
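As a quick sanity check, the four expressions listed above can be evaluated on the matrices from the question. In this sketch, A and B are simply taken to be the question's matrix_1 and matrix_2.

```python
import numpy as np

# A and B are assumed to be the sample matrices from the question
A = np.array([[1, 1], [0, 1], [1, 0]])
B = np.array([[1, 2], [1, 1], [0, 0]])

results = [
    np.sum(A * B),                    # element-wise product, then sum
    np.dot(A.flatten(), B.flatten()), # dot product of the flattened arrays
    np.trace(np.dot(A, B.T)),         # trace of A @ B^T
    np.einsum('ij,ij', A, B),         # repeated indices are summed over
]
print(results)  # every entry is 4
```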
Matrices of the same size
Take the matrices you gave as an example,
>>> import numpy as np
>>> matrix_1 = np.array([[1, 1], [0, 1], [1, 0]])
>>> matrix_2 = np.array([[1, 2], [1, 1], [0, 0]])
then, we have
>>> np.einsum('ij, ij ->', matrix_1, matrix_2)
4
Vectors of the same size
An example like this:
>>> vector_1 = np.array([1, 2, 3])
>>> vector_2 = np.array([2, 3, 4])
>>> np.einsum('i, i ->', vector_1, vector_2)
20
Tensors of the same size
Take three-way arrays (i.e., third-order tensors) as an example,
>>> tensor_1 = np.array([[[1, 2], [3, 4]], [[2, 3], [4, 5]], [[3, 4], [5, 6]]])
>>> print(tensor_1)
[[[1 2]
[3 4]]
[[2 3]
[4 5]]
[[3 4]
[5 6]]]
>>> tensor_2 = np.array([[[2, 3], [4, 5]], [[3, 4], [5, 6]], [[6, 7], [8, 9]]])
>>> print(tensor_2)
[[[2 3]
[4 5]]
[[3 4]
[5 6]]
[[6 7]
[8 9]]]
then, we have
>>> np.einsum('ijk, ijk ->', tensor_1, tensor_2)
248
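If writing out the subscripts for each dimensionality is inconvenient, the same full contraction can be written without them. The sketch below swaps in np.vdot (which flattens its arguments first) and np.tensordot with axes equal to the number of dimensions; these are alternatives to the einsum calls above, not what this answer originally used.

```python
import numpy as np

tensor_1 = np.array([[[1, 2], [3, 4]], [[2, 3], [4, 5]], [[3, 4], [5, 6]]])
tensor_2 = np.array([[[2, 3], [4, 5]], [[3, 4], [5, 6]], [[6, 7], [8, 9]]])

# Reference value using explicit einsum subscripts
via_einsum = np.einsum('ijk, ijk ->', tensor_1, tensor_2)

# np.vdot flattens both inputs before taking the dot product
via_vdot = np.vdot(tensor_1, tensor_2)

# Contract all axes of tensor_1 with all axes of tensor_2 (result is 0-d)
via_tensordot = int(np.tensordot(tensor_1, tensor_2, axes=tensor_1.ndim))

print(via_einsum, via_vdot, via_tensordot)  # 248 248 248
```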
For more on how to use numpy.einsum, I recommend:
Understanding NumPy's einsum
Related
I want to create an array made of arrays in Python with numpy
I'm trying to calculate the inverse of a matrix assembled from other matrices using numpy's linalg.inv(), but it computes one inverse for each submatrix instead of the inverse of the whole matrix.
For example, let's say I have:
a = np.array([[1, 2],
[3, 4]])
b = np.array([[5, 6],
[7, 8]])
i = np.array([[1, 0],
[0, 1]])
what I've tried is:
c = np.array([[a, i],
[i, b]])
what I want is
>> [[1, 2, 1, 0]
[3, 4, 0, 1]
[1, 0, 5, 6]
[0, 1, 7, 8]]
what I get is
>> [[[[1 2]
[3 4]]
[[1 0]
[0 1]]]
[[[1 0]
[0 1]]
[[5 6]
[7 8]]]]
You can use the np.block function, which assembles a single array from a nested list of blocks. You can do something like this:
np.block([[a,i],[i,b]])
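To tie this back to the original goal, here is a short sketch that builds the 4x4 matrix with np.block and then inverts it as a whole. The check against the identity matrix is only an illustration.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])
i = np.array([[1, 0],
              [0, 1]])

# Assemble one 2-D array from the four 2x2 blocks
c = np.block([[a, i],
              [i, b]])
print(c.shape)  # (4, 4)

# linalg.inv now sees a single 4x4 matrix rather than a stack of 2x2 ones
c_inv = np.linalg.inv(c)
print(np.allclose(c @ c_inv, np.eye(4)))  # True
```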
I have this set of equations I want to perform:
x = np.linspace(0, 2, 3)
y = np.linspace(x, x+2, 3)
I then want to populate a 2D array with the calculation:
a = 2*x + y
So for example, given an array:
x = [0, 1, 2]
Then, the array y is:
y = [[0, 1, 2],
[1, 2, 3],
[2, 3, 4]]
When I perform the operation a = 2*x + y I should get the array:
a = [[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]
How do I do this, keeping in mind I want to perform this operation quickly for array of size up to 10000x10000 (or larger)?
Or keep your code and add two transposes (.T):
print((2*x+y.T).T)
Output:
[[0 1 2]
[3 4 5]
[6 7 8]]
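The double transpose works because it makes x broadcast down the rows instead of across the columns. A sketch of an equivalent form, assuming the x and y from the question, turns x into a column vector with x[:, None]:

```python
import numpy as np

x = np.linspace(0, 2, 3)
y = np.linspace(x, x + 2, 3)   # row k is x + k

# x[:, None] has shape (3, 1), so 2*x[i] is added to every entry of row i
a = 2 * x[:, None] + y
print(a)

# Same values as the double-transpose version
print(np.array_equal(a, (2 * x + y.T).T))  # True
```

No Python-level loop is involved, so the same expression applies unchanged to much larger arrays, memory permitting.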
I'm encountering a problem that I hope you can help me solve.
I have a 2D numpy array which I want to divide into bins by value. Then I need to know the exact initial indices of all the numbers in each bin.
For example, consider the matrix
[[1,2,3], [4,5,6], [7,8,9]]
and the bin array
[0,2,4,6,8,10].
Then the first element ([0,0]) should be stored in one bin, the next two elements ([0,1], [0,2]) in another bin, and so on. The desired output looks like this:
[[[0,0]],[[0,1],[0,2]],[[1,0],[1,1]],[[1,2],[2,0]],[[2,1],[2,2]]]
Even though I tried several numpy functions, I'm not able to do this in an elegant way. The best attempt might be
>>> a = [[1,2,3], [4,5,6], [7,8,9]]
>>> bins = [0,2,4,6,8,10]
>>> bin_in_mat = np.digitize(a, bins, right=False)
>>> bin_in_mat
array([[1, 2, 2],
[3, 3, 4],
[4, 5, 5]])
>>> indices = np.argwhere(bin_in_mat)
>>> indices
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 1],
[1, 2],
[2, 0],
[2, 1],
[2, 2]])
but this doesn't solve my problem. Any suggestions?
You need to leave numpy and use a loop for this, because a single array cannot represent a ragged result like yours:
bin_in_mat = np.digitize(a, bins, right=False)
bin_contents = [np.argwhere(bin_in_mat == i) for i in range(len(bins))]
>>> for b in bin_contents:
... print(repr(b))
array([], shape=(0, 2), dtype=int64)
array([[0, 0]], dtype=int64)
array([[0, 1],
[0, 2]], dtype=int64)
array([[1, 0],
[1, 1]], dtype=int64)
array([[1, 2],
[2, 0]], dtype=int64)
array([[2, 1],
[2, 2]], dtype=int64)
Note that digitize is a bad choice for large integer input (in NumPy versions before 1.15); the equivalent call bin_in_mat = np.searchsorted(bins, a, side='right') is faster and more correct there.
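For np.digitize with right=False and increasing bins, the matching np.searchsorted call uses side='right'. The sketch below checks that on the sample data and then builds the nested index lists from the question, skipping the empty bin 0. The variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
bins = np.array([0, 2, 4, 6, 8, 10])

via_digitize = np.digitize(a, bins, right=False)
via_searchsorted = np.searchsorted(bins, a, side='right')
print(np.array_equal(via_digitize, via_searchsorted))  # True

# One index array per bin; bin numbers start at 1 for in-range values
bin_contents = [np.argwhere(via_searchsorted == k) for k in range(1, len(bins))]
nested = [idx.tolist() for idx in bin_contents]
print(nested)
# [[[0, 0]], [[0, 1], [0, 2]], [[1, 0], [1, 1]], [[1, 2], [2, 0]], [[2, 1], [2, 2]]]
```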
Consider the array a
np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))
a
array([[0, 2],
[7, 3],
[8, 7],
[0, 6],
[8, 6],
[0, 2],
[0, 4],
[9, 7],
[3, 2],
[4, 3]])
What is a vectorized way to get the cumulative argmax?
array([[0, 0], <-- both start off as max position
[1, 1], <-- 7 > 0 so 1st col = 1, 3 > 2 2nd col = 1
[2, 2], <-- 8 > 7 1st col = 2, 7 > 3 2nd col = 2
[2, 2], <-- 0 < 8 1st col stays the same, 6 < 7 2nd col stays the same
[2, 2],
[2, 2],
[2, 2],
[7, 2], <-- 9 is new max of 1st col, argmax is now 7
[7, 2],
[7, 2]])
Here is a non-vectorized way to do it.
Notice that as the window expands, argmax applies to the growing window.
pd.DataFrame(a).expanding().apply(np.argmax).astype(int).values
array([[0, 0],
[1, 1],
[2, 2],
[2, 2],
[2, 2],
[2, 2],
[2, 2],
[7, 2],
[7, 2],
[7, 2]])
Here's a vectorized pure NumPy solution that performs pretty snappily:
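def cumargmax(a):
    # Running maximum down each column
    m = np.maximum.accumulate(a)
    # Row index of every element, repeated for each column
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
    # Keep the row index only where a new maximum appears; zero it elsewhere
    x[1:] *= m[:-1] < m[1:]
    # Carry the most recent "new maximum" index forward
    np.maximum.accumulate(x, axis=0, out=x)
    return x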
Then we have:
>>> cumargmax(a)
array([[0, 0],
[1, 1],
[2, 2],
[2, 2],
[2, 2],
[2, 2],
[2, 2],
[7, 2],
[7, 2],
[7, 2]])
Some quick testing on arrays with thousands to millions of values suggests that this is anywhere between 10-50 times faster than looping at the Python level (either implicitly or explicitly).
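One way to gain confidence in the vectorized version is to compare it with a straightforward loop that calls argmax on each growing window. In this sketch, cumargmax_loop and the random test data are illustrative assumptions; cumargmax is the function from this answer.

```python
import numpy as np

def cumargmax(a):
    m = np.maximum.accumulate(a)
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
    x[1:] *= m[:-1] < m[1:]
    np.maximum.accumulate(x, axis=0, out=x)
    return x

def cumargmax_loop(a):
    # Reference implementation: argmax of rows 0..k for every k
    return np.array([a[:k + 1].argmax(axis=0) for k in range(a.shape[0])])

rng = np.random.default_rng(0)
sample = rng.integers(0, 10, size=(200, 3))
print(np.array_equal(cumargmax(sample), cumargmax_loop(sample)))  # True
```

Both versions report the first position at which the running maximum is reached, so ties do not cause a mismatch.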
I can't think of a way to vectorize this over both columns easily, but if the number of columns is small relative to the number of rows, that shouldn't be an issue and a for loop should suffice for that axis:
import numpy as np
import numpy_indexed as npi
a = np.random.randint(0, 10, (10))
# Running maximum; named running_max to avoid shadowing the built-in max
running_max = np.maximum.accumulate(a)
# Look up each running-maximum value in a
idx = npi.indices(a, running_max)
print(idx)
I would write a function that computes the cumulative argmax of a 1-D array and then apply it to every column. This is the code:
import numpy as np
np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))
def cumargmax(v):
    # Binary ufunc on indices: keep whichever index holds the larger value (ties keep the earlier one)
    uargmax = np.frompyfunc(lambda i, j: j if v[j] > v[i] else i, 2, 1)
    # dtype=object replaces the np.object alias, which was removed in NumPy 1.24
    return uargmax.accumulate(np.arange(0, len(v)), 0, dtype=object).astype(v.dtype)
np.apply_along_axis(cumargmax, 0, a)
The reason for converting to the object dtype and then converting back is a workaround for NumPy 1.9, as mentioned in "generalized cumulative functions in NumPy/SciPy?"
Seemingly simple question: I have an array with two columns, the first represents an ID and the second a count. I'd like to update it with another, similar array such that
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
a.update(b) # ????
>>> np.array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3]])
Is there a way to do this with indexing/slicing such that I don't simply have to iterate over each row?
Generic case
Approach #1: You can use np.add.at to do such an ID-based adding operation like so -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
# Place second column of a into out_count & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
To find a_idx and b_idx, as probably a faster alternative, np.searchsorted could be used like so -
a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')
Sample input-output :
In [538]: a
Out[538]:
array([[1, 2],
[4, 2],
[3, 1],
[5, 5]])
In [539]: b
Out[539]:
array([[3, 7],
[1, 1],
[4, 0],
[2, 3],
[6, 2]])
In [540]: out
Out[540]:
array([[1, 3],
[2, 3],
[3, 8],
[4, 2],
[5, 5],
[6, 2]])
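Approach #1 with the np.searchsorted lookup can be collected into a single function. This is a sketch only: the name merge_counts is an assumption, and it uses np.add.at for both inputs so that repeated IDs within a are also summed.

```python
import numpy as np

def merge_counts(a, b):
    """Sum the counts (column 1) of a and b per ID (column 0)."""
    out_id = np.union1d(a[:, 0], b[:, 0])         # sorted unique IDs
    out_count = np.zeros_like(out_id)
    a_idx = np.searchsorted(out_id, a[:, 0])      # position of each ID of a
    b_idx = np.searchsorted(out_id, b[:, 0])      # position of each ID of b
    np.add.at(out_count, a_idx, a[:, 1])          # unbuffered in-place add
    np.add.at(out_count, b_idx, b[:, 1])
    return np.column_stack((out_id, out_count))

a = np.array([[1, 2], [4, 2], [3, 1], [5, 5]])
b = np.array([[3, 7], [1, 1], [4, 0], [2, 3], [6, 2]])
print(merge_counts(a, b))
```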
Approach #2: You can use np.bincount to do the same ID based adding -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Get all IDs and all counts as single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))
# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)
# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)
# Keep only the valid bins; bincount with weights returns floats, so cast back
out_count = summed_vals[mask].astype(out_id.dtype)
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
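When the IDs are large or sparse, np.bincount on the raw IDs allocates one bin per possible ID. A possible variation, which is not part of the approach above, is to first map the IDs to compact positions with np.unique(..., return_inverse=True) and bin those instead. A minimal sketch on the arrays from the question:

```python
import numpy as np

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])

ids = np.concatenate((a[:, 0], b[:, 0]))
counts = np.concatenate((a[:, 1], b[:, 1]))

# inverse[k] is the position of ids[k] within the sorted unique IDs
out_id, inverse = np.unique(ids, return_inverse=True)

# Weighted bincount sums the counts per unique ID; cast floats back to ints
out_count = np.bincount(inverse, weights=counts).astype(counts.dtype)

out = np.column_stack((out_id, out_count))
print(out)
```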
Specific case
If the ID columns in a and b are sorted, it becomes easier, as we can just use masks from np.in1d to index into the output ID array created with np.union1d, like so -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Place second column of a into out_count & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
Sample run -
In [552]: a
Out[552]:
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[8, 5]])
In [553]: b
Out[553]:
array([[2, 2],
[3, 1],
[4, 0],
[5, 3],
[6, 2],
[8, 2]])
In [554]: out
Out[554]:
array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3],
[6, 2],
[8, 7]])
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[5, 3]])
Note that if you want the result sorted by id, you can use np.lexsort:
result[np.lexsort((result[:,0],result[:,0]))]
Explanation :
First, find the unique ids with the following command:
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])
Then find the difference between the full set of ids and the ids of a:
>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])
Then find the items within b whose ids are in dif:
>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])
And finally, concatenate the result with the array a:
>>> np.concatenate((a,val))
Consider another example with sorting:
>>> a = np.array([[1, 2],
... [2, 2],
... [3, 1],
... [7, 5]])
>>>
>>> b = np.array([[2, 2],
... [3, 1],
... [4, 0],
... [5, 3]])
>>>
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
[2, 2],
[3, 1],
[4, 0],
[5, 3],
[7, 5]])
This is an old question, but here is a solution with pandas (which can be generalized to aggregation functions other than sum). Sorting by id also happens automatically:
import pandas as pd
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
print((pd.DataFrame(a[:, 1], index=a[:, 0])
.add(pd.DataFrame(b[:, 1], index=b[:, 0]), fill_value=0)
.astype(int))
.reset_index()
.to_numpy())
Output:
[[1 2]
[2 4]
[3 2]
[4 5]
[5 3]]
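To illustrate the remark about other aggregation functions, one option is to stack both arrays and use groupby instead of add. This is a sketch under assumptions: the column names "id" and "count" and the helper combine are made up for the example.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])

# Stack the rows of a and b into one table of (id, count) pairs
stacked = pd.DataFrame(np.vstack((a, b)), columns=["id", "count"])

def combine(how):
    # Aggregate the counts per id with the given function name, sorted by id
    return stacked.groupby("id", sort=True)["count"].agg(how).reset_index().to_numpy()

print(combine("sum"))  # same result as the add-based version
print(combine("max"))  # largest count seen per id
```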