I am trying to find a fast, at least partially vectorized, way to count combinatorial co-occurrences between two 2D NumPy arrays in order to identify Single Point Polymorphism linkage.
The shape of each array is
(factors, samples)
an example for matrix 1 is as follows:
array([[0., 1., 1.],
[1., 0., 1.]])
and matrix 2
array([[1., 1., 0.],
[0., 0., 0.]])
I need the total number of occurrences along the samples axis for each ordered pair of values taken by 2 factors at the same sample position in the 2 matrices (order matters because the (1,0) count is different from the (0,1) count). The combinations are therefore [(0, 0), (0, 1), (1, 0), (1, 1)], and the final output for each combination is a (factor, factor) matrix of counts.
For combination (0,0) for instance, we get the matrix
array([[0, 1],
       [0, 1]])
Because the pair (0,0) occurs
0 times along row 0 of matrix 1 and row 0 of matrix 2,
1 time along row 0 of matrix 1 and row 1 of matrix 2,
0 times along row 1 of matrix 1 and row 0 of matrix 2,
1 time along row 1 of matrix 1 and row 1 of matrix 2.
With example data
import numpy as np
array1 = np.array([
[0., 1., 1.],
[1., 0., 1.]])
array2 = np.array([
[1., 1., 0.],
[0., 0., 0.]])
We can count the desired combinations with np.einsum and reshape to a suitable array
c1 = np.array([1-array1, array1]).astype('int')
c2 = np.array([1-array2, array2]).astype('int')
np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, len(array1), len(array2))
Output
array([[[0, 1], # counts for (0,0)
[0, 1]],
[[1, 0], # counts for (0,1)
[1, 0]],
[[1, 2], # counts for (1,0)
[1, 2]],
[[1, 0], # counts for (1,1)
[1, 0]]])
Checking that the previous results are equal to dot products
import itertools as it
np.array([x @ y.T for x, y in it.product(c1, c2)])
Output
array([[[0, 1],
[0, 1]],
[[1, 0],
[1, 0]],
[[1, 2],
[1, 2]],
[[1, 0],
[1, 0]]])
I realized the solution while deriving a manual example for the question: the counts can be computed with dot products of indicator matrices:
# Indicator matrices: 1 where the entry equals the given value, 0 elsewhere
matrix1_0 = (array1 == 0).astype('int')
matrix1_1 = (array1 == 1).astype('int')
matrix2_0 = (array2 == 0).astype('int')
matrix2_1 = (array2 == 1).astype('int')
count_00 = np.dot(matrix1_0 , matrix2_0.T)
count_01 = np.dot(matrix1_0 , matrix2_1.T)
count_10 = np.dot(matrix1_1 , matrix2_0.T)
count_11 = np.dot(matrix1_1 , matrix2_1.T)
These correspond to the number of occurrences of each combination for each pair of factors, summed along the sample axis (axis 1 here).
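As a sanity check on the dot-product idea, a brute-force loop over row pairs should give the same counts. The following is only a minimal sketch on the example arrays from above; the helper names pair_counts and pair_counts_loop are hypothetical and not part of the original answer.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

def pair_counts(m1, m2, v1, v2):
    """Count samples where m1[i] == v1 and m2[j] == v2 for every row pair (i, j)."""
    return (m1 == v1).astype(int) @ (m2 == v2).astype(int).T

def pair_counts_loop(m1, m2, v1, v2):
    """Slow reference implementation, used only to check the dot product."""
    out = np.zeros((len(m1), len(m2)), dtype=int)
    for i, r1 in enumerate(m1):
        for j, r2 in enumerate(m2):
            out[i, j] = np.sum((r1 == v1) & (r2 == v2))
    return out

# One (factor, factor) count matrix per ordered value pair
counts = {(v1, v2): pair_counts(array1, array2, v1, v2)
          for v1 in (0, 1) for v2 in (0, 1)}
```

Because every sample falls into exactly one of the four combinations, the four matrices should add up to the number of samples in every cell, which is a cheap consistency check on real data.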
I'm looking for an efficient solution that avoids 'for' loops for an array problem I'm having. I want to use a huge 1D array (A, size = 250.000) of values between 0 and 40 for indexing in one dimension, and an array (B) of the same size with values between 0 and 9995 for indexing in a second dimension.
The result should be an array of shape (41, 9996) that holds, at each index (i, j), the number of times the value i in A occurs at the same position as the value j in B.
Example:
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
which should result in:
[[0, 1, 0],
 [0, 0, 0],
 [0, 0, 1],
 [0, 0, 2],
 [1, 0, 0]]
The naive way is too slow because the amount of data is huge. What you could do is:
out = np.zeros((41, 9996))
for i, j in zip(A, B):
    out[i, j] += 1
which needs one Python-level iteration for every element of A and B.
I've tried this, which works partially:
out = np.zeros((41, 9996))
out[A, B] += 1
This puts a 1 at every indexed position, regardless of how many times the pair occurs.
Does anyone have a clue how to fix this? Thanks in advance!
You are looking for a sparse tensor:
import torch
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
idx = torch.tensor([A, B])
# Duplicate coordinates are summed when the sparse tensor is converted to dense
torch.sparse_coo_tensor(idx, torch.ones(idx.shape[1]), (5, 3)).to_dense()
Output:
tensor([[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.],
[0., 0., 2.],
[1., 0., 0.]])
You can also do the same with scipy sparse matrix:
import numpy as np
from scipy.sparse import coo_matrix
coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()
Output:
array([[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.],
[0., 0., 2.],
[1., 0., 0.]])
Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.
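To illustrate staying sparse, here is a minimal sketch that builds the counts with the shape from the question, (41, 9996), and queries them without ever calling toarray(). It assumes SciPy's coo_matrix as used above; the variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import coo_matrix

A = np.array([0, 3, 2, 4, 3])
B = np.array([1, 2, 2, 0, 2])

# Shape taken from the question: values 0..40 in A and 0..9995 in B
counts = coo_matrix((np.ones(len(A), dtype=np.int64), (A, B)), shape=(41, 9996))

# Converting to CSR sums duplicate (row, col) entries
counts_csr = counts.tocsr()

pair_3_2 = counts_csr[3, 2]    # count for the pair (3, 2)
stored = counts_csr.nnz        # number of explicitly stored entries
row_totals = np.asarray(counts_csr.sum(axis=1)).ravel()  # occurrences per value of A
```

Only the distinct pairs are stored here, instead of all 41 * 9996 cells.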
Use numpy.add.at:
import numpy as np
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
arr = np.zeros((5, 3))
np.add.at(arr, (A, B), 1)
print(arr)
Output
[[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]
[0. 0. 2.]
[1. 0. 0.]]
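The difference from the out[A, B] += 1 attempt in the question can be seen side by side. This is a small sketch on the example data; the names buffered and unbuffered are just illustrative.

```python
import numpy as np

A = np.array([0, 3, 2, 4, 3])
B = np.array([1, 2, 2, 0, 2])

# Fancy-indexed += is buffered: the repeated pair (3, 2) is written only once
buffered = np.zeros((5, 3), dtype=int)
buffered[A, B] += 1

# np.add.at is unbuffered: every occurrence of a pair is accumulated
unbuffered = np.zeros((5, 3), dtype=int)
np.add.at(unbuffered, (A, B), 1)
```

Here buffered[3, 2] ends up as 1 while unbuffered[3, 2] is 2, which is exactly the behaviour described in the question.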
Given that the numbers are in a small range, bincount would be a good choice for bin-based summing -
def accumulate_coords(A,B):
    nrows = A.max()+1
    ncols = B.max()+1
    return np.bincount(A*ncols+B,minlength=nrows*ncols).reshape(-1,ncols)
Sample run -
In [55]: A
Out[55]: array([0, 3, 2, 4, 3])
In [56]: B
Out[56]: array([1, 2, 2, 0, 2])
In [58]: accumulate_coords(A,B)
Out[58]:
array([[0, 1, 0],
[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[1, 0, 0]])
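If the output must always have the fixed shape from the question, (41, 9996), even when the largest values happen not to occur in the data, the shape can be passed in explicitly. The following is a sketch of that variant; accumulate_coords_fixed is a hypothetical name, the random data only stands in for the real arrays, and all values are assumed to lie inside the given bounds.

```python
import numpy as np

def accumulate_coords_fixed(A, B, nrows, ncols):
    """Count (A[k], B[k]) pairs into an (nrows, ncols) array via np.bincount."""
    A = np.asarray(A)
    B = np.asarray(B)
    flat = A * ncols + B  # row-major linear index of each pair
    return np.bincount(flat, minlength=nrows * ncols).reshape(nrows, ncols)

# Random stand-in data with the sizes given in the question
rng = np.random.default_rng(0)
A = rng.integers(0, 41, size=250_000)
B = rng.integers(0, 9996, size=250_000)
out = accumulate_coords_fixed(A, B, 41, 9996)

# The small example from the question
small = accumulate_coords_fixed([0, 3, 2, 4, 3], [1, 2, 2, 0, 2], 5, 3)
```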
I have an array of shape [batch_size, N], for example:
[[1 2]
[3 4]
[5 6]]
and I need to create a three-index array of shape [batch_size, N, N] in which every batch element becomes an N x N diagonal matrix whose diagonal is taken from the corresponding batch row. In this simple case, the result I am looking for is:
[
[[1,0],[0,2]],
[[3,0],[0,4]],
[[5,0],[0,6]],
]
How can I do this without for loops, exploiting vectorization? I guess it is an extension of dimension, but I cannot find the correct function to do this.
(I need it as I am working with tensorflow and prototyping with numpy).
Try it in tensorflow:
import tensorflow as tf
A = [[1,2],[3 ,4],[5,6]]
# TensorFlow 1.x API; in TensorFlow 2.x the equivalent op is tf.linalg.diag
B = tf.matrix_diag(A)
print(B.eval(session=tf.Session()))
[[[1 0]
[0 2]]
[[3 0]
[0 4]]
[[5 0]
[0 6]]]
Approach #1
Here's a vectorized one with np.einsum for input array, a -
# Initialize o/p array
out = np.zeros(a.shape + (a.shape[1],),dtype=a.dtype)
# Get diagonal view and assign into it input array values
diag = np.einsum('ijj->ij',out)
diag[:] = a
Approach #2
Another based on slicing for assignment -
m,n = a.shape
out = np.zeros((m,n,n),dtype=a.dtype)
out.reshape(-1,n**2)[...,::n+1] = a
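Both approaches above can be checked against a plain per-row np.diag on the example from the question. This is a minimal sketch; out1, out2 and expected are illustrative names.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])
m, n = a.shape

# Approach #1: writable diagonal view obtained through einsum
out1 = np.zeros((m, n, n), dtype=a.dtype)
np.einsum('ijj->ij', out1)[:] = a

# Approach #2: in a flattened n x n matrix, every (n + 1)-th element is on the diagonal
out2 = np.zeros((m, n, n), dtype=a.dtype)
out2.reshape(m, n * n)[:, ::n + 1] = a

# Reference result built row by row
expected = np.array([np.diag(row) for row in a])
```

Both rely on writing through a view of the freshly allocated output, so neither needs a Python-level loop over the batch.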
Using np.expand_dims with an element-wise product with np.eye
a = np.array([[1, 2],
[3, 4],
[5, 6]])
N = a.shape[1]
a = np.expand_dims(a, axis=1)
a*np.eye(N)
array([[[1., 0.],
[0., 2.]],
[[3., 0.],
[0., 4.]],
[[5., 0.],
[0., 6.]]])
Explanation
np.expand_dims(a, axis=1) adds a new axis to a, which will now be a (3, 1, 2) ndarray:
array([[[1, 2]],
[[3, 4]],
[[5, 6]]])
You can now multiply this array with a size N identity matrix, which you can generate with np.eye:
np.eye(N)
array([[1., 0.],
[0., 1.]])
Which will yield the desired output:
a*np.eye(N)
array([[[1., 0.],
[0., 2.]],
[[3., 0.],
[0., 4.]],
[[5., 0.],
[0., 6.]]])
You can use numpy.diag:
m = [[1, 2],
[3, 4],
[5, 6]]
[np.diag(b) for b in m]
EDIT: The following plot shows the average execution time of the solution above (solid line) compared against @Divakar's (dashed line) for different batch sizes and different matrix sizes.
I don't believe you get much of an improvement, but this is just based on this simple metric
You basically want a function that does the opposite of/reverses np.block(..)
I needed the same thing, so I wrote this little function:
def split_blocks(x, m=2, n=2):
    """
    Reverse the action of np.block(..)
    >>> x = np.random.uniform(-1, 1, (2, 18, 20))
    >>> assert (np.block(split_blocks(x, 3, 4)) == x).all()
    :param x: (.., M, N) input matrix to split into blocks
    :param m: number of row splits (M must be divisible by m)
    :param n: number of column splits (N must be divisible by n)
    :return: nested list (m x n) of blocks, each of shape (.., M // m, N // n)
    """
    x = np.asarray(x)
    nd = x.ndim
    *shape, nr, nc = x.shape
    # Split the last two axes into (m, M // m) and (n, N // n), then move m and n to the front
    return list(map(list, x.reshape((*shape, m, nr//m, n, nc//n)).transpose(nd-2, nd, *range(nd-2), nd-1, nd+1)))
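A short usage sketch on a small 2D array may make the block layout easier to see. The function is repeated here (using np.asarray) only so that the snippet runs on its own; the array x is an arbitrary example.

```python
import numpy as np

def split_blocks(x, m=2, n=2):
    """Split the last two axes of x into an m x n nested list of blocks."""
    x = np.asarray(x)
    nd = x.ndim
    *shape, nr, nc = x.shape
    blocks = x.reshape((*shape, m, nr // m, n, nc // n))
    return list(map(list, blocks.transpose(nd - 2, nd, *range(nd - 2), nd - 1, nd + 1)))

x = np.arange(24).reshape(4, 6)
blocks = split_blocks(x, m=2, n=3)   # 2 x 3 grid of (2, 2) blocks

top_left = blocks[0][0]              # equals x[0:2, 0:2]
bottom_right = blocks[1][2]          # equals x[2:4, 4:6]
roundtrip = np.block(blocks)         # reassembles the original array
```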
How do I reshape a 1D NumPy array into a 2D NumPy array, filling the remaining columns with zeros?
For example:
Input:
a = np.array([1,2,3])
Expected output:
np.array([[0, 0, 1],
[0, 0, 2],
[0, 0, 3]])
How do I do this?
a = np.array([1,2,3])
Option 1
np.pad (this should be fast)
np.pad(a[:, None], ((0, 0), (2, 0)), mode='constant')
array([[0, 0, 1],
[0, 0, 2],
[0, 0, 3]])
Option 2
Assign a slice to np.zeros (also very fast)
b = np.zeros((3, 3))
b[:, -1] = a
array([[0., 0., 1.],
[0., 0., 2.],
[0., 0., 3.]])
For your specific example:
a = np.array([1,2,3])
a.resize([3, 3])
a = np.rot90(a, k=3)
hope this helps
Create a function that creates a zero array of m x n x 3 dimensionality. Then go through your original matrix and assign its values to those parts of the new zero matrix that should be non-zero.
Is it possible to apply numpy broadcasting (with 1D arrays),
x=np.arange(3)[:,np.newaxis]
y=np.arange(3)
x+y=
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
to 3D matrices similar to the one below, such that each element in a[i] is treated as a 1D vector like in the example above?
a=np.zeros((2,2,2))
a[0]=1
b=a
result=a+b
resulting in
result[0,0]=array([[2, 2],
[2, 2]])
result[0,1]=array([[1, 1],
[1, 1]])
result[1,0]=array([[1, 1],
[1, 1]])
result[1,1]=array([[0, 0],
[0, 0]])
You can do this the same way as for 1D arrays, i.e., insert a new axis between axis 0 and axis 1 in either a or b:
a + b[:,None] # or a[:,None] + b
(a + b[:,None])[0,0]
#array([[ 2., 2.],
# [ 2., 2.]])
(a + b[:,None])[0,1]
#array([[ 1., 1.],
# [ 1., 1.]])
(a + b[:,None])[1,0]
#array([[ 1., 1.],
# [ 1., 1.]])
(a + b[:,None])[1,1]
#array([[ 0., 0.],
# [ 0., 0.]])
Since a and b have the same shape, say (2,2,2), a+b will indeed work (element-wise).
Broadcasting matches the dimensions of the operands in reverse order, starting from the last dimension and working backwards (e.g. considering columns before rows in the two-dimensional case). If the sizes match, the next dimension is considered.
If the sizes don't match and one of them is 1, that operand's values are repeated along that dimension to match the other operand (e.g. if a.shape = (2,1,2) and b.shape = (2,2,2), the values of a are repeated along axis 1 to give the shape (2,2,2)). If neither size is 1, the shapes are incompatible and NumPy raises an error.
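The matching rule can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or newer, where np.broadcast_shapes is available; the shapes are just examples.

```python
import numpy as np

a = np.zeros((2, 1, 2))
b = np.ones((2, 2, 2))

# Shapes are aligned from the trailing axis: (2, 1, 2) against (2, 2, 2)
result_shape = np.broadcast_shapes(a.shape, b.shape)

# Missing leading axes count as size 1: (2, 2) is treated as (1, 2, 2)
padded_shape = np.broadcast_shapes((2, 2), (3, 2, 2))

# Sizes that differ where neither is 1 cannot be broadcast
try:
    np.broadcast_shapes((2, 3), (2, 2))
    incompatible = False
except ValueError:
    incompatible = True

total = a + b  # the size-1 axis of a is stretched to match b
```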