Neighbors in a 2D array - python

I have a 2D numpy array as follows:
start = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0]
])
I need to get the same matrix, but with each value replaced by the number of cells I could reach from it by moving one step at a time in any direction, walking only along 1s.
As a result, I should get the following:
finish = np.array([
    [4, 4, 0, 2],
    [4, 0, 0, 2],
    [0, 4, 0, 0]
])
It seems to me that this is a well-known problem, but I haven't even figured out how to phrase it for a search, since everything I found is slightly different.
What's the best way to do this?

You can use the scipy.ndimage labeling function with a customized structure array s:
import numpy as np
from scipy.ndimage import label
start = np.asarray([[1, 1, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 0, 0]])

# structure array defining what counts as a "neighbor" (here: all 8 surrounding cells)
s = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
# label blobs in the array
labeledarr, _ = label(start, structure=s)

# retrieve blob labels and the number of elements within each blob
blobnr, blobval = np.unique(labeledarr.ravel(), return_counts=True)

# substitute each blob label with its number of elements
finish = np.zeros_like(labeledarr)
for k, v in zip(blobnr[1:], blobval[1:]):
    finish[labeledarr == k] = v
print(finish)
Output:
[[4 4 0 2]
 [4 0 0 2]
 [0 4 0 0]]
I am sure the final step of substituting each label with the number of its occurrences can be optimized in terms of speed.
And @mad-physicist rightly mentioned that the initially used labeledarr.flat should be replaced by labeledarr.ravel(). The reasons for this are explained here.
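For instance, a possible vectorized substitute for that loop (a sketch using np.bincount, assuming the labeledarr from above; not part of the original answer):
counts = np.bincount(labeledarr.ravel())
counts[0] = 0                 # label 0 is the background and should stay 0
finish = counts[labeledarr]   # map every label to the size of its blob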

You can use scipy.ndimage.label to label connected regions and return the number of regions, as @Mr.T points out. This can then be used to create a boolean mask for indexing and counting.
Credit should go to @Mr.T, as he came up with a similar solution first. This answer is still posted because the second part is different: I find it more readable, and it's 40% faster on my machine.
import numpy as np
from scipy.ndimage import label
a = [[1, 1, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 0, 0]]
# Label connected regions, the second arg defines the connection structure
labeled, n_labels = label(a, np.ones((3,3)))
# Replace label value with the size of the connected region
b = np.zeros_like(labeled)
for i in range(1, n_labels + 1):
    target = (labeled == i)
    b[target] = np.count_nonzero(target)
print(b)
Output:
[[4 4 0 2]
 [4 0 0 2]
 [0 4 0 0]]


How to append arrays to another numpy array?

I am trying to loop through a set of coordinates and 'stacking' these arrays of coordinates to another array (so in essence I want to have an array of arrays) using numpy.
This is my attempt:
import numpy as np
all_coordinates = np.array([[]])
for y in range(2):
    for x in range(2):
        coordinate = np.array([[x, y]])
        # append
        all_coordinates = np.append(all_coordinates, [coordinate])
print(all_coordinates)
But it's not working. It's just concatenating the individual numbers and not appending the array.
Instead of giving me (the output that I want to achieve):
[[0 0] [1 0] [0 1] [1 1]]
The output I get instead is:
[0 0 1 0 0 1 1 1]
Why? What I am doing wrong here?
The problem with the stacking functions is that they require the appended row to have the same size as the rows already present. With np.array([[]]), the first row has a length of zero, which means you can only add rows that also have length zero.
To solve this, we need to tell NumPy that the rows have a size of two, not zero. The array thus needs to have shape (0, 2) and not (0, 0). This can be done using one of the array-initializing functions that accept a shape argument, like empty, zeros or ones. Which function does not matter, as there are no values to fill.
Then you can use one of the functions mentioned in comments, like vstack or stack. The code thus becomes:
import numpy as np
all_coordinates = np.zeros((0, 2))
for y in range(2):
    for x in range(2):
        coordinate = np.array([[x, y]])
        # append
        all_coordinates = np.vstack((all_coordinates, coordinate))
print(all_coordinates)
Note that np.vstack copies the whole array on every iteration, so in such a case I would use a list and only convert it into an array once you have appended all the elements you want.
Here is a suggested improvement:
import numpy as np
all_coordinates = []
for y in range(2):
    for x in range(2):
        coordinate = np.array([x, y])
        # append
        all_coordinates.append(coordinate)
all_coordinates = np.array(all_coordinates)
print(all_coordinates)
The output of this code is indeed
array([[0, 0],
       [1, 0],
       [0, 1],
       [1, 1]])

Given a 3D image array, return a list of indices with a value above a threshold and a minimum distance between all selected indices?

I have a 3D numpy array that represents a 3D image, and I want to create from it a list of all the (x, y, z) coordinate/index tuples that are both above a certain value and at least a minimum distance from all other selected coordinates. So if coords (3,4,5) and (3,3,3) were both above the value, but the minimum distance apart was 4, then only one of these coords would be added to the new array (it doesn't matter which).
I thought about doing something like this:
arr = [(x, y, z) for x in range(x_dim) for y in range(y_dim) for z in range(z_dim) if original_arr[z][y][x] > threshold]
to get arr, which contains all coordinates above the threshold. I'm stuck on how to remove all coordinates from arr which are too close to other coordinates also inside it. Checking each coordinate against every other coordinate isn't possible: since the image is very large, it would take too long.
Any ideas? Thanks
You can replace your threshold checking with:
import numpy as np
arr = np.argwhere(original_array > threshold)
The rest depends on your arr size and data type (please provide the image size and dtype to assist better). If the number of points above the threshold is not too high, you can use:
from sklearn.metrics.pairwise import euclidean_distances
euclidean_distances(arr, arr)
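One possible way to apply the distance check on that matrix (a sketch of a greedy rule, not part of the original answer: keep a point only if it is far enough from every point kept so far):
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

arr = np.argwhere(original_array > threshold)
dists = euclidean_distances(arr, arr)
keep = []
for i in range(len(arr)):
    # keep point i only if it is at least `distance` away from all kept points
    if np.all(dists[i, keep] >= distance):
        keep.append(i)
arr = arr[keep]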
If there is a high number of points, you can instead check via loop iteration (I usually try to avoid mutating the loop-variable array inside the loop, but this will save you a lot of memory space and time in the case of a large image):
arr = np.argwhere(original_array > threshold)
for i in range(arr.shape[0]):
    try:
        diff = np.argwhere(np.sum(arr[i+1:, :] - arr[i, :], axis=1) <= distance)
        arr = np.delete(arr, diff + i + 1, axis=0)
    except IndexError as e:
        break
arr will then contain the coordinates you want.
Output for sample code:
original_array = np.arange(40).reshape(10,2,2).astype(np.int32)
threshold = 5
distance = 3
arr:
[[1 1 0]
 [4 1 1]
 [8 1 1]]
distance matrix between final points:
[[0.         3.16227766 7.07106781]
 [3.16227766 0.         4.        ]
 [7.07106781 4.         0.        ]]
EDIT: per the comments, if you want to ignore the distance along the z axis, replace the diff line with:
diff = np.argwhere(np.sum((arr[i+1:, :] - arr[i, :])[:, 0:2], axis=1) <= distance)
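An alternative sketch using scipy.spatial.cKDTree (an assumption of this edit, not part of the original answer; it finds all close pairs without building the full distance matrix):
import numpy as np
from scipy.spatial import cKDTree

arr = np.argwhere(original_array > threshold)
tree = cKDTree(arr)
pairs = tree.query_pairs(r=distance)   # all index pairs closer than `distance`
keep = np.ones(len(arr), dtype=bool)
for i, j in sorted(pairs):
    if keep[i] and keep[j]:
        keep[j] = False                # drop the later point of each close pair
arr = arr[keep]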

Summarize ndarray by 2d array in Python

I want to summarize a 3d array dat using indices contained in a 2d array idx.
Consider the example below. For each margin along dat[:, :, i], I want to compute the median according to some index idx. The desired output (out) is a 2d array, whose rows record the index and columns record the margin. The following code works but is not very efficient. Any suggestions?
import numpy as np
dat = np.arange(12).reshape(2, 2, 3)
idx = np.array([[0, 0], [1, 2]])
out = np.empty((3, 3))
for i in np.unique(idx):
    out[i, :] = np.median(dat[idx == i], axis=0)
print(out)
Output:
[[ 1.5  2.5  3.5]
 [ 6.   7.   8. ]
 [ 9.  10.  11. ]]
To visualize the problem better, I will refer to the 2x2 dimensions of the array as the rows and columns, and to the 3 dimension as depth. I will refer to vectors along the third dimension as "pixels" (pixels have length 3), and to planes along the first two dimensions as "channels".
Your loop accumulates a set of pixels selected by the mask idx == i, and takes the median of each channel within that set. The result is an Nx3 array, where N is the number of distinct indices that you have.
One day, generalized ufuncs will be ubiquitous in numpy, and np.median will be such a function. On that day, you will be able to use reduceat magic [1] to do something like
unq, ind = np.unique(idx, return_inverse=True)
np.median.reduceat(dat.reshape(-1, dat.shape[-1]), np.r_[0, np.where(np.diff(unq[ind]))[0]+1])
[1] See Applying operation to unevenly split portions of numpy array for more info on the specific type of magic.
Since this is not currently possible, you can use scipy.ndimage.median instead. This version allows you to compute medians over a set of labeled areas in an array, which is exactly what you have with idx. This method assumes that your index array contains N densely packed values, all of which are in range(N). Otherwise the reshaping operations will not work properly.
If that is not the case, start by transforming idx:
_, ind = np.unique(idx, return_inverse=True)
idx = ind.reshape(idx.shape)
OR
idx = np.unique(idx, return_inverse=True)[1].reshape(idx.shape)
Since you are actually computing a separate median for each region and channel, you will need to have a set of labels for each channel. Flesh out idx to have a distinct set of indices for each channel:
chan = dat.shape[-1]
offset = idx.max() + 1
index = np.stack([idx + i * offset for i in range(chan)], axis=-1)
Now index has an identical set of regions defined in each channel, which you can use in scipy.ndimage.median:
out = scipy.ndimage.median(dat, index, index=range(offset * chan)).reshape(chan, offset).T
The input labels must be densely packed from zero to offset * chan for index=range(offset * chan) to work properly, and the reshape operation to have the right number of elements. The final transpose is just an artifact of how the labels are arranged.
Here is the complete product:
import numpy as np
from scipy.ndimage import median
dat = np.arange(12).reshape(2, 2, 3)
idx = np.array([[0, 0], [1, 2]])
def summarize(dat, idx):
    idx = np.unique(idx, return_inverse=True)[1].reshape(idx.shape)
    chan = dat.shape[-1]
    offset = idx.max() + 1
    index = np.stack([idx + i * offset for i in range(chan)], axis=-1)
    return median(dat, index, index=range(offset * chan)).reshape(chan, offset).T
print(summarize(dat, idx))
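For the example dat and idx above, this should print the same result as the loop in the question:
[[ 1.5  2.5  3.5]
 [ 6.   7.   8. ]
 [ 9.  10.  11. ]]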

How to find linearly independent rows from a matrix

How to identify the linearly independent rows of a matrix? For instance,
[[0, 1, 0, 0],
 [0, 0, 1, 0],
 [0, 1, 1, 0],
 [1, 0, 0, 1]]
The 4th row is independent.
First, your 3rd row is linearly dependent on the 1st and 2nd rows. However, your 1st and 4th columns are linearly dependent.
Two methods you could use:
Eigenvalue
If one eigenvalue of the matrix is zero, its corresponding eigenvector is linearly dependent. The documentation for eig states that the returned eigenvalues are repeated according to their multiplicity and not necessarily ordered. However, assuming the eigenvalues correspond to your row vectors, one method would be:
import numpy as np
matrix = np.array(
    [
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 1, 0],
        [1, 0, 0, 1]
    ])
lambdas, V = np.linalg.eig(matrix.T)
# The linearly dependent row vectors
print(matrix[lambdas == 0, :])
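Note that an exact floating point comparison to zero can be fragile here; a tolerance-based test (a small assumed tweak, not part of the original answer) is usually safer:
print(matrix[np.isclose(lambdas, 0), :])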
Cauchy-Schwarz inequality
To test linear dependence of vectors and figure out which ones, you could use the Cauchy-Schwarz inequality. Basically, if the inner product of the vectors is equal to the product of the norm of the vectors, the vectors are linearly dependent. Here is an example for the columns:
import numpy as np
matrix = np.array(
    [
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 1, 0],
        [1, 0, 0, 1]
    ])
print(np.linalg.det(matrix))
for i in range(matrix.shape[0]):
    for j in range(matrix.shape[0]):
        if i != j:
            inner_product = np.inner(
                matrix[:, i],
                matrix[:, j]
            )
            norm_i = np.linalg.norm(matrix[:, i])
            norm_j = np.linalg.norm(matrix[:, j])

            print('I: ', matrix[:, i])
            print('J: ', matrix[:, j])
            print('Prod: ', inner_product)
            print('Norm i: ', norm_i)
            print('Norm j: ', norm_j)

            if np.abs(inner_product - norm_j * norm_i) < 1E-5:
                print('Dependent')
            else:
                print('Independent')
Testing the rows is a similar approach.
You could then extend this to test all combinations of vectors, but I imagine this solution scales badly with size.
With sympy you can find the linearly independent rows using sympy.Matrix.rref:
>>> import sympy
>>> import numpy as np
>>> mat = np.array([[0,1,0,0],[0,0,1,0],[0,1,1,0],[1,0,0,1]]) # your matrix
>>> _, inds = sympy.Matrix(mat).T.rref() # to check the rows you need to transpose!
>>> inds
[0, 1, 3]
This basically tells you that rows 0, 1 and 3 are linearly independent, while row 2 isn't (it's a linear combination of rows 0 and 1).
You could then extract these rows with slicing:
>>> mat[inds]
array([[0, 1, 0, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 1]])
This also works well for rectangular matrices (not only for square ones).
I edited the code for the Cauchy-Schwarz inequality so that it scales better with dimension: the inputs are the matrix and its dimension, while the output is a new rectangular matrix which contains along its rows the linearly independent columns of the starting matrix. This works under the assumption that the first column is never null, but it can readily be generalized to handle that case too. Another thing I observed is that 1e-5 seems to be a "sloppy" threshold, since some particular pathological vectors were found to be linearly dependent with it; 1e-4 doesn't give me the same problems. I hope this is of some help: it was pretty difficult for me to find a really working routine to extract linearly independent vectors, so I'm willing to share mine. If you find any bugs, please report them!
from numpy import dot, zeros, absolute
from numpy.linalg import matrix_rank, norm

def find_li_vectors(dim, R):
    r = matrix_rank(R)
    index = zeros(r, dtype=int)  # this will save the positions of the li columns in the matrix
    counter = 0
    index[0] = 0  # without loss of generality we pick the first column as linearly independent
    j = 0         # therefore the second index is simply 0

    for i in range(R.shape[0]):  # loop over the columns
        if i != j:  # if the two columns are not the same
            inner_product = dot(R[:, i], R[:, j])  # compute the scalar product
            norm_i = norm(R[:, i])  # compute norms
            norm_j = norm(R[:, j])

            # inner product and the product of the norms are equal only if the two vectors are parallel
            # therefore we are looking for the ones which exhibit a difference which is bigger than a threshold
            if absolute(inner_product - norm_j * norm_i) > 1e-4:
                counter += 1        # counter is incremented
                index[counter] = i  # index is saved
                j = i               # j is refreshed
                # do not forget to refresh j: otherwise you would only compare against the first column!

    R_independent = zeros((r, dim))
    i = 0
    # now save everything in a new matrix
    while i < r:
        R_independent[i, :] = R[index[i], :]
        i += 1

    return R_independent
I know this was asked a while ago, but here is a very simple (although probably inefficient) solution. Given an array, the following finds a set of linearly independent vectors by progressively adding a vector and testing if the rank has increased:
from numpy.linalg import matrix_rank

def LI_vecs(dim, M):
    LI = [M[0]]
    for i in range(dim):
        tmp = []
        for r in LI:
            tmp.append(r)
        tmp.append(M[i])  # set tmp = LI + [M[i]]
        if matrix_rank(tmp) > len(LI):  # test if M[i] is linearly independent from all (row) vectors in LI
            LI.append(M[i])  # note that matrix_rank does not need to take in a square matrix
    return LI  # return the set of linearly independent (row) vectors

# Example
mat = [[1, 2, 3, 4], [4, 5, 6, 7], [5, 7, 9, 11], [2, 4, 6, 8]]
LI_vecs(4, mat)
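For this example, the third row is the sum of the first two and the fourth is twice the first, so the call should return [[1, 2, 3, 4], [4, 5, 6, 7]].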
I interpret the problem as finding rows that are linearly independent of the other rows.
That is equivalent to finding the rows that are linearly dependent on the other rows.
Gaussian elimination, treating numbers smaller than a threshold as zeros, can do that. It is faster than finding the eigenvalues of a matrix, testing all combinations of rows with the Cauchy-Schwarz inequality, or singular value decomposition.
See:
https://math.stackexchange.com/questions/1297437/using-gauss-elimination-to-check-for-linear-dependence
Problem with floating point numbers:
http://numpy-discussion.10968.n7.nabble.com/Reduced-row-echelon-form-td16486.html
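As a sketch of that elimination idea (an assumption of this edit, not code from the linked discussions): QR factorization with column pivoting performs the elimination in a numerically stable way, and the pivot order exposes a maximal set of independent rows. The threshold tol is an assumed choice:
import numpy as np
from scipy.linalg import qr

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)

# factor A.T so that the pivoted columns correspond to rows of A
Q, R, piv = qr(A.T, pivoting=True)
tol = 1e-10  # assumed threshold for "numerically zero"
rank = int(np.sum(np.abs(np.diag(R)) > tol))
independent_rows = sorted(piv[:rank])  # indices of a maximal independent set of rows
print(independent_rows)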
With regards to the following discussion:
Find dependent rows/columns of a matrix using Matlab?
from sympy import *
A = Matrix([[1,1,1],[2,2,2],[1,7,5]])
print(A.nullspace())
It is obvious that the first and second rows are multiples of each other.
If we execute the above code we get [-1/3, -2/3, 1]. The indices of the zero elements in the null space show independence. But why is the third element here not zero? If we multiply the A matrix with the null space, we get a zero column vector. So what's wrong?
The answer which we are looking for is the null space of the transpose of A.
B = A.T
print(B.nullspace())
Now we get [-2, 1, 0], which shows that the third row is independent.
Two important notes here:
Consider whether we want to check the row dependencies or the column dependencies.
Notice that the null space of a matrix is not equal to the null space of the transpose of that matrix unless the matrix is symmetric.
You can basically find the vectors spanning the column space of the matrix by using the columnspace() method of SymPy's Matrix object. They are automatically the linearly independent columns of the matrix.
import sympy as sp
import numpy as np
M = sp.Matrix([[0, 1, 0, 0],
               [0, 0, 1, 0],
               [1, 0, 0, 1]])

for i in M.columnspace():
    print(np.array(i))
    print()
# The output is following.
# [[0]
# [0]
# [1]]
# [[1]
# [0]
# [0]]
# [[0]
# [1]
# [0]]

A particular way of resizing a matrix

Having an nxn (6x6 in the example below) matrix filled only with 0s and 1s:
old_matrix = [[0,0,0,1,1,0],
              [1,1,1,1,0,0],
              [0,0,1,0,0,0],
              [1,0,0,0,0,1],
              [0,1,1,1,1,0],
              [1,0,0,1,1,0]]
I want to resize it in a particular way: taking 2x2 sub-matrices and checking whether there are more ones or zeros in each. This means the new matrix will be 3x3. If there are more 1s than 0s in a sub-matrix, a 1 will be assigned in the new matrix. Otherwise (if there are fewer or equally many), its new value will be 0.
new_matrix = [[0,1,0],
              [0,0,0],
              [0,1,0]]
I've tried to achieve this by using lots of whiles. However, it doesn't seem to work. Here's what I've got so far:
def convert_track(a):
    # converts original map to a 8x8 tile Track
    NEW_TRACK = []
    w = 0    # matrix width
    h = 0    # matrix height
    t_w = 0  # submatrix width
    t_h = 0  # submatrix height
    BLACK = 0  # number of ones in submatrix
    WHITE = 0  # number of zeros in submatrix
    while h <= 6:
        while w <= 6:
            l = []
            while t_h <= 2 and h <= 6:
                t_w = 0
                while t_w <= 2 and w <= 6:
                    if a[h][w] == 1:
                        BLACK += 1
                    else:
                        WHITE += 1
                    t_w += 1
                    w += 1
                h += 1
                t_h += 1
            t_w = 0
            t_h += 1
            if BLACK <= WHITE:
                l.append(0)
            else:
                l.append(1)
            BLACK = 0
            WHITE = 0
            t_h = 0
        NEW_TRACK.append(l)
    return NEW_TRACK
This raises the error "list index out of range", or returns the list
[[0]]
Is there an easier way to achieve this? What am I doing wrong?
If you are willing/able to use NumPy you can do something like this. If you're working with anything like the data you've shown it's well worth your time to learn as operations like these can be done very efficiently and with very little code.
import numpy as np
from scipy.signal import convolve2d
old_matrix = [[0,0,0,1,1,0],
              [1,1,1,1,0,0],
              [0,0,1,0,0,0],
              [1,0,0,0,0,1],
              [0,1,1,1,1,0],
              [1,0,0,1,1,0]]
a = np.array(old_matrix)
k = np.ones((2,2))
# compute sums at each submatrix
local_sums = convolve2d(a, k, mode='valid')
# restrict to sums corresponding to non-overlapping
# sub-matrices with local_sums[::2, ::2] and check if
# there are more 1 than 0 elements
result = local_sums[::2, ::2] > 2
# convert back to a Python list if needed
new_matrix = result.astype(int).tolist()
Result:
>>> result.astype(int).tolist()
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]
Here I've used convolve2d to compute the sums at each submatrix. From what I can tell you are only interested in non-overlapping sub-matrices, so the part local_sums[::2, ::2] chops out only the sums corresponding to those.
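An equivalent loop-free sketch without scipy (an addition of this edit, assuming the matrix side length is even): reshape into 2x2 blocks and sum within each block.
import numpy as np

a = np.array(old_matrix)
# axes: (block row, row within block, block col, col within block)
blocks = a.reshape(3, 2, 3, 2)
block_sums = blocks.sum(axis=(1, 3))  # number of ones in each 2x2 block
new_matrix = (block_sums > 2).astype(int).tolist()
print(new_matrix)  # [[0, 1, 0], [0, 0, 0], [0, 1, 0]]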
