Is there a way, using Python NumPy operations, to produce the following result?
Input 1D array:
[3, 0, 0, 2, 2, 1]
Output 2D array:
3 0 0 2 2 1
0 0 0 2 2 1
0 0 0 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
1 1 1 1 1 1
In addition to jeromie’s brilliant answer, here is support for unordered arrays:
# arr is the input 1D array, e.g. np.array([3, 0, 0, 2, 2, 1])
indexes = np.arange(len(arr))
# idx[i, j] = max(i, j), so arr[idx][i, j] == arr[max(i, j)]
idx = np.maximum(indexes[None, :], indexes[:, None])
arr[idx]
Assuming the input array always contains increasing values, you can use np.maximum(arr[None,:], arr[:, None]). This computes the maximum of arr[i] and arr[j] for every location (i, j) of the output array, thanks to NumPy broadcasting. If the input does not always contain increasing values, then the output needs to be better defined.
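For example, with a sorted input:
import numpy as np

arr = np.array([0, 0, 0, 1, 1, 2])  # increasing values
np.maximum(arr[None, :], arr[:, None])
# array([[0, 0, 0, 1, 1, 2],
#        [0, 0, 0, 1, 1, 2],
#        [0, 0, 0, 1, 1, 2],
#        [1, 1, 1, 1, 1, 2],
#        [1, 1, 1, 1, 1, 2],
#        [2, 2, 2, 2, 2, 2]])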
Adir's solution is brilliant and should be accepted.
EDIT: original answer --
It seems that you want to essentially make a matrix by "rotating" a vector like a wiper blade from the top left corner. This produces that pattern:
def wiper_blade_matrix(x):
    n = len(x)
    z = np.zeros((n, n), x.dtype)
    # Walk every diagonal from bottom-left (k = -(n - 1)) to top-right
    # (k = n - 1); diagonal k receives the tail x[abs(k):], which puts
    # x[max(i, j)] at position (i, j).
    for k in range(-(n - 1), n):
        z[np.where(np.eye(n, k=k))] += x[abs(k):]
    return z
Usage:
In [4]: wiper_blade_matrix(np.array([0, 0, 0, 1, 1, 2]))
Out[4]:
array([[0, 0, 0, 1, 1, 2],
[0, 0, 0, 1, 1, 2],
[0, 0, 0, 1, 1, 2],
[1, 1, 1, 1, 1, 2],
[1, 1, 1, 1, 1, 2],
[2, 2, 2, 2, 2, 2]])
In [5]: wiper_blade_matrix(np.array([0, 0, 0, 5, 1, 2]))
Out[5]:
array([[0, 0, 0, 5, 1, 2],
[0, 0, 0, 5, 1, 2],
[0, 0, 0, 5, 1, 2],
[5, 5, 5, 5, 1, 2],
[1, 1, 1, 1, 1, 2],
[2, 2, 2, 2, 2, 2]])
In [6]: wiper_blade_matrix(np.array([3, 0, 0, 2, 2, 1]))
Out[6]:
array([[3, 0, 0, 2, 2, 1],
[0, 0, 0, 2, 2, 1],
[0, 0, 0, 2, 2, 1],
[2, 2, 2, 2, 2, 1],
[2, 2, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 1]])
This question is very similar to this question, but I am not sure how to apply the answer to 2D or 3D arrays.
For a simple example, using the following 2 dimensional array of shape (5,5):
In [158]: a
Out[158]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]])
I want to get the indices of the edges. For this case:
(array([1, 1, 1, 2, 2, 3, 3, 3]), array([1, 2, 3, 1, 3, 1, 2, 3]))
Right now I shift the array in both directions/axes, compare to the original array and identify the cells that have different values:
In [230]: np.nonzero(a != np.roll(a, shift=(1, 1), axis=(0, 1)))
Out[230]: (array([1, 1, 1, 2, 2, 3, 3, 4, 4, 4]), array([1, 2, 3, 1, 4, 1, 4, 2, 3, 4]))
Some of the indices are correct but others are not. I guess the 4s should become 3s because of the shifts I applied, but I am not sure how to correct this, since I plan to apply it to much more complicated (and bigger) mask arrays. My final goal is to apply this to 3D arrays.
I am using Python 3.7.1
You can convolve your array with an edge detection filter:
import numpy as np
from scipy.ndimage import convolve
x = np.array(
[[0, 0, 0, 0, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]])
fil = [[-1,-1,-1],
[-1, 8,-1],
[-1,-1,-1]]
np.where(convolve(x, fil, mode='constant') > 1)
Out:
(array([1, 1, 1, 2, 2, 3, 3, 3]), array([1, 2, 3, 1, 3, 1, 2, 3]))
The result of the convolution
convolve(x, fil, mode='constant')
Out:
[[-1 -2 -3 -2 -1]
[-2 5 3 5 -2]
[-3 3 0 3 -3]
[-2 5 3 5 -2]
[-1 -2 -3 -2 -1]]
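Since the final goal is 3D, the same filter idea generalizes to any dimensionality by building a kernel of -1s whose centre weight equals the number of neighbours (8 in 2D, 26 in 3D). A sketch, assuming a 0/1 integer mask (edge_indices is an illustrative name, not an existing API):
import numpy as np
from scipy.ndimage import convolve

def edge_indices(mask):
    # All -1s, with a centre weight that cancels a fully set neighbourhood:
    # interior cells convolve to 0, edge cells to >= 1, background to <= 0.
    fil = -np.ones((3,) * mask.ndim, dtype=int)
    fil[(1,) * mask.ndim] = 3 ** mask.ndim - 1  # 8 in 2D, 26 in 3D
    conv = convolve(mask, fil, mode='constant')
    # > 0 rather than > 1: a cell with all but one neighbour set convolves
    # to exactly 1 and is still an edge cell.
    return np.nonzero(conv > 0)
On the 2D example above this returns the same index pair as the > 1 test.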
Another, similar post, Flood Fill in Python, is a very general question on flood fill, and the answer only contains a broad pseudocode example. I'm looking for an explicit solution with numpy or scipy.
Let's take this array for example:
a = np.array([
[0, 1, 1, 1, 1, 0],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]
])
For selecting element 0, 0 and flood fill with value 3, I'd expect:
[
[3, 1, 1, 1, 1, 0],
[3, 3, 1, 2, 1, 1],
[3, 1, 1, 1, 1, 0]
]
For selecting element 0, 1 and flood fill with value 3, I'd expect:
[
[0, 3, 3, 3, 3, 0],
[0, 0, 3, 2, 3, 3],
[0, 3, 3, 3, 3, 0]
]
For selecting element 0, 5 and flood fill with value 3, I'd expect:
[
[0, 1, 1, 1, 1, 3],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]
]
This should be a fairly basic operation, no? Which numpy or scipy method am I overlooking?
Approach #1
Module scikit-image offers a built-in for this, skimage.morphology.flood_fill -
from skimage.morphology import flood_fill
flood_fill(image, (x, y), newval)
Sample runs -
In [17]: a
Out[17]:
array([[0, 1, 1, 1, 1, 0],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]])
In [18]: flood_fill(a, (0, 0), 3)
Out[18]:
array([[3, 1, 1, 1, 1, 0],
[3, 3, 1, 2, 1, 1],
[3, 1, 1, 1, 1, 0]])
In [19]: flood_fill(a, (0, 1), 3)
Out[19]:
array([[0, 3, 3, 3, 3, 0],
[0, 0, 3, 2, 3, 3],
[0, 3, 3, 3, 3, 0]])
In [20]: flood_fill(a, (0, 5), 3)
Out[20]:
array([[0, 1, 1, 1, 1, 3],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]])
Approach #2
We can use skimage.measure.label with some array-masking -
from skimage.measure import label
def floodfill_by_xy(a, xy, newval):
    x, y = xy
    l = label(a == a[x, y])
    a[l == l[x, y]] = newval
    return a
To use the SciPy-based label function - scipy.ndimage.label - it is mostly the same -
from scipy.ndimage import label
def floodfill_by_xy_scipy(a, xy, newval):
    x, y = xy
    l = label(a == a[x, y])[0]
    a[l == l[x, y]] = newval
    return a
Note: these functions edit the input array in place.
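A quick sanity check of floodfill_by_xy on the question's example (pass in a copy, since the helpers edit in place):
a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 2, 1, 1],
              [0, 1, 1, 1, 1, 0]])

floodfill_by_xy(a.copy(), (0, 1), 3)
# array([[0, 3, 3, 3, 3, 0],
#        [0, 0, 3, 2, 3, 3],
#        [0, 3, 3, 3, 3, 0]])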
In NumPy, is it possible to take the difference between these 2 arrays:
[[0 0 0 0 1 1 1 1 2 2 2 2]
[0 1 2 3 0 1 2 3 0 1 2 3]]
[[0 0 0 0 1 1 1 2 2 2]
[0 1 2 3 0 2 3 0 1 2]]
to have this result
[[1 2]
[1 3]]
?
This is one way. You can also use numpy.unique for a similar solution (easier in v1.13+, see Find unique rows in numpy.array), but if performance is not an issue you can use set.
import numpy as np
A = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]])
B = np.array([[0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 1, 2, 3, 0, 2, 3, 0, 1, 2]])
res = np.array(list(set(map(tuple, A.T)) - set(map(tuple, B.T)))).T
Out:
array([[2, 1],
       [3, 1]])
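A pure-NumPy alternative (a sketch, with A and B as above; it broadcasts every column of A against every column of B, so memory grows with the product of their widths, but it preserves the original column order of A):
# True where column i of A also appears somewhere in B
mask = (A.T[:, None, :] == B.T[None, :, :]).all(-1).any(-1)
A.T[~mask].T
# array([[1, 2],
#        [1, 3]])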
We can view each 2D array as a collection of 1D columns and compare them, in the spirit of numpy.setdiff1d.
What about:
a = [[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]]
b = [[0, 0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 3, 0, 2, 3, 0, 1, 2]]
a = np.array(a).T
b = np.array(b).T
A = [tuple(t) for t in a]
B = [tuple(t) for t in b]
set(A) - set(B)
Out: {(1, 1), (2, 3)}
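If the 2xK array form from the question is wanted, the set can be converted back (sorted is used because set iteration order is arbitrary):
np.array(sorted(set(A) - set(B))).T
# array([[1, 2],
#        [1, 3]])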
Given a matrix in Python NumPy which has, for some of its rows, leading zeros, I need to shift all zeros to the end of each row.
E.g.
0 2 3 4
0 0 1 5
2 3 1 1
should be transformed to
2 3 4 0
1 5 0 0
2 3 1 1
Is there any nice way to do this in python numpy?
To fix rows that have leading zeros -
def fix_leading_zeros(a):
    mask = a != 0
    # Reversing the mask per row moves its block of True values to the
    # front, which is where the nonzeros should land.
    flipped_mask = mask[:, ::-1]
    a[flipped_mask] = a[mask]
    a[~flipped_mask] = 0
    return a
To push all zeros in each row to the back -
def push_all_zeros_back(a):
    # Based on http://stackoverflow.com/a/42859463/3293881
    valid_mask = a != 0
    # Compare each row's count of nonzeros against a reversed arange to
    # mark the last `count` positions, then flip so the first `count`
    # positions are True.
    flipped_mask = valid_mask.sum(1, keepdims=1) > np.arange(a.shape[1] - 1, -1, -1)
    flipped_mask = flipped_mask[:, ::-1]
    a[flipped_mask] = a[valid_mask]
    a[~flipped_mask] = 0
    return a
Sample runs -
In [220]: a
Out[220]:
array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
In [221]: fix_leading_zeros(a)
Out[221]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
In [266]: a
Out[266]:
array([[0, 2, 3, 4, 0],
[0, 0, 1, 5, 6],
[2, 3, 0, 1, 0]])
In [267]: push_all_zeros_back(a)
Out[267]:
array([[2, 3, 4, 0, 0],
[1, 5, 6, 0, 0],
[2, 3, 1, 0, 0]])
Leading zeros, simple loop:
ar = np.array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
for i in range(ar.shape[0]):
    for j in range(ar.shape[1]):  # bounded loop prevents spinning forever on an all-zero row
        if ar[i, 0] == 0:
            ar[i] = np.roll(ar[i], -1)
ar
Out[31]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
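A compact vectorized alternative for pushing zeros to the back (a sketch, needing NumPy 1.15+ for kind='stable' and np.take_along_axis; the stable sort preserves the relative order of the nonzeros in each row):
import numpy as np

a = np.array([[0, 2, 3, 4],
              [0, 0, 1, 5],
              [2, 3, 1, 1]])

# Stable sort on the zero mask: False (nonzero) entries come first, in order.
idx = np.argsort(a == 0, axis=1, kind='stable')
np.take_along_axis(a, idx, axis=1)
# array([[2, 3, 4, 0],
#        [1, 5, 0, 0],
#        [2, 3, 1, 1]])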
What is the most pythonic equivalent of MATLAB's dummyvar function for dealing with categorical variables?
Here is an example illustrating my problem, with an MxN matrix whose M rows each partition the same N data points into <=N categories.
>> partitions
array([[1, 1, 2, 2, 1, 2, 2, 2, 1, 1],
[1, 2, 2, 1, 2, 1, 2, 2, 2, 1],
[1, 1, 1, 2, 2, 2, 1, 3, 3, 2]])
The task is to efficiently count the number of times that any two data points are classified into the same category and store the result in an NxN matrix. In MATLAB this can be accomplished as a one-liner with dummyvar, which creates a column variable for each category of each partition.
>> dummyvar(partitions)*dummyvar(partitions)'
ans =
3 2 1 1 1 1 1 0 1 2
2 3 2 0 2 0 2 1 2 1
1 2 3 1 1 1 3 2 1 0
1 0 1 3 1 3 1 1 0 2
1 2 1 1 3 1 1 1 2 2
1 0 1 3 1 3 1 1 0 2
1 2 3 1 1 1 3 2 1 0
0 1 2 1 1 1 2 3 2 0
1 2 1 0 2 0 1 2 3 1
2 1 0 2 2 2 0 0 1 3
The most efficient way that I can think of to solve this task is writing an O(n*m) loop that emulates dummyvar's behavior. (Note that the code below prefers partitions.shape[0] << partitions.shape[1], which is likely to be true in general but is unsafe to assume.)
dv = np.zeros((0, 10))
for row in partitions:
    for val in range(1, np.max(row) + 1):
        dv = np.vstack((dv, row == val))
np.dot(dv.T, dv)
And of course, because vstack in a loop is very inefficient, this can be improved by computing the required size up front and preallocating the array (as sketched below), but I am really looking for a one-liner to do it just as in MATLAB.
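For reference, the preallocated version of the loop could look like this (a sketch; num_dummies is an illustrative name, not from the original code):
# Preallocate dv instead of growing it with vstack.
num_dummies = int(np.sum(np.max(partitions, axis=1)))  # total categories over all rows
dv = np.zeros((num_dummies, partitions.shape[1]))
r = 0
for row in partitions:
    for val in range(1, np.max(row) + 1):
        dv[r] = (row == val)
        r += 1
np.dot(dv.T, dv)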
Edit: Some more information about what I am doing, just for added context. I am writing library functions in Python (where no Python implementation exists) for a library for analyzing brain networks. Existing working MATLAB source is available. Due to domain-specific constraints, the roughly maximal size of the input is networks of a few thousand nodes. However, basically all of the functions I write have to scale well to large inputs.
You can do a little broadcasting magic to get your dummy arrays fast:
>>> partitions = np.array([[1, 1, 2, 2, 1, 2, 2, 2, 1, 1],
... [1, 2, 2, 1, 2, 1, 2, 2, 2, 1],
... [1, 1, 1, 2, 2, 2, 1, 3, 3, 2]])
>>> n = np.max(partitions)
>>> d = (partitions.T[:, None, :] == np.arange(1, n+1)[:, None]).astype(int)
>>> d = d.reshape(partitions.shape[1], -1)
>>> d.dot(d.T)
array([[3, 2, 1, 1, 1, 1, 1, 0, 1, 2],
[2, 3, 2, 0, 2, 0, 2, 1, 2, 1],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 1, 1, 3, 1, 1, 1, 2, 2],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[0, 1, 2, 1, 1, 1, 2, 3, 2, 0],
[1, 2, 1, 0, 2, 0, 1, 2, 3, 1],
[2, 1, 0, 2, 2, 2, 0, 0, 1, 3]])
There is the obvious drawback that, even if a row has only a few different values, the dummy array we are creating will have as many columns for that row as needed for the row with the most values. But unless you have huge arrays, it is probably going to be faster than any other approach.
Well, if you are after a scalable solution, you want to use a sparse array for your dummy matrix. The following code may be hard to follow if you are not familiar with the details of the CSR sparse format:
import scipy.sparse as sps
def sparse_dummyvar(partitions):
    num_rows = np.sum(np.max(partitions, axis=1))
    nnz = np.prod(partitions.shape)
    as_part = np.argsort(partitions, axis=1)
    # You could get s_part from the indices in as_part, left as
    # an exercise for the reader...
    s_part = np.sort(partitions, axis=1)
    # A True marks the start of each run of equal values in every sorted row.
    mask = np.hstack(([[True]] * len(partitions),
                      s_part[:, :-1] != s_part[:, 1:]))
    indptr = np.where(mask.ravel())[0]
    indptr = np.append(indptr, nnz)
    return sps.csr_matrix((np.repeat([1], nnz), as_part.ravel(), indptr),
                          shape=(num_rows, partitions.shape[1]))
This returns the transpose of dummyvar(partitions). You could get the array without transposing simply by calling csc_matrix instead of csr_matrix and swapping the shape values. But since you are only after the product of the matrix with its transpose, and scipy converts everything to CSR format before multiplying, it is probably slightly faster this way. You can now do:
>>> dT = sparse_dummyvar(partitions)
>>> dT.T.dot(dT)
<10x10 sparse matrix of type '<type 'numpy.int32'>'
with 84 stored elements in Compressed Sparse Column format>
>>> dT.T.dot(dT).A
array([[3, 2, 1, 1, 1, 1, 1, 0, 1, 2],
[2, 3, 2, 0, 2, 0, 2, 1, 2, 1],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 1, 1, 3, 1, 1, 1, 2, 2],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[0, 1, 2, 1, 1, 1, 2, 3, 2, 0],
[1, 2, 1, 0, 2, 0, 1, 2, 3, 1],
[2, 1, 0, 2, 2, 2, 0, 0, 1, 3]])