Is there a way, using NumPy operations, to produce the following result?
Input 1d array :
[3, 0, 0, 2, 2, 1]
Output 2d array :
3 0 0 2 2 1
0 0 0 2 2 1
0 0 0 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
1 1 1 1 1 1
In addition to jeromie's brilliant answer, here is support for unordered arrays:
indexes = np.arange(len(arr))
idx = np.maximum(indexes[None,:], indexes[:, None])
arr[idx]
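Put together with the question's input, this index-based trick reproduces the expected output; a self-contained sketch:

```python
import numpy as np

arr = np.array([3, 0, 0, 2, 2, 1])  # the question's unordered input

indexes = np.arange(len(arr))
# idx[i, j] = max(i, j), so arr[idx] fills cell (i, j) with arr[max(i, j)]
idx = np.maximum(indexes[None, :], indexes[:, None])
out = arr[idx]
print(out)
```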
Assuming the input array always contains increasing values, you can use np.maximum(arr[None,:], arr[:, None]). This computes the maximum of arr[i] and arr[j] for each location (i, j) of the output array, thanks to NumPy broadcasting. If the input does not always contain increasing values, then the output needs to be better defined.
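A runnable sketch of that broadcast, using an increasing input as the answer assumes:

```python
import numpy as np

arr = np.array([0, 0, 0, 1, 1, 2])  # monotonically increasing

# arr[None, :] is a 1xN row, arr[:, None] an Nx1 column; broadcasting
# compares them pairwise, so cell (i, j) holds max(arr[i], arr[j])
out = np.maximum(arr[None, :], arr[:, None])
print(out)
```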
Adir's solution is brilliant and should be accepted.
EDIT: original answer --
It seems that you want to essentially make a matrix by "rotating" a vector like a wiper blade from the top left corner. This produces that pattern:
def wiper_blade_matrix(x):
    n = len(x)
    z = np.zeros((n, n), x.dtype)
    for k in range(-(n - 1), n):
        z[np.where(np.eye(n, k=k))] += x[abs(k):]
    return z
Usage:
In [4]: wiper_blade_matrix(np.array([0, 0, 0, 1, 1, 2]))
Out[4]:
array([[0, 0, 0, 1, 1, 2],
[0, 0, 0, 1, 1, 2],
[0, 0, 0, 1, 1, 2],
[1, 1, 1, 1, 1, 2],
[1, 1, 1, 1, 1, 2],
[2, 2, 2, 2, 2, 2]])
In [5]: wiper_blade_matrix(np.array([0, 0, 0, 5, 1, 2]))
Out[5]:
array([[0, 0, 0, 5, 1, 2],
[0, 0, 0, 5, 1, 2],
[0, 0, 0, 5, 1, 2],
[5, 5, 5, 5, 1, 2],
[1, 1, 1, 1, 1, 2],
[2, 2, 2, 2, 2, 2]])
In [6]: wiper_blade_matrix(np.array([3, 0, 0, 2, 2, 1]))
Out[6]:
array([[3, 0, 0, 2, 2, 1],
[0, 0, 0, 2, 2, 1],
[0, 0, 0, 2, 2, 1],
[2, 2, 2, 2, 2, 1],
[2, 2, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 1]])
My dataframe is:
X=[0,1,2
1,0,3
2,3,0]
X's shape is 3×3.
For every value, I want to repeat it n times along every row and every column, that is, transform my dataframe to shape (3n)×(3n).
if n=2, my ideal result is:
X=[0,0,1,1,2,2
0,0,1,1,2,2
1,1,0,0,3,3
1,1,0,0,3,3
2,2,3,3,0,0
2,2,3,3,0,0]
How can I do that? Thanks!
One way using pandas.Index.repeat:
ind = df.index.repeat(2)
new_df = df.iloc[ind, ind]
print(new_df)
Output:
0 0 1 1 2 2
0 0 0 1 1 2 2
0 0 0 1 1 2 2
1 1 1 0 0 3 3
1 1 1 0 0 3 3
2 2 2 3 3 0 0
2 2 2 3 3 0 0
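For completeness, a self-contained version (assuming df is a DataFrame built from the question's 3×3 matrix):

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2],
                   [1, 0, 3],
                   [2, 3, 0]])

# Repeat each positional index twice, then select those rows and columns
ind = df.index.repeat(2)
new_df = df.iloc[ind, ind]
print(new_df)
```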
You could use numpy.repeat, as follows:
import numpy as np
X = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]])
res = X.repeat(2, axis=1).repeat(2, axis=0)
print(res)
Output
[[0 0 1 1 2 2]
[0 0 1 1 2 2]
[1 1 0 0 3 3]
[1 1 0 0 3 3]
[2 2 3 3 0 0]
[2 2 3 3 0 0]]
A base python solution (without imports) would be a nested list comprehension:
>>> [[y for y in x for _ in range(3)] for x in X for _ in range(3)]
[[0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 2, 2, 2],
[1, 1, 1, 0, 0, 0, 3, 3, 3],
[1, 1, 1, 0, 0, 0, 3, 3, 3],
[1, 1, 1, 0, 0, 0, 3, 3, 3],
[2, 2, 2, 3, 3, 3, 0, 0, 0],
[2, 2, 2, 3, 3, 3, 0, 0, 0],
[2, 2, 2, 3, 3, 3, 0, 0, 0]]
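With n=2, the factor used in the question, the same comprehension gives the 6×6 result:

```python
X = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]

n = 2
# Repeat each value n times within a row, and each row n times
result = [[y for y in x for _ in range(n)] for x in X for _ in range(n)]
```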
I have a requirement where I want to convert a 2D matrix to 3D by separating 3 unique values across 3 dimensions.
For Example:
convert
A = [1 2 3 3
1 1 2 1
3 2 2 3
1 3 3 2]
to
A = [[1 0 0 0
1 1 0 1
0 0 0 0
1 0 0 0]
[0 1 0 0
0 0 1 0
0 1 1 0
0 0 0 1]
[0 0 1 1
0 0 0 0
1 0 0 1
0 1 1 0]]
Pardon me if the syntax of matrix representation is not correct.
Use broadcasting with outer-equality for a vectorized solution -
# Input array
In [8]: A
Out[8]:
array([[1, 2, 3, 3],
[1, 1, 2, 1],
[3, 2, 2, 3],
[1, 3, 3, 2]])
In [11]: np.equal.outer(np.unique(A),A).view('i1')
Out[11]:
array([[[1, 0, 0, 0],
[1, 1, 0, 1],
[0, 0, 0, 0],
[1, 0, 0, 0]],
[[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 1, 1, 0],
[0, 0, 0, 1]],
[[0, 0, 1, 1],
[0, 0, 0, 0],
[1, 0, 0, 1],
[0, 1, 1, 0]]], dtype=int8)
To use the explicit dimension-extension + comparison, it would be :
(A == np.unique(A)[:,None,None]).view('i1')
You can use np.unique, taking advantage of boolean arrays and casting them to int using numpy.ndarray.astype.
import numpy as np
a=np.array([[1, 2, 3, 3], [1, 1, 2, 1], [3, 2, 2, 3], [1, 3, 3, 2]])
[(a == i).astype(int) for i in np.unique(a)]
Output:
[array([[1, 0, 0, 0],
[1, 1, 0, 1],
[0, 0, 0, 0],
[1, 0, 0, 0]]),
array([[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 1, 1, 0],
[0, 0, 0, 1]]),
array([[0, 0, 1, 1],
[0, 0, 0, 0],
[1, 0, 0, 1],
[0, 1, 1, 0]])]
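If a single 3-D array is preferred over a list of 2-D arrays, the masks can be stacked; a small sketch building on the comprehension above:

```python
import numpy as np

a = np.array([[1, 2, 3, 3], [1, 1, 2, 1], [3, 2, 2, 3], [1, 3, 3, 2]])

# One (N, M) integer mask per unique value, stacked along a new axis
b = np.stack([(a == i).astype(int) for i in np.unique(a)])
```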
EDIT: Ch3steR's answer is better
A = np.array([[1,2,3,3], [1,1,2,1], [3,2,2,3], [1,3,3,2]])
unique_values = np.unique(A)
B = np.array([np.zeros_like(A) for i in range(len(unique_values))])
for idx, value in enumerate(unique_values):
    B[idx][A == value] = 1
Given a numpy matrix which has leading zeros in some of its rows, I need to shift the zeros to the end of each such row.
E.g.
0 2 3 4
0 0 1 5
2 3 1 1
should be transformed to
2 3 4 0
1 5 0 0
2 3 1 1
Is there any nice way to do this in python numpy?
To fix for leading zeros rows -
def fix_leading_zeros(a):
    mask = a != 0
    flipped_mask = mask[:, ::-1]
    a[flipped_mask] = a[mask]
    a[~flipped_mask] = 0
    return a
To push all zeros rows to the back -
def push_all_zeros_back(a):
    # Based on http://stackoverflow.com/a/42859463/3293881
    valid_mask = a != 0
    flipped_mask = valid_mask.sum(1, keepdims=1) > np.arange(a.shape[1]-1, -1, -1)
    flipped_mask = flipped_mask[:, ::-1]
    a[flipped_mask] = a[valid_mask]
    a[~flipped_mask] = 0
    return a
Sample runs -
In [220]: a
Out[220]:
array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
In [221]: fix_leading_zeros(a)
Out[221]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
In [266]: a
Out[266]:
array([[0, 2, 3, 4, 0],
[0, 0, 1, 5, 6],
[2, 3, 0, 1, 0]])
In [267]: push_all_zeros_back(a)
Out[267]:
array([[2, 3, 4, 0, 0],
[1, 5, 6, 0, 0],
[2, 3, 1, 0, 0]])
Leading zeros, with a simple loop:
ar = np.array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
for i in range(ar.shape[0]):
    for j in range(ar.shape[1]):  # prevent infinite loop if row is all zero
        if ar[i, 0] == 0:
            ar[i] = np.roll(ar[i], -1)
ar
Out[31]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
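The inner loop can also be dropped by computing each row's leading-zero count up front and rolling once per row; a sketch of that variant:

```python
import numpy as np

ar = np.array([[0, 2, 3, 4],
               [0, 0, 1, 5],
               [2, 3, 1, 1]])

# Index of the first nonzero in each row; argmax over the boolean mask
# returns 0 for an all-zero row, which leaves such rows untouched
shifts = (ar != 0).argmax(axis=1)
out = np.array([np.roll(row, -s) for row, s in zip(ar, shifts)])
```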
What is the most pythonic equivalent for matlab's dummyvar function in order to deal with category variables nicely?
Here is an example illustrating my problem, with an M×N matrix denoting M different ways of partitioning N data points into <=N categories.
>> partitions
array([[1, 1, 2, 2, 1, 2, 2, 2, 1, 1],
[1, 2, 2, 1, 2, 1, 2, 2, 2, 1],
[1, 1, 1, 2, 2, 2, 1, 3, 3, 2]])
The task is to efficiently count the number of times that any two data points are classified into the same category and store the result in a NxN matrix. In matlab this could be accomplished as a one-liner with dummyvar which creates a column variable for each category for each partition.
>> dummyvar(partitions)*dummyvar(partitions)'
ans =
3 2 1 1 1 1 1 0 1 2
2 3 2 0 2 0 2 1 2 1
1 2 3 1 1 1 3 2 1 0
1 0 1 3 1 3 1 1 0 2
1 2 1 1 3 1 1 1 2 2
1 0 1 3 1 3 1 1 0 2
1 2 3 1 1 1 3 2 1 0
0 1 2 1 1 1 2 3 2 0
1 2 1 0 2 0 1 2 3 1
2 1 0 2 2 2 0 0 1 3
The most efficient way that I can think of to solve this task is writing an O(n*m) loop that emulates dummyvar's behavior. (Note that the code below prefers partitions.shape[0] << partitions.shape[1], which is likely to be true in general but is unsafe to assume.)
dv = np.zeros((0, 10))  # 10 columns = number of data points, hard-coded here
for row in partitions:
    for val in range(1, np.max(row) + 1):
        dv = np.vstack((dv, row == val))
np.dot(dv.T, dv)
And of course because vstack in a loop is very inefficient this can be improved by finding the desired size and creating the array to start out with, but I am really looking for a one liner to do it just as in matlab.
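Following that note, a sketch of the same loop with the dummy array preallocated (sizes derived from partitions rather than hard-coded):

```python
import numpy as np

partitions = np.array([[1, 1, 2, 2, 1, 2, 2, 2, 1, 1],
                       [1, 2, 2, 1, 2, 1, 2, 2, 2, 1],
                       [1, 1, 1, 2, 2, 2, 1, 3, 3, 2]])

# Total dummy rows: one per (partition, category) pair
num_rows = int(partitions.max(axis=1).sum())
dv = np.zeros((num_rows, partitions.shape[1]))

r = 0
for row in partitions:
    for val in range(1, row.max() + 1):
        dv[r] = row == val
        r += 1

# NxN co-classification counts
counts = np.dot(dv.T, dv)
```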
Edit: Some more information about what I am doing, just for added context. I am writing library functions in Python (where no Python implementation exists) for a library for analyzing brain networks. Existing working Matlab source is available. Due to domain-specific constraints the roughly maximal size of the input is networks of a few thousand nodes. However, basically all of the functions I write have to scale well to large inputs.
You can do a little broadcasting magic to get your dummy arrays fast:
>>> partitions = np.array([[1, 1, 2, 2, 1, 2, 2, 2, 1, 1],
... [1, 2, 2, 1, 2, 1, 2, 2, 2, 1],
... [1, 1, 1, 2, 2, 2, 1, 3, 3, 2]])
>>> n = np.max(partitions)
>>> d = (partitions.T[:, None, :] == np.arange(1, n+1)[:, None]).astype(int)
>>> d = d.reshape(partitions.shape[1], -1)
>>> d.dot(d.T)
array([[3, 2, 1, 1, 1, 1, 1, 0, 1, 2],
[2, 3, 2, 0, 2, 0, 2, 1, 2, 1],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 1, 1, 3, 1, 1, 1, 2, 2],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[0, 1, 2, 1, 1, 1, 2, 3, 2, 0],
[1, 2, 1, 0, 2, 0, 1, 2, 3, 1],
[2, 1, 0, 2, 2, 2, 0, 0, 1, 3]])
There is the obvious drawback that, even if a row has only a few different values, the dummy array we are creating will have as many columns for that row as needed for the row with the most values. But unless you have huge arrays, it is probably going to be faster than any other approach.
Well, if you are after a scalable solution, you want to use a sparse array for your dummy matrix. The following code may be hard to follow if you are not familiar with the details of the CSR sparse format:
import scipy.sparse as sps
def sparse_dummyvar(partitions):
    num_rows = np.sum(np.max(partitions, axis=1))
    nnz = np.prod(partitions.shape)
    as_part = np.argsort(partitions, axis=1)
    # You could get s_part from the indices in as_part, left as
    # an exercise for the reader...
    s_part = np.sort(partitions, axis=1)
    mask = np.hstack(([[True]] * len(partitions),
                      s_part[:, :-1] != s_part[:, 1:]))
    indptr = np.where(mask.ravel())[0]
    indptr = np.append(indptr, nnz)
    return sps.csr_matrix((np.repeat([1], nnz), as_part.ravel(), indptr),
                          shape=(num_rows, partitions.shape[1]))
This returns the transpose of dummyvar(partitions). You could get the array without transposing simply by calling csc_matrix instead of csr_matrix and swapping the shape values. But since you only are after the product of the matrix with its transpose, and scipy converts everything to CSR format before multiplying, it is probably slightly faster like this. You can now do:
>>> dT = sparse_dummyvar(partitions)
>>> dT.T.dot(dT)
<10x10 sparse matrix of type '<type 'numpy.int32'>'
with 84 stored elements in Compressed Sparse Column format>
>>> dT.T.dot(dT).A
array([[3, 2, 1, 1, 1, 1, 1, 0, 1, 2],
[2, 3, 2, 0, 2, 0, 2, 1, 2, 1],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 1, 1, 3, 1, 1, 1, 2, 2],
[1, 0, 1, 3, 1, 3, 1, 1, 0, 2],
[1, 2, 3, 1, 1, 1, 3, 2, 1, 0],
[0, 1, 2, 1, 1, 1, 2, 3, 2, 0],
[1, 2, 1, 0, 2, 0, 1, 2, 3, 1],
[2, 1, 0, 2, 2, 2, 0, 0, 1, 3]])