Numpy remove duplicate columns with values greater than 0

I have the following array:
array([[ 0,  0,  0,  0,  0,  3],
       [ 4,  4,  0,  0,  0,  0],
       [ 0,  0,  0, 23,  0,  0]])
I want to drop duplicate columns so that the result is:
array([[ 0,  0,  0,  0,  3],
       [ 4,  0,  0,  0,  0],
       [ 0,  0, 23,  0,  0]])
Duplicates should be removed only among columns that contain a non-zero value; every all-zero column must be kept. The remaining columns must also keep their original relative order.
I have already tried the following:
np.unique(a, axis=1, return_index=True)
But this gives me:
(array([[ 0,  0,  0,  3],
        [ 0,  0,  4,  0],
        [ 0, 23,  0,  0]]), array([2, 3, 0, 5]))
There are two problems with this result: the columns are reordered, and the all-zero columns are merged into one.

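This will accomplish what you want:
import numpy as np
import pandas as pd
x = np.array([[ 0,  0,  0,  0,  0,  3],
              [ 4,  4,  0,  0,  0,  0],
              [ 0,  0,  0, 23,  0,  0]])
# Transpose so that each column of x becomes a row of the DataFrame.
df = pd.DataFrame(x.T)
row_sum = df.sum(axis=1)
# Drop duplicates only among the non-zero rows; keep every all-zero row.
df1 = df[row_sum != 0].drop_duplicates()
df0 = df[row_sum == 0]
# Recombine, restore the original order via the index, and transpose back.
y = pd.concat([df1, df0]).sort_index().values.T
y
array([[ 0,  0,  0,  0,  3],
       [ 4,  0,  0,  0,  0],
       [ 0,  0, 23,  0,  0]])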
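By summing the rows of the transposed frame (that is, the columns of the original array) you can identify the ones that contain only zeros and set them aside before dropping the duplicates. Then you recombine both parts and sort by the index to restore the original column order. Note that the sum test assumes non-negative values, as in the question; with mixed signs a non-zero column could also sum to 0.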
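The same idea can also be sketched in NumPy alone, without the pandas round trip. This is only an illustration under a few assumptions: the helper name `drop_duplicate_nonzero_columns` is made up here, and it relies on `np.unique(..., axis=1, return_index=True)` reporting the index of the first occurrence of each distinct column.

```python
import numpy as np

def drop_duplicate_nonzero_columns(a):
    """Hypothetical helper: drop repeated columns but keep every all-zero column."""
    a = np.asarray(a)
    # Index of the first occurrence of each distinct column.
    _, first_idx = np.unique(a, axis=1, return_index=True)
    keep = np.zeros(a.shape[1], dtype=bool)
    keep[first_idx] = True
    # All-zero columns are always kept, even when they repeat.
    keep |= ~a.any(axis=0)
    # Boolean masking preserves the original column order.
    return a[:, keep]

x = np.array([[0, 0, 0, 0, 0, 3],
              [4, 4, 0, 0, 0, 0],
              [0, 0, 0, 23, 0, 0]])
y = drop_duplicate_nonzero_columns(x)
```

Because the all-zero test uses `any` rather than a sum, this variant does not depend on the values being non-negative.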

Related

How to count non zero rows in a N-d tensor?

I need to count the non-zero rows and collect the counts in a 1D tensor (a vector).
For example:
tensor = [
    [
        [1, 2, 3, 4, 0, 0, 0],
        [4, 5, 6, 7, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]
    ],
    [
        [4, 3, 2, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]
    ],
    [
        [0, 0, 0, 0, 0, 0, 0],
        [4, 5, 6, 7, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]
    ]
]
The tensor shape will be [None, 45, 7] in the real application, but here it is [3, 3, 7].
So I need to count the non-zero rows along dimension 1 and keep the counts in a 1D tensor.
non_zeros = [2, 1, 1]  # result for the above tensor
I need to do this in TensorFlow; in NumPy I would already know how to do it.
Can anyone help me with this?
Thanks in advance

You can use tf.math.count_nonzero combined with tf.reduce_sum. This treats a row as zero whenever its sum is zero, so it assumes the values are non-negative:
>>> tf.math.count_nonzero(tf.reduce_sum(tensor,axis=2),axis=1)
<tf.Tensor: shape=(3,), dtype=int64, numpy=array([2, 1, 1])>

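Try this code:
t = tf.math.not_equal(tensor, 0)  # True where an element is non-zero
t = tf.reduce_any(t, -1)          # True for rows with at least one non-zero element
t = tf.cast(t, tf.int32)
t = tf.reduce_sum(t, -1)          # number of non-zero rows per block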
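To see how the two approaches differ, here is a small NumPy mirror of both. It is a sketch for comparison only, not TensorFlow code: the function names are made up for illustration, and the `mixed` array is an invented input with a row whose non-zero entries cancel out.

```python
import numpy as np

def count_nonzero_rows_by_sum(t):
    # Mirrors count_nonzero(reduce_sum(t, axis=2), axis=1):
    # a row counts as non-zero only if its sum is non-zero.
    return np.count_nonzero(np.sum(t, axis=2), axis=1)

def count_nonzero_rows_by_any(t):
    # Mirrors not_equal -> reduce_any -> cast -> reduce_sum:
    # a row counts as non-zero if any element is non-zero.
    return np.any(t != 0, axis=-1).sum(axis=-1)

tensor = np.array([
    [[1, 2, 3, 4, 0, 0, 0], [4, 5, 6, 7, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]],
    [[4, 3, 2, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0, 0, 0], [4, 5, 6, 7, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]],
])
# Invented example: the first row is non-zero but sums to zero.
mixed = np.array([[[1, -1, 0], [0, 0, 0]]])

by_sum = count_nonzero_rows_by_sum(tensor)
by_any = count_nonzero_rows_by_any(tensor)
```

On the question's data both variants agree; they diverge only when a row contains values of mixed sign that cancel, in which case the `any`-based version is the safer choice.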

Replacement for numpy.apply_along_axis in CuPy

I have a NumPy-based neural network that I am trying to port to CuPy. I have a function as follows:
import numpy as np
def tensor_diag(x): return np.apply_along_axis(np.diag, -1, x)
# Usage: (x is a matrix, i.e. a 2-tensor)
def sigmoid_prime(x): return tensor_diag(sigmoid(x) * (1 - sigmoid(x)))
This works using NumPy, but CuPy does not have an analogue for the function (it is unsupported as of 8th May 2020). How can I emulate this behaviour in CuPy?

In [284]: arr = np.arange(24).reshape(2,3,4)
np.diag takes a 1-D array and returns a 2-D array with those values on the diagonal. apply_along_axis simply iterates over all dimensions except the last and passes each 1-D slice along the last axis to np.diag:
In [285]: np.apply_along_axis(np.diag,-1,arr)
Out[285]:
array([[[[ 0,  0,  0,  0],
         [ 0,  1,  0,  0],
         [ 0,  0,  2,  0],
         [ 0,  0,  0,  3]],
        [[ 4,  0,  0,  0],
         [ 0,  5,  0,  0],
         [ 0,  0,  6,  0],
         [ 0,  0,  0,  7]],
        [[ 8,  0,  0,  0],
         [ 0,  9,  0,  0],
         [ 0,  0, 10,  0],
         [ 0,  0,  0, 11]]],
       [[[12,  0,  0,  0],
         [ 0, 13,  0,  0],
         [ 0,  0, 14,  0],
         [ 0,  0,  0, 15]],
        [[16,  0,  0,  0],
         [ 0, 17,  0,  0],
         [ 0,  0, 18,  0],
         [ 0,  0,  0, 19]],
        [[20,  0,  0,  0],
         [ 0, 21,  0,  0],
         [ 0,  0, 22,  0],
         [ 0,  0,  0, 23]]]])
In [286]: _.shape
Out[286]: (2, 3, 4, 4)
I could do the same mapping with:
In [287]: res = np.zeros((2,3,4,4),int)
In [288]: res[:,:,np.arange(4),np.arange(4)] = arr
Check against the apply_along_axis result:
In [289]: np.allclose(_285, res)
Out[289]: True
Or for a more direct copy of apply, use np.ndindex to generate all the i,j tuple pairs to iterate over the first 2 dimensions of arr:
In [298]: res = np.zeros((2,3,4,4),int)
In [299]: for ij in np.ndindex(2,3):
     ...:     res[ij]=np.diag(arr[ij])
     ...:
In [300]: np.allclose(_285, res)
Out[300]: True
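The mapping above can be wrapped into a shape-agnostic function that avoids apply_along_axis entirely by broadcasting against an identity matrix. The sketch below is written and checked with NumPy only. The `xp` parameter is a hypothetical hook for passing another array module such as CuPy, on the assumption that it offers a NumPy-compatible `eye` and broadcasting; that swap is not tested here.

```python
import numpy as np

def tensor_diag(x, xp=np):
    """Place the last axis of x on the diagonal of a new trailing (n, n) block.

    xp is the array module to use (assumption: NumPy-compatible API).
    """
    n = x.shape[-1]
    # x[..., None] has shape (..., n, 1); multiplying by the (n, n) identity
    # broadcasts to (..., n, n) and zeroes everything off the diagonal.
    return x[..., None] * xp.eye(n, dtype=x.dtype)

arr = np.arange(24).reshape(2, 3, 4)
out = tensor_diag(arr)
expected = np.apply_along_axis(np.diag, -1, arr)
```

The multiplication touches every element of the output, so it does somewhat more arithmetic than the index assignment, but it needs no preallocated result and no explicit shape bookkeeping.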

Numpy: Diff on non-adjacent values, in 2D

I'd like to take the difference of non-adjacent values within a 2D NumPy array along axis=-1 (per row). The array can have a large number of rows.
Each row is a selection of values along a timeline from 1 to N.
For N=12, a 3x12 array could look like this:
timeline = np.array([[ 0,  0,  0,  4,  0,  6,  0,  0,  9,  0, 11,  0],
                     [ 1,  0,  3,  4,  0,  0,  0,  0,  9,  0,  0, 12],
                     [ 0,  0,  0,  4,  0,  0,  0,  0,  9,  0,  0,  0]])
The desired result should look like this (the array size is unchanged, and positions matter):
diff = np.array([[ 0,  0,  0,  4,  0,  2,  0,  0,  3,  0,  2,  0],
                 [ 1,  0,  2,  1,  0,  0,  0,  0,  5,  0,  0,  3],
                 [ 0,  0,  0,  4,  0,  0,  0,  0,  5,  0,  0,  0]])
I am aware of the 1D solution from "Diff on non-adjacent values":
imask = np.flatnonzero(timeline)
diff = np.zeros_like(timeline)
diff[imask] = np.diff(timeline[imask], prepend=0)
within which the last line can be replaced with
diff[imask[0]] = timeline[imask[0]]
diff[imask[1:]] = timeline[imask[1:]] - timeline[imask[:-1]]
and the first line can be replaced with
imask = np.where(timeline != 0)[0]
Attempting to generalise the 1D solution, I can see that imask = np.flatnonzero(timeline) is unsuitable because the rows become inter-dependent. So I tried the alternative, np.nonzero:
imask = np.nonzero(timeline)
diff = np.zeros_like(timeline)
diff[imask] = np.diff(timeline[imask], prepend=0)
However, this still couples the rows: the first non-zero value of each row is differenced against the last non-zero value of the previous row.
array([[  0,   0,   0,   4,   0,   2,   0,   0,   3,   0,   2,   0],
       [-10,   0,   2,   1,   0,   0,   0,   0,   5,   0,   0,   3],
       [  0,   0,   0,  -8,   0,   0,   0,   0,   5,   0,   0,   0]])
How can I make the "prepend" restart from zero for each row?

I got it working (an interesting problem for me too).
I wrote a non_adjacent_diff function that handles a single row, and applied it to every row using np.apply_along_axis.
Try this code:
import numpy as np
timeline = np.array([[ 0,  0,  0,  4,  0,  6,  0,  0,  9,  0, 11,  0],
                     [ 1,  0,  3,  4,  0,  0,  0,  0,  9,  0,  0, 12],
                     [ 0,  0,  0,  4,  0,  0,  0,  0,  9,  0,  0,  0]])
def non_adjacent_diff(row):
    row = row.copy()  # work on a copy so that timeline itself is not modified
    not_zero_index = np.where(row != 0)
    # Differences between consecutive non-zero values; the first one stays as-is.
    diff = row[not_zero_index][1:] - row[not_zero_index][:-1]
    np.put(row, not_zero_index[0][1:], diff)
    return row
np.apply_along_axis(non_adjacent_diff, 1, timeline)
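Since np.apply_along_axis still loops over the rows in Python, a fully vectorised variant may be preferable when there are many rows. The following is a sketch of one possible approach, not the answer's method: it runs the flat `np.diff` from the question and then resets the entry at each row boundary. The function name `non_adjacent_diff_2d` is made up for illustration.

```python
import numpy as np

def non_adjacent_diff_2d(timeline):
    """Per-row difference between consecutive non-zero values, positions kept."""
    timeline = np.asarray(timeline)
    # np.nonzero returns indices in row-major order, so entries of the
    # same row are contiguous in rows/cols/vals.
    rows, cols = np.nonzero(timeline)
    vals = timeline[rows, cols]
    d = np.diff(vals, prepend=0)
    # Mark the first non-zero entry of every row ...
    first = np.ones(len(rows), dtype=bool)
    first[1:] = rows[1:] != rows[:-1]
    # ... and restart the difference there, as if a 0 were prepended per row.
    d[first] = vals[first]
    out = np.zeros_like(timeline)
    out[rows, cols] = d
    return out

timeline = np.array([[0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])
result = non_adjacent_diff_2d(timeline)
```

The input array is left unchanged, and rows without any non-zero value simply stay all zero.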

Numpy re-index to first N natural numbers

I have a matrix with a very sparse index (the largest row and column indices are beyond 130000), but only a few of those rows and columns actually contain non-zero values.
So I want to re-index the rows and columns so that only the non-zero ones remain, numbered by the first N natural numbers.
Visually, I want an example matrix like this
1 0 1
0 0 0
0 0 1
to look like this
1 1
0 1
A row or column should be dropped only if all of its values are zero.
Since I have the matrix in a sparse format, I could simply create a dictionary that maps every index to an increasing counter (for rows and columns separately), and get a result.
row_dict = {}
col_dict = {}
row_ind = 0
col_ind = 0
# el looks like this: (row, column, value)
for el in sparse_matrix:
    if el[0] not in row_dict:
        row_dict[el[0]] = row_ind
        row_ind += 1
    if el[1] not in col_dict:
        col_dict[el[1]] = col_ind
        col_ind += 1
# now recreate matrix with new index
But I was hoping for a built-in NumPy function. Also note that I do not really know how to word the question, so there might well be a duplicate out there that I do not know of; any pointers in the right direction are appreciated.

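You can use np.unique with return_inverse=True, which maps each original row or column index to its rank among the distinct indices that actually occur: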
>>> import numpy as np
>>> from scipy import sparse
>>>
>>> A = np.random.randint(-100, 10, (10, 10)).clip(0, None)
>>> A
array([[6, 0, 5, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 7, 0, 0, 0, 0, 4, 9],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 4, 0],
       [9, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 4, 0, 0, 0, 0, 0, 0]])
>>> B = sparse.coo_matrix(A)
>>> B
<10x10 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in COOrdinate format>
>>> runq, ridx = np.unique(B.row, return_inverse=True)
>>> cunq, cidx = np.unique(B.col, return_inverse=True)
>>> C = sparse.coo_matrix((B.data, (ridx, cidx)))
>>> C.toarray()
array([[6, 5, 0, 0, 0],
       [0, 0, 7, 4, 9],
       [0, 0, 0, 4, 0],
       [9, 0, 0, 0, 0],
       [0, 0, 4, 0, 0]])
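If the data is available as plain (row, column, value) triples, as in the question, the same re-indexing can be sketched without SciPy. The function name `compress_indices` and the sample triples are made up for illustration, and the sketch assumes each (row, column) pair occurs at most once.

```python
import numpy as np

def compress_indices(rows, cols, vals):
    """Re-index sparse triples to 0..N-1 and return a small dense matrix."""
    rows, cols, vals = np.asarray(rows), np.asarray(cols), np.asarray(vals)
    # return_inverse gives, for every entry, the rank of its index among
    # the distinct indices that actually occur.
    runq, ridx = np.unique(rows, return_inverse=True)
    cunq, cidx = np.unique(cols, return_inverse=True)
    dense = np.zeros((len(runq), len(cunq)), dtype=vals.dtype)
    # Assumption: no repeated (row, col) pair; a repeat would be overwritten.
    dense[ridx, cidx] = vals
    return dense

# The 3x3 example from the question, stored as triples.
small = compress_indices([0, 0, 2], [0, 2, 2], [1, 1, 1])
# Large, sparse indices collapse to a 2x1 matrix.
large = compress_indices([130000, 5], [7, 7], [3, 4])
```

`runq` and `cunq` hold the original indices in sorted order, so they can be kept if a mapping back to the original positions is needed.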

How to shift the columns of a 2D array multiple times, while still considering its original position?

Consider a matrix m, as follows:
m = [[0, 1, 0, 0, 0, 1],
     [4, 0, 0, 3, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
My goal is to check each row of the matrix and see if the sum of that row is zero. If the sum is not zero, I want to shift the column that corresponds to that row to the end of the matrix. If the sum of the row is zero, nothing happens. So for the matrix above the following should occur:
The program discovers that the 0th row has a sum that does not equal zero.
The 0th column of the matrix is shifted to the end of the matrix, as follows:
m = [[1, 0, 0, 0, 1, 0],
     [0, 0, 3, 2, 0, 4],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
The program checks the next row and does the same, shifting the column to the end of the matrix:
m = [[0, 0, 0, 1, 0, 1],
     [0, 3, 2, 0, 4, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
Each of the other rows is checked, but since all of their sums are zero no shift is made, and the final result is the matrix above.
The issue arises after the first shift: once all of the values have moved, it becomes tricky to tell which column corresponds to which row.
I can't use NumPy for this, as I am limited to the Python 2 standard library.

Use a simple loop over the rows; whenever a row's sum is not zero, loop over all rows again and move each row's first item to its end.
>>> from pprint import pprint
>>> m = [[0, 1, 0, 0, 0, 1],
...      [4, 0, 0, 3, 2, 0],
...      [0, 0, 0, 0, 0, 0],
...      [0, 0, 0, 0, 0, 0],
...      [0, 0, 0, 0, 0, 0],
...      [0, 0, 0, 0, 0, 0]]
>>> for row in m:
...     # If all numbers are >= 0 then we can short-circuit this using `if any(row):`.
...     if sum(row) != 0:
...         for r in m:
...             r.append(r.pop(0))
...
>>> pprint(m)
[[0, 0, 0, 1, 0, 1],
 [0, 3, 2, 0, 4, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0]]
list.pop(0) is an O(N) operation; if you need something faster, use collections.deque.

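A deque can rotate its elements directly:
from collections import deque
def rotate(matrix):
    # Convert each row to a deque so that rotation is cheap.
    matrix_d = [deque(row) for row in matrix]
    for row in matrix:
        if sum(row) != 0:
            # Move the first column to the end in every row.
            for row_d in matrix_d:
                row_d.rotate(-1)
    return [list(row) for row in matrix_d]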
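Because a row's sum does not change when its items are rotated, the number of shifts equals the number of rows with a non-zero sum, so all shifts can be applied at once with slicing. The sketch below takes that shortcut. The function name `shift_columns` is made up for illustration, and it assumes a rectangular list of lists; it uses only built-ins and returns a new matrix instead of modifying the input.

```python
def shift_columns(matrix):
    """Rotate every row left once per row whose sum is non-zero."""
    if not matrix or not matrix[0]:
        return [list(row) for row in matrix]
    width = len(matrix[0])
    # Rotating does not change a row's sum, so the shifts can be counted up front.
    shifts = sum(1 for row in matrix if sum(row) != 0) % width
    # One slice per row replaces the repeated pop/append.
    return [list(row[shifts:]) + list(row[:shifts]) for row in matrix]

m = [[0, 1, 0, 0, 0, 1],
     [4, 0, 0, 3, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
shifted = shift_columns(m)
```

Taking the shift count modulo the row width keeps the slicing correct even when there are more non-zero rows than columns.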
