Replace all but the first 1 in an array with 0 - python

I am trying to find a way to replace all of the duplicate 1s in each row with 0. As an example:
[[0,1,0,1,0],
 [1,0,0,1,0],
 [1,1,1,0,1]]
Should become:
[[0,1,0,0,0],
 [1,0,0,0,0],
 [1,0,0,0,0]]
I found a similar problem, but the solution there does not seem to work: numpy: setting duplicate values in a row to 0

Assuming the array contains only zeros and ones, you can find the position of the first 1 in each row with numpy.argmax and then use advanced indexing to copy only those entries into a zeros array.
import numpy as np

arr = np.array([[0,1,0,1,0],
                [1,0,0,1,0],
                [1,1,1,0,1]])
res = np.zeros_like(arr)
idx = (np.arange(len(res)), np.argmax(arr, axis=1))
res[idx] = arr[idx]
res
array([[0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])

Try looping through each row of the grid
In each row, find all the 1s. In particular you want their indices (positions within the row). You can do this with a list comprehension and enumerate, which automatically gives an index for each element.
Then, still within that row, go through every 1 except for the first, and set it to zero.
grid = [[0, 1, 0, 1, 0], [1, 0, 0, 1, 0], [1, 1, 1, 0, 1]]
for row in grid:
    ones = [i for i, element in enumerate(row) if element == 1]
    for i in ones[1:]:
        row[i] = 0
print(grid)
Gives: [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]

You can use cumsum:
(arr.cumsum(axis=1).cumsum(axis=1) == 1) * 1
This computes a cumulative sum twice along each row; the result equals 1 exactly at the position of the first 1 in each row, so comparing against 1 picks out only those first 1s.
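For reference, a quick sketch of what the intermediate steps look like on the sample array from the question:
import numpy as np
arr = np.array([[0, 1, 0, 1, 0],
                [1, 0, 0, 1, 0],
                [1, 1, 1, 0, 1]])
once = arr.cumsum(axis=1)      # running count of 1s in each row
twice = once.cumsum(axis=1)    # equals 1 only at the first 1 of each row
(twice == 1) * 1
# array([[0, 1, 0, 0, 0],
#        [1, 0, 0, 0, 0],
#        [1, 0, 0, 0, 0]])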

Splitting a numpy array at specific locations

Hey, so I basically have a problem like this:
I have a numpy array which contains a matrix of values, for example:
Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
Using another array, I want to extract specific values and sum them together. This is a bit hard to explain, so I'll just show an example:
values = np.array([3,1,3,4,2])
So this means we want the first 3 values of the first row, the first value of the second row, the first 3 values of the 3rd row, the first 4 values of the 4th row and the first 2 values of the last row, so we only want this data:
final_data = np.array([
    [3, 0, 1],
    [0],
    [0, 3, 0],
    [0, 0, 0, 6],
    [5, 1]])
Then we want the sum of those values; in this case the sum will be 19.
Is there any easy way to do this? Also, the data isn't always the same size, so I can't use any fixed sizes.
An even better answer:
Data[np.arange(Data.shape[1])<values[:,None]].sum()
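To see why this works (a quick sketch, with Data and values as defined in the question): broadcasting np.arange(Data.shape[1]) against values[:, None] builds a per-row boolean mask selecting the leading columns, and Data[mask] pulls exactly those entries.
mask = np.arange(Data.shape[1]) < values[:, None]   # True for the first values[i] columns of row i
mask.astype(int)
# array([[1, 1, 1, 0],
#        [1, 0, 0, 0],
#        [1, 1, 1, 0],
#        [1, 1, 1, 1],
#        [1, 1, 0, 0]])
Data[mask].sum()
# 19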
You can try:
sum([Data[i, :j].sum() for i, j in enumerate(values)])
You can accomplish this with advanced indexing. The index coordinates can be calculated separately before using them to pull values from the array.
Explicitly:
Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
values = np.array([3,1,3,4,2])
X = [0,0,0,1,2,2,2,3,3,3,3,4,4]
Y = [0,1,2,0,0,1,2,0,1,2,3,0,1]
Data[X,Y]
Notice that X repeats each row index the appropriate number of times and Y gives the column to access for each entry of X. Both can be calculated from values directly:
X = np.concatenate([[n]*i for n,i in enumerate(values)])
Y = np.concatenate([np.arange(i) for i in values])
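As a quick check (not part of the original answer), indexing with X and Y and summing reproduces the expected total:
Data[X, Y]
# array([3, 0, 1, 0, 0, 3, 0, 0, 0, 0, 6, 5, 1])
Data[X, Y].sum()
# 19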

Return the maximum of a matrix

I want to return the max value of a matrix.
For example this is my matrix:
matrix = [[0, 1, 10, 0, 0], [0, 0, 6, 0, 1], [0, 1, 4, 0, 0]]
I want to return the max so here '10'
This is my code but I have an error:
max = 0
for i in range(len(matrix)+1):
    for j in range(len(matrix[0])+1):
        if matrix[i][j] > matrix[i+1][j+1]:
            max = matrix[i][j]
print(max)
Thanks in advance
There are several issues with your code; I suggest you use the built-in max function:
matrix = [[0, 1, 10, 0, 0], [0, 0, 6, 0, 1], [0, 1, 4, 0, 0]]
result = max(max(row) for row in matrix)
print(result)
Output
10
You can try this:
matrix = [[0, 1, 10, 0, 0], [0, 0, 6, 0, 1], [0, 1, 4, 0, 0]]
max1 = 0
for sub_list in matrix:
    for item in sub_list:
        if item > max1:
            max1 = item
Multiple ways to do it:
Fix your method. In Python, lists are zero-based, so you only need to iterate from i = 0 to len(matrix) - 1. Writing for i in range(len(matrix)): does this for you; you don't need range(len(matrix) + 1). Also, you should only update maxval when the element you're looking at is greater than maxval.
So,
maxval = -9999999
for i in range(len(matrix)):
    for j in range(len(matrix[i])):
        if matrix[i][j] > maxval:
            maxval = matrix[i][j]
print(maxval)
# Out: 10
Or, a more pythonic way is to iterate over the elements instead of accessing them through their indices
maxval = -9999999
for row in matrix:
    for element in row:
        if element > maxval:
            maxval = element
# maxval: 10
Notice I use maxval instead of max so as not to shadow Python's built-in max() function.
Use numpy (if you're already using it for other things). Like wim mentioned in their comment, a numpy array is a much better way to store matrices than lists of lists. Why? See this question.
import numpy as np
matrix = [[0, 1, 10, 0, 0], [0, 0, 6, 0, 1], [0, 1, 4, 0, 0]]
maxval = np.max(matrix)
# maxval: 10
Iterate over the rows, creating a list of the max value in each row, then find the max of that list:
matrix = [[0, 1, 10, 0, 0], [0, 0, 6, 0, 1], [0, 1, 4, 0, 0]]
rowmax = [max(row) for row in matrix]
maxval = max(rowmax)
# or in one line:
maxval = max(max(row) for row in matrix)
Use map. This is essentially the same as the previous method.
matrix = [[0, 1, 10, 0, 0], [0, 0, 6, 0, 1], [0, 1, 4, 0, 0]]
maxval = max(map(max, matrix))

Creating n number of masked subarrays for all the n unique values in an array using python

I have an array created from a raster. This array has multiple unique values. I want to create new arrays for each unique value such that the places with that value are marked as '1' and the rest as '0'. I am using python for this.
A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # Input array
b = numpy.unique(A) # gives unique values
a1 = [1, 1, 0, 0, 0, 1, 1, 0, 0] #new array for value 1
a2 = [0, 0, 0, 1, 1, 0, 0, 0, 0] #new array for value 2
a3 = [0, 0, 1, 0, 0, 0, 0, 1, 1] #new array for value 3
So basically the code would scan through the unique values, get the number of unique values and create individual arrays for each unique value.
I have used numpy.unique() and numpy.zeros() to get the unique values in the array and to create arrays that can be overwritten to the desired array, respectively. But I do not know how to make the code determine the number of unique values and create that many new arrays.
I have been trying to do this with a for loop, but I haven't been successful. My understanding of how to build such a nested for loop is not very clear yet.
You could do something like this:
>>> import numpy as np
>>> A = [1, 1, 3, 2, 2, 1, 1, 3, 3]
>>> result = [(A == unique_val).astype(int) for unique_val in np.unique(A)]
>>> result
[array([1, 1, 0, 0, 0, 1, 1, 0, 0]), array([0, 0, 0, 1, 1, 0, 0, 0, 0]), array([0, 0, 1, 0, 0, 0, 0, 1, 1])]
The core part of the program being:
(A == unique_val).astype(int)
It simply compares each element of the array with unique_val, which gives an elementwise boolean result; astype(int) then converts that boolean array to an integer array.
You can do:
a1 = (A == b[0]) * 1
Then, instead of b[0], loop over range(len(b)) and use b[i], as sketched below.
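A minimal sketch of that loop, assuming A has been converted to a numpy array (variable names follow the question):
import numpy as np
A = np.array([1, 1, 3, 2, 2, 1, 1, 3, 3])
b = np.unique(A)
masks = []
for i in range(len(b)):
    masks.append((A == b[i]) * 1)   # 1 where A equals b[i], 0 elsewhere
print(masks[0])
# [1 1 0 0 0 1 1 0 0]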
The easiest way is to do it with broadcasting (converting A to a numpy array first):
A = np.asarray(A)
locs = (A[None, :] == b[:, None]).astype(int)
out = {val: arr for val, arr in zip(list(b), list(locs))}

How to convert a series of index/category, into a classification array

How do I convert a series of indexes into a 2-D array that expresses the category/classifier defined by the index values in the list?
e.g.:
import numpy as np
aList = [0,1,0,2]
anArray = np.array(aList)
resultArray = convertToCategories(anArray)
and the return value of convertToCategories() would look like:
[[1,0,0], # the 0th element of aList is index category 0
[0,1,0], # the 1st element of aList is index category 1
[1,0,0], # the 2nd element of aList is index category 0
[0,0,1]] # the 3rd element of aList is index category 2
As a last resort, I could of course:
parse the list,
count the number of categories (the values are contiguous, so it is simply a matter of finding the maximum),
create a zeroed array of the right size,
then reparse the list to fill the array with 1 (or True) at the indices it gives, as sketched below.
But I am wondering whether there is a more pythonic way, or a dedicated numpy or pandas function, to achieve this kind of transformation.
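A minimal sketch of that fallback approach (plain loop, assuming the categories are contiguous integers starting at 0, as in the example):
import numpy as np
aList = [0, 1, 0, 2]
n_categories = max(aList) + 1                 # categories are contiguous from 0
resultArray = np.zeros((len(aList), n_categories), dtype=int)
for row, category in enumerate(aList):
    resultArray[row, category] = 1
print(resultArray)
# [[1 0 0]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]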
You can do something like this -
import numpy as np
# Size parameters
N = anArray.size
M = anArray.max()+1
# Setup output array
resultArray = np.zeros((N,M),int)
# Find out the linear indices where 1s would be put
idx = (np.arange(N)*M) + anArray
# Finally, put 1s at those places for the final output
resultArray.ravel()[idx] = 1
Sample run -
In [188]: anArray
Out[188]: array([0, 1, 0, 2, 4, 1, 3])
In [189]: resultArray
Out[189]:
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0]])
Or, better, just directly index into the output array with the row and column indices -
# Set up the output array and put 1s at the places indexed by the row and column indices.
# Here, anArray gives the column indices and [0, 1, ..., N-1] the row indices
resultArray = np.zeros((N,M),int)
resultArray[np.arange(N),anArray] = 1

Set rows of scipy.sparse matrix that meet certain condition to zeros

I wonder what the best way is to replace rows that do not satisfy a certain condition with zeros, for sparse matrices. For example (I use plain arrays for illustration):
I want to replace every row whose sum is greater than 10 with a row of zeros
a = np.array([[0,0,0,1,1],
              [1,2,0,0,0],
              [6,7,4,1,0],  # sum > 10
              [0,1,1,0,1],
              [7,3,2,2,8],  # sum > 10
              [0,1,0,1,2]])
I want to replace a[2] and a[4] with zeros, so my output should look like this:
array([[0, 0, 0, 1, 1],
       [1, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 1, 2]])
This is fairly straightforward for dense matrices:
row_sum = a.sum(axis=1)
to_keep = row_sum >= 10
a[to_keep] = np.zeros(a.shape[1])
However, when I try:
s = sparse.csr_matrix(a)
s[to_keep, :] = np.zeros(a.shape[1])
I get this error:
raise NotImplementedError("Fancy indexing in assignment not "
NotImplementedError: Fancy indexing in assignment not supported for csr matrices.
Hence, I need a different solution for sparse matrices. I came up with this:
def zero_out_unfit_rows(s_mat, limit_row_sum):
    row_sum = s_mat.sum(axis=1).T.A[0]
    to_keep = row_sum <= limit_row_sum
    to_keep = to_keep.astype('int8')
    temp_diag = get_sparse_diag_mat(to_keep)
    return temp_diag * s_mat

def get_sparse_diag_mat(my_diag):
    N = len(my_diag)
    my_diags = my_diag[np.newaxis, :]
    return sparse.dia_matrix((my_diags, [0]), shape=(N, N))
This relies on the fact that if we set the 2nd and 4th elements of the diagonal of the identity matrix to zero, the corresponding rows of the pre-multiplied matrix are set to zero.
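A tiny dense illustration of that fact (just a sketch of the idea, not the sparse code itself):
import numpy as np
D = np.diag([1, 0, 1])              # 0 on the diagonal for the row we want to drop
M = np.arange(1, 10).reshape(3, 3)
print(D @ M)
# [[1 2 3]
#  [0 0 0]
#  [7 8 9]]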
However, I feel that there is a better, more scipynic solution. Is there one?
Not sure if it is very scithonic, but a lot of the operations on sparse matrices are better done by accessing the guts directly. For your case, I personally would do:
import numpy as np
import scipy.sparse as sps

a = np.array([[0,0,0,1,1],
              [1,2,0,0,0],
              [6,7,4,1,0],  # sum > 10
              [0,1,1,0,1],
              [7,3,2,2,8],  # sum > 10
              [0,1,0,1,2]])
sps_a = sps.csr_matrix(a)
# get sum of each row:
row_sum = np.add.reduceat(sps_a.data, sps_a.indptr[:-1])
# set values to zero
row_mask = row_sum > 10
nnz_per_row = np.diff(sps_a.indptr)
sps_a.data[np.repeat(row_mask, nnz_per_row)] = 0
# ask scipy.sparse to remove the zeroed entries
sps_a.eliminate_zeros()
>>> sps_a.toarray()
array([[0, 0, 0, 1, 1],
       [1, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 1, 2]])
>>> sps_a.nnz # it does remove the entries, not simply set them to zero
10
