I have a quite large m times n numpy matrix M filled with non-zero values and an array x of length m, where each entry indicates, for the corresponding row, the column index from which the elements should be set to zero. So for example, if n=5 and x[i]=3, then the i-th row of the matrix should be set to [M_i1, M_i2, M_i3, 0, 0].
If all entries of x had the same value k, I could simply use slicing with something like M[:,k:]=0, but I could not figure out an efficient way to do this with different values for each row without looping over all rows and slicing each row individually.
I thought about creating a matrix that looks like [[1]*x[1] + [0]*(n-x[1]),...,[1]*x[m] + [0]*(n-x[m])] and using it for boolean indexing, but I also don't know how to create it without looping.
The non-vectorized solution looks like this:
for i in range(m):
    if x[i] < n:
        M[i, x[i]:] = 0
with example input
M = np.array([[1,2,3],[4,5,6]])
m, n = 2, 3
x = np.array([1,2])
and output
array([[1, 0, 0],
[4, 5, 0]])
Does anyone have a vectorized solution for this problem?
Thank you very much!
You can use multi-dimensional boolean indexing:
M[x[:,None]<=np.arange(M.shape[1])] = 0
example:
M = [[7, 8, 4, 2, 3, 9, 1, 8, 4, 3],
[2, 1, 6, 1, 5, 2, 2, 2, 9, 2],
[6, 1, 6, 8, 4, 3, 6, 9, 2, 6],
[5, 4, 0, 8, 3, 0, 0, 1, 8, 7],
[8, 7, 8, 8, 9, 2, 0, 8, 0, 2]]
x = [4, 4, 0, 6, 2]
output:
[[7, 8, 4, 2, 0, 0, 0, 0, 0, 0],
[2, 1, 6, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 4, 0, 8, 3, 0, 0, 0, 0, 0],
[8, 7, 0, 0, 0, 0, 0, 0, 0, 0]]
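For reference, a minimal runnable check on the small example from the question (not part of the original answer):
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([1, 2])
# broadcast x against the column indices: True wherever the column index
# is at least x for that row, then zero those positions
M[x[:, None] <= np.arange(M.shape[1])] = 0
print(M)
# [[1 0 0]
#  [4 5 0]]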
This looks like a mask-smearing exercise. In each row, you want to smear a True mask to the right, starting at the element at np.minimum(x[row], n), and zero out the masked positions:
mask = np.zeros(M.shape, bool)
mask[np.flatnonzero(x < n), x[x < n]] = True
M[np.cumsum(mask, axis=1, dtype=bool)] = 0
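Applied to the example from the question, this gives the expected result; a quick sketch, assuming x is a numpy integer array:
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([1, 2])
n = M.shape[1]
mask = np.zeros(M.shape, bool)
mask[np.flatnonzero(x < n), x[x < n]] = True   # seed at the first column to zero
M[np.cumsum(mask, axis=1, dtype=bool)] = 0     # smear the seed to the right
print(M)
# [[1 0 0]
#  [4 5 0]]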
Let's say I have this numpy array:
array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
Going row by row, I would like to get, for each column, the cumulative count of how many times the current value has already appeared above it in that column. So for this array, the result would be:
array([[0, 0, 0, 0, 0, 0], # (*0)
[0, 0, 0, 0, 0, 0], # (*1)
[0, 0, 0, 1, 1, 1], # (*2)
[0, 0, 0, 2, 0, 0], # (*3)
[0, 1, 0, 1, 1, 0]] # (*4)
(*0): first time each value appears
(*1): all values are different from the previous one (in the column)
(*2): For the last 3 columns, a 1 appears because there is already 1 value repetition.
(*3): For the 4th column, a 2 appears because it's the 3rd time that an 8 appears.
(*4): In the 4th column, a 1 appears because it's the 2nd time that a 9 appears in that column. Similarly for the second and the second-to-last columns.
Any idea how to perform this?
Thanks!
Maybe there is a faster way using numpy ufuncs; however, here is a solution using standard Python:
from collections import defaultdict
import numpy as np
a = np.array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
# count, for each column, how many times each value has appeared so far
def get_count(array):
    count = []
    for row in array.T:
        occurences = defaultdict(int)
        rowcount = []
        for n in row:
            occurences[n] += 1
            rowcount.append(occurences[n] - 1)
        count.append(rowcount)
    return np.array(count).T
Output:
>>> get_count(a)
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 2, 0, 0],
[0, 1, 0, 1, 1, 0]])
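For completeness, here is one possible vectorized sketch (not from the original answer, and the helper name cumcount is just illustrative): it computes the per-column running occurrence count with a stable argsort and np.maximum.accumulate, reusing the array a defined above.
def cumcount(col):
    # position of each element within its group of equal values,
    # counted in original (top-to-bottom) order
    order = np.argsort(col, kind="stable")
    sorted_col = col[order]
    is_start = np.r_[True, sorted_col[1:] != sorted_col[:-1]]
    group_start = np.maximum.accumulate(np.where(is_start, np.arange(col.size), 0))
    counts = np.arange(col.size) - group_start
    out = np.empty(col.size, dtype=int)
    out[order] = counts
    return out

result = np.column_stack([cumcount(col) for col in a.T])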
I have a NumPy array with integer values 1 or 0 (they can be cast to booleans if necessary). The array is square and symmetric (see note below), and I want a list of the indices where a 1 appears:
Note that array[i][j] == array[j][i] and array[i][i] == 0 by design. Also I cannot have any duplicates.
import numpy as np
array = np.array([
[0, 0, 1, 0, 1, 0, 1],
[0, 0, 1, 1, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 1, 1, 0],
[1, 0, 0, 1, 0, 0, 1],
[0, 1, 0, 1, 0, 0, 0],
[1, 0, 1, 0, 1, 0, 0]
])
I would like a result that is like this (order of each sub-list is not important, nor is the order of each element within the sub-list):
[
[0, 2],
[0, 4],
[0, 6],
[1, 2],
[1, 3],
[1, 5],
[2, 6],
[3, 4],
[3, 5],
[4, 6]
]
Another point to make is that I would prefer not to loop over all indices with two nested loops and the condition j < i, because my array can be large, but I am aware that this is a possibility. I have written an example of this using two for loops:
import pandas as pd  # only used to format the printed output

result = []
for i in range(array.shape[0]):
    for j in range(i):
        if array[i][j]:
            result.append([i, j])
print(pd.DataFrame(result).sort_values(1).values)
# using dataframes and arrays for formatting, but looking for
# 'result', which is a list
# Returns (same as above but columns are the opposite way round):
[[2 0]
[4 0]
[6 0]
[2 1]
[3 1]
[5 1]
[6 2]
[4 3]
[5 3]
[6 4]]
idx = np.argwhere(array)
idx = idx[idx[:,0]<idx[:,1]]
Another way:
idx = np.argwhere(np.triu(array))
output:
[[0 2]
[0 4]
[0 6]
[1 2]
[1 3]
[1 5]
[2 6]
[3 4]
[3 5]
[4 6]]
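Since the question asks for a list rather than an array, the result can be converted with tolist() if needed:
pairs = idx.tolist()   # e.g. [[0, 2], [0, 4], ...]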
Comparison:
# @bousof's solution
def method1(array):
    return np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]>=0))).transpose()[:,::-1]

# Also mentioned by @hpaulj
def method2(array):
    return np.argwhere(np.triu(array))

def method3(array):
    idx = np.argwhere(array)
    return idx[idx[:,0]<idx[:,1]]

# The original method in the question by the OP (d-man)
def method4(array):
    result = []
    for i in range(array.shape[0]):
        for j in range(i):
            if array[i][j]:
                result.append([i, j])
    return result

# Suggested by @bousof in the comments
def method5(array):
    return np.vstack(np.where(np.triu(array))).transpose()

inputs = [np.random.randint(0,2,(n,n)) for n in [10,100,1000,10000]]
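The benchmark code itself is not shown here; a possible timing harness along these lines would reproduce the comparison (illustrative only, and note that method4 will be very slow on the largest input):
import timeit

for arr in inputs:
    for method in (method1, method2, method3, method4, method5):
        t = timeit.timeit(lambda: method(arr), number=3)
        print(arr.shape, method.__name__, round(t, 4))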
It seems that method1, method2 and method5 are slightly faster for large arrays, while method3 is faster for smaller cases.
In [249]: arr = np.array([
...: [0, 0, 1, 0, 1, 0, 1],
...: [0, 0, 1, 1, 0, 1, 0],
...: [1, 1, 0, 0, 0, 0, 1],
...: [0, 1, 0, 0, 1, 1, 0],
...: [1, 0, 0, 1, 0, 0, 1],
...: [0, 1, 0, 1, 0, 0, 0],
...: [1, 0, 1, 0, 1, 0, 0]
...: ])
The most common way of getting the indices of non-zeros (True) is with np.nonzero (aka np.where):
In [250]: idx = np.nonzero(arr)
In [251]: idx
Out[251]:
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6]),
array([2, 4, 6, 2, 3, 5, 0, 1, 6, 1, 4, 5, 0, 3, 6, 1, 3, 0, 2, 4]))
This is a tuple - 2 arrays for a 2d array. It can be used directly to index the array (or anything like it): arr[idx] will give all 1s.
Apply np.transpose to that and get an array of 'pairs':
In [252]: np.argwhere(arr)
Out[252]:
array([[0, 2],
[0, 4],
[0, 6],
[1, 2],
[1, 3],
[1, 5],
[2, 0],
[2, 1],
[2, 6],
[3, 1],
[3, 4],
[3, 5],
[4, 0],
[4, 3],
[4, 6],
[5, 1],
[5, 3],
[6, 0],
[6, 2],
[6, 4]])
Using such an array to index arr is harder - requiring a loop and conversion to tuple.
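For example (a small illustration, not part of the original answer), each pair has to be converted to a tuple before it can index arr:
pairs = np.argwhere(arr)
vals = [arr[tuple(ij)] for ij in pairs]   # all 1s, one value per pair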
To weed out the symmetric duplicates we could make a tri-lower array:
In [253]: np.tril(arr)
Out[253]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0],
[1, 0, 1, 0, 1, 0, 0]])
In [254]: np.argwhere(np.tril(arr))
Out[254]:
array([[2, 0],
[2, 1],
[3, 1],
[4, 0],
[4, 3],
[5, 1],
[5, 3],
[6, 0],
[6, 2],
[6, 4]])
You can use numpy.where:
>>> np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]<=0))).transpose()
array([[2, 0],
[2, 1],
[3, 1],
[4, 0],
[4, 3],
[5, 1],
[5, 3],
[6, 0],
[6, 2],
[6, 4]])
np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]<=0 is true only on the lower part of the matrix. If the order is important, you can get the same order as in the question using:
>>> np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]>=0))).transpose()[:,::-1]
array([[2, 0],
[4, 0],
[6, 0],
[2, 1],
[3, 1],
[5, 1],
[6, 2],
[4, 3],
[5, 3],
[6, 4]])
I have a 2D array filled with some values in column 0 and zeros in the rest of the columns. I would like to do pretty much the same as I do with MS Excel but using numpy, meaning to fill the rest of the columns with values calculated from the first column. Here is a MWE:
import numpy as np
a = np.zeros(20, dtype=np.int8).reshape(4,5)
b = [1, 2, 3, 4]
b = np.array(b)
a[:, 0] = b
# don't change the first column
for column in a[:, 1:]:
    a[:, column] = column[0]+1
The expected output:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]], dtype=int8)
The resulting output:
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]], dtype=int8)
Any help would be appreciated.
Looping is slow and there is no need to loop to produce the array that you want:
>>> a = np.ones(20, dtype=np.int8).reshape(4,5)
>>> a[:, 0] = b
>>> a
array([[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1]], dtype=int8)
>>> np.cumsum(a, axis=1)
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
What went wrong
Let's start, as in the question, with this array:
>>> a
array([[1, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[3, 0, 0, 0, 0],
[4, 0, 0, 0, 0]], dtype=int8)
Now, using the code from the question, let's do the loop and see what column actually is:
>>> for column in a[:, 1:]:
...     print(column)
...
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
As you can see, column is not the index of a column: it is an array of values from a[:, 1:] (iterating over a 2D array yields its rows, here all zeros). Consequently, the following does not do what you would hope:
a[:, column] = column[0]+1
Another method
If we want to loop (so that we can do something more complex), here is another approach to generating the desired array:
>>> b = np.array([1, 2, 3, 4])
>>> np.column_stack([b+i for i in range(5)])
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
Your usage of column is a little ambiguous: in for column in a[:, 1:] it yields arrays of values, whereas in the body it is used as a column index. You can try this instead:
for column in range(1, a.shape[1]):
    a[:, column] = a[:, column-1]+1
a
#array([[1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]], dtype=int8)
In MATLAB / GNU Octave (which I am actually using), I use this method to copy particular elements of a 2D array into another 2D array:
B(2:6, 2:6) = A
where
size(A) = (5, 5)
My question is, "How can this be achieved in python using numpy?"
Currently, for example, I am using the following nested loop in Python:
>>> import numpy as np
>>> a = np.int32(np.random.rand(5,5)*10)
>>> b = np.zeros((6,6), dtype = np.int32)
>>> print a
[[6 7 5 1 3]
[3 9 7 2 0]
[9 3 7 6 7]
[9 8 2 0 8]
[8 7 7 9 9]]
>>> print b
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]]
>>> for i in range(1,6):
...     for j in range(1,6):
...         b[i][j] = a[i-1][j-1]
>>> print b
[[0, 0, 0, 0, 0, 0],
[0, 6, 7, 5, 1, 3],
[0, 3, 9, 7, 2, 0],
[0, 9, 3, 7, 6, 7],
[0, 9, 8, 2, 0, 8],
[0, 8, 7, 7, 9, 9]]
Is there a better way to do this?
It's almost the same as in MATLAB:
b[1:6, 1:6] = a
The only thing is that Python uses 0-based indexing so the second element is 1 instead of 2.
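A minimal sketch of the equivalence (array contents here are arbitrary):
import numpy as np

a = np.arange(25).reshape(5, 5)       # plays the role of A
b = np.zeros((6, 6), dtype=a.dtype)   # plays the role of B
b[1:6, 1:6] = a                       # same effect as Octave's B(2:6, 2:6) = A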