I have a mask array: [0,0,0,0,1,1,0,0,1,1,0,1,0].
And a values array: [3,4,5,6,7]
What is the best way to replace each 1 in the mask array with the next element of the values array?
Expected result: [0,0,0,0,3,4,0,0,5,6,0,7,0]
I am working with large arrays.
Assuming numpy, and a values length equal to the number of 1s.
Use boolean indexing:
import numpy as np
mask = np.array([0,0,0,0,1,1,0,0,1,1,0,1,0])
values = [3,4,5,6,7]
mask[mask==1] = values
If values can be longer than the sum of 1s:
m = mask==1
mask[m] = values[:m.sum()]
Output:
array([0, 0, 0, 0, 3, 4, 0, 0, 5, 6, 0, 7, 0])
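Note that the assignment above overwrites mask in place and keeps its integer dtype. Here is a minimal sketch of a variant that returns a new array instead, assuming NumPy; fill_mask is a hypothetical helper name, not a NumPy function, and raising on too few values is a design choice made here for illustration:

```python
import numpy as np

def fill_mask(mask, values):
    """Return an array holding successive values where mask == 1 and 0 elsewhere.

    Hypothetical helper for illustration; extra trailing values are ignored.
    """
    mask = np.asarray(mask)
    values = np.asarray(values)
    m = mask == 1
    needed = int(m.sum())
    if values.size < needed:
        raise ValueError(f"need {needed} values, got {values.size}")
    # Pick a result dtype that can hold both the zeros and the values.
    out = np.zeros(mask.shape, dtype=np.result_type(mask.dtype, values.dtype))
    out[m] = values[:needed]
    return out

result = fill_mask([0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0], [3, 4, 5, 6, 7])
print(result.tolist())  # [0, 0, 0, 0, 3, 4, 0, 0, 5, 6, 0, 7, 0]
```

Because the result dtype is derived from both inputs, float values are not truncated to integers.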
You can use an iterator:
mask = [0,0,0,0,1,1,0,0,1,1,0,1,0]
nums = iter([3,4,5,6,7])
output = [next(nums) if m else 0 for m in mask]
print(output) # [0, 0, 0, 0, 3, 4, 0, 0, 5, 6, 0, 7, 0]
I would like to know the fastest way to extract the indices of the first n non zero values per column in a 2D array.
For example, with the following array:
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would have [0, 0, 1, 1, 2] as xs and [0, 3, 2, 5, 3] as ys. 2 values in the first and second columns and 1 in the third.
Here is how it is currently done:
x = []
y = []
n = 2
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]
    if len(a):
        x.extend([i]*len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
Here is a method that does not require sorting the array; a single linear scan is enough to find the nonzero values. It is somewhat hard to follow because it chains several functions:
n = 2
# Get indices with non null values, columns indices first
nnull = np.stack(np.where(arr.T != 0))
# split indices by unique value of column
cols_ids = np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] + 1)
# Take n in each (max) and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis=1)
outputs:
array([[0, 0, 1, 1, 2],
[0, 3, 2, 5, 3]], dtype=int64)
Here is one approach using argsort. Note that it returns the indices in a different order:
n = 2
m = arr!=0
# non-zero values first; a stable sort keeps them in their original row order
idx = np.argsort(~m, axis=0, kind='stable')
# get first 2 and ensure non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y,x = np.where(m2)
# slice
x, idx[y,x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))
Take the nonzero indices of the transposed array and compare the column indices with a copy of themselves shifted by n positions:
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
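The shifted comparison works because nonzero on the transpose returns indices sorted by column and then by row, so an entry belongs to the first n of its column exactly when the entry n positions earlier lies in a different column. Below is a minimal sketch of that idea as a function, assuming n >= 1; first_n_nonzero_per_column is a hypothetical name. It builds the mask with slice assignment rather than concatenation so that it also works when the array holds fewer than n nonzero values:

```python
import numpy as np

def first_n_nonzero_per_column(arr, n):
    """Return (cols, rows) of the first n nonzero entries in each column.

    Hypothetical helper name; assumes n >= 1.
    """
    arr = np.asarray(arr)
    # nonzero() on the transpose yields indices sorted by column, then by row.
    cols, rows = arr.T.nonzero()
    # Keep entry k unless the entry n positions earlier is in the same column.
    keep = np.ones(cols.size, dtype=bool)
    keep[n:] = cols[n:] != cols[:-n]
    return cols[keep], rows[keep]

arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
xs, ys = first_n_nonzero_per_column(arr, 2)
print(xs.tolist())  # [0, 0, 1, 1, 2]
print(ys.tolist())  # [0, 3, 2, 5, 3]
```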
I'm trying to create a function that will transform a regular matrix into CSR form (I don't want to use the scipy.sparse one).
To do this, I'm using a nested for-loop to run through a given matrix to create a new matrix with three rows.
The first row ('Values') should contain all non-zero values. The second ('Cols') should contain the column index for each number in 'Values'. The third row should contain the index value in 'Values' for the first non-zero value on each row.
My question regards the second and third rows:
Is there a way of getting the column ID for the element 'i' in the for-loop?
M = array([[4,0,39],
           [0,5,0],
           [0,0,7]])
def Convert(x):
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for k in x:
        for i in k:
            if i != 0:
                Values.append(i)
                Cols.append({#the column index value of 'i'})
                Rows.append[#the index in 'Values' of the first non-zero element on each row]
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return(CSRMatrix)
Convert(M)
I'm not sure exactly what you want for Cols.append(), because of the way you commented it in the code between curly braces.
Is it a dict containing the index:value pair of every nonzero value? Or a list of sets containing the indexes of all nonzero values (which would be odd)? Or is it all the indexes of each row in your array?
In any case, I have included the two most likely candidates (a dict, and a list of indexes for each row). Test each one and delete the one you don't want; if neither is right, please add more specifics:
import numpy as np
m = np.array([[4,0,39],
              [0,5,0],
              [0,0,7]])
def Convert(x):
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for num in x:
        for i in range(len(num)):
            if num[i] != 0:
                Values.append(num[i])
                Cols.append({i:num[i]}) # <- if dict. Remove if not what you wanted
                Rows.append(i)
            Cols.append(i) # <- list of all indexes in the array for each row. Remove if not what you wanted
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return(CSRMatrix)
x = Convert(m)
print(x)
enumerate() passes an index for every iteration.
The second row can therefore be created simply by appending num2, the column index.
For the third row, check whether a value from the current row has already been added. If not, append the position in Values of the element just added, len(Values) - 1, and set the non_zero flag to False. At the start of the next row, the flag is set to True again.
def Convert(x):
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for num, k in enumerate(x):
        non_zero = True  # True until the first non-zero value of this row is found
        for num2, i in enumerate(k):
            if i != 0:
                Values.append(i)
                Cols.append(num2)
                if non_zero:
                    # index in Values of the first non-zero value of this row
                    Rows.append(len(Values) - 1)
                    non_zero = False
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return (CSRMatrix)
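Here is a numpythonic implementation. Use the nonzero method to obtain the row and column indices of the nonzero elements directly, then compare each row index with the previous one to build a mask that marks the first nonzero element of each row. Finally, call nonzero on the mask to get the positions in the values array where each row starts: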
>>> M = np.array([[ 4, 0, 39],
... [ 0, 5, 0],
... [ 0, 0, 7]])
>>> r, c = M.nonzero()
>>> mask = np.concatenate(([True], r[1:] != r[:-1]))
>>> [M[r, c], c, *mask.nonzero()]
[array([ 4, 39, 5, 7]), array([0, 2, 1, 2]), array([0, 2, 3])]
Test of a larger array:
>>> a = np.random.choice(10, size=(8, 8), p=[0.73] + [0.03] * 9)
>>> a
array([[0, 0, 0, 0, 8, 0, 0, 1],
[1, 0, 5, 4, 0, 0, 9, 0],
[0, 0, 9, 0, 0, 0, 0, 1],
[0, 0, 0, 8, 9, 0, 0, 4],
[0, 0, 5, 0, 0, 6, 0, 0],
[0, 8, 0, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 0, 0, 9],
[0, 9, 0, 0, 0, 4, 0, 0]])
>>> r, c = a.nonzero()
>>> mask = np.concatenate(([True], r[1:] != r[:-1]))
>>> from pprint import pp
>>> pp([a[r, c], c, *mask.nonzero()])
[array([8, 1, 1, 5, 4, 9, 9, 1, 8, 9, 4, 5, 6, 8, 9, 9, 9, 4]),
array([4, 7, 0, 2, 3, 6, 2, 7, 3, 4, 7, 2, 5, 1, 7, 7, 1, 5], dtype=int64),
array([ 0, 2, 6, 8, 11, 13, 15, 16], dtype=int64)]
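One caveat: the mask produces one entry per non-empty row, so a row containing only zeros leaves no trace in the third array. The conventional CSR layout instead stores a row-pointer array with one more entry than there are rows. Here is a minimal sketch under that assumption; to_csr is a hypothetical name, and indptr is borrowed from the attribute name used by scipy.sparse:

```python
import numpy as np

def to_csr(dense):
    """Return (values, cols, indptr) for a 2D array.

    Hypothetical helper; row k occupies values[indptr[k]:indptr[k + 1]].
    """
    dense = np.asarray(dense)
    r, c = dense.nonzero()  # indices come back in row-major order
    # Count the nonzero entries in every row, including empty rows.
    counts = np.bincount(r, minlength=dense.shape[0])
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return dense[r, c], c, indptr

M = np.array([[4, 0, 39],
              [0, 0, 0],   # an empty row, to show it is still represented
              [0, 0, 7]])
values, cols, indptr = to_csr(M)
print(values.tolist())  # [4, 39, 7]
print(cols.tolist())    # [0, 2, 2]
print(indptr.tolist())  # [0, 2, 2, 3]
```

An empty row shows up as two equal consecutive entries in indptr, here the repeated 2.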
I have a matrix stored as an array in Python, for example:
a = array([[0, 2, 1, 1.4142, 4, 7],
[3, 0, 1.4142, 9, 2, 0],
[1.4142, 0, 0, 1, 1, 3]])
I want to set every element of this array that is neither 1 nor sqrt(2) (1.4142) to 0. That is:
a = array([[0, 0, 1, 1.4142, 0, 0],
[0, 0, 1.4142, 0, 0, 0],
[1.4142, 0, 0, 1, 1, 0]])
I have tried this
a[(a != 1).any() or not (np.isclose(a, np.sqrt(2))).any()] = 0
and some variations, but I can't make it work. Thanks.
Just use masking -
m1 = np.isclose(a,1) # use a==1 for exact matches
m2 = np.isclose(a,np.sqrt(2))
a[~(m1 | m2)] = 0
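The attempt in the question fails because .any() collapses each comparison to a single bool, and `or` / `not` then operate on those scalars rather than element by element. Here is a minimal sketch of a variant that leaves a untouched and builds a new array with np.where. Comparing against the rounded literal 1.4142 instead of np.sqrt(2) is an assumption made here, so that the match does not depend on the default tolerances of isclose:

```python
import numpy as np

a = np.array([[0, 2, 1, 1.4142, 4, 7],
              [3, 0, 1.4142, 9, 2, 0],
              [1.4142, 0, 0, 1, 1, 3]])

# Elementwise masks; | combines them elementwise, unlike `or`.
keep = np.isclose(a, 1) | np.isclose(a, 1.4142)

# np.where builds a new array: a where keep is True, 0 elsewhere.
result = np.where(keep, a, 0)
print(result)
```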
You can try this:
np.where((a == 1.4142), a, a == 1)
Why not check the sum and product of the elements of both arrays? Correct me if I am wrong, but this should work for positive numbers.
I have an array created from a raster. This array has multiple unique values. I want to create a new array for each unique value, such that the places with that value are marked as '1' and the rest as '0'. I am using Python for this.
A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # Input array
b = numpy.unique(A) # gives unique values
a1 = [1, 1, 0, 0, 0, 1, 1, 0, 0] #new array for value 1
a2 = [0, 0, 0, 1, 1, 0, 0, 0, 0] #new array for value 2
a3 = [0, 0, 1, 0, 0, 0, 0, 1, 1] #new array for value 3
So basically the code would scan through the unique values, get the number of unique values and create individual arrays for each unique value.
I have used numpy.unique() to get the unique values in the array and numpy.zeros() to create arrays that can be overwritten with the desired values. But I do not know how to make the code count the unique values and create that many new arrays.
I have been trying to do this with a for loop, but I haven't been successful. I don't yet have a clear idea of how to write such a nested for loop.
You could do something like this:
>>> import numpy as np
>>> A = np.array([1, 1, 3, 2, 2, 1, 1, 3, 3])
>>> result = [(A==unique_val).astype(int) for unique_val in np.unique(A)]
>>> result
[array([1, 1, 0, 0, 0, 1, 1, 0, 0]), array([0, 0, 0, 1, 1, 0, 0, 0, 0]), array([0, 0, 1, 0, 0, 0, 0, 1, 1])]
The core part of the program being:
(A == unique_val).astype(int)
It simply compares each element of the NumPy array with unique_val, which gives a boolean result per element. Calling astype(int) then converts that boolean array to an integer array.
You can do this (A must be a NumPy array, not a list):
a1 = (A == b[0]) * 1
To cover every unique value, loop over range(len(b)) and use b[i] in place of b[0].
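The loop described above might look like the following minimal sketch. Collecting the results in a dict keyed by the unique value is an assumption made here, and indicators is a hypothetical name:

```python
import numpy as np

A = np.array([1, 1, 3, 2, 2, 1, 1, 3, 3])
b = np.unique(A)

# One 0/1 array per unique value, keyed by that value.
indicators = {}
for i in range(len(b)):
    indicators[int(b[i])] = (A == b[i]) * 1

for value, arr in indicators.items():
    print(value, arr.tolist())
# 1 [1, 1, 0, 0, 0, 1, 1, 0, 0]
# 2 [0, 0, 0, 1, 1, 0, 0, 0, 0]
# 3 [0, 0, 1, 0, 0, 0, 0, 1, 1]
```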
The easiest way is to do it with broadcasting (A must be a NumPy array):
locs = (A[None, :] == b[:, None]).astype(int)
out = {val: arr for val, arr in zip(list(b), list(locs))}
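To make the shapes in the broadcasting version explicit, here is a small runnable sketch, assuming A has been converted to a NumPy array; converting the keys to plain int is a choice made here for readability:

```python
import numpy as np

A = np.array([1, 1, 3, 2, 2, 1, 1, 3, 3])
b = np.unique(A)

# A[None, :] has shape (1, 9) and b[:, None] has shape (3, 1);
# comparing them broadcasts to shape (3, 9), one row per unique value.
locs = (A[None, :] == b[:, None]).astype(int)
print(locs.shape)  # (3, 9)

out = {int(val): row for val, row in zip(b, locs)}
print(out[3].tolist())  # [0, 0, 1, 0, 0, 0, 0, 1, 1]
```

This does all the comparisons in one call, at the cost of holding the full (number of unique values) x (length of A) array in memory at once.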
I have the following array
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7,0,10,11]
I would like to find the start and end index of every run of consecutive zeros in the array. For the array above, the output would be as follows:
[3,8],[12,15],[19]
I want to achieve this as efficiently as possible.
Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)
import numpy as np
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example:
In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
In [237]: runs = zero_runs(a)
In [238]: runs
Out[238]:
array([[ 3, 9],
[12, 16],
[19, 20]])
With this format, it is simple to get the number of zeros in each run:
In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])
It's always a good idea to check the edge cases:
In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])
In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])
In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)
In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
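If the inclusive format from the question is needed, the half-open ranges can be converted afterwards. Here is a minimal sketch; inclusive_runs is a hypothetical helper name, and zero_runs is restated only so that the example runs on its own:

```python
import numpy as np

def zero_runs(a):
    # Same function as above, restated so this example is self-contained.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def inclusive_runs(a):
    """Hypothetical helper: [start, end] per run, or [start] for a single zero."""
    result = []
    for start, stop in zero_runs(a).tolist():
        last = stop - 1  # convert the exclusive end to an inclusive one
        result.append([start] if start == last else [start, last])
    return result

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
print(inclusive_runs(a))  # [[3, 8], [12, 15], [19]]
```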
You can use itertools to achieve your expected result.
from itertools import groupby
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
b = range(len(a))
# Group consecutive indices by the value found at each index.
for group in groupby(iter(b), lambda x: a[x]):
    if group[0] == 0:
        lis = list(group[1])
        print([min(lis), max(lis)])
Here is a custom function. I'm not sure it is the most efficient, but it works:
def getZeroIndexes(li):
    begin = 0
    end = 0
    indexes = []
    zero = False  # True while we are inside a run of zeros
    for ind, elt in enumerate(li):
        if not elt and not zero:
            begin = ind
            zero = True
        if not elt and zero:
            end = ind
        if elt and zero:
            zero = False
            if begin == end:
                indexes.append(begin)
            else:
                indexes.append((begin, end))
    # Close a run that extends to the end of the list.
    if zero:
        indexes.append(begin if begin == end else (begin, end))
    return indexes