Related
I'm looking for a way to extract zones of ones in a binary numpy array to put different values, for instance, for the following array:
x=[[0,1,1,0,0,0],
[0,1,1,0,0,0],
[0,1,0,0,0,0],
[0,0,0,1,1,0],
[0,0,1,1,1,0],
[0,0,0,0,0,0]]
Expected result:
x=[[0,2,2,0,0,0],
[0,2,2,0,0,0],
[0,2,0,0,0,0],
[0,0,0,3,3,0],
[0,0,3,3,3,0],
[0,0,0,0,0,0]]
Use scipy.ndimage.label:
x=[[0,1,1,0,0,0],
[0,1,1,0,0,0],
[0,1,0,0,0,0],
[0,0,0,1,1,0],
[0,0,1,1,1,0],
[0,0,0,0,0,0]]
a = np.array(x)
from scipy.ndimage import label
b = label(a)[0]
output:
# b
array([[0, 1, 1, 0, 0, 0],
[0, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 2, 2, 0],
[0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0]], dtype=int32)
to start labeling from 2:
b = (label(a)[0]+1)*a
output:
array([[0, 2, 2, 0, 0, 0],
[0, 2, 2, 0, 0, 0],
[0, 2, 0, 0, 0, 0],
[0, 0, 0, 3, 3, 0],
[0, 0, 3, 3, 3, 0],
[0, 0, 0, 0, 0, 0]])
I have an array a of ones and zeroes (it might be rather big)
a = np.array([[1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1])
in which the "upper" rows are more "important" in the sense that if there is 1 in any column of the i-th row, then all ones in that columns in the following rows must be zeroed.
So, the desired output should be:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
In other words, there should only be single 1 per column.
I'm looking for a more numpy way to do this (i.e. minimising or, better, avoiding the loops).
Your array:
[[1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1]]
Transpose it with numpy:
a = np.transpose(your_array)
Now it looks like this:
[[1, 1, 0, 0],
[0, 1, 1, 0],
[0, 0, 1, 0],
[1, 0, 0, 1],
[0, 1, 0, 1],
[0, 0, 1, 1]]
Zero all the non-zero (and "not upper") elements row wise:
res = np.zeros(a.shape, dtype="int64")
idx = np.arange(res.shape[0])
args = a.astype(bool).argmax(1)
res[idx, args] = a[idx, args]
The output of res is this:
#### Output
[[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0]]
Re-transpose your array:
a = np.transpose(res)
[[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
EDIT: Thanks to #The.B for the tip
An alternative solution is to do a forward fill followed by the cumulative sum and then replace all values which are not 1 with 0:
a = np.array([[1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1]])
ff = np.maximum.accumulate(a, axis=0)
cs = np.cumsum(ff, axis=0)
cs[cs > 1] = 0
Output in cs:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
EDIT
This will do the same thing and should be slightly more efficient:
ff = np.maximum.accumulate(a, axis=0)
ff ^ np.pad(ff, ((1,0), (0,0)))[:-1]
Output:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
And if you want to do the operations in-place to avoid temporary memory allocation:
out = np.zeros((a.shape[0]+1, a.shape[1]), dtype=a.dtype)
np.maximum.accumulate(a, axis=0, out=out[1:])
out[:-1] ^ out[1:]
Output:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
You can traverse through each column of array and check if it is the first one -
If Not: Make it 0
for col in a.T:
f=0
for x in col:
if(x==1 and f==0):
f=1
else:
x=0
I have a tensor with three dimensions and three classes (0: background, 1: first class, 2: second class). I would like to find connected clusters and assign outlier's labels by performing a majority vote. A 2D example:
import numpy as np
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 1, 2],
[1, 2, 0, 0, 2, 2, 2],
[0, 1, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0, 0],])
should be changed to
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 2, 2],
[1, 1, 0, 0, 2, 2, 2],
[0, 1, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0, 0],])
It is enough to see connected regions as one cluster an count the appearence of the labels. I am not looking for any machine learning method.
You can use scipy.ndimage.measurements.label to find the connected components and then use np.bincount for the counting
from scipy.ndimage import measurements
lbl,ncl = measurements.label(data)
lut = np.bincount((data+2*lbl).ravel(),None,2*ncl+3)[1:].reshape(-1,2).argmax(1)+1
lut[0] = 0
lut[lbl]
# array([[0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0],
# [0, 1, 0, 0, 0, 0, 0],
# [1, 1, 1, 0, 0, 2, 2],
# [1, 1, 0, 0, 2, 2, 2],
# [0, 1, 0, 0, 0, 2, 0],
# [0, 0, 0, 0, 0, 0, 0]])
import numpy
import sympy
n = 7
k = 3
X = numpy.random.randn(n,k)
Px = X#numpy.linalg.inv(numpy.transpose(X)#X)#numpy.transpose(X) #X(X'X)^(-1)X'
print(sympy.Matrix(Px).rref())
As you may verify yourself, Px is singular. However, sympy.rref() returns this:
(Matrix([[1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1]]), (0, 1, 2, 3, 4, 5, 6))
Why doesn't it return the real rref? I read somewhere I could pass simplify=True, however it didn't make any difference.
In [49]: Px
Out[49]:
array([[ 0.5418898 , 0.44245552, 0.04973693, -0.06834885, -0.19086119,
-0.07003176, 0.06325021],...
[ 0.06325021, -0.11080081, 0.21656224, -0.07445145, -0.28634725,
0.06648907, 0.19199866]])
In [50]: np.linalg.det(Px)
Out[50]: 2.141647537907433e-67
In [51]: np.linalg.inv(Px)
Out[51]:
array([[-7.18788695e+15, 4.95655702e+15, 7.52738018e+15,
-4.40875311e+15, -1.64015565e+16, 2.63785320e+15,
-3.03465003e+16],
[ 1.59176426e+16, ....
[ 3.31636798e+16, -3.39094560e+16, -3.60287970e+16,
-1.27160460e+16, 2.14338015e+16, 3.32345350e+15,
3.60287970e+16]])
Your Px is close to singular, but not exactly so. Contrast that with
In [52]: M = np.arange(9).reshape(3,3)
In [53]: np.linalg.det(M)
Out[53]: 0.0
In [55]: np.linalg.inv(M)
LinAlgError: Singular matrix
In [56]: sympy.Matrix(M).rref()
Out[56]:
(Matrix([
[1, 0, -1],
[0, 1, 2],
[0, 0, 0]]), (0, 1))
Numerically speaking your Px is not singular, just close:
In [57]: sympy.Matrix(Px).rref()
Out[57]:
(Matrix([
[1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1]]), (0, 1, 2, 3, 4, 5, 6))
But with a custom iszerofunc:
In [58]: sympy.Matrix(Px).rref(iszerofunc=lambda x: abs(x)<1e-16)
Out[58]:
(Matrix([
[1, 0, 0, 0.647383887198708, -1.91409951634531, -1.43377991000974, 0.578981680134581],
[0, 1, 0, -0.839184067893959, 1.88998490600173, 1.43367640627271, -0.611620902311026],
[0, 0, 1, -0.962221703397948, 0.203783478612254, 1.45929622452135, 0.404548167005728],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]]),
(0, 1, 2))
I've got an np.array 219 by 219 with mostly 0s and 2% of nonzeros and I know want to create new arrays where each of the nonzero values has 90% of chance of becoming a zero.
I now know how to change the n-th non zero value to 0 but how to work with probabilities?
Probably this can be modified:
index=0
for x in range(0, 219):
for y in range(0, 219):
if (index+1) % 10 == 0:
B[x][y] = 0
index+=1
print(B)
You could use np.random.random to create an array of random numbers to compare with 0.9, and then use np.where to select either the original value or 0. Since each draw is independent, it doesn't matter if we replace a 0 with a 0, so we don't need to treat zero and nonzero values differently. For example:
In [184]: A = np.random.randint(0, 2, (8,8))
In [185]: A
Out[185]:
array([[1, 1, 1, 0, 0, 0, 0, 1],
[1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 1],
[0, 1, 0, 1, 1, 1, 1, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 1, 0, 1]])
In [186]: np.where(np.random.random(A.shape) < 0.9, 0, A)
Out[186]:
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0]])
# first method
prob=0.3
print(np.random.choice([2,5], (5,), p=[prob,1-prob]))
# second method (i prefer)
import random
import numpy as np
def randomZerosOnes(a,b, N, prob):
if prob > 1-prob:
n1=int((1-prob)*N)
n0=N-n1
else:
n0=int(prob*N)
n1=N-n0
zo=np.concatenate(([a for _ in range(n0)] ,[b for _ in range(n1)] ), axis=0 )
random.shuffle(zo)
return zo
zo=randomZerosOnes(2,5, N=5, prob=0.3)
print(zo)