I have a Boolean Python ndarray with this shape: (32, 1600, 1600). I visualize each 1600x1600 slice as a layer, so there are 32 layers; each layer has a varying number of Trues, and the Trues may sit at different indices within their respective 2D layers. I want to "flatten" (not sure if that's the right term) this from 32 layers down to 1, so that the resulting array is (1600, 1600) with every True carried over to its corresponding place in the result. Below is a very simplified example of my starting multidimensional array. The real array is Boolean, with True/False, but I used 0 for False and 1 for True in this example:
array([[[0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],  # two "Trues"
        [0, 1, 0, 1, 0],  # two "Trues"
        [0, 1, 0, 0, 0],  # one "True"
        [0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],  # two "Trues"
        [0, 1, 0, 1, 0],  # two "Trues"
        [0, 0, 0, 0, 0]]])
I want the final array to look like this: a 2D array with every True location carried over to its corresponding place. Since it's Boolean, it shouldn't be cumulative.
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0],  # two "Trues"
       [0, 1, 0, 1, 0],  # two "Trues"
       [0, 1, 0, 1, 0],  # two "Trues"
       [0, 0, 0, 0, 0]])
Since you have an array of booleans, let's create one.
import numpy as np

a = np.array([[[0, 0, 0, 0, 0],
               [0, 1, 0, 1, 0],
               [0, 1, 0, 1, 0],
               [0, 1, 0, 0, 0],
               [0, 0, 0, 0, 0]],
              [[0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0],
               [0, 1, 0, 1, 0],
               [0, 1, 0, 1, 0],
               [0, 0, 0, 0, 0]]], dtype=bool)
You want to perform a logical or along the first axis. One way is to take the sum along axis 0 and check whether the elements are greater than zero.
a.sum(axis=0) > 0
gives:
array([[False, False, False, False, False],
[False, True, False, True, False],
[False, True, False, True, False],
[False, True, False, True, False],
[False, False, False, False, False]])
To convert it to an integer array, simply multiply this by 1:
1 * (a.sum(axis=0) > 0)
gives:
array([[0, 0, 0, 0, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 0, 0, 0, 0]])
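As a side note (my addition, not part of the original answer): NumPy also has a direct reduction for this, so with a as defined above you can skip the intermediate sum:

flat = a.any(axis=0)     # logical OR across the layers (axis 0)
flat.astype(int)         # the same 0/1 array as above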
You could try this:
import numpy as np
arr = np.array([[[0, 0, 0, 0, 0],
                 [0, 1, 0, 1, 0],
                 [0, 1, 0, 1, 0],
                 [0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0]],
                [[0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0],
                 [0, 1, 0, 1, 0],
                 [0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0]]], dtype=bool)
result = 1*np.logical_or.reduce(arr, axis=0)
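For reference (my addition), printing result gives the same 0/1 array as the sum-based approach:

print(result)
# [[0 0 0 0 0]
#  [0 1 0 1 0]
#  [0 1 0 1 0]
#  [0 1 0 1 0]
#  [0 0 0 0 0]]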
I believe this is what you're looking for, and I think it should be easy enough to convert this back to True / False as you described. If you need any help please comment.
arr = [[[0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0]],
       [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0]]]
new_arr = [[1 if arr[0][i][j] == 1 or arr[1][i][j] == 1 else 0
            for j in range(len(arr[0][0]))]
           for i in range(len(arr[0]))]
print(new_arr)
Output:
[[0, 0, 0, 0, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 0, 0, 0, 0]]
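The comprehension above is hard-wired to exactly two layers. A sketch that generalizes it to any number of layers, using the same nested-list input:

new_arr = [[1 if any(layer[i][j] == 1 for layer in arr) else 0
            for j in range(len(arr[0][0]))]
           for i in range(len(arr[0]))]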
I have an array a of ones and zeroes (it might be rather big)
import numpy as np

a = np.array([[1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [0, 0, 0, 1, 1, 1]])
in which the "upper" rows are more "important", in the sense that if there is a 1 in any column of the i-th row, then all ones in that column in the following rows must be zeroed.
So, the desired output should be:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
In other words, there should be at most a single 1 per column.
I'm looking for a more numpy way to do this (i.e. minimising or, better, avoiding the loops).
Your array:
[[1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1]]
Transpose it with numpy:
a = np.transpose(your_array)
Now it looks like this:
[[1, 1, 0, 0],
[0, 1, 1, 0],
[0, 0, 1, 0],
[1, 0, 0, 1],
[0, 1, 0, 1],
[0, 0, 1, 1]]
Now zero out everything in each row except the first nonzero element:

res = np.zeros(a.shape, dtype="int64")
idx = np.arange(res.shape[0])        # one index per row
args = a.astype(bool).argmax(1)      # position of the first nonzero in each row
res[idx, args] = a[idx, args]        # copy only that element over
The output of res is:

[[1, 0, 0, 0],
 [0, 1, 0, 0],
 [0, 0, 1, 0],
 [1, 0, 0, 0],
 [0, 1, 0, 0],
 [0, 0, 1, 0]]
Re-transpose your array:
a = np.transpose(res)
array([[1, 0, 0, 1, 0, 0],
       [0, 1, 0, 0, 1, 0],
       [0, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 0, 0]])
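Putting the pieces together, a runnable sketch of the whole approach (variable names as above):

import numpy as np

your_array = np.array([[1, 0, 0, 1, 0, 0],
                       [1, 1, 0, 0, 1, 0],
                       [0, 1, 1, 0, 0, 1],
                       [0, 0, 0, 1, 1, 1]])

a = np.transpose(your_array)        # columns become rows
res = np.zeros(a.shape, dtype="int64")
idx = np.arange(res.shape[0])
args = a.astype(bool).argmax(1)     # first nonzero per row
res[idx, args] = a[idx, args]       # keep only that element
print(np.transpose(res))            # back to the original orientation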
EDIT: Thanks to @The.B for the tip
An alternative solution is to do a forward fill followed by the cumulative sum and then replace all values which are not 1 with 0:
a = np.array([[1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [0, 0, 0, 1, 1, 1]])
ff = np.maximum.accumulate(a, axis=0)   # forward fill: once a column sees a 1, it stays 1
cs = np.cumsum(ff, axis=0)              # running count per column: 1 only at the first 1
cs[cs > 1] = 0                          # zero everything after the first occurrence
Output in cs:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
EDIT
This will do the same thing and should be slightly more efficient:
ff = np.maximum.accumulate(a, axis=0)
ff ^ np.pad(ff, ((1, 0), (0, 0)))[:-1]  # XOR with the row-shifted fill keeps only the first 1
Output:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
And if you want to do the operations in-place to avoid temporary memory allocation:
out = np.zeros((a.shape[0]+1, a.shape[1]), dtype=a.dtype)  # one extra row of zeros on top
np.maximum.accumulate(a, axis=0, out=out[1:])              # forward fill written directly into out
out[:-1] ^ out[1:]                                         # same shifted XOR without np.pad
Output:
array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
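As a side note (my addition, not from the answer): the XOR-with-shift above is just a row-wise difference of the forward-filled mask, so np.diff with prepend expresses it directly:

ff = np.maximum.accumulate(a, axis=0)
np.diff(ff, axis=0, prepend=0)   # 1 exactly where the running max first steps up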
You can traverse each column of the array, keep the first 1, and zero out everything after it. Note that you have to write through an index; rebinding the loop variable (x = 0) would not change the array:

for col in a.T:                  # a.T yields column views, so writes go into a
    f = 0
    for i in range(len(col)):
        if col[i] == 1 and f == 0:
            f = 1                # first 1 in this column: keep it
        else:
            col[i] = 0           # zero everything else
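Applied to the array from the question, the loop modifies a in place:

import numpy as np

a = np.array([[1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [0, 0, 0, 1, 1, 1]])

# ... run the column loop above ...

print(a)
# [[1 0 0 1 0 0]
#  [0 1 0 0 1 0]
#  [0 0 1 0 0 1]
#  [0 0 0 0 0 0]]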
I have initialized a numpy nd array like the following
arr = np.zeros((6, 6))
This zero-filled array is passed as an input argument to a function:
def fun(arr):
    arr.append(1)  # this works for arr = [] initialization
    return arr

for i in range(0, 12):
    fun(arr)
But append doesn't work for an ndarray. I want to fill in the elements of the ndarray row-wise.
Is there any way to use a Python scalar index for the ndarray? I could increment this index every time fun is called and append elements to arr.
Any suggestions?
In [523]: arr = np.zeros((6,6),int)
In [524]: arr
Out[524]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
In [525]: arr[0] = 1
In [526]: arr
Out[526]:
array([[1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
In [527]: arr[1] = [1,2,3,4,5,6]
In [528]: arr[2,3:] = 2
In [529]: arr
Out[529]:
array([[1, 1, 1, 1, 1, 1],
[1, 2, 3, 4, 5, 6],
[0, 0, 0, 2, 2, 2],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
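A minimal sketch of the row-wise filling the question asks about (the index bookkeeping is my assumption, not part of the session above): keep a scalar row index alongside the array and assign into the next free row on each call.

import numpy as np

arr = np.zeros((6, 6), dtype=int)

def fun(arr, row, values):
    arr[row] = values        # in-place row assignment, no append needed
    return row + 1           # caller tracks the next free row

row = 0
for i in range(6):
    row = fun(arr, row, i + 1)   # fill row i with the scalar i+1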
I have a tensor with three dimensions and three classes (0: background, 1: first class, 2: second class). I would like to find connected clusters and reassign outliers' labels by performing a majority vote. A 2D example:
import numpy as np
data = np.array([[0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0, 0],
                 [1, 1, 1, 0, 0, 1, 2],
                 [1, 2, 0, 0, 2, 2, 2],
                 [0, 1, 0, 0, 0, 2, 0],
                 [0, 0, 0, 0, 0, 0, 0]])
should be changed to
data = np.array([[0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0, 0],
                 [1, 1, 1, 0, 0, 2, 2],
                 [1, 1, 0, 0, 2, 2, 2],
                 [0, 1, 0, 0, 0, 2, 0],
                 [0, 0, 0, 0, 0, 0, 0]])
It is enough to treat each connected region as one cluster and count the occurrences of the labels within it. I am not looking for a machine learning method.
You can use scipy.ndimage.measurements.label to find the connected components and then use np.bincount for the counting
from scipy.ndimage import measurements

lbl, ncl = measurements.label(data)   # label connected nonzero regions
# Pixels of cluster i with class c land in bin 2*i + c; count the bins and
# take the majority class (argmax) for every cluster.
lut = np.bincount((data + 2*lbl).ravel(), None, 2*ncl + 3)[1:].reshape(-1, 2).argmax(1) + 1
lut[0] = 0                            # background stays background
lut[lbl]                              # broadcast each cluster's vote back onto the image
# array([[0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0],
# [0, 1, 0, 0, 0, 0, 0],
# [1, 1, 1, 0, 0, 2, 2],
# [1, 1, 0, 0, 2, 2, 2],
# [0, 1, 0, 0, 0, 2, 0],
# [0, 0, 0, 0, 0, 0, 0]])
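Note that while the example is 2-D, the question mentions a 3-D tensor: measurements.label accepts n-dimensional input (the default connectivity structure simply grows with the dimensionality), so the same lines should carry over to a (d, h, w) array unchanged.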
import numpy
import sympy
n = 7
k = 3
X = numpy.random.randn(n,k)
Px = X @ numpy.linalg.inv(numpy.transpose(X) @ X) @ numpy.transpose(X)  # X(X'X)^(-1)X'
print(sympy.Matrix(Px).rref())
As you may verify yourself, Px is singular. However, sympy.rref() returns this:
(Matrix([[1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1]]), (0, 1, 2, 3, 4, 5, 6))
Why doesn't it return the real rref? I read somewhere that I could pass simplify=True, but it didn't make any difference.
In [49]: Px
Out[49]:
array([[ 0.5418898 , 0.44245552, 0.04973693, -0.06834885, -0.19086119,
-0.07003176, 0.06325021],...
[ 0.06325021, -0.11080081, 0.21656224, -0.07445145, -0.28634725,
0.06648907, 0.19199866]])
In [50]: np.linalg.det(Px)
Out[50]: 2.141647537907433e-67
In [51]: np.linalg.inv(Px)
Out[51]:
array([[-7.18788695e+15, 4.95655702e+15, 7.52738018e+15,
-4.40875311e+15, -1.64015565e+16, 2.63785320e+15,
-3.03465003e+16],
[ 1.59176426e+16, ....
[ 3.31636798e+16, -3.39094560e+16, -3.60287970e+16,
-1.27160460e+16, 2.14338015e+16, 3.32345350e+15,
3.60287970e+16]])
Your Px is close to singular, but not exactly so. Contrast that with
In [52]: M = np.arange(9).reshape(3,3)
In [53]: np.linalg.det(M)
Out[53]: 0.0
In [55]: np.linalg.inv(M)
LinAlgError: Singular matrix
In [56]: sympy.Matrix(M).rref()
Out[56]:
(Matrix([
[1, 0, -1],
[0, 1, 2],
[0, 0, 0]]), (0, 1))
Numerically speaking your Px is not singular, just close:
In [57]: sympy.Matrix(Px).rref()
Out[57]:
(Matrix([
[1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1]]), (0, 1, 2, 3, 4, 5, 6))
But with a custom iszerofunc:
In [58]: sympy.Matrix(Px).rref(iszerofunc=lambda x: abs(x)<1e-16)
Out[58]:
(Matrix([
[1, 0, 0, 0.647383887198708, -1.91409951634531, -1.43377991000974, 0.578981680134581],
[0, 1, 0, -0.839184067893959, 1.88998490600173, 1.43367640627271, -0.611620902311026],
[0, 0, 1, -0.962221703397948, 0.203783478612254, 1.45929622452135, 0.404548167005728],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]]),
(0, 1, 2))
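A related numerical check (my addition): np.linalg.matrix_rank estimates the rank from an SVD with a tolerance, so it reports the effective rank that exact-arithmetic rref misses here:

In [59]: np.linalg.matrix_rank(Px)
Out[59]: 3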
I've got a 219 by 219 np.array with mostly 0s and 2% nonzeros, and I now want to create new arrays where each of the nonzero values has a 90% chance of becoming a zero.
I know how to change the n-th nonzero value to 0, but how do I work with probabilities?
Probably this can be modified:
index = 0
for x in range(0, 219):
    for y in range(0, 219):
        if (index + 1) % 10 == 0:
            B[x][y] = 0
        index += 1
print(B)
You could use np.random.random to create an array of random numbers to compare with 0.9, and then use np.where to select either the original value or 0. Since each draw is independent, it doesn't matter if we replace a 0 with a 0, so we don't need to treat zero and nonzero values differently. For example:
In [184]: A = np.random.randint(0, 2, (8,8))
In [185]: A
Out[185]:
array([[1, 1, 1, 0, 0, 0, 0, 1],
[1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 1],
[0, 1, 0, 1, 1, 1, 1, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 1, 0, 1]])
In [186]: np.where(np.random.random(A.shape) < 0.9, 0, A)
Out[186]:
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0]])
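A small variation (my addition, not part of the original answer): NumPy's newer Generator API makes the draw reproducible via a seed:

rng = np.random.default_rng(0)                  # seeded generator
B = np.where(rng.random(A.shape) < 0.9, 0, A)   # ~90% of entries zeroed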
# first method
import numpy as np

prob = 0.3
print(np.random.choice([2, 5], (5,), p=[prob, 1 - prob]))

# second method (I prefer)
import random

def randomZerosOnes(a, b, N, prob):
    # split N into n0 copies of a and n1 copies of b according to prob
    if prob > 1 - prob:
        n1 = int((1 - prob) * N)
        n0 = N - n1
    else:
        n0 = int(prob * N)
        n1 = N - n0
    zo = np.concatenate(([a for _ in range(n0)], [b for _ in range(n1)]), axis=0)
    random.shuffle(zo)  # shuffle in place
    return zo

zo = randomZerosOnes(2, 5, N=5, prob=0.3)
print(zo)