Suppose we have a 2D numpy array of zeros:
arr1 = np.zeros((len(Train), L))
where Train is a (dataset) 2D numpy array of integers whose rows all have the same fixed length.
We also have a 1D numpy array, positions, of length len(Train).
Now we wish to copy each row of Train into the corresponding row of arr1, starting at the column given by positions.
One way is to use a for loop over the Train array:
k = len(Train[0])
for i in range(len(Train)):
    arr1[i, int(positions[i]):int(positions[i] + k)] = Train[i, 0:k]
However, going over the entire Train set using the explicit for loop is slow and I would like to optimize it.
Here is one way: generate all the indices you want to assign to. Setup:
import numpy as np
n = 12 # Number of training samples
l = 8 # Number of columns in the output array
k = 4 # Number of columns in the training samples
arr = np.zeros((n, l), dtype=int)
train = np.random.randint(10, size=(n, k))
positions = np.random.randint(l - k, size=n)
Random example data:
>>> train
array([[3, 4, 3, 2],
[3, 6, 4, 1],
[0, 7, 9, 6],
[4, 0, 4, 8],
[2, 2, 6, 2],
[4, 5, 1, 7],
[5, 4, 4, 4],
[0, 8, 5, 3],
[2, 9, 3, 3],
[3, 3, 7, 9],
[8, 9, 4, 8],
[8, 7, 6, 4]])
>>> positions
array([3, 2, 3, 2, 0, 1, 2, 2, 3, 2, 1, 1])
Advanced indexing with broadcasting trickery:
rows = np.arange(n)[:, None] # Shape (n, 1)
cols = np.arange(k) + positions[:, None] # Shape (n, k)
arr[rows, cols] = train
output:
>>> arr
array([[0, 0, 0, 3, 4, 3, 2, 0],
[0, 0, 3, 6, 4, 1, 0, 0],
[0, 0, 0, 0, 7, 9, 6, 0],
[0, 0, 4, 0, 4, 8, 0, 0],
[2, 2, 6, 2, 0, 0, 0, 0],
[0, 4, 5, 1, 7, 0, 0, 0],
[0, 0, 5, 4, 4, 4, 0, 0],
[0, 0, 0, 8, 5, 3, 0, 0],
[0, 0, 0, 2, 9, 3, 3, 0],
[0, 0, 3, 3, 7, 9, 0, 0],
[0, 8, 9, 4, 8, 0, 0, 0],
[0, 8, 7, 6, 4, 0, 0, 0]])
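As a sanity check, the vectorized assignment can be compared against the explicit loop from the question. The sketch below is self-contained; the seeded generator and the name expected are assumptions added for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the check is reproducible
n, l, k = 12, 8, 4              # samples, output columns, sample width

train = rng.integers(10, size=(n, k))
# Valid start columns are 0 .. l - k, so each row still fits inside the output.
positions = rng.integers(l - k + 1, size=n)

# Reference result: the explicit loop from the question.
expected = np.zeros((n, l), dtype=int)
for i in range(n):
    expected[i, positions[i]:positions[i] + k] = train[i]

# Vectorized result: (n, 1) row indices broadcast against (n, k) column indices.
arr = np.zeros((n, l), dtype=int)
rows = np.arange(n)[:, None]
cols = positions[:, None] + np.arange(k)
arr[rows, cols] = train

print(np.array_equal(arr, expected))
```

Both arrays should match, because every (row, column) pair produced by the broadcast is unique and so no assignment overwrites another.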
Another, similar post called Flood Fill in Python is a very general question on flood fill, and the answer only contains a broad pseudocode example. I'm looking for an explicit solution with numpy or scipy.
Let's take this array for example:
a = np.array([
[0, 1, 1, 1, 1, 0],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]
])
For selecting element 0, 0 and flood fill with value 3, I'd expect:
[
[3, 1, 1, 1, 1, 0],
[3, 3, 1, 2, 1, 1],
[3, 1, 1, 1, 1, 0]
]
For selecting element 0, 1 and flood fill with value 3, I'd expect:
[
[0, 3, 3, 3, 3, 0],
[0, 0, 3, 2, 3, 3],
[0, 3, 3, 3, 3, 0]
]
For selecting element 0, 5 and flood fill with value 3, I'd expect:
[
[0, 1, 1, 1, 1, 3],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]
]
This should be a fairly basic operation, no? Which numpy or scipy method am I overlooking?
Approach #1
The scikit-image module offers a built-in for this, skimage.segmentation.flood_fill -
from skimage.segmentation import flood_fill
flood_fill(image, (x, y), newval)
Sample runs -
In [17]: a
Out[17]:
array([[0, 1, 1, 1, 1, 0],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]])
In [18]: flood_fill(a, (0, 0), 3)
Out[18]:
array([[3, 1, 1, 1, 1, 0],
[3, 3, 1, 2, 1, 1],
[3, 1, 1, 1, 1, 0]])
In [19]: flood_fill(a, (0, 1), 3)
Out[19]:
array([[0, 3, 3, 3, 3, 0],
[0, 0, 3, 2, 3, 3],
[0, 3, 3, 3, 3, 0]])
In [20]: flood_fill(a, (0, 5), 3)
Out[20]:
array([[0, 1, 1, 1, 1, 3],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]])
Approach #2
We can use skimage.measure.label with some array-masking -
from skimage.measure import label
def floodfill_by_xy(a, xy, newval):
    x, y = xy
    l = label(a == a[x, y])
    a[l == l[x, y]] = newval
    return a
To use the SciPy-based label function instead, scipy.ndimage.label, it would mostly be the same -
from scipy.ndimage import label
def floodfill_by_xy_scipy(a, xy, newval):
    x, y = xy
    l = label(a == a[x, y])[0]  # scipy's label returns (labeled_array, num_features)
    a[l == l[x, y]] = newval
    return a
Note: Both functions modify the input array in place.
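If neither library is available, or the neighbourhood rule needs to be explicit, a plain breadth-first search does the same job. This is a minimal sketch, not a replacement for the library routines: the function name flood_fill_bfs and the diagonal flag are hypothetical names introduced here, and unlike the functions above it returns a filled copy rather than editing in place.

```python
from collections import deque

import numpy as np


def flood_fill_bfs(a, seed, newval, diagonal=False):
    """Return a copy of 2D array `a` with the region containing `seed` set to `newval`."""
    out = np.array(a, copy=True)
    oldval = out[seed]
    if oldval == newval:
        return out  # nothing to do, and avoids an endless loop
    # 4-connected neighbours; optionally add the four diagonals (8-connected).
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    n_rows, n_cols = out.shape
    out[seed] = newval
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and out[nr, nc] == oldval:
                out[nr, nc] = newval  # mark on enqueue so each pixel is visited once
                queue.append((nr, nc))
    return out


a = np.array([
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 2, 1, 1],
    [0, 1, 1, 1, 1, 0],
])
print(flood_fill_bfs(a, (0, 0), 3))
```

The seed is given as (row, column), matching the examples in the question. A Python-level loop like this will be much slower than the compiled library versions on large images.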
Can someone help me out with skimage.measure.regionprops? The documentation was confusing to me in describing the list of properties that regionprops provides.
I would like to do the following:
Query a point (x,y) and return which labeled area the point belongs in.
Get an ndarray of all points within a labeled area.
Here is some code showing what I have so far:
import numpy as np
from skimage.measure import label
import matplotlib.pyplot as plt
arr = np.array([[1, 0, 1, 0, 0, 0, 1],
[1, 1, 1, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1]])
img = label(arr)
plt.imshow(img)
plt.show()
Examples of what I want to do are making a query to arr[8][6] and knowing which label it is a part of (green) and to know all of the points that belong to an arbitrary label (like green).
The numeric label of any pixel can be retrieved by indexing img:
In [67]: row, col = 8, 6
In [68]: index = img[row, col]
In [69]: print(f'The label of pixel [{row}, {col}] is {index}')
The label of pixel [8, 6] is 2
Then you could use NumPy's nonzero to get the coordinates of all the pixels with the same label:
In [70]: coords = np.nonzero(img == index)
In [71]: coords
Out[71]:
(array([0, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8], dtype=int32),
array([6, 6, 6, 5, 6, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6], dtype=int32))
In [72]: out = np.zeros(shape = arr.shape + (3,), dtype=np.uint8)
In [73]: out[coords] = [0, 255, 0] # green
In [74]: plt.imshow(out)
Out[74]: <matplotlib.image.AxesImage at 0x11a2ec10>
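Since the question asks about regionprops specifically, here is a minimal sketch of the same lookup using it. Each object returned by regionprops has a label attribute and a coords attribute holding an (N, 2) array of (row, col) pairs; the dictionary name regions is an assumption added here for convenience.

```python
import numpy as np
from skimage.measure import label, regionprops

arr = np.array([[1, 0, 1, 0, 0, 0, 1],
                [1, 1, 1, 0, 0, 0, 1],
                [0, 1, 1, 0, 0, 0, 1],
                [0, 1, 1, 0, 0, 1, 1],
                [0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1]])
img = label(arr)

# Which label does a given pixel belong to? Index the label image directly.
row, col = 8, 6
index = img[row, col]

# regionprops yields one object per non-zero label; key them by label value.
regions = {region.label: region for region in regionprops(img)}

# All pixel coordinates of that region, one (row, col) pair per line.
coords = regions[index].coords
print(index, coords.shape)
```

For a one-off query, indexing img and calling np.nonzero as above is simpler; regionprops pays off when other per-region measurements (area, bounding box, centroid) are needed as well.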
How can I get the sorted indices of a numpy array (distance), considering only certain indices selected by another numpy array (val)?
For example, consider the two numpy arrays val and distance below:
val = np.array([[10, 0, 0, 0, 0],
[0, 0, 10, 0, 10],
[0, 10, 10, 0, 0],
[0, 0, 0, 10, 0],
[0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
[3, 2, 1, 2, 3],
[2, 1, 0, 1, 2],
[3, 2, 1, 2, 3],
[4, 3, 2, 3, 4]])
The distances where val == 10 are 4, 1, 3, 1, 0, 2. I would like to sort these to get 0, 1, 1, 2, 3, 4 and return the corresponding indices into the distance array.
Returning something like:
(array([2, 1, 2, 3, 1, 0], dtype=int64), array([2, 2, 1, 3, 4, 0], dtype=int64))
or:
(array([2, 2, 1, 3, 1, 0], dtype=int64), array([2, 1, 2, 3, 4, 0], dtype=int64))
Since the second and third elements both have distance 1, I guess their indices are interchangeable.
I tried combinations of np.where, np.argsort, np.argpartition and np.unravel_index, but can't seem to get it working.
Here's one way with masking -
In [20]: mask = val==10
In [21]: np.argwhere(mask)[distance[mask].argsort()]
Out[21]:
array([[2, 2],
[1, 2],
[2, 1],
[3, 3],
[1, 4],
[0, 0]])
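To get the result as a tuple of row and column arrays, the form shown in the question, the same idea can be written with np.nonzero. This is a sketch under one added assumption: kind="stable" is passed to argsort so that equal distances keep their row-major order, which makes the output deterministic.

```python
import numpy as np

val = np.array([[10, 0, 0, 0, 0],
                [0, 0, 10, 0, 10],
                [0, 10, 10, 0, 0],
                [0, 0, 0, 10, 0],
                [0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
                     [3, 2, 1, 2, 3],
                     [2, 1, 0, 1, 2],
                     [3, 2, 1, 2, 3],
                     [4, 3, 2, 3, 4]])

# Row and column indices of the selected cells, in row-major order.
rows, cols = np.nonzero(val == 10)

# Sort those cells by their distance; stable sort keeps ties in original order.
order = np.argsort(distance[rows, cols], kind="stable")
result = (rows[order], cols[order])

print(result)
# rows: [2 1 2 3 1 0], cols: [2 2 1 3 4 0]
```

The tuple can be used directly as an index, so distance[result] gives the sorted distances.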
I have a 2D boolean numpy array that represents an image, on which I call skimage.measure.label to label each segmented region. This gives me a 2D array of ints in the range [0, 500], where each value is the region label for that pixel. I would now like to remove the smallest regions: if my input array has shape (n, n), all labeled regions of fewer than m pixels should be subsumed into the larger surrounding regions. For example, if n=10 and m=5, my input could be,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 7, 8, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 6, 6, 4, 2, 2, 2, 3, 3, 3
4, 6, 6, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
and the output is then,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1 # 7 and 8 are replaced by 0
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 3, 3, 3 # 6 is gone, but 3 remains
4, 4, 4, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
I've looked into skimage morphology operations, including binary closing, but none seem to work well for my use case. Any suggestions?
You can do this by performing a binary dilation on the boolean region corresponding to each label. By doing this you will find the number of neighbours for each region. Using this you can then replace values as needed.
Example code:
import numpy as np
import scipy.ndimage
m = 5
arr = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 7, 8, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
[4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
[4, 6, 6, 4, 2, 2, 2, 3, 3, 3],
[4, 6, 6, 4, 5, 5, 5, 3, 3, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
arr = np.array(arr)
nval = np.max(arr) + 1
# Compute the number of occurrences of each label
counts, _ = np.histogram(arr, bins=range(nval + 1))
# Compute the set of neighbours for each label via binary dilation
c = np.array([scipy.ndimage.binary_dilation(arr == i)
              for i in range(nval)])
# Loop over the labels with too few pixels and replace each with its most common
# neighbour. Note that c[i] also covers region i itself, and np.argmax returns
# the first maximum, so ties go to the lowest label.
for i in filter(lambda i: counts[i] < m, range(nval)):
    arr[arr == i] = np.argmax(np.sum(c[:, arr == i], axis=1))
Which gives the expected result:
>>> arr.tolist()
[[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
[4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
[4, 4, 4, 4, 2, 2, 2, 3, 3, 3],
[4, 4, 4, 4, 5, 5, 5, 3, 3, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
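Because the dilation of a region also covers the region itself, the vote above can include the small region's own label. A variation that avoids this is to vote only among the pixels just outside each small region. The following is a sketch under that assumption rather than the method from the answer; the function name merge_small_regions is hypothetical, and it uses scipy.ndimage.binary_dilation with its default 4-connected structuring element.

```python
import numpy as np
from scipy import ndimage


def merge_small_regions(labels, m):
    """Return a copy of `labels` where regions smaller than m pixels take the
    most common label found on their immediate (4-connected) border."""
    out = labels.copy()
    ids, counts = np.unique(out, return_counts=True)
    for i in ids[counts < m]:
        region = out == i
        # Pixels adjacent to the region but not part of it.
        border = ndimage.binary_dilation(region) & ~region
        neighbours = out[border]
        if neighbours.size:  # skip a region that fills the whole array
            vals, freq = np.unique(neighbours, return_counts=True)
            out[region] = vals[np.argmax(freq)]
    return out


arr = np.array([[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                [0, 0, 7, 8, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
                [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
                [4, 6, 6, 4, 2, 2, 2, 3, 3, 3],
                [4, 6, 6, 4, 5, 5, 5, 3, 3, 5],
                [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
                [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]])
result = merge_small_regions(arr, m=5)
print(result)
```

Each small region is processed against the partially updated array, so a small region next to another small region sees the neighbour's new label if that neighbour was handled first. Note that every pixel sharing a label is treated as one region here, even if the pixels are not connected.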