How to replace the list in numpy list? - python

I'm currently working on one project where I need to quantize the image. First, I'm reading the image using skimage, and the shape of it is (825, 1100, 3). Image array looks like this:
[[[ 43 78 48]
[ 43 78 48]
[ 43 78 48]
...
[ 5 24 18]
[ 5 24 18]
[ 4 23 17]]
[[ 43 78 48]
[ 43 78 48]
[ 43 78 48]
...
[ 5 24 18]
[ 5 24 18]
[ 4 23 17]]
[[ 43 78 48]
[ 43 78 48]
[ 43 78 48]
...
[ 5 24 18]
[ 4 23 17]
[ 4 23 17]]
...
[[ 99 143 45]
[ 99 143 45]
[ 98 142 44]
...
[102 145 38]
[100 146 38]
[100 146 38]]
[[ 99 143 45]
[ 99 143 45]
[ 99 143 45]
...
[103 146 39]
[100 146 38]
[ 99 145 37]]
[[ 97 142 41]
[ 98 143 42]
[ 99 144 43]
...
[100 146 38]
[ 99 145 37]
[ 99 145 37]]]
Then I apply K-means to quantize the image and reduce the number of colors in it. I call the resulting array less_colors; it has the same shape, (825, 1100, 3). The output is:
[[[ 29 48 30]
[ 29 48 30]
[ 29 48 30]
...
[ 29 48 30]
[ 29 48 30]
[ 29 48 30]]
[[ 29 48 30]
[ 29 48 30]
[ 29 48 30]
...
[ 29 48 30]
[ 29 48 30]
[ 29 48 30]]
[[ 29 48 30]
[ 29 48 30]
[ 29 48 30]
...
[ 29 48 30]
[ 29 48 30]
[ 29 48 30]]
...
[[111 137 58]
[111 137 58]
[111 137 58]
...
[111 137 58]
[111 137 58]
[111 137 58]]
[[111 137 58]
[111 137 58]
[111 137 58]
...
[111 137 58]
[111 137 58]
[111 137 58]]
[[111 137 58]
[111 137 58]
[111 137 58]
...
[111 137 58]
[111 137 58]
[111 137 58]]]
I have another variable called first, which is the list [30, 48, 29].
I would like to replace every pixel (innermost row) of the less_colors array that equals first with a different value (let's say [0, 0, 0]).
I tried the following with NumPy, but it does not work:
less_colors[np.where((less_colors == first).all(axis=2))] = [0,0,0]
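The complete code:
import cv2
import matplotlib.pyplot as plt  # assumed import, needed for plt below
from skimage import io  # assumed import, needed for io below
img = io.imread('dog.jpg')
# (the K-means step that produces less_colors from img is not shown)
first = [30, 48, 29]
less_colors[(less_colors[:, :] == first).all(axis=2)] = [0, 0, 0]
io.imshow(less_colors)
plt.show()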

Short answer:
This was already answered in the comments, but here is the complete answer:
less_colors[(less_colors == first).all(axis=2)] = 0
What's going on?
less_colors == first returns a boolean mask that is True only at the positions where the condition is met. This mask has the same shape as the image.
Next, .all(axis=2) makes sure that the condition is met for all channels (axis 2, the last axis): you want to overwrite a pixel if and only if all three channels match. This also returns a boolean mask, but now with only two dimensions, telling whether each coordinate [i, j] meets the criterion in all three channels.
Then we use this mask to select only those pixels in the less_colors array: less_colors[(less_colors == first).all(axis=2)]
Finally, we assign the desired value to those pixels, overwriting them with 0; note that this is equivalent to [0, 0, 0] thanks to NumPy's broadcasting mechanism.
Small working example
import numpy as np
# create a small image with ones
less_color = np.ones((5,5,3))
# change one pixel with a different value
less_color[1,1] = 30, 40, 29
# This other pixel should be kept as is, since only two of its three values match
less_color[2,2] = 30, 40, 290
print(less_color)
print('='*10)
# the following line actually solves the question
less_color[(less_color==[30, 40, 29]).all(axis=2)] = 0
# check it out:
print(less_color)
Common error:
less_colors[less_colors == first] = 0 is not enough, since it also replaces values in partially matching pixels. For instance, a pixel like [10, 10, 29] ends up as [10, 10, 0], although it should not be changed at all.
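To make the difference concrete, here is a minimal sketch on a made-up 1x3 image (the pixel values are invented for illustration and are not from the question's data):

```python
import numpy as np

first = [30, 48, 29]
# Hypothetical 1x3 image: a full match, a partial match, and no match
img = np.array([[[30, 48, 29], [10, 10, 29], [1, 2, 3]]])

# Element-wise mask: zeroes every single channel value that matches on its own
partial = img.copy()
partial[partial == first] = 0

# Per-pixel mask: zeroes a pixel only when all three channels match
full = img.copy()
full[(full == first).all(axis=2)] = 0

print(partial[0].tolist())  # [[0, 0, 0], [10, 10, 0], [1, 2, 3]]
print(full[0].tolist())     # [[0, 0, 0], [10, 10, 29], [1, 2, 3]]
```

Only the second variant leaves the partially matching pixel untouched.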
Thanks @Aaron for your original and quick answer.

So you want to map an old value to a new value. For your specific case, that is:
arr[np.all(arr == old_value, axis=-1)] = new_value
But you can create a general function to apply any mapping to any ndarray as follows:
def ndarray_apply_mapping(
    arr, mapping, mask_function=lambda arr, target: arr == target
):
    res = arr.copy()
    for old_value, new_value in mapping.items():
        mask = mask_function(arr, old_value)
        res[mask] = new_value
    return res
It will work on simpler cases:
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5])
mapping = {1: 10, 3: 30, 5: 50}
res = ndarray_apply_mapping(arr, mapping)
assert np.all(res == [0, 10, 2, 30, 4, 50])
It also works on more complicated cases such as yours.
Let's say you have an array with a limited set of RGB values (or cluster labels resulting from k-means, or whatever):
import numpy as np
H, W, C = 8, 16, 3
vmin, vmax = 0, 255
num_values = 10
values = np.random.randint(vmin, vmax, size=(num_values, C))
values_rnd_idxs = np.random.randint(0, num_values, size=(H, W))
arr = values[values_rnd_idxs]
assert arr.shape == (H, W, C)
And you have a mapping from some of those values to new values:
new_values = np.random.randint(vmin, vmax, size=(num_values // 3, C))
mapping = {tuple(old): tuple(new) for old, new in zip(values, new_values)}
You can use this mapping as follows:
res = ndarray_apply_mapping(
    arr,
    mapping,
    mask_function=lambda arr, target: np.all(arr == target, axis=-1),
)
Plotting to see the result:
import matplotlib.pyplot as plt
fig, (ax_old, ax_new, ax_same) = plt.subplots(ncols=3)
ax_old.imshow(arr)
ax_new.imshow(res)
ax_same.imshow((res == arr).all(axis=-1), vmin=0, vmax=1, cmap="gray")
ax_old.set_title("Old")
ax_new.set_title("New")
ax_same.set_title("Matches")
plt.show()

I should have caught it earlier just from your example data, but [30, 48, 29] does not exist in your example data:
[[ 29 48 30]
[ 29 48 30]
[ 29 48 30]
...
[ 29 48 30]
[ 29 48 30]
[ 29 48 30]]
...
[[111 137 58]
[111 137 58]
[111 137 58]
Somewhere along the line you inverted the color channels (RGB to BGR), and tried to compare a BGR color against RGB data. The match and replace line I suggested in the comments only needs a small modification if you want to keep the first variable in reverse order:
less_colors[(less_colors[:,:] == first[::-1]).all(axis=2)] = [0,0,0]
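A small sketch of the effect, assuming a made-up 2x2 image built from the two colors visible in the question's output (the layout is invented for illustration):

```python
import numpy as np

first = [30, 48, 29]  # color stored in the opposite channel order
# Hypothetical 2x2 image using the two colors shown in the question
less_colors = np.array([[[29, 48, 30], [111, 137, 58]],
                        [[29, 48, 30], [29, 48, 30]]], dtype=np.uint8)

# No pixel matches `first` as written ...
print((less_colors == first).all(axis=2).sum())  # 0

# ... but three pixels match once the channel order is reversed
mask = (less_colors == first[::-1]).all(axis=2)
print(mask.sum())  # 3

less_colors[mask] = [0, 0, 0]
```

If the mask comes back all False on real data, comparing against the reversed color is a quick way to check for a channel-order mix-up.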

Related

Getting each column in a 3d numpy array

I converted an image from RGB to CIELAB, and now I need to use the CIELAB values in some equations.
I have been trying to get the value of each column in the array. For example if I have:
List =
[[[ 65 234 169]
[203 191 245]
[ 36 58 196]
[207 208 143]
[251 208 187]]
[[ 79 69 237]
[ 13 124 42]
[104 165 82]
[170 178 178]
[ 66 42 210]]
[[ 40 163 219]
[142 37 140]
[ 75 205 143]
[246 30 221]
[ 16 98 102]]]
How can I get the values of each column, like this:
1st_column =
65
203
36
207
251
79
13
104
170
66
40
142
75
246
16
Thank you.
Try:
>>> m[:, :, 0]
array([[ 65, 203, 36, 207, 251],
[ 79, 13, 104, 170, 66],
[ 40, 142, 75, 246, 16]])
As suggested by @mozway, you can use the ellipsis syntax: m[..., 0].
To learn more, read "How do you use the ellipsis slicing syntax in Python?"
You can also flatten your array:
>>> m[:, :, 0].flatten()
array([ 65, 203, 36, 207, 251, 79, 13, 104, 170, 66, 40, 142, 75, 246, 16])
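If all three columns are needed at once, one option (a sketch, not the only way) is to collapse the two spatial axes with reshape and unpack the transposed result. The names pixels, first, second and third are made up for this example:

```python
import numpy as np

m = np.array([[[ 65, 234, 169], [203, 191, 245], [ 36,  58, 196], [207, 208, 143], [251, 208, 187]],
              [[ 79,  69, 237], [ 13, 124,  42], [104, 165,  82], [170, 178, 178], [ 66,  42, 210]],
              [[ 40, 163, 219], [142,  37, 140], [ 75, 205, 143], [246,  30, 221], [ 16,  98, 102]]])

# Collapse the two spatial axes: each row is one pixel, each column one channel
pixels = m.reshape(-1, 3)

# Transposing gives one 1-D array per channel
first, second, third = pixels.T

print(first)
```

Here first holds the same values as m[:, :, 0].flatten().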

In python with numpy, how can I update array from another array depend on column that exists in both?

So I have a source array like this:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 100]
[ 0 100 33 100]
[ 3 110 22 100]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 100]]
and I want to update it with the following array, matching rows on the first column:
[[ 3 110 22 105]
[ 5 105 17 110]
[ 1 95 28 115]]
to be like this
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
but I can't find a NumPy function that does this directly, so for now I have nothing better than this function I wrote:
def update_ary_with_ary(source, updates):
    for x in updates:
        index_of_col = np.argwhere(source[:,0] == x[0])
        source[index_of_col] = x
This function uses a Python loop, so it is neither elegant nor fast. I will use it until someone shows me a better way with NumPy. I don't want a solution based on another library, just NumPy.
Assuming your source array is s and update array is u, and assuming that s and u are not huge, you can do:
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
Testing:
import numpy as np
s = np.array(
[[ 9, 85, 32, 100],
[ 7, 80, 30, 100],
[ 2, 90, 16, 100],
[ 6, 120, 22, 100],
[ 5, 105, 17, 100],
[ 0, 100, 33, 100],
[ 3, 110, 22, 100],
[ 4, 80, 22, 100],
[ 8, 115, 19, 100],
[ 1, 95, 28, 100]])
u = np.array(
[[ 3, 110, 22, 105],
[ 5, 105, 17, 110],
[ 1, 95, 28, 115]])
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
print(s)
This prints:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
Edit:
OP has provided the following additional details:
The "source array" is "huge".
Each row in the "update array" matches exactly one row in the "source array".
Based on this additional detail, the following alternative solution might provide a better performance, especially if the source array does not have its rows sorted on the first column:
sorted_idx = np.argsort(s[:,0])
pos = np.searchsorted(s[:,0],u[:,0],sorter=sorted_idx)
update_row_ids = sorted_idx[pos]
s[update_row_ids] = u
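As a sanity check, here is the same lookup run on the sample data from the question. The comments and the guard assertion are additions for illustration; the guard is not part of the answer above:

```python
import numpy as np

# Sample data from the question: column 0 is a unique key
s = np.array([[9,  85, 32, 100],
              [7,  80, 30, 100],
              [2,  90, 16, 100],
              [6, 120, 22, 100],
              [5, 105, 17, 100],
              [0, 100, 33, 100],
              [3, 110, 22, 100],
              [4,  80, 22, 100],
              [8, 115, 19, 100],
              [1,  95, 28, 100]])
u = np.array([[3, 110, 22, 105],
              [5, 105, 17, 110],
              [1,  95, 28, 115]])

sorted_idx = np.argsort(s[:, 0])  # row order that sorts the keys
# Position of each update key within the sorted keys
pos = np.searchsorted(s[:, 0], u[:, 0], sorter=sorted_idx)
update_row_ids = sorted_idx[pos]  # map back to row numbers in s

# Guard (an addition): the rows found must carry exactly the update keys
assert np.array_equal(s[update_row_ids, 0], u[:, 0])

s[update_row_ids] = u
print(update_row_ids)  # [6 4 9]
```

The sort costs O(n log n) once, after which each lookup is a binary search, so this avoids building the full comparison matrix that the first approach creates.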
fountainhead, your answer works correctly and it uses only NumPy, but in my performance test it doubled the processing time for 50K rows in my simulation program, from 22 seconds to 44 seconds. I don't know why. Still, your answer helped me arrive at the right one, which is just this line:
source[updates[:,0]] = updates
# or
s[u[:,0]] = u
With this line, processing 100K rows takes only 0.5 seconds, and I can process 1M rows in only 5 seconds. I am still learning Python and data mining, and these numbers shocked me. This never happened to me in other languages: here I can work with huge arrays as if they were regular variables. You can see the code on my GitHub:
https://github.com/qahmad81/war_simulation
fountainhead, you deserve the accepted answer, but visitors should know which answer is best to use.
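One caveat seems worth spelling out: s[u[:,0]] = u treats the first-column values as row positions, so it only gives the intended result when row i of the source holds id i. That is not the case for the sample array at the top of the question, where row 0 holds id 9. A small sketch with made-up two-column data in which the assumption does hold:

```python
import numpy as np

# Hypothetical source whose first column equals the row index (0, 1, 2, ...)
s = np.column_stack([np.arange(6), np.full(6, 100)])
u = np.array([[3, 105], [5, 110], [1, 115]])

# Precondition for direct indexing: id i must live in row i
assert np.array_equal(s[:, 0], np.arange(len(s)))

s[u[:, 0]] = u  # ids double as row numbers, so no search is needed
print(s[:, 1])  # [100 115 100 105 100 110]
```

If the source is not ordered by id, sorting it once by the first column (or using one of the lookup-based answers) would be needed before this shortcut applies.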

get coordinate of each pixel in an image

I have an image read as a NumPy array A of shape (n, m, 3):
A =
array([[[ 21, 38, 32],
[ 29, 46, 38],
[ 35, 52, 42],
...,
and I would like to transform it so that the index/coordinate of each pixel is appended along the last axis:
B =
array([[[ 21, 38, 32, 0, 0],
[ 29, 46, 38, 0, 1],
[ 35, 52, 42, 0, 2],
...,
# in the form
B =
array([[[ R, G, B, px, py],
where
px= row index of the pixel
py= column index of the pixel
I coded this:
B=np.zeros((n,m,5))
for x in range(n):
    for y in range(m):
        row=list(A[x,y,:])+[x,y]
        B[x,y]=row
but it takes too much time to iterate.
Do you have a better way?
Best regards
If you want an answer without extra imports (NumPy only):
array = np.array(img)
print(array.shape)
# (1080, 1920, 3)
zeros = np.zeros(array.shape[:2])
x_and_y = (np.dstack([(zeros.T + np.arange(0, array.shape[0])).T,
                      zeros + np.arange(0, array.shape[1])])
           .astype('uint32'))
print(np.dstack([array, x_and_y]))
Outputting:
[[[39 86 101 0 0]
[39 86 101 0 1]
[39 86 101 0 2]
...
[11 114 123 0 1917]
[13 121 128 0 1918]
[13 121 128 0 1919]]
[[39 86 101 1 0]
[39 86 101 1 1]
[39 86 101 1 2]
...
[7 110 119 1 1917]
[19 127 134 1 1918]
[17 125 132 1 1919]]
...
[[46 136 154 1078 0]
[49 139 157 1078 1]
[46 143 159 1078 2]
...
[30 105 119 1078 1917]
[30 105 119 1078 1918]
[30 105 119 1078 1919]]
[[46 136 154 1079 0]
[49 139 157 1079 1]
[46 143 159 1079 2]
...
[30 105 119 1079 1917]
[30 105 119 1079 1918]
[30 105 119 1079 1919]]]
What I would do is to create the coordinate arrays and concatenate:
import numpy as np
from itertools import product

# random A
np.random.seed(1)
A = np.random.randint(0,256, (3,2,3))
# (row, column) pair for every pixel, arranged in the image's layout
coords = np.array(list(product(np.arange(A.shape[0]),
                               np.arange(A.shape[1])))
                  ).reshape(A.shape[:2]+(-1,))
B = np.concatenate((A,coords), axis=-1)
Output:
array([[[ 37, 235, 140, 0, 0]],
[[ 72, 255, 137, 1, 0]],
[[203, 133, 79, 2, 0]]])
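Another option, sketched here as an alternative rather than as what either answer above does, is np.indices, which builds the row and column index grids directly. The stand-in array A and the name px_py are made up for this example:

```python
import numpy as np

# Stand-in for the image; any (n, m, 3) array works
rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(3, 2, 3))

# np.indices returns shape (2, n, m): row indices first, then column indices
px_py = np.moveaxis(np.indices(A.shape[:2]), 0, -1)  # -> (n, m, 2)

# Append (px, py) after the three color channels
B = np.concatenate((A, px_py), axis=-1)  # -> (n, m, 5)

print(B.shape)      # (3, 2, 5)
print(B[2, 1, 3:])  # [2 1]
```

Everything here is vectorized, so there is no Python-level loop over pixels.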

Numpy slicing a fixed length on two axis based on different starting index given by two arrays

For example, I have this NumPy array:
a = np.arange(48).reshape((3,4,4))
'''
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[16 17 18 19]
[20 21 22 23]
[24 25 26 27]
[28 29 30 31]]
[[32 33 34 35]
[36 37 38 39]
[40 41 42 43]
[44 45 46 47]]]
'''
I have two arrays that give the starting points of the slices along axis=1 and axis=2, respectively:
b1 = [0,1,2]
b2 = [1,0,0]
I want to slice like this:
a[:,b1:b1+2, b2:b2+2] # but this syntax is wrong
To get
[
[
[1,2]
[5,6]
]
[
[20 21]
[24 25]
]
[
[40 41]
[44 45]
]
]
Could you let me know the proper syntax for doing this?
You can use the built-in functions enumerate and zip:
list(a[i][f:f+2, s:s+2].tolist() for i, (f, s) in enumerate(zip(b1, b2)))
output:
[[[1, 2], [5, 6]], [[20, 21], [24, 25]], [[40, 41], [44, 45]]]
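If a loop-free version is preferred, the same windows can be gathered with broadcast integer indexing. This is a sketch of an alternative technique, not the approach of the answer above; the names size, offsets, rows, cols and blocks are made up for the example:

```python
import numpy as np

a = np.arange(48).reshape((3, 4, 4))
b1 = np.array([0, 1, 2])  # start row of each window
b2 = np.array([1, 0, 0])  # start column of each window
size = 2                  # window length along both axes

offsets = np.arange(size)
rows = b1[:, None] + offsets  # (3, 2): row indices per window
cols = b2[:, None] + offsets  # (3, 2): column indices per window

# The three index arrays broadcast to (3, 2, 2): block, row, column
blocks = np.arange(len(a))[:, None, None]
out = a[blocks, rows[:, :, None], cols[:, None, :]]

print(out.tolist())
# [[[1, 2], [5, 6]], [[20, 21], [24, 25]], [[40, 41], [44, 45]]]
```

The result is a regular (3, 2, 2) array rather than a nested list, which only works because every window has the same size.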

How to get the N maximum values per row in a numpy ndarray?

We know how to do it when N = 1
import numpy as np
m = np.arange(15).reshape(3, 5)
m[range(len(m)), m.argmax(axis=1)] # array([ 4, 9, 14])
What is the best way to get the top N, when N > 1? (say, 5)
Doing a partial sort using np.partition can be much cheaper than a full sort:
gen = np.random.RandomState(0)
x = gen.permutation(100)
# full sort
print(np.sort(x)[-10:])
# [90 91 92 93 94 95 96 97 98 99]
# partial sort such that the largest 10 items are in the last 10 indices
print(np.partition(x, -10)[-10:])
# [90 91 93 92 94 96 98 95 97 99]
If you need the largest N items to be sorted, you can call np.sort on the last N elements in your partially sorted array:
print(np.sort(np.partition(x, -10)[-10:]))
# [90 91 92 93 94 95 96 97 98 99]
This can still be much faster than a full sort on the whole array, provided your array is sufficiently large.
To sort across each row of a two-dimensional array you can use the axis= arguments to np.partition and/or np.sort:
y = np.repeat(np.arange(100)[None, :], 5, 0)
gen.shuffle(y.T)
# partial sort, followed by a full sort of the last 10 elements in each row
print(np.sort(np.partition(y, -10, axis=1)[:, -10:], axis=1))
# [[90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]]
Benchmarks:
In [1]: %%timeit x = np.random.permutation(10000000)
...: np.sort(x)[-10:]
...:
1 loop, best of 3: 958 ms per loop
In [2]: %%timeit x = np.random.permutation(10000000)
np.partition(x, -10)[-10:]
....:
10 loops, best of 3: 41.3 ms per loop
In [3]: %%timeit x = np.random.permutation(10000000)
np.sort(np.partition(x, -10)[-10:])
....:
10 loops, best of 3: 78.8 ms per loop
Why not do something like:
np.sort(m)[:,-N:]
partition, sort, argsort etc take an axis parameter
Let's shuffle some values
In [161]: A=np.arange(24)
In [162]: np.random.shuffle(A)
In [163]: A=A.reshape(4,6)
In [164]: A
Out[164]:
array([[ 1, 2, 4, 19, 12, 11],
[20, 5, 13, 21, 22, 3],
[10, 6, 16, 18, 17, 8],
[23, 9, 7, 0, 14, 15]])
Partition:
In [165]: A.partition(4,axis=1)
In [166]: A
Out[166]:
array([[ 2, 1, 4, 11, 12, 19],
[ 5, 3, 13, 20, 21, 22],
[ 6, 8, 10, 16, 17, 18],
[14, 7, 9, 0, 15, 23]])
The 4 smallest values of each row now come first and the 2 largest last; slice to get an array of the 2 largest:
In [167]: A[:,-2:]
Out[167]:
array([[12, 19],
[21, 22],
[17, 18],
[15, 23]])
Sort is probably slower, but on a small array like this it probably doesn't matter much. Plus it lets you pick any N.
In [169]: A.sort(axis=1)
In [170]: A
Out[170]:
array([[ 1, 2, 4, 11, 12, 19],
[ 3, 5, 13, 20, 21, 22],
[ 6, 8, 10, 16, 17, 18],
[ 0, 7, 9, 14, 15, 23]])
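When the positions of the top N values are needed as well (as in the argmax example from the question), one possible sketch combines np.argpartition with np.take_along_axis. Note that take_along_axis needs NumPy 1.15 or newer; the variable names are made up for the example:

```python
import numpy as np

m = np.arange(15).reshape(3, 5)
N = 2

# Column indices of the N largest entries per row, in no particular order
idx = np.argpartition(m, -N, axis=1)[:, -N:]
top = np.take_along_axis(m, idx, axis=1)

# Order each row's top N from largest to smallest
order = np.argsort(-top, axis=1)
top_sorted = np.take_along_axis(top, order, axis=1)
idx_sorted = np.take_along_axis(idx, order, axis=1)

print(top_sorted.tolist())  # [[4, 3], [9, 8], [14, 13]]
print(idx_sorted.tolist())  # [[4, 3], [4, 3], [4, 3]]
```

Only the N selected entries per row are fully sorted, so the cost of the final argsort stays small when N is much smaller than the row length.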
