I wish to subtract 2 gray human faces from each other to see the difference, but I encounter a problem that subtracting e.g. [4] - [6] gives [254] instead of [-2] (or difference: [2]).
print(type(face)) #<type 'numpy.ndarray'>
print(face.shape) #(270, 270)
print(type(nface)) #<type 'numpy.ndarray'>
print(nface.shape) #(270, 270)
#This is what I want to do:
sface = face - self.nface #or
sface = np.subtract(face, self.nface)
Both don't give negative numbers but instead subtract the rest after 0 from 255.
Output example of sface:
[[ 8 255 8 ..., 0 252 3]
[ 24 18 14 ..., 255 254 254]
[ 12 12 12 ..., 0 2 254]
...,
[245 245 251 ..., 160 163 176]
[249 249 252 ..., 157 163 172]
[253 251 247 ..., 155 159 173]]
My question:
How do I get sface to be an numpy.ndarray (270,270) with either negative values after subtracting or the difference between each point in face and nface? (So not numpy.setdiff1d, because this returns only 1 dimension instead of 270x270)
Working
From the answer of #ajcr I did the following (abs() for showing subtracted face):
face_16 = face.astype(np.int16)
nface_16 = nface.astype(np.int16)
sface_16 = np.subtract(face_16, nface_16)
sface_16 = abs(sface_16)
sface = sface_16.astype(np.int8)
It sounds like the dtype of the array is uint8. All the numbers will be interpreted as integers in the range 0-255. Here, -2 is equal to 256 - 2, hence the subtraction results in 254.
You need to recast the arrays to a dtype which supports negative integers, e.g. int16 like this ...
face = face.astype(np.int16)
...and then subtract.
This is a problem with your datatype in the numpy array. You have a uint8 inside it, which seems to wrap around
Have a look at nfac.dtype whichwill show it to you. You have to convert it prior to your calculation operation. Use numpy.ndarray.astype to convert it or have a look at In-place type conversion of a NumPy array
Related
I have a python array that i got using
array = np.arange(2,201,2).reshape(25,4)
which gave me this:
[[ 2 4 6 8]
[ 18 20 22 24]
[ 34 36 38 40]
[ 50 52 54 56]
[ 66 68 70 72]
[ 82 84 86 88]
[ 98 100 102 104]
[114 116 118 120]
[130 132 134 136]
[146 148 150 152]
[162 164 166 168]
[178 180 182 184]
[194 196 198 200]]
but now i'm instructed to select only the values below 50 from "array", add 5 to these values, and then multiply by 2. The other values should remain unchanged and everything should be saved as "array". This is a school assignment so I don't have the output but basically the output should be the array in the same 25x4 shape and the first ~3 rows will be changed (since those are the ones under 50) and the other rows/values will be the same (since they're over 50). I've tried the following code:
for i in array:
if array < 50:
print((i+5)*2)
else:
print(i)
and I'm getting an error that says -
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
any help would be greatly appreciated since I can't find any other articles with similar questions
There are 2 ways to address this question. A Python one and a numpy one (numpy is not Python...).
Python way:
You have a sequence of sequence containers. You can use a double iteration to test the values one at a time and replace the ones that have to be:
for row in array: # iterate over the rows
for i, val in enumerate(row): # then the values in the row
if val <=50: # test them
row[i] = (val + 5) * 2 # and replace
This works as soon as the outer iteration gives you a direct access to the row container. This is true for both Python containers (lists) and numpy arrays but may not be guaranteed for any type of containers. The super safe way would be to keep the indexes and directly modify array:
for i in range(len(array)):
for j in range(len(array[i])):
if array[i][j]< 50:
array[i][j] = (array[i][j] + 2) * 5
Numpy way:
The power of numpy is to provide high speed iterations on its arrays. In numpy wordings it is called vectorization. You should first extract the relevant indexes and then change the values in one single vectorized operation:
ix = np.where(array < 50)
array[ix] = (array[ix] + 5) * 2
For large arrays, this second way should be at least one magnitude order faster than the first one.
For your question, the correct way is the one that matches your current lesson, either Python or numpy...
import numpy as np
array = np.arange(2,201,2).reshape(25,4)
values = [ (element+5)*2 if element < 50 else element for innerList in array for element in innerList ]
print(values)
import numpy as np
a = np.uint8([255])
b = np.uint8([255])
print(a+b)
Result: array([254], dtype=uint8)
uint8 can save the range 0..255 .
So example you write np.uint8([256]) -> array([0], dtype=uint8)
In your case a+b=np.uint8([510])=np.uint8([510-256])=np.uint8([254])
uint8 is an unsigned integer represented using 8 bits. The range of numbers you can represent using 8 bits is [0, 255].
255 in 8-bit binary is 1111 1111.
When you add 1111 1111 to 1111 1111, you get 1 1111 1110 (= 510). But since you have only 8 bits to represent (since you're using uint8), the leftmost 1 in the result cannot be stored, rendering the result as 1111 1110, which is 254.
I am currently working with a numpy array containing pixel color information of shape (width, height, 3).
I would like to be able to replace colors in the original array, with colors described in a new array and to do this WITHOUT loops if possible.
I have tried indexing it with fancy index's but was not quite sure how to index properly.
Optimally the function would like something like this:
>>>import numpy as np
>>>imgdat = np.random.randint(0, 255, size=(2, 2, 3))
>>>imgdat
[[[138 149 41]
[100 186 136]]
[[181 202 169]
[205 247 195]]]
>>>pixels = np.random.randint(0, imgdat.shape[0], size=(2, 2))
>>>pixels
[[1,0]
[0,1]]
>>>colors = np.random.randint(0, 255, size=(2, 3))
>>>colors
[[ 16 229 138]
[ 86 76 209]]
######apply the function#######
>>>filledimgdat = fillPixels(imgdat, pixels, colors)
>>>filledimgdat
[[[ 138 149 41]
[ 86 76 209]]
[[ 16 229 138]
[205 247 195]]]
EDIT:
Because my originally description was a little unclear, what I am trying to do is replace specific colors in imgdat at specific index's. If anybody can think of a better way to format the datatypes or handle the info to simplify an operation like this, that would also be welcome.
The situation is as follows:
I have a 2D numpy array. Its shape is (1002, 1004). Each element contains a value between 0 and Inf. What I now want to do is determine the first 1000 maximum values and store the corresponding indices in to a list named x and a list named y. This is because I want to plot the maximum values and the indices actually correspond to real time x and y position of the value.
What I have so far is:
x = numpy.zeros(500)
y = numpy.zeros(500)
for idx in range(500):
x[idx] = numpy.unravel_index(full.argmax(), full.shape)[0]
y[idx] = numpy.unravel_index(full.argmax(), full.shape)[1]
full[full == full.max()] = 0.
print os.times()
Here full is my 2D numpy array. As can be seen from the for loop, I only determine the first 500 maximum values at the moment. This however already takes about 5 s. For the first 1000 maximum values, the user time should actually be around 0.5 s. I've noticed that a very time consuming part is setting the previous maximum value to 0 each time. How can I speed things up?
Thank you so much!
If you have numpy 1.8, you can use the argpartition function or method.
Here's a script that calculates x and y:
import numpy as np
# Create an array to work with.
np.random.seed(123)
full = np.random.randint(1, 99, size=(8, 8))
# Get the indices for the largest `num_largest` values.
num_largest = 8
indices = (-full).argpartition(num_largest, axis=None)[:num_largest]
# OR, if you want to avoid the temporary array created by `-full`:
# indices = full.argpartition(full.size - num_largest, axis=None)[-num_largest:]
x, y = np.unravel_index(indices, full.shape)
print("full:")
print(full)
print("x =", x)
print("y =", y)
print("Largest values:", full[x, y])
print("Compare to: ", np.sort(full, axis=None)[-num_largest:])
Output:
full:
[[67 93 18 84 58 87 98 97]
[48 74 33 47 97 26 84 79]
[37 97 81 69 50 56 68 3]
[85 40 67 85 48 62 49 8]
[93 53 98 86 95 28 35 98]
[77 41 4 70 65 76 35 59]
[11 23 78 19 16 28 31 53]
[71 27 81 7 15 76 55 72]]
x = [0 2 4 4 0 1 4 0]
y = [6 1 7 2 7 4 4 1]
Largest values: [98 97 98 98 97 97 95 93]
Compare to: [93 95 97 97 97 98 98 98]
You could loop through the array as #Inspired suggests, but looping through NumPy arrays item-by-item tends to lead to slower-performing code than code which uses NumPy functions since the NumPy functions are written in C/Fortran, while the item-by-item loop tends to use Python functions.
So, although sorting is O(n log n), it may be quicker than a Python-based one-pass O(n) solution. Below np.unique performs the sort:
import numpy as np
def nlargest_indices(arr, n):
uniques = np.unique(arr)
threshold = uniques[-n]
return np.where(arr >= threshold)
full = np.random.random((1002,1004))
x, y = nlargest_indices(full, 10)
print(full[x, y])
print(x)
# [ 2 7 217 267 299 683 775 825 853]
print(y)
# [645 621 132 242 556 439 621 884 367]
Here is a timeit benchmark comparing nlargest_indices (above) to
def nlargest_indices_orig(full, n):
full = full.copy()
x = np.zeros(n)
y = np.zeros(n)
for idx in range(n):
x[idx] = np.unravel_index(full.argmax(), full.shape)[0]
y[idx] = np.unravel_index(full.argmax(), full.shape)[1]
full[full == full.max()] = 0.
return x, y
In [97]: %timeit nlargest_indices_orig(full, 500)
1 loops, best of 3: 5 s per loop
In [98]: %timeit nlargest_indices(full, 500)
10 loops, best of 3: 133 ms per loop
For timeit purposes I needed to copy the array inside nlargest_indices_orig, lest full get mutated by the timing loop.
Benchmarking the copying operation:
def base(full, n):
full = full.copy()
In [102]: %timeit base(full, 500)
100 loops, best of 3: 4.11 ms per loop
shows this added about 4ms to the 5s benchmark for nlargest_indices_orig.
Warning: nlargest_indices and nlargest_indices_orig may return different results if arr contains repeated values.
nlargest_indices finds the n largest values in arr and then returns the x and y indices corresponding to the locations of those values.
nlargest_indices_orig finds the n largest values in arr and then returns one x and y index for each large value. If there is more than one x and y corresponding to the same large value, then some locations where large values occur may be missed.
They also return indices in a different order, but I suppose that does not matter for your purpose of plotting.
If you want to know the indices of the n max/min values in the 2d array, my solution (for largest is)
indx = divmod((-full).argpartition(num_largest,axis=None)[:3],full.shape[0])
This finds the indices of the largest values from the flattened array and then determines the index in the 2d array based on the remainder and mod.
Nevermind. Benchmarking shows the unravel method is twice as fast at least for num_largest = 3.
I'm afraid that the most time-consuming part is recalculating maximum. In fact, you have to calculate maximum of 1002*1004 numbers 500 times which gives you 500 million comparisons.
Probably you should write your own algorithm to find the solution in one pass: keep only 1000 greatest numbers (or their indices) somewhere while scanning your 2D array (without modifying the source array). I think that some sort of a binary heap (have a look at heapq) would suit for the storage.
Is there a prettier way to do this? Specifically, are these max values available through the numpy API? I haven't been able to find them in the API, although they are easily found here in the docs.
MAX_VALUES = {np.uint8: 255, np.uint16: 65535, np.uint32: 4294967295, \
np.uint64: 18446744073709551615}
try:
image = MAX_VALUES[image.dtype] - image
except KeyError:
raise ValueError, "Image must be array of unsigned integers."
Packages like PIL and cv2 provide convenient tools for inverting an image, but at this point in the code I have a numpy array -- more sophisticated analysis follows -- and I'd like to stick with numpy.
Try doing
image ^= MAX_VALUES[image.dtype]
By the way, you do not need to define MAX_VALUES yourself. NumPy has them built-in:
import numpy as np
h, w = 100, 100
image = np.arange(h*w).reshape((h,w)).astype(np.uint8)
max_val = np.iinfo(image.dtype).max
print(max_val)
# 255
image ^= max_val
print(image)
# [[255 254 253 ..., 158 157 156]
# [155 154 153 ..., 58 57 56]
# [ 55 54 53 ..., 214 213 212]
# ...,
# [ 27 26 25 ..., 186 185 184]
# [183 182 181 ..., 86 85 84]
# [ 83 82 81 ..., 242 241 240]]