Fast way to apply function to each row of a numpy array - python

Suppose I have some nearest neighbor classifier. For a new observation, it computes the distance between the new observation and all observations in the "known" data set. It returns the class label of the observation that has the smallest distance to the new observation.
import numpy as np
known_obs = np.random.randint(0, 10, 40).reshape(8, 5)
new_obs = np.random.randint(0, 10, 80).reshape(16, 5)
labels = np.random.randint(0, 2, 8).reshape(8, )
def my_dist(x1, known_obs, axis=0):
    return (np.square(np.linalg.norm(x1 - known_obs, axis=axis)))
def nn_classifier(n, known_obs, labels, axis=1, distance=my_dist):
    return labels[np.argmin(distance(n, known_obs, axis=axis))]
def classify_batch(new_obs, known_obs, labels, classifier=nn_classifier, distance=my_dist):
    return [classifier(n, known_obs, labels, distance=distance) for n in new_obs]
print(classify_batch(new_obs, known_obs, labels, nn_classifier, my_dist))
For performance reasons I would like to avoid the for loop in the classify_batch function. Is there a way to use numpy operations to apply the nn_classifier function to each row of new_obs?
I already tried apply_along_axis but as often mentioned it is convenient but not fast.

The key to avoiding the loop is to express the action on the (16,8) array of 'distances'. The labels[] and argmin steps just cloud the issue.
If I set labels = np.arange(8), then this
arr = np.array([my_dist(n, known_obs, axis=1) for n in new_obs])
print(arr)
print(np.argmin(arr, axis=1))
produces the same thing. It still has a list comprehension, but we are closer to 'source'.
[[ 32. 115. 22. 116. 162. 86. 161. 117.]
[ 106. 31. 142. 164. 92. 106. 45. 103.]
[ 44. 135. 94. 18. 94. 50. 87. 135.]
[ 11. 92. 57. 67. 79. 43. 118. 106.]
[ 40. 67. 126. 98. 50. 74. 75. 175.]
[ 78. 61. 120. 148. 102. 128. 67. 191.]
[ 51. 48. 57. 133. 125. 35. 110. 14.]
[ 47. 28. 93. 91. 63. 49. 32. 88.]
[ 61. 86. 23. 141. 159. 85. 146. 22.]
[ 131. 70. 155. 149. 129. 127. 44. 138.]
[ 97. 138. 87. 117. 223. 77. 130. 122.]
[ 151. 78. 211. 161. 131. 115. 46. 164.]
[ 13. 50. 31. 69. 59. 43. 80. 40.]
[ 131. 108. 157. 161. 207. 85. 102. 146.]
[ 39. 106. 67. 23. 61. 67. 70. 88.]
[ 54. 51. 74. 68. 42. 86. 35. 65.]]
[2 1 3 0 0 1 7 1 7 6 5 6 0 5 3 6]
With
print((new_obs[:,None,:] - known_obs[None,:,:]).shape)
I get a (16,8,5) array. So can I apply the linalg.norm on the last axis?
This seems to do the trick
np.square(np.linalg.norm(diff, axis=-1))
So together:
diff = (new_obs[:,None,:] - known_obs[None,:,:])
dist = np.square(np.linalg.norm(diff, axis=-1))
idx = np.argmin(dist, axis=1)
print(idx)
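To get the class labels rather than the indices, index labels with idx. A minimal end-to-end sketch, assuming the same array shapes as in the question; the seeded generator and the loop used as a cross-check are additions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
known_obs = rng.integers(0, 10, size=(8, 5))
new_obs = rng.integers(0, 10, size=(16, 5))
labels = rng.integers(0, 2, size=8)

# (16, 1, 5) - (1, 8, 5) broadcasts to (16, 8, 5)
diff = new_obs[:, None, :] - known_obs[None, :, :]
# Squared Euclidean distance; summing squares skips the sqrt/square round trip
dist = np.sum(diff ** 2, axis=-1)   # shape (16, 8)
idx = np.argmin(dist, axis=1)       # index of the nearest known observation
predicted = labels[idx]             # shape (16,)

# Cross-check against a row-by-row loop like the one in the question
looped = np.array([
    labels[np.argmin(np.sum((n - known_obs) ** 2, axis=1))] for n in new_obs
])
assert np.array_equal(predicted, looped)
print(predicted)
```

The intermediate diff array has one entry per (new observation, known observation, feature) triple, so its memory use grows with the product of those sizes; for large inputs it may be worth processing new_obs in chunks.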

Related

why I cannot reshape or resize my numpy array

I have the following output for a
[ 1. 3. 5. 7. 9. 11. 13. 15. 17. 19. 21. 23. 25. 27.
29. 31. 33. 35. 37. 39. 41. 43. 45. 47. 97. 99. 101. 103.
105. 107. 109. 111. 113. 115. 117. 119. 121. 123. 125. 127. 129. 131.
133. 135. 137. 139. 141. 143.]
I want to reshape it to the below
[[1. 3. 5. 7. 9. 11. 13. 15.]
[17. 19. 21. 23. 25. 27. 29. 31.]
[33. 35. 37. 39. 41. 43. 45. 47.]
[97. 99. 101. 103. 105. 107. 109. 111.]
[113. 115. 117. 119. 121. 123. 125. 127.]
[129. 131. 133. 135. 137. 139. 141. 143.]]
I tried to use a.resize(6, 8), but it gives me this error: "resize only works on single-segment arrays"
Also, when I am trying to use a.reshape(6, 8), it gives me the same array.
I don't understand the reason for this, since I tested another array and it worked well.
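Try a.reshape((8, 6)).
The double parentheses pass the shape as a tuple; ndarray.reshape also accepts the dimensions as separate integers, so a.reshape(8, 6) is equivalent. Either way the result is returned as a new array and a itself is left unchanged.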
a = np.array([1., 3., 5., 7., 9., 11., 13., 15., 17., 19., 21., 23., 25., 27.,
29., 31., 33., 35., 37., 39., 41., 43., 45., 47., 97., 99., 101., 103.,
105., 107., 109., 111., 113., 115., 117., 119., 121., 123., 125., 127., 129., 131.,
133., 135., 137., 139., 141., 143.])
print(a.reshape((8, 6)))
out:
[[ 1. 3. 5. 7. 9. 11.]
[ 13. 15. 17. 19. 21. 23.]
[ 25. 27. 29. 31. 33. 35.]
[ 37. 39. 41. 43. 45. 47.]
[ 97. 99. 101. 103. 105. 107.]
[109. 111. 113. 115. 117. 119.]
[121. 123. 125. 127. 129. 131.]
[133. 135. 137. 139. 141. 143.]]
Note that for the output you requested, the dimensions should be (6, 8):
a.reshape((6,8))
out:
[[ 1. 3. 5. 7. 9. 11. 13. 15.]
[ 17. 19. 21. 23. 25. 27. 29. 31.]
[ 33. 35. 37. 39. 41. 43. 45. 47.]
[ 97. 99. 101. 103. 105. 107. 109. 111.]
[113. 115. 117. 119. 121. 123. 125. 127.]
[129. 131. 133. 135. 137. 139. 141. 143.]]
you can read about NumPy's reshape here: reshape documentation
Try
b = a.reshape((8,6))
and keep in mind two things for future use of similar methods:
The reshape method accepts the new shape either as a tuple, here (8,6), or as separate integer arguments, so a.reshape(8,6) is equivalent. The function form np.reshape(a, (8,6)) does need the tuple. Always pay attention to the expected arguments; you can check them by hovering over a function in PyCharm and most other editors.
In NumPy, many methods do not modify the given object but instead return a new value for you to use. reshape is one of them, which is why calling a.reshape(6, 8) without assigning the result appears to give back the same array.
It is healthy to always check for that in the documentation, in order to avoid catastrophic heartbreaks, trust me.
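A short sketch of the second point, using a stand-in array of 48 values rather than the exact data from the question:

```python
import numpy as np

a = np.arange(1.0, 97.0, 2.0)   # 48 odd values, a stand-in for the question's array

a.reshape(6, 8)                 # result discarded, so a keeps its old shape
assert a.shape == (48,)

b = a.reshape(6, 8)             # separate integers ...
c = a.reshape((6, 8))           # ... or a tuple give the same result
assert np.array_equal(b, c)

a = a.reshape(6, 8)             # rebind the name to keep the reshaped array
print(a.shape)
```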

Reading an Array File in python

I have this file which has an array of data written to it:
[[[ 32. 28. 28. ... 24. 24. 24.]
[ 30. 29. 29. ... 24. 24. 24.]
[ 29. 29. 28. ... 24. 24. 24.]
...
[137. 138. 129. ... 34. 34. 34.]
[140. 139. 128. ... 31. 34. 34.]
[136. 135. 122. ... 30. 30. 33.]]
[[ 40. 40. 40. ... 33. 33. 33.]
[ 38. 38. 37. ... 33. 33. 33.]
[ 37. 37. 37. ... 33. 33. 33.]
...
[140. 137. 132. ... 41. 43. 42.]
[139. 136. 129. ... 42. 43. 43.]
[140. 139. 133. ... 40. 42. 43.]]
[[ 10. 8. 7. ... 4. 4. 4.]
[ 8. 7. 7. ... 4. 4. 4.]
[ 7. 6. 6. ... 4. 4. 4.]
...
[101. 103. 94. ... 12. 13. 13.]
[105. 104. 92. ... 12. 13. 13.]
[ 99. 99. 99. ... 9. 10. 11.]]]
I do not know how to read from this file and use it within my code. Any help would be great! I have this within my code so far:
# Read and pre-process input images
n, c, h, w = net.inputs[input_blob].shape
images = np.ndarray(shape=(n, c, h, w))
for i in range(n):
    image = cv2.imread(args.input[i])
    if image.shape[:-1] != (h, w):
        log.warning("Image {} is resized from {} to {}".format(args.input[i], image.shape[:-1], (h, w)))
        image = cv2.resize(image, (w, h))
    # Swapping Red and Blue channels
    #image[:, :, [0, 2]] = image[:, :, [2, 0]]
    # Change data layout from HWC to CHW
    image = image.transpose((2, 0, 1))
    images[i] = image
eoim = image
eoim16 = eoim.astype(np.float16)
val = []
preprocessed_image_path = 'C:/Users/Owner/Desktop/Ubotica/IOD/cloud_detect/'
formated_image_file = "output_patch_fp"
f = open(preprocessed_image_path + "/" + formated_image_file + ".txt", 'r')
val = f
print(f)
print(val)
# divide by 255 to get value in range 0->1 if necessary (depends on input pixel format)
if(eoim16.max()>1.0):
    eoim16 = np.divide(eoim16,255)
print(eoim16)
#f.close()
#print(val)
#val = np.reshape(val, (3,512,512))
eoim16 = np.ndarray(shape=(c, h, w))
#res = val
# calling the instance method using the object cloudDetector
res = cloudDetector.infer(eoim16)
res = res[out_blob]
But when I try to print out val and f (just to see whether the data matches and is actually being read within my code), nothing appears. Is there any way to solve this so that my array reads into val and I can use the data within my code? Much appreciated!
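Try using the eval function. It takes a string and evaluates it as a Python expression, so first read the file's contents into a string (fileData below, read from the open file f). This only works if the text is valid Python syntax: the printed array shown in the question has no commas between values and elides data with "...", so eval raises a SyntaxError on it and the elided values cannot be recovered. Only use eval on files you trust.
fileData = f.read()
a = eval(fileData)
print(a)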
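If the program that writes the file can be changed, a more robust route is to save the array in NumPy's binary format instead of writing its printed form. A minimal sketch, assuming a small stand-in array and a hypothetical file name in a temporary directory:

```python
import os
import tempfile

import numpy as np

# Stand-in for the real (3, 512, 512) image data
arr = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output_patch_fp.npy")  # hypothetical file name
    np.save(path, arr)     # binary .npy file, keeps dtype and shape
    val = np.load(path)    # full array back, nothing elided

print(val.shape, val.dtype)
```

Unlike the printed text, the .npy file stores every element, so val can be passed on to the rest of the code directly.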

How to set bounds when minimizing using scipy

I have some data in a numpy array.
I would like to scale the data using a linear function according to the following rules:
The mean is as close to 65 as possible
The smallest value is at least 50
For my first attempt I made a scoring function:
import numpy as np
from scipy.optimize import minimize
def score(x):
    return abs(np.mean(x[0]*data+x[1]) - 65) + abs(x[0]*np.min(data)+x[1] - 50)
I have added on abs(x[0]*np.min(data)+x[1] - 50) as a vain attempt to get it to satisfy rule 2.
I then tried:
x0 = [0.85,0]
res = minimize(score,x0)
np.set_printoptions(suppress=True)
print(res)
This gives:
fun: 4.8516444911893615
hess_inv: array([[ 0.0047, -0.1532],
[-0.1532, 5.2375]])
jac: array([-50.9628, -2. ])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 580
nit: 2
njev: 142
status: 2
success: False
x: array([0.7408, 1.4407])
In other words the optimization failed.
I would also like to set bounds for the coefficients, e.g. bounds = [(0.7,1.3),(-5,5)].
My question is, what is the correct way to run the optimization with the boundary condition that the scaled smallest value is at least 50? Also, how can I make it so that the optimization runs without failure?
Consider the following:
import numpy as np
from scipy.optimize import minimize
data = np.array([ 59. , 59.5, 61. , 61.5, 62.5, 63. , 63. , 65.5, 66.5,
67. , 68. , 69. , 69.5, 70.5, 70.5, 70.5, 71. , 72. ,
72. , 73.5, 73.5, 74. , 75. , 75.5, 78. , 79. , 79. ,
79. , 79.5, 80.5, 80.5, 80.5, 80.5, 80.5, 82.5, 82.5,
82.5, 83. , 83. , 83. , 83. , 83. , 83.5, 83.5, 84. ,
84.5, 84.5, 84.5, 86. , 86. , 86. , 86.5, 86.5, 87.5,
88. , 88. , 88.5, 89. , 90. , 90.5, 90.5, 90.5, 91. ,
91.5, 91.5, 92. , 92. , 93. , 93. , 93. , 93.5, 93.5,
94. , 94. , 94. , 94. , 94. , 94. , 94.5, 94.5, 94.5,
94.5, 95.5, 95.5, 95.5, 95.5, 95.5, 95.5, 96. , 96. ,
96. , 96.5, 96.5, 96.5, 98. , 98. , 98. , 98. , 98. ,
98. , 98. , 98. , 98.5, 98.5, 98.5, 98.5, 98.5, 100. ,
100. , 100. , 100. ])
def scale(data, coeffs):
    m,b = coeffs
    return (m * data) + b
def score(coeffs):
    scaled = scale(data, coeffs)
    # Penalty components
    p_1 = abs(np.mean(scaled) - 65)
    p_2 = max(0, (50 - np.min(scaled)))
    return p_1 + p_2
res = minimize(score, (0.85, 0.0), method = 'Powell')
#np.set_printoptions(suppress=True)
print(res)
post = scale(data, res.x)
print(np.mean(post))
print(np.min(post))
print(score(res.x))
Outputs:
direc: array([[ -3.05475495e-02, 2.62047576e+00],
[ 7.54828106e-07, -6.47892698e-05]])
fun: 1.4210854715202004e-14
message: 'Optimization terminated successfully.'
nfev: 360
nit: 8
status: 0
success: True
x: array([ 0.55914442, 17.02691959])
print(np.mean(post)) # 65.0
print(np.min(post)) # 50.0164406291
print(score(res.x)) # 1.42108547152e-14
A few things:
I added a scale helper function to clean up the code a bit, since I use it in the score function as well as at the end to show the scaled data.
The score function was fixed and broken out into two separate penalties (one for each requirement) for clarity. It computes the scaled vector once (and calls it scaled), then computes the penalty components.
Note: This score function has an odd non-smooth area around min(scaled) = 50 because of the max call (and another around mean(scaled) = 65 because of abs). This may cause issues with gradient-based optimization methods.
I used the Powell algorithm because I had used it before and it worked in a similar problem with using a min/max operator. Wikipedia says:
The method is useful for calculating the local minimum of a continuous but complex function, especially one without an underlying mathematical definition, because it is not necessary to take derivatives
Someone more familiar with the optimization methods may be able to suggest a better alternative.
(Edit) Lastly, with respect to your question about boundary conditions. Usually, when we talk about boundary conditions we're talking about the boundary of the independent variable, the vector we're optimizing (here, elements of coeffs or x) -- for example, "x[0] must be less than 0", or "x[1] must be between 0 and 1" -- not what you seem to be looking for.
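For reference, scipy.optimize.minimize can take both kinds of restriction directly: box bounds on the coefficients through bounds, and the "smallest scaled value is at least 50" rule through an inequality constraint. The following is only a sketch under stated assumptions: it uses a small stand-in data set, replaces the penalty score with a smooth squared-error objective, and picks the SLSQP method because it supports bounds and constraints together.

```python
import numpy as np
from scipy.optimize import minimize

data = np.array([59.0, 61.0, 70.0, 80.0, 90.0, 100.0])  # small stand-in data set

def objective(coeffs):
    m, b = coeffs
    # Smooth objective: squared distance of the scaled mean from 65
    return (m * data.mean() + b - 65.0) ** 2

# Rule 2 as an inequality constraint of the form fun(coeffs) >= 0.
# With m > 0 the smallest scaled value is m * min(data) + b.
constraints = [{"type": "ineq",
                "fun": lambda coeffs: coeffs[0] * data.min() + coeffs[1] - 50.0}]
bounds = [(0.7, 1.3), (-5.0, 5.0)]  # box bounds on m and b from the question

res = minimize(objective, x0=(0.85, 0.0), method="SLSQP",
               bounds=bounds, constraints=constraints)
post = res.x[0] * data + res.x[1]
print(res.x, post.mean(), post.min())
```

Whether a mean of exactly 65 is reachable depends on the data and the bounds; when it is not, the optimizer returns the feasible point whose scaled mean is closest to 65.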
Sorry if I'm misunderstanding you, but just scaling the data according to those 2 rules is straightforward linear algebra:
e = np.mean(data)
m = e - np.min(data)
data * (65-50)/m + (65 - e*(65-50)/m)
# i.e. (data-e) * (65-50)/m + 65
This has exactly mean 65 and minimum 50.
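A quick numerical check of this closed form, assuming a small stand-in data set (the names slope and intercept are introduced here for readability):

```python
import numpy as np

data = np.array([59.0, 61.0, 70.0, 80.0, 90.0, 100.0])  # small stand-in data set

e = np.mean(data)
m = e - np.min(data)
slope = (65 - 50) / m        # maps the gap (mean - min) onto 65 - 50
intercept = 65 - e * slope   # shifts the scaled mean to 65
scaled = data * slope + intercept

print(scaled.mean(), scaled.min())
```

Note that this pins the minimum at exactly 50 rather than "at least 50", and it does not take any bounds on the coefficients into account.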

"m x n" dimensional gradient-style array in Python

I checked out
gradient descent using python and numpy
but it didn't solve my problem.
I'm trying to get familiar with image-processing and I want to generate a few test arrays to mess around with in Python.
Is there a method (like np.arange) to create a m x n array where the inner entries form some type of gradient?
I did an example of a naive method for generating the desired output.
Excuse my loose use of the term gradient; I'm using it in its simple meaning of a smooth transition in color.
#!/usr/bin/python
import numpy as np
import matplotlib.pyplot as plt
#Set up parameters
m = 15
n = 10
A_placeholder = np.zeros((m,n))
V_m = np.arange(0,m).astype(np.float32)
V_n = np.arange(0,n).astype(np.float32)
#Iterate through combinations
for i in range(m):
    m_i = V_m[i]
    for j in range(n):
        n_j = V_n[j]
        A_placeholder[i,j] = m_i * n_j #Some combination
#Relabel
A_gradient = A_placeholder
A_placeholder = None
#Print data
print(A_gradient)
#[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
# [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18.]
# [ 0. 3. 6. 9. 12. 15. 18. 21. 24. 27.]
# [ 0. 4. 8. 12. 16. 20. 24. 28. 32. 36.]
# [ 0. 5. 10. 15. 20. 25. 30. 35. 40. 45.]
# [ 0. 6. 12. 18. 24. 30. 36. 42. 48. 54.]
# [ 0. 7. 14. 21. 28. 35. 42. 49. 56. 63.]
# [ 0. 8. 16. 24. 32. 40. 48. 56. 64. 72.]
# [ 0. 9. 18. 27. 36. 45. 54. 63. 72. 81.]
# [ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.]
# [ 0. 11. 22. 33. 44. 55. 66. 77. 88. 99.]
# [ 0. 12. 24. 36. 48. 60. 72. 84. 96. 108.]
# [ 0. 13. 26. 39. 52. 65. 78. 91. 104. 117.]
# [ 0. 14. 28. 42. 56. 70. 84. 98. 112. 126.]]
#Show Image
plt.imshow(A_gradient)
plt.show()
I've tried np.gradient but it didn't give me the desired output.
#print np.gradient(np.array([V_m,V_n]))
#Traceback (most recent call last):
# File "Untitled.py", line 19, in <module>
# print np.gradient(np.array([V_m,V_n]))
# File "/Users/Mu/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1458, in gradient
# out[slice1] = (y[slice2] - y[slice3])
#ValueError: operands could not be broadcast together with shapes (10,) (15,)
A_placeholder[i,j] = m_i * n_j
Any operation like that can be expressed in numpy using broadcasting
A = np.arange(m)[:, None] * np.arange(n)[None, :]
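A self-contained sketch of the broadcasting approach, using the m and n from the question. The np.outer call and the linear ramp are extra illustrations, not part of the original answer:

```python
import numpy as np

m, n = 15, 10

# Column vector (m, 1) times row vector (1, n) broadcasts to (m, n)
A = np.arange(m)[:, None] * np.arange(n)[None, :]

# The same product table via np.outer
A_outer = np.outer(np.arange(m), np.arange(n))

# Another simple gradient: a left-to-right ramp from 0 to 1, repeated on every row
ramp = np.tile(np.linspace(0.0, 1.0, n), (m, 1))

assert np.array_equal(A, A_outer)
print(A.shape, ramp.shape)
```

Swapping the multiplication for any other elementwise operation (addition, a distance formula, and so on) gives other gradient patterns without writing the double loop.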

Getting all points of a given connected component rapidly

Scikit-Image has quite a few methods available for blob detection:
Laplacian of Gaussian (LoG)
Difference of Gaussian (DoG)
Determinant of Hessian (DoH)
All three return an array that contains a single point within the bounds of the found components:
>>> from skimage import data, feature
>>> img = data.coins()
>>> feature.blob_doh(img)
array([[ 121. , 271. , 30. ],
[ 123. , 44. , 23.55555556],
[ 123. , 205. , 20.33333333],
[ 124. , 336. , 20.33333333],
[ 126. , 101. , 20.33333333],
[ 126. , 153. , 20.33333333],
[ 156. , 302. , 30. ],
[ 185. , 348. , 30. ],
[ 192. , 212. , 23.55555556],
[ 193. , 275. , 23.55555556],
[ 195. , 100. , 23.55555556],
[ 197. , 44. , 20.33333333],
[ 197. , 153. , 20.33333333],
[ 260. , 173. , 30. ],
[ 262. , 243. , 23.55555556],
[ 265. , 113. , 23.55555556],
[ 270. , 363. , 30. ]])
I'd like to use that information to produce lists that contains the coordinates of all the points in a given component.
I could just iterate through the whole image myself, starting with the seeds, and collect all the points in a dict keyed by the point provided by blob detection, but I imagine it would be rather slow unless I'm using Cython (I'm more than willing to be wrong about this, as I'm fairly new to Python). More truthfully, I simply think there is probably a better way than just doing it myself.
