How to use np.cumsum to speed up mean average precision calculation?

I have,
scores = np.array([[0.9, 0.8, 0.6, 0.5, 0.4], [0.5, 0.4, 0.31, 0.21, 0.4 ]])
labels = np.array([[1, 0, 1, 1, 0], [0, 0, 0, 1, 1]])
I want to calculate MAP@K (mean average precision at K), for which I wrote the algorithm below:
k = 3
mean_ap = 0
n = len(scores)
for i in range(n):
    cum = ap = 0.0
    idx = np.argsort(-scores[i])
    used_label = labels[i][idx][:k]
    m = sum(labels[i])
    for j, label in enumerate(used_label):
        cum += label
        ap += cum * label / (j + 1)
    mean_ap += ap / min(m, k)
val = mean_ap / n
For the data above, it evaluates the following expression, which is then divided by n = 2:
(1 + 0 + 2 / 3) / 3 + (0 + 0 + 1 / 3) / 2
Is there a way to use np.cumsum to speed up my algorithm?
I assume it is already optimized, but perhaps I am missing some room for improvement.
Thanks in advance.

Hope this helps (I try to avoid for loops):
k = 3
n = len(scores)
m = labels.sum(axis=1)
idx = np.argsort(-scores)
used_label = labels[:,idx][np.arange(0,n),np.arange(0,n),:k]
val = (np.cumsum(used_label, axis=1) * used_label /
       np.arange(1, k + 1) /
       np.min([m, np.repeat(k, n)], axis=0).reshape(-1, 1)).sum(axis=1).sum() / n
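For comparison, here is a minimal sketch of the same idea wrapped in a function. It uses np.take_along_axis to reorder each row of labels by that row's own scores, which avoids building the intermediate (n, n, 5) array. The function name map_at_k, the stable sort for tied scores, and the choice to score rows without positive labels as 0 are assumptions of this sketch, not part of the answer above.

```python
import numpy as np

def map_at_k(scores, labels, k):
    """Mean average precision at k for 2-D score and 0/1 label arrays."""
    # Rank every row by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-scores, axis=1, kind="stable")
    top = np.take_along_axis(labels, order, axis=1)[:, :k]
    # Running hit count, kept only at positions that are hits, divided by rank.
    precision = np.cumsum(top, axis=1) * top / np.arange(1, top.shape[1] + 1)
    # Normalise by min(number of positives, k); rows with no positives score 0.
    denom = np.minimum(labels.sum(axis=1), k)
    ap = np.divide(precision.sum(axis=1), denom,
                   out=np.zeros(len(scores)), where=denom > 0)
    return ap.mean()

scores = np.array([[0.9, 0.8, 0.6, 0.5, 0.4], [0.5, 0.4, 0.31, 0.21, 0.4]])
labels = np.array([[1, 0, 1, 1, 0], [0, 0, 0, 1, 1]])
val = map_at_k(scores, labels, k=3)
print(val)
```

For the sample data this gives 13/36 (about 0.361), the same value as the loop in the question.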

How to iterate through a NumPy array while filtering cells values by index and performing math on it

I am trying to filter, loop, and do arithmetic within the same iteration, but I can't find the right approach. I have a NumPy array of size 6 by 2 that holds two sets of values I want to subtract from each other, but I want the values compared before the subtraction happens.
If a value is greater than its counterpart in the other column, the lower value should be subtracted from the higher one, and vice versa. This needs to happen in a loop that iterates through the array while doing the filtering and the arithmetic.
This is my code example:
#minus price
print('minus price trying appending')
minus_p_orgp1 = np.append(dif_p_times1, fp, axis=0)
print(minus_p_orgp1)
for ii, vv in enumerate(minus_p_orgp1):
    print('greater')
    greater_1 = np.all(ii > 0, axis=0)
    greater_0 = np.all(ii <= 0, axis=0)
    if greater_1 < greater_0:
        iit = greater_0 - greater_1
    if greater_1 > greater_0:
        iit = greater_1 - greater_0
    print(iit, ii, vv)
ssss = np.zeros(minus_p_orgp1.size - 1)
for i in range(len(minus_p_orgp1) - 1):
    if minus_p_orgp1[i] < minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i + 1] - minus_p_orgp1[i]
    elif minus_p_orgp1[i + 1] > minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i] - minus_p_orgp1[i + 1]
print(ssss)
This is a print of the array, where the upper row is dif_p_times1 and the lower row is fp:
minus price trying appending
[[79340.33057205 78379.24102508 72188.80527274 76557.26239563
72857.90423589 71137.7943199 ]
[43528.22 43705. 43931.07 44571.24
44330.43 44465.64 ]]
What can I do to achieve my goal?
I have also tried to do the process with just having the array being two separate vectors with size 6, 1. But that also seems very difficult, let me know what you think.
I have also just tried this; however it just prints out zeros when running the code:
trii = np.array([[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]])
print(trii)
print(minus_p_orgp1[~(trii >= 1)])
print('it works')
itt = minus_p_orgp1[~(trii >= 1)]
itt1 = minus_p_orgp1[~(trii >= 0)]
sssss = np.zeros(dif_p_times1.size - 1)
ssss = np.zeros(minus_p_orgp1.size - 1)
for i in range(len(dif_p_times1) - 1):
    for ii in range(len(fp) - 1):
        if itt < itt1:
            sssss[i] = itt[i] + itt1[i + 1]
            ssss[i, ii] = fp[ii + 1] - dif_p_times1[i]
        elif itt > itt1:
            sssss[i] = itt[i + 1] + itt1[i]
            ssss[i, ii] = dif_p_times1[i] - fp[ii + 1]
print(sssss)
[[0 0 0 0 0 0]
[1 1 1 1 1 1]]
[63455.70703442 68744.47486851 77804.44752373 79686.34612013
69322.78250338 83255.08459329]
it does something
[0. 0. 0. 0. 0.]
Here is a new attempt, however it still doesn't work:
ssss = np.zeros(minus_p_orgp1.size - 1)
x = minus_p_orgp1[::2]
y = minus_p_orgp1[::-2]
z = ssss[::2]
for z, x, y in range(len(minus_p_orgp1) - 1):
    if x[i + 1] < y[i]:
        z[i] = y[i + 1] - x[i]
    elif x[i + 1] > y[i]:
        z[i] = x[i + 1] - y[i]
print(z)
Is there a way to iterate through multidimensional arrays that hold real values (not ones created with the reshape/arange functions) while still applying boolean filtering to individual cells and then doing arithmetic on them?
I have looked at nditer and numba, but the examples all seem to transpose the dimensions of a 2D array and reduce the values, which with my real values returned a single value rather than an array.
Looking at your first block
minus_p_orgp1 = np.append(dif_p_times1, fp, axis=0)
print(minus_p_orgp1)
looks like minus_p_orgp1 is a (2,6) array, just a row join of the 2 arrays.
for ii, vv in enumerate(minus_p_orgp1):
    print('greater')
    greater_1 = np.all(ii > 0, axis=0)
    greater_0 = np.all(ii <= 0, axis=0)
    if greater_1 < greater_0:
        iit = greater_0 - greater_1
    if greater_1 > greater_0:
        iit = greater_1 - greater_0
    print(iit, ii, vv)
vv is, iteratively, the 2 rows of minus_p_orgp1. ii is, iteratively, 0 and 1. So this np.all(ii>0...) business is just testing whether ii is 0 or 1. It sets iit in each loop, but doesn't save the value anywhere. At the end of the loop it has the last value, but so what?
In the following, size is 12 (2*6), so ssss is np.zeros(11).
But the iteration is over 2-1, i.e. it just evaluates for i=0:
ssss = np.zeros(minus_p_orgp1.size - 1)
for i in range(len(minus_p_orgp1) - 1):
    if minus_p_orgp1[i] < minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i + 1] - minus_p_orgp1[i]
    elif minus_p_orgp1[i + 1] > minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i] - minus_p_orgp1[i + 1]
print(ssss)
minus_p_orgp1[i] < minus_p_orgp1[i] is an array of 6 False. I expect that to raise an ambiguity error when used in the if clause.
minus_p_orgp1[i + 1] > minus_p_orgp1[i] makes more sense, but still can't be used in elif. Are you taking the difference across columns or rows?
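To make the ambiguity error concrete, here is a small sketch using a made-up three-element row (the values are placeholders, not the asker's data). Comparing two rows yields a boolean array, and Python's if cannot reduce that to a single True or False:

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])   # placeholder values standing in for one row

comparison = row < row            # element-wise: array of three False values
try:
    if comparison:                # an array with several elements has no single truth value
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Reducing explicitly with .any() or .all() gives a plain bool that `if` accepts.
any_smaller = bool(comparison.any())
```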
The initial word description is a bit vague, but it sure sounds like you just want the positive difference between two arrays:
In [68]: x = np.array([1,3,2,4]); y = np.array([2,1,3,3])
In [69]: x-y
Out[69]: array([-1, 2, -1, 1])
In [70]: y-x
Out[70]: array([ 1, -2, 1, -1])
In [71]: np.abs(x-y)
Out[71]: array([1, 2, 1, 1])
an iterative equivalent (which would work with lists just as well):
In [72]: z = np.zeros(4, int)
In [73]: for i in range(4):
    ...:     if x[i]>y[i]:
    ...:         z[i] = x[i]-y[i]
    ...:     else:
    ...:         z[i] = y[i]-x[i]
    ...:
In [74]: z
Out[74]: array([1, 2, 1, 1])
or simply:
In [75]: [abs(i-j) for i,j in zip(x,y)]
Out[75]: [1, 2, 1, 1]
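Applied to the array printed in the question, the same idea might look like the sketch below. It assumes dif_p_times1 and fp are each (1, 6) arrays, which is what the np.append(..., axis=0) call and the printed (2, 6) result suggest; the name diff is introduced here for illustration.

```python
import numpy as np

# Values copied from the printout in the question; the (1, 6) shapes are assumed.
dif_p_times1 = np.array([[79340.33057205, 78379.24102508, 72188.80527274,
                          76557.26239563, 72857.90423589, 71137.7943199]])
fp = np.array([[43528.22, 43705.0, 43931.07, 44571.24, 44330.43, 44465.64]])

minus_p_orgp1 = np.append(dif_p_times1, fp, axis=0)   # shape (2, 6)

# Larger minus smaller for each column, with no loop and no if/elif.
diff = np.abs(minus_p_orgp1[0] - minus_p_orgp1[1])
print(diff)
```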
As per hpaulj's answer, this worked:
[abs(i-j) for i,j in zip(x,y)]

Gradient Descent Problem with smallest/simplest data on planet Earth

I want to implement the Gradient Descent Algorithm on this simple data but I am facing problems. It would be great if someone points me in the right direction. The answer should be 7 for x=6 but I'm not getting there.
X = [1, 2, 3, 4]
Y = [2, 3, 4, 5]
m_gradient = 0
b_gradient = 0
m, b = 0, 0
learning_rate = 0.1
N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
print(b+m*6)
You are calculating the gradients incorrectly for all but the first iteration. You need to set both gradients to 0 in the outer for loop.
X = [1, 2, 3, 4]
Y = [2, 3, 4, 5]
m_gradient = 0
b_gradient = 0
m, b = 0, 0
learning_rate = 0.1
N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
    m_gradient, b_gradient = 0, 0
print(b+m*6)
For example, consider b_gradient. Before the first iteration b_gradient = 0, and it is calculated as 0 + -0.5*(y0 - (m*x0 + b)) + -0.5*(y1 - (m*x1 + b)) + -0.5*(y2 - (m*x2 + b)) + -0.5*(y3 - (m*x3 + b)), where x0 and y0 are X[0] and Y[0], respectively.
After the first iteration the value of b_gradient is -7, which is correct.
The problem starts with the second iteration. Instead of calculating b_gradient as the sum of -0.5*(yn - (m*xn + b)) for 0 <= n <= 3, you calculated it as the previous value of b_gradient plus that sum.
After the second iteration the value of b_gradient is -2.6, which is incorrect. The correct value is 4.4; note that 4.4 - 7 = -2.6.
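The numbers quoted above can be reproduced with a short sketch. The helper name gradients is introduced here for illustration; it computes both gradients from scratch at a given (m, b), which is what resetting them each epoch achieves.

```python
X = [1, 2, 3, 4]
Y = [2, 3, 4, 5]
N = len(Y)
learning_rate = 0.1

def gradients(m, b):
    """Return (m_gradient, b_gradient) of the mean squared error at (m, b)."""
    m_grad = sum(-(2 / N) * x * (y - (m * x + b)) for x, y in zip(X, Y))
    b_grad = sum(-(2 / N) * (y - (m * x + b)) for x, y in zip(X, Y))
    return m_grad, b_grad

# First epoch, starting from m = b = 0.
m_grad_1, b_grad_1 = gradients(0, 0)
m = 0 - m_grad_1 * learning_rate
b = 0 - b_grad_1 * learning_rate

# Second epoch: the fresh gradient versus the one carried over without a reset.
_, b_grad_2 = gradients(m, b)
accumulated = b_grad_1 + b_grad_2

print(b_grad_1, b_grad_2, accumulated)
```

Up to floating-point rounding, this prints -7, 4.4 and -2.6.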
It seems you want the coefficients of a linear regression fitted with gradient descent. Adding more data points, using a slightly smaller learning rate, and training for more epochs while watching the loss will all help reduce the error.
As the input values get larger, the code below gives slightly off results. Training for more epochs, as mentioned above, gives correct results over a larger range of inputs.
Vectorized Version
import numpy as np
X = np.array([1, 2, 3, 4, 5, 6, 7])
Y = np.array([2, 3, 4, 5, 6, 7, 8])
w_gradient = 0
b_gradient = 0
w, b = 0.5, 0.5
learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)
for i in range(EPOCHS):
    # Predict
    Y_pred = (w * X) + b
    # Loss
    loss = np.square(Y_pred - Y).sum() / (2.0 * N)
    if i % 100 == 0:
        print(loss)
    # Backprop
    grad_y_pred = (2 / N) * (Y_pred - Y)
    w_gradient = (grad_y_pred * X).sum()
    b_gradient = (grad_y_pred).sum()
    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)
print("\n\n")
print("LEARNED:")
print(w, b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))
# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give the expected result.
# For a simple demo with less training data and a small input range, 2000 is enough
print(np.round(b + w * 30000))
Output
LEARNED:
1.0000349103409163 0.9998271260509328
TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0
Loop Version
import numpy as np
X = np.array([1, 2, 3, 4, 5, 6, 7])
Y = np.array([2, 3, 4, 5, 6, 7, 8])
w_gradient = 0
b_gradient = 0
w, b = 0.5, 0.5
learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)
for i in range(EPOCHS):
    w_gradient = 0
    b_gradient = 0
    loss = 0
    for j in range(N):
        # Predict
        Y_pred = (w * X[j]) + b
        # Loss
        loss += np.square(Y_pred - Y[j]) / (2.0 * N)
        # Backprop
        grad_y_pred = (2 / N) * (Y_pred - Y[j])
        w_gradient += (grad_y_pred * X[j])
        b_gradient += (grad_y_pred)
    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)
    # Print loss
    if i % 100 == 0:
        print(loss)
print("\n\n")
print("LEARNED:")
print(w, b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))
# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give the expected result.
# For a simple demo with less training data and a small input range, 2000 is enough
print(np.round(b + w * 30000))
Output
LEARNED:
1.0000349103409163 0.9998271260509328
TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0
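As a cross-check on the gradient descent results, the exact least-squares line can be computed directly. This sketch swaps in np.polyfit, a closed-form fit rather than gradient descent, purely to confirm what the iterative versions should converge to; the names slope, intercept and prediction are introduced here.

```python
import numpy as np

X = np.array([1, 2, 3, 4])
Y = np.array([2, 3, 4, 5])

# A degree-1 least-squares fit returns the coefficients highest power first.
slope, intercept = np.polyfit(X, Y, deg=1)
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

Both coefficients come out at 1 up to floating-point rounding, so the prediction for x = 6 is 7, which is the value the question expects.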

How can I create or change polynomial coefficients

I have this code and I want to edit it to do something else:
def pol(poly, n, x):
    result = poly[0]
    #Using Horner's method
    for i in range(1, n):
        result = result * x + poly[i]
    return result
#Let us evaluate value of
#5x^3 + 9x^2 - x - 10 for x = 1
poly = [5, 9, -1, -10]
x = 1
n = len(poly)
print("Value of polynomial is: ", pol(poly, n, x))
I wonder how I can change the coefficients of the polynomial. As far as I can tell, this code only calculates:
x^3 and x^2
How can I make this code evaluate, for example, this polynomial:
p(x) = 5x^10 + 9x - 7x - 10
or any polynomial in Python?
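Your code should work; you just need to provide the correct input: one coefficient per power, from the highest power down to the constant term, with 0 for every power that does not appear. For
p(x) = 5x^10 + 9x - 7x - 10
you should provide:
poly2 = [5, 0, 0, 0, 0, 0, 0, 0, 0, 9-7, -10]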
An alternative pol implementation:
def pol(poly, x):
    n = len(poly) # no need to provide it at call
    rp = poly[::-1] # [-10, -1, 9, 5] so they correlate with range(n) as power
    print("Poly:",poly, "for x =",x)
    result = 0
    for i in range(n):
        val = rp[i] * x**i
        print(rp[i],' * x^', i, ' = ', val, sep='') # debug output
        result += val
    return result
x = 2 # 1 is a bad test candidate - no differences for 1**10 vs 1**2
# 5x^3 + 9x^2 - x - 10 for x = 2
poly = [5, 9, -1, -10]
print("Value of polynomial is: ", pol(poly, x))
# p(x) = 5x^10 + 9x - 7x - 10
poly2 = [5, 0, 0, 0, 0, 0, 0, 0, 0, 9-7, -10]
print("Value of polynomial is: ", pol(poly2, x))
Output:
Poly: [5, 9, -1, -10] for x = 2
-10 * x^0 = -10
-1 * x^1 = -2
9 * x^2 = 36
5 * x^3 = 40
Value of polynomial is: 64
Poly: [5, 0, 0, 0, 0, 0, 0, 0, 0, 2, -10] for x = 2
-10 * x^0 = -10
2 * x^1 = 4
0 * x^2 = 0
0 * x^3 = 0
0 * x^4 = 0
0 * x^5 = 0
0 * x^6 = 0
0 * x^7 = 0
0 * x^8 = 0
0 * x^9 = 0
5 * x^10 = 5120
Value of polynomial is: 5114
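Since the question's code uses Horner's method, here is a minimal sketch of that method without the separate n argument, checked against np.polyval. The function name horner is introduced here; the coefficient lists are the ones used above, highest power first.

```python
import numpy as np

def horner(poly, x):
    """Evaluate a polynomial given as coefficients, highest power first."""
    result = 0
    for coeff in poly:
        # Each step multiplies the running value by x and adds the next coefficient.
        result = result * x + coeff
    return result

poly = [5, 9, -1, -10]                            # 5x^3 + 9x^2 - x - 10
poly2 = [5, 0, 0, 0, 0, 0, 0, 0, 0, 9 - 7, -10]   # 5x^10 + 9x - 7x - 10

value = horner(poly, 2)
value2 = horner(poly2, 2)
check = np.polyval(poly2, 2)    # NumPy's evaluator uses the same coefficient order
print(value, value2, check)
```

For x = 2 this gives 64 and 5114, matching the debug output above.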

IndexError: index 3 is out of bounds for axis 0 with size 3

So this is my code to plot a Bézier curve:
def bezier(a):
    n = np.shape(a)[0]-1
    # initialise arrays
    B = np.zeros([101, 2])
    terms = np.zeros([n+1, 2])
    # create an array of values for t from 0 to 1 in 101 steps
    t = np.linspace(0, 1, 101)
    # loop through all t values
    for i in range(0, 101):
        #calculate terms inside sum in equation 13
        for j in range(0, n + 1):
            # YOUR CODE HERE
            terms[j,:] = ((1 - t[i]) ** 3 * a[0,:] \
                + 3 * t[i] * (1-t[i]) ** 2 * a[1,:] \
                + 3 * t[i] ** 2 * (1-t[i]) * a[2,:]
                + t[i] ** 3 * a[3,:])
        #sum terms to find Bezier curve
        B[i, :] = sum(terms, 0)
    # plot Bezier
    pl.plot(B[:, 0], B[:, 1])
    # plot control points
    pl.plot(a[:, 0], a[:, 1],'ko')
    # plot control polygon
    pl.plot(a[:, 0], a[:, 1],'k')
    return B
And when I try to pass it some control points:
a = np.array([[0, 0], [0.5, 1], [1, 0]])
B = bezier(a)
I receive this IndexError:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-fce87c9f1c04> in <module>()
1 a = np.array([[0, 0], [0.5, 1], [1, 0]])
----> 2 B = bezier(a)
<ipython-input-13-3bb3bb02cc87> in bezier(a)
11 for j in range(0, n + 1):
12 # YOUR CODE HERE
---> 13 terms[j,:] = ((1 - t[i]) ** 3 * a[0,:] + 3 * t[i] * (1-t[i]) ** 2 * a[1,:] + 3 * t[i] ** 2 * (1-t[i]) * a[2,:] + t[i] ** 3 * a[3,:])
14 #sum terms to find Bezier curve
15 B[i, :] = sum(terms, 0)
IndexError: index 3 is out of bounds for axis 0 with size 3
So I figure it is trying to access something outside the container but I can't see where it is I need to change the code.
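Your array a = np.array([[0, 0], [0.5, 1], [1, 0]]) has no element with index 3: it holds only three points, but the expression in your loop reads a[3,:]. That expression is the cubic Bézier formula, which needs four control points, so add another point to the array.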
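If the goal is a curve for any number of control points, one option is to compute each term from the Bernstein basis instead of hard-coding the cubic formula. The sketch below assumes that the "equation 13" mentioned in the question's comment is the usual Bernstein sum; the function name bezier_points and its num parameter are introduced here, and plotting is left out.

```python
import math
import numpy as np

def bezier_points(control_points, num=101):
    """Sample a Bezier curve of degree len(control_points) - 1 at num values of t."""
    a = np.asarray(control_points, dtype=float)
    n = a.shape[0] - 1
    t = np.linspace(0.0, 1.0, num)
    curve = np.zeros((num, a.shape[1]))
    for j in range(n + 1):
        # Bernstein basis polynomial: C(n, j) * t^j * (1 - t)^(n - j)
        basis = math.comb(n, j) * t**j * (1 - t)**(n - j)
        curve += basis[:, None] * a[j]
    return curve

a = np.array([[0, 0], [0.5, 1], [1, 0]])
B = bezier_points(a)   # three control points give a quadratic curve
```

With the three points from the question, the curve starts at (0, 0), ends at (1, 0), and passes through (0.5, 0.5) at t = 0.5.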

Fast way to round array values based on condition

I have an array like this:
a = np.array([
[0.02, 1.01, 4.01, 3.00, 5.12],
[2.11, 1.50, 3.98, 0.52, 5.01]])
and a "condition" array:
c = np.array([0, 1, 4, 5])
I want to set a[i][j] = c[k] if c[k] - const < a[i][j] < c[k] + const, and a[i][j] = 0 otherwise.
For example, with const = 0.05 the result would be:
a_result = [[0 1 4 0 0]
            [0 0 4 0 5]]
The naive way is to use 3 for loops to check each a[i][j] against each c[k]. However, that is very slow when a is big. Is there a fast, Pythonic way to do this?
For loop (slow) solution:
a_result = np.full(a.shape, 0)
const = 0.05
mh, mw = a.shape
# range(mh) and range(mw) cover every row and column (range(mh-1) would skip the last ones)
for i in range(mh):
    for j in range(mw):
        for k in range(1, len(c)):
            if a[i][j] > (c[k] - const) and a[i][j] < (c[k] + const):
                a_result[i][j] = c[k]
Approach #1
One vectorized approach would be with broadcasting -
c[(np.abs(a - c[:,None,None]) < const).argmax(0)]
Sample run -
In [312]: a
Out[312]:
array([[ 0.02, 1.01, 4.01, 3. , 5.12],
[ 2.11, 1.5 , 3.98, 0.52, 5.01]])
In [313]: c
Out[313]: array([0, 1, 4, 5])
In [314]: c[(np.abs(a - c[:,None,None]) < const).argmax(0)]
Out[314]:
array([[0, 1, 4, 0, 0],
[0, 0, 4, 0, 5]])
Approach #2
Another one that would be closer to what we had in the question, but vectorized, like so -
mask = ((c[:,None,None] - const) < a) & (a < (c[:,None,None] + const))
out = c[mask.argmax(0)]
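One caveat with the argmax-based approaches: when nothing matches, argmax returns 0, so the result falls back to c[0]. That gives the desired 0 here only because c[0] is 0. A sketch that does not rely on this, using np.where with an explicit "any match" test, might look as follows (the names close and out are illustrative):

```python
import numpy as np

a = np.array([
    [0.02, 1.01, 4.01, 3.00, 5.12],
    [2.11, 1.50, 3.98, 0.52, 5.01]])
c = np.array([0, 1, 4, 5])
const = 0.05

# close[k, i, j] is True where a[i, j] lies within const of c[k].
close = np.abs(a - c[:, None, None]) < const

# Take the matching c value where there is a match, and 0 everywhere else.
out = np.where(close.any(axis=0), c[close.argmax(axis=0)], 0)
print(out)
```

Like Approach #1, this builds a boolean array of shape (len(c), rows, cols), so it trades memory for speed.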
Approach #3
Here's another with memory efficiency in mind, based on this post -
idx = np.searchsorted(c, a, side="left").clip(max=c.size-1)
mask = (idx > 0) & \
    ( (idx == len(c)) | (np.fabs(a - c[idx-1]) < np.fabs(a - c[idx])) )
idx0 = idx-mask
out = c[idx0]
out[np.abs(c[idx0] - a) >= const] = 0
