Say I have two lists of data as follows:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14]
That is, it's pretty clear that merely fitting a single line to this data doesn't work; instead, the slope changes at some point in the data. (Obviously, one can pinpoint from this data set pretty easily where that change is, but it's not as clear in the set I'm working with, so let's ignore that.) Something with the derivative, I'm guessing, but the point here is that I want to treat the change point as a free parameter, so I can say "it's this point, +/- this uncertainty, and here is the linear slope before and after this point."
Note, I can do this with an array if it's easier. Thanks!
You need to find two slopes (i.e., take two derivatives). First, find the slope between every pair of consecutive points (using numpy):
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 14], dtype=float)
m = np.diff(y)/np.diff(x)
print (m)
# [ 1. 1. 1. 1. 1. 2. 2. 2. 2.]
Clearly, the slope changes from 1 to 2 in the sixth interval (between the sixth and seventh points). Now take the derivative of this array, which tells you where the slope changes:
print (np.diff(m))
[ 0. 0. 0. 0. 1. 0. 0. 0.]
To find the index of the non-zero value:
idx = np.nonzero(np.diff(m))[0]
print (idx)
# [4]
Each np.diff shortens the array by one, and indices start from zero in Python, so idx + 1 = 5 is the zero-based index of the point where the slope changes, i.e., the sixth point (x = 6): the slope is different before and after it.
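If you also want the two slopes themselves, a minimal sketch (assuming a single clean break, as in this data) is to split the arrays at that point and fit each piece with np.polyfit:
kink = idx[0] + 1                                   # zero-based index of the point where the slope changes (x = 6)
m1, b1 = np.polyfit(x[:kink + 1], y[:kink + 1], 1)  # line before (and including) the kink
m2, b2 = np.polyfit(x[kink:], y[kink:], 1)          # line from the kink onwards
print(m1, m2)  # approximately 1.0 and 2.0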
I'm not sure I fully understand what you want, but you can see the evolution this way (first derivative):
>>> y = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14]
>>> dy=[y[i+1]-y[i] for i in range(len(y)-1)]
>>> dy
[1, 1, 1, 1, 1, 2, 2, 2, 2]
and then find the point where it changes (second derivative):
>>> dpy=[dy[i+1]-dy[i] for i in range(len(dy)-1)]
>>> dpy
[0, 0, 0, 0, 1, 0, 0, 0]
If you want the index of this point:
>>> dpy.index(1)
4
That gives you the value of y just before the slope changes:
>>> change=dpy.index(1)
>>> y[change]
5
In your y = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14] the non-zero second difference is at index 4 (list indexing starts at 0), where y is 5; the slope itself changes at the next point, index 5, where y is 6.
You can calculate the slope between each pair of consecutive points (the first derivative), then check where the slope changes (the second derivative). If it changes, append the index location to idx, the collection of points where the slope changes.
Note that the first point has no slope of its own. The second point gives you the first slope, but you need the third point before you can measure a change in slope.
idx = []
prior_slope = float(y[1] - y[0]) / (x[1] - x[0])
for n in range(2, len(x)):  # start from the third point
    slope = float(y[n] - y[n - 1]) / (x[n] - x[n - 1])
    if slope != prior_slope:
        idx.append(n)
    prior_slope = slope
>>> idx
[6]
Of course this could be done more efficiently in pandas or NumPy, but I am just giving you a simple plain-Python (Python 2 compatible) solution.
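For reference, a rough NumPy sketch of the same idea (not part of the original answer) compares consecutive slopes and keeps the point indices where they differ:
import numpy as np

slopes = np.diff(y) / np.diff(x)                     # slope between each consecutive pair of points
idx = (np.nonzero(np.diff(slopes))[0] + 2).tolist()  # indices n where slope(n-1, n) != slope(n-2, n-1)
print(idx)  # [6]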
A simple conditional list comprehension should also be pretty efficient, although it is more difficult to understand.
idx = [n for n in range(2, len(x))
       if float(y[n] - y[n - 1]) / (x[n] - x[n - 1])
       != float(y[n - 1] - y[n - 2]) / (x[n - 1] - x[n - 2])]
Knee point detection might be a solution, for example with the kneed package:
from kneed import KneeLocator
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 14])
kn = KneeLocator(x, y, curve='convex', direction='increasing')
# You can use array y to automatically determine 'convex' and 'increasing' if y is well-behaved
idx = (np.abs(x - kn.knee)).argmin()
>>> print(x[idx], y[idx])
6 6
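If you want the change point treated as a free parameter with an uncertainty, as the original question asks, a hedged sketch (not from any of the answers above) is to fit a continuous piecewise-linear model with scipy.optimize.curve_fit and read the standard errors off the covariance matrix:
import numpy as np
from scipy.optimize import curve_fit

def piecewise_linear(x, x0, y0, k1, k2):
    # two line segments that meet at (x0, y0): slope k1 before, k2 after
    return np.where(x < x0, y0 + k1 * (x - x0), y0 + k2 * (x - x0))

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 14], dtype=float)

params, cov = curve_fit(piecewise_linear, x, y, p0=[5.0, 5.0, 1.0, 1.0])
x0, y0, k1, k2 = params
errs = np.sqrt(np.diag(cov))
print('break at x0 = %.2f +/- %.2f, slopes %.2f and %.2f' % (x0, errs[0], k1, k2))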
Related
Say I have multiple different vectors of the same length
Example:
1: [1, 2, 3, 4]
2: [5, 6, 7, 8]
3: [3, 8, 9, 10]
4: [6, 9, 12, 3]
And I want to figure out the optimal integer coefficients for these vectors such that the sum of the vectors is closest to a respective specified goal vector.
Goal Vector: [55,101,115,60]
Assuming the combination only involves adding arrays together (no subtraction), how would I go about doing this? Are there any Python libraries (numpy, scikit, etc.) that would help me do this? I suspect there is a linear algebra solution.
Example Combination Answer: [3, 3, 3, 1, 2, 4, 1, 1, 1, 2, 3, 4]
where each value refers to one of the vectors above. (This is just a random example.)
You could write your problem as a system of linear equations:
a*arr1[0] + b*arr2[0] + c*arr3[0] + d*arr4[0] = res[0]
a*arr1[1] + b*arr2[1] + c*arr3[1] + d*arr4[1] = res[1]
a*arr1[2] + b*arr2[2] + c*arr3[2] + d*arr4[2] = res[2]
a*arr1[3] + b*arr2[3] + c*arr3[3] + d*arr4[3] = res[3]
# for non-negative a, b, c, d
which you could then solve directly, if an exact solution exists.
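As a quick sketch of that step (not part of the original answer), np.linalg.solve gives the unique real solution of the square system; note that it is not guaranteed to be non-negative or integer, in which case it is not directly usable here:
import numpy as np

# columns are the four vectors, rows are the four equations
M = np.array([[1, 5, 3, 6],
              [2, 6, 8, 9],
              [3, 7, 9, 12],
              [4, 8, 10, 3]], dtype=float)
goal = np.array([55, 101, 115, 60], dtype=float)

coeffs = np.linalg.solve(M, goal)
print(coeffs)  # real-valued; may be negative or non-integer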
If there is no exact solution, scipy provides a method, scipy.optimize.nnls, that computes the non-negative least-squares solution to a linear matrix equation.
from scipy import optimize
import numpy as np
arr1 = [1, 2, 3, 4]
arr2 = [5, 6, 7, 8]
arr3 = [3, 8, 9, 10]
arr4 = [6, 9, 12, 3]
res = [55,101,115,60]
a = np.array([
    [arr1[0], arr2[0], arr3[0], arr4[0]],
    [arr1[1], arr2[1], arr3[1], arr4[1]],
    [arr1[2], arr2[2], arr3[2], arr4[2]],
    [arr1[3], arr2[3], arr3[3], arr4[3]]
])
solution, _ = optimize.nnls(a, res)
print('Coefficients before Rounding', solution)
solution = solution.round()
print('Coefficients after Rounding', solution)
print('Results', [arr1[i]*solution[0] + arr2[i]*solution[1] + arr3[i]*solution[2] + arr4[i]*solution[3] for i in range(4)])
This would print
Coefficients before Rounding [0. 0.1915493 3.83943662 6.98826291]
Coefficients after Rounding [0. 0. 4. 7.]
Results [54.0, 95.0, 120.0, 61.0]
Pretty close, isn't it?
It could indeed happen that this is not the optimal integer solution, but, as discussed in this thread, "integer problems are not even simple to solve" (@seberg).
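If the rounded coefficients are not accurate enough, one option (a sketch, not from the thread) is a small brute-force search over non-negative integer combinations near the rounded nnls solution, reusing a, res and solution from the code above:
import itertools

target = np.array(res, dtype=float)
base = solution.astype(int)   # rounded nnls coefficients, here [0, 0, 4, 7]

best, best_err = None, np.inf
# try every integer offset in [-2, 2] per coefficient, keeping all coefficients non-negative
for delta in itertools.product(range(-2, 3), repeat=len(base)):
    cand = base + np.array(delta)
    if (cand < 0).any():
        continue
    err = np.linalg.norm(a @ cand - target)
    if err < best_err:
        best, best_err = cand, err

print('Best integer coefficients:', best, 'with error', best_err)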
Is there a more efficient way of determining the averages of a certain area in a given numpy array? For simplicity, let's say I have a 5x5 array:
values = np.array([[0, 1, 2, 3, 4],
                   [1, 2, 3, 4, 5],
                   [2, 3, 4, 5, 6],
                   [3, 4, 5, 6, 7],
                   [4, 5, 6, 7, 8]])
I would like to get the average around each coordinate, for a specified area size, assuming the array wraps around. Let's say the area size is 2, so anything within (Manhattan) distance 2 of a given point is considered. For example, to get the average of the area around coordinate (2, 2), we need to consider
      2,
   2, 3, 4,
2, 3, 4, 5, 6
   4, 5, 6,
      6,
Thus, the average will be 4.
For coordinate (4, 4) we need to consider:
      6,
   6, 7, 3,
6, 7, 8, 4, 5
   3, 4, 0,
      5,
Thus the average will be 4.92.
Currently, I have the following code, but since it uses nested for loops I feel like it could be improved. Is there a way to do this with just numpy built-in functions?
For example, is there a way to use np.vectorize to gather the subarrays (the areas) into one array, and then use np.einsum or something similar?
import numpy as np

def get_average(matrix, loc, dist):
    total = 0
    num = 0
    size = matrix.shape[0]  # square matrix assumed
    # walk the diamond-shaped area (Manhattan distance <= dist), wrapping around the edges
    for y in range(-dist, dist + 1):
        for x in range(-dist + abs(y), dist - abs(y) + 1):
            y_ = (y + loc.y) % size
            x_ = (x + loc.x) % size
            total += matrix[y_, x_]
            num += 1
    return total / num

class Coord():
    def __init__(self, x, y):
        self.x = x
        self.y = y

values = np.array([[0, 1, 2, 3, 4],
                   [1, 2, 3, 4, 5],
                   [2, 3, 4, 5, 6],
                   [3, 4, 5, 6, 7],
                   [4, 5, 6, 7, 8]])

height, width = values.shape
averages = np.zeros((height, width), dtype=np.float16)
for r in range(height):
    for c in range(width):
        loc = Coord(c, r)
        averages[r][c] = get_average(values, loc, 2)
print(averages)
Output:
[[ 3.07617188 2.92382812 3.5390625 4.15234375 4. ]
[ 2.92382812 2.76953125 3.38476562 4. 3.84570312]
[ 3.5390625 3.38476562 4. 4.6171875 4.4609375 ]
[ 4.15234375 4. 4.6171875 5.23046875 5.078125 ]
[ 4. 3.84570312 4.4609375 5.078125 4.921875 ]]
This solution is less efficient (slower) than yours, but it is just an example of using the numpy.ma module.
Required libraries:
import numpy as np
import numpy.ma as ma
Define methods to do the job:
# build the shape of the area as a rhomboid
def rhomboid2(dim):
    size = 2*dim + 1
    matrix = np.ones((size, size))
    for y in range(-dim, dim + 1):
        for x in range(-dim + abs(y), dim - abs(y) + 1):
            matrix[(y + dim) % size, (x + dim) % size] = 0
    return matrix

# build a mask using the shaped area
def mask(matrix_shape, rhom_dim):
    mask = np.zeros(matrix_shape)
    bound = 2*rhom_dim + 1
    rhom = rhomboid2(rhom_dim)
    mask[0:bound, 0:bound] = rhom
    # roll to set the position of the rhomboid to (0, 0)
    mask = np.roll(mask, -rhom_dim, axis=0)
    mask = np.roll(mask, -rhom_dim, axis=1)
    return mask
Then, iterate to build the result:
mask_ = mask((5, 5), 2)  # mask sized like the values array, with a rhomboid area of size 2
averages = np.zeros_like(values, dtype=np.float16)  # initialize the recipient

# iterate over the mask to calculate the averages
for y in range(len(mask_)):
    for x in range(len(mask_)):
        masked = ma.array(values, mask=mask_)
        averages[y, x] = np.mean(masked)
        mask_ = np.roll(mask_, 1, axis=1)
    mask_ = np.roll(mask_, 1, axis=0)
Which returns
# [[3.076 2.924 3.54 4.152 4. ]
# [2.924 2.77 3.385 4. 3.846]
# [3.54 3.385 4. 4.617 4.46 ]
# [4.152 4. 4.617 5.23 5.08 ]
# [4. 3.846 4.46 5.08 4.92 ]]
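A fully vectorized alternative (just a sketch, not part of the answer above) is to convolve values with a diamond-shaped footprint of ones using scipy.ndimage.convolve in 'wrap' mode, then divide by the number of cells in the footprint; this reproduces the averages above without any Python loops:
import numpy as np
from scipy.ndimage import convolve

dist = 2
yy, xx = np.mgrid[-dist:dist + 1, -dist:dist + 1]
footprint = ((np.abs(yy) + np.abs(xx)) <= dist).astype(float)  # 5x5 diamond, 13 ones

# 'wrap' makes the neighbourhood wrap around the edges, matching the question
averages = convolve(values.astype(float), footprint, mode='wrap') / footprint.sum()
print(averages)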
As shown in the following code, I have a chunk list x and the full list h. I want to assign the values stored in x back into the correct positions of h.
index = 0
for t1 in range(lbp, ubp):
    h[4 + t1] = x[index]
    index = index + 1
Does anyone know how to write it in a single line/expression?
Disclaimer: this is part of a bigger project and I simplified the question as much as possible. You can expect the sizes to be correct, but if you think I am missing something, please ask. For testing you can use the following variable values:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp = 2
ubp = 4
You can use slice assignment on the left-hand side to assign your x list directly to that range of indices in h, e.g.:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp = 2
ubp = 4
h[4 + lbp:4 + ubp] = x # or better yet h[4 + lbp:4 + lbp + len(x)] = x
print(h)
# [1, 2, 3, 4, 5, 6, 20, 21, 9, 10]
I'm not really sure why you are adding 4 to the indices in your loop, nor what lbp and ubp are supposed to mean, though. Keep in mind that with a plain slice like this, the list you assign does not have to be the same length as the slice; if the lengths differ, h simply grows or shrinks. Only extended slices (with a step) require matching lengths.
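To illustrate that caveat (a quick sketch): a plain slice silently resizes the list when the lengths differ, whereas an extended slice does not:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

h[6:8] = [20, 21, 22]   # plain slice, 2 positions replaced by 3 values: the list grows
print(h)                # [1, 2, 3, 4, 5, 6, 20, 21, 22, 9, 10]

h[6:9] = [20, 21]       # plain slice, 3 positions replaced by 2 values: the list shrinks back
print(h)                # [1, 2, 3, 4, 5, 6, 20, 21, 9, 10]

h[::2] = [20, 21]       # extended slice: raises ValueError, sizes must match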
import numpy
square = numpy.reshape(range(0, 16), (4, 4))
square
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In the above array, how do I access the primary diagonal and secondary diagonal of any given element, for example 9?
By primary diagonal, I mean [4, 9, 14];
by secondary diagonal, I mean [3, 6, 9, 12].
I can't just use numpy.diag(), because that operates on the whole array rather than on the diagonals passing through a given element.
Based on your description, you can do this with np.where, np.diagonal and np.fliplr:
import numpy as np

x, y = np.where(square == 9)
x, y = x[0], y[0]   # row and column index of the (first) match
np.diagonal(square, offset=-(x - y))
# array([ 4,  9, 14])

x, y = np.where(np.fliplr(square) == 9)
x, y = x[0], y[0]
np.diagonal(np.fliplr(square), offset=-(x - y))
# array([ 3,  6,  9, 12])
For the first diagonal, use the fact that both the x coordinate and the y coordinate increase by 1 at each step:
def first_diagonal(x, y, length_array):
    if x < y:
        return zip(range(x, length_array), range(length_array - x))
    else:
        return zip(range(length_array - y), range(y, length_array))
For the secondary diagonal, use the fact that x_coordinate + y_coordinate is constant:
def second_diagonal(x, y, length_array):
    tot = x + y
    return zip(range(tot + 1), range(tot, -1, -1))
This gives you two sequences of index pairs you can use to access your matrix.
Of course, if you have a non-square matrix, these functions will have to be adapted a bit.
To illustrate how to get the desired output:
a = np.reshape(range(0,16),(4,4))
first = first_diagonal(1, 2, len(a))
second = second_diagonal(1,2, len(a))
primary_diagonal = [a[i[0]][i[1]] for i in first]
secondary_diagonal = [a[i[0]][i[1]] for i in second]
print(primary_diagonal)
print(secondary_diagonal)
this outputs:
[4, 9, 14]
[3, 6, 9, 12]
I tried implementing the distance measure shown in the image in Python, as follows:
import numpy as np
A = [1, 2, 3, 4, 5, 6, 7, 8, 1]
B = [1, 2, 3, 2, 4, 6, 7, 8, 2]
A = np.asarray(A).flatten()
B = np.asarray(B).flatten()
x = np.sum(1 - np.divide((1 + np.minimum(A, B)), (1 + np.maximum(A, B))))
print("Distance: {}".format(x))
but after testing, it doesn't seem to be the right approach. The maximum value returned when there's no similarity at all between the given vectors should be 1, with 0 meaning perfect similarity. A and B in the image are both vectors of size m.
Edit: I forgot to add that I ignored the min(A, B) < 0 case, as that won't ever happen for my purposes.
This should work. First, we create a matrix AB by stacking A and B as columns, and compute the element-wise minimum vector AB_min and maximum vector AB_max from it. Then we compute D as you defined it, using numpy.where to handle the two cases. After that, we sum the elements to get D_proposed as you defined it. This gives a value of 0.9 for this example.
import numpy as np

A = [1, 2, 3, 4, 5, 6, 7, 8, 1]
B = [1, 2, 3, 2, 4, 6, 7, 8, 2]

AB = np.column_stack((A, B))
AB_min = np.min(AB, 1)  # element-wise minimum of A and B
AB_max = np.max(AB, 1)  # element-wise maximum of A and B
print(AB_min)
print(AB_max)

# the two branches of the distance definition, selected with np.where
D = np.where(AB_min >= 0.,
             1. - (1. + AB_min) / (1. + AB_max),
             1. - (1. + AB_min + abs(AB_min)) / (1. + AB_max + abs(AB_min)))
print(D)

D_proposed = np.sum(D)
print(D_proposed)
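Note that D_proposed is a plain sum over the m elements, so it is not bounded by 1. If the measure in the image is meant to be normalized by the vector length (an assumption, since the image is not shown here), dividing by m keeps the result in [0, 1):
# assumption: normalize by the vector length m so the distance stays in [0, 1)
m = len(A)
D_normalized = np.sum(D) / m
print(D_normalized)  # 0.1 for this example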