Find the point of intersection of two linear equations using NumPy - Python
The objective is to find the point of intersection of two linear equations. These two linear equations are derived using NumPy's polyfit function.
Given two time series (xLeft, yLeft) and (xRight, yRight), the linear least squares fit to each of them was calculated using polyfit as shown below:
xLeft = [
6168, 6169, 6170, 6171, 6172, 6173, 6174, 6175, 6176, 6177,
6178, 6179, 6180, 6181, 6182, 6183, 6184, 6185, 6186, 6187
]
yLeft = [
0.98288751, 1.3639959, 1.7550986, 2.1539073, 2.5580614,
2.9651523, 3.3727503, 3.7784295, 4.1797948, 4.5745049,
4.9602985, 5.3350167, 5.6966233, 6.0432272, 6.3730989,
6.6846867, 6.9766307, 7.2477727, 7.4971657, 7.7240791
]
xRight = [
6210, 6211, 6212, 6213, 6214, 6215, 6216, 6217, 6218, 6219,
6220, 6221, 6222, 6223, 6224, 6225, 6226, 6227, 6228, 6229,
6230, 6231, 6232, 6233, 6234, 6235, 6236, 6237, 6238, 6239,
6240, 6241, 6242, 6243, 6244, 6245, 6246, 6247, 6248, 6249,
6250, 6251, 6252, 6253, 6254, 6255, 6256, 6257, 6258, 6259,
6260, 6261, 6262, 6263, 6264, 6265, 6266, 6267, 6268, 6269,
6270, 6271, 6272, 6273, 6274, 6275, 6276, 6277, 6278, 6279,
6280, 6281, 6282, 6283, 6284, 6285, 6286, 6287, 6288
]
yRight = [
7.8625913, 7.7713094, 7.6833806, 7.5997391, 7.5211883,
7.4483986, 7.3819046, 7.3221073, 7.2692747, 7.223547,
7.1849418, 7.1533613, 7.1286001, 7.1103559, 7.0982385,
7.0917811, 7.0904517, 7.0936642, 7.100791, 7.1111741,
7.124136, 7.1389918, 7.1550579, 7.1716633, 7.1881566,
7.2039142, 7.218349, 7.2309117, 7.2410989, 7.248455,
7.2525721, 7.2530937, 7.249711, 7.2421637, 7.2302341,
7.213747, 7.1925621, 7.1665707, 7.1356878, 7.0998487,
7.0590014, 7.0131001, 6.9621005, 6.9059525, 6.8445964,
6.7779589, 6.7059474, 6.6284504, 6.5453324, 6.4564347,
6.3615761, 6.2605534, 6.1531439, 6.0391097, 5.9182019,
5.7901659, 5.6547484, 5.5117044, 5.360805, 5.2018456,
5.034656, 4.8591075, 4.6751242, 4.4826899, 4.281858,
4.0727611, 3.8556159, 3.6307325, 3.3985188, 3.1594861,
2.9142516, 2.6635408, 2.4081881, 2.1491354, 1.8874279,
1.6242117, 1.3607255, 1.0982931, 0.83831298
]
left_line = np.polyfit(xLeft, yLeft, 1)
right_line = np.polyfit(xRight, yRight, 1)
In this case, polyfit outputs the coefficients m and b of y = mx + b, in that order (highest degree first).
Setting m_left * x + b_left = m_right * x + b_right and solving for x, the intersection of the two lines can then be calculated as follows:
x0 = -(left_line[1] - right_line[1]) / (left_line[0] - right_line[0])
y0 = x0 * left_line[0] + left_line[1]
However, I wonder whether there exists a NumPy built-in approach to calculate the last two steps?
Not exactly a built-in approach, but you can simplify the problem. Say I have lines given by y = m1 * x + b1 and y = m2 * x + b2. You can trivially find an equation for their difference, which is also a line:
y = (m1 - m2) * x + (b1 - b2)
Notice that this line will have a root at the intersection of the two original lines, if they intersect. You can use the numpy.polynomial.Polynomial class to perform these operations:
>>> (np.polynomial.Polynomial(left_line[::-1]) - np.polynomial.Polynomial(right_line[::-1])).roots()
array([6192.0710885])
Notice that I had to reverse the order of the coefficients, since Polynomial expects them from lowest to highest degree, while np.polyfit returns the opposite. In fact, np.polyfit is no longer recommended. Instead, you can get Polynomial objects directly using the np.polynomial.Polynomial.fit class method. Your code would then look like:
left_line = np.polynomial.Polynomial.fit(xLeft, yLeft, 1, domain=[-1, 1])
right_line = np.polynomial.Polynomial.fit(xRight, yRight, 1, domain=[-1, 1])
x0 = (left_line - right_line).roots()
y0 = left_line(x0)
Polynomial.fit maps the domain to the window [-1, 1]. If you do not specify a domain, the peak-to-peak range of the x-values is used instead, so the fitted coefficients refer to rescaled x-values rather than the originals. By explicitly setting domain=[-1, 1], the mapping from domain to window is the identity, and the coefficients stay in terms of the original x. An alternative would be to use the default domain and set e.g. window=[xLeft.min(), xLeft.max()]. The problem with that approach is that the two polynomials would then have different domains, preventing the operation left_line - right_line.
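For what it's worth, here is a rough sketch of a related route (not part of the original answer): fit with the default domain and then use Polynomial.convert() to express both fits in unscaled x before subtracting. It assumes the xLeft, yLeft, xRight, yRight lists defined above.
from numpy.polynomial import Polynomial
left_fit = Polynomial.fit(xLeft, yLeft, 1).convert()    # convert() re-expresses the fit in raw x
right_fit = Polynomial.fit(xRight, yRight, 1).convert()
x0 = (left_fit - right_fit).roots()                     # intersection x, a length-1 array
y0 = left_fit(x0)                                       # corresponding y value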
See https://numpy.org/doc/stable/reference/routines.polynomials.classes.html for more information.
You can model it as a linear system and use simple linear algebra:
def get_intersection(m1, b1, m2, b2):
    # Rewrite each line y = m*x + b as -m*x + y = b,
    # then solve the linear system A @ X = b where X = [x, y]'
    A = np.array([[-m1, 1], [-m2, 1]])
    b = np.array([[b1], [b2]])
    X = np.linalg.pinv(A) @ b
    x, y = np.round(np.squeeze(X), 4)
    return x, y  # returns the point of intersection (x, y) with 4-decimal precision
m1, b1, m2, b2 = left_line[0], left_line[1], right_line[0], right_line[1]
print(get_intersection(m1, b1, m2, b2))
As an example, for the lines y - x = 1 and y + x = 1, we expect the intersection at (0, 1):
m1, b1, m2, b2 = 1, 1, -1, 1
print(get_intersection(m1, b1, m2, b2))
Output: (0.0, 1.0) as expected.
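As a side note, since A here is a square 2x2 matrix, np.linalg.solve works as well and raises an error when the lines are parallel, whereas pinv silently returns a least-squares answer. A minimal sketch of that variant:
import numpy as np

def get_intersection_solve(m1, b1, m2, b2):
    # Same 2x2 system A @ X = b; np.linalg.solve raises np.linalg.LinAlgError for parallel lines
    A = np.array([[-m1, 1.0], [-m2, 1.0]])
    b = np.array([b1, b2], dtype=float)
    x, y = np.linalg.solve(A, b)
    return x, y

print(get_intersection_solve(1, 1, -1, 1))  # (0.0, 1.0), matching the example above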
Related
Discrepancy between log_prob and manual calculation
I want to define a multivariate normal distribution with mean [1, 1, 1] and a covariance matrix with 0.3 on the diagonal. After that I want to calculate the log likelihood of the data point [2, 3, 4].
By torch distributions:
import torch
import torch.distributions as td
input_x = torch.tensor([2, 3, 4])
loc = torch.ones(3)
scale = torch.eye(3) * 0.3
mvn = td.MultivariateNormal(loc=loc, scale_tril=scale)
mvn.log_prob(input_x)
tensor(-76.9227)
From scratch, using the formula for the log likelihood,
log p(x) = -1/2 * (k*log(2*pi) + log|Sigma| + (x - mu)' Sigma^-1 (x - mu)),
we obtain:
import numpy as np
first_term = (2 * np.pi * 0.3)**3
first_term = -np.log(np.sqrt(first_term))
x_center = input_x - loc
tmp = torch.matmul(x_center, scale.inverse())
tmp = -1/2 * torch.matmul(tmp, x_center)
first_term + tmp
tensor(-24.2842)
where I used the fact that (2*pi)^3 * |Sigma| = (2*pi*0.3)^3, since Sigma = 0.3 * I. My question is: what is the source of this discrepancy?
You are passing the covariance matrix to scale_tril instead of covariance_matrix. From the docs of PyTorch's MultivariateNormal:
scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal
So, replacing scale_tril with covariance_matrix yields the same result as your manual attempt:
In [1]: mvn = td.MultivariateNormal(loc=loc, covariance_matrix=scale)
In [2]: mvn.log_prob(input_x)
Out[2]: tensor(-24.2842)
However, according to the authors it is more efficient to use scale_tril: "...Using scale_tril will be more efficient." You can calculate the lower Cholesky factor using torch.linalg.cholesky:
In [3]: mvn = td.MultivariateNormal(loc=loc, scale_tril=torch.linalg.cholesky(scale))
In [4]: mvn.log_prob(input_x)
Out[4]: tensor(-24.2842)
Python: Dendrogram with SciPy doesn't work
I want to use the dendrogram of SciPy. I have the following data: a list with seven different means, for example:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
Each mean is calculated for a different user, for example:
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
My aim is to display the data described above with the help of a dendrogram. I tried the following:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
# Attempt with a matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
Z = linkage(Y)
# Plot the dendrogram with the results above
dendrogram(Z, leaf_rotation=45., leaf_font_size=12., show_contracted=True)
plt.style.use("seaborn-whitegrid")
plt.title("Dendogram to find clusters")
plt.ylabel("Distance")
plt.show()
But it says:
ValueError: Length n of condensed distance matrix 'y' must be a binomial coefficient, i.e. there must be a k such that (k \choose 2) = n)!
I already tried to convert my data into a matrix with:
# Attempt with a matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
But that doesn't work either. Are there any suggestions? Thanks :-)
The first argument of linkage is either an n x m array, representing n points in m-dimensional space, or a one-dimensional array containing the condensed distance matrix. These are two very different meanings! The first is the raw data, i.e. the observations. The second format assumes that you have already computed all the distances between your observations and are providing these distances to linkage, not the original points.
It looks like you want the first case (raw data), with m = 1. So you must reshape the input to have shape (n, 1). Replace this:
Z = linkage(Y)
with:
Z = linkage(np.reshape(Y, (len(Y), 1)))
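A minimal runnable sketch of that fix (not part of the original answer), assuming the Y list from the question:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]

# Treat the seven means as seven 1-dimensional observations
Z = linkage(np.reshape(Y, (len(Y), 1)))

dendrogram(Z, labels=["user%d" % i for i in range(1, 8)], leaf_rotation=45.)
plt.ylabel("Distance")
plt.show()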
So you are using 7 observations in Y (len(Y) = 7). But as per the documentation of linkage, a one-dimensional input is interpreted as a condensed distance matrix, whose length must be a binomial coefficient: there must be an integer n such that n * (n - 1) / 2 = len(Y). So the length of Y must be such that n works out to a valid integer (lengths 1, 3, 6, 10, ... are valid; 7 is not).
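For completeness, a small sketch (again not from the original answers): if you actually wanted to pass distances rather than raw observations, you would first build a condensed distance matrix, e.g. with scipy.spatial.distance.pdist, whose length is automatically a valid binomial coefficient:
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Rounded values from the question, reshaped to (n, 1) observations
obs = np.reshape([71.4, 0.0, 33.7, 1112.4, 31.6, 34.8, 28.4], (7, 1))

dists = pdist(obs)    # condensed distance matrix of length 7*6/2 = 21
Z = linkage(dists)    # accepted: a 1-D array of this length is a valid condensed matrix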
Derivative of patsy dmatrix with respect to a specific variable
Edit: I now have a candidate solution to my question (see toy example below) -- if you can think of something more robust, please let me know.
I just found out about Python's patsy package for creating design matrices from R-style formulas, and it looks great. My question is this: given a patsy formula, e.g. "(x1 + x2 + x3)**2", is there an easy way to create a design matrix containing the derivative with respect to a particular variable, e.g. "x1"? Here's a toy example:
import numpy as np
import pandas as pd
import patsy
import sympy
import sympy.parsing.sympy_parser as sympy_parser

n_obs = 200
df = pd.DataFrame(np.random.uniform(size=(n_obs, 3)), columns=["x1", "x2", "x3"])
df.describe()

design_matrix = patsy.dmatrix("(I(7*x1) + x2 + x3)**2 + I(x1**2) + I(x1*x2*x3)", df)
design_matrix.design_info.column_names
## ['Intercept', 'I(7 * x1)', 'x2', 'x3', 'I(7 * x1):x2', 'I(7 * x1):x3', 'x2:x3', 'I(x1 ** 2)', 'I(x1 * x2 * x3)']

x1, x2, x3 = sympy.symbols("x1 x2 x3")

def diff_wrt_x1(string):
    return str(sympy.diff(sympy_parser.parse_expr(string), x1))

colnames_to_differentiate = [colname.replace(":", "*").replace("Intercept", "1").replace("I", "")
                             for colname in design_matrix.design_info.column_names]
derivatives_wrt_x1 = [diff_wrt_x1(colname) for colname in colnames_to_differentiate]

def get_column(string):
    try:
        return float(string) * np.ones((len(df), 1))  # For cases like string == "7"
    except ValueError:
        return patsy.dmatrix("0 + I(%s)" % string, df)  # For cases like string == "x2*x3"

derivative_columns = tuple(get_column(derivative_string) for derivative_string in derivatives_wrt_x1)
design_matrix_derivative = np.hstack(derivative_columns)
design_matrix_derivative[0]  # Contains [0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]

design_matrix_derivative_manual = np.zeros_like(design_matrix_derivative)
design_matrix_derivative_manual[:, 1] = 7.0
design_matrix_derivative_manual[:, 4] = 7 * df["x2"]
design_matrix_derivative_manual[:, 5] = 7 * df["x3"]
design_matrix_derivative_manual[:, 7] = 2 * df["x1"]
design_matrix_derivative_manual[:, 8] = df["x2"] * df["x3"]

np.all(np.isclose(design_matrix_derivative, design_matrix_derivative_manual))  # True!
The code generates a design matrix with columns [1, 7*x1, x2, x3, 7*x1*x2, 7*x1*x3, x2*x3, x1^2, x1*x2*x3]. Suppose I want a new formula which differentiates design_matrix with respect to x1. The desired result is a matrix of the same shape as design_matrix, but whose columns are [0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]. Is there a programmatic way to do that? I've tried searching the patsy docs as well as Stack Overflow and I don't see anything. Of course I can create the derivative matrix manually, but it would be great to have a function that does it (and that doesn't have to be updated when I change the formula to, say, "(x1 + x2 + x3 + x4)**2 + I(x1**3)").
Python Rbf gives singular matrix error with no duplicate coordinates, why?
Very similar to RBF interpolation fails: LinAlgError: singular matrix, but I think the problem is different, as I have no duplicated coordinates. Toy example:
import numpy as np
import scipy.interpolate as interp

coords = (np.array([-1, 0, 1]), np.array([-2, 0, 2]), np.array([-1, 0, 1]))
coords_mesh = np.meshgrid(*coords, indexing="ij")
fn_value = np.power(coords_mesh[0], 2) + coords_mesh[1] * coords_mesh[2]  # F(x, y, z)
coords_array = np.vstack([x.flatten() for x in coords_mesh]).T  # Columns are x, y, z

unique_coords_array = np.vstack({tuple(row) for row in coords_array})
unique_coords_array.shape == coords_array.shape  # True, i.e. no duplicate coords

my_grid_interp = interp.RegularGridInterpolator(points=coords, values=fn_value)
my_grid_interp(np.array([0, 0, 0]))  # Runs without error

my_rbf_interp = interp.Rbf(*[x.flatten() for x in coords_mesh], d=fn_value.flatten())
## Error: numpy.linalg.linalg.LinAlgError: singular matrix -- why?
What am I missing? The example above uses the function F(x, y, z) = x^2 + y*z. I'd like to use Rbf to approximate that function. As far as I can tell there are no duplicate coordinates: compare unique_coords_array to coords_array.
I believe the problem is your input:
my_rbf_interp = interp.Rbf(*[x.flatten() for x in coords_mesh], d=fn_value.flatten())
If you change it to:
x, y, z = [x.flatten() for x in coords_mesh]
my_rbf_interp = interp.Rbf(x, y, z, fn_value.flatten())
it should work. I think your original formulation ends up repeating rows in the matrix that is passed to the solver, so you hit a problem very similar to having duplicates (i.e. a singular matrix). Also, if you do:
d = fn_value.flatten()
my_rbf_interp = interp.Rbf(*(x, y, z, d))
it should work as well.
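A self-contained sketch of the fix (not part of the original answer), reusing the toy data from the question:
import numpy as np
import scipy.interpolate as interp

coords = (np.array([-1, 0, 1]), np.array([-2, 0, 2]), np.array([-1, 0, 1]))
coords_mesh = np.meshgrid(*coords, indexing="ij")
fn_value = np.power(coords_mesh[0], 2) + coords_mesh[1] * coords_mesh[2]

# Pass x, y, z and the function values positionally, with the values last
x, y, z = [c.flatten() for c in coords_mesh]
my_rbf_interp = interp.Rbf(x, y, z, fn_value.flatten())
print(my_rbf_interp(0, 0, 0))  # should be close to F(0, 0, 0) = 0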
Find the position of the lowest difference between numpy arrays
I've got two musical files: one lossless with a little sound gap at the beginning (at this time it's just silence, but it could be anything: a sinusoid or just some noise) and one mp3:
In [1]: plt.plot(y[:100000])
In [2]: plt.plot(y2[:100000])
(plots of the two waveforms omitted)
The lists are similar but not identical, so I need to cut this gap, i.e. find the first occurrence of one list in the other with the lowest delta error. And here's my solution (5.7065 sec.):
error = []
for i in range(25000):
    y_n = y[i:100000]
    y2_n = y2[:100000-i]
    error.append(abs(y_n - y2_n).mean())
start = np.array(error).argmin()
print(start, error[start])  # 23057 0.0100046
Is there any pythonic way to solve this?
Edit: After calculating the mean distance between special points (e.g. where the data == 0.5) I reduce the search range from 25000 to 2000. This gives me a reasonable time of 0.3871 s:
a = np.where(y[:100000].round(1) == 0.5)[0]
b = np.where(y2[:100000].round(1) == 0.5)[0]
mean = int((a - b[:len(a)]).mean())
delta = 1000
error = []
for i in range(mean - delta, mean + delta):
    ...
What you are trying to do is a cross-correlation of the two signals. This can be done easily using signal.correlate from the scipy library:
import scipy.signal
import numpy as np

# limit your signal length to speed things up
lim = 25000

# do the actual correlation
corr = scipy.signal.correlate(y[:lim], y2[:lim], mode='full')

# The offset is the maximum of your correlation array,
# itself being offset by (lim - 1):
offset = np.argmax(corr) - (lim - 1)
You might want to take a look at this answer to a similar problem.
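A quick self-contained check of this approach (with synthetic data, since the audio from the question is not available):
import numpy as np
import scipy.signal

rng = np.random.default_rng(0)
lim = 25000
true_gap = 2305

clean = rng.standard_normal(lim)
delayed = np.concatenate([np.zeros(true_gap), clean])[:lim]  # same signal behind a silent gap

corr = scipy.signal.correlate(delayed, clean, mode='full')
print(np.argmax(corr) - (lim - 1))  # should print 2305, the length of the silent gap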
Let's generate some data first:
N = 1000
y1 = np.random.randn(N)
y2 = y1 + np.random.randn(N) * 0.05
y2[0:int(N / 10)] = 0
In these data, y1 and y2 are almost the same (note the small added noise), but the first 10% of y2 are empty (similarly to your example). We can now calculate the absolute difference between the two vectors and find the first element for which the absolute difference is below a sensitivity threshold:
abs_delta = np.abs(y1 - y2)
THRESHOLD = 1e-2
sel = abs_delta < THRESHOLD
ix_start = np.where(sel)[0][0]

fig, axes = plt.subplots(3, 1)
ax = axes[0]
ax.plot(y1, '-')
ax.set_title('y1')
ax.axvline(ix_start, color='red')

ax = axes[1]
ax.plot(y2, '-')
ax.axvline(ix_start, color='red')
ax.set_title('y2')

ax = axes[2]
ax.plot(abs_delta)
ax.axvline(ix_start, color='red')
ax.set_title('abs diff')
This method works if the overlapping parts are indeed "almost identical". You will have to think of smarter alignment methods if the similarity is low.
I think what you are looking for is correlation. Here is a small example:
import numpy as np

equal_part = [0, 1, 2, 3, -2, -4, 5, 0]
y1 = equal_part + [0, 1, 2, 3, -2, -4, 5, 0]
y2 = [1, 2, 4, -3, -2, -1, 3, 2] + y1

np.argmax(np.correlate(y1, y2, 'same'))
Out: 7
So this returns the time difference at which the correlation between both signals is at its maximum. As you can see, in the example the time difference should be 8, but this depends on your data... Also note that both signals should have the same length.