The objective is to find the point of intersection of two linear equations. These two linear equations are derived using NumPy's polyfit function.
Given two time series (xLeft, yLeft) and (xRight, yRight), the linear least squares fit to each of them was calculated using polyfit as shown below:
xLeft = [
6168, 6169, 6170, 6171, 6172, 6173, 6174, 6175, 6176, 6177,
6178, 6179, 6180, 6181, 6182, 6183, 6184, 6185, 6186, 6187
]
yLeft = [
0.98288751, 1.3639959, 1.7550986, 2.1539073, 2.5580614,
2.9651523, 3.3727503, 3.7784295, 4.1797948, 4.5745049,
4.9602985, 5.3350167, 5.6966233, 6.0432272, 6.3730989,
6.6846867, 6.9766307, 7.2477727, 7.4971657, 7.7240791
]
xRight = [
6210, 6211, 6212, 6213, 6214, 6215, 6216, 6217, 6218, 6219,
6220, 6221, 6222, 6223, 6224, 6225, 6226, 6227, 6228, 6229,
6230, 6231, 6232, 6233, 6234, 6235, 6236, 6237, 6238, 6239,
6240, 6241, 6242, 6243, 6244, 6245, 6246, 6247, 6248, 6249,
6250, 6251, 6252, 6253, 6254, 6255, 6256, 6257, 6258, 6259,
6260, 6261, 6262, 6263, 6264, 6265, 6266, 6267, 6268, 6269,
6270, 6271, 6272, 6273, 6274, 6275, 6276, 6277, 6278, 6279,
6280, 6281, 6282, 6283, 6284, 6285, 6286, 6287, 6288]
yRight = [
7.8625913, 7.7713094, 7.6833806, 7.5997391, 7.5211883,
7.4483986, 7.3819046, 7.3221073, 7.2692747, 7.223547,
7.1849418, 7.1533613, 7.1286001, 7.1103559, 7.0982385,
7.0917811, 7.0904517, 7.0936642, 7.100791, 7.1111741,
7.124136, 7.1389918, 7.1550579, 7.1716633, 7.1881566,
7.2039142, 7.218349, 7.2309117, 7.2410989, 7.248455,
7.2525721, 7.2530937, 7.249711, 7.2421637, 7.2302341,
7.213747, 7.1925621, 7.1665707, 7.1356878, 7.0998487,
7.0590014, 7.0131001, 6.9621005, 6.9059525, 6.8445964,
6.7779589, 6.7059474, 6.6284504, 6.5453324, 6.4564347,
6.3615761, 6.2605534, 6.1531439, 6.0391097, 5.9182019,
5.7901659, 5.6547484, 5.5117044, 5.360805, 5.2018456,
5.034656, 4.8591075, 4.6751242, 4.4826899, 4.281858,
4.0727611, 3.8556159, 3.6307325, 3.3985188, 3.1594861,
2.9142516, 2.6635408, 2.4081881, 2.1491354, 1.8874279,
1.6242117, 1.3607255, 1.0982931, 0.83831298
]
import numpy as np

left_line = np.polyfit(xLeft, yLeft, 1)
right_line = np.polyfit(xRight, yRight, 1)
In this case, polyfit returns the coefficients m and b of y = m*x + b, in that order (highest degree first). Setting the two fitted lines equal, m1*x + b1 = m2*x + b2, the intersection can then be calculated as follows:
x0 = -(left_line[1] - right_line[1]) / (left_line[0] - right_line[0])
y0 = x0 * left_line[0] + left_line[1]
However, I wonder whether there is a NumPy built-in approach to calculate the last two steps?
Not exactly a built-in approach, but you can simplify the problem. Say I have lines given by y = m1 * x + b1 and y = m2 * x + b2. You can trivially find an equation for their difference, which is also a line:
y = (m1 - m2) * x + (b1 - b2)
Notice that this line will have a root at the intersection of the two original lines, if they intersect. You can use the numpy.polynomial.Polynomial class to perform these operations:
>>> (np.polynomial.Polynomial(left_line[::-1]) - np.polynomial.Polynomial(right_line[::-1])).roots()
array([6192.0710885])
Notice that I had to swap the order of the coefficients, since Polynomial expects them from lowest to highest degree, while np.polyfit returns the opposite. In fact, np.polyfit is no longer recommended in favor of the newer polynomial API. Instead, you can get Polynomial objects directly using the np.polynomial.Polynomial.fit class method. Your code would then look like:
left_line = np.polynomial.Polynomial.fit(xLeft, yLeft, 1, domain=[-1, 1])
right_line = np.polynomial.Polynomial.fit(xRight, yRight, 1, domain=[-1, 1])
x0 = (left_line - right_line).roots()
y0 = left_line(x0)
The domain is mapped to the window [-1, 1]. If you do not specify a domain, the span of the x-values will be used instead. You do not want this here, since it rescales and shifts the inputs, so the stored coefficients no longer correspond to the unscaled m and b. Instead, we explicitly specify that the domain [-1, 1] maps to the same window, which makes the mapping the identity. An alternative would be to use the default domain and set e.g. window=[xLeft.min(), xLeft.max()]. The problem with that approach is that it would create different domains for the two polynomials, preventing the operation left_line - right_line.
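A minimal sketch of what the mapping does, using made-up data (the names x, y, p_scaled, p_raw are just for this illustration):

import numpy as np

x = np.arange(6168, 6188)
y = 0.35 * x - 2100.0  # points lying exactly on a hypothetical line

# default: the x-range is mapped onto the window [-1, 1], so the stored
# coefficients describe the scaled variable, not the original x
p_scaled = np.polynomial.Polynomial.fit(x, y, 1)
print(p_scaled.coef)  # not the unscaled [b, m]

# identity mapping: coefficients are the raw intercept and slope
p_raw = np.polynomial.Polynomial.fit(x, y, 1, domain=[-1, 1])
print(p_raw.coef)  # approximately [-2100.0, 0.35]

# convert() undoes the scaling of the default fit
print(p_scaled.convert().coef)  # also approximately [-2100.0, 0.35]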
See https://numpy.org/doc/stable/reference/routines.polynomials.classes.html for more information.
You can model it as a linear system and use simple linear algebra:
def get_intersection(m1, b1, m2, b2):
    # solve the linear system A @ X = b, where X = [x, y]'
    A = np.array([[-m1, 1], [-m2, 1]])
    b = np.array([[b1], [b2]])
    X = np.linalg.pinv(A) @ b
    x, y = np.round(np.squeeze(X), 4)
    return x, y  # point of intersection (x, y) with 4 decimal precision
# np.polyfit returns the coefficients highest-degree first: [m, b]
m1, b1, m2, b2 = left_line[0], left_line[1], right_line[0], right_line[1]
print(get_intersection(m1, b1, m2, b2))
As an example, for the lines y - x = 1 and y + x = 1, we expect the intersection at (0, 1):
m1, b1, m2, b2 = 1, 1, -1, 1
print(get_intersection(m1, b1, m2, b2))
Output: (0.0, 1.0) as expected.
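A side note on the design choice: np.linalg.pinv still returns a (least-squares) answer when the lines are parallel and A is singular, whereas for a well-conditioned 2x2 system np.linalg.solve is the more direct call. A minimal sketch of that variant:

import numpy as np

def get_intersection_solve(m1, b1, m2, b2):
    # solve A @ X = b exactly instead of via the pseudo-inverse
    A = np.array([[-m1, 1], [-m2, 1]])
    b = np.array([b1, b2])
    x, y = np.linalg.solve(A, b)
    return x, y

print(get_intersection_solve(1, 1, -1, 1))  # x = 0.0, y = 1.0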
I need to calculate all unit vectors between two sets of points.
Currently I have this:
def all_unit_vectors(points_a, points_b):
    results = np.zeros((len(points_a) * len(points_b), 3), dtype=np.float32)
    count = 0
    for pt_a in points_a:
        for pt_b in points_b:
            results[count] = (pt_a - pt_b) / np.linalg.norm([pt_a, pt_b])
            count += 1
    return results
in_a = np.array([[51.34, 63.68, 7.98],
[53.16, 63.23, 7.19],
[77.50, 62.55, 4.23],
[79.54, 62.73, 3.61]])
in_b = np.array([[105.58, 61.09, 5.50],
[107.37, 60.66, 6.50],
[130.73, 58.30, 12.33],
[132.32, 58.48, 13.38]])
results = all_unit_vectors(in_a, in_b)
print(results)
which (correctly) outputs:
[[-0.368511 0.01759667 0.01684932]
[-0.3777128 0.02035861 0.00997707]
[-0.47964868 0.03250422 -0.02628129]
[-0.4851439 0.03115273 -0.03235091]
[-0.3551545 0.01449887 0.01145004]
[-0.3644423 0.01727756 0.00463872]
[-0.46762046 0.02971985 -0.03098581]
[-0.4732132 0.02839518 -0.03700341]
[-0.17814296 0.00926242 -0.00805704]
[-0.18821244 0.01190899 -0.01430339]
[-0.3044056 0.02430441 -0.04632135]
[-0.31113514 0.0230996 -0.05193153]
[-0.16408844 0.0103343 -0.01190965]
[-0.1741932 0.01295652 -0.01808905]
[-0.29113463 0.02519489 -0.04959355]
[-0.29793915 0.02399093 -0.05515092]]
Can the loops in all_unit_vectors() be vectorized?
The norm is calculated as the square root of the sum of squares. You can implement your own norm calculation as follows, and then vectorize your solution with broadcasting:
# pairwise differences, flattened to shape (len(in_a) * len(in_b), 3)
diff = (in_a[:, None] - in_b).reshape(-1, 3)
# same norm as np.linalg.norm([pt_a, pt_b]) in the loop: all six squared components
norm = ((in_a[:, None] ** 2 + in_b ** 2).sum(2) ** 0.5).reshape(-1, 1)
diff / norm
gives:
[[-0.36851098 0.01759667 0.01684932]
[-0.3777128 0.02035861 0.00997706]
[-0.47964868 0.03250422 -0.02628129]
[-0.4851439 0.03115273 -0.03235091]
[-0.35515452 0.01449887 0.01145004]
[-0.36444229 0.01727756 0.00463872]
[-0.46762047 0.02971985 -0.03098581]
[-0.4732132 0.02839518 -0.03700341]
[-0.17814297 0.00926242 -0.00805704]
[-0.18821243 0.01190899 -0.01430339]
[-0.30440561 0.02430441 -0.04632135]
[-0.31113513 0.0230996 -0.05193153]
[-0.16408845 0.0103343 -0.01190965]
[-0.1741932 0.01295652 -0.01808905]
[-0.29113461 0.02519489 -0.04959355]
[-0.29793917 0.02399093 -0.05515092]]
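One caveat worth flagging: np.linalg.norm([pt_a, pt_b]) in the original loop takes the norm over both points stacked together, which the broadcast version above reproduces exactly. If you instead want each difference divided by its own Euclidean length (true unit vectors), np.linalg.norm accepts axis and keepdims arguments:

diff = (in_a[:, None] - in_b).reshape(-1, 3)
unit = diff / np.linalg.norm(diff, axis=1, keepdims=True)  # each row has length 1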
Consider the below (3, 13) np.array:
import numpy as np
import pandas as pd
from scipy.stats import linregress
a = [-0.00845,-0.00568,-0.01286,-0.01302,-0.02212,-0.01501,-0.02132,-0.00783,-0.00942,0.00158,-0.00016,0.01422,0.01241]
b = [0.00115,0.00623,0.00160,0.00660,0.00951,0.01258,0.00787,0.01854,0.01462,0.01479,0.00980,0.00607,-0.00106]
c = [-0.00233,-0.00467,0.00000,0.00000,-0.00952,-0.00949,-0.00958,-0.01696,-0.02212,-0.01006,-0.00270,0.00763,0.01005]
array = np.array([a,b,c])
yvalues = pd.to_datetime(['2019-12-15', '2019-12-16', '2019-12-17', '2019-12-18', '2019-12-19',
                          '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26',
                          '2019-12-29', '2019-12-30', '2019-12-31'], errors='coerce')
I can run the OLS regression on one row at a time successfully, as below:
out = linregress(array[0], y=yvalues.to_julian_date())
print(out)
LinregressResult(slope=329.141087037396, intercept=2458842.411731361, rvalue=0.684426534581417, pvalue=0.009863937200252878, stderr=105.71465449878443)
However, what I wish to accomplish is to run the regression on the whole matrix, with the y variable (yvalues) held constant for all rows, in one go (a loop is a possible solution, but tiresome). I tried to extend yvalues to match the array's shape with np.tile, but it seems not to be the right approach. Thank you all for your help.
IIUC you are looking for something like the following list comprehension in a vectorized way:
out = [linregress(array[i], y=yvalues.to_julian_date()) for i in range(array.shape[0])]
out
[LinregressResult(slope=329.141087037396, intercept=2458842.411731361, rvalue=0.684426534581417, pvalue=0.009863937200252876, stderr=105.71465449878443),
LinregressResult(slope=178.44888292241782, intercept=2458838.7056912296, rvalue=0.1911788042719021, pvalue=0.5315353013148307, stderr=276.24376878908953),
LinregressResult(slope=106.86168938856262, intercept=2458840.7656617565, rvalue=0.17721031419860186, pvalue=0.5624701260912525, stderr=178.940293876864)]
To be honest, I've never seen what you are looking for implemented using scipy or statsmodels functionality. Therefore, we can implement it ourselves, exploiting numpy broadcasting:
x = array
y = np.array(yvalues.to_julian_date())

# means of the inputs and outputs
x_mean = np.mean(x, axis=1)
y_mean = np.mean(y)

# least-squares formulas for the slope and intercept, applied row-wise
num = np.sum((x - x_mean[:, np.newaxis]) * (y - y_mean)[np.newaxis, :], axis=1)
den = np.sum((x - x_mean[:, np.newaxis]) ** 2, axis=1)
slopes = num / den
intercepts = y_mean - slopes * x_mean
slopes
array([329.14108704, 178.44888292, 106.86168939])
intercepts
array([2458842.41173136, 2458838.70569123, 2458840.76566176])
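As a quick sanity check, the broadcast results match the per-row linregress output (reusing array and yvalues from the question):

for i in range(array.shape[0]):
    res = linregress(array[i], y=yvalues.to_julian_date())
    assert np.isclose(slopes[i], res.slope)
    assert np.isclose(intercepts[i], res.intercept)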
I want to calculate the root mean square of a function in Python. My function is of the simple form y = f(x), where x and y are arrays.
I looked through the NumPy and SciPy docs and couldn't find anything.
I'm going to assume that you want to compute the expression given by the following pseudocode:
ms = 0
for i = 1 ... N
    ms = ms + y[i]^2
ms = ms / N
rms = sqrt(ms)
i.e. the square root of the mean of the squared values of elements of y.
In numpy, you can simply square y, take its mean and then its square root as follows:
rms = np.sqrt(np.mean(y**2))
So, for example:
>>> y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 1]) # Six 1's
>>> y.size
10
>>> np.mean(y**2)
0.59999999999999998
>>> np.sqrt(np.mean(y**2))
0.7745966692414834
Do clarify your question if you mean to ask something else.
You could use the sklearn function mean_squared_error with squared=False; comparing y_actual against an all-zero prediction makes the result the RMS of y_actual:
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_actual,[0 for _ in y_actual], squared=False)
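In newer scikit-learn versions (1.4+) the squared argument is deprecated, and there is a dedicated function instead; a sketch, assuming your installed version has it:

from sklearn.metrics import root_mean_squared_error
rmse = root_mean_squared_error(y_actual, [0 for _ in y_actual])  # RMS of y_actual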
numpy.std(x) tends to rms(x) as mean(x) tends to 0 (thanks to @Seb), since rms(x)**2 == mean(x)**2 + std(x)**2; this is typical of sound recordings, vibrations, and other signals that fluctuate around zero. A plain-Python RMS for comparison:
rms = lambda x_seq: (sum(x * x for x in x_seq) / len(x_seq)) ** 0.5
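A small numpy demonstration of that identity (the signal here is made up for illustration):

import numpy as np

x = np.sin(np.linspace(0, 20 * np.pi, 10_000))  # zero-mean test signal
print(np.std(x))               # ~0.7071
print(np.sqrt(np.mean(x**2)))  # ~0.7071, equal because mean(x) is ~0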
In case you'd like to frame your array before computing the RMS, this is a numpy solution:
nframes = 1000
rms = np.array([
    np.sqrt(np.mean(frame**2))
    for frame in np.array_split(arr, nframes)
])
If you'd like to specify the frame length instead of the frame count, you'd do this first:
frame_length = 200
arr_length = arr.shape[0]
nframes = arr_length // frame_length + 1
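Putting the two snippets together on a made-up signal (note that np.array_split allows the frames to differ in length by one sample):

import numpy as np

arr = np.sin(np.linspace(0, 20 * np.pi, 10_000))  # hypothetical 1-D signal
frame_length = 200
nframes = arr.shape[0] // frame_length + 1
rms = np.array([
    np.sqrt(np.mean(frame**2))
    for frame in np.array_split(arr, nframes)
])
print(rms.shape)  # (51,)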