Can someone explain to me the code that is in the documentation, specifically this:
Interpolation with periodic x-coordinates:
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
np.interp(x, xp, fp, period=360)
array([7.5 , 5. , 8.75, 6.25, 3. ,
3.25, 3.5 , 3.75])
I did a trial like this
import matplotlib.pyplot as plt
import numpy as np
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
y=np.interp(x, xp, fp, period=360)
print(x)
print(y)
plt.grid()
plt.plot(xp, fp)
#plt.scatter(x,y,marker="o",color="green")
plt.plot(x,y,'o')
plt.show()
and the plot looks like this:
How the orange points can be considered "interpolations" is beyond me. They are not even on the curve.
EDIT: Thanks to Warren Weckesser for the detailed explanation!
A plot to see it better
The numbers used in the example that demonstrates the use of period in the interp docstring can be a bit difficult to interpret in a plot. Here's what is happening...
The period is 360, and the given "known" points are
xp = [190, -190, 350, -350]
fp = [ 5, 10, 3, 4]
Note that the values in xp span an interval longer than 360. Let's consider the interval [0, 360) to be the fundamental domain of the interpolator. If we map the given points to the fundamental domain, they are:
xp1 = [190, 170, 350, 10]
fp1 = [ 5, 10, 3, 4]
Now for a periodic interpolator, we can imagine this data being extended periodically in the positive and negative directions, e.g.
xp_ext = [..., 190-360, 170-360, 350-360, 10-360, 190, 170, 350, 10, 190+360, 170+360, 350+360, 10+360, ...]
fp_ext = [..., 5, 10, 3, 4, 5, 10, 3, 4, 5, 10, 3, 4, ...]
It is this extended data that interp is interpolating.
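A quick sketch (not part of the original example) that maps the known points into the fundamental domain and checks the first interpolated value by hand:
import numpy as np

xp = np.array([190, -190, 350, -350])
fp = np.array([5, 10, 3, 4])

# Map the known points into the fundamental domain [0, 360)
print(np.mod(xp, 360))                         # [190 170 350  10]

# Check one value by hand: x = -180 maps to 180, which lies midway between
# 170 (fp = 10) and 190 (fp = 5), so the interpolated value is 7.5
print(np.interp([-180], xp, fp, period=360))   # [7.5]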
Here's a script that replaces the array x from the example with a dense set of points. With this dense set, the plot of y = np.interp(x, xp, fp, period=360) should make clearer what is going on:
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
x = np.linspace(-360, 720, 1200)
y = np.interp(x, xp, fp, period=360)
plt.plot(x, y, '--')
plt.plot(xp, fp, 'ko')
plt.grid(True)
Each "corner" in the plot is at a point in the periodically extended version of (xp, fp).
I'm able to calculate a rolling correlation coefficient for a 1D-array (data against [0, 1, 2, 3, 4]) using a loop.
I'm looking for a smarter solution using numpy (not pandas).
Here is my current code:
import numpy as np
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7,4,8,-2,-3,-4,-6,-2,0,10,0,5,8])
x = np.zeros_like(data).astype('float32')
length = 5
for i in range(length, data.shape[0]):
    x[i] = np.corrcoef(data[i - length:i], np.arange(length))[0, 1]
print(x)
x gives:
[ 0. 0. 0. 0. 0. 0.607 0.959 0.98 0.328 -0.287
-0.61 -0.314 -0.18 -0.8 -0.782 -0.847 -0.811 -0.825 -0.869 -0.283
0.566 0.863 0.643 0.454]
Any solution without the loop please?
Use a numpy.lib.stride_tricks.sliding_window_view (available in numpy v1.20.0+)
swindow = np.lib.stride_tricks.sliding_window_view(data, (length,))
which gives a view on the data array that looks like so:
array([[10, 5, 8, 9, 15],
[ 5, 8, 9, 15, 22],
[ 8, 9, 15, 22, 26],
[ 9, 15, 22, 26, 11],
[15, 22, 26, 11, 15],
[22, 26, 11, 15, 16],
[26, 11, 15, 16, 18],
[11, 15, 16, 18, 7],
[15, 16, 18, 7, 4],
[16, 18, 7, 4, 8],
[18, 7, 4, 8, -2],
[ 7, 4, 8, -2, -3],
[ 4, 8, -2, -3, -4],
[ 8, -2, -3, -4, -6],
[-2, -3, -4, -6, -2],
[-3, -4, -6, -2, 0],
[-4, -6, -2, 0, 10],
[-6, -2, 0, 10, 0],
[-2, 0, 10, 0, 5],
[ 0, 10, 0, 5, 8]])
Now, we want to apply the correlation coefficient calculation to each row of this array. Unfortunately, np.corrcoef doesn't take an axis argument; it applies the calculation to the entire matrix and doesn't provide a way to do so for each row/column.
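To illustrate (this demo is mine, not part of the original answer): for an (m, n) input, np.corrcoef treats every row as a separate variable and returns the full m x m correlation matrix:
import numpy as np

a = np.random.rand(20, 5)
# All pairwise row correlations, not one coefficient per row vs. a target
print(np.corrcoef(a).shape)   # (20, 20)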
However, the correlation coefficient of two vectors x and y is quite simple to compute directly. Writing xm and ym for their means, it is

    r = sum((x - xm) * (y - ym)) / sqrt(sum((x - xm)**2) * sum((y - ym)**2))

Applying that here:
def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d
Now, call this function with our array and arange:
cc = vec_corrcoef(swindow, np.arange(length))
which gives the desired result:
array([ 0.60697698, 0.95894955, 0.98 , 0.3279521 , -0.28709766,
-0.61035663, -0.31390158, -0.17995394, -0.80041656, -0.78192905,
-0.84702587, -0.81091772, -0.82464375, -0.86892667, -0.28347335,
0.56568542, 0.86304424, 0.64326752, 0.45374261, 0.38135638])
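As a quick sanity check (my addition), the vectorized results match np.corrcoef applied to each window in a loop:
# Compare the vectorized result against the row-by-row computation
expected = np.array([np.corrcoef(row, np.arange(length))[0, 1]
                     for row in swindow])
print(np.allclose(cc, expected))   # True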
To get your x, just set the appropriate indices of a zeros array of the correct size.
Note: I think your x should contain nonzero values starting at index 4 (because that's where the sliding window is first full) instead of starting at index 5.
x = np.zeros(data.shape)
x[-len(cc):] = cc
If you are sure that your values should start at index 5, then you can do:
x = np.zeros(data.shape)
x[length:] = cc[:-1] # Ignore the last value in cc
Comparing the runtimes of your original approach with those suggested in the answers here:
f_OP_loopy is your approach, which implements a sliding window using a loop
f_PH_numpy is my approach, which uses the sliding_window_view and the vectorized function for row-wise calculation of the vector correlation coefficient
f_RA_numpy is Rontogiannis's approach, which tiles the arange, calculates the correlation coefficient for the entire matrices, and only selects the first len(data) - length rows of the last column
f_RA_recur is Rontogiannis's recursive approach, but I didn't time this because it misses out on the last correlation coefficient.
Unsurprisingly, the numpy-only solution is faster than the loopy approach.
My numpy solution, which computes the row-wise correlation coefficient, is faster than the one shown by Rontogiannis below, because it avoids the extra work of tiling the vector input and calculating the correlation of the entire matrix only to discard the unwanted elements.
As the input data size increases, this "extra work" in Rontogiannis's approach grows so much that its runtime is worse even than the loopy approach! I am unsure whether this extra time is spent in the np.corrcoef calculation or in the np.tile operation.
Note: This plot was obtained on my 2.2GHz i7 Macbook Air with 8GB RAM, Python 3.10.7 and numpy 1.23.3. Similar results were obtained on Google Colab
If you're interested in the timing code, here it is:
import timeit
import numpy as np
from matplotlib import pyplot as plt
def time_funcs(funcs, sizes, arg_gen, N=20):
    times = np.zeros((len(sizes), len(funcs)))
    gdict = globals().copy()
    for i, s in enumerate(sizes):
        args = arg_gen(s)
        print(args)
        for j, f in enumerate(funcs):
            gdict.update(locals())
            try:
                times[i, j] = timeit.timeit("f(*args)", globals=gdict, number=N) / N
                print(f"{i}/{len(sizes)}, {j}/{len(funcs)}, {times[i, j]}")
            except ValueError:
                print(f"ERROR in {f}, with args=", *args)
    return times

def plot_times(times, funcs):
    fig, ax = plt.subplots()
    for j, f in enumerate(funcs):
        ax.plot(sizes, times[:, j], label=f.__name__)
    ax.set_xlabel("Array size")
    ax.set_ylabel("Time per function call (s)")
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.legend()
    ax.grid()
    fig.tight_layout()
    return fig, ax

#%%
def arg_gen(n):
    return [np.random.randint(-100, 100, (n,)), 5]

#%%
def f_OP_loopy(data, length):
    x = np.zeros_like(data).astype('float32')
    for i in range(length-1, data.shape[0]):
        x[i] = np.corrcoef(data[i - length + 1:i+1], np.arange(length))[0, 1]
    return x

def f_PH_numpy(data, length):
    swindow = np.lib.stride_tricks.sliding_window_view(data, (length,))
    cc = vec_corrcoef(swindow, np.arange(length))
    x = np.zeros(data.shape)
    x[-len(cc):] = cc
    return x

def f_RA_recur(data, length):
    return np.concatenate((
        np.zeros([length,]),
        rolling_correlation_recurse(data, 0, length)
    ))

def f_RA_numpy(data, length):
    n = len(data)
    cc = np.corrcoef(np.lib.stride_tricks.sliding_window_view(data, length),
                     np.tile(np.arange(length), (n-length+1, 1)))[:n-length+1, -1]
    x = np.zeros(data.shape)
    x[-len(cc):] = cc
    return x

#%%
def rolling_correlation_recurse(data, i, length):
    assert i+length < data.size
    left = np.array([np.corrcoef(data[i:i+length], np.arange(length))[0, 1]])
    if i+length+1 == data.size:
        return left
    right = rolling_correlation_recurse(data, i+1, length)
    return np.concatenate((left, right))

def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d

#%%
if __name__ == "__main__":
    #%% Set up sim
    sizes = [5, 10, 50, 100, 500, 1000, 5000, 10_000]  # , 50_000, 100_000]
    funcs = [f_OP_loopy,  # f_RA_recur,
             f_PH_numpy, f_RA_numpy]

    #%% Run timing
    time_fcalls = np.zeros((len(sizes), len(funcs))) * np.nan
    time_fcalls = time_funcs(funcs, sizes, arg_gen)
    fig, ax = plot_times(time_fcalls, funcs)
    ax.set_xlabel("Input size")
    plt.show()
    input("Enter x to exit")
Ask and you shall receive. Here is a solution that uses recursion:
import numpy as np
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7,4,8,-2,-3,-4,-6,-2,0,10,0,5,8])
length = 5
def rolling_correlation_recurse(data, i, length):
    assert i+length < data.size
    left = np.array([np.corrcoef(data[i:i+length], np.arange(length))[0, 1]])
    if i+length+1 == data.size:
        return left
    right = rolling_correlation_recurse(data, i+1, length)
    return np.concatenate((left, right))

def rolling_correlation(data, length):
    return np.concatenate((
        np.zeros([length,]),
        rolling_correlation_recurse(data, 0, length)
    ))
print(rolling_correlation(data, length))
Edit: here is a numpy solution too:
n = len(data)
print(np.corrcoef(np.lib.stride_tricks.sliding_window_view(data, length), np.tile(np.arange(length), (n-length+1, 1)))[:n-length+1, -1])
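To spell out what that one-liner does (my annotation, not part of the original answer): it stacks the windows and the tiled arange rows into one matrix, computes the full pairwise correlation matrix, and keeps only the "window i vs. arange" entries:
n = len(data)
windows = np.lib.stride_tricks.sliding_window_view(data, length)   # (n-length+1, length)
tiled = np.tile(np.arange(length), (n-length+1, 1))                # same shape
# corrcoef treats each of the 2*(n-length+1) rows as a variable; the last
# column holds the correlation of every row with the final arange copy
full = np.corrcoef(windows, tiled)
print(full[:n-length+1, -1])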
I have an array of shape (6416, 17, 3). I am trying to plot each (17, 3) entry one after another in a 3D grid, as if it were a video. This is the code I wrote for the visualizer function:
def draw_limbs_3d(ax, joints_3d, limb_parents):
    # ax.clear()
    for i in range(joints_3d.shape[0]):
        x_pair = [joints_3d[i, 0], joints_3d[limb_parents[i], 0]]
        y_pair = [joints_3d[i, 1], joints_3d[limb_parents[i], 1]]
        z_pair = [joints_3d[i, 2], joints_3d[limb_parents[i], 2]]
        ax.plot(x_pair, y_pair, zs=z_pair, linewidth=3)
def visualizer(joints_3d):
    joint_parents = [16, 15, 1, 2, 3, 1, 5, 6, 14, 8, 9, 14, 11, 12, 14, 14, 1]
    fig = plt.figure('3D Pose')
    ax_3d = plt.axes(projection='3d')
    plt.ion()
    ax_3d.clear()
    ax_3d.clear()
    ax_3d.view_init(-90, -90)
    ax_3d.set_xlim(-1000, 1000)
    ax_3d.set_ylim(-1000, 1000)
    ax_3d.set_zlim(0, 4000)
    ax_3d.set_xticks([])
    ax_3d.set_yticks([])
    ax_3d.set_zticks([])
    white = (1.0, 1.0, 1.0, 0.0)
    ax_3d.w_xaxis.set_pane_color(white)
    ax_3d.w_yaxis.set_pane_color(white)
    ax_3d.w_xaxis.line.set_color(white)
    ax_3d.w_yaxis.line.set_color(white)
    ax_3d.w_zaxis.line.set_color(white)
    draw_limbs_3d(ax_3d, joints_3d, joint_parents)
and I use this code to run on all entries:
joints_3d = np.load('output.npy')
for joint in joints_3d:
    joint = joint.reshape((17, 3))
    visualizer(joint)
which causes the program to crash. It works for one array though and I get the correct plot. I would be grateful if you could help me. Thank you.
If I type:
microcar(np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]), np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]]))
it returns:
(52.53219888177297, 85.09035245341184, -148.85032037263932, 18.5359684117836, 100, 150.0)
which is good. However, I want it to repeat this calculation for the next set of 3 values and so on, e.g. [110, 10, 8] for the expected and [50, 10, 8] for the actual.
I can't figure out how to incorporate a loop where it treats the next set of 3 values as the new one to look at.
Also, cos(45) = 0.707106 for 45 degrees, but it treats the argument as radians and gives cos(45) = 0.5253. Is there a way to convert the settings to degrees?
Below is my code
import numpy as np
def microcar(expected, actual):
    horizontal_expected = expected[0,1]*expected[0,2]*np.cos(expected[0,0])
    vertical_expected = expected[0,1]*expected[0,2]*np.sin(expected[0,0])
    horizontal_actual = actual[0,1]*actual[0,2]*np.cos(actual[0,0])
    vertical_actual = actual[0,1]*actual[0,2]*np.sin(actual[0,0])
    distance_expected = expected[0,1]*expected[0,2]
    distance_actual = actual[0,1]*actual[0,2]
    return horizontal_expected, vertical_expected, horizontal_actual, vertical_actual, distance_expected, distance_actual
You can zip the inputs and loop over them like this
import numpy as np
def microcar(expected, actual):
    l = zip(expected, actual)
    res = []
    for e in l:
        horizontal_expected = e[0][1]*e[0][2]*np.cos(e[0][0])
        vertical_expected = e[0][1]*e[0][2]*np.sin(e[0][0])
        horizontal_actual = e[1][1]*e[1][2]*np.cos(e[1][0])
        vertical_actual = e[1][1]*e[1][2]*np.sin(e[1][0])
        distance_expected = e[0][1]*e[0][2]
        distance_actual = e[1][1]*e[1][2]
        res.append([
            horizontal_expected,
            vertical_expected,
            horizontal_actual,
            vertical_actual,
            distance_expected,
            distance_actual
        ])
    return res
x = microcar(
np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]),
np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]])
)
print(x)
The output:
[
[52.53219888177297, 85.09035245341184, -148.85032037263932, 18.5359684117836, 100, 150.0],
[-79.92166506517184, -3.539414246805677, 34.88163648998712, -68.08466373406274, 80, 76.5],
[-47.62064902075782, -15.240531055110834, 77.19728227936906, -20.9899882963143, 50, 80.0],
[37.51979008477766, 13.865978219881212, -41.46424579379759, 9.325573481107702, 40, 42.5]
]
I don't know what kind of output you expected, so this simply returns a list of lists with the results.
As for your question about np.cos, it expects input in radians, so you could convert the degrees to radians through np.deg2rad:
import numpy as np
print(np.cos(np.deg2rad(45)))
# 0.7071067811865476
Without using zip, you can create a range equal to the length of one of the arrays and loop over that, using the loop variable i to index into both arrays, in the following way:
import numpy as np
def microcar(expected, actual):
    res = []
    for i in range(len(expected)):
        horizontal_expected = expected[i,1]*expected[i,2]*np.cos(expected[i,0])
        vertical_expected = expected[i,1]*expected[i,2]*np.sin(expected[i,0])
        horizontal_actual = actual[i,1]*actual[i,2]*np.cos(actual[i,0])
        vertical_actual = actual[i,1]*actual[i,2]*np.sin(actual[i,0])
        distance_expected = expected[i,1]*expected[i,2]
        distance_actual = actual[i,1]*actual[i,2]
        res.append([
            horizontal_expected,
            vertical_expected,
            horizontal_actual,
            vertical_actual,
            distance_expected,
            distance_actual
        ])
    return res
x = microcar(
np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]),
np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]])
)
print(x)
Output:
[
[52.53219888177297, 85.09035245341184, -148.85032037263932, 18.5359684117836, 100, 150.0],
[-79.92166506517184, -3.539414246805677, 34.88163648998712, -68.08466373406274, 80, 76.5],
[-47.62064902075782, -15.240531055110834, 77.19728227936906, -20.9899882963143, 50, 80.0],
[37.51979008477766, 13.865978219881212, -41.46424579379759, 9.325573481107702, 40, 42.5]
]
Note that this assumes that both inputs are of equal length. If they are not, you will likely encounter an IndexError exception. This assumption holds for zip as well, but there, you would "lose" the surplus entries in the longer array.
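To illustrate the difference (a small added example with made-up, shorter inputs): zip silently truncates to the shorter input, while the index-based loop raises:
import numpy as np

a = np.array([[45, 10, 10], [110, 10, 8]])
b = np.array([[47, 10, 15]])   # one entry shorter

print(list(zip(a, b)))         # one pair only; the surplus row of a is dropped
# b[1] in the range-based version would raise an IndexError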
I am trying to do a piecewise linear regression in Python, and the data looks like this:
I need to fit three lines, one for each section. Any idea how? I have the following code, but the result is shown below. Any help would be appreciated.
import numpy as np
import matplotlib
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy import optimize
def piecewise(x, x0, x1, y0, y1, k0, k1, k2):
    return np.piecewise(x, [x <= x0, np.logical_and(x0 < x, x < x1), x > x1],
                        [lambda x: k0*x + y0,
                         lambda x: k1*(x-x0) + y1 + k0*x0,
                         lambda x: k2*(x-x1) + y0 + y1 + k0*x0 + k1*(x1-x0)])
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y1 = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y1 = np.flip(y1,0)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y = np.flip(y,0)
perr_min = np.inf
p_best = None
for n in range(100):
    k = np.random.rand(7)*20
    p, e = optimize.curve_fit(piecewise, x1, y1, p0=k)
    perr = np.sum(np.abs(y1 - piecewise(x1, *p)))
    if perr < perr_min:
        perr_min = perr
        p_best = p
xd = np.linspace(0, 21, 100)
plt.figure()
plt.plot(x1, y1, "o")
y_out = piecewise(xd, *p_best)
plt.plot(xd, y_out)
plt.show()
[Plot: data with the fit produced by this code]
Thanks.
A very simple method (without iteration, without initial guess) can solve this problem.
The method comes from page 30 of this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
[Figure: the data with the fitted three-segment line]
The fitted function has the form

    y(x) = a + b*x + p*(x - x0)*H(x - x0) + q*(x - x1)*H(x - x1)

where H is the Heaviside step function and x0, x1 are the breakpoints; the slopes of the three segments are b, b + p, and b + p + q. The details of the numerical calculation are given in the paper referenced above.
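A minimal sketch of the same idea (my addition, assuming the breakpoints x0 and x1 are already known; the paper also shows how to determine them) fits the Heaviside form above with ordinary least squares:
import numpy as np

def heaviside_fit(x, y, x0, x1):
    # Design matrix for y = a + b*x + p*(x - x0)*H(x - x0) + q*(x - x1)*H(x - x1)
    H0 = (x > x0).astype(float)
    H1 = (x > x1).astype(float)
    A = np.column_stack([np.ones_like(x), x, (x - x0) * H0, (x - x1) * H1])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef   # a, b, p, q

x = np.arange(1.0, 22.0)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47,
              98.36, 112.25, 126.14, 140.03, 145, 147, 149, 151, 153, 155])
print(heaviside_fit(x, y, x0=6.0, x1=15.0))   # breakpoints picked by eye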
I wrote IDL code:
zz= [ 0, 5, 10, 15, 30, 50, 90, 100, 500]
uz= [ 20, 20, 20, 30, 60, 90, 30, -200, -200]*(-1.)
zp= findgen(120)*500+500
up= spline((zz-10.),uz,(zp/1000.0))
print, up
and IDL gave me values of the up array ranging from about -20 to 500.
I did the same in Python:
import numpy as npy
zz = npy.array([ 0, 5, 10, 15, 30, 50, 90, 100, 500])
uz = npy.array([ 20, 20, 20, 30, 60, 90, 30, -200, -200])*(-1.)
zp = npy.arange(0,120)*500+500
from scipy.interpolate import interp1d
cubic_interp_u = interp1d(zz-10., uz, kind='cubic')
up = cubic_interp_u(zp/1000)
print(up)
and it gave me up with values from about -20 to -160. Any idea? Thanks in advance!
Actually, I don't see a problem. I'm using UnivariateSpline here instead of interp1d and cubic_interp_u, but the underlying routines are essentially the same, as far as I can tell:
import numpy as npy
from matplotlib import pyplot as pl
from scipy.interpolate import UnivariateSpline
zz = npy.array([ 0, 5, 10, 15, 30, 50, 90, 100, 500])
uz = npy.array([ 20, 20, 20, 30, 60, 90, 30, -200, -200])*(-1.)
zp = npy.arange(0,120)*500+500
pl.plot(zz, uz, 'ro')
pl.plot(zp/100, UnivariateSpline(zz, uz, s=1, k=3)(zp/100), 'k-.')
pl.plot(zp/1000, UnivariateSpline(zz, uz, s=1, k=3)(zp/1000), 'b-')
The only problem I see is that you limited the interpolation range by using zp/1000. Using zp/100, I get lots of values outside that (-160, -20) range, which you can also see on the graph from the dot-dashed line, compared to the blue line (zp/1000):
It looks like scipy is doing a fine job.
By the way, if you want to (spline-)fit such outlying values, you may want to consider working in log-log space instead, or roughly normalizing your data (log-log space kind of does that). Most fitting problems work best if the values are of the same order of magnitude.
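For example (a sketch of the normalization idea, not from the original answer), rescale before fitting and undo the scaling afterwards:
# Standardize x and y to comparable magnitudes before the spline fit,
# then map the evaluation points through the same scaling
xs = (zz - zz.mean()) / zz.std()
ys = (uz - uz.mean()) / uz.std()
spl = UnivariateSpline(xs, ys, s=1, k=3)   # s may need retuning after scaling
up = spl((zp / 1000 - zz.mean()) / zz.std()) * uz.std() + uz.mean()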