How to uniformly resample a non-uniform signal using SciPy? - python

I have an (x, y) signal with non-uniform sample rate in x. (The sample rate is roughly proportional to 1/x). I attempted to uniformly re-sample it using scipy.signal's resample function. From what I understand from the documentation, I could pass it the following arguments:
scipy.resample(array_of_y_values, number_of_sample_points, array_of_x_values)
and it would return the array of
[[resampled_y_values],[new_sample_points]]
I'd expect it to return an uniformly sampled data with a roughly identical form of the original, with the same minimal and maximalx value. But it doesn't:
# nu_data = [[x1, x2, ..., xn], [y1, y2, ..., yn]]
# with x values in ascending order
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
uniform_data = np.array([resampled[1], resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
blue: nu_data, orange: uniform_data
It doesn't look unaltered, and the x scale have been resized too. If I try to fix the range: construct the desired uniform x values myself and use them instead, the distortion remains:
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
delta = (nu_data[0,-1] - nu_data[0,0]) / length
new_samplepoints = np.arange(nu_data[0,0], nu_data[0,-1], delta)
uniform_data = np.array([new_samplepoints, resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
What is the proper way to re-sample my data uniformly, if not this?

Please look at this rough solution:
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = np.array([0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20])
y = np.exp(-x/3.0)
flinear = interpolate.interp1d(x, y)
fcubic = interpolate.interp1d(x, y, kind='cubic')
xnew = np.arange(0.001, 20, 1)
ylinear = flinear(xnew)
ycubic = fcubic(xnew)
plt.plot(x, y, 'X', xnew, ylinear, 'x', xnew, ycubic, 'o')
plt.show()
That is a bit updated example from scipy page. If you execute it, you should see something like this:
Blue crosses are initial function, your signal with non uniform sampling distribution. And there are two results - orange x - representing linear interpolation, and green dots - cubic interpolation. Question is which option you prefer? Personally I don't like both of them, that is why I usually took 4 points and interpolate between them, then another points... to have cubic interpolation without that strange ups. That is much more work, and also I can't see doing it with scipy, so it will be slow. That is why I've asked about size of the data.

Related

How to find the optimized or correct peaks

I have following graph
I am using python's scipy.signal.find_peaks to find the peaks. But I am not sure how do I do that. I did following :
per = np.percentile(x,[70])
peaks_control = findPeaks(x, per[0])
where x is the signal
array([1.07541259e+09, 1.13851049e+09, 1.19241492e+09, 1.23706527e+09,
1.27240093e+09, 1.29836131e+09, 1.31217483e+09, 1.32037296e+09,
1.31908858e+09, 1.30896503e+09, 1.29216550e+09, 1.26958042e+09,
1.24561632e+09, 1.21202121e+09, 1.16869371e+09, 1.11054499e+09,
1.04006154e+09, 9.65663403e+08, 8.87706760e+08, 8.09340093e+08,
7.37568765e+08, 6.79736364e+08, 6.38576457e+08, 6.06062937e+08,
5.80650350e+08, 5.55089744e+08, 5.36334499e+08, 5.20236597e+08,
5.06529837e+08, 4.91825175e+08, 4.77937063e+08, 4.65475058e+08,
4.56520513e+08, 4.48393240e+08, 4.41944988e+08, 4.34822844e+08,
4.33688578e+08, 4.33451049e+08, 4.36256177e+08, 4.33553613e+08,
4.29191142e+08, 4.28492541e+08, 4.24465967e+08, 4.20074825e+08,
4.19935897e+08, 4.16652681e+08, 4.12419580e+08, 4.11747552e+08,
4.08801166e+08, 4.02351981e+08, 3.99620513e+08, 3.98716550e+08,
3.46023077e+08, 3.53969464e+08, 4.17131235e+08, 5.19363869e+08,
6.50956410e+08, 8.01530303e+08, 9.50162937e+08, 1.08249790e+09,
1.18242378e+09, 1.22732168e+09, 1.20123077e+09, 1.21067599e+09,
1.21556410e+09, 1.21272261e+09, 1.20310023e+09, 1.18692774e+09,
1.16694033e+09, 1.14330117e+09, 1.11635338e+09, 1.07947529e+09,
1.03222145e+09, 9.73427972e+08, 9.08558974e+08, 8.39966200e+08,
7.70457343e+08, 7.04976224e+08, 6.49436131e+08, 6.02085548e+08,
5.68915385e+08, 5.41638928e+08, 5.18758741e+08, 5.01973660e+08,
4.88766667e+08, 4.77643823e+08, 4.65681818e+08, 4.56193240e+08,
4.46851515e+08, 4.36135198e+08, 4.32282984e+08, 4.27913520e+08,
4.23408625e+08, 4.24119580e+08, 4.22399068e+08, 4.22415385e+08,
4.20193939e+08, 4.17638462e+08, 4.14822378e+08, 4.10636364e+08,
4.08388345e+08, 4.04844522e+08, 4.00571562e+08, 4.00841026e+08,
4.00764802e+08, 4.00432867e+08, 4.00336364e+08, 4.00724709e+08,
4.03048019e+08, 3.57437995e+08, 3.62371096e+08, 4.16658741e+08,
5.10148019e+08, 6.31750117e+08, 7.65175991e+08, 8.96832168e+08,
1.01666597e+09, 1.10373263e+09, 1.14380816e+09, 1.11629790e+09,
1.12228904e+09, 1.12378788e+09, 1.11974825e+09, 1.10812774e+09,
1.09125035e+09, 1.07033566e+09, 1.04667389e+09, 1.02016830e+09,
9.86036830e+08, 9.42176457e+08, 8.88900233e+08, 8.27962005e+08,
7.64362238e+08, 7.00755245e+08, 6.42390909e+08, 5.92395338e+08,
5.52426107e+08, 5.26319114e+08, 5.03317249e+08, 4.85524942e+08,
4.70421911e+08, 4.59389510e+08, 4.51644988e+08, 4.46288578e+08,
4.41076923e+08, 4.37533566e+08, 4.31993007e+08, 4.28625641e+08,
4.25406294e+08, 4.21161538e+08, 4.19049650e+08, 4.16719347e+08,
4.13124242e+08, 4.08404429e+08, 4.06154545e+08, 4.03386014e+08,
4.00980420e+08, 3.99442657e+08, 3.97792075e+08, 3.95606527e+08,
3.97922378e+08, 3.98345221e+08, 3.96253613e+08, 3.95703030e+08,
3.96108392e+08, 3.67136830e+08, 3.58382051e+08, 3.95844289e+08,
4.70853846e+08, 5.76629837e+08, 6.97682284e+08, 8.21169930e+08,
9.32588112e+08, 1.01885804e+09, 1.06315152e+09, 1.05128159e+09,
1.03944545e+09, 1.03769580e+09, 1.03132145e+09, 1.02008601e+09,
1.00327389e+09, 9.85387646e+08, 9.66403030e+08, 9.44620746e+08,
9.18596737e+08, 8.82269697e+08, 8.37750816e+08, 7.84877156e+08,
7.27590443e+08, 6.70183217e+08, 6.14567832e+08, 5.67404895e+08,
5.30862471e+08, 5.03108625e+08, 4.84348718e+08, 4.68116550e+08,
4.55809907e+08, 4.46616783e+08, 4.39725175e+08, 4.34323077e+08])
The peaks I get are adjacent to each other, as I can see that there is little bump in second, third and forth peak sites.
How should I calculate it and ignore such adjacent ones. To calculate the width, prominecne, etc I need to calculate the peaks. If I know it already I might be able to put some threshold.
As you asked in the comments, I'll provide you an example. Please note, this is only an example, an Exploratory Data Analysis is always needed to choose the best way to reach your goal.
So, let's create some noisy data
import numpy as np
from scipy.signal import find_peaks, periodogram
import matplotlib.pyplot as plt
size = 100
a = np.linspace(1, .5, size)
x = np.linspace(0, 50, size)
np.random.seed(0)
y = a * np.sin(x) + np.random.normal(0, .1, size) + 5
now, let's try to find the peaks with find_peaks from scipy.signal
peaks = find_peaks(y)[0]
plt.plot(x, y)
plt.plot(x[peaks], y[peaks], marker='o', ls='none')
plt.show()
as you can see, there are some "wrong" peaks. We need to set distance argument in find_peaks (see documentation).
Let us suppose that we don't know the distance between the peaks. In this case, we can see that the data are periodic. So we can find the period with a periodogram and use the period as distance in find_peaks
_f, _p = periodogram(y, nfft=2**6)
# calculate the sample rate of x
sample_rate = 1 / np.median(np.diff(x))
periods = 1 / _f[1:] / sample_rate
density = _p[1:] / _p[1:].max()
max_density_idx = density.argmax()
period = periods[max_density_idx]
plt.semilogx(periods, density)
plt.scatter(period, density[max_density_idx], color='r')
plt.title(f"period {period:.2f}")
plt.show()
Now we can use the period as distance argument in find_peaks
peaks = find_peaks(y, distance=period)[0]
plt.plot(x, y)
plt.plot(x[peaks], y[peaks], marker='o', ls='none')
plt.show()
update
In your specific case, it's a little bit different.
Define the signal (I'll call variables X and Y)
Y = np.array([1.07541259e+09, 1.13851049e+09, 1.19241492e+09, 1.23706527e+09, 1.27240093e+09, 1.29836131e+09, 1.31217483e+09, 1.32037296e+09, 1.31908858e+09, 1.30896503e+09, 1.29216550e+09, 1.26958042e+09, 1.24561632e+09, 1.21202121e+09, 1.16869371e+09, 1.11054499e+09, 1.04006154e+09, 9.65663403e+08, 8.87706760e+08, 8.09340093e+08, 7.37568765e+08, 6.79736364e+08, 6.38576457e+08, 6.06062937e+08, 5.80650350e+08, 5.55089744e+08, 5.36334499e+08, 5.20236597e+08, 5.06529837e+08, 4.91825175e+08, 4.77937063e+08, 4.65475058e+08, 4.56520513e+08, 4.48393240e+08, 4.41944988e+08, 4.34822844e+08, 4.33688578e+08, 4.33451049e+08, 4.36256177e+08, 4.33553613e+08, 4.29191142e+08, 4.28492541e+08, 4.24465967e+08, 4.20074825e+08, 4.19935897e+08, 4.16652681e+08, 4.12419580e+08, 4.11747552e+08, 4.08801166e+08, 4.02351981e+08, 3.99620513e+08, 3.98716550e+08, 3.46023077e+08, 3.53969464e+08, 4.17131235e+08, 5.19363869e+08, 6.50956410e+08, 8.01530303e+08, 9.50162937e+08, 1.08249790e+09, 1.18242378e+09, 1.22732168e+09, 1.20123077e+09, 1.21067599e+09, 1.21556410e+09, 1.21272261e+09, 1.20310023e+09, 1.18692774e+09, 1.16694033e+09, 1.14330117e+09, 1.11635338e+09, 1.07947529e+09, 1.03222145e+09, 9.73427972e+08, 9.08558974e+08, 8.39966200e+08, 7.70457343e+08, 7.04976224e+08, 6.49436131e+08, 6.02085548e+08, 5.68915385e+08, 5.41638928e+08, 5.18758741e+08, 5.01973660e+08, 4.88766667e+08, 4.77643823e+08, 4.65681818e+08, 4.56193240e+08, 4.46851515e+08, 4.36135198e+08, 4.32282984e+08, 4.27913520e+08, 4.23408625e+08, 4.24119580e+08, 4.22399068e+08, 4.22415385e+08, 4.20193939e+08, 4.17638462e+08, 4.14822378e+08, 4.10636364e+08, 4.08388345e+08, 4.04844522e+08, 4.00571562e+08, 4.00841026e+08, 4.00764802e+08, 4.00432867e+08, 4.00336364e+08, 4.00724709e+08, 4.03048019e+08, 3.57437995e+08, 3.62371096e+08, 4.16658741e+08, 5.10148019e+08, 6.31750117e+08, 7.65175991e+08, 8.96832168e+08, 1.01666597e+09, 1.10373263e+09, 1.14380816e+09, 1.11629790e+09, 1.12228904e+09, 1.12378788e+09, 1.11974825e+09, 1.10812774e+09, 1.09125035e+09, 1.07033566e+09, 1.04667389e+09, 1.02016830e+09, 9.86036830e+08, 9.42176457e+08, 8.88900233e+08, 8.27962005e+08, 7.64362238e+08, 7.00755245e+08, 6.42390909e+08, 5.92395338e+08, 5.52426107e+08, 5.26319114e+08, 5.03317249e+08, 4.85524942e+08, 4.70421911e+08, 4.59389510e+08, 4.51644988e+08, 4.46288578e+08, 4.41076923e+08, 4.37533566e+08, 4.31993007e+08, 4.28625641e+08, 4.25406294e+08, 4.21161538e+08, 4.19049650e+08, 4.16719347e+08, 4.13124242e+08, 4.08404429e+08, 4.06154545e+08, 4.03386014e+08, 4.00980420e+08, 3.99442657e+08, 3.97792075e+08, 3.95606527e+08, 3.97922378e+08, 3.98345221e+08, 3.96253613e+08, 3.95703030e+08, 3.96108392e+08, 3.67136830e+08, 3.58382051e+08, 3.95844289e+08, 4.70853846e+08, 5.76629837e+08, 6.97682284e+08, 8.21169930e+08, 9.32588112e+08, 1.01885804e+09, 1.06315152e+09, 1.05128159e+09, 1.03944545e+09, 1.03769580e+09, 1.03132145e+09, 1.02008601e+09, 1.00327389e+09, 9.85387646e+08, 9.66403030e+08, 9.44620746e+08, 9.18596737e+08, 8.82269697e+08, 8.37750816e+08, 7.84877156e+08, 7.27590443e+08, 6.70183217e+08, 6.14567832e+08, 5.67404895e+08, 5.30862471e+08, 5.03108625e+08, 4.84348718e+08, 4.68116550e+08, 4.55809907e+08, 4.46616783e+08, 4.39725175e+08, 4.34323077e+08])
X = np.arange(Y.size)
Since Y.size is 200 and in your plot there are 200 secs, I assume that the sample rate is 1 sec.
If we search for the peaks with default distance we find a lot of unwanted peaks
peaks = find_peaks(Y)[0]
plt.plot(X, Y)
plt.plot(X[peaks], Y[peaks], marker='o', ls='none')
plt.show()
Let's do a periodogram
_f, _p = periodogram(Y, nfft=2**12)
# the sample rate of your signal
sample_rate = 1
periods = 1 / _f[1:] / sample_rate
density = _p[1:] / _p[1:].max()
max_density_idx = density.argmax()
period = periods[max_density_idx]
p_peaks_idx = find_peaks(density)[0]
plt.semilogx(periods, density)
plt.scatter(period, density[max_density_idx], color='r')
period_peaks = []
for p_peak in p_peaks_idx:
if density[p_peak] < .1:
continue
period_peaks.append(periods[p_peak])
plt.scatter(periods[p_peak], density[p_peak])
plt.text(periods[p_peak], density[p_peak], f"{periods[p_peak]:.1f} ", ha='right', va='center')
plt.title('periodogram')
plt.show()
We found two main periods
period_peaks
[56.888888888888886, 28.444444444444443]
If we use the higher density period (56.9, the fundamental, or 1st harmonic) we miss a peak
peaks = find_peaks(Y, distance=period_peaks[0])[0]
plt.plot(X, Y)
plt.plot(X[peaks], Y[peaks], marker='o', ls='none')
plt.show()
This could be because
you have too few observations
the periodicity is not constant
If we empirically subtract a quantity (say 10) from the period, we find all the peaks
peaks = find_peaks(Y, distance=period_peaks[0] - 10)[0]
plt.plot(X, Y)
plt.plot(X[peaks], Y[peaks], marker='o', ls='none')
plt.show()
So we got peaks at
X[peaks]
array([ 7, 61, 118, 174])
taking the difference, we see they're not regular (at this sample rate and with these few observations)
np.diff(X[peaks])
array([54, 57, 56])

Radial Heatmap from data sheet

I have a file with 3 columns of data: Zenith (Z, from 0 to 90°) and Azimuth (A, from 0 to 360°). And radiance as the color variable.
I need to use python with matplotlib to plot this data into something resembling this:
This is my code so far (it returns an error):
import matplotlib.pyplot as plt
import numpy as np
# `data` has the following shape:
# [
# [Zenith value going from 0 to 90],
# [Azimuth values (0 to 365) increasing by 1 and looping back after 365],
# [radiance: floats that need to be mapped by the color value]
#]
data = [[6.000e+00 1.200e+01 1.700e+01 2.300e+01 2.800e+01 3.400e+01 3.900e+01
4.500e+01 5.000e+01 5.600e+01 6.200e+01 6.700e+01 7.300e+01 7.800e+01
8.400e+01 8.900e+01 3.934e+01 4.004e+01 4.054e+01 4.114e+01 4.154e+01
4.204e+01 4.254e+01 4.294e+01 4.334e+01 4.374e+01 4.414e+01 4.454e+01
4.494e+01 4.534e+01 4.564e+01 4.604e+01 4.644e+01 4.684e+01 4.714e+01
4.754e+01 4.794e+01 4.824e+01 4.864e+01 4.904e+01 4.944e+01 4.984e+01
5.014e+01 5.054e+01 5.094e+01 5.134e+01 5.174e+01 5.214e+01 5.264e+01
5.304e+01 5.344e+01 5.394e+01 5.444e+01 5.494e+01 5.544e+01 5.604e+01
5.674e+01 5.764e+01]
[1.960e+02 3.600e+01 2.360e+02 7.600e+01 2.760e+02 1.160e+02 3.160e+02
1.560e+02 3.560e+02 1.960e+02 3.600e+01 2.360e+02 7.600e+01 2.760e+02
1.160e+02 3.160e+02 6.500e+00 3.400e+00 3.588e+02 2.500e+00 3.594e+02
3.509e+02 5.000e-01 6.900e+00 1.090e+01 3.478e+02 1.250e+01 1.050e+01
7.300e+00 2.700e+00 3.571e+02 3.507e+02 1.060e+01 3.200e+00 3.556e+02
3.480e+02 7.300e+00 3.597e+02 3.527e+02 1.260e+01 6.600e+00 1.200e+00
3.570e+02 3.538e+02 3.520e+02 3.516e+02 3.528e+02 3.560e+02 1.200e+00
8.800e+00 3.567e+02 1.030e+01 6.800e+00 8.300e+00 3.583e+02 3.581e+02
3.568e+02 3.589e+02]
[3.580e-04 6.100e-04 3.220e-04 4.850e-04 4.360e-04 2.910e-04 1.120e-03
2.320e-04 4.300e-03 2.680e-04 1.700e-03 3.790e-04 7.460e-04 8.190e-04
1.030e-03 3.650e-03 3.050e-03 3.240e-03 3.340e-03 3.410e-03 3.490e-03
3.290e-03 3.630e-03 3.510e-03 3.320e-03 3.270e-03 3.280e-03 3.470e-03
3.720e-03 3.960e-03 3.980e-03 3.700e-03 3.630e-03 4.100e-03 4.080e-03
3.600e-03 3.990e-03 4.530e-03 4.040e-03 3.630e-03 4.130e-03 4.370e-03
4.340e-03 4.210e-03 4.100e-03 4.090e-03 4.190e-03 4.380e-03 4.460e-03
4.080e-03 4.420e-03 3.960e-03 4.230e-03 4.120e-03 4.440e-03 4.420e-03
4.370e-03 4.380e-03]]
rad = data[0]
azm = data[1]
# From what I understand, I need to create a meshgrid from the zenith and azimuth values
r, th = np.meshgrid(rad, azm)
z = data[2] # This doesn't work as `pcolormesh` expects this to be a 2d array
plt.subplot(projection="polar")
plt.pcolormesh(th, r, z, shading="auto")
plt.plot(azm, r, color="k", ls="none")
plt.show()
Note: my actual data goes on for 56k lines and looks like this (Ignore the 4th column):
The example data above is my attempt to reduce the resolution of this massive file, so I only used 1/500 of the lines of data. This might be the wrong way to reduce the resolution, please correct me if it is!
Every tutorial I've seen generate the z value from the r array generated by meshgrid. This is leaving me confused about how I would convert my z column into a 2d array that would properly map to the zenith and azimuth values.
They'll use something like this:
z = (r ** 2.0) / 4.0
So, taking the exact shape of r and applying a transformation to create the color.
The solution was in the data file all along. I needed to better understand what np.meshrid actually did. Turns out the data already is a 2d array, it just needed to be reshaped. I also found a flaw in the file, fixing it reduced its lines from 56k to 15k. This was small enough that I did not need to reduce the resolution.
Here's how I reshaped my data, and what the solution looked like:
import matplotlib.pyplot as plt
import numpy as np
with open("data.txt") as f:
lines = np.array(
[
[float(n) for n in line.split("\t")]
for i, line in enumerate(f.read().splitlines())
]
)
data = [np.reshape(a, (89, 180)) for a in lines.T]
rad = np.radians(data[1])
azm = data[0]
z = data[2]
plt.subplot(projection="polar")
plt.pcolormesh(rad, azm, z, cmap="coolwarm", shading="auto")
plt.colorbar()
plt.show()
The simplest way to plot the given data is with a polar scatter plot.
Using blue for low values and red for high values, it could look like:
import matplotlib.pyplot as plt
import numpy as np
data = [[6.000e+00, 1.200e+01, 1.700e+01, 2.300e+01, 2.800e+01, 3.400e+01, 3.900e+01, 4.500e+01, 5.000e+01, 5.600e+01, 6.200e+01, 6.700e+01, 7.300e+01, 7.800e+01, 8.400e+01, 8.900e+01, 3.934e+01, 4.004e+01, 4.054e+01, 4.114e+01, 4.154e+01, 4.204e+01, 4.254e+01, 4.294e+01, 4.334e+01, 4.374e+01, 4.414e+01, 4.454e+01, 4.494e+01, 4.534e+01, 4.564e+01, 4.604e+01, 4.644e+01, 4.684e+01, 4.714e+01, 4.754e+01, 4.794e+01, 4.824e+01, 4.864e+01, 4.904e+01, 4.944e+01, 4.984e+01, 5.014e+01, 5.054e+01, 5.094e+01, 5.134e+01, 5.174e+01, 5.214e+01, 5.264e+01, 5.304e+01, 5.344e+01, 5.394e+01, 5.444e+01, 5.494e+01, 5.544e+01, 5.604e+01, 5.674e+01, 5.764e+01],
[1.960e+02, 3.600e+01, 2.360e+02, 7.600e+01, 2.760e+02, 1.160e+02, 3.160e+02, 1.560e+02, 3.560e+02, 1.960e+02, 3.600e+01, 2.360e+02, 7.600e+01, 2.760e+02, 1.160e+02, 3.160e+02, 6.500e+00, 3.400e+00, 3.588e+02, 2.500e+00, 3.594e+02, 3.509e+02, 5.000e-01, 6.900e+00, 1.090e+01, 3.478e+02, 1.250e+01, 1.050e+01, 7.300e+00, 2.700e+00, 3.571e+02, 3.507e+02, 1.060e+01, 3.200e+00, 3.556e+02, 3.480e+02, 7.300e+00, 3.597e+02, 3.527e+02, 1.260e+01, 6.600e+00, 1.200e+00, 3.570e+02, 3.538e+02, 3.520e+02, 3.516e+02, 3.528e+02, 3.560e+02, 1.200e+00, 8.800e+00, 3.567e+02, 1.030e+01, 6.800e+00, 8.300e+00, 3.583e+02, 3.581e+02, 3.568e+02, 3.589e+02],
[3.580e-04, 6.100e-04, 3.220e-04, 4.850e-04, 4.360e-04, 2.910e-04, 1.120e-03, 2.320e-04, 4.300e-03, 2.680e-04, 1.700e-03, 3.790e-04, 7.460e-04, 8.190e-04, 1.030e-03, 3.650e-03, 3.050e-03, 3.240e-03, 3.340e-03, 3.410e-03, 3.490e-03, 3.290e-03, 3.630e-03, 3.510e-03, 3.320e-03, 3.270e-03, 3.280e-03, 3.470e-03, 3.720e-03, 3.960e-03, 3.980e-03, 3.700e-03, 3.630e-03, 4.100e-03, 4.080e-03, 3.600e-03, 3.990e-03, 4.530e-03, 4.040e-03, 3.630e-03, 4.130e-03, 4.370e-03, 4.340e-03, 4.210e-03, 4.100e-03, 4.090e-03, 4.190e-03, 4.380e-03, 4.460e-03, 4.080e-03, 4.420e-03, 3.960e-03, 4.230e-03, 4.120e-03, 4.440e-03, 4.420e-03, 4.370e-03, 4.380e-03]]
rad = np.radians(data[1])
azm = data[0]
z = data[2]
plt.subplot(projection="polar")
plt.scatter(rad, azm, c=z, cmap='coolwarm')
plt.colorbar()
plt.show()
Creating such a scatter plot with your real data gives an idea how it looks like. You might want to choose a different colormap, depending on what you want to convey. You also can choose a smaller dot size (for example plt.scatter(rad, azm, c=z, cmap='plasma', s=1, ec='none')) if there would be too many points.
A simple way to create a filled image from non-gridded data uses tricontourf with 256 colors (it looks quite dull with the given data, so I didn't add an example plot):
plt.subplot(projection="polar")
plt.tricontourf(rad, azm, z, levels=256, cmap='coolwarm')

Why doesn't "beta.fit" come out right?

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
observed = [0.294, 0.2955, 0.235, 0.2536, 0.2423, 0.2844, 0.2099, 0.2355, 0.2946, 0.3388, 0.2202, 0.2523, 0.2209, 0.2707, 0.1885, 0.2414, 0.2846, 0.328, 0.2265, 0.2563, 0.2345, 0.2845, 0.1787, 0.2392, 0.2777, 0.3076, 0.2108, 0.2477, 0.234, 0.2696, 0.1839, 0.2344, 0.2872, 0.3224, 0.2152, 0.2593, 0.2295, 0.2702, 0.1876, 0.2331, 0.2809, 0.3316, 0.2099, 0.2814, 0.2174, 0.2516, 0.2029, 0.2282, 0.2697, 0.3424, 0.2259, 0.2626, 0.2187, 0.2502, 0.2161, 0.2194, 0.2628, 0.3296, 0.2323, 0.2557, 0.2215, 0.2383, 0.2166, 0.2315, 0.2757, 0.3163, 0.2311, 0.2479, 0.2199, 0.2418, 0.1938, 0.2394, 0.2718, 0.3297, 0.2346, 0.2523, 0.2262, 0.2481, 0.2118, 0.241, 0.271, 0.3525, 0.2323, 0.2513, 0.2313, 0.2476, 0.232, 0.2295, 0.2645, 0.3386, 0.2334, 0.2631, 0.226, 0.2603, 0.2334, 0.2375, 0.2744, 0.3491, 0.2052, 0.2473, 0.228, 0.2448, 0.2189, 0.2149]
a, b, loc, scale = stats.beta.fit(observed,floc=0,fscale=1)
ax = plt.subplot(111)
ax.hist(observed, alpha=0.75, color='green', bins=104, density=True)
ax.plot(np.linspace(0, 1, 100), stats.beta.pdf(np.linspace(0, 1, 100), a, b))
plt.show()
The α and β is out of whack (α=6.056697373013153,β=409078.57804704335)
The fitting image is also unreasonable. Histograms and beta distributions differ in height on the Y-axis.
The data of average is about 0.25, but calculated according to the expected value of beta distribution, 6.05/(6.05+409078.57)=1.47891162469e-05.This seems counterintuitive.
I think you are messing up a bit the code with whatever your observation is.
The main point to consider is that your beta fit will have both a and b, as well as loc and scale.
If you perform your fit using fixed loc/scale, i.e. scipy.stats.beta.fit(observed, floc=0, fscale=1), then your fitted a and b are: a = 33.26401059422594 and b = 99.0180817184922.
On the other hand, if you perform your fit with variable loc and scale, i.e. scipy.stats.beta.fit(observed), then you must compute / consider scipy.stats.beta.pdf() to include also those as parameter, which are, with your data, a = 6.056697380819225, b = 409078.5780469263, loc = 0.15710752697400227, scale = 6373.831662619217.
According to its documentation, the probability density above is defined in the “standardized” form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, beta.pdf(x, a, b, loc, scale) is identically equivalent to beta.pdf(y, a, b) / scale with y = (x - loc) / scale.
Hence, the theoretical mean/average should be modified accordingly to include the scaling and location transformations.

Scipy interpolate.splprep error "Invalid Inputs"

I am trying to interpolate a curve to a set of (x,y) points using SciPy's interpolate.splprep method, using the procedure followed in this StackOverflow answer. My code (with the data) is given below. Please excuse me for using this large dataset, as the code works perfectly fine on a different dataset. Kindly scroll to the bottom to see the implemetation.
#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
# -----------------------------------------------------------------------------
# Data
xp=np.array([ -1.19824526e-01, -1.19795807e-01, -1.22298912e-01,
-1.24784611e-01, -1.27233423e-01, -1.27048456e-01,
-1.29424259e-01, -1.31781573e-01, -1.34102825e-01,
-1.36386619e-01, -1.41324999e-01, -1.43569618e-01,
-1.48471481e-01, -1.53300646e-01, -1.55387133e-01,
-1.57436481e-01, -1.53938796e-01, -1.58562951e-01,
-1.53139517e-01, -1.50456275e-01, -1.49637920e-01,
-1.48774455e-01, -1.47843528e-01, -1.44278335e-01,
-1.43299274e-01, -1.39716798e-01, -1.36111285e-01,
-1.32534352e-01, -1.28982866e-01, -1.25433151e-01,
-1.21912263e-01, -1.16106245e-01, -1.12701128e-01,
-1.09303316e-01, -1.05947571e-01, -1.00467194e-01,
-9.72083398e-02, -9.39822094e-02, -9.08033710e-02,
-8.96420533e-02, -8.65053261e-02, -8.34162875e-02,
-8.03788778e-02, -7.73929193e-02, -7.62032638e-02,
-7.32655732e-02, -7.03760465e-02, -6.91826390e-02,
-6.63378816e-02, -6.35537275e-02, -6.08302060e-02,
-5.96426925e-02, -5.69864087e-02, -5.43931715e-02,
-5.18641746e-02, -4.93958173e-02, -4.82415854e-02,
-4.58486281e-02, -4.35196817e-02, -4.01162919e-02,
-3.79466513e-02, -3.48161871e-02, -3.18596693e-02,
-2.90650417e-02, -2.64251761e-02, -2.31429101e-02,
-1.94312163e-02, -1.73997964e-02, -1.55068323e-02,
-1.43163160e-02, -1.31800087e-02, -1.20987991e-02,
-1.10708190e-02, -1.05380016e-02, -9.58116017e-03,
-9.06399242e-03, -8.54450012e-03, -7.67847396e-03,
-7.17608354e-03, -6.67181154e-03, -5.89474349e-03,
-5.40878144e-03, -4.92121197e-03, -4.43202070e-03,
-3.94148294e-03, -3.44986011e-03, -2.82410814e-03,
-2.35269319e-03, -1.88058008e-03, -1.47393691e-03,
-9.78376399e-04, -4.82633521e-04, 1.33099164e-05,
5.09212801e-04, 1.05098855e-03, 1.56929991e-03,
2.08706303e-03, 2.72055571e-03, 3.26012954e-03,
3.79870854e-03, 4.33573131e-03, 4.87172652e-03,
5.40640816e-03, 5.93914581e-03, 6.47004490e-03,
6.99921852e-03, 7.52610639e-03, 7.70592714e-03,
8.20559501e-03, 8.70268809e-03, 9.19766855e-03,
9.68963219e-03, 1.01781695e-02, 1.01960805e-02,
1.06577199e-02, 1.11156340e-02, 1.15703286e-02,
1.20215921e-02, 1.24693015e-02, 1.29129042e-02,
1.33526781e-02, 1.37884367e-02, 1.42204360e-02,
1.46473802e-02, 1.50699789e-02, 1.54884533e-02,
1.59020551e-02, 1.63103362e-02, 8.12110387e-02,
7.80794051e-02, 1.67140103e-02, 8.31537241e-02,
7.99472912e-02, 7.99472912e-02, 7.67983984e-02,
1.71128723e-02, 8.50656342e-02, 8.17851028e-02,
7.85638577e-02, 7.53861405e-02, 1.75061328e-02,
8.19411806e-02, 7.38391281e-02, 1.78939640e-02,
8.70866930e-02, 8.36940292e-02, 8.03586974e-02,
7.70534244e-02, 7.70534244e-02, 7.38013540e-02,
7.38013540e-02, 7.06147796e-02, 1.82766038e-02,
8.54279559e-02, 8.20231372e-02, 7.53294330e-02,
7.20765174e-02, 1.86539411e-02, 8.36524496e-02,
7.85095832e-02, 7.51592888e-02, 7.18792721e-02,
1.90250409e-02, 7.82997201e-02, 7.49183992e-02,
7.49183992e-02, 7.16144248e-02, 7.16144248e-02,
6.83771846e-02, 1.93904576e-02, 7.46192919e-02,
7.12865685e-02, 7.12865685e-02, 6.80175748e-02,
1.97501330e-02, 7.42568965e-02, 7.08996495e-02,
7.08996495e-02, 6.75887344e-02, 2.01042729e-02,
7.38173451e-02, 6.70923613e-02, 2.13903228e-02,
7.50479910e-02, 6.82108239e-02, 5.69753762e-02,
5.24303656e-02, 5.24303656e-02, 4.52683211e-02,
4.52683211e-02, 4.25493203e-02, 2.17470907e-02,
7.45062992e-02, 6.76173090e-02, 6.76173090e-02,
6.42925100e-02, 6.42925100e-02, 5.94649095e-02,
5.94649095e-02, 3.92303424e-02, 2.20977481e-02,
7.21341379e-02, 3.72338037e-02, 2.24415025e-02,
7.14448972e-02, 3.40025442e-02, 2.27777176e-02,
7.07064856e-02, 3.57533680e-02, 2.41421550e-02,
6.81719132e-02, 3.62534788e-02, 2.44798556e-02,
6.56110398e-02, 3.80586628e-02, 3.29287629e-02,
2.93070471e-02, 2.48093588e-02, 6.13326924e-02,
3.85518913e-02, 3.46206958e-02, 2.85091877e-02,
2.51312268e-02, 5.38330011e-02, 3.76841669e-02,
3.50540735e-02, 2.77018960e-02, 2.65615352e-02,
5.28838088e-02, 3.81396763e-02, 3.54777506e-02,
2.80364970e-02, 2.68822682e-02, 5.03377702e-02,
3.85814254e-02, 3.58887890e-02, 4.93316503e-02,
4.04098395e-02, 3.62892096e-02, 4.67615526e-02,
4.22828625e-02, 3.80435955e-02, 3.84376145e-02,
4.02332775e-02, 4.06156847e-02, 4.24553741e-02,
4.43352031e-02, 4.47040511e-02, 4.66233682e-02,
4.69790035e-02, 4.89341212e-02, 5.09256192e-02,
5.12584867e-02, 5.32790231e-02, 5.35890744e-02,
5.38831411e-02, 5.41625645e-02, 5.44267004e-02,
5.46700348e-02, 5.48984863e-02, 5.51117932e-02,
5.53082440e-02, 5.54849716e-02, 5.56464539e-02,
5.57928396e-02, 5.59201893e-02, 5.60294455e-02,
5.61233441e-02, 5.62020138e-02, 5.62604489e-02,
5.63017253e-02, 5.63275468e-02, 5.63341408e-02,
5.63226424e-02, 5.62957310e-02, 5.62533699e-02,
5.61937444e-02, 5.61140110e-02, 5.60191106e-02,
5.59087917e-02, 5.57801898e-02, 5.56328560e-02,
5.54704141e-02, 5.70775198e-02, 5.68728844e-02,
5.66515897e-02, 5.64149230e-02, 5.61622287e-02,
5.76630266e-02, 5.73643873e-02, 5.70502787e-02,
5.67190716e-02, 5.63668473e-02, 5.59997391e-02,
5.73489998e-02, 5.69355151e-02, 5.65029189e-02,
5.77751241e-02, 5.72977910e-02, 5.67990710e-02,
5.79863269e-02, 5.74393835e-02, 5.68773454e-02,
5.62926261e-02, 5.56922722e-02, 5.50771272e-02,
5.44454686e-02, 5.37935810e-02, 5.31273003e-02,
5.24468411e-02, 5.17483760e-02, 5.10330229e-02,
5.03036776e-02, 4.95607328e-02, 4.87997085e-02,
4.80238054e-02, 4.72347342e-02, 4.64331616e-02,
4.56132865e-02, 4.47805574e-02, 4.39358955e-02,
4.30782240e-02, 4.22044750e-02, 4.01052073e-02,
3.92354976e-02, 3.83523540e-02, 3.74567873e-02,
3.65508593e-02, 3.45751478e-02, 3.36740998e-02,
3.27625023e-02, 3.18417381e-02, 3.09129121e-02,
2.90665673e-02, 2.81454989e-02, 2.72171846e-02,
2.62807950e-02, 2.53342284e-02, 2.43816409e-02,
2.34221736e-02, 2.24541496e-02, 2.08179757e-02,
1.98678098e-02, 1.89113740e-02, 1.79488243e-02,
1.69806146e-02, 1.65158032e-02, 1.55075714e-02,
1.44932106e-02, 1.34746855e-02, 1.24525920e-02,
1.14268067e-02, 1.03968750e-02, 9.36414487e-03,
8.58823755e-03, 7.51804527e-03, 6.44485601e-03,
5.37002690e-03, 4.29398700e-03, 3.31511044e-03,
2.20302298e-03, 1.09069996e-03, -2.27320426e-05,
-1.16892664e-03, -2.31490869e-03, -3.46060569e-03,
-4.74178052e-03, -5.91852523e-03, -7.09360822e-03,
-8.26683115e-03, -9.43736653e-03, -1.06042682e-02,
-1.17686419e-02, -1.33107457e-02, -1.45010352e-02,
-1.56869180e-02, -1.68693838e-02, -1.80464175e-02,
-1.97732638e-02, -2.09722818e-02, -2.21650612e-02,
-2.40185758e-02, -2.52303300e-02, -2.71803154e-02,
-2.84115598e-02, -3.04489552e-02, -3.16936647e-02,
-3.29299358e-02, -3.50861051e-02, -3.63332401e-02,
-3.85745058e-02, -3.98348648e-02, -4.21660006e-02,
-4.34302610e-02, -4.46836493e-02, -4.59254575e-02,
-4.71530952e-02, -4.96209305e-02, -4.95594200e-02,
-5.07435074e-02, -5.19101301e-02, -5.16977894e-02,
-5.14280802e-02, -5.11057669e-02, -5.07251169e-02,
-5.16985297e-02, -5.12126585e-02, -5.06852098e-02,
-5.15589749e-02, -5.09397027e-02, -5.17615499e-02,
-5.10672514e-02, -5.18313966e-02, -5.25816754e-02,
-5.33179227e-02, -5.40360028e-02, -5.47358953e-02,
-5.54213064e-02, -5.77400978e-02, -5.84092053e-02,
-5.90603644e-02, -6.14284845e-02, -6.38379284e-02,
-6.62872262e-02, -6.69166162e-02, -6.93865431e-02,
-7.18947674e-02, -7.44284962e-02, -7.69969804e-02,
-7.96063191e-02, -8.01834105e-02, -8.28053535e-02,
-8.54623715e-02, -8.59961071e-02, -8.86660185e-02,
-8.91520913e-02, -9.18335218e-02, -9.45402708e-02,
-9.49610563e-02, -9.76401856e-02, -1.00332460e-01,
-1.03032191e-01, -1.03358935e-01, -1.06040606e-01,
-1.06322470e-01, -1.08984284e-01, -1.09195131e-01,
-1.11833426e-01, -1.11994247e-01, -1.14596404e-01,
-1.17192554e-01, -1.17248317e-01])
yp = np.array([ -3.90948536e-05, -2.12984775e-03, -4.31095583e-03,
-6.58019633e-03, -8.93758156e-03, -1.11568100e-02,
-1.36444162e-02, -1.62222092e-02, -1.88895170e-02,
-2.16446498e-02, -2.49629308e-02, -2.79508857e-02,
-3.16029501e-02, -3.54376380e-02, -3.87881494e-02,
-4.22310942e-02, -4.41873802e-02, -4.85246067e-02,
-4.68663315e-02, -4.60459599e-02, -4.86676408e-02,
-5.12750434e-02, -5.38586293e-02, -5.54310799e-02,
-5.79452426e-02, -5.93547929e-02, -6.06497762e-02,
-6.18505946e-02, -6.29584706e-02, -6.39609234e-02,
-6.48713094e-02, -6.44090476e-02, -6.51181556e-02,
-6.57260659e-02, -6.62541381e-02, -6.52943568e-02,
-6.56184758e-02, -6.58578685e-02, -6.60229010e-02,
-6.76012689e-02, -6.76366183e-02, -6.76004442e-02,
-6.74972483e-02, -6.73282385e-02, -6.86657097e-02,
-6.83738036e-02, -6.80140059e-02, -6.92366190e-02,
-6.87491258e-02, -6.82071471e-02, -6.76134579e-02,
-6.86669494e-02, -6.79695621e-02, -6.72259327e-02,
-6.64391135e-02, -6.56069234e-02, -6.64563885e-02,
-6.55361171e-02, -6.45783892e-02, -6.18312378e-02,
-6.07850085e-02, -5.80009440e-02, -5.52383021e-02,
-5.24888121e-02, -4.97523554e-02, -4.54714570e-02,
-3.98863362e-02, -3.73592876e-02, -3.48720213e-02,
-3.37707235e-02, -3.26655171e-02, -3.15625118e-02,
-3.04616664e-02, -3.06508019e-02, -2.95344258e-02,
-2.96968330e-02, -2.98505905e-02, -2.87101259e-02,
-2.88391064e-02, -2.89597166e-02, -2.77967360e-02,
-2.78958771e-02, -2.79854740e-02, -2.80670276e-02,
-2.81405467e-02, -2.82051366e-02, -2.69913041e-02,
-2.70365186e-02, -2.70739448e-02, -2.83768113e-02,
-2.83979671e-02, -2.84108899e-02, -2.84155794e-02,
-2.84104617e-02, -2.96993141e-02, -2.96767995e-02,
-2.96453017e-02, -3.09305120e-02, -3.08782748e-02,
-3.08172540e-02, -3.07460634e-02, -3.06652277e-02,
-3.05756546e-02, -3.04773301e-02, -3.03684498e-02,
-3.02505329e-02, -3.01240628e-02, -2.87032761e-02,
-2.85638294e-02, -2.84161924e-02, -2.82602014e-02,
-2.80957411e-02, -2.79220043e-02, -2.65224371e-02,
-2.63408455e-02, -2.61506690e-02, -2.59523304e-02,
-2.57465736e-02, -2.55333569e-02, -2.53114227e-02,
-2.50819674e-02, -2.48453976e-02, -2.46014650e-02,
-2.43490672e-02, -2.40896946e-02, -2.38232320e-02,
-2.35495727e-02, -2.32681400e-02, -1.11708561e-01,
-1.07398522e-01, -2.29799277e-02, -1.10281290e-01,
-1.06025945e-01, -1.06025945e-01, -1.01847844e-01,
-2.26850806e-02, -1.08812919e-01, -1.04614895e-01,
-1.00492396e-01, -9.64256156e-02, -2.23830803e-02,
-1.01124594e-01, -9.11212826e-02, -2.20738630e-02,
-1.03723227e-01, -9.96804013e-02, -9.57062055e-02,
-9.17682599e-02, -9.17682599e-02, -8.78935733e-02,
-8.78935733e-02, -8.40962884e-02, -2.17583603e-02,
-9.82127298e-02, -9.42965108e-02, -8.65980524e-02,
-8.28570139e-02, -2.14365508e-02, -9.28460674e-02,
-8.71354106e-02, -8.34157663e-02, -7.97743543e-02,
-2.11075333e-02, -8.39100274e-02, -8.02849723e-02,
-8.02849723e-02, -7.67428202e-02, -7.67428202e-02,
-7.32724167e-02, -2.07721464e-02, -7.72159766e-02,
-7.37663681e-02, -7.37663681e-02, -7.03828404e-02,
-2.04308432e-02, -7.42042591e-02, -7.08482147e-02,
-7.08482147e-02, -6.75385453e-02, -2.00834820e-02,
-7.12338454e-02, -6.47417418e-02, -2.06352744e-02,
-6.99333169e-02, -6.35600774e-02, -5.30876202e-02,
-4.88515872e-02, -4.88515872e-02, -4.21763073e-02,
-4.21763073e-02, -3.96425097e-02, -2.02588101e-02,
-6.70368116e-02, -6.08364913e-02, -6.08364913e-02,
-5.78440553e-02, -5.78440553e-02, -5.34994049e-02,
-5.34994049e-02, -3.52908904e-02, -1.98763502e-02,
-6.26583213e-02, -3.23368135e-02, -1.94880238e-02,
-5.99037138e-02, -2.85040222e-02, -1.90931928e-02,
-5.72132575e-02, -2.89247783e-02, -1.95297821e-02,
-5.32198482e-02, -2.82971986e-02, -1.91058177e-02,
-4.94013681e-02, -2.86515116e-02, -2.47888430e-02,
-2.20618305e-02, -1.86758942e-02, -4.45232330e-02,
-2.79827472e-02, -2.51286391e-02, -2.06919011e-02,
-1.82397645e-02, -3.76607947e-02, -2.63609122e-02,
-2.45208701e-02, -1.93767971e-02, -1.85788804e-02,
-3.56379420e-02, -2.56998805e-02, -2.39058698e-02,
-1.88908564e-02, -1.81130913e-02, -3.26595065e-02,
-2.50304222e-02, -2.32829732e-02, -3.07966353e-02,
-2.52257065e-02, -2.26527986e-02, -2.80693713e-02,
-2.53799880e-02, -2.28350066e-02, -2.21686432e-02,
-2.22782703e-02, -2.15723084e-02, -2.16081542e-02,
-2.15998200e-02, -2.08220272e-02, -2.07341864e-02,
-1.99180705e-02, -1.97463091e-02, -1.95241512e-02,
-1.86330762e-02, -1.83210810e-02, -1.73881714e-02,
-1.64501676e-02, -1.55073488e-02, -1.45603397e-02,
-1.36076891e-02, -1.26514336e-02, -1.16918550e-02,
-1.07281971e-02, -9.76103257e-03, -8.79150351e-03,
-7.81935696e-03, -6.84417527e-03, -5.86703766e-03,
-4.88857954e-03, -3.90851347e-03, -2.92690669e-03,
-1.94445885e-03, -9.62077293e-04, 2.10973681e-05,
1.00443470e-03, 1.98670872e-03, 2.96920518e-03,
3.95065293e-03, 4.93054490e-03, 5.90896238e-03,
6.88594418e-03, 7.86095305e-03, 8.83291761e-03,
9.80227952e-03, 1.11168744e-02, 1.21109612e-02,
1.31014370e-02, 1.40884671e-02, 1.50714343e-02,
1.65579859e-02, 1.75619959e-02, 1.85609524e-02,
1.95539892e-02, 2.05406220e-02, 2.15208623e-02,
2.31958067e-02, 2.41936890e-02, 2.51825785e-02,
2.69676402e-02, 2.79735240e-02, 2.89676199e-02,
3.08600313e-02, 3.18685118e-02, 3.28673845e-02,
3.38531747e-02, 3.48305552e-02, 3.57981735e-02,
3.67545357e-02, 3.76978426e-02, 3.86308181e-02,
3.95533112e-02, 4.04626970e-02, 4.13583593e-02,
4.22429533e-02, 4.31163338e-02, 4.39732984e-02,
4.48174616e-02, 4.56497573e-02, 4.64690781e-02,
4.72699006e-02, 4.80584575e-02, 4.88339015e-02,
4.95941309e-02, 5.03364921e-02, 4.95646923e-02,
5.02584615e-02, 5.09357803e-02, 5.15956682e-02,
5.22416815e-02, 5.13017754e-02, 5.18954788e-02,
5.24741267e-02, 5.30389590e-02, 5.35886852e-02,
5.24828002e-02, 5.29815950e-02, 5.34658214e-02,
5.39333680e-02, 5.43819816e-02, 5.48154596e-02,
5.52339801e-02, 5.56342267e-02, 5.42908141e-02,
5.46458325e-02, 5.49864909e-02, 5.53070690e-02,
5.56106186e-02, 5.76746395e-02, 5.79561804e-02,
5.82151857e-02, 5.84584712e-02, 5.86858866e-02,
5.88950787e-02, 5.90831689e-02, 5.92552324e-02,
6.12619766e-02, 6.14026109e-02, 6.15224608e-02,
6.16256880e-02, 6.17123394e-02, 6.36720486e-02,
6.37186812e-02, 6.37481408e-02, 6.56861133e-02,
6.56722393e-02, 6.56407664e-02, 6.55917721e-02,
6.74743714e-02, 6.73786194e-02, 6.72646677e-02,
6.71325153e-02, 6.69773741e-02, 6.68003322e-02,
6.66053297e-02, 6.83473413e-02, 6.81027202e-02,
6.78376164e-02, 6.75542903e-02, 6.72524243e-02,
6.88634515e-02, 6.85066915e-02, 6.81318613e-02,
6.96716731e-02, 6.92387957e-02, 7.07292209e-02,
7.02467261e-02, 7.16584645e-02, 7.11134550e-02,
7.05499862e-02, 7.18681370e-02, 7.12419450e-02,
7.24822144e-02, 7.17989089e-02, 7.29694293e-02,
7.22180480e-02, 7.14476517e-02, 7.06588875e-02,
6.98486903e-02, 7.08078412e-02, 6.81567317e-02,
6.72843393e-02, 6.63881936e-02, 6.37899997e-02,
6.12404950e-02, 5.87433383e-02, 5.62902969e-02,
5.53950962e-02, 5.29895052e-02, 5.06437557e-02,
4.97490264e-02, 4.74631181e-02, 4.65678359e-02,
4.43551128e-02, 4.34554011e-02, 4.25440351e-02,
4.16213883e-02, 4.06842153e-02, 3.97338457e-02,
3.87727819e-02, 3.89122376e-02, 3.78978623e-02,
3.68719281e-02, 3.68766567e-02, 3.68230044e-02,
3.67095055e-02, 3.55465346e-02, 3.53200609e-02,
3.50311849e-02, 3.46717730e-02, 3.42461153e-02,
3.37555022e-02, 3.23610029e-02, 3.17505933e-02,
3.10701527e-02, 2.95754797e-02, 2.87735213e-02,
2.72210019e-02, 2.62970023e-02, 2.52956273e-02,
2.36404433e-02, 2.25053642e-02, 2.12889860e-02,
1.99902757e-02, 1.81872330e-02, 1.67574555e-02,
1.49054892e-02, 1.33429656e-02, 1.14391250e-02,
9.74643800e-03, 7.79267351e-03, 5.96714375e-03,
4.05355227e-03, 2.00672241e-03])
# -----------------------------------------------------------------------------
# Use scipy to interpolate.
xp = np.r_[xp, xp[0]]
yp = np.r_[yp, yp[0]]
tck, u = interpolate.splprep([xp, yp], s=0, k=1, per=True)
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# -----------------------------------------------------------------------------
# Plot result
fig = plt.figure()
ax = plt.subplot(111)
ax.plot(xp, yp, '.', markersize=2)
ax.plot(xi, yi, alpha=0.5)
plt.show()
I get the following error on one machine (MacOS),
---> tck, u = interpolate.splprep([xp, yp], s=0, k=1, per=True)
SystemError: <built-in function _parcur> returned NULL without setting an error
And this error on another machine (Ubuntu),
----> tck, u = interpolate.splprep([xp, yp], s=0, k=1, per=True)
ValueError: Invalid inputs.
interpolate.splprep uses the FORTRAN parcur routine from FITPACK (from the documentation).
My questions are -
Why does the code work for different datasets? e.g. xp = np.array([0.1, 0.2, 0.3, 0.4]) yp = np.array([-0.1, -0.3, -0.4, 0.2]) and not for this particular one? What does the error mean?
How can I get this to work? (Using this method or any other method) i.e. either interpolate a curve or filter the outliers ...
Out of curiosity, why is the error machine (and OS) dependent?
This is how the data looks when plotted, I think you can guess which curve I'd like to interpolate to (and which outliers I'd like to remove, if possible)
Fitpack has a fit if it two consecutive inputs are identical. The error happens deep enough that it depends on how the libraries were compiled and linked, hence the assortment of errors.
For example, xp[147:149], yp[147:149] (and several others):
(array([ 0.07705342, 0.07705342]), array([-0.09176826, -0.09176826]))
These are okay:
okay = np.where(np.abs(np.diff(xp)) + np.abs(np.diff(yp)) > 0)
xp = np.r_[xp[okay], xp[-1], xp[0]]
yp = np.r_[yp[okay], yp[-1], yp[0]]
# the rest of your code
I add the last point back because the output of diff is always one element shorter, so the last one needs to be included manually. (And then of course, you put the 0th point again for periodicity)
Cutting off the weird part
This is my attempt to cut off the weird extruding part of the dataset. It uses a Gaussian filter from ndimage. The original points xp, yp are kept this time; the filtered ones are xn, yn.
jump = np.sqrt(np.diff(xp)**2 + np.diff(yp)**2)
smooth_jump = ndimage.gaussian_filter1d(jump, 5, mode='wrap') # window of size 5 is arbitrary
limit = 2*np.median(smooth_jump) # factor 2 is arbitrary
xn, yn = xp[:-1], yp[:-1]
xn = xn[(jump > 0) & (smooth_jump < limit)]
yn = yn[(jump > 0) & (smooth_jump < limit)]
So, we remove not only duplicate points but also the points where the values jump around too much. The rest goes as before, interpolation is built out of xn, yn now. I plot original points for comparison with the new (red) curve):
ax.plot(xp, yp, 'o', markersize=2)
ax.plot(xi, yi, 'r', alpha=0.5)

Python boxplot showing means and confidence intervals

How can I create a boxplot like the one below, in Python? I want to depict means and confidence bounds only (rather than proportions of IQRs, as in matplotlib boxplot).
I don't have any version constraints, and if your answer has some package dependency that's OK too. Thanks!
Use errorbar instead. Here is a minimal example:
import matplotlib.pyplot as plt
x = [2, 4, 3]
y = [1, 3, 5]
errors = [0.5, 0.25, 0.75]
plt.figure()
plt.errorbar(x, y, xerr=errors, fmt = 'o', color = 'k')
plt.yticks((0, 1, 3, 5, 6), ('', 'x3', 'x2', 'x1',''))
Note that boxplot is not the right approach; the conf_intervals parameter only controls the placement of the notches on the boxes (and we don't want boxes anyway, let alone notched boxes). There is no way to customize the whiskers except as a function of IQR.
Thanks to America, I propose a way to automatize this kind of graph a little bit.
Below an example of code generating 20 arrays from a normal distribution with mean=0.25 and std=0.1.
I used the formula W = t * s / sqrt(n), to calculate the margin of error of the confidence interval, with t the constant from the t distribution (see scipy.stats.t), s the standard deviation and n the number of values in an array.
list_samples=list() # making a list of arrays
for i in range(20):
list.append(np.random.normal(loc=0.25, scale=0.1, size=20))
def W_array(array, conf=0.95): # function that returns W based on the array provided
t = stats.t(df = len(array) - 1).ppf((1 + conf) /2)
W = t * np.std(array, ddof=1) / np.sqrt(len(array))
return W # the error
W_list = list()
mean_list = list()
for i in range(len(list_samples)):
W_list.append(W_array(list_samples[i])) # makes a list of W for each array
mean_list.append(np.mean(list_samples[i])) # same for the means to plot
plt.errorbar(x=mean_list, y=range(len(list_samples)), xerr=W_list, fmt='o', color='k')
plt.axvline(.25, ls='--') # this is only to demonstrate that 95%
# of the 95% CI contain the actual mean
plt.yticks([])
plt.show();

Categories