Given an undirected NetworkX Graph graph, I want to check if it is scale free.
To do this, as I understand, I need to find the degree k of each node, and the frequency of that degree P(k) within the entire network. This should represent a power law curve due to the relationship between the frequency of degrees and the degrees themselves.
Plotting my calculations for P(k) and k displays a power curve as expected, but when I double log it, a straight line is not plotted.
The following plots were obtained with 1000 nodes.
Code as follows:
import math
import numpy as np
import matplotlib.pyplot as plt

# count how often each degree occurs
k = []
Pk = []
for node in list(graph.nodes()):
    degree = graph.degree(nbunch=node)
    try:
        pos = k.index(degree)
    except ValueError:
        k.append(degree)
        Pk.append(1)
    else:
        Pk[pos] += 1

# get a double log representation
logk = []
logPk = []
for i in range(len(k)):
    logk.append(math.log10(k[i]))
    logPk.append(math.log10(Pk[i]))

order = np.argsort(logk)
logk_array = np.array(logk)[order]
logPk_array = np.array(logPk)[order]
plt.plot(logk_array, logPk_array, ".")
m, c = np.polyfit(logk_array, logPk_array, 1)
plt.plot(logk_array, m*logk_array + c, "-")
The slope m is supposed to give the scaling exponent, and if its magnitude is between 2 and 3 then the network ought to be scale-free.
The graphs are obtained by calling NetworkX's scale_free_graph method, and then using that as input for the Graph constructor.
Update
As requested by @Joel, below are the plots for 10000 nodes.
Additionally, the exact code that generates the graph is as follows:
graph = networkx.Graph(networkx.scale_free_graph(num_of_nodes))
As we can see, a significant portion of the values does seem to form a straight line, but the network has a strange tail in its double-log form.
Have you tried the powerlaw module in Python?
It's pretty straightforward.
First, create a degree sequence from your network:
degree_sequence = sorted([d for n, d in G.degree()], reverse=True) # used for degree distribution and powerlaw test
Then fit the data to powerlaw and other distributions:
import powerlaw  # power laws are probability distributions with the form p(x) ∝ x^(-α)
fit = powerlaw.Fit(degree_sequence)
Take into account that powerlaw automatically finds the optimal value of xmin by creating a power-law fit starting from each unique value in the dataset and then selecting the one that results in the minimal Kolmogorov-Smirnov distance, D, between the data and the fit. If you want to include all your data, you can define the xmin value as follows:
fit = powerlaw.Fit(degree_sequence, xmin=1)
Then you can plot:
fig2 = fit.plot_pdf(color='b', linewidth=2)
fit.power_law.plot_pdf(color='g', linestyle='--', ax=fig2)
which will produce an output like this:
(figure: powerlaw fit)
On the other hand, the data may not follow a power law but some other distribution (log-linear, etc.). You can check this with powerlaw.distribution_compare:
R, p = fit.distribution_compare('power_law', 'exponential', normalized_ratio=True)
print (R, p)
where R is the likelihood ratio between the two candidate distributions. This number will be positive if the data is more likely under the first distribution, but you should also check that p < 0.05.
Finally, once you have chosen an xmin for your distribution, you can plot a comparison between some usual degree distributions for social networks:
plt.figure(figsize=(10, 6))
fit.distribution_compare('power_law', 'lognormal')
fig4 = fit.plot_ccdf(linewidth=3, color='black')
fit.power_law.plot_ccdf(ax=fig4, color='r', linestyle='--') #powerlaw
fit.lognormal.plot_ccdf(ax=fig4, color='g', linestyle='--') #lognormal
fit.stretched_exponential.plot_ccdf(ax=fig4, color='b', linestyle='--') #stretched_exponential
(figure: lognormal vs. power law vs. stretched exponential)
Finally, take into account that power-law distributions in networks are currently under discussion; strongly scale-free networks appear to be empirically rare:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6399239/
Part of your problem is that you aren't accounting for the missing degrees when fitting your line. There are a small number of large-degree nodes, which you're including in your line, but you're ignoring the fact that many of the large degrees don't exist. Your largest degrees are somewhere in the 1000-2000 range, but there are only 2 observations. So really, for such large values, I'm expecting that the probability a random node has such a large degree is about 2/(1000*N) (or really, it's probably even less than that). But in your fit, you're treating them as if the probability of those two specific degrees is 2/N, and you're ignoring the other degrees.
The simple fix is to only use the smaller degrees in your fit.
The more robust way is to fit the complementary cumulative distribution. Instead of plotting P(K=k), plot P(K>=k) and try to fit that (noting that if P(K=k) follows a power law, then P(K>=k) does too, but with a different exponent; check it).
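As a minimal sketch of that CCDF approach (my illustration, not the answerer's code; it assumes the question's graph variable and fits the whole CCDF in log-log space):
import numpy as np

degrees = np.array([d for _, d in graph.degree()])
k_vals = np.sort(np.unique(degrees[degrees > 0]))
# empirical complementary CDF: P(K >= k) for each observed degree k
ccdf = np.array([(degrees >= k).mean() for k in k_vals])

# slope of the CCDF in log-log space; if P(K = k) ~ k^-gamma, the CCDF
# falls off as k^-(gamma - 1), so gamma is roughly |slope| + 1
slope, intercept = np.polyfit(np.log10(k_vals), np.log10(ccdf), 1)
gamma = 1.0 - slope   # slope is negative, so this is |slope| + 1
print("estimated exponent:", gamma)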
Trying to fit a line to these points is wrong, as the points are not evenly distributed over the x-axis. The line-fitting routine will give more weight to the portion of the domain that contains more points.
You should redistribute the observations over the x-axis using np.interp, like this:
logk_interp = np.linspace(np.min(logk_array),np.max(logk_array),1000)
logPk_interp = np.interp(logk_interp, logk_array, logPk_array)
plt.plot(logk_array, logPk_array,".")
m, c = np.polyfit(logk_interp, logPk_interp, 1)
plt.plot(logk_interp, m*logk_interp + c, "-")
Related
I'm doing a curve fit in Python using scipy.optimize.curve_fit, and the fit itself looks great; however, the parameters that are generated don't make sense.
The equation is (ax)^b + cx, but with the parameters Python finds, a = -c and b = 1, so the whole equation just equals 0 for every value of x.
Here are the plot and my code.
[plot of the fit](https://i.stack.imgur.com/fBfg7.png)
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# experimental data (cfu_u and OD_u come from the raw data linked below)
xdata = cfu_u
ydata = OD_u

# x-values to plot for curve fit
min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10, np.arange(min_cfu, max_cfu, 0.1))

# model function: (a*x)**b + c*x
def func(x, a, b, c):
    return (a*x)**b + c*x

# curve fit
popt, pcov = curve_fit(func, xdata, ydata)

# plot experimental data and fitted curve
plt.plot(x_vec, func(x_vec, *popt), label='curve fit', color='slateblue', linewidth=2.2)
plt.plot(cfu_u, OD_u, '-', label='experimental data', marker='.', markersize=8, color='deepskyblue', linewidth=1.4)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel("Y", fontsize=12)
plt.xlabel("X", fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
How can I find the actual parameters?
edit: here is the actual experimental raw data I used: https://pastebin.com/CR2BCJji
The chosen function model is :
y(x)=(ax)^b+cx
In order to understand the difficulty encountered, one first has to compare the behaviour of the function to the data over the range of the lowest values of X.
We see that y(x)=0 is an acceptable fit for the points over a large range (at least 6 decades), considering the scatter. These are the majority of the experimental points (18 points among 27). The function y(x)=0 is obtained from the function model only if b=1, leading to y(x)=(a+c)x with a+c=0. At first sight Python seems to give b=1 and c=-a. But we have to look more carefully.
Of course the function y(x)=0 is not suitable for the 9 points at larger X.
This suggests that the fit to the whole set of points is an extension of the above fit, with parameter values different from b=1 and a+c=0 but not far from them, so that the fit remains good on the above 18 points.
Conclusion: the actual parameter values found by Python are certainly very close to b=1, with a close to 1.44930871e+03 and c close to -1.44930871e+03.
The computation inside Python is certainly carried out with 16 or 18 significant digits, but the display shows only 9 digits. This is not sufficient to see that b might be different from 1 and that c might be different from -a. The issue might therefore be only a matter of displaying enough digits.
Yes, the fit produced by Python looks great. This is a fine performance from the mathematical viewpoint. But the physical significance is doubtful with so many digits essential to the fit over the whole range.
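A quick way to check this (a sketch; popt is the array returned by the curve_fit call in the question's code) is to print the parameters with more digits than NumPy's default display:
import numpy as np

np.set_printoptions(precision=17)
print(popt)                                          # full-precision display of a, b, c
print("a + c =", format(popt[0] + popt[2], ".17g"))  # small, but probably not exactly zero
print("b - 1 =", format(popt[1] - 1.0, ".17g"))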
I have a data frame containing ~900 rows; I'm trying to plot KDE plots for some of the columns. In some columns, a majority of the values are the same minimum value. When I include too many of the minimum values, the KDE plot abruptly stops showing the minimums. For example, the following includes 600 values, of which 450 are the minimum, and the plot looks fine:
y = df.sort_values(by='col1', ascending=False)['col1'].values[:600]
sb.kdeplot(y)
But including 451 of the minimum values gives a very different output:
y = df.sort_values(by='col1', ascending=False)['col1'].values[:601]
sb.kdeplot(y)
Eventually I would like to plot bivariate KDEPlots of different columns against each other, but I'd like to understand this first.
The problem is the default algorithm that is chosen for the "bandwidth" of the kde. The default method is 'scott', which isn't very helpful when there are many equal values.
The bandwidth is the width of the gaussians that are positioned at every sample point and summed up. Lower bandwidths stay closer to the data, higher bandwidths smooth everything out. The sweet spot is somewhere in the middle. In this case bw=0.3 could be a good option. In order to compare different KDEs, it is recommended to choose exactly the same bandwidth each time.
Here is some sample code to show the difference between bw='scott' and bw=0.3. The example data are 150 values from a standard normal distribution together with either 400, 450 or 500 fixed values.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()

fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(10, 5), gridspec_kw={'hspace': 0.3})
for i, bw in enumerate(['scott', 0.3]):
    for j, num_same in enumerate([400, 450, 500]):
        y = np.concatenate([np.random.normal(0, 1, 150), np.repeat(-3, num_same)])
        sns.kdeplot(y, bw=bw, ax=axs[i, j])
        axs[i, j].set_title(f'bw:{bw}; fixed values:{num_same}')
plt.show()
The third plot gives a warning that the kde can not be drawn using Scott's suggested bandwidth.
PS: As mentioned by @mwascom in the comments, in this case statsmodels.nonparametric.kde is used (not scipy.stats.gaussian_kde). There the default is "scott": 1.059 * A * nobs ** (-1/5.), where A is min(std(X), IQR/1.34). The min() explains the abrupt change in behavior; IQR is the interquartile range, the difference between the 75th and 25th percentiles.
Edit: Since Seaborn 0.11, the statsmodel backend has been dropped, so kde's are only calculated via scipy.stats.gaussian_kde.
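In Seaborn 0.11+ a fixed bandwidth can still be requested through bw_method (forwarded to scipy.stats.gaussian_kde) or scaled with bw_adjust. A minimal sketch; note that a scalar bw_method is interpreted by gaussian_kde as a factor on the data's standard deviation rather than an absolute width, so the value may need retuning:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

y = np.concatenate([np.random.normal(0, 1, 150), np.repeat(-3, 450)])
# bw= is gone in Seaborn >= 0.11; bw_method is passed on to scipy.stats.gaussian_kde
sns.kdeplot(y, bw_method=0.3)
plt.show()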
If the sample has repeated values, this implies that the underlying distribution is not continuous. In the data that you show to illustrate the issue, we can see a Dirac distribution on the left. The kernel smoothing might be applied for such data, but with care. Indeed, to approximate such data, we might use a kernel smoothing where the bandwidth associated to the Dirac is zero. However, in most KDE methods, there is only one single bandwidth for all kernel atoms. Moreover, the various rules used to compute the bandwidth are based on some estimation of the rugosity of the second derivative of the PDF of the distribution. This cannot be applied to a discontinuous distribution.
We can, however, try to separate the sample into two sub-samples:
the sub-sample(s) with replications,
the sub-sample with unique realizations.
(This idea has already been mentioned by johanc.)
Below is an attempt to perform this classification. The np.unique method is used to count the occurrences of the replicated realizations. The replicated values are associated with Diracs, and the weight in the mixture is estimated from the fraction of these replicated values in the sample. The remaining, unique realizations are then used to estimate the continuous distribution with KDE.
The following function will be useful in order to overcome a limitation with the current implementation of the draw method of Mixtures with OpenTURNS.
def DrawMixtureWithDiracs(distribution):
    """Draw a distribution which has Diracs.
    https://github.com/openturns/openturns/issues/1489"""
    graph = distribution.drawPDF()
    graph.setLegends(["Mixture"])
    for atom in distribution.getDistributionCollection():
        if atom.getName() == "Dirac":
            curve = atom.drawPDF()
            curve.setLegends(["Dirac"])
            graph.add(curve)
    return graph
The following script creates a use case with a Mixture containing a Dirac and a Gaussian distribution.
import openturns as ot
import numpy as np
distribution = ot.Mixture([ot.Dirac(-3.0),
                           ot.Normal()], [0.5, 0.5])
DrawMixtureWithDiracs(distribution)
This is the result.
Then we create a sample.
sample = distribution.getSample(100)
This is where your problem begins. We count the number of occurrences of each realization.
array = np.array(sample)
unique, index, count = np.unique(array, axis=0, return_index=True,
                                 return_counts=True)
For all realizations, replicated values are associated with Diracs and unique values are put in a separate list.
sampleSize = sample.getSize()
listOfDiracs = []
listOfWeights = []
uniqueValues = []
for i in range(len(unique)):
    if count[i] == 1:
        uniqueValues.append(unique[i][0])
    else:
        atom = ot.Dirac(unique[i])
        listOfDiracs.append(atom)
        w = count[i] / sampleSize
        print("New Dirac =", unique[i], " with weight =", w)
        listOfWeights.append(w)
The weight of the continuous atom is the complement of the sum of the Dirac weights, so that the weights sum to 1.
complementaryWeight = 1.0 - sum(listOfWeights)
weights = list(listOfWeights)
weights.append(complementaryWeight)
The easy part comes: the unique realizations can be used to fit a kernel smoothing. The KDE is then added to the list of atoms.
sampleUniques = ot.Sample(uniqueValues, 1)
factory = ot.KernelSmoothing()
kde = factory.build(sampleUniques)
atoms = list(listOfDiracs)
atoms.append(kde)
Et voilà: the Mixture is ready.
mixture_estimated = ot.Mixture(atoms, weights)
The following script compares the initial Mixture and the estimated one.
graph = DrawMixtureWithDiracs(distribution)
graph.setColors(["dodgerblue3", "dodgerblue3"])
curve = DrawMixtureWithDiracs(mixture_estimated)
curve.setColors(["darkorange1", "darkorange1"])
curve.setLegends(["Est. Mixture", "Est. Dirac"])
graph.add(curve)
graph
The figure seems satisfactory, since the continuous distribution is estimated from a sub-sample whose size is only 50, i.e. half of the full sample.
I have data from distinct curves, and want to fit each of them individually. However, the data is mixed into a single array, so first I believe I need a way to separate the data.
I know that each of the individual curves is from the family A/x+B. As of now I cut out each of the curves by hand and curve fit, but I would like to automate this process and have the computer separate these curves and fit them. I attempted to use machine learning, but didn't know where to start or which packages to use. I am using Python, but can also use C++; in fact I hope to transfer it to C++ by the end. Where do you think I should start? Is it worth it to use unsupervised machine learning, or is there a better way to separate the data?
The expected curves:
An example of the data
Well, you sure do have an interesting problem.
I see that there are curves with Y-axis values that are considerably larger than the rest of them. I would simply take the first N values with the largest Y-axis values and then fit them to an exponential decay curve (or that other curve you mention). You can then simply take the points that best fit that curve and leave the other points alone.
Except...
This is a terrible way to extrapolate data. Doing this, you are cherry-picking the data you want. This is falsifying information and is very bad.
Your best bet is to create a single curve that all points fit to, if you cannot isolate all of those points into separate curves with external information.
But...
We do know some information: a valid function must have only 1 output given a single input.
If the X-axis is discrete, this means you can create a lookup table of outputs given the input. This allows you to count how many curves are associated with a specific X-value (which could be a time unit). In other words, you have to have external information to separate points locally. You can then reorder the points at each X-value in increasing Y-value, and now you have your separate curves defined at discrete points; a rough sketch of this follows.
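A rough sketch of that lookup-table idea (my own illustration, not the answerer's code; it assumes the curves never cross, so the y-rank at each x identifies the curve, and that no x has more samples than there are curves):
from collections import defaultdict

def separate_curves(points, num_curves):
    """Group (x, y) samples by x, then assign each to a curve by its y-rank at that x."""
    by_x = defaultdict(list)
    for x, y in points:
        by_x[x].append(y)

    curves = [[] for _ in range(num_curves)]
    for x in sorted(by_x):
        # assumes at most num_curves samples share a given x and the curves keep their y-order
        for rank, y in enumerate(sorted(by_x[x])):
            curves[rank].append((x, y))
    return curves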
Basically, this is an unsolvable problem in the general sense, but in your specific application, there might be extra rules that further define the domain and range such that you can do data filtering.
One more thing...
I am making these statements with the assumption that the (X,Y) values are floats that cannot maintain accuracy after some mathematical operations.
If you are using things like unum numbers, you might be able to keep enough information in the decimal such that your fitting functions can differentiate between points without extra filtering.
This case is more of a hope than anything, as adopting a new number representation to get more accuracy to isolate sampled points is a stretch at best.
Just for completeness, there are some mathematical libraries that might help you.
Boost.uBLAS
Eigen
LAPACK++
Hopefully, I have given you enough information to allow you to solve your problem.
I extracted data from the plot for analysis. Here is example code that loads, separates, fits and plots the three data sets. It works when the separate data files are appended into a single text file.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
##########################################################
# data load and separation section
datafilename = 'temp.dat'
textdata = open(datafilename, 'rt').read()
xLists = [[], [], []]
yLists = [[], [], []]
previousY = 0.0 # initialize
whichList = -1 # initialize
datalines = textdata.split('\n')
for line in datalines:
    if not line:  # allow for blank lines in data file
        continue
    spl = line.split()
    x = float(spl[0])
    y = float(spl[1])
    if y > previousY + 50.0:  # this separator must be greater than max noise
        whichList += 1
    previousY = y
    xLists[whichList].append(x)
    yLists[whichList].append(y)
##########################################################
# curve fitting section
def func(x, a, b):
    return a / x + b


parameterLists = []
for curveIndex in range(len(xLists)):
    # these are the same as the scipy defaults
    initialParameters = numpy.array([1.0, 1.0])

    xData = numpy.array(xLists[curveIndex], dtype=float)
    yData = numpy.array(yLists[curveIndex], dtype=float)

    # curve fit the test data
    fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
    parameterLists.append(fittedParameters)
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    for curveIndex in range(len(xLists)):
        # first the raw data as a scatter plot
        axes.plot(xLists[curveIndex], yLists[curveIndex], 'D')

        # create data for each fitted equation plot
        xModel = numpy.linspace(min(xLists[curveIndex]), max(xLists[curveIndex]))
        yModel = func(xModel, *parameterLists[curveIndex])

        # now the model as a line plot
        axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
The idea:
Create N naive, easy-to-calculate, sufficiently precise (for clustering) approximations. Then "classify" each data point to the closest such approximation.
This is done like this:
The approximations are analytical, using these two equations I derived for a curve of the family y = A/(x+B):
A = (y1*y2*(x1 - x2)) / (y2 - y1)
B = A/y1 - x1
where (x1, y1) and (x2, y2) are the coordinates of two points on the curve.
To get these two points I assumed that (1) the first points (according to the x-axis) are distributed equally between the different real curves, and (2) the 2 first points of each real curve are all smaller or all bigger than the 2 first points of every other real curve. Thus sorting them and dividing them into N groups will successfully cluster the first 2*N points. If these assumptions are false, you can still manually classify the 2 first points of each real curve and the rest will be classified automatically (this is actually the first approach I implemented).
Then cluster the rest of the points to each point's closest approximation, "closest" meaning the one with the smallest error.
Edit: a stronger approach for the initial approximation could be to calculate A and B for a couple of pairs of points and use their mean A and B as the approximation, possibly even running K-means on these points/approximations.
The Code:
import numpy as np
import matplotlib.pyplot as plt
# You should probably edit this variable
NUM_OF_CURVES = 4
# <data> should be a 1-D array containing the Y values of the series
# <x_of_data> should be a 1-D array containing the corresponding X values of the series
data, x_of_data = np.loadtxt('...')
# clustering of first 2*num_of_curves points
# I started at NUM_OF_CURVES instead of 0 because my xs started at 0.
# The range (0:NUM_OF_CURVES*2) will probably be better for you.
raw_data = data[NUM_OF_CURVES:NUM_OF_CURVES*3]
raw_xs = x_of_data[NUM_OF_CURVES:NUM_OF_CURVES*3]
sort_ind = np.argsort(raw_data)
Y = raw_data[sort_ind].reshape(NUM_OF_CURVES,-1).T
X = raw_xs[sort_ind].reshape(NUM_OF_CURVES,-1).T
# approximation of A and B for each curve
A = ((Y[0]*Y[1])*(X[0]-X[1]))/(Y[1]-Y[0])
B = (A / Y[0]) - X[0]
# creating approximating curves
f = []
for i in range(NUM_OF_CURVES):
    f.append(A[i]/(x_of_data+B[i]))
curves = np.vstack(f)
# clustering the points to the approximating curves
raw_clusters = [[] for _ in range(NUM_OF_CURVES)]
for i in range(len(data)):
    raw_clusters[np.abs(curves[:,i]-data[i]).argmin()].append((x_of_data[i], data[i]))
# changing the clusters to np.arrays of the shape (2,-1)
# where row 0 contains the X coordinates and row 1 the Y coordinates
clusters = []
for i in range(len(raw_clusters)):
    clusters.append(np.array(list(zip(*raw_clusters[i]))))
Example:
raw series:
separated series:
I have two sets of data between which I want to find a correlation. Although there is quite some scatter in the data, there is an obvious relation. I currently use numpy polyfit (8th order), but there is some "wiggling" of the line (especially at the beginning and the end) which is not appropriate. Secondly, I don't think the fit is very good at the beginning of the line (the curve should be slightly steeper).
How can I get a best fit "spline" through these data points?
My current code:
# fit regression line
regressionLineOrder = 8
regressionLine = np.polyfit(data['x'], data['y'], regressionLineOrder)
p = np.poly1d(regressionLine)
Take a look at @MatthewDrury's answer to "Why use regularisation in polynomial regression instead of lowering the degree?". It's simply fantastic and spot on. The most interesting bit comes in at the end, when he starts talking about using a natural cubic spline to fit a regression in place of a regularized polynomial of degree 10. You could use the implementation of scipy.interpolate.CubicSpline to accomplish something very similar, and scipy.interpolate contains a ton of classes for other spline methods.
Here is a simple example:
from scipy.interpolate import CubicSpline
cs = CubicSpline(data['x'], data['y'])
x_range = np.arange(x_min, x_max, some_step)
plt.plot(x_range, cs(x_range), label='Cubic Spline')
There are some possible issues with your data set. In your plot the n (x, y) points are linked with straight lines; if you display points instead, you should see the point density along your domain, and it is not evenly distributed, just as the lines are not. Say your domain is [xmin, xmax]: an 8th-order polynomial is fine for interpolation, but it wiggles because of the high order and because the point density is unevenly distributed. Polynomials are also not good for extrapolation, since there are no control points outside your domain. You could fix that with a spline: a natural cubic spline controls the derivative at xmin and xmax, but to do that you should sort your dataset by x and take a subsample of the n points, smoothed with a rolling average, as control points for the spline algorithm (a sketch follows below). If your problem has an analytical solution (a Gaussian variogram, for instance, looks like your point distribution), just optimize its parameters (range and sill for the Gaussian variogram) to minimize the error inside the domain and follow the asymptotes outside.
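A sketch of that suggestion (sorting by x, smoothing with a rolling average, then fitting a natural cubic spline to a subsample of control points; the window size and subsampling step are arbitrary choices of mine, and it assumes the subsampled x values are strictly increasing):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline

# sort the data by x, then smooth y with a centered moving average
order = np.argsort(np.asarray(data['x']))
x_sorted = np.asarray(data['x'])[order]
y_sorted = np.asarray(data['y'])[order]

window = 25
kernel = np.ones(window) / window
y_smooth = np.convolve(y_sorted, kernel, mode='valid')
x_smooth = x_sorted[window // 2 : window // 2 + len(y_smooth)]

# every 20th smoothed point becomes a control point for the natural cubic spline
cs = CubicSpline(x_smooth[::20], y_smooth[::20], bc_type='natural')

xs = np.linspace(x_smooth[0], x_smooth[-1], 200)
plt.plot(x_sorted, y_sorted, '.', alpha=0.3, label='data')
plt.plot(xs, cs(xs), label='natural cubic spline')
plt.legend()
plt.show()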
I'm trying to fit a sine wave curve to this data distribution, but for some reason the fit is incorrect:
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.optimize import curve_fit
#=======================
#====== Analysis =======
#=======================
# sine curve fit
def fit_Sin(t, A, b, C):
    return A * np.sin(t*b) + C

## The data extraction
t,y,y1 = np.loadtxt("new10_CoCore_5to20_BL.txt", unpack=True)
xdata = t
popt, pcov = curve_fit(fit_Sin, t, y)
print "A = %s , b = %s, C = %s" % (popt[0], popt[1], popt[2])
#=======================
#====== Plotting =======
#=======================
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.plot(t, y, ".")
ax1.plot(t, fit_Sin(t, *popt))
plt.show()
This fit drastically underestimates the data.
Here is the data provided here: https://www.dropbox.com/sh/72jnpkkk0jf3sjg/AAAb17JSPbqhQOWnI68xK7sMa?dl=0
Any idea why this is happening?
Sine waves are extremely difficult to fit if your frequency guess is off. That is because with a sufficient number of cycles in the data, the guess will be out of phase with half the data and in phase with half of it for even a small error in the frequency. At that point, a straight line offers a better fit than a sine wave of different frequency. That is how Fourier transforms work by the way.
I can think of three ways to estimate the frequency well enough to allow a non linear least squares algorithm to take over:
1. Eyeball it. Subtract the x-values of two peaks in the GUI or even at the command line. If you have very low-noise data, you can automate this process quite easily.
2. Use a discrete Fourier transform (see the sketch after this list). If your data is a sine wave with one component, the first non-constant peak will give you the frequency. I have found this to require some additional tweaking, since the sampling frequency is often not a multiple of the sine wave frequency. A parabolic fit to the three points around the peak (including the peak) can help in this situation.
3. Find where your data crosses the vertical offset. This is similar to #1 but is easier to automate for relatively non-noisy data. The wavelength is twice the distance between a pair of adjacent crossings.
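A minimal sketch of option 2 (my illustration; it assumes evenly spaced t values and a single dominant frequency):
import numpy as np

def estimate_frequency(t, y):
    """Return the dominant frequency (cycles per unit of t) from the discrete Fourier transform."""
    dt = t[1] - t[0]                              # sampling interval, assumed uniform
    spectrum = np.abs(np.fft.rfft(y - y.mean()))  # subtract the mean to drop the DC peak
    freqs = np.fft.rfftfreq(len(y), d=dt)
    return freqs[spectrum.argmax()]

# the angular frequency b in A*sin(b*t + d) + C is then
# b0 = 2 * np.pi * estimate_frequency(t, y)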
Using #1, I can clearly see that your wavelength is 50. The initial guess for b should therefore be 2*np.pi/50. Also, don't forget to add a phase shift parameter to allow the fit to slide horizontally: A*sin(b*t + d) + C.
You will need to pass in an initial guess via the p0 parameter to curve_fit. A good eyeball estimate is p0=(0.55, np.pi/25, 0.0, -np.pi/25*12.5). The phase shift in your data appears to be a quarter period to the right, hence the 12.5.
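Putting that together, a sketch of the suggested fit (assuming t and y are loaded as in the question; the parameter order matches the p0 above):
import numpy as np
from scipy.optimize import curve_fit

def fit_sin(t, A, b, C, d):
    return A * np.sin(b * t + d) + C

# amplitude ~0.55, wavelength ~50 (so b ~ 2*pi/50), zero offset,
# and a quarter-period shift to the right (d = -b * 12.5)
p0 = (0.55, np.pi / 25, 0.0, -np.pi / 25 * 12.5)
popt, pcov = curve_fit(fit_sin, t, y, p0=p0)
print("A = %s, b = %s, C = %s, d = %s" % tuple(popt))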
I am currently in the process of writing an algorithm for fitting noisy sine waves with a single frequency component that I will submit to SciPy. Will update when I finish.