I am trying to fit this x data: [0.4,0.165,0.165,0.585,0.585], this y data: [.45, .22, .63, .22, .63], and this z data: [1, 0.99, 0.98,0.97,0.96] to a paraboloid. I am using scipy's curve_fit tool. Here is my code:
import numpy as np
import scipy.optimize as opt

doex = [0.4, 0.165, 0.165, 0.585, 0.585]
doey = [.45, .22, .63, .22, .63]
doez = np.array([1, .99, .98, .97, .96])

def paraBolEqn(data, a, b, c, d):
    if b < .16 or b > .58 or c < .22 or c > .63:
        return 1e6
    else:
        return ((data[0,:]-b)**2/(a**2) + (data[1,:]-c)**2/(a**2))

data = np.vstack((doex, doey))
zdata = doez
opt.curve_fit(paraBolEqn, data, zdata)
I am trying to center the paraboloid between .16 and .58 (x axis) and between .22 and .63 (y axis). I am doing this by returning a large value if b or c are outside of this range.
Unfortunately the fit is way off: my popt values are all 1 and my pcov entries are all inf.
Any help would be great.
Thank you
Rather than forcing high return values for out-of-range regions you need to provide a good initial guess. In addition, the model lacks an offset parameter and the paraboloid has the wrong sign. Change the model to:
def paraBolEqn(data, a, b, c, d):
    x, y = data
    return -(((x-b)/a)**2 + ((y-d)/c)**2) + 1.0
I fixed the offset at 1.0 because if it were added as a fit parameter the system would be underdetermined (the number of data points would not exceed the number of fit parameters).
Call curve_fit with an initial guess like this:
popt,pcov=opt.curve_fit(paraBolEqn,np.vstack((doex,doey)),doez,p0=[1.5,0.4,1.5,0.4])
This yields:
[ 1.68293045 0.31074135 2.38822062 0.36205424]
and a nice match to the data.
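If you want to check the fit numerically (a small sketch using the arrays and popt from above), evaluate the fitted model at the data points and compare with doez:

fitted = paraBolEqn(np.vstack((doex, doey)), *popt)
print(fitted)          # should track doez = [1, 0.99, 0.98, 0.97, 0.96] fairly closely
print(doez - fitted)   # residuals of the fit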
I found an article about epipolar geometry.
I calculated the fundamental matrix. Now I am trying to find the line on which a corresponding point lies, as described in the article:
I calculated the line, which is in homogeneous coordinates. How can I plot this line onto the picture like in the example? I thought about transforming the line from homogeneous to inhomogeneous coordinates; I think this can be achieved by dividing x and y by z.
For example, homogeneous:
x=0.0295
y=0.9996
z=-265.1531
to inhomogeneous:
x=0.0295/-265.1531
y=0.9996/-265.1531
so:
x=-0.0001112564778612809
y=-0.0037698974667842843
Those numbers seem wrong to me, because they're so small. Is this the correct approach?
How could I plot my result into an image?
The x, y and z you have are the parameters of the "Epipolar Lines" equation that appears under the "line in the image" formula in the slides, but labelled a, b and c respectively, i.e.:
au + bv + c = 0
Solutions to this are points on the line. E.g. in Python I'd pick a couple of u values along the picture's x-axis and solve for the corresponding v:
import numpy as np
F = np.array([
    [-0.00310695, -0.0025646, 2.96584],
    [-0.028094, -0.00771621, 56.3813],
    [13.1905, -29.2007, -9999.79],
])
p_l = np.array([
    [343.53],
    [221.70],
    [  1.0],
])
lt = F @ p_l
# if you want to normalise
lt /= np.sqrt(sum(lt[:2] ** 2))
# should give your values [0.0295, 0.9996, -265.2]
print(lt)
a, b, c = lt.ravel()
x = np.array([0, 400])
y = -(x*a + c) / b
and then just draw a line between these points
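For example, with matplotlib (a minimal sketch; 'right_image.png' is a placeholder filename for the image you want to draw the epipolar line on):

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread('right_image.png')          # placeholder: the image to draw on
plt.imshow(img)
plt.plot(x, y, 'r-')                           # x and y computed above
plt.axis([0, img.shape[1], img.shape[0], 0])   # keep image coordinates (origin at top-left)
plt.show()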
I have a Python list containing continuous values (from 0 to 1020) that I'd like to discretize into ordinal values from 0 to 5 using the k-means strategy.
I used the new class sklearn.preprocessing.KBinsDiscretizer to do that:
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

def descritise_kmeans(python_arr, num_bins):
    X = np.array(python_arr).reshape(-1, 1)
    est = KBinsDiscretizer(n_bins=num_bins, encode='ordinal', strategy='kmeans')
    est.fit(X)
    Xt = est.transform(X)
    return Xt
When running this method, I got this error:
/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/_discretization.py in transform(self, X)
262 atol = 1.e-8
263 eps = atol + rtol * np.abs(Xt[:, jj])
--> 264 Xt[:, jj] = np.digitize(Xt[:, jj] + eps, bin_edges[jj][1:])
265 np.clip(Xt, 0, self.n_bins_ - 1, out=Xt)
266
ValueError: bins must be monotonically increasing or decreasing
Looking closely at this, it seems like the np.digitize call is the one that throws the error. This seems to be a bug in the sklearn library.
When the number of bins n_bins is 6, the error is thrown. However, when n_bins is 5, it works.
I faced a similar problem and found my mistake in setting the values for the bins. My code is simple:
bins = np.array([0.0, .33, 66, 1])
data = [0.1, .2, .4, .5, .7, 8]
inds = np.digitize(data, bins, right=False)
I missed the dot before 66, so my bins were not monotonic. While it may not be the source of the problem in this question, I hope it helps someone.
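For what it's worth, a one-line check with np.diff catches this kind of mistake before np.digitize does (a minimal sketch using the bins above):

import numpy as np

bins = np.array([0.0, .33, 66, 1])   # the typo from above: 66 instead of .66
print(np.all(np.diff(bins) > 0))     # False -> np.digitize will reject these bins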
Makeshift solution:
Edit sklearn's source code, replacing the transform function in sklearn/preprocessing/_discretization.py with the one below.
It is at line 237 as of version '0.20.2'
def transform(self, X):
    """Discretizes the data.

    Parameters
    ----------
    X : numeric array-like, shape (n_samples, n_features)
        Data to be discretized.

    Returns
    -------
    Xt : numeric array-like or sparse matrix
        Data in the binned space.
    """
    check_is_fitted(self, ["bin_edges_"])

    Xt = check_array(X, copy=True, dtype=FLOAT_DTYPES)
    n_features = self.n_bins_.shape[0]
    if Xt.shape[1] != n_features:
        raise ValueError("Incorrect number of features. Expecting {}, "
                         "received {}.".format(n_features, Xt.shape[1]))

    def ensure_monotic_increase(array):
        """
        add small noise to the bin_edges[i]
        when bin_edges[i] !> bin_edges[i-1]
        """
        noise_overlay = np.zeros(array.shape)
        for i in range(1, len(array)):
            bigger = array[i] > array[i-1]
            if bigger:
                pass
            else:
                noise_overlay[i] = abs(array[i-1] * 0.0001)
        return(array + noise_overlay)

    bin_edges = self.bin_edges_
    for jj in range(Xt.shape[1]):
        # Values which are close to a bin edge are susceptible to numeric
        # instability. Add eps to X so these values are binned correctly
        # with respect to their decimal truncation. See documentation of
        # numpy.isclose for an explanation of ``rtol`` and ``atol``.
        rtol = 1.e-5
        atol = 1.e-8
        eps = atol + rtol * np.abs(Xt[:, jj])
        old_bin_edges = bin_edges[jj][1:]
        try:
            Xt[:, jj] = np.digitize(Xt[:, jj] + eps, old_bin_edges)
        except ValueError:
            new_bin_edges = ensure_monotic_increase(old_bin_edges)
            #print(old_bin_edges)
            #print(new_bin_edges)
            try:
                Xt[:, jj] = np.digitize(Xt[:, jj] + eps, new_bin_edges)
            except:
                raise
    np.clip(Xt, 0, self.n_bins_ - 1, out=Xt)

    if self.encode == 'ordinal':
        return Xt

    return self._encoder.transform(Xt)
The issue (that I encountered)
The bin edges were too close to each other. Possibly, due to some kind of floating point error, a bin edge ends up equal to (or larger than) the next one.
When printing the edges (uncomment the print statements in the function above), the first two bin edges were observably equal to each other. The printed bin_edges were:
[-0.1025641 -0.1025641 0.82793522] # ValueError
[-0.1025641 -0.10255385 0.82793522] # After fix
[0.2075 0.2075 0.88798077] # ValueError
[0.2075 0.20752075 0.88798077] # After fix
[ 0.7899066 0.7899066 24.31967669] # ValueError
[ 0.7899066 0.78998559 24.31967669] # After fix
[5.47545572e-18 5.47545572e-18 2.36842105e-01] # ValueError
[5.47545572e-18 5.47600326e-18 2.36842105e-01] # After fix
[5.47545572e-18 5.47545572e-18 2.82894737e-01] # ValueError
[5.47545572e-18 5.47600326e-18 2.82894737e-01] # After fix
[-0.46762302 -0.46762302 -0.00969465] # ValueError
[-0.46762302 -0.46757626 -0.00969465] # After fix
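If you would rather not patch sklearn's source, a possible alternative (a sketch, assuming the same python_arr as in the question and that est.bin_edges_ holds one 1-D array of edges per feature) is to nudge the duplicated edges on the fitted estimator before calling transform:

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array(python_arr, dtype=float).reshape(-1, 1)
est = KBinsDiscretizer(n_bins=6, encode='ordinal', strategy='kmeans')
est.fit(X)

for j, edges in enumerate(est.bin_edges_):
    edges = edges.copy()
    for i in range(1, len(edges)):
        if edges[i] <= edges[i - 1]:   # duplicated / non-increasing edge
            edges[i] = edges[i - 1] + 1e-8 * max(abs(edges[i - 1]), 1.0)
    est.bin_edges_[j] = edges

Xt = est.transform(X)   # should no longer raise the "monotonically increasing" ValueError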
I have an (x, y) signal with non-uniform sample rate in x. (The sample rate is roughly proportional to 1/x). I attempted to uniformly re-sample it using scipy.signal's resample function. From what I understand from the documentation, I could pass it the following arguments:
scipy.signal.resample(array_of_y_values, number_of_sample_points, array_of_x_values)
and it would return the array of
[[resampled_y_values],[new_sample_points]]
I'd expect it to return uniformly sampled data with roughly the same shape as the original, and with the same minimal and maximal x value. But it doesn't:
# nu_data = [[x1, x2, ..., xn], [y1, y2, ..., yn]]
# with x values in ascending order
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
uniform_data = np.array([resampled[1], resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
(Plot: blue = nu_data, orange = uniform_data.)
The shape is not preserved, and the x scale has been rescaled too. If I try to fix the range by constructing the desired uniform x values myself and using them instead, the distortion remains:
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
delta = (nu_data[0,-1] - nu_data[0,0]) / length
new_samplepoints = np.arange(nu_data[0,0], nu_data[0,-1], delta)
uniform_data = np.array([new_samplepoints, resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
What is the proper way to re-sample my data uniformly, if not this?
Please look at this rough solution:
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = np.array([0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20])
y = np.exp(-x/3.0)
flinear = interpolate.interp1d(x, y)
fcubic = interpolate.interp1d(x, y, kind='cubic')
xnew = np.arange(0.001, 20, 1)
ylinear = flinear(xnew)
ycubic = fcubic(xnew)
plt.plot(x, y, 'X', xnew, ylinear, 'x', xnew, ycubic, 'o')
plt.show()
That is a slightly updated example from the scipy page. If you execute it, you should see something like this:
Blue crosses are the initial function: your signal with its non-uniform sampling distribution. There are two results: orange x's for linear interpolation and green dots for cubic interpolation. The question is which option you prefer. Personally I don't like either of them, which is why I usually take four points, interpolate between them, then move on to the next points, and so on, to get a cubic interpolation without those strange bumps. That is much more work, I don't see a way to do it with scipy, and it would be slow. That is why I asked about the size of the data.
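Applied to the question's nu_data (a rough sketch, assuming nu_data is the 2xN array from the question with x in ascending order), the same idea replaces scipy.signal.resample entirely: build a uniform grid over the original x range and interpolate onto it.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

x_uniform = np.linspace(nu_data[0, 0], nu_data[0, -1], len(nu_data[0]))
f = interpolate.interp1d(nu_data[0], nu_data[1])   # linear; kind='cubic' is also possible
uniform_data = np.array([x_uniform, f(x_uniform)])

plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()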
I am trying to fit a Gaussian to a spectrum whose y values are on the order of 10^(-19). curve_fit gives me a poor fitting result, both before and after I multiply my whole data by 10^(-19). Attached is my code; it is a fairly simple set of data, except that the values are very small. If I want to keep my original values, how would I get a reasonable Gaussian fit that gives me the correct parameters?
import numpy as np
import pyfits
from scipy.optimize import curve_fit

#get fits data
aaa=pyfits.getdata('p1.cal.fits')
aaa=np.matrix(aaa)
nrow=np.shape(aaa)[0]
ncol=np.shape(aaa)[1]
ylo=79
yhi=90
xlo=0
xhi=1023
glo=430
ghi=470
#sum all the rows to get spectrum
ysum=[]
for x in range(xlo,xhi):
    sum=np.sum(aaa[ylo:yhi,x])
    ysum.append(sum)
wavelen_pix=range(xhi-xlo)
max=np.max(ysum)
print "maximum is at x=", np.where(ysum==max)
##fit gaussian
#fit only part of my data in the chosen range [glo:ghi]
x=wavelen_pix[glo:ghi]
y=ysum[glo:ghi]
def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/float((2*sigma**2)))
sig=np.std(ysum[500:1000]) #std of background noise
popt, pcov = curve_fit(func, x, sig)
print popt
#this gives me [1.,1.,1.], which is obviously wrong
gaus=func(x,popt[0],popt[1],popt[2])
aaa is a 153 by 1024 image matrix; part of it looks like this:
matrix([[ -8.99793629e-20, 8.57133275e-21, 4.83523386e-20, ...,
-1.54811004e-20, 5.22941515e-20, 1.71179195e-20],
[ 2.75769318e-20, 1.03177243e-20, -3.19634928e-21, ...,
1.66583803e-20, -9.88712568e-22, -2.56897725e-20],
[ 2.88121935e-20, 8.57964252e-21, -2.60784327e-20, ...,
1.72335180e-20, -7.61189937e-21, -3.45333075e-20],
...,
[ 1.04006903e-20, 1.61200683e-20, 7.04195205e-20, ...,
1.72459645e-20, 4.29404029e-20, 1.99889374e-20],
[ 3.22315752e-21, -5.61394194e-21, 3.28763096e-20, ...,
1.99063583e-20, 2.12989880e-20, -1.23250648e-21],
[ 3.66591810e-20, -8.08647455e-22, -6.22773168e-20, ...,
-4.06145681e-21, 4.92453132e-21, 4.23689309e-20]], dtype=float32)
You are calling curve_fit incorrectly; here is the usage:
curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, **kw)
f is your function whose first arg is an array of independent variables, and whose subsequent args are the function parameters (such as amplitude, center, etc)
xdata are the independent variables
ydata is the dependent variable
p0 is an initial guess at the function parameters (for the Gaussian this is amplitude, center, width)
By default p0 is set to a list of ones [1,1,...], which is probably why you get that as a result; the fit just never executed because you called it incorrectly.
Try estimating the amplitude, center, and width from the data, then make a p0 object (see below for details)
init_guess = ( a_i, x0_i, sig_i) # same order as they are supplied to your function
popt, pcov = curve_fit(func, xdata=x,ydata=y,p0=init_guess)
Here is a short example
xdata = np.linspace(0, 4, 50)
mygauss = ( 10,2,0.5) #( amp, center, width)
y = func(xdata, *mygauss ) # using your func defined above
ydata = y + 2*(np.random.random(50)- 0.5) # add some noise to create fake data
Now I can guess the fit params
ai = np.max( ydata) # guess the amplitude
xi = xdata[ np.argmax( ydata)] # guess the position of center
Guessing the width is tricky, I would first find where the half max is located (there are two, but you only need to find one, as the Gaussian is symmetric):
pos_half = np.argmin( np.abs( ydata - ai/2 ) ) # subtract half the amplitude and find the minimum
Now evaluate how far this is from the center of the Gaussian (xi):
sig_i = np.abs( xi - xdata[ pos_half] ) # estimate the width
Now you can make the initial guess
init_guess = (ai, xi, sig_i)
and fit
params, variance = curve_fit( func, xdata=xdata, ydata=ydata, p0=init_guess)
print params
#array([ 9.99457443, 2.01992858, 0.49599629])
which is very close to mygauss. Hope it helps.
Forget about rescaling, making linear changes, or using the p0 parameter, which usually don't work! Try using the bounds parameter in curve_fit for n parameters, like this:
a0=np.array([a01,...,a0n])
af=np.array([af1,...,afn])
method="trf",bounds=(a0,af)
Hope it works!
;)
I have a 3D array that contains a time series of air-sea carbon flux for each grid point on the earth's surface (model output). I want to remove the linear trend from the time series. I came across this code:
import numpy as np
from matplotlib import mlab

cflux_detrended = np.empty_like(cflux)   # allocate the output array
for x in xrange(40):
    for y in xrange(182):
        cflux_detrended[:, x, y] = mlab.detrend_linear(cflux[:, x, y])
Can I speed this up by not using for loops?
Scipy has a lot of signal processing tools.
Using scipy.signal.detrend() will remove the linear trend along an axis of the data. With axis=0 it fits and subtracts a linear trend from the time series at each grid point, which is what the loop above does.
import scipy.signal
cflux_detrended = scipy.signal.detrend(cflux, axis=0)
Using scipy.signal will get the same result as using the method in the original post. Using Josef's detrend_separate() function will also return the same result.
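A quick way to convince yourself of that (a sketch, assuming the original cflux array and a matplotlib version that still ships mlab.detrend_linear):

import numpy as np
import scipy.signal
from matplotlib import mlab

loop_result = np.empty_like(cflux)
for i in range(cflux.shape[1]):
    for j in range(cflux.shape[2]):
        loop_result[:, i, j] = mlab.detrend_linear(cflux[:, i, j])

print(np.allclose(loop_result, scipy.signal.detrend(cflux, axis=0)))   # expect True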
Here are two versions using numpy.linalg.lstsq. Both use np.vander, so they can remove a polynomial trend of any order.
Warning: not tested except on the example.
I think something like this will be added to scikits.statsmodels, which doesn't yet have a multivariate version for detrending either. For the common trend case, we could use scikits.statsmodels OLS and we would also get all the result statistics for the estimation.
# -*- coding: utf-8 -*-
"""Detrending multivariate array
Created on Fri Dec 02 15:08:42 2011
Author: Josef Perktold
http://stackoverflow.com/questions/8355197/detrending-a-time-series-of-a-multi-dimensional-array-without-the-for-loops
I should also add the multivariate version to statsmodels
"""
import numpy as np
import matplotlib.pyplot as plt
def detrend_common(y, order=1):
    '''detrend multivariate series by common trend

    Parameters
    ----------
    y : ndarray
        data, can be 1d or nd. if ndim is greater than 1, then observations
        are along zero axis
    order : int
        degree of polynomial trend, 1 is linear, 0 is constant

    Returns
    -------
    y_detrended : ndarray
        detrended data in same shape as original

    '''
    nobs = y.shape[0]
    shape = y.shape
    y_ = y.ravel()
    nobs_ = len(y_)
    t = np.repeat(np.arange(nobs), nobs_ /float(nobs))
    exog = np.vander(t, order+1)
    params = np.linalg.lstsq(exog, y_)[0]
    fittedvalues = np.dot(exog, params)
    resid = (y_ - fittedvalues).reshape(*shape)
    return resid, params
def detrend_separate(y, order=1):
    '''detrend multivariate series by series specific trends

    Parameters
    ----------
    y : ndarray
        data, can be 1d or nd. if ndim is greater than 1, then observations
        are along zero axis
    order : int
        degree of polynomial trend, 1 is linear, 0 is constant

    Returns
    -------
    y_detrended : ndarray
        detrended data in same shape as original

    '''
    nobs = y.shape[0]
    shape = y.shape
    y_ = y.reshape(nobs, -1)
    kvars_ = len(y_)
    t = np.arange(nobs)
    exog = np.vander(t, order+1)
    params = np.linalg.lstsq(exog, y_)[0]
    fittedvalues = np.dot(exog, params)
    resid = (y_ - fittedvalues).reshape(*shape)
    return resid, params
nobs = 30
sige = 0.1
y0 = 0.5 * np.random.randn(nobs,4,3)
t = np.arange(nobs)
y_observed = y0 + t[:,None,None]
for detrend_func, name in zip([detrend_common, detrend_separate],
                              ['common', 'separate']):

    y_detrended, params = detrend_func(y_observed, order=1)
    print '\n\n', name
    print 'params for detrending'
    print params
    print 'std of detrended', y_detrended.std() #should be roughly sig=0.5 (var of y0)
    print 'maxabs', np.max(np.abs(y_detrended - y0))
    print 'observed'
    print y_observed[-1]
    print 'detrended'
    print y_detrended[-1]
    print 'original "true"'
    print y0[-1]

    plt.figure()
    for i in range(4):
        for j in range(3):
            plt.plot(y0[:,i,j], 'bo', alpha=0.75)
            plt.plot(y_detrended[:,i,j], 'ro', alpha=0.75)
    plt.title(name + ' detrending: blue - original, red - detrended')

plt.show()
Since Nicholas pointed out scipy.signal.detrend: my detrend_separate is basically the same as scipy.signal.detrend, with fewer options (no axis or breakpoints) but with a configurable polynomial order.
>>> res = signal.detrend(y_observed, axis=0)
>>> (res - y0).var()
0.016931858083279336
>>> (y_detrended - y0).var()
0.01693185808327945
>>> (res - y_detrended).var()
8.402584948582852e-30
I think a plain old list comprehension is easiest:
cflux_detrended = np.array([[mlab.detrend_linear(t) for t in kk] for kk in cflux.T])