python curve fit setting an array element with a sequence - python

I am trying to use curve_fit to solve for the two parameters k1 and E1, but it keeps giving me the same error: "setting an array element with a sequence". When I only have the two arrays x and y, it works fine. Could someone tell me how to fix this problem? Thank you!
import numpy as np
from scipy.optimize import curve_fit
x = np.array([5,5,5,5,5,5,5,5,
              12,12,12,12,12,12])
y = np.array([5,1,2,10,20,40,60,80,
              6,6,6,6,6,6])
z = np.array([330,330,330,330,330,330,330,330,
              330,350,370,390,410,430])
r = np.array([1.199,1.303,1.58,1.81,2.24,2.35,2.49,2.71,
              4.3,8.0,1.4,2.32,3.4,6.24])
R = 2.5
def func(X, k1, E1):
    # X packs the three independent variables, one per row
    x, y, z = X
    return k1 * np.exp(-E1/R/z) * x / y
# initial guess
init_guess = [1, 10000]
fittedParameters, pcov = curve_fit(func, (x, y, z), r, init_guess)
print('Parameters', fittedParameters)
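One possible cause, offered as a guess because the arrays shown above all have 14 entries: as far as I know, curve_fit converts a tuple or list passed as xdata into a single float array, so the independent variables must all have the same length. A minimal sketch with made-up numbers (the helper name stack_inputs is hypothetical, not part of SciPy) that checks this before fitting:

```python
import numpy as np

def stack_inputs(*arrays):
    """Stack equal-length 1-D arrays into one 2-D float array (one row per variable)."""
    lengths = [np.size(a) for a in arrays]
    if len(set(lengths)) != 1:
        raise ValueError(f"independent variables differ in length: {lengths}")
    return np.vstack([np.asarray(a, dtype=float) for a in arrays])

# Made-up data: three independent variables with four points each.
x = [5, 5, 12, 12]
y = [5, 1, 6, 6]
z = [330, 330, 350, 370]
X = stack_inputs(x, y, z)
print(X.shape)

# A ragged tuple cannot become a rectangular float array, which is
# the situation that produces "setting an array element with a sequence".
try:
    np.asarray((x, y, z[:-1]), dtype=float)
except ValueError as exc:
    print("ValueError:", exc)
```

If the check passes, X can be given to curve_fit as xdata and unpacked inside the model with x, y, z = X.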

Related

How to find the difference of x -y using sympy

As you see in the code, I want to find the difference of x-y using the resulting R of solve. But, the code keeps returning x-y as value. Please help me. I am a 10 year old kid that just started coding.
import sympy as sp
x, y = sp.symbols ('x, y')
eq1 = sp.Eq(7 * x, 12 * y)
eq2 = sp.Eq(x+y, 9500)
R = sp.solve ((eq1, eq2), (x, y))
print (x-y)
The result R of sp.solve is a Python dictionary with values for x and for y:
import sympy as sp
x, y = sp.symbols('x, y')
eq1 = sp.Eq(7 * x, 12 * y)
eq2 = sp.Eq(x + y, 9500)
R = sp.solve((eq1, eq2), (x, y))
Result: {x: 6000, y: 3500}
To apply the resulting dictionary to an expression, use subs(R):
print((x - y).subs(R))
Result: 2500

Overflow encountered in square

I have tried searching for the overflow error that I'm getting, but I did not succeed.
When I run this program, I get runtime errors that make no sense to me.
Here is the data I used: https://pastebin.com/MLWvUarm
import numpy as np
def loadData():
    data = np.loadtxt('data.txt', delimiter=',')
    x = np.c_[data[:,0:2]]
    y = np.c_[data[:,-1]]
    return x, y
def hypothesis(x, theta):
    h = x.dot(theta)
    return h
def computeCost(x, y, theta):
    m = np.size(y, 0)
    h = hypothesis(x, theta)
    J = (1/(2*m)) * np.sum(np.square(h-y))
    return J
def gradient_descent(x, y, theta, alpha, mxIT):
    m = np.size(y, 0)
    J_history = np.zeros((mxIT, 1))
    for it in range(mxIT):
        hyp = hypothesis(x, theta)
        err = hyp - y
        theta = theta - (alpha/m) * (x.T.dot(err))
        J_history[it] = computeCost(x, y, theta)
    return theta, J_history
def main():
    x, y = loadData()
    x = np.c_[np.ones(x.shape[0]), x]
    theta = np.zeros((np.size(x, 1), 1))
    alpha = 0.01
    mxIT = 400
    theta, j_his = gradient_descent(x, y, theta, alpha, mxIT)
    print(theta)
if __name__ == "__main__":
    main()
How do I solve this problem?
After loading x, try to divide it by the mean and see if it converges. Link to documentation for mean: numpy.mean
...
x, y = loadData()
x = x / x.mean(axis=0, keepdims=True)
x = np.c_[np.ones(x.shape[0]), x]
...
Currently it seems to diverge and this produces very high errors which numpy complains about. You can see this from the cost history which you maintain in J_history.
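To illustrate the divergence, here is a rough sketch that runs the same kind of batch gradient descent on synthetic data (not the pastebin file; the feature ranges and target are made up) with and without dividing by the column means. The function is a compact rewrite of the loop in the question, not the asker's exact code:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iter=400):
    """Batch gradient descent for least squares; returns theta and the cost history."""
    m, n = X.shape
    theta = np.zeros((n, 1))
    costs = np.empty(n_iter)
    for it in range(n_iter):
        err = X @ theta - y
        theta = theta - (alpha / m) * (X.T @ err)
        costs[it] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, costs

def with_bias(features):
    """Prepend a column of ones for the intercept term."""
    return np.column_stack([np.ones(len(features)), features])

rng = np.random.default_rng(0)
# Synthetic stand-in: one large-valued feature and one small-valued feature.
size = rng.uniform(1000, 4000, 50)
rooms = rng.integers(1, 6, 50).astype(float)
features = np.column_stack([size, rooms])
y = (100 * size + 5000 * rooms).reshape(-1, 1)

with np.errstate(all="ignore"):  # silence the overflow warnings of the unscaled run
    _, raw_costs = gradient_descent(with_bias(features), y)
_, scaled_costs = gradient_descent(with_bias(features / features.mean(axis=0)), y)

print("unscaled final cost is finite:", np.isfinite(raw_costs[-1]))
print("scaled cost decreased:", scaled_costs[-1] < scaled_costs[0])
```

With the unscaled features the step size of 0.01 is far too large for the big column, so theta blows up; after scaling, the same step size makes the cost go down.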

Linear Regression with Gradient Descent in Python with numpy

I'm trying to implement in Python the first exercise of Andrew NG's Coursera Machine Learning course. In the course the exercise is with Matlab/Octave, but I wanted to implement it in Python as well.
The problem is that the line that updates the theta values does not seem to work correctly: it returns [[0.72088159] [0.72088159]] but should return [[-3.630291] [1.166362]].
I'm using a learning rate of 0.01 and 1500 gradient iterations (the same values as the original exercise in Octave).
Obviously, with these wrong values for theta, the predictions are not correct, as the last chart shows.
In the lines where I test the cost function with theta set to [0; 0] and [-1; 2], the results are correct (the same as in the Octave exercise), so the error can only be in the gradient function, but I do not know what went wrong.
I would like someone to help me figure out what I'm doing wrong. Thanks in advance.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def load_data():
    X = np.genfromtxt('data.txt', usecols=(0), delimiter=',', dtype=None)
    y = np.genfromtxt('data.txt', usecols=(1), delimiter=',', dtype=None)
    X = X.reshape(1, X.shape[0])
    y = y.reshape(1, y.shape[0])
    ones = np.ones(X.shape)
    X = np.append(ones, X, axis=0)
    theta = np.zeros((2, 1))
    return (X, y, theta)
alpha = 0.01
iter_num = 1500
debug_at_loop = 10
def plot(x, y, y_hat=None):
    x = x.reshape(x.shape[0], 1)
    plt.xlabel('x')
    plt.ylabel('hΘ(x)')
    # positional limits; the ymin/ymax and xmin/xmax keywords are gone in recent Matplotlib
    plt.ylim(-5, 25)
    plt.xlim(5, 25)
    plt.scatter(x, y)
    if type(y_hat) is np.ndarray:
        plt.plot(x, y_hat, '-')
    plt.show()
plot(X[1], y)
def hip(X, theta):
    return np.dot(theta.T, X)
def cost(X, y, theta):
    m = y.shape[1]
    return np.sum(np.square(hip(X, theta) - y)) / (2 * m)
print('With theta = [0 ; 0]')
print('Cost computed =', cost(X, y, np.array([0, 0])))
print()
print('With theta = [-1 ; 2]')
print('Cost computed =', cost(X, y, np.array([-1, 2])))
def grad(X, y, alpha, theta, iter_num=1500, debug_cost_at_each=10):
    J = []
    m = y.shape[1]
    for i in range(iter_num):
        theta -= ((alpha * 1) / m) * np.sum(np.dot(hip(X, theta) - y, X.T))
        if i % debug_cost_at_each == 0:
            J.append(round(cost(X, y, theta), 6))
    return J, theta
X, y, theta = load_data()
J, fit_theta = grad(X, y, alpha, theta)
print('Theta found by Gradient Descent:', fit_theta)
# Predict values for population sizes of 35,000 and 70,000
predict1 = np.dot(np.array([[1], [3.5]]).T, fit_theta);
print('For population = 35,000, we predict a profit of \n', predict1 * 10000);
predict2 = np.dot(np.array([[1], [7]]).T, fit_theta);
print('For population = 70,000, we predict a profit of \n', predict2 * 10000);
pred_y = hip(X, fit_theta)
plot(X[1], y, pred_y.T)
The data I'm using is the following txt:
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705
Well, I got it after losing several strands of hair (the programming will still leave me bald).
It was on the gradient line, and the solution was this:
theta -= ((alpha * 1) / m) * np.dot(X, (hip(X, theta) - y).T)
I changed the place of X and transposed the error vector.
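As I understand it, the original line failed because np.sum collapsed the gradient to a single number, so both entries of theta received the same update (which would explain the two identical values). A small sketch of the shapes involved, using only the first five rows of the data file and the same layout as the question (X has one column per sample):

```python
import numpy as np

# First five rows of the data file; X is (2, m) and y is (1, m), as in the question.
pop = np.array([6.1101, 5.5277, 8.5186, 7.0032, 5.8598])
profit = np.array([17.592, 9.1302, 13.662, 11.854, 6.8233])
X = np.vstack([np.ones_like(pop), pop])
y = profit.reshape(1, -1)
theta = np.zeros((2, 1))
m = y.shape[1]
alpha = 0.01

err = theta.T @ X - y                 # prediction error, shape (1, m)
scalar_step = np.sum(err @ X.T)       # original line: collapses to one number
vector_step = X @ err.T               # corrected line: shape (2, 1), one entry per parameter

print(np.shape(scalar_step), vector_step.shape)

theta_wrong = theta - (alpha / m) * scalar_step   # both entries move by the same amount
theta_right = theta - (alpha / m) * vector_step   # each entry gets its own gradient
print(theta_wrong.ravel(), theta_right.ravel())
```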

How can I use multiple dimensional polynomials with numpy.polynomial?

I'm able to use numpy.polynomial to fit terms to 1D polynomials like f(x) = 1 + x + x^2. How can I fit multidimensional polynomials, like f(x,y) = 1 + x + x^2 + y + yx + y x^2 + y^2 + y^2 x + y^2 x^2? It looks like numpy doesn't support multidimensional polynomials at all: is that the case? In my real application, I have 5 dimensions of input and I am interested in hermite polynomials. It looks like the polynomials in scipy.special are also only available for one dimension of inputs.
# One dimension of data can be fit
x = np.random.random(100)
y = np.sin(x)
params = np.polynomial.polynomial.polyfit(x, y, 6)
np.polynomial.polynomial.polyval([0, .2, .5, 1.5], params)
array([ -5.01799432e-08, 1.98669317e-01, 4.79425535e-01,
9.97606096e-01])
# When I try two dimensions, it fails.
x = np.random.random((100, 2))
y = np.sin(5 * x[:,0]) + .4 * np.sin(x[:,1])
params = np.polynomial.polynomial.polyvander2d(x, y, [6, 6])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-5409f9a3e632> in <module>()
----> 1 params = np.polynomial.polynomial.polyvander2d(x, y, [6, 6])
/usr/local/lib/python2.7/site-packages/numpy/polynomial/polynomial.pyc in polyvander2d(x, y, deg)
1201 raise ValueError("degrees must be non-negative integers")
1202 degx, degy = ideg
-> 1203 x, y = np.array((x, y), copy=0) + 0.0
1204
1205 vx = polyvander(x, degx)
ValueError: could not broadcast input array from shape (100,2) into shape (100)
I got annoyed that there is no simple function for a 2d polynomial fit of any number of degrees so I made my own. Like the other answers it uses numpy lstsq to find the best coefficients.
import numpy as np
from scipy.linalg import lstsq
from scipy.special import binom
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def _get_coeff_idx(coeff):
    idx = np.indices(coeff.shape)
    idx = idx.T.swapaxes(0, 1).reshape((-1, 2))
    return idx
def _scale(x, y):
    # Normalize x and y to avoid huge numbers
    # Mean 0, variance 1
    offset_x, offset_y = np.mean(x), np.mean(y)
    norm_x, norm_y = np.std(x), np.std(y)
    x = (x - offset_x) / norm_x
    y = (y - offset_y) / norm_y
    return x, y, (norm_x, norm_y), (offset_x, offset_y)
def _unscale(x, y, norm, offset):
    x = x * norm[0] + offset[0]
    y = y * norm[1] + offset[1]
    return x, y
def polyvander2d(x, y, degree):
    A = np.polynomial.polynomial.polyvander2d(x, y, degree)
    return A
def polyscale2d(coeff, scale_x, scale_y, copy=True):
    if copy:
        coeff = np.copy(coeff)
    idx = _get_coeff_idx(coeff)
    for k, (i, j) in enumerate(idx):
        coeff[i, j] /= scale_x ** i * scale_y ** j
    return coeff
def polyshift2d(coeff, offset_x, offset_y, copy=True):
    if copy:
        coeff = np.copy(coeff)
    idx = _get_coeff_idx(coeff)
    # Copy coeff because it changes during the loop
    coeff2 = np.copy(coeff)
    for k, m in idx:
        not_the_same = ~((idx[:, 0] == k) & (idx[:, 1] == m))
        above = (idx[:, 0] >= k) & (idx[:, 1] >= m) & not_the_same
        for i, j in idx[above]:
            b = binom(i, k) * binom(j, m)
            sign = (-1) ** ((i - k) + (j - m))
            offset = offset_x ** (i - k) * offset_y ** (j - m)
            coeff[k, m] += sign * b * coeff2[i, j] * offset
    return coeff
def plot2d(x, y, z, coeff):
    # subsample large inputs so the scatter plot stays fast
    if x.size > 500:
        choice = np.random.choice(x.size, size=500, replace=False)
    else:
        choice = slice(None, None, None)
    x, y, z = x[choice], y[choice], z[choice]
    # regular grid covering the domain of the data
    X, Y = np.meshgrid(
        np.linspace(np.min(x), np.max(x), 20), np.linspace(np.min(y), np.max(y), 20)
    )
    Z = np.polynomial.polynomial.polyval2d(X, Y, coeff)
    fig = plt.figure()
    # fig.gca(projection="3d") no longer works in recent Matplotlib versions
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2)
    ax.scatter(x, y, z, c="r", s=50)
    plt.xlabel("X")
    plt.ylabel("Y")
    ax.set_zlabel("Z")
    plt.show()
def polyfit2d(x, y, z, degree=1, max_degree=None, scale=True, plot=False):
    """A simple 2D polynomial fit to data x, y, z
    The polynomial can be evaluated with numpy.polynomial.polynomial.polyval2d
    Parameters
    ----------
    x : array[n]
        x coordinates
    y : array[n]
        y coordinates
    z : array[n]
        data values
    degree : {int, 2-tuple}, optional
        degree of the polynomial fit in x and y direction (default: 1)
    max_degree : {int, None}, optional
        if given the maximum combined degree of the coefficients is limited to this value
    scale : bool, optional
        Whether to scale the input arrays x and y to mean 0 and variance 1, to avoid numerical overflows.
        Especially useful at higher degrees. (default: True)
    plot : bool, optional
        whether to plot the fitted surface and data (slow) (default: False)
    Returns
    -------
    coeff : array[degree+1, degree+1]
        the polynomial coefficients in numpy 2d format, i.e. coeff[i, j] for x**i * y**j
    """
    # Flatten input
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    z = np.asarray(z).ravel()
    # Remove masked values
    mask = ~(np.ma.getmask(z) | np.ma.getmask(x) | np.ma.getmask(y))
    x, y, z = x[mask].ravel(), y[mask].ravel(), z[mask].ravel()
    # Scale coordinates to smaller values to avoid numerical problems at larger degrees
    if scale:
        x, y, norm, offset = _scale(x, y)
    if np.isscalar(degree):
        degree = (int(degree), int(degree))
    degree = [int(degree[0]), int(degree[1])]
    coeff = np.zeros((degree[0] + 1, degree[1] + 1))
    idx = _get_coeff_idx(coeff)
    # Calculate elements 1, x, y, x*y, x**2, y**2, ...
    A = polyvander2d(x, y, degree)
    # Keep only the terms whose COMBINED power is at most max_degree
    if max_degree is not None:
        mask = idx[:, 0] + idx[:, 1] <= int(max_degree)
        idx = idx[mask]
        A = A[:, mask]
    # Do the actual least squares fit
    C, *_ = lstsq(A, z)
    # Reorder coefficients into numpy compatible 2d array
    for k, (i, j) in enumerate(idx):
        coeff[i, j] = C[k]
    # Reverse the scaling
    if scale:
        coeff = polyscale2d(coeff, *norm, copy=False)
        coeff = polyshift2d(coeff, *offset, copy=False)
    if plot:
        if scale:
            x, y = _unscale(x, y, norm, offset)
        plot2d(x, y, z, coeff)
    return coeff
if __name__ == "__main__":
    n = 100
    x, y = np.meshgrid(np.arange(n), np.arange(n))
    z = x ** 2 + y ** 2
    c = polyfit2d(x, y, z, degree=2, plot=True)
    print(c)
It doesn't look like polyfit supports fitting multivariate polynomials, but you can do it by hand, with linalg.lstsq. The steps are as follows:
1. Gather the degrees of the monomials x**i * y**j you wish to use in the model. Think carefully about it: your current model already has 9 parameters, and if you push to 5 variables with the current approach you'll end up with 3**5 = 243 parameters, a sure road to overfitting. Maybe limit the model to monomials of total degree at most 2 or 3.
2. Plug the x-points into each monomial; this gives a 1D array. Stack all such arrays as columns of a matrix.
3. Solve a linear system with the aforementioned matrix and with the right-hand side being the target values (I call them z because y is confusing when you also use x, y for two variables).
Here it is:
import numpy as np
x = np.random.random((100, 2))
z = np.sin(5 * x[:,0]) + .4 * np.sin(x[:,1])
degrees = [(i, j) for i in range(3) for j in range(3)] # list of monomials x**i * y**j to use
matrix = np.stack([np.prod(x**d, axis=1) for d in degrees], axis=-1) # stack monomials like columns
coeff = np.linalg.lstsq(matrix, z, rcond=None)[0] # lstsq returns some additional info we ignore
print("Coefficients", coeff) # in the same order as the monomials listed in "degrees"
fit = np.dot(matrix, coeff)
print("Fitted values", fit)
print("Original values", z) # the targets are called z here, not y
I believe you have misunderstood what polyvander2d does and how it should be used. polyvander2d() returns the pseudo-Vandermonde matrix of degrees deg and sample points (x, y).
Here, y is not the value(s) of the polynomial at point(s) x but rather it is the y-coordinate of the point(s) and x is the x-coordinate. Roughly speaking, the returned array is a set of combinations of (x**i) * (y**j) and x and y are essentially 2D "mesh-grids". Therefore, both x and y must have identical shapes.
Your x and y arrays, however, have different shapes:
>>> x.shape
(100, 2)
>>> y.shape
(100,)
I do not believe numpy has a 5D-polyvander of the form polyvander5D(x, y, z, v, w, deg). Notice, all the variables here are coordinates and not the values of the polynomial p=p(x,y,z,v,w). You, however, seem to be using y (in the 2D case) as f.
It appears that numpy does not have 2D or higher equivalents for the polyfit() function. If your intention is to find the coefficients of the best-fitting polynomial in higher-dimensions, I would suggest that you generalize the approach described here: Equivalent of `polyfit` for a 2D polynomial in Python
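To make the intended use of polyvander2d concrete, here is a minimal sketch for the 2D case: the two columns of the sample array are passed as the x and y coordinates, and the polynomial values go into the least-squares solve as the right-hand side. The target function f and the degree are made up for illustration:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
pts = rng.random((100, 2))                  # 100 sample points with coordinates (x, y)
x, y = pts[:, 0], pts[:, 1]                 # both have shape (100,)
f = 1 + 2 * x - 3 * y + 0.5 * x * y**2      # made-up values of the polynomial at the points

deg = [2, 2]
A = P.polyvander2d(x, y, deg)               # shape (100, 9); columns are x**i * y**j
coef, *_ = np.linalg.lstsq(A, f, rcond=None)
coef = coef.reshape(deg[0] + 1, deg[1] + 1)  # coef[i, j] multiplies x**i * y**j

fitted = P.polyval2d(x, y, coef)
print(np.allclose(fitted, f))
```

The same pattern should carry over to three variables with polyvander3d and polyval3d; beyond that, the design matrix has to be built by hand.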
The option isn't there because nobody wants to do that. Combine the polynomials linearly (f(x,y) = 1 + x + y + x^2 + y^2) and solve the system of equations yourself.

Using Scipy curve_fit with piecewise function

I am getting an optimize warning:
OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
when trying to fit my piecewise function to my data using scipy.optimize.curve_fit. Meaning no fitting is happening. I can easily fit a parabola to my data, and I'm supplying curve_fit with what I feel are good initial parameters. Full code sample below. Does anyone know why curve_fit might not be getting along with np.piecewise? Or am I making a different mistake?
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def piecewise_linear(x, x0, y0, k1, k2):
    y = np.piecewise(x, [x < x0, x >= x0],
                     [lambda x:k1*x + y0-k1*x0, lambda x:k2*x + y0-k2*x0])
    return y
def parabola(x, a, b):
    y = a * x**2 + b
    return y
x = np.array([-3, -2, -1, 0, 1, 2, 3])
y = np.array([9.15, 5.68, 2.32, 0.00, 2.05, 5.29, 8.62])
popt_piecewise, pcov = curve_fit(piecewise_linear, x, y, p0=[0.1, 0.1, -5, 5])
popt_parabola, pcov = curve_fit(parabola, x, y, p0=[1, 1])
new_x = np.linspace(x.min(), x.max(), 61)
fig, ax = plt.subplots()
ax.plot(x, y, 'o', ls='')
ax.plot(new_x, piecewise_linear(new_x, *popt_piecewise))
ax.plot(new_x, parabola(new_x, *popt_parabola))
ax.set_xlim(-4, 4)
ax.set_ylim(-2, 16)
It is a problem with types: you have to change the following line so that x is given as floats:
x = np.array([-3, -2, -1, 0, 1, 2, 3]).astype(float)
Otherwise piecewise_linear might end up casting its output to integers.
Just to be on the safe side you could also make the initial points float here:
popt_piecewise, pcov = curve_fit(piecewise_linear, x, y, p0=[0.1, 0.1, -5., 5.])
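The casting can be checked directly, outside of curve_fit: np.piecewise returns an array with the same dtype as x, so an integer x gives an integer result. A small sketch with arbitrary parameter values (recent SciPy versions may already convert xdata to float, in which case the warning itself might not reproduce):

```python
import numpy as np

def piecewise_linear(x, x0, y0, k1, k2):
    return np.piecewise(
        x,
        [x < x0, x >= x0],
        [lambda x: k1 * x + y0 - k1 * x0, lambda x: k2 * x + y0 - k2 * x0],
    )

params = (0.5, 0.25, -1.5, 1.5)   # arbitrary x0, y0, k1, k2 for illustration
x_int = np.array([-3, -2, -1, 0, 1, 2, 3])
x_float = x_int.astype(float)

out_int = piecewise_linear(x_int, *params)
out_float = piecewise_linear(x_float, *params)
print(out_int.dtype, out_int)      # integer dtype: fractional parts are dropped
print(out_float.dtype, out_float)  # float dtype: values kept as computed
```

With integer output, small changes in the parameters often leave the model values unchanged, which would explain why the optimizer cannot estimate a covariance.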
For completeness, I'll point out that fitting a piecewise linear function does not require np.piecewise: any such function can be constructed out of absolute values, using a multiple of np.abs(x-x0) for each bend. The following produces a good fit to the data:
def pl(x, x0, a, b, c):
    y = a*np.abs(x-x0) + b*x + c
    return y
popt_pl, pcov = curve_fit(pl, x, y, p0=[0, 0, 0, 0])
print(pl(x, *popt_pl))
Output is close to original y-values:
[ 8.90899998 5.828 2.74700002 -0.33399996 2.03499998 5.32
8.60500002]
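If the slopes and the breakpoint value are needed in the form used by the question, the two parameterizations can be related by hand: the slope is b - a to the left of x0 and b + a to the right, and the value at the bend is b*x0 + c. A quick sketch that checks this numerically (the helper to_piecewise_params and the parameter values are made up for illustration):

```python
import numpy as np

def pl(x, x0, a, b, c):
    return a * np.abs(x - x0) + b * x + c

def to_piecewise_params(x0, a, b, c):
    """Map pl's parameters to (x0, y0, k1, k2) as used by piecewise_linear."""
    y0 = b * x0 + c
    return x0, y0, b - a, b + a

def piecewise_linear(x, x0, y0, k1, k2):
    # Same two branches as in the question, written with np.where.
    return np.where(x < x0, k1 * (x - x0) + y0, k2 * (x - x0) + y0)

x = np.linspace(-3.0, 3.0, 13)
p = (0.4, 3.0, -0.2, 1.0)   # arbitrary values for x0, a, b, c
same = np.allclose(pl(x, *p), piecewise_linear(x, *to_piecewise_params(*p)))
print(same)
```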
