I implemented a support vector machine in python using the cvxopt qp solver where I need to compute a gram matrix of two vectors with a kernel function at each element. I implemented it correctly using for loops but this strategy is computationally intensive. I would like to vectorize the code.
Example:
Here is what I have written:
K = np.array( [kernel(X[i], X[j],poly=poly_kernel)
for j in range(m)
for i in range(m)]).reshape((m, m))
How can I vectorize the above code without for loops to achieve the same result faster?
The kernel function computes a gaussian kernel.
Here is a quick explanation of an svm with kernel trick. Second page of this explains the problem.
Here is my full code for context.
EDIT: Here is a quick code snippet that runs what I need to vectorized in an unvectorized form
from sklearn.datasets import make_gaussian_quantiles;
import numpy as np;
X,y = make_gaussian_quantiles(mean=None, cov=1.0, n_samples=100, n_features=2, n_classes=2, shuffle=True, random_state=5);
m = X.shape[0];
def kernel(a,b,d=20,poly=True,sigma=0.5):
if (poly):
return np.inner(a,b) ** d;
else:
return np.exp(-np.linalg.norm((a - b) ** 2)/sigma**2)
# Need to vectorize these loops
K = np.array([kernel(X[i], X[j],poly=False)
for j in range(m)
for i in range(m)]).reshape((m, m))
Thanks!
Here is a vectorized version. The non poly branch comes in two variants a direct one and a memory saving one in case the number of features is large:
from sklearn.datasets import make_gaussian_quantiles;
import numpy as np;
X,y = make_gaussian_quantiles(mean=None, cov=1.0, n_samples=100, n_features=2, n_classes=2, shuffle=True, random_state=5);
Y,_ = make_gaussian_quantiles(mean=None, cov=1.0, n_samples=200, n_features=2, n_classes=2, shuffle=True, random_state=2);
m = X.shape[0];
n = Y.shape[0]
def kernel(a,b,d=20,poly=True,sigma=0.5):
if (poly):
return np.inner(a,b) ** d;
else:
return np.exp(-np.linalg.norm((a - b) ** 2)/sigma**2)
# Need to vectorize these loops
POLY = False
LOW_MEM = 0
K = np.array([kernel(X[i], Y[j], poly=POLY)
for i in range(m)
for j in range(n)]).reshape((m, n))
def kernel_v(X, Y=None, d=20, poly=True, sigma=0.5):
Z = X if Y is None else Y
if poly:
return np.einsum('ik,jk', X, Z)**d
elif X.shape[1] < LOW_MEM:
return np.exp(-np.sqrt(((X[:, None, :] - Z[None, :, :])**4).sum(axis=-1)) / sigma**2)
elif Y is None or Y is X:
X2 = X*X
H = np.einsum('ij,ij->i', X2, X2) + np.einsum('ik,jk', X2, 3*X2) - np.einsum('ik,jk', X2*X, 4*X)
return np.exp(-np.sqrt(np.maximum(0, H+H.T)) / sigma**2)
else:
X2, Y2 = X*X, Y*Y
E = np.einsum('ik,jk', X2, 6*Y2) - np.einsum('ik,jk', X2*X, 4*Y) - np.einsum('ik,jk', X, 4*Y2*Y)
E += np.add.outer(np.einsum('ij,ij->i', X2, X2), np.einsum('ij,ij->i', Y2, Y2))
return np.exp(-np.sqrt(np.maximum(0, E)) / sigma**2)
print(np.allclose(K, kernel_v(X, Y, poly=POLY)))
Related
I'm trying to write a function to evaluate the probability mass function for the bivariate poisson distribution.
This is easy when all of the parameters (x, y, theta1, theta2, theta0) are scalars, but tricky to scale up without loops to allow these parameters to be vectors. I need it to scale such that, for:
theta0 being a scalar - the "correlation parameter" in the equation
theta1 and theta2 having length l
x, y both having length n
the output array would have shape (l, n, n). For example, a slice [j, :, :] from the output array would look like:
The first part (the constant, before the summation) I think i've figured out:
import numpy as np
from scipy.special import factorial
def constant(theta1, theta2, theta0, x, y):
exponential_part = np.exp(-(theta1 + theta2 + theta0)).reshape(-1, 1, 1)
x = np.tile(x, (len(x), 1)).transpose()
y = np.tile(y, (len(y), 1))
double_factorial = (np.power(np.array(theta1).reshape(-1, 1, 1), x)/factorial(x)) * \
(np.power(np.array(theta2).reshape(-1, 1, 1), y)/factorial(y))
return exponential_part * double_factorial
But I'm struggling with the summation part. How can I vectorize a summation where the limits depend on variable arrays?
I think I have this figured out, based on the approach that #w-m suggests: calculate every possible summation term which could appear, based on the maximum x or y value which appears, and use a mask to get rid of the ones you don't want. Assuming you have your x and y terms go from 0 to N, in consecutive order, this is calculating up to three times more terms than are actually required, but this is offset by getting to use vectorization.
Reference implementation
I wrote this by first writing a pure-Python reference implementation, which just implements your problem using loops. With 4 nested loops, it's not exactly fast, but it's handy to have while testing the numpy version.
import numpy as np
from scipy.special import factorial, comb
import operator as op
from functools import reduce
def choose(n, r):
# https://stackoverflow.com/a/4941932/530160
r = min(r, n-r)
numer = reduce(op.mul, range(n, n-r, -1), 1)
denom = reduce(op.mul, range(1, r+1), 1)
return numer // denom # or / in Python 2
def reference_impl_constant(s_theta1, s_theta2, s_theta0, s_x, s_y):
# Cast to float to prevent overflow
s_theta1 = float(s_theta1)
s_theta2 = float(s_theta2)
s_theta0 = float(s_theta0)
s_x = float(s_x)
s_y = float(s_y)
term1 = np.exp(-(s_theta1 + s_theta2 + s_theta0))
term2 = (s_theta1 ** s_x / factorial(s_x))
term3 = (s_theta2 ** s_y / factorial(s_y))
assert term1 >= 0
assert term2 >= 0
assert term3 >= 0
return term1 * term2 * term3
def reference_impl_constant_loop(theta1, theta2, theta0, x, y):
theta_len = theta1.shape[0]
xy_len = x.shape[0]
constant_array = np.zeros((theta_len, xy_len, xy_len))
for i in range(theta_len):
for j in range(xy_len):
for k in range(xy_len):
s_theta1 = theta1[i]
s_theta2 = theta2[i]
s_theta0 = theta0
s_x = x[j]
s_y = y[k]
constant_term = reference_impl_constant(s_theta1, s_theta2, s_theta0, s_x, s_y)
assert constant_term >= 0
constant_array[i, j, k] = constant_term
return constant_array
def reference_impl_summation(s_theta1, s_theta2, s_theta0, s_x, s_y):
sum_ = 0
for i in range(min(s_x, s_y) + 1):
sum_ += choose(s_x, i) * choose(s_y, i) * factorial(i) * ((s_theta0/s_theta1/s_theta2) ** i)
assert sum_ >= 0
return sum_
def reference_impl_summation_loop(theta1, theta2, theta0, x, y):
theta_len = theta1.shape[0]
xy_len = x.shape[0]
summation_array = np.zeros((theta_len, xy_len, xy_len))
for i in range(theta_len):
for j in range(xy_len):
for k in range(xy_len):
s_theta1 = theta1[i]
s_theta2 = theta2[i]
s_theta0 = theta0
s_x = x[j]
s_y = y[k]
summation_term = reference_impl_summation(s_theta1, s_theta2, s_theta0, s_x, s_y)
assert summation_term >= 0
summation_array[i, j, k] = summation_term
return summation_array
def reference_impl(theta1, theta2, theta0, x, y):
# all array inputs must be 1D
assert len(theta1.shape) == 1
assert len(theta2.shape) == 1
assert len(x.shape) == 1
assert len(y.shape) == 1
# theta vectors must have same length
theta_len = theta1.shape[0]
assert theta2.shape[0] == theta_len
# x and y must have same length
xy_len = x.shape[0]
assert y.shape[0] == xy_len
# theta0 is scalar
assert isinstance(theta0, (int, float))
constant_array = np.zeros((theta_len, xy_len, xy_len))
output = np.zeros((theta_len, xy_len, xy_len))
constant_array = reference_impl_constant_loop(theta1, theta2, theta0, x, y)
summation_array = reference_impl_summation_loop(theta1, theta2, theta0, x, y)
output = constant_array * summation_array
return output
Numpy implementation
I split the implementation of this across two functions.
The fast_constant() function calculates everything to the left of the summation symbol. The fast_summation() function calculates everything inside the summation symbol.
import numpy as np
from scipy.special import factorial, comb
def fast_summation(theta1, theta2, theta0, x, y):
x = np.tile(x, (len(x), 1)).transpose()
y = np.tile(y, (len(y), 1))
sum_limit = np.minimum(x, y)
max_sum_limit = np.max(sum_limit)
i = np.arange(max_sum_limit + 1).reshape(-1, 1, 1)
summation_mask = (i <= sum_limit)
theta_ratio = (theta0 / (theta1 * theta2)).reshape(-1, 1, 1, 1)
theta_to_power = np.power(theta_ratio, i)
terms = comb(x, i) * comb(y, i) * factorial(i) * theta_to_power
# mask out terms which aren't part of sum
terms *= summation_mask
# axis 0 is theta
# axis 1 is i
# axis 2 & 3 are x and y
# so sum across axis 1
terms = terms.sum(axis=1)
return terms
def fast_constant(theta1, theta2, theta0, x, y):
theta1 = theta1.astype('float64')
theta2 = theta2.astype('float64')
exponential_part = np.exp(-(theta1 + theta2 + theta0)).reshape(-1, 1, 1)
# x and y must be 1D
assert len(x.shape) == 1
assert len(y.shape) == 1
# x and y must have same shape
assert x.shape == y.shape
x_len, y_len = x.shape[0], y.shape[0]
x = x.reshape((x_len, 1))
y = y.reshape((1, y_len))
double_factorial = (np.power(np.array(theta1).reshape(-1, 1, 1), x)/factorial(x)) * \
(np.power(np.array(theta2).reshape(-1, 1, 1), y)/factorial(y))
return exponential_part * double_factorial
def fast_impl(theta1, theta2, theta0, x, y):
return fast_summation(theta1, theta2, theta0, x, y) * fast_constant(theta1, theta2, theta0, x, y)
Benchmarking
Assuming that X and Y range from 0 to 20, and that theta is centered somewhere inside that range, I get the result that the numpy version is roughly 280 times faster than the pure python reference.
Numerical stability
I'm unsure how numerically stable this is. For example, when I center theta at 100, I get a floating-point overflow. Typically, when computing an expression which has lots of choose and factorial expressions inside it, you'll use some mathematical equivalent which results in smaller intermediate sums. In this case I have so little understanding of the math that I don't know how you'd do that.
When I use this code for Single variable Linear regression, the theta is being evaluated correctly but when on the multi variable it is giving weird output for theta.
I am trying to convert my octave code, that I wrote when I took Andrew Ng's course.
This is the main calling file:
m = data.shape[0]
a = np.array(data[0])
a.shape = (m,1)
b = np.array(data[1])
b.shape = (m, 1)
x = np.append(a, b, axis=1)
y = np.array(data[2])
lr = LR.LinearRegression()
[X, mu, sigma] = lr.featureNormalize(x)
z = np.ones((m, 1), dtype=float)
X = np.append(z, X, axis=1)
alpha = 0.01
num_iters = 400
theta = np.zeros(shape=(3,1))
[theta, J_history] = lr.gradientDescent(X, y, theta, alpha, num_iters)
print(theta)
And here are the contents of class :
class LinearRegression:
def featureNormalize(self, data):#this normalizes the features
data = np.array(data)
x_norm = data
mu = np.zeros(shape=(1, data.shape[1]))#creates mu vector filled with zeros
sigma = np.zeros(shape=(1, data.shape[1]))
for i in range(0, data.shape[1]):
mu[0, i] = np.mean(data[:, i])
sigma[0, i] = np.std(data[:, i])
for i in range(0, data.shape[1]):
x_norm[:, i] = np.subtract(x_norm[:, i], mu[0, i])
x_norm[:, i] = np.divide(x_norm[:, i], sigma[0, i])
return [x_norm, mu, sigma]
def gradientDescent(self, X, y, theta, alpha, num_iters):
m = y.shape[0]
J_history = np.zeros(shape=(num_iters, 1))
for i in range(0, num_iters):
predictions = X.dot(theta) # X is 47*3 theta is 3*1 predictions is 47*1
theta = np.subtract(theta , (alpha / m) * np.transpose((np.transpose(np.subtract(predictions ,y))).dot(X))) #1*97 into 97*3
J_history[i] = self.computeCost(X, y, theta)
return [theta, J_history]
def computeCost(self, X, y, theta):
warnings.filterwarnings('ignore')
m = X.shape[0]
J = 0
predictions = X.dot(theta)
sqrErrors = np.power(predictions - y, 2)
J = 1 / (2 * m) * np.sum(sqrErrors)
return J
I expected a theta that'll be a 3*1 matrix. According to Andrew's course my octave implementation was producing a theta
334302.063993
100087.116006
3673.548451
But in python implementation I am getting very weird output:
[[384596.12996714 317274.97693463 354878.64955708 223121.53576488
519238.43603216 288423.05420641 302849.01557052 191383.45903309
203886.92061274 233219.70871976 230814.42009498 333720.57288972
317370.18827964 673115.35724932 249953.82390212 432682.6678475
288423.05420641 192249.97844569 480863.45534211 576076.72380674
243221.70859887 245241.34318985 233604.4010228 249953.82390212
551937.2817908 240336.51632605 446723.93690857 451051.7253178
456822.10986344 288423.05420641 336509.59208678 163398.05571747
302849.01557052 557707.6...................... this goes on for long
The same code is working absolutely fine in Single Variable dataset. It is also working fine in the octave but seems like I am missing some point for 2+ hours now. Happy to get your help.
Try in gradientDescent the following second line of the for loop:
theta=theta-(alpha/m)*X.T.dot(X.dot(theta)-y)
Also, if you want to add a column of ones, it is easier to do like so:
np.c_[np.ones((m,1)),data]
I'm trying to implement in Python the first exercise of Andrew NG's Coursera Machine Learning course. In the course the exercise is with Matlab/Octave, but I wanted to implement it in Python as well.
The problem is that the line that updates theta values, does not seem to be working right, is returning values [[0.72088159] [0.72088159]] but should return [[-3.630291] [1.166362]]
I'm using a learning rate of 0.01 and the gradient loop was set to 1500 (the same values from the original exercise in Octave).
And obviously, with these wrong values for theta, the predictions are not correct as shown in the last chart.
In the rows in which I tesyo the cost function with theta values defined as [0; 0] and [-1; 2], the results are correct (the same as the exercise in Octave), so the error can only be in the function of the gradient, but I do not know what went wrong.
I wanted someone to help me figure out what I'm doing wrong. I'm grateful already.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def load_data():
X = np.genfromtxt('data.txt', usecols=(0), delimiter=',', dtype=None)
y = np.genfromtxt('data.txt', usecols=(1), delimiter=',', dtype=None)
X = X.reshape(1, X.shape[0])
y = y.reshape(1, y.shape[0])
ones = np.ones(X.shape)
X = np.append(ones, X, axis=0)
theta = np.zeros((2, 1))
return (X, y, theta)
alpha = 0.01
iter_num = 1500
debug_at_loop = 10
def plot(x, y, y_hat=None):
x = x.reshape(x.shape[0], 1)
plt.xlabel('x')
plt.ylabel('hΘ(x)')
plt.ylim(ymax = 25, ymin = -5)
plt.xlim(xmax = 25, xmin = 5)
plt.scatter(x, y)
if type(y_hat) is np.ndarray:
plt.plot(x, y_hat, '-')
plt.show()
plot(X[1], y)
def hip(X, theta):
return np.dot(theta.T, X)
def cost(X, y, theta):
m = y.shape[1]
return np.sum(np.square(hip(X, theta) - y)) / (2 * m)
print('With theta = [0 ; 0]')
print('Cost computed =', cost(X, y, np.array([0, 0])))
print()
print('With theta = [-1 ; 2]')
print('Cost computed =', cost(X, y, np.array([-1, 2])))
def grad(X, y, alpha, theta, iter_num=1500, debug_cost_at_each=10):
J = []
m = y.shape[1]
for i in range(iter_num):
theta -= ((alpha * 1) / m) * np.sum(np.dot(hip(X, theta) - y, X.T))
if i % debug_cost_at_each == 0:
J.append(round(cost(X, y, theta), 6))
return J, theta
X, y, theta = load_data()
J, fit_theta = grad(X, y, alpha, theta)
print('Theta found by Gradient Descent:', fit_theta)
# Predict values for population sizes of 35,000 and 70,000
predict1 = np.dot(np.array([[1], [3.5]]).T, fit_theta);
print('For population = 35,000, we predict a profit of \n', predict1 * 10000);
predict2 = np.dot(np.array([[1], [7]]).T, fit_theta);
print('For population = 70,000, we predict a profit of \n', predict2 * 10000);
pred_y = hip(X, fit_theta)
plot(X[1], y, pred_y.T)
The data I'm using is the following txt:
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705
Well, I got it after losing several strands of hair (the programming will still leave me bald).
It was on the gradient line, and the solution was this:
theta -= ((alpha * 1) / m) * np.dot(X, (hip(X, theta) - y).T)
I changed the place of X and transposed the error vector.
I am trying to mimic the gradient descent algorithm for linear regression from Andrew NG's Machine learning course to Python, but for some reason my implementation is not working correctly.
Here's my implementation in Octave, it works correctly:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
prediction = X*theta;
margin_error = prediction - y;
gradient = 1/m * (alpha * (X' * margin_error));
theta = theta - gradient;
J_history(iter) = computeCost(X, y, theta);
end
end
However, when I translate this to Python for some reason it is not giving me accurate results. The cost seems to be going up rather than descending.
Here's my implementation in Python:
def gradientDescent(x, y, theta, alpha, iters):
m = len(y)
J_history = np.matrix(np.zeros((iters,1)))
for i in range(iters):
prediction = x*theta.T
margin_error = prediction - y
gradient = 1/m * (alpha * (x.T * margin_error))
theta = theta - gradient
J_history[i] = computeCost(x,y,theta)
return theta,J_history
My code is compiling and there isn't anything wrong. Please note this is theta:
theta = np.matrix(np.array([0,0]))
Alpha and iters is set to this:
alpha = 0.01
iters = 1000
When I run it, opt_theta, cost = gradientDescent(x, y, theta, alpha, iters) and print out opt_theta, I get this:
matrix([[ 2.36890383e+16, -1.40798902e+16],
[ 2.47503758e+17, -2.36890383e+16]])
when I should get this:
matrix([[-3.24140214, 1.1272942 ]])
What am I doing wrong?
Edit:
Cost function
def computeCost(x, y, theta):
# Get length of data set
m = len(y)
# We get theta transpose because we are working with a numpy array [0,0] for example
prediction = x * theta.T
J = 1/(2*m) * np.sum(np.power((prediction - y), 2))
return J
Look there:
>>> A = np.matrix([3,3,3])
>>> B = np.matrix([[1,1,1], [2,2,2]])
>>> A-B
matrix([[2, 2, 2],
[1, 1, 1]])
Matrices are broadcasted together.
"it's because np.matrix inherits from np.array. np.matrix overrides multiplication, but not addition and subtraction"
In yours situation theta(1x2) subtract gradient(2x1) and in result you have got 2x2. Try to transpose gradient before subtracting.
theta = theta - gradient.T
I'm able to use numpy.polynomial to fit terms to 1D polynomials like f(x) = 1 + x + x^2. How can I fit multidimensional polynomials, like f(x,y) = 1 + x + x^2 + y + yx + y x^2 + y^2 + y^2 x + y^2 x^2? It looks like numpy doesn't support multidimensional polynomials at all: is that the case? In my real application, I have 5 dimensions of input and I am interested in hermite polynomials. It looks like the polynomials in scipy.special are also only available for one dimension of inputs.
# One dimension of data can be fit
x = np.random.random(100)
y = np.sin(x)
params = np.polynomial.polynomial.polyfit(x, y, 6)
np.polynomial.polynomial.polyval([0, .2, .5, 1.5], params)
array([ -5.01799432e-08, 1.98669317e-01, 4.79425535e-01,
9.97606096e-01])
# When I try two dimensions, it fails.
x = np.random.random((100, 2))
y = np.sin(5 * x[:,0]) + .4 * np.sin(x[:,1])
params = np.polynomial.polynomial.polyvander2d(x, y, [6, 6])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-5409f9a3e632> in <module>()
----> 1 params = np.polynomial.polynomial.polyvander2d(x, y, [6, 6])
/usr/local/lib/python2.7/site-packages/numpy/polynomial/polynomial.pyc in polyvander2d(x, y, deg)
1201 raise ValueError("degrees must be non-negative integers")
1202 degx, degy = ideg
-> 1203 x, y = np.array((x, y), copy=0) + 0.0
1204
1205 vx = polyvander(x, degx)
ValueError: could not broadcast input array from shape (100,2) into shape (100)
I got annoyed that there is no simple function for a 2d polynomial fit of any number of degrees so I made my own. Like the other answers it uses numpy lstsq to find the best coefficients.
import numpy as np
from scipy.linalg import lstsq
from scipy.special import binom
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def _get_coeff_idx(coeff):
idx = np.indices(coeff.shape)
idx = idx.T.swapaxes(0, 1).reshape((-1, 2))
return idx
def _scale(x, y):
# Normalize x and y to avoid huge numbers
# Mean 0, Variation 1
offset_x, offset_y = np.mean(x), np.mean(y)
norm_x, norm_y = np.std(x), np.std(y)
x = (x - offset_x) / norm_x
y = (y - offset_y) / norm_y
return x, y, (norm_x, norm_y), (offset_x, offset_y)
def _unscale(x, y, norm, offset):
x = x * norm[0] + offset[0]
y = y * norm[1] + offset[1]
return x, y
def polyvander2d(x, y, degree):
A = np.polynomial.polynomial.polyvander2d(x, y, degree)
return A
def polyscale2d(coeff, scale_x, scale_y, copy=True):
if copy:
coeff = np.copy(coeff)
idx = _get_coeff_idx(coeff)
for k, (i, j) in enumerate(idx):
coeff[i, j] /= scale_x ** i * scale_y ** j
return coeff
def polyshift2d(coeff, offset_x, offset_y, copy=True):
if copy:
coeff = np.copy(coeff)
idx = _get_coeff_idx(coeff)
# Copy coeff because it changes during the loop
coeff2 = np.copy(coeff)
for k, m in idx:
not_the_same = ~((idx[:, 0] == k) & (idx[:, 1] == m))
above = (idx[:, 0] >= k) & (idx[:, 1] >= m) & not_the_same
for i, j in idx[above]:
b = binom(i, k) * binom(j, m)
sign = (-1) ** ((i - k) + (j - m))
offset = offset_x ** (i - k) * offset_y ** (j - m)
coeff[k, m] += sign * b * coeff2[i, j] * offset
return coeff
def plot2d(x, y, z, coeff):
# regular grid covering the domain of the data
if x.size > 500:
choice = np.random.choice(x.size, size=500, replace=False)
else:
choice = slice(None, None, None)
x, y, z = x[choice], y[choice], z[choice]
X, Y = np.meshgrid(
np.linspace(np.min(x), np.max(x), 20), np.linspace(np.min(y), np.max(y), 20)
)
Z = np.polynomial.polynomial.polyval2d(X, Y, coeff)
fig = plt.figure()
ax = fig.gca(projection="3d")
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2)
ax.scatter(x, y, z, c="r", s=50)
plt.xlabel("X")
plt.ylabel("Y")
ax.set_zlabel("Z")
plt.show()
def polyfit2d(x, y, z, degree=1, max_degree=None, scale=True, plot=False):
"""A simple 2D polynomial fit to data x, y, z
The polynomial can be evaluated with numpy.polynomial.polynomial.polyval2d
Parameters
----------
x : array[n]
x coordinates
y : array[n]
y coordinates
z : array[n]
data values
degree : {int, 2-tuple}, optional
degree of the polynomial fit in x and y direction (default: 1)
max_degree : {int, None}, optional
if given the maximum combined degree of the coefficients is limited to this value
scale : bool, optional
Wether to scale the input arrays x and y to mean 0 and variance 1, to avoid numerical overflows.
Especially useful at higher degrees. (default: True)
plot : bool, optional
wether to plot the fitted surface and data (slow) (default: False)
Returns
-------
coeff : array[degree+1, degree+1]
the polynomial coefficients in numpy 2d format, i.e. coeff[i, j] for x**i * y**j
"""
# Flatten input
x = np.asarray(x).ravel()
y = np.asarray(y).ravel()
z = np.asarray(z).ravel()
# Remove masked values
mask = ~(np.ma.getmask(z) | np.ma.getmask(x) | np.ma.getmask(y))
x, y, z = x[mask].ravel(), y[mask].ravel(), z[mask].ravel()
# Scale coordinates to smaller values to avoid numerical problems at larger degrees
if scale:
x, y, norm, offset = _scale(x, y)
if np.isscalar(degree):
degree = (int(degree), int(degree))
degree = [int(degree[0]), int(degree[1])]
coeff = np.zeros((degree[0] + 1, degree[1] + 1))
idx = _get_coeff_idx(coeff)
# Calculate elements 1, x, y, x*y, x**2, y**2, ...
A = polyvander2d(x, y, degree)
# We only want the combinations with maximum order COMBINED power
if max_degree is not None:
mask = idx[:, 0] + idx[:, 1] <= int(max_degree)
idx = idx[mask]
A = A[:, mask]
# Do the actual least squares fit
C, *_ = lstsq(A, z)
# Reorder coefficients into numpy compatible 2d array
for k, (i, j) in enumerate(idx):
coeff[i, j] = C[k]
# Reverse the scaling
if scale:
coeff = polyscale2d(coeff, *norm, copy=False)
coeff = polyshift2d(coeff, *offset, copy=False)
if plot:
if scale:
x, y = _unscale(x, y, norm, offset)
plot2d(x, y, z, coeff)
return coeff
if __name__ == "__main__":
n = 100
x, y = np.meshgrid(np.arange(n), np.arange(n))
z = x ** 2 + y ** 2
c = polyfit2d(x, y, z, degree=2, plot=True)
print(c)
It doesn't look like polyfit supports fitting multivariate polynomials, but you can do it by hand, with linalg.lstsq. The steps are as follows:
Gather the degrees of monomials x**i * y**j you wish to use in the model. Think carefully about it: your current model already has 9 parameters, if you are going to push to 5 variables then with the current approach you'll end up with 3**5 = 243 parameters, a sure road to overfitting. Maybe limit to the monomials of __total_ degree at most 2 or three...
Plug the x-points into each monomial; this gives a 1D array. Stack all such arrays as columns of a matrix.
Solve a linear system with aforementioned matrix and with the right-hand side being the target values (I call them z because y is confusing when you also use x, y for two variables).
Here it is:
import numpy as np
x = np.random.random((100, 2))
z = np.sin(5 * x[:,0]) + .4 * np.sin(x[:,1])
degrees = [(i, j) for i in range(3) for j in range(3)] # list of monomials x**i * y**j to use
matrix = np.stack([np.prod(x**d, axis=1) for d in degrees], axis=-1) # stack monomials like columns
coeff = np.linalg.lstsq(matrix, z)[0] # lstsq returns some additional info we ignore
print("Coefficients", coeff) # in the same order as the monomials listed in "degrees"
fit = np.dot(matrix, coeff)
print("Fitted values", fit)
print("Original values", y)
I believe you have misunderstood what polyvander2d does and how it should be used. polyvander2d() returns the pseudo-Vandermonde matrix of degrees deg and sample points (x, y).
Here, y is not the value(s) of the polynomial at point(s) x but rather it is the y-coordinate of the point(s) and x is the x-coordinate. Roughly speaking, the returned array is a set of combinations of (x**i) * (y**j) and x and y are essentially 2D "mesh-grids". Therefore, both x and y must have identical shapes.
Your x and y, however, arrays have different shapes:
>>> x.shape
(100, 2)
>>> y.shape
(100,)
I do not believe numpy has a 5D-polyvander of the form polyvander5D(x, y, z, v, w, deg). Notice, all the variables here are coordinates and not the values of the polynomial p=p(x,y,z,v,w). You, however, seem to be using y (in the 2D case) as f.
It appears that numpy does not have 2D or higher equivalents for the polyfit() function. If your intention is to find the coefficients of the best-fitting polynomial in higher-dimensions, I would suggest that you generalize the approach described here: Equivalent of `polyfit` for a 2D polynomial in Python
The option isn't there because nobody wants to do that. Combine the polynomials linearly (f(x,y) = 1 + x + y + x^2 + y^2) and solve the system of equations yourself.