Multiple linear regression for a surface using NumPy - example

This question is close to: fitting a linear surface with numpy least squares, but there's no sample data. I must be terribly slow but it seems I can't get it to work.
I have the following code:
import numpy as np
XYZ = np.array([[0, 1, 0, 1],
                [0, 0, 1, 1],
                [1, 1, 1, 1]])
A = np.row_stack((np.ones(len(XYZ[0])), XYZ[0, :], XYZ[1:]))
coeffs = np.linalg.lstsq(A.T, XYZ[2, :])[0]
print(coeffs)
The output is:
[ 5.00000000e-01 5.55111512e-17 9.71445147e-17 5.00000000e-01]
I want z = a + bx + cy, i.e. three coefficients, but the output gives me four. Where do I go wrong here? I expected coeffs to be something like:
[ 1.0 0.0 0.0]
Any help appreciated.

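Peter Schneider's comment is right: XYZ[1:] selects both the second and the third row, so z itself ends up in the design matrix as a fourth regressor. Feed XYZ[1, :] to row_stack instead: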
>>> A = np.row_stack((np.ones(len(XYZ[0])), XYZ[0, :], XYZ[1, :]))
>>> np.linalg.lstsq(A.T, XYZ[2, :])[0]
array([ 1.00000000e+00, -7.85046229e-17, -7.85046229e-17])
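The same fit can be written with one named array per coordinate, which makes it harder to slice the wrong rows. This is only a sketch: the names x, y and z are introduced here for illustration, np.column_stack replaces row_stack plus a transpose, and rcond=None is passed so that recent NumPy versions use their current default cutoff.

```python
import numpy as np

# Sample points from the question, split into named coordinates
# (x, y, z are illustrative names, not from the original code).
x = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
z = np.array([1.0, 1.0, 1.0, 1.0])

# Design matrix with one row per point: [1, x_i, y_i].
A = np.column_stack((np.ones_like(x), x, y))

# Solve z ~ a + b*x + c*y in the least-squares sense.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, z, rcond=None)
a, b, c = coeffs
print(a, b, c)
```

Because A has exactly three columns, coeffs has exactly three entries, one per term of z = a + bx + cy.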

Related

Calculate condensed distance matrix with varying length data points

Scipy's pdist function expects an evenly shaped numpy array as input.
Working example:
import numpy as np
from scipy.spatial.distance import pdist
from scipy.spatial.distance import squareform
# Example distance function.
def dfun(u, v):
    return u.sum() + v.sum()
dat0 = np.array([-1, 1,-3, 1])
dat1 = np.array([-1, 1,-3, 1])
dat2 = np.array([ 1, 1, 1, 1])
data = np.array([dat0, dat1, dat2])
distance_matrix = pdist(data, dfun)
squareform(distance_matrix)
I have a custom distance function that works with run-length encoded data, so the arrays may vary in size. When using the following input
dat0 = np.array([-1, 1,-4, 1])
dat1 = np.array([-1, 1,-3, 1, 1])
dat2 = np.array([ 1,-6])
the error "ValueError: A 2-dimensional array must be passed." is raised, even though the distance function itself would handle the input just fine. Is there an alternative way to calculate these values?
Edit: the distance function in the snippet above is just an example of a metric that does not care about the number of elements in each data point. In my case I use https://github.com/mclmza/AWarp, which computes the DTW for sparse data sets (example series: [1,-456,1,1,-23,1]), so padding the data is not a valid option.

If I understand correctly, you want to compute the distances using awarp, but that distance function takes signals of varying length. So you need to avoid creating an array, because NumPy doesn't allow 'ragged' arrays. Then I think you can do this:
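import numpy as np
from itertools import combinations
from scipy.spatial.distance import squareform
# Example distance function.
def dfun(u, v):
    return u.sum() + v.sum()
dat0 = np.array([-1, 1,-4, 1])
dat1 = np.array([-1, 1,-3, 1, 1])
dat2 = np.array([ 1,-6])
# A plain list keeps the arrays separate, so their lengths may differ.
data = [dat0, dat1, dat2]
# combinations() yields the pairs in the same order as the condensed form pdist returns.
dists = [dfun(a, b) for a, b in combinations(data, r=2)]
squareform(dists)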
For your example, this yields:
array([[ 0, -4, -8],
       [-4,  0, -6],
       [-8, -6,  0]])
And if dfun = awarp then you get this output for those signals:
array([[ 0.        ,  0.        ,  2.23606798],
       [ 0.        ,  0.        ,  2.44948974],
       [ 2.23606798,  2.44948974,  0.        ]])
This approach only works if dfun is symmetric, i.e. dfun(a, b) == dfun(b, a), because squareform mirrors each value across the diagonal. I think awarp is symmetric.
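If the metric were not symmetric, the full square matrix could be filled explicitly instead of going through the condensed form. The following is a rough sketch under that assumption; pairwise_matrix and the asymmetric example metric are hypothetical names made up for illustration and are not part of SciPy or AWarp.

```python
import numpy as np

def pairwise_matrix(data, metric):
    """Evaluate metric(a, b) for every ordered pair; the diagonal stays 0."""
    n = len(data)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i, j] = metric(data[i], data[j])
    return out

# Hypothetical asymmetric metric, used only to show that order matters.
def asymmetric(u, v):
    return u.sum() - v.sum()

# Ragged input: a list of arrays of different lengths.
data = [np.array([-1, 1, -4, 1]),
        np.array([-1, 1, -3, 1, 1]),
        np.array([1, -6])]

D = pairwise_matrix(data, asymmetric)
print(D)
```

This calls the metric twice as often as the combinations-based version, so it is only worth it when dfun(a, b) and dfun(b, a) can really differ.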

Normalizing vectors contained in an array

I've got an array, called X, where every element is a 2d-vector itself. The diagonal of this array is filled with nothing but zero-vectors.
Now I need to normalize every vector in this array, without changing the structure of it.
First I tried to calculate the norm of every vector and put it in an array, called N. After that I wanted to divide every element of X by every element of N.
Two problems occurred to me:
1) Many entries of N are zero, which is obviously a problem when I try to divide by them.
2) The shapes of the arrays don't match, so np.divide() doesn't work as expected.
Beyond that, I don't think it's a good idea to calculate N like this, because later on I want to be able to do the same with more than two vectors.
import numpy as np
# Example array
X = np.array([[[0, 0], [1, -1]], [[-1, 1], [0, 0]]])
# Array containing the norms
N = np.vstack((np.linalg.norm(X[0], axis=1), np.linalg.norm(X[1], axis=1)))
R = np.divide(X, N)
I want the output to look like this:
R = np.array([[[0, 0], [0.70710678, -0.70710678]], [[-0.70710678, 0.70710678], [0, 0]]])

You do not need to use sklearn. Just define a function and then use list comprehension:
Assuming that the 0th dimension of the X is equal to the number of 2D arrays that you have, use this:
import numpy as np
# Example array
X = np.array([[[0, 0], [1, -1]], [[-1, 1], [0, 0]]])
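def stdmtx(X):
    # Standardize each row: subtract its mean, then divide by its sample standard deviation.
    X = X - X.mean(axis=1)[:, np.newaxis]
    X = X / X.std(axis=1, ddof=1)[:, np.newaxis]
    # The zero vectors produce 0/0 = NaN; map those back to 0.
    return np.nan_to_num(X)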
R = np.array([stdmtx(X[i,:,:]) for i in range(X.shape[0])])
The desired output R:
array([[[ 0.        ,  0.        ],
        [ 0.70710678, -0.70710678]],
       [[-0.70710678,  0.70710678],
        [ 0.        ,  0.        ]]])
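Note that stdmtx standardizes each vector (mean and standard deviation) rather than dividing by its Euclidean norm. The two coincide here only because the example vectors have zero mean and two components. A norm-based variant, which is what the question describes, might look like the sketch below. The helper name normalize_vectors is made up for illustration. It relies on keepdims=True so the norms broadcast against X, and on the where= argument of np.divide to leave zero vectors untouched.

```python
import numpy as np

def normalize_vectors(X):
    """Divide every vector along the last axis by its Euclidean norm.

    Zero vectors are returned unchanged instead of producing NaN.
    """
    X = np.asarray(X, dtype=float)
    # keepdims=True gives shape (..., 1), which broadcasts against X.
    norms = np.linalg.norm(X, axis=-1, keepdims=True)
    # Only divide where the norm is non-zero; elsewhere keep the zeros from `out`.
    return np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)

X = np.array([[[0, 0], [1, -1]], [[-1, 1], [0, 0]]])
R = normalize_vectors(X)
print(R)
```

Because the norm is taken along the last axis, the same function works unchanged for larger grids of vectors or for vectors with more than two components.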

Creating 2d histogram from 2d numpy array

I have a numpy array like this:
[[[0,0,0], [1,0,0], ..., [1919,0,0]],
[[0,1,0], [1,1,0], ..., [1919,1,0]],
...,
[[0,1019,0], [1,1019,0], ..., [1919,1019,0]]]
To create it I use this function (thanks to @Divakar and @unutbu for helping in another question):
def indices_zero_grid(m,n):
    I,J = np.ogrid[:m,:n]
    out = np.zeros((m,n,3), dtype=int)
    out[...,0] = I
    out[...,1] = J
    return out
I can access this array by command:
>>> out = indices_zero_grid(3,2)
>>> out
array([[[0, 0, 0],
        [0, 1, 0]],
       [[1, 0, 0],
        [1, 1, 0]],
       [[2, 0, 0],
        [2, 1, 0]]])
>>> out[1,1]
array([1, 1, 0])
Now I want to plot a 2D histogram where (x, y) (the index into out[x, y]) gives the coordinates and the third value is the number of occurrences. I've tried a normal matplotlib plot, but I have so many values for each coordinate (I need 1920x1080) that the program needs too much memory.

If I understand correctly, you want an image of size 1920x1080 which colors the pixel at coordinate (x, y) according to the value of out[x, y].
In that case, you could use
import numpy as np
import matplotlib.pyplot as plt
def indices_zero_grid(m,n):
    I,J = np.ogrid[:m,:n]
    out = np.zeros((m,n,3), dtype=int)
    out[...,0] = I
    out[...,1] = J
    return out
# Note: imshow treats the first axis as rows, so this array is drawn 1920 rows by 1080 columns.
h, w = 1920, 1080
out = indices_zero_grid(h, w)
# Random stand-in for the occurrence counts.
out[..., 2] = np.random.randint(256, size=(h, w))
plt.imshow(out[..., 2])
plt.show()
which displays out[..., 2] as an image [figure omitted].
Notice that the other two "columns", out[..., 0] and out[..., 1] are not used. This suggests that indices_zero_grid is not really needed here.
plt.imshow can accept an array of shape (1920, 1080). This array has a scalar value at each location in the array. The structure of the array tells imshow where to color each cell. Unlike a scatter plot, you don't need to generate the coordinates yourself.
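If the occurrences arrive as a stream of (x, y) hits rather than as a ready-made array, the count array can be accumulated directly and then handed to imshow. This is a sketch under that assumption: xs, ys, the number of hits and the random data are all made up for illustration, and the array is laid out as (rows, columns) = (height, width) so that it displays 1920 pixels wide.

```python
import numpy as np

height, width = 1080, 1920

# Hypothetical hit coordinates; real data would replace these.
rng = np.random.default_rng(0)
xs = rng.integers(0, width, size=10_000)
ys = rng.integers(0, height, size=10_000)

# One integer counter per pixel, indexed as counts[row, column] = counts[y, x].
counts = np.zeros((height, width), dtype=np.int64)

# np.add.at accumulates correctly even when the same (y, x) pair repeats,
# which plain fancy-index assignment (counts[ys, xs] += 1) would not.
np.add.at(counts, (ys, xs), 1)

print(counts.shape, counts.sum())
# plt.imshow(counts) would then draw the histogram as an image.
```

The memory cost is one integer per pixel regardless of how many hits there are, which avoids the blow-up from plotting every point individually.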

Generating random numbers around a set of coordinates without for loop

I have a set of coordinate means (3D) and a set of standard deviations (3D) accompanying them like this:
means = [[x1, y1, z1],
[x2, y2, z2],
...
[xn, yn, zn]]
stds = [[sx1, sy1, sz1],
[sx2, sy2, sz2],
...
[sxn, syn, szn]]
so the problem is N x 3
I am looking to generate 1000 coordinate sample sets (N x 3 x 1000) randomly using np.random.normal(). Currently I generate the samples using a for loop:
for i in range(0,1000):
    samples = np.random.normal(means, stds)
But I have the feeling I can lose the for loop and let numpy do it faster and in one call, anybody know how I should code that?

Alternatively, use the size argument. Its trailing dimensions must match the shape of means and std, so the sample axis comes first:
import numpy as np
means = [ [0, 0, 0], [1, 1, 1] ]
std = [ [1, 1, 1], [1, 1, 1] ]
#100 samples
print(np.random.normal(means, std, size = (100, len(means), 3)))

You can repeat your means and stds arrays 1000 times, and then call np.random.normal() once.
import numpy
means = [[0, 0, 0],
         [1, 1, 1]]
stds = [[1, 1, 1],
        [2, 2, 2]]
# Multiplying by ones of shape (1000, 1, 1) broadcasts each array to shape (1000, 2, 3).
means = numpy.array(means) * numpy.ones(1000)[:, None, None]
stds = numpy.array(stds) * numpy.ones(1000)[:, None, None]
samples = numpy.random.normal(means, stds)
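Both answers above put the sample axis first. To get the N x 3 x 1000 layout the question asks for, one option is to append a trailing axis to the parameters and let broadcasting do the rest. The sketch below assumes NumPy's newer Generator interface (np.random.default_rng); the seed and the name n_samples are arbitrary choices for illustration.

```python
import numpy as np

means = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]])
stds = np.array([[1.0, 1.0, 1.0],
                 [2.0, 2.0, 2.0]])
n_samples = 1000

rng = np.random.default_rng(seed=0)

# means[..., np.newaxis] has shape (N, 3, 1) and broadcasts against the
# requested output shape (N, 3, n_samples), so no explicit tiling is needed.
samples = rng.normal(means[..., np.newaxis],
                     stds[..., np.newaxis],
                     size=means.shape + (n_samples,))

print(samples.shape)
```

samples[i, j, :] then holds the 1000 draws for coordinate j of point i.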

Why SymPy didn't show me the inverse matrix result in the book?

According to the book I'm reading, the inverse matrix of
[[1, 1, 1], [1, a^2, a], [1, a, a^2]]
is
(1/3) * [[1, 1, 1], [1, a, a^2], [1, a^2, a]],
where a = e^(j*2π/3). Like the imaginary unit j it has magnitude 1, but its phase is 120° rather than 90°.
So I tried this in SymPy:
from sympy import *
a = symbols('a')
T = Matrix([
[1, 1, 1],
[1, a**2, a],
[1, a, a**2]
])
simplify(T.inv())
The result in IPython [output image omitted] doesn't look like the inverse matrix in the book at all.
Why did I get this?
And how can I get the result in the book using SymPy?

After your edit, it is clear that a is not a parameter, but rather it has a precise value, that is, -0.5 + i*sqrt(3)/2. If you don't tell SymPy what that value is, it will treat it as a parameter, and the inverted matrix looks like that. But if you give a the right value, then everything works:
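from sympy import *
# Exact rationals keep the arithmetic symbolic (no floating-point round-off).
a = -Rational(1, 2) + I*sqrt(3)/2
T = Matrix([
    [1, 1, 1],
    [1, a**2, a],
    [1, a, a**2]
])
invT = Matrix([
    [1, 1, 1],
    [1, a, a**2],
    [1, a**2, a]
])
simplify(Rational(1, 3)*(T*invT))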
and this gives the identity matrix as expected.
This was my original answer:
You can't get the result given by your book, because it's wrong.
Emathelp.net confirms that the result found by SymPy is correct, and symbolab.com shows that the result provided by your book is wrong, because if you multiply A * A^-1 you don't get the identity matrix.
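A quick numerical cross-check, independent of SymPy, supports the updated answer. The sketch below uses NumPy's complex arithmetic with a = e^(j*2π/3) and compares the directly computed inverse against the book's formula; the variable name book_inverse is introduced here for illustration.

```python
import numpy as np

# a = e^(j*2*pi/3): magnitude 1, phase 120 degrees.
a = np.exp(2j * np.pi / 3)

T = np.array([[1, 1, 1],
              [1, a**2, a],
              [1, a, a**2]])

# Inverse as stated in the book: (1/3) * [[1, 1, 1], [1, a, a^2], [1, a^2, a]].
book_inverse = np.array([[1, 1, 1],
                         [1, a, a**2],
                         [1, a**2, a]]) / 3

# Both comparisons hold up to floating-point round-off.
print(np.allclose(np.linalg.inv(T), book_inverse))
print(np.allclose(T @ book_inverse, np.eye(3)))
```

The check works because a^3 = 1 and 1 + a + a^2 = 0 for this particular a, which is exactly the information SymPy lacks when a is left as a free symbol.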
