Given these two arrays:
E = [[16.461, 17.015, 14.676],
[15.775, 18.188, 14.459],
[14.489, 18.449, 14.756],
[14.171, 19.699, 14.406],
[14.933, 20.644, 13.839],
[16.233, 20.352, 13.555],
[16.984, 21.297, 12.994],
[16.683, 19.056, 13.875],
[17.918, 18.439, 13.718],
[17.734, 17.239, 14.207]]
S = [[0.213, 0.660, 1.287],
[0.250, 2.016, 1.509],
[0.016, 2.995, 0.619],
[0.142, 4.189, 1.194],
[0.451, 4.493, 2.459],
[0.681, 3.485, 3.329],
[0.990, 3.787, 4.592],
[0.579, 2.170, 2.844],
[0.747, 0.934, 3.454],
[0.520, 0.074, 2.491]]
The problem states that I should get the 3x3 covariance matrix (C) between S and E using the following formula:
C = (1/(n-1))[S'E - (1/10)S'i i'E]
Here n is 10, and i is an n x 1 column vector consisting of only ones. S' and i' are the transpose of matrix S and column vector i, respectively.
So far, I can't get C because I don't understand the meaning of i (and i') and its implementation in the formula. Using numpy, so far I do:
import numpy as np
tS = np.array(S).T
C = (1.0/9.0)*(np.dot(tS, E)-((1.0/10.0)*np.dot(tS, E))) #Here is where I lack the i and i' implementation.
I will really appreciate your help to understand and implement i and i' in the formula. The output should be:
C= [[0.2782, 0.2139, -0.1601],
[-1.4028, 1.9619, -0.2744],
[1.0443, 0.9712, -0.6610]]
It looks like the only part you're missing is making i:
>>> N = 10
>>> S, E = np.array(S), np.array(E)
>>> i = np.ones((N, 1))
>>> i
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
After that, we get
>>> C = (1.0/(N-1)) * (S.T.dot(E) - (1.0/N) * S.T.dot(i) * i.T.dot(E))
>>> C
array([[ 0.27842301, 0.21388842, -0.16011839],
[-1.4017267 , 1.96193373, -0.27441417],
[ 1.04532836, 0.97120807, -0.66095656]])
Note that this doesn't quite produce the array you expected, which is more obvious if you round it, but maybe there are some minor typos in your data?
>>> C.round(4)
array([[ 0.2784, 0.2139, -0.1601],
[-1.4017, 1.9619, -0.2744],
[ 1.0453, 0.9712, -0.661 ]])
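One way to read the formula: the term (1/n) S'i i'E subtracts the column means, so the whole expression is the sample cross-covariance between the columns of S and the columns of E. The sketch below is my own restatement using the arrays from the question; the function names `cross_cov_formula` and `cross_cov_centered` are only illustrative. It checks that the literal formula and the mean-centered form agree:

```python
import numpy as np

E = np.array([[16.461, 17.015, 14.676], [15.775, 18.188, 14.459],
              [14.489, 18.449, 14.756], [14.171, 19.699, 14.406],
              [14.933, 20.644, 13.839], [16.233, 20.352, 13.555],
              [16.984, 21.297, 12.994], [16.683, 19.056, 13.875],
              [17.918, 18.439, 13.718], [17.734, 17.239, 14.207]])
S = np.array([[0.213, 0.660, 1.287], [0.250, 2.016, 1.509],
              [0.016, 2.995, 0.619], [0.142, 4.189, 1.194],
              [0.451, 4.493, 2.459], [0.681, 3.485, 3.329],
              [0.990, 3.787, 4.592], [0.579, 2.170, 2.844],
              [0.747, 0.934, 3.454], [0.520, 0.074, 2.491]])

def cross_cov_formula(S, E):
    """Literal transcription of C = 1/(n-1) * [S'E - (1/n) S'i i'E]."""
    n = S.shape[0]
    i = np.ones((n, 1))  # n x 1 column vector of ones
    return (S.T @ E - (S.T @ i) @ (i.T @ E) / n) / (n - 1)

def cross_cov_centered(S, E):
    """Same quantity, written as a product of mean-centered columns."""
    n = S.shape[0]
    Sc = S - S.mean(axis=0)  # subtract each column's mean
    Ec = E - E.mean(axis=0)
    return Sc.T @ Ec / (n - 1)

C1 = cross_cov_formula(S, E)
C2 = cross_cov_centered(S, E)
print(np.allclose(C1, C2))
```

Here S'i is the vector of column sums of S and i'E is the row of column sums of E, which is why multiplying by the ones vector amounts to removing the means.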
I think this is what you want:
S = np.array(S)
E = np.array(E)
ones = np.ones((10,1))
C = (1.0/9)*(np.dot(S.T, E)-((1.0/10)* (np.dot(np.dot(np.dot(S.T,ones),ones.T),E))))
My output is :
array([[ 0.27842301, 0.21388842, -0.16011839],
[-1.4017267 , 1.96193373, -0.27441417],
[ 1.04532836, 0.97120807, -0.66095656]])
I am trying to write a function that estimates the data noise (σ²) from three NumPy arrays: an augmented X matrix and two vectors, the y targets and the MAP weights.
This function should return the empirical data noise estimate, σ².
I have the following function:
def estimDS(X, output_y, W):
    n = X.shape[0]  # number of observations (rows)
    d = X.shape[1]  # number of features (columns)
    matmul = np.matmul(X, W)  # predictions, using the function's own arguments
    mult_left = 1 / (n - d)
    mult_right = (output_y - matmul)**2
    estimDS = mult_left * mult_right
    return estimDS
And this is an example on which I run function:
output_y = np.array([208500, 181500, 223500,
140000, 250000, 143000,
307000, 200000, 129900,
118000])
aug_x = np.array([[ 1., 1710., 2003.],
[ 1., 1262., 1976.],
[ 1., 1786., 2001.],
[ 1., 1717., 1915.],
[ 1., 2198., 2000.],
[ 1., 1362., 1993.],
[ 1., 1694., 2004.],
[ 1., 2090., 1973.],
[ 1., 1774., 1931.],
[ 1., 1077., 1939.]])
W = np.array([-2.29223802e+06, 5.92536529e+01, 1.20780450e+03])
sig2 = estimDS(aug_x, output_y, W)
print(sig2)
The function returns the array below, but I need the result as a single float, 3700666577282.7227:
[5.61083809e+07 2.17473754e+07 6.81288433e+06 4.40198178e+07
1.86225354e+06 3.95549405e+08 8.78575426e+08 3.04530677e+07
3.32164594e+07 2.87861673e+06]
You forgot to sum over i=1 to n. Therefore mult_right should be defined as:
mult_right=np.sum((output_y-matmul)**2, axis=0)
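Putting the fix into a complete function might look like the sketch below. It assumes the estimator is the sum of squared residuals divided by (n - d), which is what the question's code implies; the name `estimate_noise` and the tiny test data are made up for illustration.

```python
import numpy as np

def estimate_noise(X, y, w):
    """Empirical noise estimate: sum of squared residuals / (n - d).

    Assumes X is an (n, d) augmented design matrix, y has length n,
    and w has length d, with n > d.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n, d = X.shape
    residuals = y - X @ w
    # The dot product of the residual vector with itself is the sum over i = 1..n.
    return float(residuals @ residuals / (n - d))

# Hypothetical toy data: residuals are [0, 0, 1], and n - d = 1.
X_toy = np.array([[1., 0.], [1., 1.], [1., 2.]])
y_toy = np.array([0., 1., 3.])
w_toy = np.array([0., 1.])
sigma2 = estimate_noise(X_toy, y_toy, w_toy)
print(sigma2)
```

Wrapping the result in `float(...)` returns a plain Python float rather than a NumPy scalar, which matches the "I need a float" requirement.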
This is one of the first things I have tried to code in Python (or any programming language), and it is my first question here, so I hope I have provided everything necessary.
I have an upper triangular matrix and I need to solve the system of equations Wx=y, where W (a 3x3 matrix) and y (a vector) are given. I cannot use numpy.linalg functions, so I am trying to implement this myself, working backwards from the last row of course.
After several failed attempts, I limited my task to a 3x3 matrix. Without a loop, the code looks like this:
x[0,2]=y[2]/W[2,2]
x[0,1]=(y[1]-W[1,2]*x[0,2])/W[1,1]
x[0,0]=(y[0]-W[0,2]*x[0,2]-W[0,1]*x[0,1])/W[0,0]
Now, every new sum contains more elements. They follow a pattern, but they still need to be defined somehow. I suppose there must be a sum function in numpy (outside linalg) that does such things, but I cannot find it.
My newest, partial "attempt" begins with something like this:
n=3
for k in range(n):
    for i in range(n-k-1):
        x[0,n-k-1]=y[n-k-1]/W[n-k-1,n-k-1]
Which, of course, contains only the first element of each sum.
I would be thankful for any assistance.
Example I am working on:
y=np.array([ 0.80064077, 2.64300842, -0.74912957])
W=np.array([[6.244998,2.88230677,-5.44435723],[0.,2.94827198,2.26990852],[0.,0.,0.45441135]])
n=W.shape[1]
x=np.zeros((1,n), dtype=float)
Proper solution should look like:
[-2.30857143 2.16571429 -1.64857143]
Here's one approach that handles a generic n with a single loop -
def one_loop(y, W, n):
    out = np.zeros((1,n))
    for i in range(n-1,-1,-1):
        sums = (W[i,i+1:]*out[0,i+1:]).sum()
        out[0,i] = (y[i] - sums)/W[i,i]
    return out
For performance, we can replace that sum-reduction step with a dot-product. Thus, sums could be alternatively computed like so -
sums = W[i,i+1:].dot(out[0,i+1:])
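As a self-contained sketch of the dot-product variant, the version below returns a 1-D vector instead of a (1, n) array and checks the result by multiplying back. The name `back_substitute` is illustrative, and the data is the example from the question.

```python
import numpy as np

def back_substitute(W, y):
    """Solve W x = y for an upper triangular W, from the last row upward."""
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # x[i+1:] is already known; for i = n-1 both slices are empty
        # and the dot product is 0.
        x[i] = (y[i] - W[i, i + 1:].dot(x[i + 1:])) / W[i, i]
    return x

y = np.array([0.80064077, 2.64300842, -0.74912957])
W = np.array([[6.244998, 2.88230677, -5.44435723],
              [0., 2.94827198, 2.26990852],
              [0., 0., 0.45441135]])
x = back_substitute(W, y)
print(x)
print(np.allclose(W @ x, y))  # residual check, no numpy.linalg needed
```

Note that the loop only ever reads the diagonal and the entries above it, so anything below the diagonal of W is ignored.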
Sample runs
1) n = 3 :
In [149]: y
Out[149]: array([ 5., 8., 7.])
In [150]: W
Out[150]:
array([[ 6., 6., 2.],
[ 3., 3., 3.],
[ 4., 8., 5.]])
In [151]: x = np.zeros((1,3))
...: x[0,2]=y[2]/W[2,2]
...: x[0,1]=(y[1]-W[1,2]*x[0,2])/W[1,1]
...: x[0,0]=(y[0]-W[0,2]*x[0,2]-W[0,1]*x[0,1])/W[0,0]
...:
In [152]: x
Out[152]: array([[-0.9 , 1.26666667, 1.4 ]])
In [154]: one_loop(y, W, n=3)
Out[154]: array([[-0.9 , 1.26666667, 1.4 ]])
2) n = 4 :
In [156]: y
Out[156]: array([ 5., 8., 7., 6.])
In [157]: W
Out[157]:
array([[ 6., 2., 3., 3.],
[ 3., 4., 8., 5.],
[ 8., 6., 6., 4.],
[ 8., 4., 2., 2.]])
In [158]: x = np.zeros((1,4))
...: x[0,3]=y[3]/W[3,3]
...: x[0,2]=(y[2]-W[2,3]*x[0,3])/W[2,2]
...: x[0,1]=(y[1]-W[1,3]*x[0,3]-W[1,2]*x[0,2])/W[1,1]
...: x[0,0]=(y[0]-W[0,3]*x[0,3]-W[0,2]*x[0,2]-W[0,1]*x[0,1])/W[0,0]
...:
In [159]: x
Out[159]: array([[-0.22222222, -0.08333333, -0.83333333, 3. ]])
In [160]: one_loop(y, W, n=4)
Out[160]: array([[-0.22222222, -0.08333333, -0.83333333, 3. ]])
One more take (now updated to the state-of-the-art provided by Divakar in another answer):
import numpy as np
y=np.array([ 0.80064077, 2.64300842, -0.74912957])
W=np.array([[6.244998,2.88230677,-5.44435723],[0.,2.94827198,2.26990852],[0.,0.,0.45441135]])
n=W.shape[1]
x=np.zeros((1,n), dtype=float)
for i in range(n-1, -1, -1):
    x[0,i] = (y[i]-W[i,i+1:].dot(x[0,i+1:]))/W[i,i]
print(x)
gives:
[[-2.30857143 2.16571429 -1.64857143]]
My take
n=3
for k in range(n):
    print("s=y[%d]"% (n-k-1))
    s = y[n-k-1]
    for i in range(0,k):
        print("s - W[%d,%d]*x[0,%d]" % (n-k-1, n-i-1, n-i-1))
        s = s - W[n-k-1,n-i-1]*x[0,n-i-1]
    print("x[0,%d] = s/W[%d,%d]" % (n-k-1,n-k-1,n-k-1))
    x[0,n-k-1] = s/W[n-k-1,n-k-1]
print(x)
and without print statements
n=3
for k in range(n):
    s = y[n-k-1]
    for i in range(0,k):
        s = s - W[n-k-1,n-i-1]*x[0,n-i-1]
    x[0,n-k-1] = s/W[n-k-1,n-k-1]
print(x)
Output
s=y[2]
x[0,2] = s/W[2,2]
s=y[1]
s - W[1,2]*x[0,2]
x[0,1] = s/W[1,1]
s=y[0]
s - W[0,2]*x[0,2]
s - W[0,1]*x[0,1]
x[0,0] = s/W[0,0]
[[-2.30857143 2.16571429 -1.64857143]]
I need to obtain a matrix "W" from multiple matrix multiplications (every multiplication results in a column vector).
from numpy import matrix
from numpy import transpose
from numpy import matmul
from numpy import dot
# Iterative matrix multiplication
def iterativeMultiplication(X, Y):
    W = [] # Matrix of matricial products
    X = matrix(X) # same number of rows
    Y = matrix(Y) # same number of rows
    h = 0
    while (h < X.shape[1]):
        W.append([])
        W[h] = dot(transpose(X), Y) # using "dot" function
        h += 1
    return W
But, unexpectedly, I obtain a list of objects with their respective data types.
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[-0.2], [1.1], [5.9], [12.3]] # Edit Y column
iterativeMultiplication( X, Y )
Results in:
[array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]])]
I need a way to obtain only the numerical values so that the list can be converted to a matrix.
W = matrix(W) # Results in error
The same happens with the "matmul" function. Thanks for your time.
If you want to stack multiple matrices, you can use numpy.vstack:
W = numpy.vstack(W)
Edit: There seems to be a discrepancy between your function, X and Y versus the "result" list in your question. But based on your comments below, what you're actually looking for is numpy.hstack (horizontal stack) which will give you the desired 3x3 matrix based on your "result" list.
W = numpy.hstack(W)
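To make the difference concrete, here is a small sketch with three identical (3, 1) columns, mirroring the "result" list in the question (the variable names are only for illustration):

```python
import numpy as np

column = np.array([[37.5], [73.3], [60.8]])
W = [column, column.copy(), column.copy()]  # list of three (3, 1) arrays

stacked_v = np.vstack(W)  # columns placed on top of each other
stacked_h = np.hstack(W)  # columns placed side by side

print(stacked_v.shape)  # (9, 1)
print(stacked_h.shape)  # (3, 3)
```

Both functions return a plain ndarray, so the result holds only the numerical values.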
Of course you are going to get a list. You initialize W as a list, and append the same calculation to it 3 times.
But your 3 element arrays don't make sense with this data, array([[ 3.36877336],[ 3.97112615],[ 3.8092797 ]]).
If I make Xm=np.matrix(X), etc:
In [162]: Xm
Out[162]:
matrix([[ 0., 0., 1.],
[ 1., 0., 0.],
[ 2., 2., 2.],
[ 2., 5., 4.]])
In [163]: Ym
Out[163]:
matrix([[ 0.1, -0.2],
[ 0.9, 1.1],
[ 6.2, 5.9],
[ 11.9, 12.3]])
In [164]: Xm.T.dot(Ym)
Out[164]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
In [165]: Xm.T*Ym # matrix interprets * as .dot
Out[165]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
You need to edit the question, to have both valid Python code (missing def and :), and results that match the inputs.
===============
In [173]: Y = [[-0.2], [1.1], [5.9], [12.3]]
In [174]: Ym=np.matrix(Y)
Out[176]:
matrix([[ 37.5],
[ 73.3],
[ 60.8]])
=====================
This iteration is clumsy:
h = 0
while (h < X.shape[1]):
    W.append([])
    W[h] = dot(transpose(X), Y) # using "dot" function
    h += 1
A more Pythonic approach
for h in range(X.shape[1]):
    W.append(np.dot(...))
Or even
W = [np.dot(....) for h in range(X.shape[1])]
I have a list which consists of several numpy arrays with different shapes.
I want to reshape this list of arrays into a numpy vector and then change each element in the vector and then reshape it back to the original list of arrays.
For example:
input
[numpy.zeros((2,2)), numpy.ones((3,3))]
First
To vector
[0,0,0,0,1,1,1,1,1,1,1,1,1]
Second
Change only one element at a time. For example, change the element at index 1 from 0 to 2:
[0,2,0,0,1,1,1,1,1,1,1,1,1]
Last
convert it back to
[array([[0,2],[0,0]]),array([[1,1,1],[1,1,1],[1,1,1]])]
Is there any fast implementation? Thanks very much.
It seems like converting to a list and back will be inefficient. Instead, why not figure out which array to index (and where) and then just update that index? e.g.
def change_element(arr1, arr2, ix, value):
    which = ix >= arr1.size
    arr = [arr1, arr2][which]
    ix = ix - arr1.size if which else ix
    arr.ravel()[ix] = value
And here's some example usage:
>>> arr1 = np.zeros((2, 2))
>>> arr2 = np.ones((3, 3))
>>> change_element(arr1, arr2, 1, 2)
>>> change_element(arr1, arr2, 6, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ]])
>>> change_element(arr1, arr2, 7, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 3.14, 1. , 1. ],
[ 1. , 1. , 1. ]])
A few notes -- This updates the arrays in place. It doesn't create new arrays. If you really need to create new arrays, I suppose you could np.copy them and return. Also, this relies on the arrays sharing memory before and after the ravel. I don't remember the exact circumstances where ravel would return a new array rather than a view into the original array . . .
Generalizing to more arrays is actually quite easy. We just need to walk down the list of arrays and see if ix is less than the array size. If it is, we've found our array. If it isn't, we need to subtract the array's size from ix to represent the number of elements we've traversed thus far:
def change_element(arrays, ix, value):
    for arr in arrays:
        if ix < arr.size:
            arr.ravel()[ix] = value
            return
        ix -= arr.size
And you can call this similar to before:
change_element([arr1, arr2], 6, 3.14159)
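On the view-versus-copy concern: as far as I know, ravel returns a view only when the array's memory layout allows it (a C-contiguous array, for example) and a copy otherwise, in which case the assignment would be silently lost. A variant that sidesteps this is to assign through the `flat` attribute, which always writes into the array itself. The sketch below is an assumption-labeled alternative, not the original answer's code; `change_element_flat` is a made-up name.

```python
import numpy as np

def change_element_flat(arrays, ix, value):
    """Set element `ix` of the virtual concatenation of `arrays`, in place."""
    for arr in arrays:
        if ix < arr.size:
            arr.flat[ix] = value  # writes into arr even if it is not contiguous
            return
        ix -= arr.size  # skip past this array's elements
    raise IndexError("index is past the end of the last array")

a = np.zeros((2, 2))
b = np.ones((3, 4))[:, ::2]  # strided (3, 2) view, where ravel() makes a copy
change_element_flat([a, b], 1, 2.0)   # lands in a[0, 1]
change_element_flat([a, b], 5, 3.14)  # 5 - a.size = 1, lands in b[0, 1]
print(a)
print(b)
```

Raising IndexError for an out-of-range index is a design choice of this sketch; the original function simply returns without changing anything.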
@mgilson probably has the best answer for you, but if you absolutely have to convert to a flat list first and then go back again (perhaps because you need to do something else with the flat list as well), then you can do this with list comprehensions:
lst = [np.zeros((2,2)), np.ones((3,3))]
tlist = [e for a in lst for e in a.ravel()]
tlist[1] = 2
i = 0
lst2 = []
dims = [a.shape for a in lst]
for n, m in dims:
    lst2.append(np.array(tlist[i:i+n*m]).reshape(n,m))
    i += n*m
lst2
[array([[ 0., 2.],
[ 0., 0.]]), array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])]
Of course, you lose the information about your array sizes when you flatten, so you need to store them somewhere (here, in dims).
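The same round trip can be sketched without Python-level element loops, using np.concatenate to flatten and np.split to cut the vector back up. The helper names `flatten_arrays` and `unflatten_arrays` are hypothetical, and this version works for arrays of any number of dimensions:

```python
import numpy as np

def flatten_arrays(arrays):
    """Return one 1-D vector plus the shapes needed to undo the flattening."""
    shapes = [a.shape for a in arrays]
    flat = np.concatenate([a.ravel() for a in arrays])
    return flat, shapes

def unflatten_arrays(flat, shapes):
    """Cut `flat` back into arrays with the stored shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    offsets = np.cumsum(sizes)[:-1]  # split points between consecutive arrays
    chunks = np.split(flat, offsets)
    return [chunk.reshape(s) for chunk, s in zip(chunks, shapes)]

arrays = [np.zeros((2, 2)), np.ones((3, 3))]
flat, shapes = flatten_arrays(arrays)
flat[1] = 2
rebuilt = unflatten_arrays(flat, shapes)
print(rebuilt)
```

The rebuilt arrays are views into `flat`, so later edits to the vector show up in them without another conversion.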
I have data that I need to center and scale so that it is centered around the origin. The data then needs to be rotated so that the direction of maximum variance lies along the x-axis. The mean and covariance of the data are then calculated. I need the first element of the covariance matrix to be 1. I think this is done by adjusting the scaling factor, but I can't figure out what the scaling factor should be.
To center the data I subtract the mean, and to rotate I use SVD, but the scaling is still my problem.
signature = numpy.loadtxt(name, comments = '%', usecols = (0,cols-1))
signature = numpy.transpose(signature)
#SVD to get D so that data can be scaled by 1/(highest singular value in D)
U, D, Vt = numpy.linalg.svd( signature , full_matrices=0)
cs = utils.centerscale(signature, scale=False)
signature = cs[0]
#plt.scatter(cs[0][0],cs[0][1],color='r')
#SVD so that data can be rotated so that direction of most variance is on x-axis
U, D, Vt = numpy.linalg.svd( signature , full_matrices=0)
cs = utils.centerscale(signature, center=False, scalefactor=D[0])
U, D, Vt = numpy.linalg.svd( cs[0] , full_matrices=0)
D = numpy.diag(D)
norm = numpy.dot(D,Vt)
The following are examples of results of the mean and cov of norm (the test cases use res).
**********************************************************************
Failed example:
print numpy.mean(res, axis=1)
Expected:
[ 7.52074907e-18 -6.59917722e-18]
Got:
[ -1.22008884e-17 2.41126563e-17]
**********************************************************************
Failed example:
print numpy.cov(res, bias=1)
Expected:
[[ 1.00000000e+00 9.02112676e-18]
[ 9.02112676e-18 1.40592827e-01]]
Got:
[[ 4.16666667e-03 -1.57698124e-19]
[ -1.57698124e-19 5.85803446e-04]]
**********************************************************************
1 items had failures:
2 of 4 in __main__.processfile
***Test Failed*** 2 failures.
All values are irrelevant except for the first element of the covariance matrix, which needs to be one.
I have tried looking everywhere and can't find an answer. Any help would be appreciated.
I don't know what utils.centerscale is or does, but if you want to scale a matrix by a constant factor so that the upper left term of its covariance matrix is 1, you can simply divide the matrix by the square root of the unscaled covariance term:
>>> import numpy
>>> numpy.random.seed(17)
>>> m = numpy.random.rand(5,4)
>>> m
array([[ 0.294665 , 0.53058676, 0.19152079, 0.06790036],
[ 0.78698546, 0.65633352, 0.6375209 , 0.57560289],
[ 0.03906292, 0.3578136 , 0.94568319, 0.06004468],
[ 0.8640421 , 0.87729053, 0.05119367, 0.65241862],
[ 0.55175137, 0.59751325, 0.48352862, 0.28298816]])
>>> c = numpy.cov(m,bias=1)
>>> c
array([[ 0.0288779 , 0.00524455, 0.00155373, 0.02779861, 0.01798404],
[ 0.00524455, 0.00592484, -0.00711072, 0.01006019, 0.00631144],
[ 0.00155373, -0.00711072, 0.13391344, -0.10551922, 0.00945934],
[ 0.02779861, 0.01006019, -0.10551922, 0.11250984, 0.00982862],
[ 0.01798404, 0.00631144, 0.00945934, 0.00982862, 0.01444482]])
>>> numpy.cov(m/c[0][0]**0.5, bias=1)
array([[ 1. , 0.18161135, 0.05380354, 0.96262562, 0.62276138],
[ 0.18161135, 0.20516847, -0.24623392, 0.3483699 , 0.21855613],
[ 0.05380354, -0.24623392, 4.63722877, -3.65397781, 0.32756326],
[ 0.96262562, 0.3483699 , -3.65397781, 3.89605297, 0.34035085],
[ 0.62276138, 0.21855613, 0.32756326, 0.34035085, 0.5002033 ]])
But this has the same effect as simply dividing the covariance matrix by the upper left member:
>>> (numpy.cov(m,bias=1)/numpy.cov(m,bias=1)[0][0])/(numpy.cov(m/c[0][0]**0.5, bias=1))
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
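The reason both routes agree is that scaling the data by a constant a scales every covariance term by a squared. A minimal sketch of that check, using randomly generated data of my own (not the poster's signature file):

```python
import numpy as np

rng = np.random.default_rng(17)
m = rng.random((5, 4))  # 5 variables (rows), 4 observations each

c = np.cov(m, bias=True)
scaled = m / np.sqrt(c[0, 0])  # divide the data by the first standard deviation
c_scaled = np.cov(scaled, bias=True)

# Scaling data by a multiplies the covariance by a**2, so here by 1 / c[0, 0].
print(np.isclose(c_scaled[0, 0], 1.0))
print(np.allclose(c_scaled, c / c[0, 0]))
```

So the scale factor to apply to the centered data is one over the square root of the current top-left covariance term.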
Depending on what you're doing, you might also be interested in numpy.corrcoef, which gives the correlation coefficient matrix instead.