I am trying to write a function that estimates the data noise (σ²) from three NumPy arrays: an augmented X matrix and two vectors, the y target and the MAP weights.
The function should return the empirical data noise estimate, σ², as a single number.
I have the following function:
def estimDS(X, output_y, W):
    n = X.shape[0]  # number of observations (rows)
    d = X.shape[1]  # number of features (columns)
    matmul = np.matmul(X, W)  # use the function arguments, not global names
    mult_left = 1 / (n - d)
    mult_right = (output_y - matmul) ** 2
    estimDS = mult_left * mult_right
    return estimDS
And this is an example on which I run the function:
output_y = np.array([208500, 181500, 223500,
                     140000, 250000, 143000,
                     307000, 200000, 129900,
                     118000])
aug_x = np.array([[1., 1710., 2003.],
                  [1., 1262., 1976.],
                  [1., 1786., 2001.],
                  [1., 1717., 1915.],
                  [1., 2198., 2000.],
                  [1., 1362., 1993.],
                  [1., 1694., 2004.],
                  [1., 2090., 1973.],
                  [1., 1774., 1931.],
                  [1., 1077., 1939.]])
W = np.array([-2.29223802e+06, 5.92536529e+01, 1.20780450e+03])
sig2 = estimDS(aug_x, output_y, W)
print(sig2)
The function returns the array below, but I need the result as a single float (3700666577282.7227):
[5.61083809e+07 2.17473754e+07 6.81288433e+06 4.40198178e+07
1.86225354e+06 3.95549405e+08 8.78575426e+08 3.04530677e+07
3.32164594e+07 2.87861673e+06]
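You forgot to sum over i = 1 to n. As written, the function scales each squared residual by 1/(n-d) and returns all n of them instead of adding them up. Therefore mult_right should be defined as:
mult_right = np.sum((output_y - matmul) ** 2, axis=0)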
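Putting the fix together, a minimal sketch of the whole function might look like the following. The name estimate_noise and the tiny test data are made up for illustration; the formula is the one implied by the code above, the sum of squared residuals divided by (n - d).

```python
import numpy as np

def estimate_noise(X, y, w):
    """Empirical noise estimate: sum of squared residuals / (n - d)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n, d = X.shape                 # n observations, d features (incl. bias column)
    residuals = y - X @ w          # shape (n,)
    return float(np.sum(residuals ** 2) / (n - d))

# Tiny illustrative data set (hypothetical, not the housing data above)
X_demo = np.array([[1., 0.], [1., 1.], [1., 2.]])
y_demo = np.array([0., 1., 3.])
w_demo = np.array([0., 1.])
sigma2_demo = estimate_noise(X_demo, y_demo, w_demo)
print(sigma2_demo)
```

Wrapping the result in float() guarantees a plain Python float rather than a zero-dimensional NumPy value.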
I'm new to machine learning and want to build a k-means algorithm with k = 2, but I'm struggling to calculate the new centroids. Here is my code for k-means:
def euclidean_distance(x: np.ndarray, y: np.ndarray):
    # x shape: (N1, D)
    # y shape: (N2, D)
    # output shape: (N1, N2)
    dist = []
    for i in x:
        for j in y:
            new_list = np.sqrt(sum((i - j) ** 2))
            dist.append(new_list)
    distance = np.reshape(dist, (len(x), len(y)))
    return distance

def kmeans(x, centroids, iterations=30):
    assignment = None
    for i in iterations:
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)
        for c in range(len(y)):
            centroids[c] = np.mean(x[assignment == c], 0)  # error here
    return centroids, assignment
My inputs are x = [[1., 0.], [0., 1.], [0.5, 0.5]] and y = [[1., 0.], [0., 1.]], and
the distance is an array that looks like this:
[[0. 1.41421356]
[1.41421356 0. ]
[0.70710678 0.70710678]]
When I run kmeans(x, y), it raises this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_40086/2170434798.py in
      5
      6 for c in range(len(y)):
----> 7     centroids[c] = (x[classes == c], 0)
      8     print(centroids)
TypeError: only integer scalar arrays can be converted to a scalar index
Does anyone know how to fix it or improve my code? Thank you in advance!
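Converting the inputs to NumPy arrays should get rid of the error. A plain Python list cannot be indexed with a boolean array such as assignment == c, which is what raises the TypeError:
x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
You also need to change "for i in iterations" to "for i in range(iterations)" in the kmeans function, because an integer is not iterable.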
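For reference, here is a sketch of how the two fixes could be combined. It makes a few assumptions beyond the original code: the distance is computed with broadcasting instead of nested loops, the inner loop runs over len(centroids) rather than the global y, the centroids are copied so the caller's array is not modified, and a cluster with no members keeps its previous centroid.

```python
import numpy as np

def euclidean_distance(x, y):
    # x: (N1, D), y: (N2, D) -> pairwise distances of shape (N1, N2)
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def kmeans(x, centroids, iterations=30):
    x = np.asarray(x, dtype=float)
    centroids = np.array(centroids, dtype=float)  # np.array makes a copy
    assignment = None
    for _ in range(iterations):
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)      # nearest centroid per point
        for c in range(len(centroids)):
            members = x[assignment == c]          # boolean mask needs an ndarray
            if len(members) > 0:                  # skip empty clusters
                centroids[c] = members.mean(axis=0)
    return centroids, assignment

x = [[1., 0.], [0., 1.], [0.5, 0.5]]
y = [[1., 0.], [0., 1.]]
centroids, assignment = kmeans(x, y)
print(centroids)
print(assignment)
```

Because the function converts its inputs with np.asarray and np.array, it accepts plain lists as well as arrays.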
Problem
I need to create an array based on the argmax of each row: the position of the row's maximum value should be filled with [1,0], while every other position should be filled with [0,1].
Example:
Given the vector a:
a.shape = (3,2)
a = np.array([[1,0],[1,2],[1,3]])
Return the vector b:
b.shape = (3,2,2)
b = np.array([[[1,0],[0,1]],[[0,1],[1,0]],[[0,1],[1,0]]])
c = np.argmax(a, axis=1)
b = np.empty(tuple(list(a.shape) + [2]))
b[range(len(c)), c, :] = [1, 0]
b[range(len(c)), ~c, :] = [0, 1]
b
>>>array([[[1., 0.],
[0., 1.]],
[[0., 1.],
[1., 0.]],
[[0., 1.],
[1., 0.]]])
Note that this only works in this example because the argmax can only ever be 0 or 1, so ~c (which maps 0 to -1 and 1 to -2) happens to select the other column. If the second dimension of a is greater than 2, I don't think this solution will work.
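One way to lift the two-column restriction, sketched here under the assumption that ties should go to the first maximum (which is what np.argmax does), is to fill everything with [0, 1] first and then overwrite only the argmax positions. The function name one_hot_pairs is made up for this example.

```python
import numpy as np

def one_hot_pairs(a):
    a = np.asarray(a)
    c = np.argmax(a, axis=1)              # column index of each row's maximum
    b = np.empty(a.shape + (2,))
    b[...] = [0, 1]                       # default for every position
    b[np.arange(len(c)), c] = [1, 0]      # overwrite the argmax position per row
    return b

a = np.array([[1, 0], [1, 2], [1, 3]])
b = one_hot_pairs(a)
print(b)

# Also works when there are more than two columns
b3 = one_hot_pairs([[1, 5, 2]])
print(b3)
```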
I was able to create a function that returns the desired result, but it only works for two classes. It could be adapted for multiple classes:
a = np.array([[1,0],[1,2],[1,3]])
def create_dist_prob_target(arr):
    # arr has shape (N, 2). The original np.squeeze(arr, axis=1) is dropped here:
    # it raises a ValueError unless axis 1 has length 1, e.g. for input of shape (N, 1, 2).
    p_ = np.asarray(arr)
    is_max = p_ == np.amax(p_, axis=1)[:, None]
    a = np.expand_dims(np.where(is_max, 1, 0), axis=-1)
    b = np.expand_dims(np.where(is_max, 0, 1), axis=-1)
    return np.concatenate((a, b), axis=2)
b = create_dist_prob_target(a)
print(b)
I need to obtain a matrix "W" from multiple matrix multiplications (every multiplication results in a column vector).
from numpy import matrix
from numpy import transpose
from numpy import matmul
from numpy import dot

# Iterative matrix multiplication
def iterativeMultiplication(X, Y):
    W = []  # list of matrix products
    X = matrix(X)  # same number of rows
    Y = matrix(Y)  # same number of rows
    h = 0
    while (h < X.shape[1]):
        W.append([])
        W[h] = dot(transpose(X), Y)  # using "dot" function
        h += 1
    return W
But, unexpectedly, I obtain a list of objects with their respective data types.
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[-0.2], [1.1], [5.9], [12.3]] # Edit Y column
iterativeMultiplication( X, Y )
Results in:
[array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]])]
I need a way to obtain only the numerical values so that the list can be converted to a matrix.
W = matrix(W) # Results in error
The result is the same using the "matmul" function. Thanks for your time.
If you want to stack multiple matrices, you can use numpy.vstack:
W = numpy.vstack(W)
Edit: There seems to be a discrepancy between your function, X and Y versus the "result" list in your question. But based on your comments below, what you're actually looking for is numpy.hstack (horizontal stack) which will give you the desired 3x3 matrix based on your "result" list.
W = numpy.hstack(W)
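As an illustration of the hstack suggestion, the sketch below uses the X and Y from the question with plain arrays instead of numpy.matrix (an assumption made here for simplicity) and stacks the three identical column vectors side by side.

```python
import numpy as np

X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[-0.2], [1.1], [5.9], [12.3]])

# One (3, 1) column vector per iteration, as in the question
W_list = [X.T @ Y for _ in range(X.shape[1])]

# Place the columns next to each other to get a single (3, 3) array
W = np.hstack(W_list)
print(W)
```

np.vstack would instead pile the same vectors on top of each other, giving a (9, 1) array.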
Of course you are going to get a list. You initialize W as a list, and append the same calculation to it 3 times.
But your 3 element arrays don't make sense with this data, array([[ 3.36877336],[ 3.97112615],[ 3.8092797 ]]).
If I make Xm=np.matrix(X), etc:
In [162]: Xm
Out[162]:
matrix([[ 0., 0., 1.],
[ 1., 0., 0.],
[ 2., 2., 2.],
[ 2., 5., 4.]])
In [163]: Ym
Out[163]:
matrix([[ 0.1, -0.2],
[ 0.9, 1.1],
[ 6.2, 5.9],
[ 11.9, 12.3]])
In [164]: Xm.T.dot(Ym)
Out[164]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
In [165]: Xm.T*Ym # matrix interprets * as .dot
Out[165]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
You need to edit the question, to have both valid Python code (missing def and :), and results that match the inputs.
===============
In [173]: Y = [[-0.2], [1.1], [5.9], [12.3]]
In [174]: Ym=np.matrix(Y)
Out[176]:
matrix([[ 37.5],
[ 73.3],
[ 60.8]])
=====================
This iteration is clumsy:
h = 0
while (h < X.shape[1]):
    W.append([])
    W[h] = dot(transpose(X), Y)  # using "dot" function
    h += 1
A more Pythonic approach:
for h in range(X.shape[1]):
    W.append(np.dot(...))
Or even:
W = [np.dot(....) for h in range(X.shape[1])]
Given these two arrays:
E = [[16.461, 17.015, 14.676],
[15.775, 18.188, 14.459],
[14.489, 18.449, 14.756],
[14.171, 19.699, 14.406],
[14.933, 20.644, 13.839],
[16.233, 20.352, 13.555],
[16.984, 21.297, 12.994],
[16.683, 19.056, 13.875],
[17.918, 18.439, 13.718],
[17.734, 17.239, 14.207]]
S = [[0.213, 0.660, 1.287],
[0.250, 2.016, 1.509],
[0.016, 2.995, 0.619],
[0.142, 4.189, 1.194],
[0.451, 4.493, 2.459],
[0.681, 3.485, 3.329],
[0.990, 3.787, 4.592],
[0.579, 2.170, 2.844],
[0.747, 0.934, 3.454],
[0.520, 0.074, 2.491]]
The problem states that I should get the 3x3 covariance matrix (C) between S and E using the following formula:
C = (1/(n-1))[S'E - (1/10)S'i i'E]
Here n is 10, and i is an n x 1 column vector consisting of only ones. S' and i' are the transpose of matrix S and column vector i, respectively.
So far, I can't get C because I don't understand the meaning of i (and i') and its implementation in the formula. Using numpy, so far I do:
import numpy as np
tS = np.array(S).T
C = (1.0/9.0)*(np.dot(tS, E)-((1.0/10.0)*np.dot(tS, E))) # Here is where I lack the i and i' implementation.
I will really appreciate your help to understand and implement i and i' in the formula. The output should be:
C= [[0.2782, 0.2139, -0.1601],
[-1.4028, 1.9619, -0.2744],
[1.0443, 0.9712, -0.6610]]
It looks like the only part you're missing is making i (below, S and E are NumPy arrays and N = 10):
>>> i = np.ones((N, 1))
>>> i
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
After that, we get
>>> C = (1.0/(N-1)) * (S.T.dot(E) - (1.0/N) * S.T.dot(i) * i.T.dot(E))
>>> C
array([[ 0.27842301, 0.21388842, -0.16011839],
[-1.4017267 , 1.96193373, -0.27441417],
[ 1.04532836, 0.97120807, -0.66095656]])
Note that this doesn't quite produce the array you expected, which is more obvious if you round it, but maybe there are some minor typos in your data?
>>> C.round(4)
array([[ 0.2784, 0.2139, -0.1601],
[-1.4017, 1.9619, -0.2744],
[ 1.0453, 0.9712, -0.661 ]])
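As a cross-check, the formula can be compared against np.cov, which computes the same cross-covariance block when S and E are passed as two sets of variables. This is a sketch; the slicing [:3, 3:] assumes three columns in each array, as in the question.

```python
import numpy as np

E = np.array([[16.461, 17.015, 14.676], [15.775, 18.188, 14.459],
              [14.489, 18.449, 14.756], [14.171, 19.699, 14.406],
              [14.933, 20.644, 13.839], [16.233, 20.352, 13.555],
              [16.984, 21.297, 12.994], [16.683, 19.056, 13.875],
              [17.918, 18.439, 13.718], [17.734, 17.239, 14.207]])
S = np.array([[0.213, 0.660, 1.287], [0.250, 2.016, 1.509],
              [0.016, 2.995, 0.619], [0.142, 4.189, 1.194],
              [0.451, 4.493, 2.459], [0.681, 3.485, 3.329],
              [0.990, 3.787, 4.592], [0.579, 2.170, 2.844],
              [0.747, 0.934, 3.454], [0.520, 0.074, 2.491]])

N = S.shape[0]
i = np.ones((N, 1))                      # column vector of ones

# C = 1/(n-1) * [S'E - (1/n) S'i i'E]
C = (S.T @ E - (S.T @ i) @ (i.T @ E) / N) / (N - 1)

# np.cov treats rows as variables, so pass the transposes; the full result is
# 6x6 and its upper-right 3x3 block is the covariance between S and E.
C_check = np.cov(S.T, E.T)[:3, 3:]
print(np.allclose(C, C_check))
```

The term (1/n) S'i i'E is simply n times the outer product of the column means of S and E, so subtracting it is the matrix way of centering both arrays.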
This is what you want, I guess:
import numpy as np
S = np.array(S)
E = np.array(E)
ones = np.ones((10,1))
C = (1.0/9)*(np.dot(S.T, E)-((1.0/10)* (np.dot(np.dot(np.dot(S.T,ones),ones.T),E))))
My output is:
array([[ 0.27842301, 0.21388842, -0.16011839],
[-1.4017267 , 1.96193373, -0.27441417],
[ 1.04532836, 0.97120807, -0.66095656]])
I have data that I need to center and scale so that it is centered around the origin. The data then needs to be rotated so that the direction of maximum variance lies along the x-axis. After that, the mean and covariance of the data are calculated. I need the first element of the covariance matrix to be 1. I think this is done by adjusting the scaling factor, but I can't figure out what the scaling factor should be.
To center the data I subtract the mean, and to rotate it I use SVD, but the scaling is still my problem.
signature = numpy.loadtxt(name, comments = '%', usecols = (0,cols-1))
signature = numpy.transpose(signature)
#SVD to get D so that data can be scaled by 1/(highest singular value in D)
U, D, Vt = numpy.linalg.svd( signature , full_matrices=0)
cs = utils.centerscale(signature, scale=False)
signature = cs[0]
#plt.scatter(cs[0][0],cs[0][1],color='r')
#SVD so that data can be rotated so that direction of most variance is on x-axis
U, D, Vt = numpy.linalg.svd( signature , full_matrices=0)
cs = utils.centerscale(signature, center=False, scalefactor=D[0])
U, D, Vt = numpy.linalg.svd( cs[0] , full_matrices=0)
D = numpy.diag(D)
norm = numpy.dot(D,Vt)
The following are examples of results of the mean and cov of norm (the test cases use res).
**********************************************************************
Failed example:
print numpy.mean(res, axis=1)
Expected:
[ 7.52074907e-18 -6.59917722e-18]
Got:
[ -1.22008884e-17 2.41126563e-17]
**********************************************************************
Failed example:
print numpy.cov(res, bias=1)
Expected:
[[ 1.00000000e+00 9.02112676e-18]
[ 9.02112676e-18 1.40592827e-01]]
Got:
[[ 4.16666667e-03 -1.57698124e-19]
[ -1.57698124e-19 5.85803446e-04]]
**********************************************************************
1 items had failures:
2 of 4 in __main__.processfile
***Test Failed*** 2 failures.
All values are irrelevant except for the first element of the covariance matrix, that needs to be one.
I have tried looking everywhere and can't find an answer. Any help would be appreciated.
I don't know what utils.centerscale is or does, but if you want to scale a matrix by a constant factor so that the upper left term of its covariance matrix is 1, you can simply divide the matrix by the square root of the unscaled covariance term:
>>> import numpy
>>> numpy.random.seed(17)
>>> m = numpy.random.rand(5,4)
>>> m
array([[ 0.294665 , 0.53058676, 0.19152079, 0.06790036],
[ 0.78698546, 0.65633352, 0.6375209 , 0.57560289],
[ 0.03906292, 0.3578136 , 0.94568319, 0.06004468],
[ 0.8640421 , 0.87729053, 0.05119367, 0.65241862],
[ 0.55175137, 0.59751325, 0.48352862, 0.28298816]])
>>> c = numpy.cov(m,bias=1)
>>> c
array([[ 0.0288779 , 0.00524455, 0.00155373, 0.02779861, 0.01798404],
[ 0.00524455, 0.00592484, -0.00711072, 0.01006019, 0.00631144],
[ 0.00155373, -0.00711072, 0.13391344, -0.10551922, 0.00945934],
[ 0.02779861, 0.01006019, -0.10551922, 0.11250984, 0.00982862],
[ 0.01798404, 0.00631144, 0.00945934, 0.00982862, 0.01444482]])
>>> numpy.cov(m/c[0][0]**0.5, bias=1)
array([[ 1. , 0.18161135, 0.05380354, 0.96262562, 0.62276138],
[ 0.18161135, 0.20516847, -0.24623392, 0.3483699 , 0.21855613],
[ 0.05380354, -0.24623392, 4.63722877, -3.65397781, 0.32756326],
[ 0.96262562, 0.3483699 , -3.65397781, 3.89605297, 0.34035085],
[ 0.62276138, 0.21855613, 0.32756326, 0.34035085, 0.5002033 ]])
But this has the same effect as simply dividing the covariance matrix by the upper left member:
>>> (numpy.cov(m,bias=1)/numpy.cov(m,bias=1)[0][0])/(numpy.cov(m/c[0][0]**0.5, bias=1))
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
Depending on what you're doing, you might also be interested in numpy.corrcoef, which gives the correlation coefficient matrix instead.
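Applied to the centering and SVD rotation described in the question, the same idea can be sketched as follows. The data here is random and purely illustrative, and utils.centerscale is not used because its behavior is unknown. With bias=1, the top-left covariance of the rotated data equals D[0]**2 / n, so the scale factor that makes it 1 is D[0] / sqrt(n) rather than D[0]. Dividing by D[0] alone would leave 1/n in that position, which would be consistent with the 4.16666667e-03 (about 1/240) shown in the failed test, though that is only a guess about the asker's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: 2 variables (rows) by n samples (columns)
data = rng.normal(size=(2, n)) * np.array([[3.0], [0.5]])

# 1. Center each variable around the origin
centered = data - data.mean(axis=1, keepdims=True)

# 2. Rotate so the direction of maximum variance is the first axis
U, D, Vt = np.linalg.svd(centered, full_matrices=False)
rotated = U.T @ centered                 # same as np.diag(D) @ Vt

# 3. Scale so the first element of the (biased) covariance is 1
scale = np.sqrt(np.cov(rotated, bias=True)[0, 0])   # equals D[0] / sqrt(n)
res = rotated / scale

print(np.mean(res, axis=1))
print(np.cov(res, bias=True))
```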