Error: ValueError: shapes (3,1) and (3,2) not aligned: 1 (dim 1) != 3 (dim 0)
The error occurs because the matrices have different shapes. How can I multiply two matrices of different sizes so that the result is [-0.78 0.85]?
import numpy as np
x1 = 3-7/3;
x2 = 2-4/3;
x3 = 1-5/3;
X = ([x1], [x2],[x3])
V = ([-0.99, -0.13], [-0.09, 0.70],[0.09, -0.70])
res = np.dot(X,V)
print("Res: ",res)
Any help is appreciated!
Mathematical question, for better understanding:
A principal component analysis is carried out on a dataset comprising three data points x1, x2 and x3, collected in an N × M matrix X such that each row of the matrix is a data point. Suppose the matrix X̃ corresponds to X with the mean of each column subtracted, i.e.
X = ([3.00, 2.00, 1.00],[4.00, 1.00, 2.00],[0.00, 1.00, 2.00])
and suppose X̃ has the singular value decomposition:
V = ([-0.99, -0.13, -0.00], [-0.09, 0.70, -0.71],[0.09, -0.70, -0.71])
What are the coordinates (rounded to two significant digits) of the first observation x1 projected onto the 2-dimensional subspace containing the maximal variation?
Answer:
The projection can be found by subtracting the mean from X
and projecting onto the first two columns of V. The first point with the mean subtracted has coordinates: [3-7/3 2-4/3 1-5/3]
This should be (left) multiplied with the first two columns of V:
([3-7/3], [2-4/3],[1-5/3]) * ([-0.99, -0.13], [-0.09, 0.70],[0.09, -0.70]) = [-0.78 0.85]
So I am trying to find out how to calculate this in python.
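I am assuming you wish to perform matrix multiplication. This requires the number of columns of the first matrix to equal the number of rows of the second; here X has shape (3, 1) and V has shape (3, 2), so the product is undefined. You can get the desired result by reshaping X to (1, 3) and using numpy.matmul().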
Code:
import numpy as np
x1 = 3 - 7/3
x2 = 2 - 4/3
x3 = 1 - 5/3
X = np.array([[x1], [x2],[x3]])
X = X.reshape(1, 3)
V = np.array([[-0.99, -0.13], [-0.09, 0.70],[0.09, -0.70]])
res = np.matmul(X, V)
print("Res: ",res)
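As an alternative sketch, assuming the centered observation is stored as a 1-D array rather than a column, no reshape is needed: NumPy treats a length-3 vector on the left of a (3, 2) matrix as a row vector. The name x is introduced here only for illustration.

```python
import numpy as np

# First observation with the column means subtracted, as a 1-D vector of length 3.
x = np.array([3 - 7/3, 2 - 4/3, 1 - 5/3])

# First two columns of V, shape (3, 2).
V = np.array([[-0.99, -0.13],
              [-0.09,  0.70],
              [ 0.09, -0.70]])

# (3,) @ (3, 2) -> (2,): the inner dimensions (3 and 3) match.
res = x @ V
print("Res:", np.round(res, 2))
```

Rounded to two decimals, this reproduces the expected projection [-0.78 0.85].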
I'm new to NumPy, so this is probably an easy question, but I'm clueless as to how to solve it.
I'm trying to implement the K nearest neighbors algorithm for classification of a data set.
There are two arrays named new_points and points that have the shapes (30,4)
and (120,4) respectively (with 4 being the number of properties of each element),
so I'm trying to calculate the distance between each new point and all old points using NumPy broadcasting:
def calc_no_loop(new_points, points):
    return np.sum((new_points-points)**2,axis=1)
# doesn't work; here is the log:
ValueError: operands could not be broadcast together with shapes (30,4) (120,4)
However, as per the rules of broadcasting, two arrays of shapes (30,4) and (120,4) are incompatible,
so I would appreciate any insight on how to solve this (using .reshape perhaps, not sure).
Please note that I have already implemented the same function using one and two loops, but can't implement it without a loop:
def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d
def calc_one_loop(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        # points is (n, 4); summing over axis 1 gives one value per old point
        d[i] = np.sum((new_points[i] - points)**2, axis=1)
    return d
Let's create a smaller example:
nNew = 3; nOld = 5 # Number of new / old points
# New points
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)
To compute the differences alone, run:
dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
So far we have differences in each property of each point (every new point with every old point).
The shape of dfr is (3, 5, 4):
first dimension: the index of the new point,
second dimension: the index of the old point,
third dimension: the difference in each property.
Then, to sum squares of differences by points, run:
d = np.power(dfr, 2).sum(axis=2)
and this is your result.
For my sample data, the result is:
array([[31334, 25926, 21030, 16646, 12774],
[34230, 28566, 23414, 18774, 14646],
[37254, 31334, 25926, 21030, 16646]], dtype=int32)
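The two steps above can be combined into one function and checked against the two-loop version from the question. This is a minimal sketch with randomly generated stand-in data of the shapes mentioned in the question, (30, 4) and (120, 4); the seed and the data are assumptions for illustration only.

```python
import numpy as np

def calc_no_loop(new_points, points):
    # (m, 1, k) - (1, n, k) broadcasts to (m, n, k)
    diff = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
    # Sum the squared differences over the property axis -> (m, n)
    return (diff ** 2).sum(axis=2)

def calc_two_loops(new_points, points):
    # Reference implementation from the question, used only for checking.
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j]) ** 2)
    return d

rng = np.random.default_rng(0)        # arbitrary seed
new_points = rng.random((30, 4))      # stand-in for the 30 new points
points = rng.random((120, 4))         # stand-in for the 120 old points

d = calc_no_loop(new_points, points)
print(d.shape)
print(np.allclose(d, calc_two_loops(new_points, points)))
```

The intermediate array has shape (m, n, k), so memory use grows with the product of both point counts; for the sizes in the question this is small.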
So you have 30 new points and 120 old points, and if I understand you correctly you want a result array of distances with shape (120, 30).
You could do
import numpy as np
points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)
def calc_no_loop(new_points, points):
    res = np.zeros([len(points[:,0]),len(new_points[:,0])])
    for idx in range(len(points[:,0])):
        res[idx,:] = np.sum((points[idx,:]-new_points)**2,axis=1)
    return np.sqrt(res)
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
Which gives
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
But from your function name above I get the notion that you do not want a loop? Then you could do this instead:
def calc_no_loop(new_points, points):
    new_points1 = np.repeat(new_points[np.newaxis,...],len(points),axis=0)
    points1 = np.repeat(points[:,np.newaxis,:],len(new_points),axis=1)
    return np.sqrt(np.sum((new_points1-points1)**2 ,axis=2))
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
which has output:
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
i.e. the same result. Note that I added the np.sqrt() into the result which you may have forgotten in your example above.
I keep getting error 'index 3 is out of bounds for axis 1 with size 3' but I'm sure that I'm using a (3,n) matrix rather than a (n,3) one. I'm not very familiar with matrices in python so have been using a kind of hacky way of getting them into the shape I want so I can multiply or add them. Can anyone see where I've gone wrong or suggest some better practice?
I'm trying to perform a rotational transform on A, generated via:
A = array(random.rand(3, 9));
where A contains a set of x, y, z coordinates in every column, e.g.:
Matrix A:
[[0.70799333 0.77123425 0.07271538 0.52498025 0.84353825 0.78331767
0.06428417 0.25629863 0.6654734 0.77562903]
[0.34179928 0.83233168 0.3920859 0.19819796 0.22486337 0.09274312
0.49057914 0.69716143 0.613912 0.04940198]
[0.98522559 0.71273242 0.70784866 0.61589377 0.34007973 0.34492078
0.44491238 0.37423906 0.37427018 0.13558728]]
The translated matrix is calculated via A_translated = ret_R.(each column of A) + ret_t, where
ret_R:
[[ 0.1928724 0.90776212 0.372516 ]
[ 0.27931303 -0.41473028 0.8660156 ]
[ 0.94062983 -0.06298194 -0.33353981]]
and
ret_t:
[[0.93445859]
[0.59949888]
[0.77385835]]
My attempt was as follows
count = 0
num_rows, num_cols = A.shape
translated_A = pd.DataFrame( zeros( (num_rows, num_cols) ) )
print('Translated A: \n', translated_A)
for i in range(0, num_cols):
    multiply = ret_R.A[:,i] # works up until (not including) i = 3
    #IndexError: index 3 is out of bounds for axis 1 with size 3
    print('Multiply: \n', multiply)
    multiply2 = np.matrix(pd.DataFrame(multiply))
    matrix = multiply2 + ret_t #works
    matrix2 = pd.DataFrame(matrix) #np.matrix(pd.DataFrame(matrix)) # not working ?
    print('Matrix:', matrix2)
    translated_A[i] = matrix2[0]
print(translated_A)
The line multiply = ret_R.A[:,i] only works up until and not including i = 3, which suggests that my A matrix is n,3 but I'm sure it's 3,n. I kept switching between matrices and data frames as this seemed to work but it doesn't work past i = 2.
I've realised that I should be using '@' to compute the matrix product rather than '.', and I had to transpose multiply2 to get a matrix in the form [ [] [] [] ]. I no longer have to keep switching between a data frame and a matrix.
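A possible explanation for the original error, assuming ret_R was an np.matrix: its .A attribute returns the matrix as a plain array, so ret_R.A[:, i] indexes the columns of ret_R itself (only 3 of them) instead of multiplying ret_R by column i of A. The sketch below applies the transform to all columns at once with plain arrays; the random A is a stand-in for the data in the question.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
A = rng.random((3, 9))               # one (x, y, z) point per column

ret_R = np.array([[0.1928724,   0.90776212,  0.372516],
                  [0.27931303, -0.41473028,  0.8660156],
                  [0.94062983, -0.06298194, -0.33353981]])
ret_t = np.array([[0.93445859],
                  [0.59949888],
                  [0.77385835]])

# (3, 3) @ (3, 9) -> (3, 9); the (3, 1) translation broadcasts across columns.
translated_A = ret_R @ A + ret_t

# Column-by-column check against the per-point formula.
for i in range(A.shape[1]):
    assert np.allclose(translated_A[:, i], ret_R @ A[:, i] + ret_t[:, 0])

print(translated_A.shape)
```

With plain arrays there is no need to pass through pandas or to loop over the columns.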
I have used interp2 in MATLAB, as in the following code, which is part of @rayryeng's answer to "Three dimensional (3D) matrix interpolation in Matlab":
d = size(volume_image)
[X,Y] = meshgrid(1:1/scaleCoeff(2):d(2), 1:1/scaleCoeff(1):d(1));
for ind = z
%Interpolate each slice via interp2
M2D(:,:,ind) = interp2(volume_image(:,:,ind), X, Y);
end
Example of Dimensions:
The image size is 512x512 and the number of slices is 133. So:
volume_image (rows, columns, slices in 3rd dimension): 512x512x133
X: 288x288
Y: 288x288
scaleCoeff(2): 0.5625
scaleCoeff(1): 0.5625
z = 1 up to 133 ,hence z: 1x133
ind: 1 up to 133
M2D is finally 288x288x133
Also, MATLAB's convention for size is (rows, columns, slices in 3rd dimension), while my Python array uses (slices in 3rd dimension, rows, columns).
However, after converting the MATLAB code to Python, I get the error ValueError: Invalid length for input z for non rectangular grid:
for ind in range(0, len(z)+1):
    M2D[ind, :, :] = interpolate.interp2d(X, Y, volume_image[ind, :, :]) # ValueError: Invalid length for input z for non rectangular grid
What is wrong? Thank you so much.
In MATLAB, interp2 has as arguments:
result = interp2(input_x, input_y, input_z, output_x, output_y)
You are using only the last 3 arguments; the first two are then assumed to be input_x = 1:size(input_z,2) and input_y = 1:size(input_z,1).
In Python, scipy.interpolate.interp2d is quite different: it takes the first 3 input arguments of the MATLAB function, and returns an object that you can call to get interpolated values:
f = scipy.interpolate.interp2d(input_x, input_y, input_z)
result = f(output_x, output_y)
Following the example from the documentation, I get to something like this:
import numpy as np
from scipy import interpolate
# x runs along the columns (last axis), y along the rows
x = np.arange(0, volume_image.shape[2])
y = np.arange(0, volume_image.shape[1])
f = interpolate.interp2d(x, y, volume_image[ind, :, :])
# MATLAB's scaleCoeff(2) (columns) is scaleCoeff[1] with 0-based indexing
xnew = np.arange(0, volume_image.shape[2], 1/scaleCoeff[1])
ynew = np.arange(0, volume_image.shape[1], 1/scaleCoeff[0])
M2D[ind, :, :] = f(xnew, ynew)
[Code not tested, please let me know if there are errors.]
You might be interested in scipy.ndimage.zoom. If you are interpolating from one regular grid to another, it is much faster and easier to use than scipy.interpolate.interp2d.
See this answer for an example:
https://stackoverflow.com/a/16984081/1295595
You'd probably want something like:
import scipy.ndimage as ndimage
M2D = ndimage.zoom(volume_image, (1, scaleCoeff[0], scaleCoeff[1]))
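To see the shapes involved, here is a small self-contained sketch of the zoom approach. The tiny random volume and the 0.5 scale factors are stand-ins chosen for illustration, not the 133 x 512 x 512 data or the 0.5625 factors from the question. As far as I know, interp2d has been removed from recent SciPy releases, which is another reason to consider zoom.

```python
import numpy as np
from scipy import ndimage

# Stand-in volume laid out as (slices, rows, columns).
volume_image = np.random.default_rng(0).random((5, 16, 16))
scaleCoeff = (0.5, 0.5)  # hypothetical scale factors for rows and columns

# A zoom factor of 1 leaves the slice axis untouched;
# order=1 selects linear interpolation.
M2D = ndimage.zoom(volume_image, (1, scaleCoeff[0], scaleCoeff[1]), order=1)
print(M2D.shape)
```

The output size along each axis is the input size times the zoom factor, rounded. The sample positions zoom uses need not coincide with MATLAB's meshgrid, so individual values may differ slightly from the interp2 result.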
I want to use the dendrogram function of scipy.
I have the following data:
I have a list with seven different means. For example:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
Each mean is calculated for a different user. For example:
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
My aim is to display the data described above with the help of a dendrogram.
I tried the following:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
Z = linkage(Y)
# Plot the dendrogram with the results above
dendrogram(Z, leaf_rotation=45., leaf_font_size=12. , show_contracted=True)
plt.style.use("seaborn-whitegrid")
plt.title("Dendrogram to find clusters")
plt.ylabel("Distance")
plt.show()
But it says:
ValueError: Length n of condensed distance matrix 'y' must be a binomial coefficient, i.e.there must be a k such that (k \choose 2)=n)!
I already tried to convert my data into a matrix. With:
# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
But that doesn't work either!
Are there any suggestions?
Thanks :-)
The first argument of linkage is either an n x m array, representing n points in m-dimensional space, or a one-dimensional array containing the condensed distance matrix. These are two very different meanings! The first is the raw data, i.e. the observations. The second format assumes that you have already computed all the distances between your observations, and you are providing these distances to linkage, not the original points.
It looks like you want the first case (raw data), with m = 1. So you must reshape the input to have shape (n, 1).
Replace this:
Z = linkage(Y)
with:
Z = linkage(np.reshape(Y, (len(Y), 1)))
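Put together with the data from the question, the reshape approach might look like the sketch below. Passing no_plot=True is my choice here so that the example runs without a display; drop it and call plt.show() to draw the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]

# Seven observations, each with a single feature: shape (7, 1).
obs = np.reshape(Y, (len(Y), 1))

# linkage now sees raw observations, not a condensed distance matrix.
Z = linkage(obs)

# labels attaches the user names to the leaves; no_plot skips drawing.
info = dendrogram(Z, labels=X, no_plot=True)
print(Z.shape)
print(info["ivl"])   # leaf labels in dendrogram order
```

For n observations the linkage matrix has n - 1 rows, one per merge.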
You are passing Y as a one-dimensional array of length 7, i.e. len(Y) = 7.
As per the documentation of linkage, a one-dimensional input is treated as a condensed distance matrix, so len(Y) must satisfy
{n \choose 2} = len(Y)
which means
1/2 * (n - 1) * n = len(Y)
for some integer n. No integer n gives 7 (n = 4 gives 6 and n = 5 gives 10), hence the error.
I want to translate the following group coloring octave function to python and use it with pyplot.
Function input:
x - Data matrix (m x n)
a - A parameter.
index - A vector of size "m" with values in range [: a]
(For example if a = 4, index can be [random.choice(range(4)) for i in range(m)]
The values in "index" indicate the number of the group the "m"th data point belongs to.
The function should plot all the data points from x and color them in different colors (Number of different colors is "a").
The function in octave:
p = hsv(a); % This is an a x 3 matrix
colors = p(index, :); % ****This is an m x 3 matrix****
scatter(X(:,1), X(:,2), 10, colors);
I couldn't find a function like hsv in python, so I wrote it myself (I think I did..):
p = colors.hsv_to_rgb(numpy.column_stack((
numpy.linspace(0, 1, a), numpy.ones((a ,2)) )) )
But I can't figure out how to do the matrix selection p(index, :) in Python (NumPy),
especially because the size of "index" is bigger than "a".
Thanks in advance for your help.
So, you want to take an m x 3 of HSV values, and convert each row to RGB?
import numpy as np
import colorsys
mymatrix = np.matrix([[11,12,13],
[21,22,23],
[31,32,33]])
def to_hsv(x):
    return colorsys.rgb_to_hsv(*x)
#Apply the to_hsv function to each matrix row.
print np.apply_along_axis(to_hsv, axis=1, arr=mymatrix)
This produces:
[[ 0.5 0. 13. ]
[ 0.5 0. 23. ]
[ 0.5 0. 33. ]]
Follow through on your comment:
If I understand you have a matrix p that is an a x 3 matrix, and you want to randomly select rows from the matrix over and over again, until you have a new matrix that is m x 3?
Ok. Let's say you have a matrix p defined as follows:
a = 5
p = np.random.randint(5, size=(a, 3))
Now make a list of random integers in the range 0 to a-1 (indexing starts at 0 and ends at a-1) that is m in length:
m = 20
index = np.random.randint(a, size=m)
Now access the right indexes and plug them into a new matrix:
p_prime = np.matrix([p[i] for i in index])
Produces a 20 x 3 matrix.
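The list comprehension can also be written as a single indexing operation: NumPy accepts an integer array as a row index, which is the closest counterpart to Octave's p(index, :), bearing in mind that Octave indices start at 1 and NumPy's at 0. The sketch below builds the palette with colorsys from the standard library; the palette construction and the values of a and m are illustrative assumptions, not a reimplementation of Octave's hsv().

```python
import colorsys
import numpy as np

a = 4     # number of groups, hence number of colours
m = 10    # number of data points

# a evenly spaced hues at full saturation and value, as RGB rows: shape (a, 3)
p = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0)
              for h in np.linspace(0, 1, a, endpoint=False)])

rng = np.random.default_rng(0)            # arbitrary seed
index = rng.integers(0, a, size=m)        # group number (0 .. a-1) per point

# Integer array indexing picks one row of p per entry of index.
# Rows repeat whenever m > a, which is what is wanted here.
colors = p[index]                         # same as p[index, :]
print(colors.shape)
```

An (m, 3) array of RGB rows like this can be passed as the c argument of pyplot.scatter, e.g. scatter(x[:, 0], x[:, 1], s=10, c=colors).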