I am new to NumPy and have only limited knowledge of Matlab. I have the following Matlab command to create row and column vectors:
X=(-N:N)
Y=column(X.^2)
I am trying to create the same thing in NumPy, but the shapes of the vectors X and Y are the same despite the transpose:
import numpy as np
N=10
X=np.arange(-N,N)
Y=(X**2).T
print(X.shape, Y.shape)
Could you please let me know whether np.arange() is the equivalent of (-N:N) in Matlab, and what the problem is with the column vector in NumPy?
It's a little more verbose in Python:
import numpy as np
X = np.arange(-10,11) #same as X=-10:10; in matlab
Y = X**2 # same as X.^2 in matlab
Y.shape = (np.size(Y),1) #forces it to be column vec
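The reason the transpose appears to do nothing is that np.arange returns a 1-D array, and transposing a 1-D array leaves its shape unchanged. The following is a minimal sketch of that behaviour and of two common ways to get an explicit column; the variable names Y_flat, Y_col and Y_col2 are only illustrative.

```python
import numpy as np

N = 3
# np.arange excludes the stop value, so use N + 1 to match Matlab's -N:N
X = np.arange(-N, N + 1)          # 1-D array, shape (7,)

# Transposing a 1-D array is a no-op: there is only one axis to permute
Y_flat = (X ** 2).T               # still shape (7,)

# Two ways to make an explicit column vector of shape (7, 1)
Y_col = (X ** 2).reshape(-1, 1)   # -1 lets NumPy infer the row count
Y_col2 = (X ** 2)[:, np.newaxis]  # add a trailing axis by indexing

print(X.shape, Y_flat.shape, Y_col.shape, Y_col2.shape)
```

Using reshape(-1, 1) returns a new view and leaves the original array's shape alone, which some people find easier to follow than assigning to Y.shape in place.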
I have read sensor data into a data frame using the pandas read_fwf function.
I need to find the covariance matrix of the resulting 928991 x 8 matrix. Eventually,
I want to find the eigenvectors and eigenvalues of this covariance matrix, using the principal component analysis algorithm.
First, you need to convert the pandas DataFrame to a NumPy array by using df.values. For example:
A = df.values
It is much easier to compute either the covariance matrix or the PCA once your data is in a NumPy array. For example:
# import pandas plus the NumPy functions needed to compute the covariance matrix
import pandas as pd
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# assume you load your data using pd.read_fwf to variable *df*
# (filepath, col_widths and col_names are placeholders for your own values)
df = pd.read_fwf(filepath, widths=col_widths, names=col_names)
#put dataframe values to a numpy array
A = df.values
#check matrix A's shape, it should be (928991, 8)
print(A.shape)
# calculate the mean of each column
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
Running the example prints the shape of the data matrix, the column means, the centered matrix and its covariance matrix, then the eigenvectors and eigenvalues of the covariance matrix, and finally the projection of the centered data. Here is a link you may find useful for your PCA task.
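The steps above can also be sketched on synthetic data. This is only an illustration under stated assumptions: the random 1000 x 8 array stands in for the real sensor matrix, and np.linalg.eigh is swapped in for eig because a covariance matrix is symmetric, so eigh returns real eigenvalues in ascending order.

```python
import numpy as np

# Synthetic stand-in for the real 928991 x 8 sensor matrix (assumption)
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 8))

# Center each column by subtracting its mean
C = A - A.mean(axis=0)

# Covariance of the columns; rowvar=False treats columns as variables
V = np.cov(C, rowvar=False)            # shape (8, 8)

# eigh is meant for symmetric matrices; eigenvalues come back ascending
values, vectors = np.linalg.eigh(V)

# Reorder so the largest-variance component comes first
order = np.argsort(values)[::-1]
values = values[order]
vectors = vectors[:, order]

# Project the centered data onto the principal axes
P = C @ vectors                        # shape (1000, 8)
print(values)
```

Sorting the eigenvalues in descending order makes it easy to keep only the first few columns of P when fewer components are wanted.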
Why not just use the pd.DataFrame.cov function?
The answer to this question would be as follows:
import pandas as pd
import numpy as np
from numpy.linalg import eig
# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas)
df_sensor_data = pd.read_csv('HT_Sensor_dataset.dat', sep=r'\s+')
del df_sensor_data['id']
del df_sensor_data['time']
del df_sensor_data['Temp.']
del df_sensor_data['Humidity']
# no explicit NaN handling is needed here: DataFrame.cov excludes missing values
covariance_matrix = df_sensor_data.cov()
print(covariance_matrix)
values, vectors = eig(covariance_matrix)
print(values)
print(vectors)
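As a quick sanity check on the DataFrame.cov route, the sketch below compares it with np.cov on a small synthetic frame. The column names s1 to s3 and the random data are assumptions made purely for illustration; they are not taken from the sensor dataset.

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for the sensor columns (assumption)
rng = np.random.default_rng(1)
df_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=['s1', 's2', 's3'])

# pandas: covariance between columns, normalised by N - 1
cov_pandas = df_demo.cov()

# NumPy: rowvar=False treats each column as a variable, also N - 1 by default
cov_numpy = np.cov(df_demo.to_numpy(), rowvar=False)

# The covariance matrix is symmetric, so eigh is a suitable choice here
values, vectors = np.linalg.eigh(cov_pandas.to_numpy())
print(np.allclose(cov_pandas.to_numpy(), cov_numpy))
```

Both routes use the same N - 1 normalisation by default, so on data without missing values they should agree to floating-point precision.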
I am very new to Python and I am trying to scale an n x m matrix stored as a NumPy array (np.array).
The question: if a matrix is given as an np.array input and I don't know its size in advance, how can I find the value of m? Are there certain features or tricks in Python that can be used for this?
import numpy as np
def scaleArray(arr: np.ndarray):
    # should print the number of columns m, which is not known in advance
    pass
arrayB = np.array([[1,2,4],
                   [3,4,5],
                   [2,1,0],
                   [0,1,0]])
scaleArray(arrayB)
This arrayB is just for example.
Expected output :
3
arr.shape is what you are looking for. It gives you the dimensions of the n-dimensional array as a tuple.
In your case, you want arr.shape[1], the number of columns.
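A minimal sketch of this, using the example array from the question; the helper name count_columns is hypothetical and only there for illustration.

```python
import numpy as np

def count_columns(arr: np.ndarray) -> int:
    """Return m for an n x m array (hypothetical helper name)."""
    return arr.shape[1]

arrayB = np.array([[1, 2, 4],
                   [3, 4, 5],
                   [2, 1, 0],
                   [0, 1, 0]])

# shape is a tuple, so it can be unpacked into rows and columns
n, m = arrayB.shape
print(count_columns(arrayB))  # 3
```

Unpacking arr.shape this way only works for 2-D arrays; for other dimensions, index into the tuple instead.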
I have a random sparse matrix created as follows:
import numpy as np
from scipy.sparse import rand
foo = rand(100, 100, density=0.1, format='csr')
I would like to get the norm of the vector corresponding to a particular row:
row = foo.getrow(bar)
print(np.linalg.norm(row))
But this code produces an error:
ValueError: dimension mismatch
One approach would be to extract the non-zero data and then compute its L2 norm -
out = np.linalg.norm(row.data)
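This works because the zeros contribute nothing to the L2 norm, so only the stored entries matter. Below is a small sketch on a hand-built matrix (an assumption, chosen so the result is easy to check by hand) that also shows scipy.sparse.linalg.norm as an alternative that accepts the sparse row directly.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import norm as sparse_norm

# Small hand-built matrix instead of a random one, so the result is checkable
dense = np.array([[0.0, 3.0, 0.0, 4.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
foo = sparse.csr_matrix(dense)

row = foo.getrow(0)                        # 1 x 4 sparse matrix

# Option 1: norm of the stored (non-zero) values only
norm_from_data = np.linalg.norm(row.data)

# Option 2: SciPy's sparse-aware norm (Frobenius norm by default)
norm_sparse = sparse_norm(row)

print(norm_from_data, norm_sparse)
```

For a single row the Frobenius norm and the vector L2 norm coincide, so both options give the same value.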
I need to compute $Y_{ij} = \lVert x_i - x_j \rVert^2$ in NumPy, where $x_i$ and $x_j$ are rows of a matrix $X$. Right now I am using a loop, which is very slow. Is there a native NumPy function that allows such a computation, such as einsum?
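import numpy as np
def pairwise_sq_dists(X):
    n = X.shape[0]
    Y = np.zeros((n, n))
    for i in range(n):
        # squared differences between every row of X and row i
        x = (X - X[i])**2
        # summing over columns gives the squared distance to row i
        Y[i] = np.sum(x, axis=1)
    return Y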
BTW, I am very confused by einsum. Is there any good introductory material for it? The manual page on NumPy was not very clear to me.
Approach #1
You can use broadcasting as a vectorized approach -
import numpy as np
Y = np.sum((X - X[:,None,:])**2,2)
This should be efficient for relatively small input arrays, since it builds an intermediate array of shape (n, n, d) for an n x d input.
Approach #2
It looks like you are computing squared Euclidean distances between all pairs of rows. So, you can use distance.cdist like so -
import numpy as np
from scipy.spatial import distance
Y = distance.cdist(X, X, 'sqeuclidean')
This should be efficient with large input arrays.
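Since the question mentions einsum, here is a hedged sketch of a third route that is not part of the approaches above. It relies on the identity $\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2\, x_i \cdot x_j$, with einsum computing the row norms and the Gram matrix. In an einsum string, repeated letters are multiplied together and any letter missing from the output is summed over. The random X is an assumption for illustration.

```python
import numpy as np

# Synthetic input for illustration (assumption): 50 rows, 4 columns
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# 'ij,ij->i': multiply elementwise, sum over j -> squared norm of each row
sq_norms = np.einsum('ij,ij->i', X, X)

# 'ik,jk->ij': sum over k -> dot product of row i with row j (Gram matrix)
gram = np.einsum('ik,jk->ij', X, X)

# ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 * xi . xj
Y = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram

# Rounding can produce tiny negative values; clip them to zero in place
np.maximum(Y, 0.0, out=Y)
```

This avoids the (n, n, d) intermediate of the broadcasting approach, at the cost of slightly lower numerical accuracy when rows are very close together.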
I have the following code in Python using Numpy:
p = np.diag(1.0 / np.array(x))
How can I transform it to get the sparse matrix p2 with the same values as p without creating p first?
Use scipy.sparse.spdiags (which does a lot, and so may be confusing at first), scipy.sparse.dia_matrix and/or scipy.sparse.diags, in place of the lil_diags helper found only in old SciPy releases. Which one to pick depends on the format you want the sparse matrix in.
E.g. using spdiags:
import numpy as np
import scipy as sp
import scipy.sparse
x = np.arange(10)
# "0" here indicates the main diagonal...
# "y" will be a dia_matrix type of sparse array, by default
y = sp.sparse.spdiags(x, 0, x.size, x.size)
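Using the scipy.sparse module, pass the diagonal values together with the offset 0 (the main diagonal):
from scipy import sparse
p = sparse.dia_matrix((1.0 / np.array(x), [0]), shape=(len(x), len(x)))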
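For comparison, a minimal sketch using scipy.sparse.diags, checked against the dense np.diag result. The sample values in x are an assumption; they must be non-zero for the reciprocal to be finite.

```python
import numpy as np
from scipy import sparse

# Sample values (assumption); no zeros, so 1.0 / x is finite
x = [1.0, 2.0, 4.0]
inv = 1.0 / np.array(x)

# Build the sparse diagonal matrix directly, without a dense intermediate
p2 = sparse.diags(inv, 0)        # 0 selects the main diagonal

# Dense reference, built here only to verify the result
p = np.diag(inv)
print(np.allclose(p2.toarray(), p))
```

diags returns a DIA-format matrix by default; passing format='csr' is an option when row slicing or arithmetic with other CSR matrices is planned.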