Ensuring same dimensions in Python

The dimensions of P are (2,3,3), but the dimensions of M are (3,3). How can I ensure that both P and M have the same dimensions, i.e. (2,3,3)?
import numpy as np
P=np.array([[[128.22918457, 168.52413295, 209.72343319],
[129.01598287, 179.03716051, 150.68633749],
[131.00688309, 187.42601593, 193.68172751]],
[[ 64.11459228, 84.26206648, 104.86171659],
[ 64.50799144, 89.51858026, 75.34316875],
[ 65.50344155, 93.71300796, 96.84086375]]])
for x in range(0,2):
    M=P[x]+1
    print(M)

Just do
M = P + 1
and that ensures M and P have the same dimensions.

I'm not sure why you need this, or why you expected adding 1 to make the shapes equal. In any case, you can check that two arrays have the same shape using assert:
assert a.shape == b.shape
This raises an AssertionError when the shapes differ, so if execution continues past this line you can be sure the shapes are the same.
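As a small illustration of both answers, the sketch below uses a placeholder array with the same shape as P (the values are made up, not the data from the question). It shows that adding a scalar to the whole array keeps the shape, while indexing the first axis drops it.

```python
import numpy as np

# Placeholder with the same shape as P in the question: (2, 3, 3).
P = np.arange(18, dtype=float).reshape(2, 3, 3)

# Adding a scalar broadcasts over every element, so the shape is preserved.
M = P + 1
assert M.shape == P.shape  # passes silently: both are (2, 3, 3)

# Indexing the first axis removes it, which is why the loop gave a (3, 3) array.
slice_shape = (P[0] + 1).shape
print(M.shape, slice_shape)
```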

Related

Matrices help - index 3 is out of bounds for axis 1 with size 3 BUT I'm pretty sure I have a (3,n) matrix

I keep getting the error 'index 3 is out of bounds for axis 1 with size 3', but I'm sure that I'm using a (3,n) matrix rather than an (n,3) one. I'm not very familiar with matrices in Python, so I have been using a somewhat hacky way of getting them into the shape I want so that I can multiply or add them. Can anyone see where I've gone wrong or suggest better practice?
I'm trying to perform a rotational transform on A, generated via:
A = array(random.rand(3, 9));
where A contains one set of x, y, z coordinates in each column, e.g.:
Matrix A:
[[0.70799333 0.77123425 0.07271538 0.52498025 0.84353825 0.78331767
0.06428417 0.25629863 0.6654734 0.77562903]
[0.34179928 0.83233168 0.3920859 0.19819796 0.22486337 0.09274312
0.49057914 0.69716143 0.613912 0.04940198]
[0.98522559 0.71273242 0.70784866 0.61589377 0.34007973 0.34492078
0.44491238 0.37423906 0.37427018 0.13558728]]
The translated matrix is calculated via A_translated = ret_R.(each column of A) + ret_t, where
ret_R:
[[ 0.1928724 0.90776212 0.372516 ]
[ 0.27931303 -0.41473028 0.8660156 ]
[ 0.94062983 -0.06298194 -0.33353981]]
and
ret_t:
[[0.93445859]
[0.59949888]
[0.77385835]]
My attempt was as follows
count = 0
num_rows, num_cols = A.shape
translated_A = pd.DataFrame( zeros( (num_rows, num_cols) ) )
print('Translated A: \n', translated_A)
for i in range(0, num_cols):
    multiply = ret_R.A[:,i] # works up until (not including) i = 3
    # IndexError: index 3 is out of bounds for axis 1 with size 3
    print('Multiply: \n', multiply)
    multiply2 = np.matrix(pd.DataFrame(multiply))
    matrix = multiply2 + ret_t #works
    matrix2 = pd.DataFrame(matrix) #np.matrix(pd.DataFrame(matrix)) # not working ?
    print('Matrix:', matrix2)
    translated_A[i] = matrix2[0]
print(translated_A)
The line multiply = ret_R.A[:,i] only works up until and not including i = 3, which suggests that my A matrix is n,3 but I'm sure it's 3,n. I kept switching between matrices and data frames as this seemed to work but it doesn't work past i = 2.
I've realised that I should be using an '@' to find the dot product of the matrices properly rather than a '.', and I had to transpose multiply2 to get a matrix in the form [ [] [] [] ]. I no longer have to keep switching between a data frame and a matrix.
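A sketch of the vectorised form, assuming ret_R is 3x3, ret_t is 3x1 and A is 3xn as described; random placeholders stand in for the question's values. If ret_R was an np.matrix, then ret_R.A most likely read its .A attribute (the underlying array of ret_R itself), so ret_R.A[:, i] indexed the columns of the 3x3 ret_R rather than multiplying by A, which would explain the failure at i = 3.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 10))        # one (x, y, z) point per column
ret_R = rng.random((3, 3))     # placeholder for the rotation matrix
ret_t = rng.random((3, 1))     # translation as a column vector

# Matrix product, then the (3, 1) translation broadcasts across all columns.
translated_A = ret_R @ A + ret_t

# The same result for a single column, computed explicitly.
col0 = ret_R @ A[:, 0] + ret_t[:, 0]
print(translated_A.shape)
```

No loop or DataFrame round trip is needed, because the (3, 1) column is added to every column of the (3, n) product.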

Stacking numpy arrays with unknown dimensions

I'm looking for a way to stack numpy arrays from a source that can have a dynamic size along the zeroth axis.
stack_arrays = np.array([], dtype=np.float32)
sources = ["source_1", "source_2"]
for source in sources:
    # returns a 3D array of shape (N, W, H), where W and H are fixed but not known in advance
    new_arrays = get_arrays(source)
    stack_arrays = np.append(stack_arrays , new_arrays , axis=0)
When I try to run this code I get an error:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 3 dimension(s)
How can I set up the np array so that it accepts any kind of 2D shape and stacks the incoming arrays?
EDIT:
I managed to solve it by using reshape at the end.
stack_arrays = np.array([], dtype=np.float32)
dim_w, dim_h, rows = 0, 0, 0
sources = ["source_1", "source_2"]
for source in sources:
    # returns a 3D array of shape (N, W, H), where W and H are fixed but not known in advance
    new_arrays = get_arrays(source)
    dim_w, dim_h = new_arrays.shape[1], new_arrays.shape[2]
    rows = rows + new_arrays.shape[0]
    # without axis, np.append flattens both inputs, so the empty 1-D start array is accepted
    stack_arrays = np.append(stack_arrays, new_arrays)
# restore the 3D layout once all sources have been appended
stack_arrays = stack_arrays.reshape(rows, dim_w, dim_h)
np.concatenate([get_arrays(source) for source in sources], axis=0)
is simpler and faster.
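A minimal runnable sketch of that one-liner. The get_arrays function here is a hypothetical stand-in for the loader in the question (its sizes are invented for illustration); it only needs to return arrays of shape (N, W, H) with matching W and H.

```python
import numpy as np

def get_arrays(source):
    # Hypothetical stand-in for the question's loader: returns shape (N, W, H),
    # with N depending on the source and W, H fixed.
    n = {"source_1": 2, "source_2": 5}[source]
    return np.ones((n, 4, 3), dtype=np.float32)

sources = ["source_1", "source_2"]

# Collect the per-source arrays in a list and join them once along axis 0.
stacked = np.concatenate([get_arrays(s) for s in sources], axis=0)
print(stacked.shape)  # (7, 4, 3)
```

Joining once at the end also avoids the repeated copying that np.append performs on every iteration.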

Numpy.dot dot product function for statsmodels

I am learning the statsmodels.api module to use Python for regression analysis, so I started with the simple OLS model.
In econometrics, the model is written as y = Xb + e,
where X is NxK, b is Kx1 and e is Nx1, so y is Nx1. This is perfectly fine from a linear algebra point of view.
But I followed the Statsmodels tutorial as follows:
import numpy as np
import statsmodels.api as sm
nsample = 100 # total obs is 100
x = np.linspace(0, 10, 100) # using np.linspace(start, stop, number)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
# draw numbers from a normal distribution;
# defaults are mu = 0 and std.dev = 1, size is set by the user
e = np.random.normal(size = nsample)
# e is n x 1
# Now, we add the constant/intercept term to X
X = sm.add_constant(X)
# Now, we compute the y
y = np.dot(X, beta) + e
So this generates the correct answer. But I have a question about the generation of beta = np.array([1,0.1,10]). This beta, if we use:
beta.shape
(3,)
It has a dimension of (3,), the same goes with y and e except X:
X.shape
(100,3)
e.shape
(100,)
y.shape
(100,)
So I tried initialising the array in the following three ways:
o = array([1,2,3])
o1 = array([[1],[2],[3]])
o2 = array([[1,2,3]])
print(o.shape)
print(o1.shape)
print(o2.shape)
----------------
(3,)
(3, 1)
(1, 3)
If I use beta = array([[1],[2],[3]]), which has shape (3,1), np.dot(X, beta) gives me a wrong answer, although the dimensions seem to work.
If I use array([[1,2,3]]), which is a row vector, the dimensions don't match for the dot product, neither in numpy nor in linear algebra.
So I am wondering why, for an NxK dot Kx1 product in numpy, we have to use (N,K) dot (K,) instead of (N,K) dot (K,1). What makes only np.array([1, 0.1, 10]) work with numpy.dot(), while np.array([[1], [0.1], [10]]) doesn't?
Thank you very much.
Some update
Sorry about the confusion. The data in the Statsmodels example are randomly generated, so I fixed X and ran the following:
f = array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])
o = array([1,2,3])
o1 = array([[1],[2],[3]])
o2 = array([[1,2,3]])
print(o.shape)
print(o1.shape)
print(o2.shape)
print("---------")
print(np.dot(f,o))
print(np.dot(f,o1))
r1 = np.dot(f,o)
r2 = np.dot(f,o1)
type1 = type(np.dot(f,o))
type2 = type(np.dot(f,o1))
tf = type1 is type2
tf2 = type1 == type2
print(type1)
print(type2)
print(tf)
print(tf2)
-------------------------
(3,)
(3, 1)
(1, 3)
---------
[14 32 50 68 86]
[[14]
[32]
[50]
[68]
[86]]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
True
True
Sorry again for the confusion and inconvenience, they worked fine.
Python/numpy is not a matrix-based language the way Matlab, Octave or Scilab are. Those follow the rules of matrix multiplication strictly. So
np.dot(f,o) ---------> f*o in Matlab/Octave/Scilab
np.dot(f,o1) ---------> f*o1 does not work in Matlab/Octave/Scilab
Python/numpy has 'broadcasting', the set of rules for how operands of different shapes are combined to give a result. It's not obvious why np.dot(f,o1) should even work, but broadcasting defines some useful results. You will have to consult the docs for that.
In Python/numpy, * is not a matrix operator. You can find out what broadcasting gives for
print(f*o)
print(f*o1)
print(f*o2)
Rather recently, Python/numpy introduced the matrix operator @. You can find out what happens with
print(f@o)
print(f@o1)
print(f@o2)
Does this give you some idea?
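To make the shapes concrete, here is a small sketch using the same f, o and o1 as in the question. The zero vector e is a placeholder I introduced to mimic the (100,) error term of the tutorial. It suggests one possible reason a (K,1) beta can appear to give a "wrong answer" in y = np.dot(X, beta) + e: the product itself is fine, but adding a (N,) vector to a (N,1) column broadcasts to (N,N). This is offered as a likely explanation, not a confirmed diagnosis of the original run.

```python
import numpy as np

f = np.arange(1, 16).reshape(5, 3)   # same values as f in the question
o = np.array([1, 2, 3])              # shape (3,)
o1 = o.reshape(3, 1)                 # shape (3, 1)
e = np.zeros(5)                      # shape (5,), placeholder for the error term

r = np.dot(f, o)     # (5, 3) . (3,)   -> (5,)
r1 = f @ o1          # (5, 3) . (3, 1) -> (5, 1), same numbers as r

ok = r + e           # (5,) + (5,)   -> (5,)
wide = r1 + e        # (5, 1) + (5,) -> broadcasts to (5, 5)
print(r.shape, r1.shape, ok.shape, wide.shape)
```

If a column vector is wanted throughout, reshaping e with e.reshape(-1, 1) keeps the sum at (N, 1).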

SVD command in Python v/s MATLAB

I have m = 10, n = 5, A = randn(m,n); [U,S,V] = svd(A); This returns a full 10x5 S matrix in MATLAB, whereas Python only returns S as a 5x1 array. How do I recover the complete S matrix in Python? I have looked at several StackOverflow posts, but surprisingly none of them sheds light on this.
Also, how much does a Python IDE matter? I use Spyder but have been told that Vim is perhaps the most common.
Thanks a lot.
To recover the complete matrix, you can do the following:
import numpy as np
m = 10
n = 5
A=np.random.randn(m,n)
U,S,V =np.linalg.svd(A)
It's true that S.shape = (5,).
You want something similar to https://www.mathworks.com/help/matlab/ref/svd.html, where A is 4x2 and the final S is 4×2 too.
To do that, define a matrix B = np.zeros(A.shape) and fill its diagonal with the elements of S. By diagonal I mean the entries where i == j, as follows:
B = np.zeros(A.shape)
for i in range(m):
    for j in range(n):
        if i == j: B[i,j] = S[j]
Now B.shape = (10,5), as expected.
Or in a more compact form:
C = np.array([[S[j] if i==j else 0 for j in range(n)] for i in range(m)])
I hope it helps.
For the second question, I use gedit (a standard text editor) and run the code in an ipython shell.
You can have a look at jupyter too.
The SVD of a matrix can be written as
A = U S V^H
Where the ^H signifies the conjugate transpose. Matlab's svd command returns U, S and V, while numpy.linalg.svd returns U, the diagonal of S, and V^H. Thus, to get the same S and V as in Matlab you need to reconstruct the S and also get the V:
import numpy
m = 10
n = 5
A = numpy.random.randn(m, n)
U, sdiag, VH = numpy.linalg.svd(A)
S = numpy.zeros((m, n))
numpy.fill_diagonal(S, sdiag)
V = VH.T.conj() # if you know you have real values only you can leave out the .conj()
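A quick sanity check of the reconstruction above, as a sketch with a seeded random matrix (the seed and the use of default_rng are my own choices): multiplying the three factors back together should reproduce A up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))

U, sdiag, VH = np.linalg.svd(A)      # full_matrices=True by default
S = np.zeros((m, n))
np.fill_diagonal(S, sdiag)           # singular values on the main diagonal

# The factors should multiply back to A up to floating-point error.
reconstructed = U @ S @ VH
print(U.shape, S.shape, VH.shape, np.allclose(A, reconstructed))
```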

Why does numpy.random.dirichlet() not accept multidimensional arrays?

On the numpy page they give the example of
s = np.random.dirichlet((10, 5, 3), 20)
which is all fine and great; but what if you want to generate random samples from a 2D array of alphas?
alphas = np.random.randint(10, size=(20, 3))
If you try np.random.dirichlet(alphas), np.random.dirichlet([x for x in alphas]), or np.random.dirichlet((x for x in alphas)), it results in
ValueError: object too deep for desired array
The only thing that seems to work is:
y = np.empty(alphas.shape)
for i in range(len(alphas)):
    y[i] = np.random.dirichlet(alphas[i])
print(y)
...which is far from ideal for my code structure. Why is this the case, and can anyone think of a more "numpy-like" way of doing this?
Thanks in advance.
np.random.dirichlet is written to generate samples for a single Dirichlet distribution. That code is implemented in terms of the Gamma distribution, and that implementation can be used as the basis for a vectorized code to generate samples from different distributions. In the following, dirichlet_sample takes an array alphas with shape (n, k), where each row is an alpha vector for a Dirichlet distribution. It returns an array also with shape (n, k), each row being a sample of the corresponding distribution from alphas. When run as a script, it generates samples using dirichlet_sample and np.random.dirichlet to verify that they are generating the same samples (up to normal floating point differences).
import numpy as np
def dirichlet_sample(alphas):
    """
    Generate samples from an array of alpha distributions.
    """
    r = np.random.standard_gamma(alphas)
    return r / r.sum(-1, keepdims=True)
if __name__ == "__main__":
    alphas = 2 ** np.random.randint(0, 4, size=(6, 3))
    np.random.seed(1234)
    d1 = dirichlet_sample(alphas)
    print("dirichlet_sample:")
    print(d1)
    np.random.seed(1234)
    d2 = np.empty(alphas.shape)
    for k in range(len(alphas)):
        d2[k] = np.random.dirichlet(alphas[k])
    print("np.random.dirichlet:")
    print(d2)
    # Compare d1 and d2:
    err = np.abs(d1 - d2).max()
    print("max difference:", err)
Sample run:
dirichlet_sample:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
np.random.dirichlet:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
max difference: 5.55111512313e-17
I think you're looking for
y = np.array([np.random.dirichlet(x) for x in alphas])
for your list comprehension. Otherwise you're simply passing a python list or tuple. I imagine the reason numpy.random.dirichlet does not accept your list of alpha values is because it's not set up to - it already accepts an array, which it expects to have a dimension of k, as per the documentation.
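For reference, the gamma-based approach can also be sketched with the newer Generator interface, assuming a NumPy version that provides np.random.default_rng. The alphas here are placeholders drawn from 1 to 9, since every Dirichlet parameter must be strictly positive; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1234)

# One alpha vector per row; entries must be strictly positive.
alphas = rng.integers(1, 10, size=(20, 3))

# Gamma(alpha, 1) draws, normalised per row, are Dirichlet(alpha) distributed.
g = rng.standard_gamma(alphas)
samples = g / g.sum(axis=-1, keepdims=True)

print(samples.shape)
```

Each row of samples is one draw from the Dirichlet distribution defined by the matching row of alphas, so every row sums to 1.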
