Matrix multiplication strange error - python

This is the problem I've been struggling with for like 3 hours now ;/
In Python with NumPy I do a simple multiplication like:
matrix.T * matrix
, where matrix is my matrix,
but even though in my head everything is OK (the sizes match properly), I keep getting the error message:
operands could not be broadcast together with shapes (5,20) (20,5)
Why is that? Doesn't 20 match 20 ? What's wrong with me ;D ?
Thanks in advance

Matrix multiplication is the dot method in NumPy, or the @ operator if you're on sufficiently recent Python (3.5+) and NumPy (1.10+):
matrix.T.dot(matrix)
or
matrix.T @ matrix
or (if you have sufficiently recent NumPy but insufficiently recent Python)
np.matmul(matrix.T, matrix)
Note that NumPy has a matrix class that behaves differently, but you should never use it.
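A quick sketch with a made-up (20, 5) array standing in for the asker's matrix:

```python
import numpy as np

# Hypothetical stand-in for the asker's (20, 5) array.
matrix = np.arange(100.0).reshape(20, 5)

gram = matrix.T @ matrix        # true matrix product, shape (5, 5)
same = matrix.T.dot(matrix)     # equivalent spelling

assert gram.shape == (5, 5)
assert np.array_equal(gram, same)
```

The original matrix.T * matrix fails because * is elementwise on arrays, so (5, 20) has to broadcast against (20, 5), which it cannot.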

Your matrix variable is a misnomer. What you have is a multidimensional array.
You can simply use np.dot to multiply your arrays:
matrix.T.dot(matrix)
If you actually had matrices created with np.matrix, that multiplication would work without problems.
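A minimal sketch (with a made-up array) of that difference: on np.matrix objects, * already means matrix product, though plain arrays with @ or dot remain the recommended route:

```python
import numpy as np

# np.matrix is discouraged nowadays, but on it * means matrix multiplication.
m = np.matrix(np.arange(100.0).reshape(20, 5))
product = m.T * m

assert product.shape == (5, 5)
```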

Convert statement from Numpy to Matlab

I'm translating a Python class to Matlab. Most of it is straightforward, but I'm not so good with Python syntax (I hardly ever use it). I'm stuck on the following:
# find the basis that will be uncorrelated using the covariance matrix
basis = (sqrt(eigenvalues)[newaxis,:] * eigenvectors).transpose()
Can someone help me figure out what the equivalent Matlab syntax would be?
I've found via Google that np.newaxis increases the dimensionality of the array, and transpose is pretty self-explanatory. So for newaxis, something involving cat in MATLAB would probably do it, but I'm really not clear on how Python handles arrays, TBH.
Assuming eigenvalues is a 1D array of length N in Python, then sqrt(eigenvalues)[newaxis,:] would be a 1xN array. This is translated to MATLAB as either sqrt(eigenvalues) or sqrt(eigenvalues).', depending on the orientation of the eigenvalues array in MATLAB.
The * operation then does broadcasting (in MATLAB this is called singleton expansion). It looks like the operation multiplies each eigenvector by the square root of the corresponding eigenvalue (assuming eigenvectors are the columns).
If in MATLAB you computed the eigendecomposition like this:
[eigenvectors, eigenvalues] = eig(A);
then you’d just do:
basis = sqrt(eigenvalues) * eigenvectors.';
or
basis = (eigenvectors * sqrt(eigenvalues)).';
(note the parentheses) because eigenvalues is a diagonal matrix.
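To double-check the equivalence on the NumPy side, a small sketch with a made-up symmetric matrix (np.linalg.eigh returns the eigenvectors as columns):

```python
import numpy as np

# Made-up symmetric matrix; eigh gives eigenvectors as columns.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)

# The original line: scale each eigenvector (column) by the square root
# of its eigenvalue, then transpose so the scaled vectors become rows.
basis = (np.sqrt(eigenvalues)[np.newaxis, :] * eigenvectors).transpose()

# Same thing via an explicit diagonal matrix, mirroring the MATLAB form
# basis = (eigenvectors * sqrt(eigenvalues)).';
basis_diag = (eigenvectors @ np.diag(np.sqrt(eigenvalues))).T

assert np.allclose(basis, basis_diag)
```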

How to avoid memory error when using np.kron to generate a big matrix

I try to write a matrix consisting of kronecker-products
def kron_sparse_2(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p):
    kron = sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(a,b),c),d),e),f),g),h),i),j),k),l),m),n),o),p)
    return kron

res = 0
for i in sd:
    res = res + kron_sparse_2(i,i,I,I,I,I,I,I,I,I,I,I,I,I,I,I)
The i's in sd are 2x2 matrices.
Is there anything I can do further to calculate this without the memory problem?
The error I get is: MemoryError: Unable to allocate 16.0 GiB for an array with shape (536870912, 2, 2) and data type float64
If I understand correctly, you are trying to form the Hamiltonian for some spin problem, and you should be able to go up to 20 spins with ease. (If that is indeed the case, also try using np.roll and reduce to rewrite your methods efficiently.) Try converting all of your matrices (even the 2x2 ones) to a sparse format (say CSR or CSC) and use scipy's kron function with format specified as the sparse format you used to construct your matrices. As far as I remember, kron(format=None) uses an explicit (dense) representation of the matrices, which causes the memory problems; try format='csc' for instance.
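A minimal sketch of that suggestion, assuming a made-up 2x2 operator sx in place of the elements of sd and a chain of 8 sites to keep the demo small; functools.reduce replaces the deeply nested calls, and format='csc' keeps every intermediate sparse:

```python
from functools import reduce

import numpy as np
from scipy import sparse

I = sparse.identity(2, format='csc')
sx = sparse.csc_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))  # made-up 2x2 operator

def kron_sparse(*mats):
    # Fold kron over all factors, keeping everything in CSC throughout.
    return reduce(lambda a, b: sparse.kron(a, b, format='csc'), mats)

# Operator acting on the first two of 8 sites.
term = kron_sparse(sx, sx, I, I, I, I, I, I)

assert term.shape == (2**8, 2**8)
assert term.format == 'csc'
```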

Is there any way to solve two matrices in SymPy symbolically by comparing them?

I have two matrices T and T_given. Both of them are symbolic matrices in SymPy.
I want to solve the equation T = T_given, that is, T - T_given = 0, to get the (albeit symbolic) values of the unknown symbols.
I have tried SymPy's solve function, but it throws an error saying "Mix of Matrix and Scalar Symbols".
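One workaround, sketched with made-up 2x2 matrices and symbols a, b: flatten the matrix equation into a list of scalar equations and pass those to solve instead of the matrices themselves:

```python
import sympy as sp

a, b = sp.symbols('a b')

# Hypothetical stand-ins for T and T_given; the real matrices are larger.
T = sp.Matrix([[a + b, 2], [0, a - b]])
T_given = sp.Matrix([[3, 2], [0, 1]])

# Iterate over the elementwise difference, keeping only nontrivial equations.
equations = [e for e in (T - T_given) if e != 0]
solution = sp.solve(equations, [a, b], dict=True)

assert solution == [{a: 2, b: 1}]
```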

TypeError from hstack on sparse matrices

I have two csr sparse matrices. One contains the transform from a sklearn.feature_extraction.text.TfidfVectorizer and the other converted from a numpy array. I am trying to do a scipy.sparse.hstack on the two to increase my feature matrix but I always get the error:
TypeError: 'coo_matrix' object is not subscriptable
Below is the code:
vectorizer = TfidfVectorizer(analyzer="char", lowercase=True, ngram_range=(1, 2), strip_accents="unicode")
ngram_features = vectorizer.fit_transform(df["strings"].values.astype(str))
list_other_features = ["entropy", "string_length"]
other_features = csr_matrix(df[list_other_features].values)
joined_features = scipy.sparse.hstack((ngram_features, other_features))
Both feature matrices are scipy.sparse.csr_matrix objects and I have also tried not converting other_features, leaving it as a numpy.array, but it results in the same error.
Python package versions:
numpy == 1.13.3
pandas == 0.22.0
scipy == 1.1.0
I can not understand why it is talking about coo_matrix object in this case, especially when I have both matrices converted to csr_matrix. Looking at the scipy code I understand it will not do any conversion if the input matrices are csr_matrix objects.
In the source code of scipy.sparse.hstack, it calls bmat, where it potentially converts matrices into coo_matrix if fast path cases are not established.
Diagnosis
Looking at the scipy code I understand it will not do any conversion
if the input matrices are csr_matrix objects.
In bmat's source code, there are actually more conditions besides the two matrices being csr_matrix objects before they escape conversion into coo_matrix objects. Looking at the source code, one of the following two conditions needs to be met
# check for fast path cases
if (N == 1 and format in (None, 'csr')
        and all(isinstance(b, csr_matrix) for b in blocks.flat)):
    ...
elif (M == 1 and format in (None, 'csc')
        and all(isinstance(b, csc_matrix) for b in blocks.flat)):
    ...
before line 573, A = coo_matrix(blocks[i,j]), is called.
Suggestion
To resolve the issue, I would suggest you make one more check to see whether you meet the fast path case for either csr_matrix or csc_matrix (the two conditions listed above). Please see the whole source code of bmat to gain a better understanding. If you do not meet the conditions, your matrices will be converted into coo_matrix objects.
It's a little unclear whether this error occurs in the hstack or after when you use the result.
If it's in the hstack you need to provide a traceback so we can see what's going on.
hstack, using bmat, normally collects the coo attributes of all inputs and combines them to make a new coo matrix. So regardless of inputs (except the special cases), the result will be coo. But hstack also accepts a format parameter.
Or you can add a .tocsr(). There's no extra cost if the matrix is already csr.
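A small sketch with made-up stand-ins for the two feature matrices, showing both routes:

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for ngram_features and other_features.
ngram_features = sparse.csr_matrix(np.eye(3))
other_features = sparse.csr_matrix(np.ones((3, 2)))

# Ask hstack for CSR directly, or convert the result afterwards.
joined = sparse.hstack((ngram_features, other_features), format='csr')
joined_alt = sparse.hstack((ngram_features, other_features)).tocsr()

assert joined.shape == (3, 5)
assert joined.format == 'csr'
assert (joined != joined_alt).nnz == 0
```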

Generate a numpy array from a python function

I have what I thought would be a simple task in numpy, but I'm having trouble.
I have a function which takes an index in the array and returns the value that belongs at that index. I would like to, efficiently, write the values into a numpy array.
I have found numpy.fromfunction, but it doesn't behave remotely like the documentation suggests. It seems to "vectorise" the function, which means that instead of passing the actual indices it passes a numpy array of indices:
def vsin(i):
    return float(round(A * math.sin((2 * pi * wf) * i)))

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
# TypeError: only length-1 arrays can be converted to Python scalars
(if we use a debugger to inspect i, it is a numpy.array instance.)
So, if we try to use numpy's vectorised sin function:
def vsin(i):
    return (A * numpy.sin((2 * pi * wf) * i)).astype(numpy.int16)

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
We don't get a type error, but if len > 2**15 we get discontinuities chopping across our oscillator, because numpy is using int16_t to represent the index!
The point here isn't about sin in particular: I want to be able to write arbitrary python functions like this (whether a numpy vectorised version exists or not) and be able to run them inside a tight C loop (rather than a roundabout python one), and not have to worry about integer wraparound.
Do I really have to write my own cython extension in order to be able to do this? Doesn't numpy have support for running python functions once per item in an array, with access to the index?
It doesn't have to be a creation function: I can use numpy.empty (or indeed, reuse an existing array from somewhere else.) So a vectorised transformation function would also do.
I think the issue of integer wraparound is unrelated to numpy's vectorized sin implementation and even the use of python or C.
If you use a 2-byte signed integer and try to generate an array of integer values ranging from 0 to above 32767, you will get a wrap-around error. The array will look like:
[0, 1, 2, ... , 32767, -32768, -32767, ...]
The simplest solution, assuming memory is not too tight, is to use more bytes for your integer array generated by fromfunction so you don't have a wrap-around problem in the first place (up to a few billion):
numpy.fromfunction(vsin, (len,), dtype=numpy.int32)
numpy is optimized to work fast on arrays by passing the whole array around between vectorized functions. I think in general the numpy tools are inconvenient for trying to run scalar functions once per array element.
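A minimal sketch of the int32 fix, with made-up values for A and wf:

```python
import numpy as np

A, wf = 1000.0, 1.0 / 8000.0  # made-up amplitude and frequency factor
n = 40000                     # well past the int16 limit of 32767

def vsin(i):
    # fromfunction hands us the whole index array at once;
    # with int32 indices there is no wraparound past 32767.
    return (A * np.sin(2 * np.pi * wf * i)).astype(np.int16)

wave = np.fromfunction(vsin, (n,), dtype=np.int32)

assert wave.shape == (40000,)
assert wave.dtype == np.int16
# Sample 33000 matches a direct scalar computation at index 33000.
assert wave[33000] == np.int16(A * np.sin(2 * np.pi * wf * 33000))
```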
