Vectorize eigenvalue calculation in Numpy - python

I would like a numpy-ish way of vectorizing the calculation of eigenvalues, such that I can feed it a matrix of matrices and it will return a matrix of the respective eigenvalues.
For example, in the code below, B is the block 6x6 matrix composed of 4 copies of the 3x3 matrix A.
C is what I would like to see as output, i.e. an array of dimension (2,2,3) (because A has 3 eigenvalues).
This is of course a very simplified example; in the general case the matrices A can be of any size (though still square), and the matrix B is not necessarily formed of copies of A, but of different A1, A2, etc. (all of the same size but containing different elements).
import numpy as np
A = np.array([[0, 1, 0],
              [0, 2, 0],
              [0, 0, 3]])
B = np.bmat([[A, A], [A, A]])
C = np.array([[np.linalg.eigvals(B[0:3, 0:3]), np.linalg.eigvals(B[0:3, 3:6])],
              [np.linalg.eigvals(B[3:6, 0:3]), np.linalg.eigvals(B[3:6, 3:6])]])

Edit: if you're using a version of numpy >= 1.8.0, then np.linalg.eigvals operates over the last two dimensions of whatever array you hand it, so if you reshape your input to an (n_subarrays, nrows, ncols) array you'll only have to call eigvals once:
import numpy as np
A = np.array([[0, 1, 0],
              [0, 2, 0],
              [0, 0, 3]])
# the input needs to be an array, since matrices can only be 2D.
B = np.repeat(A[np.newaxis,...], 4, 0)
# for arbitrary input arrays you could do something like:
# B = np.vstack([a[np.newaxis,...] for a in input_arrays])
# but for this to work it will be necessary for each element in
# 'input_arrays' to have the same shape
# eigvals will operate over the last two dimensions of the array and return
# a (4, 3) array of eigenvalues
C = np.linalg.eigvals(B)
# reshape this output so that it matches your original example
C.shape = (2, 2, 3)
If your input arrays don't all have the same dimensions, e.g. input_arrays[0].shape == (2, 2), input_arrays[1].shape == (3, 3) etc. then you could only vectorize this calculation across subsets with matching dimensions.
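For example, here's a minimal sketch of that idea (not part of the original answer; grouped_eigvals is a hypothetical helper name), grouping the inputs by shape and making one batched eigvals call per group:
import numpy as np
from collections import defaultdict

def grouped_eigvals(input_arrays):
    # Group array indices by shape so each group can be batched.
    groups = defaultdict(list)
    for idx, a in enumerate(input_arrays):
        groups[a.shape].append(idx)
    out = [None] * len(input_arrays)
    for idxs in groups.values():
        batch = np.array([input_arrays[i] for i in idxs])  # (len(idxs), n, n)
        vals = np.linalg.eigvals(batch)  # one vectorized call per group
        for i, v in zip(idxs, vals):
            out[i] = v
    return out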
If you're using an older version of numpy then unfortunately I don't think there's any way to vectorize the calculation of the eigenvalues over multiple input arrays - you'll just have to loop over your inputs in Python instead.

You could just do something like this
C = np.array([[np.linalg.eigvals(B[i:i+3, j:j+3])
               for j in range(0, B.shape[1], 3)]
              for i in range(0, B.shape[0], 3)])
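If you're on numpy >= 1.8.0, you can also avoid the Python loops entirely by reshaping the block matrix into a stack of blocks and using the batched eigvals described earlier (a sketch, assuming the block size n divides both dimensions of B):
import numpy as np

B_arr = np.asarray(B)  # np.bmat returns a matrix; work with an ndarray
n = 3                  # block size
r, c = B_arr.shape[0] // n, B_arr.shape[1] // n
blocks = B_arr.reshape(r, n, c, n).swapaxes(1, 2)  # shape (2, 2, 3, 3)
C = np.linalg.eigvals(blocks)                      # shape (2, 2, 3)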
Perhaps a nicer approach is to use the block_view function from https://stackoverflow.com/a/5078155/1352250:
B_blocks = block_view(B)
C = np.array([[np.linalg.eigvals(m) for m in v] for v in B_blocks])
Update
As ali_m points out, this method is just syntactic sugar: it will not reduce the overhead incurred from calling eigvals a large number of times. While that overhead is small if each matrix it is applied to is large-ish, for the 3x3 matrices the OP is interested in it is not trivial (see the comments below; according to ali_m, there may be around a factor-of-three difference between the version I give above and the version he posted using Numpy >= 1.8.0).
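If you want to see the difference on your own machine, here is a rough timing sketch (not from the original thread; the numbers will vary):
import timeit
import numpy as np

B = np.random.rand(1000, 3, 3)  # 1000 small matrices

looped  = lambda: np.array([np.linalg.eigvals(m) for m in B])
batched = lambda: np.linalg.eigvals(B)  # needs numpy >= 1.8.0

print("looped: ", timeit.timeit(looped, number=10))
print("batched:", timeit.timeit(batched, number=10))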


Is there a way to apply a numpy function that takes two 1d arrays as arguments on each row of two 2d arrays together?

I am trying to run something like:
np.bincount(array1, weights = array2, minlength=7)
where both array1 and array2 are 2D numpy arrays of shape (m, n). My desired goal is for np.bincount() to be run once per row, pairing each row of array1 with the corresponding row of array2.
I have tried using np.apply_along_axis(), but as far as I can tell it only allows the function to be run on each row of array1, without using the corresponding rows of array2 as arguments for np.bincount. I was hoping to find a way to do this cleanly with a numpy function rather than iteration, as this is a performance-critical function, but so far I can't find another way.
For example, given these arrays:
array1 = [[1,2,3],[4,5,6]]
array2 = [[7,8,9],[10,11,12]]
I would want to compute:
[np.bincount([1,2,3], weights=[7,8,9], minlength=7), np.bincount([4,5,6], weights=[10,11,12], minlength=7)]
A simple solution is to use a list comprehension:
result = [np.bincount(v, weights=w) for v,w in zip(array1, array2)]
Because the resulting arrays can have different sizes (and they actually do in your example, since minlength is not passed here), the result cannot be a Numpy array but must be a regular list. Most Numpy functions are not able to work on a list of variable-sized arrays or even to produce one.
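That said, since the question passes minlength=7, every per-row result has the same length, so the list can be stacked back into a regular 2D array (a small sketch; note that bincount returns float64 whenever weights are given):
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])

result = np.array([np.bincount(v, weights=w, minlength=7)
                   for v, w in zip(array1, array2)])
# array([[ 0.,  7.,  8.,  9.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0., 10., 11., 12.]])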
If you have a lot of rows, you can mitigate the cost of the CPython interpreter loop using Numba's JIT (or possibly Cython in this case). Note that, for the sake of performance, the input arrays must be converted to Numpy arrays before calling the Numba function. If you know that all the arrays have the same size, you can write a more efficient implementation using Numba (by preallocating the resulting array and doing the bincount yourself).
Update
With fixed-size arrays, here is a fast implementation in Numba:
import numpy as np
import numba as nb

array1 = np.array([[1,2,3],[4,5,6]], dtype=np.int32)
array2 = np.array([[7,8,9],[10,11,12]], dtype=np.int32)

@nb.njit('i4[:,::1](i4[:,::1],i4[:,::1])')
def compute(array1, array2):
    assert array1.shape == array2.shape
    n, m = array1.shape
    res = np.zeros((n, 7), dtype=np.int32)
    for i in range(n):
        for j in range(m):
            v = array1[i, j]
            assert v >= 0 and v < 7  # can be removed if the input is safe
            res[i, v] += array2[i, j]
    return res

result = compute(array1, array2)
# result is
# array([[ 0,  7,  8,  9,  0,  0,  0],
#        [ 0,  0,  0,  0, 10, 11, 12]])
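If you'd rather stay in pure NumPy, a single bincount call can handle all rows at once by offsetting each row's values into its own range of bins. This is a sketch of a common trick, not part of the original answer (and unlike the Numba version it returns float64, since weights are used):
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])

n_rows = array1.shape[0]
nbins = 7
# Shift row i's values by i * nbins so each row lands in its own bin range.
shifted = array1 + nbins * np.arange(n_rows)[:, None]
result = np.bincount(shifted.ravel(),
                     weights=array2.ravel(),
                     minlength=n_rows * nbins).reshape(n_rows, nbins)
# array([[ 0.,  7.,  8.,  9.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0., 10., 11., 12.]])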

Slice a 3d numpy array using a 1d lookup between indices

import numpy as np
a = np.arange(12).reshape(2, 3, 2)
b = np.array([2, 0])
b maps i to j (that is, j = b[i]), where i and j are the first two indices of a, as in a[i, j, k].
Desired result after applying b to a is:
[[4 5]
 [6 7]]
Naive solution:
c = np.empty(shape=(2, 2), dtype=int)
for i in range(2):
    j = b[i]
    c[i, :] = a[i, j, :]
Question: is there a way to do this using numpy or scipy routines, or fancy indexing?
Application: reinforcement learning with finite MDPs, where b is a deterministic policy vector pi(a|s), a holds the state transition probabilities p(s'|s,a), and c is the state transition matrix for that policy, p(s'|s). The arrays will be large and this operation will be repeated many times, so it needs to be scalable and fast.
What I have tried:
Compiling with numba, but the line profiler suggests my code is slower than a similarly sized numpy routine; also, numpy is more widely understood and used.
Maintaining pi(a|s) as a sparse matrix (all zeros except one 1 per row), b_as_a_matrix, and then using einsum, but this involves storing and updating the matrix and creates more work (an extra loop over j and a sum operation).
c = np.einsum('ij,ijk->ik', b_as_a_matrix, a)
Numpy arrays can be indexed using other arrays as indices. See also: NumPy selecting specific column index per row by using a list of indexes.
With that in mind, we can vectorize your loop to simply use b for indexing:
>>> import numpy as np
>>> a = np.arange(12).reshape(2, 3, 2)
>>> b = np.array([2, 0])
>>> i = np.arange(len(b))
>>> i
array([0, 1])
>>> a[i, b, :]
array([[4, 5],
       [6, 7]])
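On numpy >= 1.15 the same lookup can be written with np.take_along_axis, which builds the companion index array for you (a sketch equivalent to the fancy indexing above):
>>> np.take_along_axis(a, b[:, None, None], axis=1)[:, 0, :]
array([[4, 5],
       [6, 7]])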

numpy's transpose method can't convert 1D row ndarray to a column one [duplicate]

Let's consider a as a 1D row/horizontal array:
import numpy as np
N = 10
a = np.arange(N) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a.shape # (10,)
now I want b to be a 1D column/vertical array, the transpose of a:
b = a.transpose() # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b.shape # (10,)
but the .transpose() method returns an identical ndarray with the exact same shape!
What I expected to see was
np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
which can be achieved by
c = a.reshape(a.shape[0], 1) # or c = a; c.shape = (c.shape[0], 1)
c.shape # (10, 1)
and to my surprise, it has a shape of (10, 1) instead of (1, 10).
In Octave/Scilab I could do:
N = 10
b = 0:(N-1)
a = b'
size(b) % ans = 1 10
size(a) % ans = 10 1
I understand that numpy ndarrays are not matrices (as discussed here), but the behavior of numpy's transpose function just doesn't make sense to me! I would appreciate it if you could help me understand how this behavior makes sense and what I am missing here.
P.S. So what I have understood so far is that b = a.transpose() is the equivalent of b = a; b.shape = b.shape[::-1] which if you had a "2D array" of (N, 1) would return a (1, N) shaped array, as you would expect from a transpose operator. However, numpy seems to treat the "1D array" of (N,) as a 0D scalar. I think they should have named this method something else, as this is very misleading/confusing IMHO.
To understand the numpy array better, you should take a look at this review paper: The NumPy array: a structure for efficient numerical computation
In short, numpy ndarrays have an attribute called strides, which is the number of bytes to skip in memory to proceed to the next element. For a (10, 10) array of bytes, for example, the strides may be (10, 1), in other words: proceed one byte to get to the next column and ten bytes to locate the next row.
For your ndarray a, a.strides = (8,), which shows that it is only one-dimensional, and that to get to the next element along this single dimension, you need to advance 8 bytes in memory (each int is 64-bit).
Strides are useful for representing transposes: by modifying strides, an array can be transposed or reshaped at zero cost (no memory needs to be copied).
So for a 2-dimensional ndarray, say b = np.ones((3, 5)), we have b.strides = (40, 8), while b.transpose().strides = (8, 40). As you can see, a transposed 2D ndarray is exactly the same array with its strides reordered. And since your 1D ndarray has only one dimension, reversing its strides tuple (i.e. taking its transpose) doesn't change anything.
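You can verify this yourself (a quick sketch; the exact byte counts assume the default 8-byte float/int dtypes):
import numpy as np

b = np.ones((3, 5))
print(b.strides)              # (40, 8)
print(b.transpose().strides)  # (8, 40): same memory, reordered strides

a = np.arange(10)
print(a.strides)              # (8,)
print(a.T.strides)            # (8,): nothing to reorder in 1D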
As you already mentioned, numpy arrays are not matrices. The definition of the transpose function is:
Permute the dimensions of an array.
This means that numpy's transpose method moves data from one dimension to another. As a 1D array has only one dimension, there is no other dimension to move the data to, so you need to add a dimension before transpose has any effect. This behavior also keeps it consistent with higher-dimensional (3D, 4D, ...) arrays.
There is a clean way to achieve what you want:
N = 10
a = np.arange(N)
a[:, np.newaxis]
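A few equivalent spellings, for reference (a short usage sketch):
import numpy as np

a = np.arange(10)
col = a[:, np.newaxis]   # shape (10, 1)
col2 = a.reshape(-1, 1)  # same (10, 1) column
row = a[np.newaxis, :]   # shape (1, 10)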

Python numpy array indexing. How is this working?

I came across this Python code (which works), and to me it seems amazing. However, I am unable to figure out what it is doing. To replicate it, I wrote some test code:
import numpy as np
# Create a random array which represents the 6 unique coefficients
# of a symmetric 3x3 matrix
x = np.random.rand(10, 10, 6)
So, I have 100 symmetric 3x3 matrices and I am only storing the unique components. Now, I want to generate the full 3x3 matrix and this is where the magic happens.
indices = np.array([[0, 1, 3],
[1, 2, 4],
[3, 4, 5]])
I see what this is doing. This is how the 0-5 index components should be arranged in the 3x3 matrix to have a symmetric matrix.
mat = x[..., indices]
This line has me lost. It operates on the last dimension of the x array, but it is not at all clear to me how the rearrangement and reshaping are done; yet this indeed returns an array of shape (10, 10, 3, 3). I am amazed and confused!
From the advanced indexing documentation - bi rico's link.
Example
Suppose x.shape is (10, 20, 30) and ind is a (2, 3, 4)-shaped indexing intp array, then result = x[..., ind, :] has shape (10, 2, 3, 4, 30) because the (20,)-shaped subspace has been replaced with a (2, 3, 4)-shaped broadcasted indexing subspace. If we let i, j, k loop over the (2, 3, 4)-shaped subspace then result[..., i, j, k, :] = x[..., ind[i, j, k], :]. This example produces the same result as x.take(ind, axis=-2).
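Applied to the question's arrays, that rule says x[..., indices] replaces the last axis (length 6) with the (3, 3) shape of indices, so mat[i, j] == x[i, j][indices] for every i, j. A quick check of the symmetry claim (a sketch):
import numpy as np

x = np.random.rand(10, 10, 6)
indices = np.array([[0, 1, 3],
                    [1, 2, 4],
                    [3, 4, 5]])
mat = x[..., indices]
print(mat.shape)  # (10, 10, 3, 3)
# Each 3x3 block is symmetric because the indices layout is symmetric.
print(np.allclose(mat, mat.transpose(0, 1, 3, 2)))  # True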

How to perform iterative 2D operation on 4D numpy array

Let me preface this post by saying that I'm pretty new to Python and NumPy, so I'm sure I'm overlooking something simple. What I'm trying to do is image processing over a PGM (grayscale) file using a mask (a mask convolution operation); however, I don't want to do it using the SciPy all-in-one imaging processing libraries that are available—I'm trying to implement the masking and processing operations myself. What I want to do is the following:
Iterate a 3x3 sliding window over a 256x256 array
At each iteration, I want to perform an operation between a 3x3 image mask (an array consisting of fractional values < 1) and the 3x3 window from my original array
The operation is that the image mask gets multiplied element-wise by the 3x3 window, and the results get summed up into one number, which represents a weighted average of the original 3x3 area
This sum should get inserted back into the center of the 3x3 window, with the original surrounding values left untouched
However, the output of one of these operations shouldn't be the input of the next operation, so a new array should be created, or the original 256x256 array shouldn't be updated until all operations have completed.
The process is sort of like this, except I need to put the result of the convolved feature back into the center of the window it came from:
[figure: animated convolution example (source: stanford.edu)]
So, in the above example, the 4 would go back into the center position of the 3x3 window it came from (after all operations had concluded), so it would look like [[1, 1, 1], [0, 4, 1], [0, 0, 1]], and so on for every other convolved feature obtained. A non-referential copy could also be made of the original, with the new values inserted into that.
So, this is what I've done so far: I have a 256x256 2D numpy array which is my source image. Using as_strided, I convert it into a 4D numpy array of 3x3 slices. The main problem I'm facing is that I want to execute the operation I've specified over each slice. I'm able to perform it on one slice, but in the np.sum operations I've tried, the results of all the slices get added up into one value. After this, I either want to create a new 256x256 array with the results, in the fashion I've described, or iterate over the original, replacing the middle value of each 3x3 window as appropriate. I've tried using ndenumerate to change just the center value (v, x, 1, 1) of my 4D array each time, but since the index returned from my 4D array has the form (v, x, y, z), I can't figure out how to iterate only over (v, x) while leaving the last two indices as constants that shouldn't change at all.
Here's my code thus far:
import numpy as np
from numpy.lib import stride_tricks
# create 256x256 NumPy 2D array from image data and image size so we can manipulate the image data, then create a 4D array of strided windows
# currently, it's only taking 10 slices to test with
imageDataArray = np.array(parsedPGMFile.imageData, dtype=int).reshape(parsedPGMFile.numRows, parsedPGMFile.numColumns)
xx = stride_tricks.as_strided(imageDataArray, shape=(1, 10, 3, 3), strides=imageDataArray.strides + imageDataArray.strides)
# create the image mask to be used
mask = [1,2,1,2,4,2,1,2,1]
mask = np.array(mask, dtype=float).reshape(3, 3)/16
# this will execute the operation on just the first 3x3 element of xx, but need to figure out how to iterate through all elements and perform this operation individually on each element
result = np.sum(mask * xx[0,0])
Research from sources like http://wiki.scipy.org/Cookbook/GameOfLifeStrides, http://www.johnvinyard.com/blog/?p=268, and http://chintaksheth.wordpress.com/2013/07/31/numpy-the-tricks-of-the-trade-part-ii/ (as well as SO) was very helpful, but they don't seem to address exactly what I'm trying to do (unless I'm missing something obvious). I could probably use a ton of for loops, but I'd rather learn how to do it using these awesome Python libraries we have. I also realize I'm combining a few questions together, but that's only because I have the sneaking suspicion that this can all be done very simply! Thanks in advance for any help!
When you need to multiply element-wise, then reduce with addition, think np.dot or np.einsum:
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.random.rand(256, 256)
mask = np.random.rand(3, 3)
arr_view = as_strided(arr, shape=(254, 254, 3, 3), strides=arr.strides*2)
arr[1:-1, 1:-1] = np.einsum('ijkl,kl->ij', arr_view, mask)
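On numpy >= 1.20 you can get the same windows without computing strides by hand, using sliding_window_view (a sketch equivalent to the as_strided line above; the view is read-only, but einsum only reads from it):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.random.rand(256, 256)
mask = np.random.rand(3, 3)

arr_view = sliding_window_view(arr, (3, 3))  # shape (254, 254, 3, 3)
arr[1:-1, 1:-1] = np.einsum('ijkl,kl->ij', arr_view, mask)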
Based on the example illustration:
In [1]: import numpy as np
In [2]: from scipy.signal import convolve2d
In [3]: image = np.array([[1,1,1,0,0],[0,1,1,1,0],[0,0,1,1,1],[0,0,1,1,0],[0,1,1,0,0]])
In [4]: m = np.array([[1,0,1],[0,1,0],[1,0,1]])
In [5]: convolve2d(image, m, mode='valid')
Out[5]:
array([[4, 3, 4],
       [2, 4, 3],
       [2, 3, 4]])
And putting it back where it came from:
In [6]: image[1:-1,1:-1] = convolve2d(image, m, mode='valid')
In [7]: image
Out[7]:
array([[1, 1, 1, 0, 0],
       [0, 4, 3, 4, 0],
       [0, 2, 4, 3, 1],
       [0, 2, 3, 4, 0],
       [0, 1, 1, 0, 0]])
