Python numpy array indexing. How is this working? - python

I came across this python code (which works) and to me it seems amazing. However, I am unable to figure out what this code is doing. To replicate it, I sort of wrote a test code:
import numpy as np
# Create a random array which represent the 6 unique coeff.
# of a symmetric 3x3 matrix
x = np.random.rand(10, 10, 6)
So, I have 100 symmetric 3x3 matrices and I am only storing the unique components. Now, I want to generate the full 3x3 matrix and this is where the magic happens.
indices = np.array([[0, 1, 3],
                    [1, 2, 4],
                    [3, 4, 5]])
I see what this is doing. This is how the 0-5 index components should be arranged in the 3x3 matrix to have a symmetric matrix.
mat = x[..., indices]
This line has me lost. So, it is working on the last dimension of the x array but it is not at all clear to me how the rearrangement and reshaping is done but this indeed returns an array of shape (10, 10, 3, 3). I am amazed and confused!

From the advanced indexing documentation - bi rico's link.
Example
Suppose x.shape is (10,20,30) and ind is a (2,3,4)-shaped indexing intp array, then result = x[...,ind,:] has shape (10,2,3,4,30) because the (20,)-shaped subspace has been replaced with a (2,3,4)-shaped broadcasted indexing subspace. If we let i, j, k loop over the (2,3,4)-shaped subspace then result[...,i,j,k,:] = x[...,ind[i,j,k],:]. This example produces the same result as x.take(ind, axis=-2).
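Applied to the code in the question, the same rule means the (6,)-shaped last axis is replaced by the (3, 3) shape of indices. A minimal sketch that checks this against an explicit loop and against take (the seeded generator and the names expected, same_as_loop, same_as_take and is_symmetric are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 10, 6))   # 6 unique coefficients per symmetric 3x3 matrix

indices = np.array([[0, 1, 3],
                    [1, 2, 4],
                    [3, 4, 5]])

# The last axis of length 6 is replaced by the (3, 3) index array.
mat = x[..., indices]
print(mat.shape)

# Equivalent explicit loop: mat[..., i, j] == x[..., indices[i, j]]
expected = np.empty((10, 10, 3, 3))
for i in range(3):
    for j in range(3):
        expected[..., i, j] = x[..., indices[i, j]]

same_as_loop = np.array_equal(mat, expected)
same_as_take = np.array_equal(mat, x.take(indices, axis=-1))
# indices is symmetric, so every 3x3 block equals its own transpose.
is_symmetric = np.array_equal(mat, mat.swapaxes(-1, -2))
```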

Related

How numpy dot works with broadcasting

I have two numpy arrays. When I used numpy dot function I got different results. I couldn't understand how dot function worked along with broadcasting to produce these outputs.
Can someone explain the difference between these two?
A = np.array([[2,4,6]])
Y = np.array([[1,0,1]])
np.dot(A, Y.T) = array([[8]])
np.dot(Y.T, A) = array([[2, 4, 6],
                        [0, 0, 0],
                        [2, 4, 6]])
The dot function is matrix multiplication, there's no broadcasting involved.
Using np.dot(A,Y.T) is the same as A@Y.T in python 3.5+.
Matrix multiplication is not commutative (the order of arguments matters).
In the first usage, A is a row vector, Y.T is a column vector. This results in a single value.
In the second example, Y.T is a column vector, while A is a row vector. This results in a matrix.
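The shapes make the difference concrete. A small sketch using the arrays from the question (the names inner and outer are just labels chosen here):

```python
import numpy as np

A = np.array([[2, 4, 6]])   # shape (1, 3): a row vector
Y = np.array([[1, 0, 1]])   # shape (1, 3): a row vector

# (1, 3) @ (3, 1) -> (1, 1): row times column, a single value in a 2-D array
inner = A @ Y.T

# (3, 1) @ (1, 3) -> (3, 3): column times row, an outer product
outer = Y.T @ A

print(inner.shape, outer.shape)
```

Because both inputs are 2-D, the first result is a 1x1 array rather than a bare scalar.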

Multiplying an array by a designated row vector of another matrix

Good afternoon all, relatively simple question here from a mechanical standpoint.
I'm currently performing PCA and have successfully written a code that computes the covariance matrix and correlation matrix, and the associated eigenspectrum.
Now, I have created an array that represents the eigenvectors row-wise, and I would like to compute the transformation C*v^T, where C is the observation matrix and v^T is the eigenvector (its element-wise entries) transposed.
Now, since some of these matrices are pretty big, I'd like to be able to tell Python which row of the eigenvector matrix to multiply C by. So far I have tried some of the numpy functions, but to no avail.
(For those of you wondering, I don't want to compute the matrix product with all the eigenvectors; I only need to multiply by a small subset of them, the ones associated with the largest eigenvalues.)
Thanks!
To "slice" a vector of row n out of 2-dimensional array A, you use a syntax like A[n]. If it's slicing columns you wanted instead, the syntax is A[:,n].
For transformations with numpy arrays and vectors, use the matrix multiplication operator:
>>> A = np.array([[0, -1], [1, 0]])
>>> vs = np.array([[1, 2], [3, 4]])
>>> A @ vs[0]  # this is a rotation of the first row of vs by A
array([-2,  1])
>>> A @ vs[1]  # this is a rotation of the second row of vs by A
array([-4,  3])
Note: If you're on an older Python version (< 3.5), you might not have @ available yet. Then you'll have to use the function np.dot(array, vector) instead of the operator.
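For the PCA case in the question, the same row selection can pick several eigenvectors at once. This is only a sketch under assumptions: the observation matrix C is random stand-in data, and V, order, top, projected and single are hypothetical names; the eigenvectors are stored row-wise as the question describes.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((100, 5))              # stand-in observation matrix: 100 samples, 5 features
C_centered = C - C.mean(axis=0)

cov = np.cov(C_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvectors as columns

V = eigvecs.T                         # eigenvectors stored row-wise, as in the question
order = np.argsort(eigvals)[::-1]     # indices sorted by decreasing eigenvalue
top = order[:2]                       # keep the two largest

# Project onto the selected rows only: (100, 5) @ (5, 2) -> (100, 2)
projected = C_centered @ V[top].T

# Projecting onto a single eigenvector gives a 1-D result of length 100
single = C_centered @ V[top[0]]
```

Indexing with an integer array (V[top]) selects the chosen rows without multiplying by the full eigenvector matrix.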

How to perform iterative 2D operation on 4D numpy array

Let me preface this post by saying that I'm pretty new to Python and NumPy, so I'm sure I'm overlooking something simple. What I'm trying to do is image processing over a PGM (grayscale) file using a mask (a mask convolution operation); however, I don't want to do it using the SciPy all-in-one imaging processing libraries that are available—I'm trying to implement the masking and processing operations myself. What I want to do is the following:
Iterate a 3x3 sliding window over a 256x256 array
At each iteration, I want to perform an operation with a 3x3 image mask (array that consists of fractional values < 1 ) and the 3x3 window from my original array
The operation is that the image mask gets multiplied by the 3x3 window, and that the results get summed up into one number, which represents a weighted average of the original 3x3 area
This sum should get inserted back into the center of the 3x3 window, with the original surrounding values left untouched
However, the output of one of these operations shouldn't be the input of the next operation, so a new array should be created or the original 256x256 array shouldn't be updated until all operations have completed.
The process is sort of like this, except I need to put the result of the convolved feature back into the center of the window it came from:
(source: stanford.edu)
So, in this above example, the 4 would go back into the center position of the 3x3 window it came from (after all operations had concluded), so it would look like [[1, 1, 1], [0, 4, 1], [0, 0, 1]] and so on for every other convolved feature obtained. A non-referential copy could also be made of the original and this new value inserted into that.
So, this is what I've done so far: I have a 256x256 2D numpy array which is my source image. Using as_strided, I convert it into a 4D numpy array of 3x3 slices. The main problem I'm facing is that I want to execute the operation I've specified over each slice. I'm able to perform it on one slice, but in the np.sum operations I've tried, it adds up all the slices' results into one value. After this, I either want to create a new 256x256 array with the results, in the fashion that I've described, or iterate over the original, replacing the middle values of each 3x3 window as appropriate. I've tried using ndenumerate to change just the same value (v, x, 1, 1) of my 4D array each time, but since the index returned from my 4D array is of the form (v, x, y, z), I can't seem to figure out how to only iterate through (v, x) and leave the last two parts as constants that shouldn't change at all.
Here's my code thus far:
import numpy as np
from numpy.lib import stride_tricks
# create 256x256 NumPy 2D array from image data and image size so we can manipulate the image data, then create a 4D array of strided windows
# currently, it's only taking 10 slices to test with
imageDataArray = np.array(parsedPGMFile.imageData, dtype=int).reshape(parsedPGMFile.numRows, parsedPGMFile.numColumns)
xx = stride_tricks.as_strided(imageDataArray, shape=(1, 10, 3, 3), strides=imageDataArray.strides + imageDataArray.strides)
# create the image mask to be used
mask = [1,2,1,2,4,2,1,2,1]
mask = np.array(mask, dtype=float).reshape(3, 3)/16
# this will execute the operation on just the first 3x3 element of xx, but need to figure out how to iterate through all elements and perform this operation individually on each element
result = np.sum(mask * xx[0,0])
Research from sources like http://wiki.scipy.org/Cookbook/GameOfLifeStrides, http://www.johnvinyard.com/blog/?p=268, and http://chintaksheth.wordpress.com/2013/07/31/numpy-the-tricks-of-the-trade-part-ii/ were very helpful (as well as SO), but they don't seem to address what I'm trying to do exactly (unless I'm missing something obvious). I could probably use a ton of for loops, but I'd rather learn how to do it using these awesome Python libraries we have. I also realize I'm combining a few questions together, but that's only because I have the sneaking suspicion that this can all be done very simply! Thanks in advance for any help!
When you need to multiply element-wise, then reduce with addition, think np.dot or np.einsum:
from numpy.lib.stride_tricks import as_strided
arr = np.random.rand(256, 256)
mask = np.random.rand(3, 3)
arr_view = as_strided(arr, shape=(254, 254, 3, 3), strides=arr.strides*2)
arr[1:-1, 1:-1] = np.einsum('ijkl,kl->ij', arr_view, mask)
Based on the example illustration:
In [1]: import numpy as np
In [2]: from scipy.signal import convolve2d
In [3]: image = np.array([[1,1,1,0,0],[0,1,1,1,0],[0,0,1,1,1],[0,0,1,1,0],[0,1,1,0,0]])
In [4]: m = np.array([[1,0,1],[0,1,0],[1,0,1]])
In [5]: convolve2d(image, m, mode='valid')
Out[5]:
array([[4, 3, 4],
       [2, 4, 3],
       [2, 3, 4]])
And putting it back where it came from:
In [6]: image[1:-1,1:-1] = convolve2d(image, m, mode='valid')
In [7]: image
Out[7]:
array([[1, 1, 1, 0, 0],
       [0, 4, 3, 4, 0],
       [0, 2, 4, 3, 1],
       [0, 2, 3, 4, 0],
       [0, 1, 1, 0, 0]])
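If the original image has to stay untouched, the results can be written into a copy instead. A hedged sketch, assuming NumPy >= 1.20 for sliding_window_view (a swapped-in alternative to the hand-written as_strided call) and using random data in place of the parsed PGM file; windows, weighted and result are names chosen here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # available in NumPy >= 1.20

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256)).astype(float)  # stand-in for the PGM data

mask = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]], dtype=float) / 16

# Read-only view of every 3x3 window: shape (254, 254, 3, 3)
windows = sliding_window_view(image, (3, 3))

# Multiply each window by the mask element-wise and sum over the window axes
weighted = np.einsum('ijkl,kl->ij', windows, mask)

# Write the weighted averages into the window centres of a copy;
# the one-pixel border keeps its original values.
result = image.copy()
result[1:-1, 1:-1] = weighted
```

Since every window is read from image and written to result, no output value feeds into a later window.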

Vectorize eigenvalue calculation in Numpy

I would like a numpy-sh way of vectorizing the calculation of eigenvalues, such that I can feed it a matrix of matrices and it would return a matrix of the respective eigenvalues.
For example, in the code below, B is the block 6x6 matrix composed of 4 copies of the 3x3 matrix A.
C is what I would like to see as output, i.e. an array of dimension (2,2,3) (because A has 3 eigenvalues).
This is of course a very simplified example, in the general case the matrices A can have any size (although they are still square), and the matrix B is not necessarily formed of copies of A, but different A1, A2, etc (all of same size but containing different elements).
import numpy as np
A = np.array([[0, 1, 0],
              [0, 2, 0],
              [0, 0, 3]])
B = np.bmat([[A, A], [A, A]])
C = np.array([[np.linalg.eigvals(B[0:3, 0:3]), np.linalg.eigvals(B[0:3, 3:6])],
              [np.linalg.eigvals(B[3:6, 0:3]), np.linalg.eigvals(B[3:6, 3:6])]])
Edit: if you're using a version of numpy >= 1.8.0, then np.linalg.eigvals operates over the last two dimensions of whatever array you hand it, so if you reshape your input to an (n_subarrays, nrows, ncols) array you'll only have to call eigvals once:
import numpy as np
A = np.array([[0, 1, 0],
              [0, 2, 0],
              [0, 0, 3]])
# the input needs to be an array, since matrices can only be 2D.
B = np.repeat(A[np.newaxis,...], 4, 0)
# for arbitrary input arrays you could do something like:
# B = np.vstack([a[np.newaxis,...] for a in input_arrays])
# but for this to work it will be necessary for each element in
# 'input_arrays' to have the same shape
# eigvals will operate over the last two dimensions of the array and return
# a (4, 3) array of eigenvalues
C = np.linalg.eigvals(B)
# reshape this output so that it matches your original example
C.shape = (2, 2, 3)
If your input arrays don't all have the same dimensions, e.g. input_arrays[0].shape == (2, 2), input_arrays[1].shape == (3, 3) etc. then you could only vectorize this calculation across subsets with matching dimensions.
If you're using an older version of numpy then unfortunately I don't think there's any way to vectorize the calculation of the eigenvalues over multiple input arrays - you'll just have to loop over your inputs in Python instead.
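When the input really is one large block matrix, as in the question, it can be rearranged into a stack of blocks before the single eigvals call. A minimal sketch assuming NumPy >= 1.8, a plain ndarray built with np.block rather than np.bmat, and square blocks of known size n (the names n, rows, cols and blocks are illustrative):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [0, 2, 0],
              [0, 0, 3]], dtype=float)

# Block matrix with four different 3x3 blocks, as a plain 2-D ndarray
B = np.block([[A, 2 * A],
              [3 * A, 4 * A]])

n = 3                                        # size of each square block
rows, cols = B.shape[0] // n, B.shape[1] // n

# (6, 6) -> (2, 3, 2, 3) -> (2, 2, 3, 3): blocks[r, c] is the block at block-row r, block-column c
blocks = B.reshape(rows, n, cols, n).swapaxes(1, 2)

# eigvals works on the last two axes, giving a (2, 2, 3) array of eigenvalues
C = np.linalg.eigvals(blocks)
print(C.shape)
```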
You could just do something like this
C = np.array([[np.linalg.eigvals(B[i:i+3, j:j+3])
               for j in range(0, B.shape[1], 3)]
              for i in range(0, B.shape[0], 3)])
Perhaps a nicer approach is to use the block_view function from https://stackoverflow.com/a/5078155/1352250:
B_blocks = block_view(B)
C = np.array([[np.linalg.eigvals(m) for m in v] for v in B_blocks])
Update
As ali_m points out, this method is a form of syntactic sugar that will not reduce the overhead incurred from calling eigvals a large number of times. While this overhead should be small if each matrix it is applied to is large-ish, for the 6x6 matrices that the OP is interested in, it is not trivial (see the comments below; according to ali_m, there might be a factor of three difference between the version I give above, and the version he posted that uses Numpy >= 1.8.0).

Convert 1D array into numpy matrix

I have a simple, one dimensional Python array with random numbers. What I want to do is convert it into a numpy Matrix of a specific shape. My current attempt looks like this:
randomWeights = []
for i in range(80):
    randomWeights.append(random.uniform(-1, 1))
W = np.mat(randomWeights)
W.reshape(8,10)
Unfortunately it always creates a matrix of the form:
[[random1, random2, random3, ...]]
So only the first element of one dimension gets used and the reshape command has no effect. Is there a way to convert the 1D array to a matrix so that the first x items will be row 1 of the matrix, the next x items will be row 2 and so on?
Basically this would be the intended shape:
[[1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, ... , 16],
[..., 800]]
I suppose I can always build a new matrix in the desired form manually by parsing through the input array. But I'd like to know if there is a simpler, more elegant solution with built-in functions I'm not seeing. If I have to build those matrices manually I'll have a ton of extra work in other areas of the code since all my source data comes in simple 1D arrays but will be computed as matrices.
reshape() doesn't reshape in place; you need to assign the result:
>>> W = W.reshape(8,10)
>>> W.shape
(8, 10)
Alternatively, you can use W.resize(8, 10) (see ndarray.resize()), which changes the shape in place.
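A short sketch of the assignment fix, using a plain ndarray instead of np.mat (a substitution made here, not something the question uses) and a seeded generator so the run is repeatable; random_weights and flat are illustrative names:

```python
import random

import numpy as np

random.seed(0)
random_weights = [random.uniform(-1, 1) for _ in range(80)]

flat = np.array(random_weights)   # shape (80,)

# reshape returns a new array object; the original keeps its shape
W = flat.reshape(8, 10)

print(flat.shape, W.shape)
```

The first 10 values form row 0, the next 10 form row 1, and so on, because reshape fills rows in order by default.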
