Python: Get median in 3-dimensional numpy array - python

I have a 3-dimensional numpy array, where the first two dimensions form a grid, and the third dimension (let's call it cell) is a vector of attributes. Here is an example for array x (a 2x3 grid with 4 attributes in each cell):
[[[1 2 3 4] [5 6 7 8] [9 8 7 6]]
 [[9 8 7 6] [5 4 3 2] [1 2 3 4]]]
for which I want to get the median of the 8 neighbors of each cell in array x, i.e. for x[i,j,:] the median over all cells whose indices are combinations of i-1, i, i+1 and j-1, j, j+1, excluding (i,j) itself. It is clear how to do that in general, but at the borders the index goes out of range (e.g. if i=0, a general solution that includes x[i-1,j,:] in the calculation wouldn't work).
Now the simple solution (simple in the sense of not thought through) would be to treat the 4 corners (e.g. where i=j=0), the borders (e.g. where i=0 and j!=0), and the default case of cells in the middle separately with if statements, but I would hope there is a more elegant solution to this problem. I thought of extending the n*m grid to an (n+2)*(m+2) grid and filling the border cells on all sides with 0 values, but that would distort the median computation.
I hope I was able to kind of clarify the problem. Thanks in advance for any suggestions for a more elegant way to solve this.
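One way to avoid special-casing the borders, building on the padding idea from the question: pad with NaN instead of 0 and use np.nanmedian, which simply ignores the padded values. A sketch (assuming a per-attribute median over the up-to-8 neighbors is what's wanted):

```python
import numpy as np

x = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 7, 6]],
              [[9, 8, 7, 6], [5, 4, 3, 2], [1, 2, 3, 4]]], dtype=float)

# Pad the grid with NaN on all sides; np.nanmedian ignores NaN,
# so border cells simply contribute fewer neighbors.
padded = np.pad(x, ((1, 1), (1, 1), (0, 0)),
                mode='constant', constant_values=np.nan)

n, m, k = x.shape
result = np.empty_like(x)
for i in range(n):
    for j in range(m):
        # 3x3 window around (i, j) in padded coordinates
        window = padded[i:i+3, j:j+3, :].reshape(9, k)
        # Drop the center cell (index 4 of the flattened 3x3 window)
        neighbors = np.delete(window, 4, axis=0)
        result[i, j] = np.nanmedian(neighbors, axis=0)

print(result[0, 0])   # [5. 6. 7. 6.]
```

If instead a single median over all neighbor values is wanted, np.nanmedian over the flattened neighbors (no axis argument) works the same way.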

How to solve/what is the best way to approach it?

I have some confusion about this code: it doesn't print anything, yet there is no error message. For context, I want to change some components of the main random matrix (psi) into a new matrix (psiy), and I want to check whether my code handles the component indices correctly (hence the print(init,sy)), but nothing comes out and there is no error message either. Does anyone have any idea? Thank you very much in advance.
The full problem is here:
I have a 3D matrix (nx X ny X nz), with nx being the number of components along the x-axis (say, horizontal), ny being the number of components along the y-axis (say, vertical), and nz being the number of components along the z-axis (say, out of the plane). So the total number of components is A. We can also see it as a 2D matrix with many layers (nz layers in total). The index of each component runs from the top left of the first layer to the bottom right of the last layer.
So for a 4x4x4 matrix, we will have indexes 0 to 63, with 0 to 15 in the first layer, 16 to 31 in the second layer, and so on. And for the first layer, since nx and ny are 4, there are a total of 4 indexes in each row and column (0 to 3 for the first row, 4 to 7 for the second row, 8 to 11 for the third row, and 12 to 15 for the fourth row); the other layers follow the same order.
I want to change my initial random matrix psi to a new matrix psiy with these conditions:
The init-th components of the new matrix psiy will be the init-th components of the old matrix psi times 3
The sy-th components of the new matrix psiy will be the sy-th components of the old matrix psi
Now, how do we know which components are init and which are sy? Going back to the description of my 3D matrix's components: if we group them by column, then for the first layer (indexes 0 to 15) there are 4 columns since ny is 4, and what I mean by init is the top index of each column, in this case indexes 0, 1, 2, and 3. So for the second layer init would be indexes 16, 17, 18, and 19, and so on for the rest of the layers.
As for sy, the definition is all the indexes in the even-numbered rows, except for the last row. So for the case where nx, ny, and nz are all 4, the sy indexes would be:
4 to 7 for the first layer (since there are only 4 rows (nx), and the only even-numbered row, not counting the last row, is the 2nd row), and
20 to 23 for the second layer, 36 to 39 for the third layer, and 52 to 55 for the last layer.
So if ny is, for example, 8, then sy would be all the indexes in the 2nd, 4th, and 6th rows of each layer, and so on for any (even) value of ny.
Thank you.
import numpy as np

nx = 4
ny = 4
nz = 4
A = nx * ny * nz      # total number of components
H = (ny - 2) // 2     # number of even-numbered rows, excluding the last row

psiy = np.zeros(A)
psi = np.random.randint(1, 10, A)   # psi needs A elements, not nx,
                                    # otherwise psi[sy] goes out of bounds

for i in range(0, nz):
    # note: range(1, H) is empty when H == 1, which is why nothing was
    # printed; range(1, H + 1) makes m take the values 1..H
    for m in range(1, H + 1):
        for k in range(0, nx):
            init = i * nx * ny + k          # top index of the k-th column
            sy = init + (2 * m - 1) * nx    # index in the (2m)-th row
            psiy[init] = 3 * psi[init]
            psiy[sy] = 1 * psi[sy]
            print(init, sy)
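For checking the index logic, the init and sy indexes can also be built without loops; a sketch under the definitions above (the helper names layer and rows are mine):

```python
import numpy as np

nx, ny, nz = 4, 4, 4
layer = nx * ny   # number of components per layer

# init: the top index of each column in every layer
init = (np.arange(nz)[:, None] * layer + np.arange(nx)).ravel()

# sy: all indexes in the even-numbered rows, excluding the last row
# (0-based row indices 1, 3, ... strictly below ny - 1)
rows = np.arange(1, ny - 1, 2)
sy = (np.arange(nz)[:, None, None] * layer
      + rows[None, :, None] * nx
      + np.arange(nx)).ravel()

print(init)   # [ 0  1  2  3 16 17 18 19 32 33 34 35 48 49 50 51]
print(sy)     # [ 4  5  6  7 20 21 22 23 36 37 38 39 52 53 54 55]
```

With these index arrays, the update reduces to `psiy[init] = 3 * psi[init]` and `psiy[sy] = psi[sy]` with no loops at all.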

Python - Find closest indices from 2 sets

I have 2 sets of indices (i,j).
What I need to get is the 2 indices that are closest from the 2 sets.
It is easier to explain graphically:
Assuming I have all the indices that make up the first black shape, and all the indices that make up the second black shape, how do I find the closest pair of indices (the red points in the figure) between those 2 shapes, in an efficient way (with a built-in function in Python, not by iterating through all the possibilities)?
Any help will be appreciated!
As you asked about a built-in function rather than looping through all combinations, there's a method in scipy.spatial.distance that does just that - it outputs a matrix of distances between all pairs of points from the 2 inputs. If A and B are collections of 2D points, then:
from scipy.spatial import distance
dists = distance.cdist(A,B)
Then you can get the index of the minimal value in the matrix.
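A sketch of that last step, with made-up point sets standing in for the two shapes: np.unravel_index turns the flat argmin of the distance matrix back into a (row, column) pair.

```python
import numpy as np
from scipy.spatial import distance

# Two made-up sets of 2D points standing in for the two shapes
A = np.array([[0, 0], [1, 0], [0, 1]])
B = np.array([[5, 5], [2, 0], [4, 4]])

dists = distance.cdist(A, B)   # shape (len(A), len(B))

# Flat index of the smallest distance, converted back to (i, j)
i, j = np.unravel_index(dists.argmin(), dists.shape)

# A[i] and B[j] are the closest pair of points
print(A[i], B[j], dists[i, j])   # [1 0] [2 0] 1.0
```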

How to broadcast correctly subtracting 2 different matrices in Numpy

I am trying to subtract two matrices of different shapes using broadcasting, but I am stuck. I need a simple solution to the problem.
Literally I am evaluating data on a grid (first step is subtracting). For example I have 5 grid points grid = (-20,-10, 0, 10, 20) and array of data of length 100.
Line:
u = grid.reshape((ngrid, 1)) - data
works perfectly fine. ngrid = 5 in this trivial example.
Output is matrix of 5 rows and 100 columns, so each point of data is evaluated on each point of grid.
Next I want to do it for 2 grids and 2 data sets simultaneously (data is of size 2x100, e.g. 2 randn arrays). I have already succeeded in subtracting two data sets from one grid, but using two grids throws an error.
In the example below a is vertical array of the grid, length 5 points and data is array of random data of the shape (100,2).
In this case u is an array of shape (2,5,100), so u[0] and u[1] each have 5 rows and 100 columns, meaning that the data was subtracted correctly from the grid.
Second line of the code is what I am trying to do. The error is following:
ValueError: operands could not be broadcast together with shapes (5,2) (2,1,100)
u = a - data.T[:, None] # a is vertical grid of 5 elements. Works ok.
u = grid_test - data.T[:, None] # grid_test is 2 column 5 row matrix of 2 grids. Error.
What I need is the same kind of line of code as above, but it should work if "a" contains 2 columns, i.e. two different grids. So in the end the expected result "u" contains, in addition to the results described above, another two matrices where the same data (both arrays) is evaluated on the second grid.
Unfortunately I cannot use any loops - only vectorization and broadcasting.
Thanks in advance.
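One broadcasting arrangement that produces all four grid/data-set combinations (my own sketch, assuming grid_test has shape (5, 2) and data has shape (100, 2)) is to put the grid index and the data-set index on separate leading axes:

```python
import numpy as np

ngrid = 5
# Two grids stored as columns, standing in for grid_test
grid_test = np.stack([np.linspace(-20, 20, ngrid),
                      np.linspace(-40, 40, ngrid)], axis=1)   # shape (5, 2)
data = np.random.randn(100, 2)                                # shape (100, 2)

# Axes of the result: (grid index, data-set index, grid point, data point)
# (2,1,5,1) - (1,2,1,100) broadcasts to (2,2,5,100)
u = grid_test.T[:, None, :, None] - data.T[None, :, None, :]
print(u.shape)   # (2, 2, 5, 100)
```

Here `u[g, d]` is data set d evaluated on grid g, so the two extra (5, 100) matrices for the second grid come out of the same single expression.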

Cosine similarity between two ndarrays

I have two numpy arrays; the first array is of size 100*4*200, and the second array is of size 150*6*200. In fact, I am storing 100 samples of 200-dimensional vector representations of 4 fields in array 1 and 150 samples of 200-dimensional vectors of 6 fields in array 2.
Now I want to compute the similarity vector between the samples and create a similarity matrix. For each sample, I would like to calculate the similarity between each combination of fields and store it such that I get a 15000*24 dimensional array.
The first 150 rows will be the similarity vectors between the 1st row of array 1 and the 150 rows of array 2, the next 150 rows will be the similarity vectors between the 2nd row of array 1 and the 150 rows of array 2, etc.
Each similarity vector has length (# fields in array 1) * (# fields in array 2), i.e. the 1st element of the similarity vector is the cosine similarity between field 1 of array 1 and field 1 of array 2, the 2nd element is the similarity between field 1 of array 1 and field 2 of array 2, and so on, with the last element being the similarity between the last field of array 1 and the last field of array 2.
What is the best way to do this using numpy arrays ?
So every "row" (I assume the first axis, which I'll call axis 0) is the sample axis. That means you have 100 samples in the first array, each with fields × dimensions of 4x200.
Doing this the way you describe, the first row of the first array would have shape (4,200) while the second array still has shape (150,6,200). You'd then want a cosine distance between an (m,n) array and an (m,n,k) array, which does not make sense (the closest thing to a dot product here would be the tensor product, which I'm fairly sure is not what you want).
So we have to extract the rows first and then iterate over all combinations.
To do this I actually recommend just splitting the arrays with np.split and iterating over both of them. This is just because I've never come across a faster way in numpy. You could use tensorflow to gain efficiency, but I'm not going into that here in my answer.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.random.rand(100, 4, 200)
b = np.random.rand(150, 6, 200)

# We know the output will be (100*150) x (4*6) = 15000 x 24
c = np.empty([15000, 24])

# Make a list with the rows of a, and the same for b
a_splitted = np.split(a, a.shape[0], 0)
b_splitted = np.split(b, b.shape[0], 0)

i = 0
for alpha in a_splitted:
    for beta in b_splitted:
        # Gives a 4x6 matrix
        sim = cosine_similarity(alpha[0], beta[0])
        c[i, :] = sim.ravel()
        i += 1
For the similarity function above I just chose what @StefanFalk suggested: sklearn.metrics.pairwise.cosine_similarity. If this similarity measure is not sufficient, you could write your own.
I am not at all claiming that this is the best way to do this in all of Python. I think the most efficient way is to do it symbolically using, as mentioned, tensorflow.
Anyway, hope it helps!
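As a possible alternative to the loop (my own sketch, not part of the original answer): since cosine similarity is just a dot product of unit vectors, the whole computation can be vectorized with np.einsum.

```python
import numpy as np

a = np.random.rand(100, 4, 200)
b = np.random.rand(150, 6, 200)

# Normalize each 200-dimensional vector to unit length
an = a / np.linalg.norm(a, axis=2, keepdims=True)
bn = b / np.linalg.norm(b, axis=2, keepdims=True)

# sims[i, j, f, g] = cosine similarity between field f of sample i (array a)
# and field g of sample j (array b)
sims = np.einsum('ifd,jgd->ijfg', an, bn)

# Flatten to the desired (100*150) x (4*6) layout
c = sims.reshape(100 * 150, 4 * 6)
print(c.shape)   # (15000, 24)
```

The row ordering matches the loop version: row i*150 + j of c holds the raveled 4x6 similarity matrix between sample i of a and sample j of b.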

Axis elimination

I'm having trouble understanding the concept of axis elimination in numpy. Suppose I have the following 2D matrix:
A =
1 2 3
3 4 5
6 7 8
Ok, I understand that sum(A, axis=0) will sum each column down and give a 1D array with 3 elements. I also understand that sum(A, axis=1) will sum each row.
But my trouble is when I read that axis=0 eliminates the 0th axis and axis=1 eliminates the 1st axis. Also, people sometimes say "reduce" instead of "eliminate". I'm unable to understand what gets eliminated. For example, sum(A, axis=0) will sum each column from top to bottom, but I don't see any elimination or reduction there. What's the point? The same goes for sum(A, axis=1).
AND how does it work for higher dimensions?
p.s. I always get confused between matrix dimensions and array dimensions. I wish the people who write the numpy documentation would make this distinction very clear.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduce.html
Reduces a's dimension by one, by applying ufunc along one axis.
For example, add.reduce() is equivalent to sum().
In numpy, the base class is ndarray - a multidimensional array (it can be 0d, 1d, or more):
http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
Matrix is a subclass of array
http://docs.scipy.org/doc/numpy/reference/arrays.classes.html
Matrix objects are always two-dimensional
The history of the numpy Matrix is old, but basically it's meant to resemble the MATLAB matrix object. In the original MATLAB nearly everything was a matrix, which was always 2d. Later they generalized it to allow more dimensions. But it can't have fewer dimensions. MATLAB does have 'vectors', but they are just matrices with one dimension being 1 (row vector versus column vector).
'axis elimination' is not a common term when working with numpy. It could, conceivably, refer to any of several ways that reduce the number of dimensions of an array. Reduction, as in sum(), is one. Indexing is another: a[:,0,:]. Reshaping can also change the number of dimensions. np.squeeze is another.
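A quick sketch of what "the axis disappears" means for the shapes:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 4, 5],
              [6, 7, 8]])

print(A.shape)              # (3, 3)
print(A.sum(axis=0))        # [10 13 16] -- columns summed; axis 0 is gone
print(A.sum(axis=0).shape)  # (3,)
print(A.sum(axis=1))        # [ 6 12 21] -- rows summed; axis 1 is gone
print(A.sum(axis=1).shape)  # (3,)

# Higher dimensions work the same way: the reduced axis simply
# disappears from the shape tuple
B = np.zeros((2, 3, 4))
print(B.sum(axis=1).shape)  # (2, 4)
```

So "eliminate" refers to the shape: summing along an axis collapses that axis to a single value, leaving an array with one fewer dimension.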
