Examples of N-D array usage - python

I was surprised when I started learning numpy that there are N-dimensional arrays. I'm a programmer, and I thought nobody ever uses more than a 2D array. Actually, I can't even think beyond a 2D array. I don't know how to think about 3D, 4D, 5D arrays or more, or where I would use them.
Can you please give me examples of where 3D, 4D, 5D, ... arrays are used? And what would happen if one used numpy.sum(array, axis=5) on a 5D array?

A few simple examples are:
An n x m 2D array of p-vectors, represented as an n x m x p 3D array, as might result from computing the gradient of an image
A 3D grid of values, such as a volumetric texture
These can even be combined: in the case of the gradient of a volume, you get a 4D array
Staying with the graphics paradigm, adding time adds an extra dimension, so a time-variant 3D gradient texture would be 5D
numpy.sum(array, axis=5) is not valid for a 5D array, because axes are numbered starting at 0, so the last axis of a 5D array is axis=4
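A quick sketch of the first two points and of the axis error (all shapes here are just illustrative):

import numpy as np

image = np.random.rand(480, 640)              # an n x m greyscale image
grad = np.stack(np.gradient(image), axis=-1)  # n x m x 2: one 2-vector per pixel
volume = np.random.rand(64, 64, 64)           # 3D grid of values (volumetric texture)

a = np.random.rand(2, 3, 4, 5, 6)             # a 5D array
a.sum(axis=4)                                  # fine: valid axes are 0..4
a.sum(axis=5)                                  # raises an AxisError (axis out of bounds)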

Practical applications are hard to come up with, but I can give you a simple example for 3D.
Imagine taking a 3D world (a game or simulation, for example) and splitting it into equally sized cubes. Each cube could contain a specific value of some kind (a good example is temperature for climate modelling). The array can then be used for further operations (simple ones like transposing axes, summing along an axis, and so on).
I recently had an assignment which involved modelling fluid dynamics in a 2D space. I could easily have extended it to work in 3D, and this would have required me to use a 3D array instead.
You may also wish to extend the array to cater for time, which would make it 4D. In the end, it really boils down to the specific problem you are dealing with.
As an end note, however, 2D matrices are still used for 3D graphics (you use a 4x4 homogeneous transformation matrix).
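A rough sketch of the temperature-grid idea, with made-up dimensions (adding a leading time axis turns the 3D grid into a 4D array):

import numpy as np

# a 3D world split into equally sized cubes, one temperature value per cube
temperature = np.random.uniform(-10.0, 35.0, size=(50, 50, 20))  # x, y, z

column_means = temperature.mean(axis=2)  # mean temperature of each vertical column, shape (50, 50)

# record the grid at several time steps -> 4D array (time, x, y, z)
history = np.stack([temperature + 0.1 * t for t in range(24)])
print(history.shape)                     # (24, 50, 50, 20)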

There are so many examples... The way you are trying to represent it is probably wrong, let's take a simple example:
You have boxes, and a box stores N items in it. You can store up to 100 items in each box.
You've organized the boxes on shelves. A shelf allows you to store M boxes. You can identify each box on a shelf by an index.
All the shelves are in a warehouse with 3 floors, so you can identify any shelf using 3 numbers: the row, the column and the floor.
A box is then identified by: row, column, floor and the index on the shelf.
An item is identified by: row, column, floor, index on the shelf, index in the box.
Basically, one way (not the best one...) to model this problem would be to use a 5D array.
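As a sketch, the warehouse could be modelled with a single 5D array (all sizes below are made up):

import numpy as np

floors, rows, columns = 3, 10, 8
boxes_per_shelf, items_per_box = 20, 100   # M boxes per shelf, up to 100 items per box

# 0/1 flag per item slot: is it occupied?
warehouse = np.zeros((floors, rows, columns, boxes_per_shelf, items_per_box), dtype=np.int8)

warehouse[2, 4, 1, 7, 42] = 1              # one specific item
items_in_each_box = warehouse.sum(axis=4)  # 4D array: number of items stored in each box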

For example, a 3D array could be used to represent a movie, that is, a 2D image that changes with time.
For a given time, the first two axes give the coordinates of a pixel in the image, and the corresponding value gives the colour of that pixel, or a grey-scale level. The third axis then represents time: for each time slot, you have a complete image.
In this example, numpy.sum(array, axis=2) would integrate the exposure of a given pixel over time. If you think about a film taken in low-light conditions, you could imagine doing something like that to be able to see anything.
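A small sketch of that, assuming a greyscale movie stored as (height, width, time):

import numpy as np

movie = np.random.rand(240, 320, 1000)  # 1000 frames of a 240 x 320 greyscale image

frame = movie[:, :, 0]                   # the complete image at one time slot
exposure = movie.sum(axis=2)             # total exposure per pixel over the whole film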

They are very applicable in scientific computing. Right now, for instance, I am running simulations which output data in a 4D array, indexed by:
(time, x-position, y-position, z-position)
Almost every modern spatial simulation will use multidimensional arrays, as will programming for computer games.
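For instance, a simulation output indexed that way might be handled like this (the sizes are illustrative):

import numpy as np

# a scalar field sampled on a 3D grid at each time step: (time, x, y, z)
data = np.random.rand(100, 64, 64, 64)

snapshot = data[50]               # the full 3D field at one time step
time_average = data.mean(axis=0)  # the 3D field averaged over time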

Related

On Creating Random Walkers in 3D elements of 3 different 4D arrays

I am currently working with 2 different 4D arrays (x, y, z, l) containing voxel data of a brain scan (x, y, z) and a label (l). Every voxel holds the probability that the voxel contains that label.
The first array represents the brain tissue in 3D space and has 3 labels: white matter, grey matter and CSF. My only area of interest is grey matter.
The second array represents a predefined probability distribution of 360 labels, all of which correspond to a portion of the grey matter.
The "shape" of the probability distributions in 3D is similar, but not exactly the same. They are, however, already aligned in 3D space with a combination of rigid and affine transformations, and they are "as similar as possible".
What I want to do is map these 360 3D elements onto the second (grey-matter) 3D element of the first array and get a best fit.
I am not an expert in machine learning, so I tried to come up with my own idea:
Pick every voxel from the grey-matter probability map.
Generate 1000 random walks from each point.
Whichever territory is walked most, label the point with that label.
I haven't been able to code what I explained above yet, so I am unable to offer any code.
I am still trying to create "the random walking sampler", but I quickly realized I would need a lot of nested for loops working on a lot of different 3D arrays (1363, to be exact).
Is there a better way to do this?
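As a very rough sketch of the random-walk idea described above (the array names, shapes and parameters are hypothetical), the per-voxel work can at least be vectorised so that all walks for one seed point are generated at once:

import numpy as np

def label_voxel(seed, label_probs, rng, n_walks=1000, n_steps=50):
    # label_probs: hypothetical (x, y, z, 360) probability map; seed: (3,) voxel index
    axes = rng.integers(0, 3, size=(n_walks, n_steps))    # which axis each step moves along
    signs = rng.choice([-1, 1], size=(n_walks, n_steps))  # step direction
    steps = np.zeros((n_walks, n_steps, 3), dtype=int)
    steps[np.arange(n_walks)[:, None], np.arange(n_steps), axes] = signs
    positions = seed + steps.cumsum(axis=1)               # all walk trajectories at once
    positions = np.clip(positions, 0, np.array(label_probs.shape[:3]) - 1)
    x, y, z = positions[..., 0], positions[..., 1], positions[..., 2]
    visited = label_probs[x, y, z]                         # (n_walks, n_steps, 360)
    return visited.sum(axis=(0, 1)).argmax()               # the most-visited label

rng = np.random.default_rng(0)
label_probs = np.random.rand(20, 24, 20, 360)              # tiny made-up probability volume
print(label_voxel(np.array([10, 12, 10]), label_probs, rng))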

Sorting large numbers of galaxies into spheres of a certain radius

I have a large number of galaxies. I need to sort these galaxies into spheres of radius N, calculate the average number of galaxies in each sphere, and plot a graph of this against the radius N.
The galaxies are stored in a .fits file as radial coordinates (right ascension, declination and redshift). I'm using pyFITS and astropy to convert the galaxy coordinates into cartesian coordinates with Earth at (0, 0, 0), and then store the coordinates in a numpy array with the structure ((x, y, z), (x1, y1, z1), etc.).
In order to separate the galaxies into spheres of radius N, I randomly select a galaxy from the array, then iterate through the array calculating the distance between the randomly selected galaxy and the current galaxy. If the distance is less than or equal to the radius, the galaxy is added to the sphere. This is repeated as many times as the number of bubbles that need to be calculated.
My current method for this is really slow. I'm unfamiliar with numpy (I've been figuring things out as I go along), and I can't really see a better method than just iterating through all the galaxies.
Is there a way to do this any faster (something to do with numpy arrays - I'm converting them to a normal python list right now)? This is what I'm doing right now (https://github.com/humz2k/EngineeringProjectBethe/blob/humza/bubbles.py).
First, it's generally better to post samples of your code in your question where your issue is (such as the part where you select the radii you want to keep), rather than links to your entire script :)
Second, numpy arrays are great for scientific programming! They allow you to easily store data and perform matrix operations on that data without having to loop through native Python lists. If you know MATLAB, they basically allow you to do most of the same things MATLAB's arrays do. Some more information can be found here and here. pandas dataframes are also good to use.
On to your code. At the end of your read_data function, you can combine some of those coordinate statements, and you probably don't need the tolist() call, because a numpy.array is faster and uses less memory (see the links above).
In your get_bubbles function, I don't think you need to make copies of the data; the copies also take up memory. The biggest issue I see here is using the variable i twice in your loops. That's bad because i is replaced by the second loop. For example,
for i in [1, 2, 3, 4]:
    for i in np.array([5, 6, 7, 8]):
        print(i)
prints 5, 6, 7, 8 four times. It's also bad because we can't tell which i does what you want (having no comments doesn't help either ;) ). Replace the i variable in the second loop with another variable, like j.
Here are two options to make lists faster: list comprehensions and initializing numpy.arrays. You can read about list comprehensions here. An example of initializing a numpy.array is
new_data = np.zeros(len(data))
for i in range(len(data)):
    new_data[i] = data[i]
Finally, you could create a separate array for the radii and look into using numpy.where to select the indexes of the radii that match your criteria.
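A hedged sketch of that last suggestion (the variable names here are hypothetical, not the ones in the linked script):

import numpy as np

coords = np.random.rand(10000, 3) * 1000.0        # (N, 3) galaxy positions in cartesian coordinates
centre = coords[np.random.randint(len(coords))]   # a randomly selected galaxy

radii = np.linalg.norm(coords - centre, axis=1)   # distance from every galaxy to the centre
inside = np.where(radii <= 50.0)[0]               # indices of galaxies inside a radius-50 sphere
count_in_sphere = inside.size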
That was kind of a lot, hope it helps.

Numpy function over matrix

So my question is quite similar to this post: Most efficient way to map function over numpy array, but I have some additional questions to add along.
Right now, I'm taking in an image represented by a 2-D array, and for each pixel in the image, I am doing some computation that involves convolving the nxn neighboring pixels with a Gaussian kernel to find a "weight" for each pixel. My end goal is to return a 2-D array of the same size as the input, with the calculated weight in place of each pixel.
So what I did was to first create a function getWeight that, given a pixel, does the necessary computation using its neighbors and a Gaussian kernel to find its corresponding weight.
So my question is: given getWeight, is using a for loop (or numpy.fromiter) to apply this function to every pixel in the 2-D array the best way to go about solving this problem?
Or could there be a way to use built-in numpy functions to apply this sort of operation to the entire array at once? (This question is kind of vague, but what I am trying to get at is that, since numpy operations on arrays are not actually done by "using a for loop for every pixel", there might be something I could use to optimize my problem.)
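One possible fully vectorised route (a sketch only, since the exact computation inside getWeight isn't shown) is to express the per-pixel weight as a convolution of the whole image with the Gaussian kernel, for example with scipy:

import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(n, sigma=1.0):
    # n x n Gaussian kernel, normalised to sum to 1
    ax = np.arange(n) - (n - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

image = np.random.rand(512, 512)
weights = convolve(image, gaussian_kernel(5), mode='reflect')  # same shape as the input image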

Compute distance between combinations of points in a grid

I am looking for an efficient solution to the following problem. This should work with python, but does not have to be in python.
I have a 2D matrix; each element of the matrix represents a point in a 2D, orthogonal grid. I want to compute the shortest distance between pairs of points in the grid. This would be trivial if there were no "obstacles" in the grid.
A figure helps to explain:
Each cell in the figure is one element of the matrix (the matrix is square, but it could be rectangular). Gray cells are obstacles; any path between two points must go around them. The green cells are those I am interested in. I am not interested in red cells, but a path can go through them.
The distance between points like A and B is trivial to compute, but how to compute the path between A and C as shown in the figure?
I have read about the A* algorithm, but since I am working with a rather big grid, generally (few hundred) x (few hundred), I was wondering if there is a smarter alternative. Remember: I have to find the distance between all pairs of "green cells", not just between two of them. If I have n green cells, I will have a number of combinations equal to the binomial coefficient (n choose 2).
The grid is fixed; I have to compute all the distances once and then use them in further calculations, say accessing them based on the relevant indices in the matrix.
Note: the problem is NOT this one, where coordinates are in a list. My 2D coordinates are organised in a 2D grid, and the question is about exploiting this aspect to get a more efficient algorithm.
I suppose the most straightforward solution would be the Floyd-Warshall algorithm, which computes the shortest distances between all pairs of nodes in a graph. This doesn't necessarily exploit the fact that you happen to have a 2D grid (it could work on other kinds of graphs too), but it should work fine. The fact that you do have a 2D grid may enable you to implement it more efficiently than if you had to write an implementation for any arbitrary graph (e.g. you can just store distances as they're computed in a matrix, instead of some less efficient data structure).
The regular version only produces the distances of all shortest paths, and doesn't actually store the paths themselves. There's additional info on the Wikipedia page on how to modify the algorithm so you can efficiently reconstruct paths if necessary.
Intuitively, I suspect more efficient implementations may be possible which do exploit the fact that you have a 2D grid, probably using ideas from Rectangular Symmetry Reduction and/or Jump Point Search. Both of those ideas are traditionally used with A* for single-pair pathfinding queries though, I'm not aware of any work using them for all-pair shortest path computations. My intuition says they can be exploited there too, but in the time it'll take to figure that out exactly and implement it correctly, you can probably much more easily implement and run Floyd-Warshall.
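A minimal sketch of the Floyd-Warshall suggestion on a grid with obstacles, using scipy's implementation (the grid layout below is made up):

import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import floyd_warshall

# 0 = free cell, 1 = obstacle
grid = np.zeros((6, 6), dtype=int)
grid[2, 1:5] = 1

rows, cols = grid.shape
n = rows * cols
adj = lil_matrix((n, n))

# connect each free cell to its free 4-neighbours with edge weight 1
for r in range(rows):
    for c in range(cols):
        if grid[r, c]:
            continue
        for dr, dc in ((1, 0), (0, 1)):
            rr, cc = r + dr, c + dc
            if rr < rows and cc < cols and not grid[rr, cc]:
                adj[r * cols + c, rr * cols + cc] = 1

dist = floyd_warshall(adj.tocsr(), directed=False)  # (n, n) matrix of all-pairs shortest distances
print(dist[0 * cols + 0, 5 * cols + 5])             # distance from cell (0, 0) to cell (5, 5)

For an unweighted grid, repeated breadth-first searches from each green cell would also work and may scale better, but the above matches the Floyd-Warshall suggestion.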

Visualize high dimensional field arrows?

I have a big list of tuples (a, b), where both a and b are 9-dimensional vectors from the same space. This essentially encodes states of a system and some transitions. I would like to visualize the field described by these tuples, as arrows pointing from a to b, either in 2D or 3D. One of my problems, however, is that this is not a well-behaved vector field (it is not continuous), but I have reasons to believe that it can probably be laid out nicely, even in 2D.
Does anyone know of a toolbox (for matlab/python) or program that can do this? This would presumably first involve some kind of dimensionality reduction on a and b and then plot little arrows from one point to another.
Thank you for your help!
I'm not 100% sure if this answers your question or not, but you may want to look at Recurrence Plots. If this is what you're after, then you won't need any additional Matlab toolboxes.
Okay, turns out MATLAB can do this but it's not very pretty.
It basically boils down to doing PCA, and then using the quiver function to do the plotting:
My matrix X here contains starting points of my high dimensional nodes in odd rows, and ending points in even rows. Then:
[COEFF, SCORE] = princomp(zscore(X));  % PCA on the standardized data
x = SCORE(1:2:end, 1);                 % starting points, first three principal components
y = SCORE(1:2:end, 2);
z = SCORE(1:2:end, 3);
u = SCORE(2:2:end, 1);                 % ending points, first three principal components
v = SCORE(2:2:end, 2);
w = SCORE(2:2:end, 3);
quiver3(x, y, z, u-x, v-y, w-z, 0);    % arrows from start to end; the 0 disables automatic scaling
The downside is that I can't find a good way to color the edges, so I get a huge mess if I just do it trivially. Ah well, good enough for now!
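For reference, a rough Python equivalent of the same idea (assuming the same layout of X, with starting points in MATLAB's odd rows, i.e. 0::2 here) could use scikit-learn's PCA and matplotlib's quiver:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore
from sklearn.decomposition import PCA

X = np.random.rand(200, 9)             # stand-in for the real data
score = PCA(n_components=3).fit_transform(zscore(X))

start, end = score[0::2], score[1::2]  # MATLAB's odd rows / even rows
ax = plt.figure().add_subplot(projection='3d')
ax.quiver(start[:, 0], start[:, 1], start[:, 2],
          end[:, 0] - start[:, 0], end[:, 1] - start[:, 1], end[:, 2] - start[:, 2],
          arrow_length_ratio=0.1)
plt.show()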
Here's a Matlab toolbox of dimension reduction algorithms. I haven't worked with it, but I have worked with dimension reduction, and it seems like a manifold charting/local coordinates algorithm would be able to extract a low-dimensional representation.
TU Delft Dim. Red. Toolbox
